'Tail handling' in general is an annoying aspect of simd. Masks are great, but no panacea--in particular, if you unroll, then you cannot take care of the tail with a single masked instruction. There are various solutions to this. I favour overlapping accesses, where that's feasible (following a great deal of evangelism from mateusz guzik); a colleague uses a variant of duff's device; you can also just generate multiple masks.
I would expect the linked code is just intended as a quick poc, so it does not bother to be optimal.
Roughly speaking, while you do need idempotence for reductions, you do not need it for maps. (Of course, plenty of things don't look quite like reductions or maps--or they look like both at once--and each is unique.)
I would expect the linked code is just intended as a quick poc, so it does not bother to be optimal.