Vectorization with fast-math on irregular ISA sub-sets

- ARMv7 NEON ignores the rounding mode set in bits 23:22 of FPSCR and always uses round to nearest.
- ARMv7 NEON ignores the trap enable bits (15:8) in FPSCR and always uses default exception handling.

If I read the manuals correctly, IEEE 754 does not strictly require
these to behave one way or the other, so they don't violate the
standard. The subnormal treatment does.

As with denormal support, the issue at hand is not so much that these differ from IEEE 754 as it is that they differ from the behavior of the scalar (VFP) arithmetic.

This is one of the practical consequences, yes, but it is of no
relevance to this work. Right now, I'm only trying to avoid surprises.
If a user gets different results when using -ffast-math, that's
expected. Without it, not so much.
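
To make the kind of surprise I mean concrete, here is a rough sketch in
portable C. It only simulates flush-to-zero in software; it does not
touch NEON or FPSCR. The helper names (flush_subnormal, mul_ieee,
mul_ftz, report) are made up for illustration, and flushing both inputs
and result is my assumption of how a flush-to-zero unit treats
subnormals.

```c
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: mimic flush-to-zero by replacing a subnormal
   value with a zero of the same sign. Other values pass through. */
static float flush_subnormal(float x) {
    return (fpclassify(x) == FP_SUBNORMAL) ? copysignf(0.0f, x) : x;
}

/* Plain multiply, standing in for the scalar path that keeps
   subnormals. */
static float mul_ieee(float a, float b) {
    volatile float r = a * b;   /* volatile: force a real float store */
    return r;
}

/* Simulated flush-to-zero multiply, standing in for the vector path.
   Assumption: subnormal inputs and subnormal results both become 0. */
static float mul_ftz(float a, float b) {
    volatile float r = flush_subnormal(a) * flush_subnormal(b);
    return flush_subnormal(r);
}

/* Print both results for one input pair whose exact product is
   subnormal; return 1 if the two paths disagree, 0 otherwise. */
static int report(void) {
    float a = FLT_MIN;          /* smallest normal float */
    float b = 0.5f;             /* exact product is below FLT_MIN */
    float s = mul_ieee(a, b);
    float v = mul_ftz(a, b);
    printf("scalar-like: %a\n", (double)s);
    printf("ftz-like:    %a\n", (double)v);
    return s != v;
}
```

Calling report() from main() shows the two paths disagreeing on the
same inputs with no fast-math flag involved, which is exactly the case
I want the vectorizer to avoid by default.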

cheers,
--renato