Would it be possible to support generating CPU-specific SVE code?
This could be useful for JIT compilers, e.g. Julia's.
Currently, when using -mcpu=a64fx, <8 x double> gets split into 4 NEON instructions:
https://godbolt.org/z/cEf1Pfvx8
If I understand correctly, I'd need to use <vscale x 2 x double> to actually generate SVE code. However, Julia currently has no way of representing such variable-sized types without allocating them on the heap (awkward for a value that's supposed to live in registers!), which makes writing intrinsics with them impractical. Some libraries make extensive use of intrinsics operating on fixed-width vector types like <8 x double> for defining compute kernels, and as is they are incompatible with SVE.
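To make the kind of kernel I mean concrete, here is a minimal sketch in C rather than Julia, using the GCC/Clang vector_size extension. The names v8d and muladd8 are made up for illustration and do not come from any particular library. Clang represents this 64-byte type as <8 x double> in IR, so it should be subject to the same splitting as the godbolt example above.

```c
/* Fixed-width vector of 8 doubles (64 bytes).
 * Uses the GCC/Clang vector_size extension; Clang lowers it to <8 x double>.
 * v8d is a hypothetical name chosen for this sketch. */
typedef double v8d __attribute__((vector_size(64)));

/* Hypothetical compute kernel: element-wise a * b + c over all 8 lanes.
 * The width is fixed at compile time, unlike <vscale x 2 x double>,
 * whose lane count depends on the hardware vector length. */
static inline v8d muladd8(v8d a, v8d b, v8d c)
{
    return a * b + c;
}
```

Compiling a caller of muladd8 with clang -O2 -mcpu=a64fx and inspecting the assembly is one way to check whether a given LLVM version emits NEON or SVE instructions for it.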