Would it be possible to support generating CPU-specific SVE code?
This could be useful for JIT compilers, e.g. Julia's.
Currently, when using
<8 x double> gets split into 4 NEON instructions:
If I understand correctly, I’d need to use <vscale x 2 x double> to actually generate SVE code. However, for writing intrinsics, Julia currently has no way of representing such variable-sized types without allocating them on the heap, which is awkward for a value that’s supposed to live in registers! Some libraries make extensive use of intrinsics operating on fixed-width vector types like <8 x double> for defining compute kernels, and as is, they are incompatible with SVE.
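
To make the distinction concrete, here is a minimal sketch of the two forms in LLVM IR. The function names (@add_fixed, @add_scalable) are hypothetical and only for illustration. My assumption is that, when compiled for an SVE-capable AArch64 target, the first function is legalized by splitting the vector into 128-bit pieces, while the second can map onto a single SVE register:

```llvm
; Fixed-width form: exactly 8 doubles (512 bits), size known at compile time.
; This is the kind of type the existing intrinsics-based kernels use.
define <8 x double> @add_fixed(<8 x double> %a, <8 x double> %b) {
  %sum = fadd <8 x double> %a, %b
  ret <8 x double> %sum
}

; Scalable form: (vscale x 2) doubles, where vscale is a positive integer
; fixed by the hardware at run time, so the size is unknown at compile time.
define <vscale x 2 x double> @add_scalable(<vscale x 2 x double> %a,
                                           <vscale x 2 x double> %b) {
  %sum = fadd <vscale x 2 x double> %a, %b
  ret <vscale x 2 x double> %sum
}
```

The second signature is the part a front end like Julia would need to express: a value whose size depends on vscale, passed and returned in registers.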