CPU-specific SVE codegen

Would it be possible to support generating CPU-specific SVE code?
This could be useful for JIT compilers, e.g. Julia's.

Currently, when using -mcpu=a64fx, <8 x double> gets split into 4 NEON instructions:
https://godbolt.org/z/cEf1Pfvx8
If I understand correctly, I’d need to use <vscale x 2 x double> to actually generate SVE code. However, Julia currently has no way to represent such variable-sized types for writing intrinsics without allocating them on the heap, which is awkward for a value that is supposed to live in registers. Some libraries make extensive use of intrinsics operating on fixed-width vector types such as <8 x double> to define compute kernels, and as things stand those libraries are incompatible with SVE.
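The register arithmetic behind the split can be sketched as follows. This is a minimal illustration, assuming 64-bit doubles, 128-bit NEON registers, a 512-bit SVE vector length on A64FX, and LLVM's convention that a scalable SVE type covers vscale × 128 bits. The helper names are hypothetical and exist only for this sketch.

```c
/* Width of one double-precision lane in bits. */
enum { DOUBLE_BITS = 64 };

/* Hypothetical helper: registers of width reg_bits needed to hold n doubles,
 * rounding up when n is not a multiple of the lane count. */
int regs_needed(int n, int reg_bits) {
    int lanes = reg_bits / DOUBLE_BITS;
    return (n + lanes - 1) / lanes;
}

/* Hypothetical helper: vscale for an SVE implementation, assuming LLVM's
 * scalable vector types are measured in 128-bit granules. */
int sve_vscale(int sve_bits) {
    return sve_bits / 128;
}

/* Hypothetical helper: number of doubles in <vscale x 2 x double>. */
int scalable_double_lanes(int vscale) {
    return vscale * 2;
}
```

With these assumptions, an <8 x double> needs regs_needed(8, 128) = 4 NEON registers but regs_needed(8, 512) = 1 SVE register on A64FX, where vscale is 4 and <vscale x 2 x double> therefore also holds 8 doubles.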

Hi Chris,

I’m not an expert on SVE at all, but you might try the “aarch64-sve-vector-bits-min” option that was introduced in this patch: https://reviews.llvm.org/D80384. Others and I have been basing a similar feature for RISC-V on it.

Thank you!
Adding -aarch64-sve-vector-bits-min=512 does solve the problem:

https://godbolt.org/z/hYv4dePx6
For example, instead of four fmla instructions on NEON v registers:

fmla v1.2d, v19.2d, v7.2d
fmla v0.2d, v18.2d, v6.2d
fmla v2.2d, v17.2d, v5.2d
fmla v3.2d, v16.2d, v4.2d

There is now just a single fmla on an SVE z register:

fmla z0.d, p0/m, z1.d, z2.d
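For anyone reproducing this outside Julia, here is a small C kernel that should produce the same <8 x double> IR type, assuming Clang's GCC-style vector extensions. The function name muladd8 is hypothetical. The compile line is an assumed invocation that passes the backend option through -mllvm, for example `clang -O3 --target=aarch64-linux-gnu -mcpu=a64fx -mllvm -aarch64-sve-vector-bits-min=512 -S kernel.c`. Whether the multiply and add are fused into one fmla depends on the floating-point contraction setting (-ffp-contract), so treat the exact assembly as unverified.

```c
/* 64-byte (512-bit) vector of 8 doubles; Clang lowers this type to
 * <8 x double> in LLVM IR. */
typedef double v8d __attribute__((vector_size(64)));

/* Hypothetical kernel: acc += a * b, element-wise over 8 doubles.
 * Pointers are used so the 512-bit values are not passed by value
 * across the call boundary. With FP contraction enabled, the multiply
 * and add may be fused into a single fmla. */
void muladd8(v8d *acc, const v8d *a, const v8d *b) {
    *acc += *a * *b;
}
```

Without the extra option, the same source should fall back to splitting the operation across NEON registers, matching the four-instruction output shown earlier in the thread.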

Would it be possible to get a -aarch64-sve-vector-bits=native flag?