[Constant Folder, InstCombine, ARM, AArch64] Question about constant folding of vector load


There is a particular code sequence I would like to optimize at the IR level.

I’d like to turn an Arm/AArch64 table lookup intrinsic that takes a constant vector mask into a shufflevector instruction:

vtbl1(V,mask) ~> shufflevector(V,undef,mask)

The reason is that if the mask is {7,6,5,4,3,2,1,0}, the backend can then generate a rev64 instruction instead of a table lookup.
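To make the proposed rewrite concrete, here is a small lane-level model in Python. It is only a sketch, not LLVM code: the function names are hypothetical, `None` stands in for an undef lane, and it assumes 8-byte vectors as used by vtbl1. One point it illustrates is that vtbl1 yields 0 for an out-of-range index, whereas a shufflevector index of 8 or more would select from the undef operand, so the rewrite only looks safe when every mask index is below 8.

```python
LANES = 8  # vtbl1 works on eight byte lanes (one 64-bit D register)


def vtbl1(table, mask):
    """Model of vtbl1: each mask lane indexes the table; out-of-range gives 0."""
    return [table[i] if 0 <= i < LANES else 0 for i in mask]


def shufflevector(v1, v2, mask):
    """Model of shufflevector: mask lanes index the concatenation v1 ++ v2."""
    both = list(v1) + list(v2)
    return [both[i] for i in mask]


def mask_is_safe(mask):
    """True if every index stays inside the first operand (assumed precondition)."""
    return all(0 <= i < LANES for i in mask)


def is_full_reverse(mask):
    """True for {7,6,5,4,3,2,1,0}, the byte reversal that rev64 performs."""
    return list(mask) == list(range(LANES - 1, -1, -1))


v = [10, 11, 12, 13, 14, 15, 16, 17]
undef = [None] * LANES  # stand-in for the undef second operand
reverse_mask = [7, 6, 5, 4, 3, 2, 1, 0]

if mask_is_safe(reverse_mask):
    # Both forms agree lane by lane when the mask is in range.
    same = vtbl1(v, reverse_mask) == shufflevector(v, undef, reverse_mask)
    print(same, is_full_reverse(reverse_mask))
```

With a mask such as {8,0,1,2,3,4,5,6} the two forms differ in lane 0 (0 from vtbl1, an undef lane from the shuffle), so a real implementation would presumably either bail out or use a zero vector as the second operand.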

If the mask comes from a vld1 of a global constant, I could constant-fold that load so that the mask becomes a constant vector and the combine above can apply.
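The folding step itself can be modelled in a few lines. This is a hedged sketch in Python, not LLVM's API: `Global` and `fold_vld1` are hypothetical names, and the conditions checked (the global is constant, and the loaded bytes lie entirely inside its initializer) are my assumptions about what a fold would need to verify.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Global:
    """Hypothetical stand-in for a global variable with a byte initializer."""
    data: Tuple[int, ...]   # initializer bytes
    is_constant: bool       # True if the global is never written


def fold_vld1(g: Global, offset: int = 0, lanes: int = 8) -> Optional[Tuple[int, ...]]:
    """Return the loaded lanes as a constant, or None if the load cannot be folded."""
    if not g.is_constant:
        return None  # contents may change at run time
    if offset < 0 or offset + lanes > len(g.data):
        return None  # load would read outside the initializer
    return g.data[offset:offset + lanes]


mask_global = Global(data=(7, 6, 5, 4, 3, 2, 1, 0), is_constant=True)
folded_mask = fold_vld1(mask_global)
```

Once `folded_mask` is a known constant, the table lookup that consumes it has a constant mask, which is what the vtbl1 rewrite requires.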

My question is: does constant folding the vld1 seem like a good thing to do in the general case, as a standalone transformation, or only when its result is used as the mask for a table lookup?


Yes, constant-folding vld1 seems like a good idea. Actually, we should probably just lower the NEON vld1 intrinsics to an LLVM “load” (which would give us constant-folding for free), but that would be more work to make sure it doesn’t have any unexpected effects.

-Eli