[Constant Folder, InstCombine, ARM, AArch64] Question about constant folding of vector load


There is a particular code sequence I would like to optimize at the IR level.

I’d like to turn an Arm/AArch64 table lookup intrinsic that takes a constant vector mask into a shufflevector instruction:

vtbl1(V,mask) ~> shufflevector(V,undef,mask)

The reason is that if the mask is {7,6,5,4,3,2,1,0}, the backend can recognize the shuffle as a byte reversal and generate a rev64 instruction instead of a table lookup.
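To make the intended rewrite concrete, here is a minimal Python model of the two operations. It is only a sketch of the semantics, not LLVM code: the function names are hypothetical, and it assumes the usual behaviour that vtbl1 returns 0 for an index outside the 8-byte table, whereas a shuffle lane that does not come from the first operand is undefined. Under that assumption the rewrite is only safe when every mask index is in range.

```python
from typing import List, Optional

LANES = 8  # vtbl1 works on a single 8-byte table


def vtbl1(table: List[int], mask: List[int]) -> List[int]:
    """Model of the table lookup: an out-of-range index yields 0."""
    return [table[i] if 0 <= i < LANES else 0 for i in mask]


def shuffle_single_source(v: List[int], mask: List[int]) -> List[Optional[int]]:
    """Model of shufflevector(v, undef, mask).

    None stands for an undefined lane, i.e. any lane whose index does not
    select from the first operand (a simplification made for this sketch).
    """
    return [v[i] if 0 <= i < LANES else None for i in mask]


def can_rewrite_as_shuffle(mask: List[int]) -> bool:
    """The rewrite preserves the result only if every index is in range."""
    return len(mask) == LANES and all(0 <= i < LANES for i in mask)


if __name__ == "__main__":
    v = [10, 11, 12, 13, 14, 15, 16, 17]
    rev = [7, 6, 5, 4, 3, 2, 1, 0]
    if can_rewrite_as_shuffle(rev):
        # Both models agree on the byte-reversal mask.
        print(vtbl1(v, rev) == shuffle_single_source(v, rev))
```

A mask containing an index of 8 or more would make the two models disagree (0 versus an undefined lane), so a real transformation would have to reject such masks or handle them separately.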

If the mask comes from a vld1 of a constant global, I could constant-fold the load so that the instruction combine above can apply.
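The folding step itself can be sketched in the same style. This is a hypothetical model, not the LLVM constant folder: `Global` and `fold_vector_load` are names invented for illustration, and the sketch assumes folding is only attempted when the global is marked constant, has a visible initializer, and the load stays within its bounds.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Global:
    """Hypothetical model of a module-level byte array."""
    is_constant: bool
    initializer: Optional[List[int]]  # None models an external declaration


def fold_vector_load(
    module: Dict[str, Global], name: str, offset: int = 0, lanes: int = 8
) -> Optional[List[int]]:
    """Return the loaded bytes as a constant, or None if folding is not safe."""
    g = module.get(name)
    if g is None or not g.is_constant or g.initializer is None:
        return None  # the contents could change or are unknown
    if offset < 0 or offset + lanes > len(g.initializer):
        return None  # the load would read outside the initializer
    return list(g.initializer[offset:offset + lanes])


if __name__ == "__main__":
    module = {"rev_mask": Global(True, [7, 6, 5, 4, 3, 2, 1, 0])}
    # Once folded, the mask is a known constant and the vtbl1 rewrite can fire.
    print(fold_vector_load(module, "rev_mask"))
```

Returning None rather than a guess keeps the sketch conservative: a load that cannot be proven constant is simply left alone.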

My question is: does constant folding of the vld1 seem like a good thing to do in the general case, as a standalone transformation, or only when the result is used as the mask of a table lookup?



Yes, constant-folding vld1 seems like a good idea. Actually, we should probably just lower the NEON vld1 intrinsics to an LLVM “load” (which would give us constant-folding for free), but that would be more work to make sure it doesn’t have any unexpected effects.

-Eli