I've run into a problem that I'm trying to figure out how to address and would welcome ideas and feedback.
Today, the vectorizer will nicely vectorize loops using the widest legal vector type for the target. On a reasonably recent machine, this will often end up using AVX2 registers, which are 32 bytes wide.
If we decide to spill one of these registers during register allocation, we use the vmovaps instruction, which requires the memory address accessed to be 32-byte aligned. So far, so good.
However, the C ABI generally only provides 16 bytes of alignment for the stack on entry to the function. To work around this, the backend will create a variable-sized frame, inserting a dynamic amount of padding where required to ensure that a 32-byte-aligned spill slot is available.
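To make the "dynamic amount of padding" concrete, here is a minimal sketch of the arithmetic involved. The helper names and the example addresses are made up for illustration and are not taken from the backend; the only assumption is that the stack grows down and the alignment is a power of two.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes that must be subtracted from the
   stack pointer so that the result is a multiple of `align`.
   `align` must be a power of two. The stack grows down, so aligning
   means rounding the address down. */
static uint64_t realign_padding(uint64_t sp, uint64_t align) {
    return sp & (align - 1);
}

/* Hypothetical helper: the stack pointer after realignment. */
static uint64_t realign_down(uint64_t sp, uint64_t align) {
    return sp & ~(align - 1);
}
```

For example, 0x7ffc1000 and 0x7ffc1010 are both 16-byte aligned, but realign_padding(..., 32) gives 0 for the first and 16 for the second. The padding depends on the incoming stack pointer, so the frame size can't be known statically.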
The problem I have is that my runtime's ABI really doesn't like variable-sized frames. In particular, the assumption that stack frames are fixed size - except during prologue and epilogue - is fairly baked in.
I'm weighing a couple of options for addressing this and want to gather feedback on the perceived difficulty of each. If someone has another approach, I'm also very open to that.
Option 1 - Fix my runtime so it no longer expects mostly fixed-size frames. This isn't a small change to make, but given it's a strictly internal ABI, I can probably get away with doing it. Given things like shrink-wrapping are coming down the pipe, it might also have secondary benefits. However, this is a relatively risky change to make for what is a fairly rare corner case.
Option 1a - I could change my ABI to use a 32 byte aligned frame. This has many of the same problems as (1).
Option 2 - Don't compile things which need to spill vector registers. This is actually what we do today, and it has worked out fairly well in practice, but it's what I'm hoping to move away from.
Option 3 - Add an option in the x86 backend to not require aligned spill slots for AVX2 registers. In particular, the vmovups instruction can be used to spill vector registers into an 8 or 16 byte aligned spill slot without requiring dynamic frame realignment. This seems like it might be useful in other contexts as well, but I can't name any at the moment.
One thing that occurs to me is that many spills are down rare paths. Maybe it would make sense to only do dynamic alignment for hot spills/reloads? I could then simply override the heuristic to always use unaligned spills.
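To illustrate the shape of that heuristic, here is a rough sketch. Everything in it is hypothetical: the type names, the frequency threshold, and the override flag are invented for this example and do not correspond to anything that exists in the backend today.

```c
#include <stdbool.h>

enum spill_kind { SPILL_ALIGNED, SPILL_UNALIGNED };

/* Hypothetical policy knobs; neither field exists in the backend. */
struct spill_policy {
    bool   force_unaligned; /* override: never require frame realignment */
    double hot_threshold;   /* relative frequency at or above which a spill is "hot" */
};

/* Pick the spill flavour for one spill/reload site. `rel_freq` stands in
   for whatever block frequency estimate the register allocator has. */
static enum spill_kind choose_spill(const struct spill_policy *p, double rel_freq) {
    if (p->force_unaligned)
        return SPILL_UNALIGNED;
    return rel_freq >= p->hot_threshold ? SPILL_ALIGNED : SPILL_UNALIGNED;
}
```

Under this sketch, the frame would only need dynamic realignment if at least one site came back as SPILL_ALIGNED. A runtime like mine would set force_unaligned and never hit that case.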
I don't really have a sense for how hard (3) would be to implement. Anyone have an intuition?
Philip