Auto-generate the memory folding tables

It might never have had TB_NO_REVERSE in the table. I may have only used the table as a guide and decided it needed TB_NO_REVERSE. There’s no reason to unfold it anyway: its only source is a pointer. One of the uses of unfolding is LICM, but the whole instruction could be hoisted without unfolding.

I think TB_NO_REVERSE is needed for the masked forms. The unfolding code won’t create a masked load; it would create a plain load followed by a masked VEXPANDPDZrrk. That plain load would read elements that should be masked off.

Though that applies to every masked instruction in the unfold table, and we don’t have TB_NO_REVERSE on them. I think that may not be an issue because we never fold masked load intrinsics into masked arithmetic, so those instructions always start out as a whole-register load, which makes unfolding OK.

That’s not true for expand since we have masked expand load intrinsics.
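A toy model may help show why unfolding a masked expand load is unsafe. It is an illustrative sketch, not LLVM code: memory is a Python list, an IndexError stands in for a memory fault, and zero-masking is assumed for masked-off lanes. The function names expand_load and unfolded_expand are hypothetical. The folded form reads only popcount(mask) consecutive elements, while the unfolded form first does a full-width plain load and then expands in registers.

```python
# Illustrative model only (not LLVM code). Memory is a list of doubles;
# reading past its end raises IndexError, which stands in for a fault.
VL = 8  # lanes in a 512-bit vector of doubles, as in VEXPANDPDZ


def expand_load(mem, base, mask):
    """Folded masked expand load: reads only popcount(mask) consecutive elements.
    Masked-off lanes are zeroed (zero-masking assumed for simplicity)."""
    result = [0.0] * VL
    touched = []
    src = base
    for lane in range(VL):
        if (mask >> lane) & 1:
            result[lane] = mem[src]
            touched.append(src)
            src += 1
    return result, touched


def unfolded_expand(mem, base, mask):
    """Unfolded form: a plain full-width load, then a register-to-register expand."""
    touched = list(range(base, base + VL))  # the plain load reads all VL elements
    loaded = [mem[i] for i in touched]      # IndexError here models a fault
    result = [0.0] * VL
    src = 0
    for lane in range(VL):
        if (mask >> lane) & 1:
            result[lane] = loaded[src]
            src += 1
    return result, touched
```

With only three valid elements and a mask selecting two lanes, expand_load touches indices 0 and 1 and succeeds, while unfolded_expand tries to read all eight elements and "faults". When enough memory is valid, both produce the same register result, which is why the problem only shows up as an over-read rather than a wrong value.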

PLS: I have a small question: why do we not allow unfolding if the register’s size is greater than the memory operand’s size?

If I remember right, the size of the unfolded load is based on the register size. If that’s bigger than the memory operand size, the unfolded load will read more memory than the original instruction did and could cause a memory fault.
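The rule in the answer above can be sketched as a simple size comparison. This is a hypothetical model, not LLVM's actual folding-table data structure: FoldEntry, overread_bytes and allow_unfold are illustrative names, and sizes are given in bits. A typical case is a scalar instruction whose memory operand is 64 bits but whose register operand is a 128-bit XMM register; a register-sized unfolded load would read 8 bytes beyond the original operand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldEntry:
    """Hypothetical fold-table entry (not LLVM's real structure); sizes in bits."""
    name: str
    reg_bits: int  # size of the register the load would be unfolded into
    mem_bits: int  # size of the memory operand in the folded instruction


def overread_bytes(entry: FoldEntry) -> int:
    """Bytes a register-sized unfolded load would read past the original operand."""
    return max(0, (entry.reg_bits - entry.mem_bits) // 8)


def allow_unfold(entry: FoldEntry) -> bool:
    """Permit unfolding only if the unfolded load reads no more than the original."""
    return entry.reg_bits <= entry.mem_bits
```

For example, FoldEntry("scalar_f64_in_xmm", 128, 64) over-reads 8 bytes and is rejected, while a full-width FoldEntry("full_v2f64", 128, 128) over-reads nothing and may be unfolded.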