[RISCV] Transition in vector pseudo structure (policy variants)

Just a heads up, I’m working on a set of changes which will restructure one dimension of our vector pseudo matrix. If this succeeds, there are some ideas for tackling the other dimensions as well, but that’s definitely future work.

As a quick bit of background, we currently use pseudo instructions to enumerate the variants of each vector instruction. The exact cross product depends on the instruction, but generally we have LMUL x {Tail Undef, TU} x {Masked, Unmasked}.
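To make the matrix concrete, here is a rough sketch that enumerates the pseudo names one arithmetic instruction expands to today. It follows the three-variants-per-LMUL breakdown given in the reply at the end of this thread (unsuffixed, _TU, _MASK). The `current_pseudos` helper is hypothetical, and the `PseudoVADD_VV_M1`-style naming is an assumption for illustration; it is not the actual TableGen multiclass logic.

```python
from typing import List

# Assumed LMUL suffixes, fractional and integral.
LMULS = ["MF8", "MF4", "MF2", "M1", "M2", "M4", "M8"]

# Variants per LMUL for an arithmetic instruction (per the thread):
#   ""      -> tail undefined (no passthru operand)
#   "_TU"   -> agnostic or undisturbed, chosen via the policy operand
#   "_MASK" -> masked form
VARIANT_SUFFIXES = ["", "_TU", "_MASK"]


def current_pseudos(base: str) -> List[str]:
    """Hypothetical helper: every pseudo name for `base` across LMUL x variant."""
    return [
        f"Pseudo{base}_{lmul}{suffix}"
        for lmul in LMULS
        for suffix in VARIANT_SUFFIXES
    ]


names = current_pseudos("VADD_VV")
print(len(names))  # 21
print(names[:3])
```

Each extra dimension multiplies the count, which is why collapsing even one dimension removes a large number of definitions.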

This proposal focuses specifically on a change to how we represent tail agnostic, tail undefined, and tail undisturbed operations. In current code, we tend to use an unsuffixed pseudo for undefined (despite calling it TA in most places in code), and the _TU form for both agnostic and undisturbed (distinguished via the policy operand).

(To prevent confusion: “undisturbed” and “agnostic” are the terms from the vector specification. “Undefined” is the stronger property where the respective lanes are undefined (i.e. undef) before the operation and can thus take any value afterwards. “Undefined” is a compiler-internal concept, and as mentioned above, we sometimes call it “agnostic” in code even though that is not strictly correct.)
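A small model may help keep the three terms apart. The sketch below is an illustrative assumption, not LLVM code. It describes which values a single tail lane may hold after an instruction under each policy. "Agnostic" follows the vector spec's wording (the lane either keeps its old value or is overwritten with all ones), and an 8-bit element width is assumed only to make "all ones" concrete.

```python
from enum import Enum
from typing import Optional, Set

SEW_BITS = 8                     # assumption: 8-bit elements for the sketch
ALL_ONES = (1 << SEW_BITS) - 1   # 0xFF


class Tail(Enum):
    UNDISTURBED = "undisturbed"  # spec term: tail lane keeps its old value
    AGNOSTIC = "agnostic"        # spec term: old value or all ones
    UNDEFINED = "undefined"      # compiler-internal: lane was undef beforehand


def legal_tail_values(policy: Tail, old: int) -> Optional[Set[int]]:
    """Values one tail lane may hold after the instruction.

    Returns None to mean "any value is legal". For UNDEFINED the `old`
    argument is ignored, because the lane held undef to begin with.
    """
    if policy is Tail.UNDISTURBED:
        return {old}
    if policy is Tail.AGNOSTIC:
        return {old, ALL_ONES}
    return None


print(legal_tail_values(Tail.UNDISTURBED, 3))
print(legal_tail_values(Tail.UNDEFINED, 3))  # None
```

Read this way, "undefined" places no constraint on the result, while the two spec policies still tie the tail to the prior contents of the destination.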

The key observation behind this proposal is that we can represent tail undefined via a pseudo with a passthrough operand if that operand is IMPLICIT_DEF (aka undef). We already have a few instances of this in tree (see vmv.s.x and vslide*), but we can apply it much more universally.
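The following is a toy model of the selection change, written as a sketch under stated assumptions. `Selected`, `select_current`, `select_proposed`, the `%v8` register, and the `PseudoVADD_VV_M1` name are all hypothetical. Using "ta" as the policy for the IMPLICIT_DEF case is likewise an assumption of the sketch. The point is only that once undefined is expressed as an undef passthru, every tail policy maps onto the same _TU opcode.

```python
from dataclasses import dataclass
from typing import Optional

UNDEF = "IMPLICIT_DEF"  # hypothetical stand-in for an undef passthru operand


@dataclass(frozen=True)
class Selected:
    opcode: str
    passthru: Optional[str]  # None: this pseudo has no passthru operand at all
    policy: Optional[str]    # "ta" / "tu"; None: no policy operand


def select_current(base: str, tail: str, passthru: str) -> Selected:
    # Tail undefined -> unsuffixed pseudo, which carries no passthru operand.
    if tail == "undefined":
        return Selected(base, None, None)
    # Agnostic and undisturbed share _TU; the policy operand tells them apart.
    return Selected(base + "_TU", passthru, "ta" if tail == "agnostic" else "tu")


def select_proposed(base: str, tail: str, passthru: str) -> Selected:
    # Tail undefined -> the same _TU pseudo, with an IMPLICIT_DEF passthru.
    # Choosing "ta" for this case is an assumption of the sketch.
    if tail == "undefined":
        return Selected(base + "_TU", UNDEF, "ta")
    return Selected(base + "_TU", passthru, "ta" if tail == "agnostic" else "tu")


for tail in ("undefined", "agnostic", "undisturbed"):
    print(tail,
          select_current("PseudoVADD_VV_M1", tail, "%v8").opcode,
          select_proposed("PseudoVADD_VV_M1", tail, "%v8").opcode)
```

Under the proposed scheme the unsuffixed opcode is never produced, which is what makes deleting those pseudo classes possible.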

Once complete, we will be able to delete roughly 1/2 of our vector pseudo classes.

In terms of implementation strategy, I’m starting by generalizing the DAG post-combine pieces (D152380 and D152740). Once those are done, we should be able to iterate through all the patterns, switching them to the _TU variants, and then, once fully migrated, start removing the old pseudos and renaming the _TU versions.

One thing the initial patches did reveal is that this won’t be fully NFC. There are some minor codegen differences caused by changes in scheduling and register allocation. So far, all the diffs look pretty minor, but there’s always a chance we get unpleasantly surprised partway through.

Nit: reducing by 1/2 is too high. It’s no more than 1/3, since we have 3 pseudos for every arithmetic instruction: unsuffixed, _MASK, and _TU. Also, a substantial portion of our pseudos are loads/stores (primarily because indexed loads/stores need every combination of register classes), and this won’t affect the stores, which don’t have _TU.
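The bound in the reply can be checked with a few lines of arithmetic. This is a sketch with made-up inputs: `fraction_removed` is a hypothetical helper, and the counts below are illustrative rather than measured from the tree. Each (arithmetic instruction, LMUL) pair has three pseudos and loses one, while untouched pseudos such as stores only grow the denominator.

```python
from fractions import Fraction


def fraction_removed(arith_groups: int, other_pseudos: int) -> Fraction:
    """Share of all pseudos deleted once the unsuffixed form folds into _TU.

    arith_groups: number of (arithmetic instruction, LMUL) pairs; each has
        three pseudos today (unsuffixed, _TU, _MASK) and loses exactly one.
    other_pseudos: pseudos the change does not touch (e.g. stores, which
        have no _TU form).
    """
    before = 3 * arith_groups + other_pseudos
    removed = arith_groups
    return Fraction(removed, before)


print(fraction_removed(100, 0))    # 1/3: the upper bound
print(fraction_removed(100, 300))  # 1/6: untouched pseudos dilute the savings
```

Since `other_pseudos` can only be non-negative, the result never exceeds 1/3, which matches the reply's correction of the 1/2 estimate.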