RFC: Legalizing operations and post-legalization DAG combining

Today, legalization and DAG combining happen in phases. You can have a DAG combine that runs before legalization phases or after. And operation legalization happens once, after which the core logic of operation legalization is never run over DAG nodes again.

As a consequence, if you write a DAG combine for target specific nodes that are produced by legalizing some target independent node (most often with a custom operation action), you cannot produce from your DAG combine any node which is not marked with a ‘legal’ operation action. This is particularly problematic for things that are ‘custom’, but will be lowered without issue. This is incredibly common. For example, constant pool loads on x86. Without being able to produce any of these generic nodes, a post-legalization DAG combine cannot produce very many interesting nodes at all. And we can’t run some of these earlier because they are combining the results of the legalizer itself.

This actually makes a lot of sense. Why not run the legalizer as a part of a “legalizing” DAG combine phase? The simplest form of this can be found here: http://reviews.llvm.org/D4564 (and its attendant review thread on the mailing list). While this works, it isn’t really ideal. We’re just re-legalizing any nodes that need it when they come out of a post-legalization combine. Woe betide the combine that creates a cycle here, but that seems like a bad idea anyways and unlikely to happen.

The obvious “next step” is to just stop running the legalizer outside of this mode. This, however, doesn’t seem to work at all. The reasons overwhelming center around how we update the worklist for the DAG combiner.

The current strategy is to use a DAGUpdateListener to remove deleted nodes from the worklist, and to add nodes based on the results of the combine or the use of the DAGCombineInfo wrapper method to directly add nodes to the worklist. The latter is extremely error prone and I have spent quite some time already debugging DAG combines because this didn’t happen “automatically”.

So question #1: Is this the right direction? Are folks generally OK with me moving legalization into something that happens interleaved with combining? It would still be “phase” oriented so that the combines know what invariants must actually be preserved (for example, we simple can’t create illegal types while combining during the legalization phase). Similarly, there are likely cases where it would be useful to check the phase to avoid cycles or warring combines and legalizations.

Question #2: is there some reason to not make a DAGUpdateListener call for each new node inserted into the DAG, and add them to the worklist automatically? This could likely remove a bunch of existing code to manually add things, and it would allow new nodes added during legalization to go onto the worklist as well. In general, I’m wondering if there is any reason not to leverage the DAG update listener more heavily in the combiner. We’re already paying the price of keeping one in the stack, etc., so it seems like it wouldn’t be too much worse to leverage it to catch the nodes which need to be re-combined.

-Chandler

Just as a small note on this topic – while it may make sense to start incrementally with D4564, it can’t stay that way. I can’t actually write the (I thought) very basic DAG combine needed to match sequences of target-specific shuffle instructions on x86 into the generic PSHUFB instruction with just the legalization power provided by D4564. It will need at least the use of the DAGUpdateListener to drive the re-legalization and re-combining of instructions or else we’ll fail (eventually) to legalize the constant shuffle mask produced by the PSHUFB-forming DAG combine.

Since no one else has replied I will :slight_smile:

I've wanted something like this for a while, but it seemed like a lot
of work when some ordering would solve the issues in
legalization/combining/lowering. Now that we have a definite use case
for it though it seems to make more sense.

I worry about the performance issues we chatted about yesterday
(adding nodes to the list speculatively, etc), but I don't have any
bright ideas here.

-eric