Duplicate successors in LLVM dialect

I think we should take this back to the principles and how they balance out. I can see things like:

  1. Hopefully obvious: we want the MLIR infrastructure to be good, and an improvement over the corresponding LLVM decisions.
  2. We desire foreign dialects in MLIR to be close to the thing they model to make mlir-translate logic simpler and easier to test.
  3. The LLVM dialect in particular should be amenable to standard MLIR optimizations, and there is a possible future in which all existing LLVM passes get migrated over to it.

There is some tension between #1 and #2: for example, we have explicit “llvm.mlir.constant” operations and BB arguments instead of an “llvm.phi” operation. I don’t think that any of us are willing to waver on this.
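
To make that concrete, here is a minimal sketch of what this looks like (present-day LLVM-dialect syntax, made-up names; the details are not the point):

llvm.func @select_const(%cond: i1) -> i32 {
  // Constants are explicit operations rather than immediate values.
  %a = llvm.mlir.constant(1 : i32) : i32
  %b = llvm.mlir.constant(2 : i32) : i32
  llvm.cond_br %cond, ^bb1, ^bb2
^bb1:
  llvm.br ^merge(%a : i32)
^bb2:
  llvm.br ^merge(%b : i32)
^merge(%r: i32):
  // The block argument plays the role an LLVM IR phi would play.
  llvm.return %r : i32
}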

OTOH, the tension I see is between #2 and #3. For example, Alex observes that we could have the LLVM dialect require predecessors to have identical BB args, which would favor #2 but hurt #3.
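
To spell out the shape in question (a hedged sketch, not necessarily the exact case Alex raised): a terminator can name the same successor twice with different arguments, and that has no direct phi equivalent, because an LLVM phi selects its incoming value by predecessor block rather than by edge:

llvm.func @pick(%cond: i1, %a: i32, %b: i32) -> i32 {
  // Both successors are ^merge, but the arguments differ; translating
  // this to a single phi is ambiguous, since both edges come from the
  // same predecessor block.
  llvm.cond_br %cond, ^merge(%a : i32), ^merge(%b : i32)
^merge(%r: i32):
  llvm.return %r : i32
}

Translating this faithfully means either splitting one of the edges during export or restricting the dialect so that duplicate successors must carry identical arguments.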

I see two solutions to this:

  1. Give up on optimizing the LLVM dialect, and have an “llvm+” dialect which gets lowered down to the LLVM dialect.
  2. Accept modest changes (e.g. the BB-arg issue that started this thread) and accept that the translation becomes less of an isomorphic mapping, e.g. requiring a “prepare” pass of some sort.

For me personally, given the things that we’re talking about in practice, I think that #2 is the right way to go. If it turns out that we start doing heroic things in the future, then we should reconsider. However, I don’t see anything that warrants the complexity of having two dialects and introducing a new lowering pass (most of which would be one-to-one).

-Chris

I understand what you’re saying, but I don’t think that frontends are the primary problem here: it is the invariants placed on the CFG, and what those invariants mean for IR transformations.

I haven’t encountered significant problems with critical edges in practice when writing transformations (LLVM has a “split critical edge” routine which works fine), particularly given that BB args allow cleanly modeling their behavior.

If critical edges were made illegal, what do you expect to be simpler?

-Chris

Well, first, I want to remind you that this conversation started with a feature, indirectbr, that can easily make splitting critical edges impossible (or undesirable) if not handled carefully in the IR. I think we can agree that that is a problematic leap in complexity for transformations.

My experience in writing passes has been that critical edges are a frequent and unanticipated special case. For example, without critical edges, there is always somewhere you can sink code to which immediately precedes any given use. Basic block arguments make finding this point slightly less obvious — it is not always the point right before the using instruction — but it at least exists as long as there are no critical edges. Looking for the best place to insert code comes up a lot when writing passes.

On some level, you’re right that calling a utility whenever that special case comes up, or just proactively splitting all critical edges, is mostly painless. It’s only mostly painless because it does mean you have to change the function’s CFG in the middle of your pass running, which you might prefer not to do. But the bigger problem is that pass writers rarely recognize the existence of the special case in advance; it only comes up when their pass runs on the wrong code, and suddenly their pass is e.g. inserting code into a path that shouldn’t have it because that’s the only dominating site for a particular use. It’s a correctness issue that’s usually easy to solve by splitting an edge (unless that’s impossible for a particular edge!), but why not define it away completely? It’s actually quite easy for CFG passes to avoid forming new critical edges, and representationally it means that most kinds of terminator never need to carry phi-like block arguments.
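
To make the special case concrete, a small made-up example: in the sketch below, the edge from the entry block to ^bb2 is critical (the entry block has two successors and ^bb2 has two predecessors), so there is no block lying only on paths to the use of %sum that the add could be sunk into, short of splitting that edge.

llvm.func @sink_example(%cond: i1, %x: i32, %y: i32) -> i32 {
  // %sum is only needed along the edge to ^bb2 ...
  %sum = llvm.add %x, %y : i32
  // ... but the entry block also branches to ^bb1, and ^bb2 is also
  // reachable from ^bb1, so the entry-to-^bb2 edge is critical.
  llvm.cond_br %cond, ^bb1, ^bb2(%sum : i32)
^bb1:
  %zero = llvm.mlir.constant(0 : i32) : i32
  llvm.br ^bb2(%zero : i32)
^bb2(%r: i32):
  llvm.return %r : i32
}

The same problem exists with phi nodes; block arguments just make the use explicit as an operand of the predecessor’s terminator.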

Ok, that makes sense, thank you for elaborating. To summarize your points (to make sure I understand you, please correct me):

  1. You agree that critical edges are required given we need to model indirectbr and similar things in MLIR.
  2. You are pointing out that it is convenient to not have critical edges - if you never have them, you never need to worry about splitting them!
  3. “Inconvenience” in compiler passes often means that compiler authors (who are, after all, merely human) will make mistakes and forget about them, leading to compiler bugs.

Given this, I agree with you that it would be ideal to eliminate critical edges (and the bugs and complexity they imply) entirely by construction. The problems I see are:

  1. MLIR in general has to support indirectbr and other constructs like it, based on its goal of being able to model a wide variety of IRs.
  2. The LLVM dialect in particular needs indirectbr if we want to allow Clang to move over to it (which cannot drop support for the GNU C indirect goto extension).
  3. Therefore we need MLIR to support critical edges and cannot eliminate them by construction. If we cannot eliminate them entirely, it would be weird (but possible!) to prohibit them in the LLVM dialect specifically.

In addition to the above points, I’ll add one more (now very dated) experience point. I experimented with this early on in LLVM: an early “split critical edges” pass was run in the pipeline, because presplitting critical edges (and maintaining that property as an LLVM “analysis pass”) seemed like it could be a great simplification.

However, it turned out that critical edges are very common. My experience showed that eager splitting (even if not “required” by the IR) had two really big problems: 1) memory use, and 2) code layout. The memory use issue is pretty easy to understand: every critical edge required an extra BB, terminator, etc.

OTOH, the code layout issue was far more insidious. The mere existence of the critical-edge block meant that instructions got sunk and otherwise placed into it, and once that happened, it was very difficult to get rid of them. Critical edge splitting meant that simple loops like:

br loop

loop:
  stuff
  br cond, loop, out
out:

ended up turning into:

br loop

loop:
  stuff
  br cond, loop_critedge, out
loop_critedge:
  br loop
out:

and then something inconsequential (e.g. a bitcast, or an add %i, 1 of an induction variable only used within the loop) got sunk into loop_critedge because of course some cost model somewhere said that it was better to not have it on the exit edge. Of course, lowering this to assembly leads to “nothing good”.

Overall, I don’t think it is realistic to eliminate critical edges from MLIR. OTOH, I think it will be an interesting (long term) experiment to introduce structured control flow to the Clang lowering path once the basic migration to MLIR is done.

WDYT?

-Chris

Loops certainly can suffer from disallowing critical edges. I wouldn’t ever claim that eliminating critical edges entirely has zero trade-offs. I do think that encouraging correctness in passes is more important than those trade-offs.

I will argue with your take on indirectbr; I think it does not absolutely require critical edges. It is straightforward to restrict the IR in ways that structurally eliminate them: since address-taken blocks are quite special anyhow, you can require them to have a unique predecessor. This interferes with the natural code generation of these blocks in two possible ways. First, if there are ordinary gotos to them, or fallthrough into them, those will need to go to an immediate successor of the address-taken block instead; however, as I mentioned, both are very rare in the actual use of indirect goto as a feature. And second, the function will have to have a unique indirectbr, at least for any given set of targets; but this is practically necessary anyway, and is already done (just not enforced) with LLVM IR in order to avoid the quadratic explosion of CFG edges that would otherwise result. So I do think it is practical to restrict indirectbr in a way that defines away the critical-edge problem.
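
A rough sketch of the restricted shape (the llvm.indirectbr spelling below is hypothetical, used purely to show the CFG; everything else in the example is made up as well): all computed gotos funnel through a single dispatch block, and each address-taken block has that dispatch block as its unique predecessor, so none of the outgoing edges of the indirect branch are critical.

llvm.func @interp(%state: !llvm.ptr) {
  llvm.br ^dispatch
^dispatch:
  // Load the address of the next handler; every computed goto in the
  // function funnels through this one block.
  %next = llvm.load %state : !llvm.ptr -> !llvm.ptr
  // Hypothetical op spelling, for illustration only.
  llvm.indirectbr %next : !llvm.ptr, [^op_add, ^op_halt]
^op_add:
  // Address-taken; its unique predecessor is ^dispatch, so the edge
  // ^dispatch -> ^op_add is not critical.
  llvm.br ^dispatch
^op_halt:
  // Address-taken; unique predecessor is ^dispatch.
  llvm.return
}

An ordinary goto or fallthrough to one of these labels would target a successor of the corresponding address-taken block instead, as described above.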

Note that most of the restrictions on indirectbr can be handled in MLIR transformations by just making it an uncloneable instruction. Unfortunately, uncloneability is itself an inconvenient, bug-inducing corner case, but I’m pretty sure it’s one you do have to have in MLIR anyway.

I don’t know whether all of the unsplittable terminators in LLVM can be defined away in MLIR’s LLVM dialect. My understanding is that the exception-handling ones are generally seen as mistakes by the responsible parties; whether they have workable ideas for fixing them, I don’t know. It would probably require being less “literal” in the translation of cleanup blocks.

Hi John, I don’t understand how that works with indirectbr: an indirectbr in general has multiple successors, and the address-taken blocks can be reached by any indirectbr.

I believe that the only way to avoid critical edges to the address-taken blocks would be to prevent duplication of indirectbrs.

In general, it is considered highly problematic to produce multiple indirectbrs in a function. In the typical use case, there is one indirect goto for every label whose address is taken. If you don’t emit a common branch block, you get exactly the quadratic explosion of CFG edges mentioned above.

Yeah, fair point. I agree that most indirectbrs have many, many successors, and duplicating them isn’t a good idea. I also agree that having the notion of an uncloneable instruction is probably required. Incorrectly written inline asm has that property, and I’m sure there are other examples that can cause problems.