[RFC] Changing the indirectbr/blockaddress representation

Problem

LLVM represents computed gotos using a combination of indirectbr and blockaddress:

  indirectbr ptr %ba, [label %block1, label %block2]

  ; where %ba is one of
  blockaddress(@fn, %block1)
  blockaddress(@fn, %block2)

The current indirectbr/blockaddress representation is a frequent source of bugs, because it places an unusual constraint on control-flow transformations: It is not legal to redirect an indirectbr successor to a different block.

A simple example is this:

  indirectbr ptr %ba, [label %empty_block]
empty_block:
  br %other_block

; Cannot be changed to
  indirectbr ptr %ba, [label %other_block]

; Because the blockaddress(@fn, %empty_block) will still refer
; to the old block, which is no longer a successor of the indirectbr.

This means that many basic transforms like critical edge splitting do not work if indirectbr is involved, and this is why control flow transforms often need special checks for indirectbr terminators and/or address-taken blocks. Because computed gotos are very rare, it often takes quite a while for the issues to be found and fixed.

Proposal

The solution I’m considering is to change blockaddress from referring to a basic block, and instead refer to a specific indirectbr successor index. Something along these lines:

  %indbr = indirectbr ptr %ba, [label %block1, label %block2]

  ; where %ba is one of
  blockaddress(@fn, %indbr, successor 0 /* %block1 */)
  blockaddress(@fn, %indbr, successor 1 /* %block2 */)

This means that it is possible to change an indirectbr successor as usual. As a result of transforms, there may also be multiple identical successors.

Some other implications are:

  • Some additional handling will be needed when removing entries from the indirectbr successor list – but this is something only transforms that specifically work on indirectbr need to do, rather than generic control-flow transforms.
  • It’s not possible to have two distinct indirectbrs in one function that share a common set of labels. This is already considered non-canonical: Clang does not produce IR for this, it will always merge all computed gotos in a function into a single indirectbr.
  • As a result of the previous point, it is illegal to clone indirectbr. We already don’t do this (in the middle-end) to avoid quadratic CFG explosion. And cloning the indirectbr together with its successors was already previously illegal. So I don’t think this has practical impact.

The representation in the backend would stay the same as it is right now, as we do need the ability to tail duplicate indirectbrs there.

Representation

One thing I’m a bit unclear on is the representation of “blockaddress referencing indirectbr”, both in textual IR and in memory.

The above textual IR sample gives the indirectbr a name via %indbr =. This is currently invalid, because void returning instructions can’t have a name. I’m not sure whether there is a better way to handle this.

For the in-memory representation, I’m not sure whether blockaddress should be part of the use-list of indirectbr or not. If it’s in the use-list, that would be the first time an instruction can have a non-instruction user. I’m inclined to say that it shouldn’t be in the use-list.

blockaddress abuse

LLVM only supports the use of blockaddress in conjunction with indirectbr. Other uses of blockaddress can be folded to arbitrary, non-null values.

One such abuse we’ll have to contend with is the _THIS_IP_ macro in the Linux kernel:

#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })

I am proposing to replace this with a builtin like __builtin_instruction_pointer(), which does not make use of blockaddress. See Add builtin/intrinsic to get current instruction pointer? · Issue #138272 · llvm/llvm-project · GitHub.

Feedback

I’d appreciate some feedback on whether the general direction outlined in this RFC makes sense.

In the original design document for indirectbr, there is this:

We actually made a significant effort to find a way to allow the edges
to be split in order to avoid encumbering the optimizer, however that
quickly became very awkward, and would have required very inefficient
codegen. The optimizer would have had to be taught how to avoid
splitting these edges in order to avoid pessimizing the code, however
that basically led to the same optimizer encumbrances that we were
trying to avoid. Consequently, it’s better to just forbid splitting of
these edges altogether, avoiding the awkwardness.

However, I wasn’t able to find any more details on what designs were considered here or why allowing critical edge splitting would cause very inefficient codegen. If someone remembers what this is referring to, that would be great…

1 Like

I have seen and fixed multiple miss-compiles caused by control flow transformation not properly handling indirectbr/blockaddress so I completely agree on the problem. I am interested in helping out if you want/need.

Right now I think its legal to have multiple indirectbr in the same function that share a blockaddress. clang appears to always generate a single indirectbr by functions. If block blockaddress are associated to a specific indirectbr we need may need to disallow it.

For the in-memory representation, I’m not sure whether blockaddress should be part of the use-list of indirectbr or not. If it’s in the use-list, that would be the first time an instruction can have a non-instruction user. I’m inclined to say that it shouldn’t be in the use-list.

I agree It would be kind of weird to have the blockaddress as a part of the use list, but we still need a fast way to find all blockaddress associated to specific operand of the indirectbr.

Note:
One added benefit of associating blockaddress to specific operands of an indirectbr is the we it could make inlining (and maybe other inter-procedural code-motion) of indirectbr/blockaddress easy enought that it would be worth implementing.

This seems like a circular construct, which is something that’s been annoying to deal with when it’s come up in the past (e.g. struct types containing a pointer to themselves).

I do wonder if these days the set of possible successors should be metadata rather than real arguments (with no metadata meaning “all labels other than the entry block’s”)? Would that make it more likely that transformations don’t mess it up?

For “blockaddress referencing indirectbr” you can probably use tokens.

For critical edges, the problem is really blocks which have multiple indirectbr predecessors: in that case, “splitting” the edge becomes a very complicated transform, which at best destroys the performance benefits of the indirectbr.

This proposal imposes a restriction on duplicating indirectbr instructions. We currently avoid doing that, but not because there’s actually any formal rule restricting it; it just tends to cause compile-time explosions due to the CFG complexity. This is a problem even at the MIR level, where we do duplicate indirectbr. There is a rule forbidding duplicating the destinations of indirectbr instructions, but that’s sort of a different issue.

If the issue is just that transforms aren’t detecting that the destinations of indirectbr instructions have unusual properties, how much could we improve that by just having better helper functions? We don’t have a problem with landing pads.

Right. We do have some critical edge splitting support for indirectbr, but it works in an inverted way from normal critical edge splitting and I’m not sure it even tries to handle multiple indirectbr predecessors. It’s a separate transform that is not integrated with normal edge splitting.

But all this is just an artifact of the current representation, right? Unless I’m missing something, edge splitting should work as usual under this proposal, and without performance penalties.

Yes, this is correct. I’ve added this point as an implication of the proposal. And I should clarify that I wouldn’t change the blockaddress representation in the backend, only at the IR level. We do need the ability to duplicate indirectbrs in the backend.

It’s not immediately clear how we can improve this with helpers. The APIs that don’t work with indirectbr are basic things like setSuccessor() or replaceUsesOfWith(). Transforms need to bail out for indirectbr’s early, not at the point where changes get made.

This makes sense. It points out one of the interesting tradeoffs of LLVM IR design. We don’t have explicit “edge” representations, which I’m told is something other compilers do. You’re sort of backing into that concept by numbering the indirectbr successors.

This might be a bad idea, but you could say that the value type of indirectbr is token. That’s basically our stand-in for an opaque value that doesn’t get translated into a real machine value type, but participates in the def-use graph.

invoke and callbr are existing non-void terminators, so we have some prior art for that.


Perhaps a design alternative that avoids the explicit integer numbering would be to store parallel arrays of blockaddresses (really edges) and basic block successors as operands in the indirectbr. The value type of the blockaddress is still ptr, but the blockaddress holds nothing but a reference to the parent function. During codegen, blockaddresses are resolved to MBBs, labels, and symbols by scanning all indirectbrs after we’ve done all the branch folding transforms we want to do. This way, we never have to update the integers held in Constants, creating LLVMContext garbage.

This presents a challenge of how to name them. I guess they are kind of like Arguments, but they can be referenced from globals and other functions. I can imagine some textual syntax like:

define void @foo()
  blockaddresses { %edge0, %edge1, %edge2 }
{
entry:
  indirectbr [ %edge0, %bb0 ], [%edge1, %bb1], ...
...
bb0:
...
bb1:
...
}

References could look like ptr blockaddress(@foo, %edge0). We’d use the Function symbol table to track unique names.

Now, it should be safe to split critical edges, and do simple BB RAUW operations with fewer checks, since the edge will refer to the correct replacement in the end.

The downside is that our representation now has one more Value per edge, but these are super rare, and the whole point is to make normal CFG transformations work seamlessly on indirectbr.

3 Likes

Yeah, that’s a good way of looking at it. Effectively this proposal replaces blockaddress with an edgeaddress.

This makes a lot of sense to me, and also neatly side-steps the problem of “how do we refer to an indirectbr instruction”. We only need to refer to the edge values, which we can name.

In your textual IR sample, what is the purpose of blockaddresses { %edge0, %edge1, %edge2 }? Can’t we treat the indirectbr as defining those edge addresses?

Perhaps a design alternative that avoids the explicit integer
numbering would be to store parallel arrays of blockaddresses (really
edges) and basic block successors as operands in the |indirectbr|. The
value type of the blockaddress is still |ptr|, but the blockaddress
holds nothing but a reference to the parent function. During codegen,
blockaddresses are resolved to MBBs, labels, and symbols by scanning
all indirectbrs after we’ve done all the branch folding transforms we
want to do. This way, we never have to update the integers held in
Constants, creating LLVMContext garbage.

FWIW, one point in favor of this idea is that it made it much more
obvious to me what the proposal was actually changing in such a way that
I could more clearly see that it would have the desired effect.

If there’s a unique indirectbr in the function, or at least different indirectbrs use independent sets of blocks, then yeah, edge splitting should be fine.

I like the idea of formalizing this as some sort of switch on opaque tokens:

switchtoken %0 [ (token @f 0, bbX), (token @f 1, bbY), (token @f 2, bbZ) ]

The compiler has free rein to pick token values, limited only by token values ultimately just being pointers.

If all the switchtokens in the function use the same destination block for any particular token, then they can all just be assigned the address of that block, and all of the switchtokens can just be indirect branches. Edge-splitting never screws this up as long as the switchtokens in the function all have independent sets of tokens they switch on. If that’s not true, though, then edge-splitting will make this degrade very badly. But as you say, we already strongly encourage frontends to emit a unique switchtoken block. We could potentially even make that an IR requirement, that different switchtokens in a function must use independent sets of tokens. Making that happen implicitly with a reference to the actual instruction in the constant seems unnecessarily fraught, though.

I looked at LangRef, and nothing says you can’t have multiple indirectbr instructions. We merge all indirectbrs together to keep the CFG simple, only to tail duplicate them later during codegen, assuming there are no compiler bugs, as in this regression. So, part of the value of listing the tokens separately is so that multiple indirectbr instructions can use them.

Having the indirectbr define the edges makes it awkward to blindly duplicate or delete arbitrary instructions without additional cleanup. Having the function explicitly enumerate a list of all address-taken labels seems nice.

I tried to consider if this model offers any benefits for the other applications of address-taken labels, which look like the Linux trace system built around our other favorite feature, asm-goto, but I think there’s no impact there.

I believe that under this model, we do need/want to forbid multiple indirectbrs using the same set of “tokens”. This is discussed above, but maybe some explicit examples would help clarify why this is problematic.

Let’s start with something like this:

indbr1:
  indirectbr [ %edge0, %bb0 ], ...
indbr2:
  indirectbr [ %edge0, %bb0 ], ...

Now, because the successors of the indirectbrs are now freely modifiable (which is the main goal of this proposal), we can end up rewriting this to something like this, e.g. as a result of jump threading:

indbr1:
  indirectbr [ %edge0, %bb1 ], ...
indbr2:
  indirectbr [ %edge0, %bb2 ], ...

But how do you lower this? You have a common token %edge0 for them, but you need that token to go to two different destinations now. The only real way to do that I see is something along the lines of:

indbr1:
  indirectbr [ %edge0, %bb.common ], ...
indbr2:
  indirectbr [ %edge0, %bb.common ], ...
bb.common:
  %c = phi i1 [ true, %indbr1 ], [ false, %indbr2 ]
  br i1 %c, label %bb1, label %bb2

Which is a big pessimization for a performance-critical control-flow construct.

While it’s possible to support this, I do think it’s better to make sharing of indirectbr tokens across multiple indirectbr instructions invalid IR, rather than create this kind of footgun, which also requires additional handling to support.

We already don’t want to duplicate indirectbrs in IR, and we already have a notion of instructions that cannot be duplicated, so I don’t think having this as a hard limitation rather than only a recommendation would have any practical downsides.

It’s probably worth noting that a variant of this problem already exists in the current indirectbr representation, for the case where two indirectbrs go to common blocks with phi nodes. As it’s impossible to split the critical indirectbr edges, the phi setup for all successor blocks needs to happen in the indirectbr block, which can also be a big pessimization.

For callbr, I already changed the representation a few years ago (see ⚙ D129288 [IR] Don't use blockaddresses as callbr arguments) to not use blockaddress. Before that callbr was in the same boat as indirectbr. But the callbr case is simpler than the indirectbr one, because it only needs to materialize the “block addresses” in inline asm, not as arbitrary IR values.