Problem
LLVM represents computed gotos using a combination of indirectbr and blockaddress:
indirectbr ptr %ba, [label %block1, label %block2]
; where %ba is one of
blockaddress(@fn, %block1)
blockaddress(@fn, %block2)
The current indirectbr/blockaddress representation is a frequent source of bugs, because it places an unusual constraint on control-flow transformations: It is not legal to redirect an indirectbr successor to a different block.
A simple example is this:
indirectbr ptr %ba, [label %empty_block]
empty_block:
br %other_block
; Cannot be changed to
indirectbr ptr %ba, [label %other_block]
; Because the blockaddress(@fn, %empty_block) will still refer
; to the old block, which is no longer a successor of the indirectbr.
This means that many basic transforms like critical edge splitting do not work if indirectbr is involved, and this is why control flow transforms often need special checks for indirectbr terminators and/or address-taken blocks. Because computed gotos are very rare, it often takes quite a while for the issues to be found and fixed.
Proposal
The solution I’m considering is to change blockaddress from referring to a basic block, and instead refer to a specific indirectbr successor index. Something along these lines:
%indbr = indirectbr ptr %ba, [label %block1, label %block2]
; where %ba is one of
blockaddress(@fn, %indbr, successor 0 /* %block1 */)
blockaddress(@fn, %indbr, successor 1 /* %block2 */)
This means that it is possible to change an indirectbr successor as usual. As a result of transforms, there may also be multiple identical successors.
Some other implications are:
- Some additional handling will be needed when removing entries from the indirectbr successor list – but this is something only transforms that specifically work on indirectbr need to do, rather than generic control-flow transforms.
- It’s not possible to have two distinct indirectbrs in one function that share a common set of labels. This is already considered non-canonical: Clang does not produce IR for this, it will always merge all computed gotos in a function into a single indirectbr.
- As a result of the previous point, it is illegal to clone indirectbr. We already don’t do this (in the middle-end) to avoid quadratic CFG explosion. And cloning the indirectbr together with its successors was already previously illegal. So I don’t think this has practical impact.
The representation in the backend would stay the same as it is right now, as we do need the ability to tail duplicate indirectbrs there.
Representation
One thing I’m a bit unclear on is the representation of “blockaddress referencing indirectbr”, both in textual IR and in memory.
The above textual IR sample gives the indirectbr a name via %indbr =. This is currently invalid, because void returning instructions can’t have a name. I’m not sure whether there is a better way to handle this.
For the in-memory representation, I’m not sure whether blockaddress should be part of the use-list of indirectbr or not. If it’s in the use-list, that would be the first time an instruction can have a non-instruction user. I’m inclined to say that it shouldn’t be in the use-list.
blockaddress abuse
LLVM only supports the use of blockaddress in conjunction with indirectbr. Other uses of blockaddress can be folded to arbitrary, non-null values.
One such abuse we’ll have to contend with is the _THIS_IP_ macro in the Linux kernel:
#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })
I am proposing to replace this with a builtin like __builtin_instruction_pointer(), which does not make use of blockaddress. See Add builtin/intrinsic to get current instruction pointer? · Issue #138272 · llvm/llvm-project · GitHub.
Feedback
I’d appreciate some feedback on whether the general direction outlined in this RFC makes sense.
In the original design document for indirectbr, there is this:
We actually made a significant effort to find a way to allow the edges
to be split in order to avoid encumbering the optimizer, however that
quickly became very awkward, and would have required very inefficient
codegen. The optimizer would have had to be taught how to avoid
splitting these edges in order to avoid pessimizing the code, however
that basically led to the same optimizer encumbrances that we were
trying to avoid. Consequently, it’s better to just forbid splitting of
these edges altogether, avoiding the awkwardness.
However, I wasn’t able to find any more details on what designs were considered here or why allowing critical edge splitting would cause very inefficient codegen. If someone remembers what this is referring to, that would be great…