Hi debug-info folks,
tl;dr: here’s an idea for better tracking of source location ordering through LLVM, which would allow us to set the `is_stmt` flag in DWARF line tables in meaningful ways. Feedback and opinions on whether this is a feasible direction are most appreciated. There are a couple of things I’d like to explicitly avoid discussing:
- How exactly to represent stepping in DWARF (I’d like to focus on compilery bits here),
- Exactly what representation to use in LLVM; these are broad brush strokes.
I’ve written some code to validate the idea, but wouldn’t want to set anything in stone just yet. This is also purely about instructions that are moved between basic blocks, not the re-ordering caused by instruction scheduling.
Background
In DWARF’s `.debug_line` section we record information about which instructions correspond to which source location (aka line number). debug-info consumers use this to attribute an instruction to an originating source location, and to determine where in the control flow of the function that instruction lies relative to other source locations. Unfortunately, sometimes these two things (attribution/ordering) conflict with each other. The classic example I’m aware of is when a load is hoisted from a loop:
```c
10: for (;;) {
11:   int foo = *bar;
12:   accumul = <arith involving foo>;
13: }
```
If LLVM hoists the load from “bar” to happen in the loop preheader and we attribute it to line 11, a developer stepping onto the load instruction will believe that execution has reached the loop body, which is not true. On the other hand, if we don’t attribute the load to any line number and the pointer dereference causes an exception, the developer won’t know what line caused the exception. Today, LLVM chooses to do the latter, labelling moved instructions with either line zero (no location) or an empty location (earlier locations “fall through”).
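To make the trade-off concrete, here’s a toy model of the two policies (hypothetical addresses, a Python sketch rather than real DWARF encoding): keeping line 11 on the hoisted load preserves fault attribution but misleads a stepper, while line zero does the opposite.

```python
# Toy model of .debug_line rows as (address, line) pairs; addresses are
# hypothetical. The load hoisted to the preheader sits at 0x10.
attributed = [(0x10, 11), (0x14, 10), (0x18, 12)]  # keep line 11 on the load
line_zero  = [(0x10, 0),  (0x14, 10), (0x18, 12)]  # LLVM today: drop the location

def attribute(rows, fault_addr):
    """Line blamed for a fault: nearest row at or before the address (0 = unknown)."""
    line = 0
    for addr, l in rows:
        if addr <= fault_addr:
            line = l
    return line

# Keeping line 11: a fault in the hoisted load is attributed correctly, but a
# stepper stopping at 0x10 wrongly appears to have reached the loop body.
print(attribute(attributed, 0x10))  # 11
# Line zero: stepping isn't misled, but the fault can't be blamed on any line.
print(attribute(line_zero, 0x10))   # 0
```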
This isn’t ideal; we should be able to use DWARF’s `is_stmt` flag to signal which instructions are in order and which aren’t. That would allow us to have the best of both worlds: consumers can always attribute an instruction back to a source location, and can determine control-flow position from instructions with `is_stmt` set. Hence:
Preserving source ordering through LLVM
I don’t actually have a proposal for performing preservation during optimisation passes – instead I believe we can use existing facilities to project the original source control flow onto the optimised program once compilation has finished. Knowledge about where a `MachineBasicBlock` lies in the original CFG of the source program can then be used to work out whether an instruction is out of order. This process comes in two parts: preserving the original CFG, and projecting it onto the optimised program.
As far as I’m aware, LLVM currently doesn’t store any information about the ordering of source locations in the source language. We don’t record that line 11 precedes line 12 in the example above (and we can’t rely on line numbers always increasing). As a pre-requisite, we need to record in some metadata node what the ordering of the original source lines is. I’ve made a quick prototype of this, storing:
- A metadata node for each lexical block,
- The order of all `DILocation`s attached to stores in that block (see below),
- What other `DILocation`s exist in that block (unordered), and
- The successor lexical blocks of each lexical block.
Which gives us a very rough ordering of source locations, where you can at least determine whether a `DILocation` is in a block that dominates another block.
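As a rough sketch of what such a record and query could look like (hypothetical structure and helper names, not real LLVM metadata):

```python
# Sketch of per-lexical-block source-order records: an ordered list of store
# locations, an unordered set of other locations, and successor blocks. Gives
# a rough "comes before" query between source lines. All names hypothetical.
blocks = {
    "entry": {"stores": [10, 11], "others": set(), "succs": ["body"]},
    "body":  {"stores": [12],     "others": {13},  "succs": []},
}

def block_of(line):
    for name, b in blocks.items():
        if line in b["stores"] or line in b["others"]:
            return name
    return None

def reaches(src, dst, seen=None):
    """True if lexical block dst is reachable from src via successor edges."""
    seen = seen or set()
    if src == dst:
        return True
    seen.add(src)
    return any(reaches(s, dst, seen) for s in blocks[src]["succs"] if s not in seen)

def ordered_before(a, b):
    """Rough source order: a precedes b if it's earlier in the same block's
    store order, or sits in a block that reaches b's block."""
    ba, bb = block_of(a), block_of(b)
    if ba == bb:
        sa = blocks[ba]["stores"]
        return a in sa and b in sa and sa.index(a) < sa.index(b)
    return reaches(ba, bb)

print(ordered_before(11, 12))  # True: line 11's block precedes line 12's
```

Note that locations in the “others” set can only be ordered at block granularity, which is the “very rough” part.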
The major problem is projecting that CFG onto the optimised program. To do this, I take inspiration from one of @rnk ’s past ideas (“maybe we need an instruction that just ‘hangs around’ and doesn’t change order”). Existing debug instructions like `dbg.value` usually don’t move when optimisations are performed – typically LLVM either optimises a single instruction at a time, or it threads / joins / re-orders entire basic blocks at a time. If we could rely on the `DILocation` of debug instructions being correct, we would be able to “know” where we are in the original program’s CFG. Obviously, there might not be a good correspondence between the original CFG and the optimised program; however, the debug instructions mark where variable values change, and it’s fairly hard to re-order them relative to stores. Thus, IMHO, their positions correspond well with the visible side-effects of the function, and so theirs is very likely to be the order that the developer wants to step through the program in.
To explore this, I cooked up a prototype based on @OCHyams ’s assignment-tracking prototype, making the `DILocation` of each `dbg.assign` a meaningful marker of the original source position. That doesn’t account for blocks with no assignments to locals though, so I also emitted `dbg.label`s with meaningful `DILocation`s for every non-local-variable store in the program. (Yes, more debug instructions, but see the final section.) I then used an off-the-shelf automata-equivalence-class algorithm to join up the positions of assignments / `dbg.label`s in the optimised program’s `MachineBasicBlock` CFG at the end of compilation with the unoptimised source CFG. Examples of joined CFGs are illustrated in the attached diagrams.
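For the curious, the joining step can be approximated by a Moore-style partition refinement over the union of the two CFGs. This is my own minimal sketch with hypothetical block names, not the exact off-the-shelf algorithm I used: blocks from both CFGs start in classes keyed by the debug marker they carry, and classes are split until every block in a class has the same successor-class signature.

```python
# Minimal Moore-style partition refinement joining two CFGs (a sketch, not
# the exact algorithm used in the prototype).
def join_cfgs(succs, marker):
    """succs: block -> list of successor blocks (both CFGs merged in one dict).
    marker: block -> source-location label from dbg.assign / dbg.label."""
    # Initial partition: one class per marker label.
    classes = {b: marker[b] for b in succs}
    while True:
        # A block's signature is its current class plus its successors' classes.
        sig = {b: (classes[b], tuple(sorted(classes[s] for s in succs[b])))
               for b in succs}
        # Stop when refining no longer splits any class.
        if len(set(sig.values())) == len(set(classes.values())):
            return classes
        classes = sig

# Hypothetical example: the unoptimised block 'src.loop' and the optimised
# MBB 'mbb2' carry the same marker and wire up the same way, so they land
# in one equivalence class.
succs  = {"src.entry": ["src.loop"], "src.loop": ["src.loop"],
          "mbb1": ["mbb2"], "mbb2": ["mbb2"]}
marker = {"src.entry": "L10", "src.loop": "L12",
          "mbb1": "L10", "mbb2": "L12"}
classes = join_cfgs(succs, marker)
print(classes["src.loop"] == classes["mbb2"])  # True: joined into one class
```

If debug instructions had been badly re-ordered, the source block and its optimised counterpart would end up with different signatures and refuse to share a class – that failure to separate cleanly is exactly what the mega-class below looks like.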
I picked some functions at random from `InstrRefBasedLDV` @ the LLVM 14 release, first the constructor for the `MLocTracker` class: see diagrams track_mbb_cfg.pdf, track_unopt_cfg.pdf, track_joined_cfg.pdf. The first two are the `MachineBasicBlock` and unoptimised CFGs respectively; the last is the two CFGs joined and reduced into equivalence classes. I’ve added annotations of some features to the PDF. I’m going to skip explaining exactly how this works unless someone’s really interested, but my major point is that even after optimisation the debug instruction positions still represent the original CFG of the program (otherwise they wouldn’t be separable into classes). This continues to work well in the presence of, for example, code duplication – if I force the central loop in the function to be unrolled, we just get four `MachineBasicBlock`s in each relevant equivalence class: track_unroll.pdf.
Here’s an example of what happens when this approach doesn’t work: vlocjoin_bad.pdf. In the `vlocJoin` function there’s a sequence of loops and blocks from lines ~2410 to 2454 containing only function calls and few stores or local assignments. I haven’t attempted instrumenting call sites yet, and as a result the equivalence algorithm can’t properly order all the blocks. The result is one mega-class, from which smaller sub-graphs can be peeled away, but they all loop back through the mega-class. My point: I’d expect to see that in most functions if debug instructions were badly re-ordered or deleted; so far it only happens in scenarios that I haven’t fully instrumented.
Finally, extendranges.pdf is the joined CFG of `ExtendRanges`, one of the larger functions in `InstrRefBasedLDV`, where things seem to work well. There are still some peculiarities; see the annotations.
Conclusion: the analysis seems to be effective at mapping one CFG onto the other (i.e., the ordering of debug instructions matches the ordering of the original CFG). I think that means we can rely on instructions that “hang around” in one position, like `dbg.label` or `dbg.assign`, to determine where `MachineBasicBlock`s in the optimised program lie in terms of the original control flow. The approach copes with code duplication and deletion, and completely re-ordering blocks en masse doesn’t appear to be something LLVM does. Performing an automata-equivalence analysis might not be necessary in a final design, but it’s given me confidence that this is a good direction.
How to use it
Once we know where in the original CFG a particular `MachineBasicBlock` lies, we can determine whether a particular instruction’s `DILocation` is out of order. For example, given the source location ordering [“foo”, “bar”, “baz”], to determine whether an instruction with `DILocation` “bar” has been hoisted, we can query:
- That none of the predecessor blocks are in an equivalence class with a lexical scope dominated by bar’s lexical block, and
- That at least one successor `MachineBasicBlock` is in an equivalence class with a lexical scope that dominates bar.
Together, those mean that a consumer stepping through can only see line numbers from before bar, then bar itself, then a line number from before bar again: an out-of-order stepping sequence such as “foo bar foo”. Such instructions would be candidates for us to not put `is_stmt` on, indicating they’re not in program order. An equivalent scheme can be made for sunk instructions.
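In pseudocode-ish Python (hypothetical helper and class names, assuming the dominance sets have already been computed from the joined CFG), the hoist query boils down to:

```python
# Sketch of the hoist query: an instruction located at "bar" looks hoisted if
# no predecessor's equivalence class is past bar in source order, but some
# successor's class is before bar. All names here are hypothetical.
def looks_hoisted(preds, succs, dominated_by_bar, dominates_bar):
    """preds/succs: equivalence classes of the surrounding MachineBasicBlocks.
    dominated_by_bar: classes whose lexical scope bar's block dominates.
    dominates_bar: classes whose lexical scope dominates bar's block."""
    no_pred_past_bar = not any(p in dominated_by_bar for p in preds)
    succ_before_bar = any(s in dominates_bar for s in succs)
    return no_pred_past_bar and succ_before_bar

# A load tagged "bar" sitting in a preheader: its predecessors are all before
# bar in source order, and its successor (the loop head) dominates bar's
# position, so stepping would go "foo bar foo".
print(looks_hoisted(preds={"foo_class"}, succs={"loop_head_class"},
                    dominated_by_bar={"baz_class"},
                    dominates_bar={"loop_head_class", "foo_class"}))  # True
```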
I’ve run some experiments with these queries on a few large-ish functions in `InstrRefBasedLDV`. I disabled the various parts of LLVM that set line-zero source locations when instructions are hoisted/sunk, because those are the instructions that I want to try to identify. Looking at the output, there are numerous loop preheaders where this technique identifies instructions that have been hoisted out of order (and are usually line zero), plus several instructions that have sunk (usually because they’ve been rematerialised). Unfortunately the most interesting hoisted/sunk instructions aren’t yet identified, as I haven’t tried implementing support for inlined scopes yet, plus there are a few false positives due to the usual development bugs. Limitations I’m aware of right now:
- The prototype doesn’t recognise blocks that only contain function calls right now.
- Loop rotation causes for-loops (where the condition evaluation dominates the body) to… rotate, so that the body dominates the loop condition.
- Additional control flow introduced by branching to assertion-failure blocks makes the graphs messy.
- Ordering some source locations is difficult: for the C expression `Array[idx]`, both the whole expression and the `Array` identifier get the same source location (I think), which might mean it can’t be strictly ordered.
- What do you do if you inline a function and a large portion of it is immediately optimised out through static analysis of the call site?
I think this technique (relying on the order of instructions that “hang around” and don’t move) could be useful for collecting the information that is then used to set `is_stmt`, and it’s not hugely invasive to the rest of the compiler (in terms of extra instrumentation required). How do other people feel about this path towards a useful `is_stmt`? If there’s something clearly infeasible, or optimisations that will certainly mess with the ordering of execution, I’d appreciate knowing about it.
But those instructions…
Of course, the idea above involves adding Even More ™ debug instructions into basic blocks. I’d like to point out that marking the position of a source-location order point does not necessarily need to be implemented as an IR instruction. The first thing that comes to mind is that it could be a property of the block itself, maintained using `ilist` callbacks.
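Purely as an illustration of that idea (this is not LLVM’s actual `ilist` interface, just a toy model), the block could own a marker table that insertion/removal callbacks keep in sync as instructions move:

```python
# Toy model: order markers live on the block rather than as instructions,
# and list "callbacks" keep them attached as instructions are inserted/removed.
class Block:
    def __init__(self, name):
        self.name = name
        self.insts = []
        self.markers = {}  # instruction index -> source-order marker

    def insert(self, idx, inst, marker=None):
        # "Callback" on insertion: shift markers at or after idx.
        self.markers = {(i + 1 if i >= idx else i): m
                        for i, m in self.markers.items()}
        self.insts.insert(idx, inst)
        if marker is not None:
            self.markers[idx] = marker

    def remove(self, idx):
        # "Callback" on removal: drop the removed slot's marker, shift the rest.
        inst = self.insts.pop(idx)
        self.markers = {(i - 1 if i > idx else i): m
                        for i, m in self.markers.items() if i != idx}
        return inst

b = Block("entry")
b.insert(0, "store_a", marker="L11")
b.insert(1, "store_b", marker="L12")
b.insert(1, "hoisted_load")  # moved code lands between the markers
b.remove(0)                  # deleting code keeps the remaining markers in sync
print([b.markers[i] for i in sorted(b.markers)])  # ['L12']
```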
And of course, if all variable locations in the future are specified by `dbg.assign`, and we had some non-instruction-based way of referring to `Value`s, then we could roll the position of `dbg.assign` instructions into source location markers. Which would then mean… we wouldn’t need any debug instructions.
There’s a lot of scope for bike-shedding about getting rid of debug instructions here, I’m bringing it up because I think statement ordering and recording variable assignment positions are naturally related, and we should be able to design something that solves both of them. Eventually.
–
Thanks,
Jeremy