[DebugInfo] An idea for determining source-location orders after optimisation (aka: plumbing for is_stmt)

Hi debug-info folks,

tl;dr, here’s an idea for better tracking of source location ordering through LLVM, which would allow us to set the is_stmt flag in DWARF line tables in meaningful ways. Feedback and opinions on whether this is a feasible direction most appreciated. I have a couple of things to explicitly avoid discussing:

  • How exactly to represent stepping in DWARF (I’d like to focus on compilery bits here),
  • Exactly what representation to use in LLVM, these are broad brush strokes.

I’ve written some code to validate the idea, but wouldn’t want to set anything in stone just yet. This is also purely about instructions that are moved between basic blocks, not the re-ordering caused by instruction scheduling.

Background

In DWARFs .debug_line section we record information about which instructions correspond to which source location (aka line number). debug-info consumers use this to attribute an instruction to an originating source location, and to determine where in the control-flow of the function that instruction lies relative to other source locations. Unfortunately, sometimes these two things (attribution/ordering) conflict with each other. The classic example I’m aware of is when a load is hoisted from a loop:

  10: for (;;) {
  11:   int foo = *bar;
  12:   accumul = <arith involving foo>;
  13: }

If LLVM hoists the load from “bar” to happen in the loop preheader and we attribute it to line 11, a developer stepping onto the load instruction will believe that execution has reached the loop body, which is not true. On the other hand, if we don’t attribute the load to any line number and the pointer dereference causes an exception, the developer won’t know what line caused the exception. Today, LLVM choses to do the latter, labelling moved instructions with either line zero (no location) or having an empty location (earlier locations “fall through”).

This isn’t ideal; and we should be able to use DWARFs is_stmt flag to signal what instructions are in-order, and which aren’t. That would allow us to have the best of both worlds: consumers can always attribute an instruction back to a source location, and determine control flow position from instructions with is_stmt set. Hence:

Preserving source ordering through LLVM

I don’t actually have a proposal for performing preservation during optimisation passes – instead I believe we can use existing facilities to project the original source control flow onto the optimised program, when compilation has finished. Knowledge about where a MachineBasicBlock lies in the original CFG of the source program can then be used to work out if an instruction is out-of-order. This process comes in two parts, preserving the original CFG and projecting it onto the optimised program.

As far as I’m aware, LLVM currently doesn’t store any information about the ordering of source locations in the source language. We don’t record that line 11 preceeds line 12 in the example above (and we can’t rely on line numbers always increasing). As a pre-requisite, we need to record in some metadata node what the ordering of original source lines is. I’ve made a quick prototype of this, storing:

  • A metadata node for each lexical block,
  • The order of all DILocations attached to stores in that block (see below),
  • What other DILocations exist in that block (unordered), and
  • The successor lexical blocks from each lexical block.

Which gives us a very rough ordering of source locations, where you can at least determine whether a DILocation is in a block that dominates another block.

The major problem is projecting that CFG onto the optimised program. To do this, I take inspiration from one of @rnk 's past ideas (“maybe we need an instruction that just ‘hangs around’ and doesn’t change order”). Existing debug instructions like dbg.value usually don’t move when optimisations are performed – typically LLVM either optimises a single instruction at a time, or it threads / joins / re-orders entire basic blocks at a time. If we could rely on the DILocation of debug instructions being correct, we would be able to “know” where we are in the original programs CFG. Obviously, there might not be a good correspondence between the original CFG and the optimised program, however the debug instructions mark where variable values change, and it’s fairly hard to re-order them relative to stores. Thus, IMHO, their positions correspond well with the visible side-effects of the function, and so it’s very likely to be the order that the developer wants to step their program in.

To explore this, I cooked up a prototype based on @OCHyams 's assignment-tracking prototype, making the DILocation of each dbg.assign a meaningful marker of the original source position. That doesn’t account for blocks with no assignments to locals though, so I also emitted dbg.label’s with meaningful DILocations for every non-local-variable store in the program. (Yes, more debug instructions, but see final section). I then used an off-the-shelf automata-equivalence-class algorithm to join-up the positions of assignments / dbg.labels in the optimised-program / MachineBasicBlock CFG at the end of compilation, with the unoptimised source CFG. Examples of joined CFGs are illustrated in the attached diagrams.

I picked some functions at random from InstrRefBasedLDV @ the LLVM14 release, first the constructor for the MLocTracker class: see diagrams track_mbb_cfg.pdf track_unopt_cfg.pdf track_joined_cfg.pdf . The first two are the MachineBasicBlock and unoptimised CFGs respectively, the last is the two CFGs joined + reduced into equivalence classes . I’ve added annotations of some features to the PDF. I’m going to skip explaining exactly how this works unless someone’s really interested, but my major point is that even after optimisation the debug instruction positions still represent the original CFG of the program (otherwise they wouldn’t be separable into classes). This continues to work well in the presence of, for example, code duplication – if I force the central loop in the function to be unrolled, we just get four MachineBasicBlock’s in each relevant equivalence class: track_unroll.pdf.

Here’s an example of what happens when this approach doesn’t work: vlocjoin_bad.pdf . In the vlocJoin function there’s a sequence of loops and blocks from lines ~2410 to 2454 where there are only function calls and few stores or local assignments. I didn’t attempt instrumenting call sites yet, and as a result the equivalence algorithm can’t properly order all blocks. The result is one mega-class, from which smaller sub-graphs can be peeled away, but they all loop back through the mega-class. My point: I’d expect to see that in most functions if debug instructions were badly re-ordered or deleted, so far it’s only in scenarios that I haven’t fully instrumented.

Finally, extendranges.pdf is the joined CFG of ExtendRanges, one of the larger functions in InstrRefBasedLDV, where things seem to work well. There are still some peculiarities, see annotations.

Conclusion: the analysis seems to be effective at mapping one CFG onto the other (i.e., the ordering of debug instructions matches the ordering of the original CFG). I think that means we can rely on instructions that “hang around” in one position, like dbg.label or dbg.assign, to determine where MachineBasicBlocks in the optimised program are, in terms of the original control flow. It seems to be alright with code duplication and deletion, completely re-ordering blocks en-mass doesn’t appear to be something LLVM does. Performing an automata-equivalence analysis might not be necessary in a final design, but it’s given me confidence that this is a good direction.

How to use it

Once we know where in the original CFG a particular MachineBasicBlock lies, we can determine whether a particular instruction’s DILocation is out of order. For example, if we had source location ordering [“foo”, “bar”, “baz”], to determine whether an instruction with DILocation “bar” has been hoisted, we can query:

  • That none of the predecessor blocks are in an equivalence class with a lexical scope dominated by bar’s lexical block, and
  • at least one successor MachineBasicBlock is in an equivalence class with a lexical scope that dominates bar.

Which would mean that a consumer stepping through can only see line numbers before bar, then bar itself, then a line number before bar again: an out-of-order stepping such as “foo bar foo”. Such instructions would be candidates for us to not put is_stmt on, indicating they’re not in program order. An equivalent scheme can be made for sunk instructions.

I’ve run some experiments with these queries, on a few large-ish functions in InstrRefBasedLDV. I disabled various parts of LLVM that set line-zero source locations when instructions are hoisted/sunk, because those are the instructions that I want to try and identify. Looking at the output, there are numerous loop preheaders that this technique identifies where instructions have been hoisted to be out of order (and are usually line-zero), plus several instructions that have sunk (usually because they’ve been rematerialized). Unfortunately the most interesting hoisted/sunk instructions aren’t yet identified as I haven’t tried implementing support for inlined scopes yet, plus there are a few false positives due to usual development bugs. Limitations I’m aware of right now:

  • The prototype doesn’t recognise blocks that only contain function calls right now.
  • Loop rotation causes for-loops (where the condition evaluation dominates the body) to… rotate, to have the body dominating the loop condition.
  • Additional control flow introduced by branching to assertion-failure blocks make graphs messy.
  • Ordering some source locations is difficult: for the C expression Array[idx], both the whole expression and the Array identifier get the same source location (I think), which might mean it can’t be strictly ordered.
  • What do you do if you inline a function and a large portion of it is immediately optimised out through static analysis of the call site?

I think this technique (relying on order of instructions that “hang around” and don’t move) could be useful for collecting the information that is then used to set is_stmt, and it’s not hugely invasive into the rest of the compiler (in terms of extra instrumentation required). How do other people feel about this path towards a useful is_stmt? If there’s something clearly infeasible or optimisations that will certainly mess with the ordering of execuion, I’d appreciate knowing about it.

But those instructions…

Of course, the idea above involves adding Even More ™ debug instructions into basic blocks. I’d like to point out that marking the position of a source location order point does not necessarily need to be implemented as an IR instruction. The first thing that comes to mind is that it could be a property of the block itself, making use of ilist callbacks to be maintained.

And of course, if all variable locations in the future are specified by dbg.assign, and we had some non-instruction based way of referring to Values, then we could roll the position of dbg.assign instructions into source location markers. Which would then mean… we wouldn’t need any debug instructions.

There’s a lot of scope for bike-shedding about getting rid of debug instructions here, I’m bringing it up because I think statement ordering and recording variable assignment positions are naturally related, and we should be able to design something that solves both of them. Eventually.


Thanks,
Jeremy

1 Like

I like this idea, it would allow us to separate source locations for diagnostics from source locations for stepping. I have a few high-level questions before diving into the implementation:

  • Is there any wording in the DWARF spec to support this interpretation of is_stmt?
  • The primary motivation for zeroing out the location when hoisting an instruction into a different basic block that we give in How to Update Debug Info: A Guide for LLVM Pass Authors — LLVM 16.0.0git documentation is that we don’t want to create an unintuitive/misleading control flow. Should a consumer point out the fact that a line entry has no is_stmt and warn that the instruction may have been moved around by the compiler?
  • Could this break code coverage?

Can you give an example for this that doesn’t involve #line?

The order of all DILocation s attached to stores in that block

Where does the frontend get this information from, if it isn’t inferring it from the line numbers?

Hi Adrian,

In a word, “no”, or at least “not clearly”. As far as I’m aware, the purpose of the flag is entirely specified on DWARF5 S6.2.2 p151 as:

My understanding of this is that is_stmt disambiguates which instruction is “where” a statement occurs, if there’s a one-to-many relationship between source locations and instructions. I don’t believe the spec ever considers how to describe instructions that are out of source order, although there’s informative text in S6.2.4 p155 line 10 which pre-supposes that is_stmt should not be set on instructions that are out of order:

I believe is_stmt has been used to encode stepping information in gcc and gdb, @dblaikie has referred to it in the past, however I’m not fully up-to-speed on exactly how consumers determine where-to-step.

I think this cuts to the heart of the matter, in that the whole development system (compilers+debuggers) needs to describe code that has been optimised to developers, in a way that lets developers understand what happened to their code. Whether to show a line without is_stmt is probably depends on the context: if the developer wants to see a generic disassembly of their function, I would say it’s safe to hide (or de-emphasise) lines without is_stmt. On the other hand, when looking at a stack trace from a crash, you definitely want to know exactly where the faulting instruction came from.

I don’t think there’s an overall “right” answer here; I feel giving consumers more information to make good choices is worthwhile though. It’s also worth noting that we label pretty much every line number change with is_stmt right now.

This hadn’t occurred to me. I suppose it’s possible that a speculated instruction keeping it’s source location (but not having is_stmt) might make a block look executed when it isn’t. I’ll take a closer look at how existing tooling works though.

Hmm – I didn’t intend to say that individual lines or statements could be out-of-order in unoptimised code, instead we don’t record that line 11 always preceeds line 12 on all code paths. Without further instrumentation, I don’t believe we can distinguish the following two code structures, the original where the foo assignment should/will get hoisted:

10: for (;;) {
11:  int foo = *bar;
12:  accuml = <arith involving foo>;
13: }

and one where the assignment to foo is "pre"hoisted in the source code:

10:  int foo = *bar;
11:  for (;;) {
12:    accuml = <arith involving foo>;
13:  }

(ignoring the scope of foo). Let’s say the two produce the same optimised code: in the first sample we would probably want to not set is_stmt on the source location for the load of *bar, as it’s hoisted, but in the second sample the developer expects to step on the load of *bar before the loop. Distinguishing these two scenarios requires being able to identify when source locations are in different lexical blocks or scopes. That leads us to:

So far I’ve just been instrumenting directly: inserting hooks into clang’s CGStmt.cpp and CGDebugInfo.cpp to collect all DILocations created, all DILocations attached to store instructions (corresponding to assignments), and separating them into different “blocks” when various control flow things happen (loops, conditionals). It’s effectively a record of the unoptimised programs control flow. I considered just instrumenting the flow of BasicBlock’s directly but I felt it was too fine-grained, and would replicate what we have now.

I’m not 100% certain what information needs to be preserved, but at a very high level, I think it’s better to reduce the amount of debug-info decision making needed in optimisation passes in favour of making decisions once compilation completes. Doing that means preserving information at the start of compilation, for use at the end.

As mentioned the implementation isn’t especially interesting, if I had to pick out the important part, it would be that instructions that “hang around” and represent the positions of statements and assignments appear to not be re-ordered. We can use them to infer “where we are” in the unoptimised program, and make better is_stmt decisions from that.

Yeah, so a lot of the work around zero locations came out of Google getting LLVM up to being used as our optimizing compiler - and a big part of that was ensuring that profile driven optimization was working well, and a big part of that was ensuring that locations could never provide misleading conclusions about what code was reachable/what things were true.

Essentially, profile driven optimization needs a strict/strong guarantee that it never gives a false positive about the execution of a line of source code under a condition that would mislead the optimizers/profile about whether that condition is true.

So any solution here needs to be provable, “this looks good enough for developers, works in most cases” is not something that would be OK (consider cases where there are no variables/debug locations - does that lead the location handling logic proposed to think that these things are more flexible about order/not “out of order”). For this reason it might be necessary to carry this on the instructions when they are moved, rather than trying to derive it after the fact.

& if we end up using is_stmt to encode this in the final DWARF (regardless of how we determine it/track it to reach there) then we’ll need to update profile collection tools to know/record/drop the non-is_stmt entries.

I think debugging and (non-instrumented) profiling and code-coverage have different needs here.

Profiling, it seems to me, isn’t really interested in statements; it’s interested in basic blocks and control flow. In some cases a block spans multiple statements, in some cases you have multiple blocks in a single statement. This tells me that source location is inappropriate for profiling. Profiling doesn’t want a hoisted load to mislead it into thinking the loop block was executed. However, if we kept track of the fact that the hoisting changed the containing BB, profiling’s needs would be satisfied, and it really wouldn’t mind about source location.

Code coverage arguably has the same interests as profiling; if you’ve entered a BB, you pretty much guarantee you’ve executed the entire BB, by definition. Again, if we tracked BB then code coverage wouldn’t care about source locations, or at least not as much.

Debugging optimized code, of course, is torn between telling the truth to the user and telling the truth is such agonizing detail that the user is hard-pressed to understand what is going on. My own preference is to reflect the code-as-compiled rather than the code-as-written, and therefore retain the source location of the hoisted load; I understand that this viewpoint is not universal. So, I’m not a huge fan of leaving debug-info instructions in the original code-position. I would rather see a stop on the load before entering the loop, and no stops on it within the loop.