Stepping through optimized programs is often a trial, and one significant factor is jumping back and forth between source locations even in linear code. We (Sony) have prototyped a solution that reduces stepping jumpiness by making LLVM’s is_stmt placement smarter.
I’ll discuss this project in my upcoming LLVM talk “Improving optimized code line table quality”.
## Quick summary
A lot of the noise in stepping comes from code motion and instruction scheduling. This is reasonable; a long expression on a single line may involve multiple operations that are moved, re-ordered, and interleaved with other instructions by optimisations. But we can eliminate a lot of that noise.
LLVM chooses whether to apply is_stmt (the DWARF flag that indicates recommended breakpoint locations in the line table) using simple heuristics that essentially amount to "has the line number changed since the last instruction?".
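As a toy illustration of that heuristic (a sketch, not the real `DwarfDebug` code; the names here are invented), a line-table row gets is_stmt whenever its line differs from the previous instruction's:

```cpp
#include <vector>

// Sketch of the current placement heuristic: mark a row is_stmt iff
// its line number differs from the previous instruction's line.
struct Row {
  unsigned Line;
  bool IsStmt;
};

std::vector<Row> placeIsStmt(const std::vector<unsigned> &InstLines) {
  std::vector<Row> Rows;
  unsigned PrevLine = 0; // 0 = no previous instruction
  for (unsigned Line : InstLines) {
    Rows.push_back({Line, Line != PrevLine});
    PrevLine = Line;
  }
  return Rows;
}
```

If scheduling interleaves two lines as 3, 4, 3, 4, every row becomes is_stmt and the debugger bounces back and forth between the two lines - exactly the jumpiness described above.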
We take ideas from two papers [1][2] that explore the issue, especially @cmtice's:
From the perspective of a source-level debugger user:
- Source code is made up of interesting constructs; the level of granularity for “interesting” while stepping is typically assignments, calls, control flow. We’ll call these interesting constructs Atoms.
- Atoms usually have one instruction that implements the functionality that a user can observe; once they step “off” that instruction, the atom is finalised. We’ll call that a Key Instruction.
- Communicating where the key instructions are to the debugger (using DWARF’s is_stmt) avoids jumpiness introduced by scheduling non-key instructions without losing source attribution (non-key instructions retain an associated source location, they’re just ignored for stepping).
Our prototype looks like this:
- Two new fields added to DILocation metadata.
- The new fields are assigned values in a pre-optimisation pass that chooses “interesting” instructions (linear scan over instructions).
- There’s some middle end upkeep.
- There’s a map and remap function pair that need to be called for instructions cloned when control flow is duplicated, e.g. during loop unroll. CloneBasicBlock maps the metadata and RemapInstruction(s) remaps it, so a bunch of transformations don’t actually need to do anything new in practice.
- Other than that, the new metadata is propagated organically as part of the DILocation.
- During DWARF emission, the new metadata is collected (linear scan over instructions) to decide is_stmt placements.
Compile-time-tracker reports this `instructions:u` overhead on CTMark compile times:
`stage1-ReleaseLTO-g`
Benchmark | Old | New |
---|---|---|
kimwitu++ | 59076M | 59540M (+0.79%) |
sqlite3 | 61163M | 61616M (+0.74%) |
consumer-typeset | 54620M | 55142M (+0.96%) |
Bullet | 111784M | 112269M (+0.43%) |
tramp3d-v4 | 176555M | 177689M (+0.64%) |
mafft | 38854M | 39137M (+0.73%) |
ClamAV | 85975M | 86536M (+0.65%) |
lencod | 118636M | 119331M (+0.59%) |
SPASS | 77932M | 78433M (+0.64%) |
7zip | 272642M | 273848M (+0.44%) |
geomean | 89451M | 90042M (+0.66%) |
`stage1-O0-g`
Benchmark | Old | New |
---|---|---|
kimwitu++ | 24932M | 25167M (+0.94%) |
sqlite3 | 4823M | 4938M (+2.39%) |
consumer-typeset | 12635M | 13114M (+3.79%) |
Bullet | 65029M | 65445M (+0.64%) |
tramp3d-v4 | 21231M | 21477M (+1.16%) |
mafft | 6772M | 6921M (+2.20%) |
ClamAV | 13722M | 14000M (+2.02%) |
lencod | 12818M | 13025M (+1.61%) |
SPASS | 14455M | 14647M (+1.33%) |
7zip | 142937M | 143449M (+0.36%) |
geomean | 18676M | 18982M (+1.64%) |
The cost of enabling the feature for builds with optimisations is quite reasonable given the improvements to the debugging experience, and there may be scope to reduce it further.
There’s a higher percentage hit for O0 builds, but I still think it’s worth enabling at O0. The feature modifies O0 debugging behaviour; if we enabled it with optimisations but not at O0, we would be providing a different baseline (O0) debugging behaviour. For example, a multi-line assignment might involve several back-and-forth jumps even at O0 without Key Instructions, and is less likely to with Key Instructions.
(If you don’t care about implementation details or open questions, skip to the conclusion now)
## Details
(Read the quick summary first)
Atoms having “one key instruction” is a simplification; this often isn’t true. Consider the following murky cases:
- Loop unrolling, in which an assignment’s Key Instruction store gets duplicated.
- Conditional branches in IR getting lowered to cond-br + br pairs.
- A front end may emit multiple stores for one variable assignment. Each is a candidate Key Instruction - they may get re-ordered by optimisations, so the front end can’t tell which it’ll be ahead of time. (In this case there is only one Key Instruction, but it’s not known until after optimisation.)
Fear not, these cases can still all be handled (and are handled by our prototype).
### Prototype implementation
**The metadata** - The two new DILocation fields are `atomGroup` and `atomRank`. Both are unsigned integers. Instructions in the same function with the same `(atomGroup, InlinedAt)` pair are part of the same source atom. `atomRank` determines is_stmt preference within that group, where a lower number is higher precedence. The default `atomRank` of `0` indicates the instruction isn’t interesting.
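A minimal model of that grouping rule (the struct here is purely illustrative - the real fields live on `DILocation` and `InlinedAt` is a location chain):

```cpp
#include <cstdint>

// Illustrative stand-in for the two new DILocation fields.
struct DILocationExt {
  uint64_t AtomGroup = 0;          // 0 = not part of any atom
  uint8_t AtomRank = 0;            // lower = higher precedence; 0 = uninteresting
  const void *InlinedAt = nullptr; // stands in for the inlined-at chain
};

// Instructions belong to the same source atom iff they share the
// (atomGroup, InlinedAt) pair within a function.
bool sameAtom(const DILocationExt &A, const DILocationExt &B) {
  return A.AtomGroup != 0 && A.AtomGroup == B.AtomGroup &&
         A.InlinedAt == B.InlinedAt;
}
```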
**Pre-optimisation** - A `key-instructions` pass sets the new `DILocation` fields on instructions that it deems important for stepping. The following instructions get `atomGroup` numbers: `ReturnInst`, `SwitchInst`, conditional `BranchInst`, `MemIntrinsic`, `StoreInst`. Calls are ignored here - they’re unconditionally marked is_stmt later.
For `MemIntrinsic` and `StoreInst` we additionally add the instruction computing the stored value, if there is one, to the same atom (we give it the same `atomGroup` number) with a higher `atomRank` (lower precedence). We do the same for the condition of the conditional `BranchInst`. This is so that we have a good “fallback” location if/when those primary candidates get optimised away.
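Sketched over a toy instruction stream (the `Inst` struct, opcode strings, and rank values are illustrative, not the prototype’s exact representation), the pass might look like:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction.
struct Inst {
  std::string Op;    // "store", "add", "ret", "condbr", ...
  int ValueOperand;  // index of the instruction computing the stored
                     // value / condition, or -1 if none
  uint64_t AtomGroup = 0;
  uint8_t AtomRank = 0;
};

// Sketch of the key-instructions pass: a linear scan assigning fresh
// atomGroup numbers to "interesting" instructions, and pulling the
// operand-defining instruction into the atom as a lower-precedence
// fallback.
void runKeyInstructionsPass(std::vector<Inst> &Func) {
  uint64_t NextAtomGroup = 1;
  for (Inst &I : Func) {
    if (I.Op == "store" || I.Op == "ret" || I.Op == "condbr" ||
        I.Op == "switch") {
      I.AtomGroup = NextAtomGroup++;
      I.AtomRank = 1; // primary candidate
      if (I.ValueOperand >= 0) {
        // Same atom, higher rank (lower precedence): the fallback
        // location if the primary is optimised away.
        Inst &Def = Func[I.ValueOperand];
        Def.AtomGroup = I.AtomGroup;
        Def.AtomRank = 2;
      }
    }
  }
}
```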
**During optimisation** - Throughout optimisation, the `DILocation` is propagated normally: cloned instructions get the original’s `DILocation`, the new fields get merged in `getMergedLocation`, etc. However, pass writers need to intercede in cases where a code path is duplicated, e.g. unrolling, jump-threading. In these cases we want to emit key instructions in both the original and the duplicated code, so the duplicates must be assigned new `atomGroup` numbers, in a similar way that instruction operands must get remapped. There are facilities to help with this: `mapAtomInstance(const DebugLoc &DL, ValueToValueMapTy &VMap)` adds an entry to `VMap` which can later be used for remapping using `llvm::RemapSourceAtom(Instruction *I, ValueToValueMapTy &VM)`. `mapAtomInstance` is called from `llvm::CloneBasicBlock` and `llvm::RemapSourceAtom` is called from `llvm::RemapInstruction`, so in many cases no additional effort is actually needed.

`mapAtomInstance` ensures `LLVMContextImpl::NextAtomGroup` is kept up to date; this is the global “next available atom number”.
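A simplified model of the map/remap pair (types are collapsed here; the real functions operate on `DebugLoc` and `ValueToValueMapTy`, and the counter lives on the `LLVMContext`):

```cpp
#include <cstdint>
#include <map>

// Toy stand-in for an instruction carrying an atom group.
struct Inst {
  uint64_t AtomGroup = 0;
};

// Sketch of the mapping facilities: while cloning, reserve a fresh
// group number per original group (mapAtomInstance); on the clones,
// rewrite groups through the map (RemapSourceAtom). Clones of the
// same atom stay in one atom, distinct from the original's.
class AtomMapper {
  uint64_t NextAtomGroup; // stands in for LLVMContextImpl::NextAtomGroup
  std::map<uint64_t, uint64_t> Map; // old group -> new group
public:
  explicit AtomMapper(uint64_t Next) : NextAtomGroup(Next) {}

  void mapAtomInstance(const Inst &Orig) {
    if (Orig.AtomGroup && !Map.count(Orig.AtomGroup))
      Map[Orig.AtomGroup] = NextAtomGroup++;
  }

  void remapSourceAtom(Inst &Clone) {
    auto It = Map.find(Clone.AtomGroup);
    if (It != Map.end())
      Clone.AtomGroup = It->second;
  }
};
```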
The `DILocation`s carry over from IR to MIR without any changes.
**DWARF emission** - Iterate over all instructions in a function. For each `(atomGroup, InlinedAt)` pair we find the set of instructions sharing the lowest rank number. Typically only the last of these instructions in each basic block is included in the set (this isn’t true for branches - all branches of the same rank are included). The instructions in this set get is_stmt applied to their source locations.
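The selection can be sketched like this, ignoring `InlinedAt` and the branch special case for brevity (the names and types are invented for illustration):

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Toy stand-in for a MIR instruction.
struct MInst {
  unsigned BlockID;
  uint64_t AtomGroup; // 0 = no atom
  uint8_t AtomRank;   // 0 = uninteresting
  bool IsStmt = false;
};

// Sketch of is_stmt selection: for each atom group, keep only the
// lowest-rank instructions, and of those, only the last one per basic
// block; those get is_stmt.
void applyIsStmt(std::vector<MInst> &Insts) {
  // First pass: lowest rank seen per atom group.
  std::map<uint64_t, uint8_t> BestRank;
  for (const MInst &I : Insts)
    if (I.AtomGroup && I.AtomRank) {
      auto It = BestRank.find(I.AtomGroup);
      if (It == BestRank.end() || I.AtomRank < It->second)
        BestRank[I.AtomGroup] = I.AtomRank;
    }
  // Second pass: remember the last lowest-rank instruction per
  // (group, block); only those get the flag.
  std::map<std::pair<uint64_t, unsigned>, MInst *> Last;
  for (MInst &I : Insts)
    if (I.AtomGroup && I.AtomRank == BestRank[I.AtomGroup])
      Last[{I.AtomGroup, I.BlockID}] = &I;
  for (auto &KV : Last)
    KV.second->IsStmt = true;
}
```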
### Evaluation
I’ve modified dexter to plot debugging traces (those changes are out of scope for this RFC). Dexter drives a debugger, single-stepping through a requested function while recording the source location of each step. The line number of each step relative to the function start is plotted on a graph against “debugging steps” (i.e., discrete time). We do this for several different build configurations of the debug target, which lets us compare the stepping behaviour at a glance.
I’ve drawn the traces by hand, for O2 and O2 with key instructions, on these two images:
Plotted on a graph, including an O0 trace (green) we get:
You can see there’s a lot of bouncing around at O2 (orange), in this case due to scheduling. The Key Instructions (purple) debugging behaviour matches the O0 (green) behaviour for this simple example.
I haven’t found a good quantitative measurement to evaluate this project, so my approach has been to generate and examine a number of these graphs and to manually step through “real” code, comparing the optimised-code debugging experience with and without the feature and at O0.
I’d like to share more of these graphs - I’ll aim to do so after the conference. Here’s a preview for now, on a trace from Lua, in which you can see another win for Key Instructions.
And another from Opensteer (inlined at O2 - and consequently at O2 with Key Instructions too, since the feature doesn’t affect optimisations):
These graphs demonstrate that Key Instructions reduces the complexity of stepping in “real” code. By removing unnecessary noisy steps, it makes the optimised stepping experience feel closer to an unoptimised one.
## Open question(s)
**Q1**: Should we use a pre-optimisation pass, or have front ends apply the new metadata?
We currently use a heuristic-based approach to apply the new metadata in a pre-optimisation pass. An alternative would be for front ends to do it when creating the instructions (i.e., with new IRBuilder functionality).
Pros of current approach:
- Language / front end independent
- Simple & understandable; a small loop in one place
Cons of current approach:
- Some instructions’ DILocations are regenerated (setting the new fields), which is wasteful / slower than if the front end created them correctly (a significant portion of the O0 compile time increase)
- Front ends have limited or no control over what is considered “interesting”
I am not a front end engineer; without diving into it, I’m not sure how difficult it would be to implement it that way.
In [DWARF] Specifying that a DILocation should be is_stmt=false, @khuey points out that some front ends may (a) perform their own mandatory optimisations even at O0, which is relevant to the question above, and (b) produce instructions which are undesirable for stepping (in the example in that thread, some constant variable initialisation isn’t desirable for users to step on at O0).
For (b) - this is easy to solve if front ends are responsible for applying the new metadata (simply don’t apply it to “uninteresting” instructions).
Otherwise, we’ll want a mechanism for the front end to convey that an instruction isn’t interesting. I think the approach proposed in that thread could work orthogonally alongside Key Instructions just fine. The idea is essentially to add a “never-is-stmt” bit to DILocations so the front end can exclude some instructions from stepping. At the DWARF emission stage, the “never-is-stmt” bit would take precedence over Key Instructions. It’s just a matter of working out the best way to pack the new DILocation fields (and possibly thinking ahead for future fields).
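For instance (a purely hypothetical packing, not a proposed encoding - the field widths and layout are invented), the two fields and such a bit could share one 64-bit word:

```cpp
#include <cstdint>

// Hypothetical packing of atomGroup, atomRank, and a never-is-stmt
// bit into a single 64-bit word.
// Layout: bits [0,60) atomGroup, [60,63) atomRank, [63] never-is-stmt.
struct PackedLoc {
  uint64_t Bits = 0;

  void set(uint64_t Group, uint8_t Rank, bool NeverIsStmt) {
    Bits = (Group & ((uint64_t(1) << 60) - 1)) |
           (uint64_t(Rank & 0x7) << 60) |
           (uint64_t(NeverIsStmt) << 63);
  }
  uint64_t atomGroup() const { return Bits & ((uint64_t(1) << 60) - 1); }
  uint8_t atomRank() const { return (Bits >> 60) & 0x7; }
  bool neverIsStmt() const { return (Bits >> 63) != 0; }
};
```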
There’s also the hybrid support option: we could provide the pre-optimisation pass and the tools for front ends to do it themselves. We could even start with one and migrate to the other.
That’s the only major question for me right now, but I’ll add any others brought up in the thread here.
## Conclusion
Our prototyped changes to the compiler improve interactive debugging of optimized code by reducing the impact of code motion and scheduling, without affecting codegen or instruction source attribution.
We’d like to implement this in LLVM, so we’re seeking general approval and would welcome any concerns and general thoughts.
## References
- [1] Key Instructions: Solving the Code Location Problem for Optimized Code (C. Tice, S. L. Graham, 2000)
- [2] Debugging Optimized Code: Concepts and Implementation on DIGITAL Alpha Systems (R. F. Brender et al.)