Hi, we had to put this project down for a bit, but we’ve recently picked it back up so here’s an update.
I’ve started implementing a solution in Clang - here’s a WIP draft PR #130943. I’m not particularly familiar with the front end so I encourage and welcome feedback on the approach. See the PR for more info.
Although having Clang annotate instructions introduces additional complexity over the prototype’s “interpret the IR” approach, we do feel it’s the right direction. It gives the front end more control to dictate stepping behaviours for different language constructs, should reduce the compile-time impact of Key Instructions, and hopefully lays groundwork for adoption in other front ends too, as you pointed out @adrian.prantl.

So, maybe, if we take each contiguous sequence of instructions belonging to one atom (is that the right word?) and just always place the is_stmt flag on the first instruction in that sequence, then we might get the -O0 behavior as a side-effect? (*)
The answer is “sort of”. I’ve implemented a heuristic that basically does this, floating is_stmt to the top of contiguous sequences with the same line number, which has two benefits when optimisations are enabled. First, this floats is_stmt to the top of epilogue instructions (rather than applying it to the ret instruction itself), which is important to avoid losing variable location coverage at return statements. Second, it reduces the difference in optimized code stepping behaviour between when Key Instructions is enabled and disabled in “uninteresting” cases. I.e., it appears to generally reduce unnecessary changes in stepping.
We’ve used contiguous line numbers rather than atom membership as the test there because of our choice to represent source atoms with a single integer ID. We can’t have instructions belonging to multiple atom groups or represent any kind of grouping hierarchy. That means we can’t rely on all the call setup instructions being in the same group currently (e.g., if one of the argument expressions contains key functionality such as a store, it will be in its own group).
I had hoped that implementing that heuristic would “fix” O0 debugging with Key Instructions, but I’ve come up with a simple counterexample:
f(a);
g(
b
);
At the call to f the is_stmt floats to the top of the call setup instructions because they’re on the same line. At the call to g the is_stmt doesn’t float up past b’s load because it has a different line number to the call. So, stopped at f you can edit a and the argument value will change, but if you stop at g and edit b it won’t affect the argument value (because the load has already happened).
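To make the heuristic and the counterexample concrete, here is a toy Python model. It is only a sketch and not the code in the PR: the Instr class, the float_is_stmt function, the pseudo-assembly strings and the line numbers are all hypothetical, and it assumes that calls and ret count as key instructions while argument loads do not.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Instr:
    text: str          # pseudo-assembly, for display only (hypothetical)
    line: int          # source line from the instruction's debug location
    key: bool = False  # assumption: True if this is a key instruction


def float_is_stmt(instrs):
    """Return the indices that receive is_stmt in this toy model.

    For each contiguous run of instructions sharing a line number, if the
    run contains a key instruction, is_stmt goes on the run's first
    instruction instead of on the key instruction itself.
    """
    is_stmt = []
    index = 0
    for _, run in groupby(instrs, key=lambda i: i.line):
        run = list(run)
        if any(i.key for i in run):
            is_stmt.append(index)
        index += len(run)
    return is_stmt


# Epilogue case: is_stmt lands on the first epilogue instruction, not on ret.
epilogue = [
    Instr("pop fp", line=9),
    Instr("ret", line=9, key=True),
]
epilogue_stops = float_is_stmt(epilogue)

# The counterexample, assuming f(a); is on line 1, g( on line 2, b on line 3.
o0_example = [
    Instr("load a", line=1),
    Instr("call f", line=1, key=True),
    Instr("load b", line=3),            # different line from the call to g
    Instr("call g", line=2, key=True),
]
o0_stops = float_is_stmt(o0_example)

for i, instr in enumerate(o0_example):
    marker = "is_stmt" if i in o0_stops else ""
    print(f"{i}: line {instr.line}  {instr.text:8} {marker}")
```

In this model the stop for f lands on the load of a, so an edit to a is seen by the call, whereas the stop for g lands on the call itself, after b has already been loaded.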
This is an implementation limitation rather than a fundamental one, stemming from the choice outlined above to represent source atoms with flat integer IDs. However, changing the representation would introduce further unknowns (especially around performance), so for now I think we should not apply Key Instructions at O0. As it is not a fundamental limitation, there could be room to revisit this in the future.

Side note: This might be an opportunity to optimize the in-memory encoding since I’d expect all source locations within an atom to share most of their source location details, so we could encode all DILocations belonging to the same atom group as tiny line/col diffs to a shared full DILocation.
@StephenTozer is investigating ways to improve the memory characteristics of DILocations in general, so that adding the extra fields needed for Key Instructions doesn’t cause a regression from today’s performance numbers.
Assuming that this front end approach is broadly acceptable, I think we have all the pieces working to be able to start submitting PRs. It seems like there has been generally positive feedback for the project - if there are no fundamental objections it would be great to get concrete feedback on implementation via code review.
I would propose that we start implementing this behind an LLVM compile-time flag, because of the initial memory cost, which we will be addressing concurrently. When the DILocations memory work is complete we can remove the Key Instructions compile-time guards (but keep the feature off by default until it’s ready).