Setting breakpoints before assignments or calls

Hello,

I’d like to improve the situation with setting breakpoints on lines with assignments or inlinable calls. This email outlines problem areas, possible solutions, and why I think emitting extra nops at -O0 might be the best solution.

Problem 1: Assignments

Counter to user expectation, a breakpoint on a line containing an assignment is reached when the assignment happens, not before the r.h.s is evaluated.

Example: Can’t step into bar()

1| foo = // Set a breakpoint here. Note that it’s not possible to step into bar().
2| bar();

One solution is to set the location of the assignment to the location of the r.h.s (line 2). The problem with this approach is that it becomes impossible to set a breakpoint on line 1.

Another solution is to emit a nop (on line 1) prior to emitting the r.h.s, and to emit an artificial location on the assignment’s store instruction. This makes it possible to step to line 1 before line 2, and prevents users from stepping back to line 1 after line 2.

Problem 2: Inlinable calls

Instructions from an inlined function don’t have debug locations within the caller. This can make it impossible to set a breakpoint on a line that contains a call to an inlined function.

Example: Can’t set a breakpoint on a call

It’s easier to see the bug via Godbolt: https://godbolt.org/z/scwF20. Note that it’s not possible to set a breakpoint on line 9 (on “inline_me”). At the IR-level, we do emit an unconditional branch with a location that’s on line 9, but we have to drop that branch during ISel.

The only solution to this problem (afaik) is to insert a nop before inlinable calls. In this example the nop would be on line 9.

One alternative I’ve heard is to make the first inlined instruction look like it’s located within the caller, but that actually introduces a bug. You wouldn’t be able to set a breakpoint on the relevant location in the inlined callee.

Proposal

As outlined above, I think the best way to address issues with setting breakpoints on assignments and calls is to insert nops with suitable locations at -O0. These nops would lower to a target-specific nop at -O0, and lower to nothing at -O1 or greater (like @llvm.donothing).

The tentative plan is to introduce an intrinsic (say, @llvm.dbg.nop) to accomplish this.

I don’t anticipate there being a substantial compile-time impact, but haven’t actually measured this yet. I’d like to get some feedback before going forward. Let me know what you think!

thanks,
vedant

+all the usual debugger folks

I reckon adding nops would be a pretty unfortunate way to go in terms of code size, etc, if we could help it. One alternative that might work and we’ve tossed it around a bit - is emitting more accurate “is_stmt” flags. In your first example, the is_stmt flag could be set on the bar() call instruction - and any breakpoint set on an instruction that isn’t “is_stmt” could instead set a breakpoint on the statement that covers that instruction (the nearest previous is_stmt instruction*)

The inlining case seems to me like something that could be fixed without changes to the DWARF, purely in the consumer - the consumer has the info that the call occurs on a certain line, and when I ask to break on that line it could break on the first instruction in the inlined subroutine (potentially giving me an artificial view that makes it look like I’m at the call site and letting me ‘step’ (though a no-op) into the inlined function).

* This is a bit problematic when a statement gets interleaved/mixed up with other statements - DWARF has no equivalent of “ranges” for describing a statement. Likely not a problem at -O0 anyway, though (because little interleaving occurs there).

I hadn’t considered using “is_stmt” at all, thanks for bringing that up!

Given the rule you’ve outlined, I’m not sure that the debugger could do a good job in the following scenarios:

1| foo =
2| bar();

3| foo = //< Given a breakpoint on line 3, would the debugger stop on line 2?
4| bar();

or

1| if (…)
2| foo();
3| else
4| bar();
5| baz = //< Given a breakpoint on line 5, would the debugger stop on line 2, 4, or possibly either?
6| func();

Is there another way to apply and interpret “is_stmt” flags to resolve these sorts of ambiguities?

The inlining case seems to me like something that could be fixed without changes to the DWARF, purely in the consumer - the consumer has the info that the call occurs on a certain line, and when I ask to break on that line it could break on the first instruction in the inlined subroutine (potentially giving me an artificial view that makes it look like I’m at the call site and letting me ‘step’ (though a no-op) into the inlined function).

Oh, that’s a great point. Yes, there is a TAG_inlined_subroutine for “inline_me” which contains AT_call_file and AT_call_line. That’s enough information to do the right thing on the debugger side.
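A minimal sketch of that consumer-side lookup, under stated assumptions: the InlinedSubroutine struct and resolveCallSiteBreakpoint function are hypothetical names for illustration, not LLDB or LLVM APIs, and the DW_AT_call_file value is assumed to be already resolved to a path string.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of one DW_TAG_inlined_subroutine entry.
struct InlinedSubroutine {
    std::string call_file;  // from DW_AT_call_file (assumed already resolved)
    unsigned call_line;     // from DW_AT_call_line
    std::uint64_t low_pc;   // address of the first inlined instruction
};

// If an inlined call site is recorded on file:line, stores the lowest
// matching low_pc in `address` and returns true. Returns false otherwise.
bool resolveCallSiteBreakpoint(const std::vector<InlinedSubroutine> &inlined,
                               const std::string &file, unsigned line,
                               std::uint64_t &address) {
    bool found = false;
    for (const InlinedSubroutine &sub : inlined) {
        if (sub.call_file != file || sub.call_line != line)
            continue;
        if (!found || sub.low_pc < address)
            address = sub.low_pc;
        found = true;
    }
    return found;
}
```

A debugger could fall back to a lookup like this only when the line table itself has no entry for the requested line, which is the situation in the “inline_me” example.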

thanks,
vedant

Ah, one thing that might’ve been confusing is when I say “before” I mean “before in the instruction stream/line table” not “before in the source order”.

1| foo =
2| bar();

3| foo = //< Given a breakpoint on line 3, would the debugger stop on line 2?
4| bar();

Given the line table (assuming a sort of simple pseudocode):

%x1 = call bar(); # line 2
store %x1 → @foo # line 1
%x2 = call bar(); # line 4
store %x2 → @foo # line 3

We could group lines 1 and 2 as part of one statement, and lines 3 and 4 as part of another. Then, when the line table is emitted, the first instruction of each statement (first in the assembly, not in the source code) gets the “is_stmt” flag. So in this case lines 2 and 4 would have “is_stmt”. A debugger, when requested to set a breakpoint on line 3, would look at the line table and say “well, I have this location on line 3, but it’s not the start of a statement/not a good place to break, so I’ll back up in the instruction stream (within a basic block - so that might get tricky for conditional operators and the like) until I find an instruction marked with is_stmt” - and it would find the second call instruction and break there. The debugger could make the UI nicer (rather than showing line 4 when the user requested line 3) by highlighting the whole statement (all lines attributed to instructions between this “is_stmt” instruction and the next, or the end of the basic block).
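The back-up rule described above could be sketched roughly as follows. This is only an illustration: Row and resolveBreakpointRow are hypothetical names rather than real debugger types, and the basic-block id on each row is an assumption, since a DWARF line table does not carry one in this form.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line-table row (not LLDB's or LLVM's real type).
struct Row {
    std::uint64_t address;
    unsigned line;
    bool is_stmt;
    unsigned block;  // assumed id of the basic block holding the instruction
};

// Takes the first row attributed to `line`, then walks backwards within the
// same basic block to the nearest row marked is_stmt. If no such row exists
// in the block, the originally matched row is returned. Returns rows.size()
// when the line has no rows at all.
std::size_t resolveBreakpointRow(const std::vector<Row> &rows, unsigned line) {
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].line != line)
            continue;
        std::size_t j = i;
        while (!rows[j].is_stmt && j > 0 && rows[j - 1].block == rows[i].block)
            --j;
        return rows[j].is_stmt ? j : i;
    }
    return rows.size();
}
```

With the four-instruction table from the pseudocode above (is_stmt set only on the two calls), a request for line 3 resolves to the second call, which is attributed to line 4.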

Does that make sense?

I'm not sure we always want to have the debugger ignore the inter-line line-table entries for complex expressions. In a case like:

1: x = foo(5,
2: bar(),
3: baz()
4: after_baz());

if I want to step into "baz()", it's convenient to break on line 3 and do a step-in. If we move to the is_stmt line and that's somewhere at the beginning of the function call, then that effort will be thwarted. Then you will (a) curse the debugger a bit 'cause it didn't stop where you told it to, and then (b) step over a few times (or use "sif", which nobody knows to use), but with a lot of arguments the former can get annoying. In this case it seems like we are removing potentially helpful information from the user when setting breakpoints. And adding a --ignore-is-stmt option doesn't seem like the sort of thing anybody would know to use. But maybe the compiler could be smarter about when it applies this is_stmt, so it would know to put it on lines 2, 3, & 4 before the function calls, but not on line 2 in Vedant's example? Not sure how hard that would be.

We would also have to have a way to know when to trust the is_stmt for these purposes. DWARF should really have a way to say which features are taken seriously by the producer, so the debugger will know whether to trust them or not. But that's a slightly orthogonal issue...

Jim

I'm not sure we always want to have the debugger ignore the inter-line line-table entries for complex expressions.

That should be "inter-statement line table entries"...

I’d figure some UI could improve that - pass a flag to break (or hold down shift when you click a line to set a breakpoint on in a GUI) when you specifically want to break on a certain line? Or assume a user setting a breakpoint on something that’s not the first source line (not the lowest line number of all lines associated with that statement (all lines from all instructions from the nearest previous is_stmt instruction to the nearest following is_stmt instruction)) they mean to precisely break on that line - but otherwise assume they mean to break before the first instruction on the whole statement.
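The second option (break precisely unless the user picked the statement's lowest line) might look something like this. It is a sketch under assumptions: Row and chooseRow are hypothetical names, a "statement" is taken to be the run of rows from one is_stmt row up to the next, and basic-block boundaries are ignored for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line-table row.
struct Row {
    std::uint64_t address;
    unsigned line;
    bool is_stmt;
};

// Returns the index of the row to break on, or rows.size() if `line` has no
// rows. A request for the lowest line of a statement maps to the statement's
// first instruction; any other line is honored precisely.
std::size_t chooseRow(const std::vector<Row> &rows, unsigned line) {
    std::size_t hit = rows.size();
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].line == line) { hit = i; break; }
    }
    if (hit == rows.size())
        return hit;

    // Statement bounds: nearest previous is_stmt row up to the next one.
    std::size_t begin = hit;
    while (begin > 0 && !rows[begin].is_stmt)
        --begin;
    std::size_t end = hit + 1;
    while (end < rows.size() && !rows[end].is_stmt)
        ++end;

    unsigned lowest = rows[begin].line;
    for (std::size_t k = begin; k < end; ++k)
        lowest = std::min(lowest, rows[k].line);

    return line == lowest ? begin : hit;
}
```

For the four-line foo(5, bar(), baz(), after_baz()) example, assuming the bar() call comes first in the instruction stream and carries is_stmt, a breakpoint on line 1 would land on the bar() call, while a breakpoint on line 3 would still land on the baz() call.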

Then you will (a) curse the debugger a bit 'cause it didn’t stop where you told it to, and then (b) step over a few times (or use “sif”, which nobody knows to use), but with a lot of arguments the former can get annoying. In this case it seems like we are removing potentially helpful information from the user when setting breakpoints. And adding a --ignore-is-stmt option doesn’t seem like the sort of thing anybody would know to use. But maybe the compiler could be smarter about when it applies this is_stmt, so it would know to put it on lines 2, 3, & 4 before the function calls, but not on line 2 in Vedant’s example? Not sure how hard that would be.

We would also have to have a way to know when to trust the is_stmt for these purposes. DWARF should really have a way to say which features are taken seriously by the producer, so the debugger will know whether to trust them or not. But that’s a slightly orthogonal issue…

Yeah, it’d be nice to have some flag bits to say “hey, this is the /specific/ meaning of this flag/etc we’re guaranteeing to implement in this output”. Though I’m not sure that’d be necessary - I think Clang currently just puts is_stmt on everything, so if you implement the advanced behavior in LLDB and give it Clang’s current output, it’d just degrade to the behavior we already see from LLDB - and when LLDB sees new/fancy Clang DWARF that is more judicious about is_stmt, it’d get better behavior.

The latter seems workable. Note also, the definition for "is_stmt" says:

A boolean indicating that the current instruction is a recommended breakpoint location. A recommended breakpoint location is intended to “represent” a line, a statement and/or a semantically distinct subpart of a statement.

So it seems reasonable if we're going to start doing this to consider nested function calls in a statement semantically distinct subparts of a statement. Actually this would also be useful when you have:

      foo (bar(), baz(),
           after_bar(), after_baz());

You do get separate column entries for the function calls, but we have no idea what that means and if we just set a breakpoint on every individual line entry as currently emitted by clang we end up with annoyingly many breakpoints at present. Marking actually interesting subexpressions would help here too. Anyway, if we started using is_stmt for these subparts then we wouldn't have to fix things up in the debugger.

Jim

I’ve some apprehension about having the compiler make particularly subjective judgments about “semantically distinct subpart of a statement” - especially around operator overloads, for instance. Not without merit/certainly something to consider, but my gut reaction is to lean away from that - because the judgment might vary & I could imagine different users wanting different experiences at different times/situations/etc - so that it’d be useful for consumers to be making some of those judgments, showing them to the user as options, etc.

Given a single line with “foo(bar(), baz())” the user doesn’t have the ability to step into the call lines - I’m not sure that wrapping a line should make a huge difference to debuggability (admittedly the inverse is true - taking two statements and writing them on one line does degrade debugging experience) - seems like it’d provide awkward incentives for users to layout their code to play to these debugging issues.

Column info - especially for a GUI debugger, could be super helpful - you could set a breakpoint on specific calls sites which could be nice.

Massively long-term: it’d be awesome to be able to encode something like Clang’s source ranges into DWARF. Basically attributing source ranges with a “preferred location” (eg: the assignment operator’s range covers the whole “x = y” and the preferred location points to the “=” - as with Clang diagnostics) so that users can see the hierarchy of evaluation, etc. I think I remember throwing some ideas for this around with Chandler a few years ago when I corrected a bunch of source location stuff (there’s still a bug or two outstanding with that with regards to loops especially - Adrian and I spent some time discussing that - oh, right, some weird things about how/where GCC breaks and doesn’t… ). Dunno what that looks like - maybe something sort of like/related to/using/extending Cary’s two level line tables to include effectively scopes for expressions and subexpressions. Then a user could choose to step into an expression evaluation or skip over it & the debugger could highlight the source ranges rather than lines which would be more meaningful to the user about where in the expression evaluation the program is at.

I've some apprehension about having the compiler make particularly subjective judgments about "semantically distinct subpart of a statement" - especially around operator overloads, for instance. Not without merit/certainly something to consider, but my gut reaction is to lean away from that - because the judgment might vary & I could imagine different users wanting different experiences at different times/situations/etc - so that it'd be useful for consumers to be making some of those judgments, showing them to the user as options, etc.

Not sure about this.

Right now, if there are many line table entries that map to a given line lldb will choose the first one by address within a given block. That's because the line tables are really noisy, and our experience was that if I set a breakpoint location per entry, stepping gets annoyingly jerky and you have to keep hitting step over and over. That's really too draconian and you can't get back to a really independent subsection of a line. Note to self - I should try playing around with one location per distinct column - I haven't revisited the breakpoint by line setting since I was told clang was serious about column info. It would be interesting to see how that works.

But if you emitted the inter-line entries you currently do but added "I think these are the important ones" with is_stmt, that wouldn't remove information, and the debugger could still offer different experiences directed by the user. It would just make the default behavior a little nicer.
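To make the two policies concrete, here is a rough sketch of "first entry by address per block" next to the "one location per distinct column" variant. It is not lldb's actual implementation: Entry, locationsPerBlock and locationsPerColumn are hypothetical names, and blocks are assumed to be identified by a plain integer id.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified line-table entry.
struct Entry {
    std::uint64_t address;
    unsigned line;
    unsigned column;
    unsigned block;  // assumed id of the enclosing block
};

// One breakpoint location per block: the lowest address mapped to `line`.
std::vector<std::uint64_t> locationsPerBlock(const std::vector<Entry> &entries,
                                             unsigned line) {
    std::map<unsigned, std::uint64_t> first;  // block -> lowest address
    for (const Entry &e : entries) {
        if (e.line != line)
            continue;
        auto it = first.find(e.block);
        if (it == first.end())
            first[e.block] = e.address;
        else if (e.address < it->second)
            it->second = e.address;
    }
    std::vector<std::uint64_t> out;
    for (const auto &kv : first)
        out.push_back(kv.second);
    return out;
}

// Variant: one location per distinct (block, column) pair on `line`.
std::vector<std::uint64_t> locationsPerColumn(const std::vector<Entry> &entries,
                                              unsigned line) {
    std::map<std::pair<unsigned, unsigned>, std::uint64_t> first;
    for (const Entry &e : entries) {
        if (e.line != line)
            continue;
        std::pair<unsigned, unsigned> key(e.block, e.column);
        auto it = first.find(key);
        if (it == first.end())
            first[key] = e.address;
        else if (e.address < it->second)
            it->second = e.address;
    }
    std::vector<std::uint64_t> out;
    for (const auto &kv : first)
        out.push_back(kv.second);
    return out;
}
```

The per-column variant yields more locations for a line with several calls, so whether it feels less jerky than one location per entry would still need to be tried in practice.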

Given a single line with "foo(bar(), baz())" the user doesn't have the ability to step into the call lines - I'm not sure that wrapping a line should make a huge difference to debuggability (admittedly the inverse is true - taking two statements and writing them on one line does degrade debugging experience) - seems like it'd provide awkward incentives for users to layout their code to play to these debugging issues.

Column info - especially for a GUI debugger, could be super helpful - you could set a breakpoint on specific calls sites which could be nice.

Massively long-term: it'd be awesome to be able to encode something like Clang's source ranges into DWARF. Basically attributing source ranges with a "preferred location" (eg: the assignment operator's range covers the whole "x = y" and the preferred location points to the "=" - as with Clang diagnostics) so that users can see the hierarchy of evaluation, etc. I think I remember throwing some ideas for this around with Chandler a few years ago when I corrected a bunch of source location stuff (there's still a bug or two outstanding with that with regards to loops especially - Adrian and I spent some time discussing that - oh, right, some weird things about how/where GCC breaks and doesn't... ). Dunno what that looks like - maybe something sort of like/related to/using/extending Cary's two level line tables to include effectively scopes for expressions and subexpressions. Then a user could choose to step into an expression evaluation or skip over it & the debugger could highlight the source ranges rather than lines which would be more meaningful to the user about where in the expression evaluation the program is at.

The PhD thesis of Caroline Tice (who was one of the original lldb authors) had a section on expressing the nesting of operations - IIRC she called them atoms. We talked about this some in the early days but didn't get much traction on the compiler side (at that time we were still using gcc as the front end). So this ended up being only talk. But it would be really handy to know the scope and not just the initial point of the expressions and subexpressions in the debug info.

Jim

Does that make sense?

Got it, that makes sense.

vedant

I'm not sure we always want to have the debugger ignore the inter-line line-table entries for complex expressions. In a case like:

1: x = foo(5,
2: bar(),
3: baz()
4: after_baz());

if I want to step into "baz()", it's convenient to break on the line 3 and do a step-in. If we move to the is_stmt line and that's somewhere at the beginning of the function call, then that effort will be thwarted.

In this example, why would the line containing the call to bar() not have the "is_stmt" flag? I assumed that it would because, as a function call, it's "a recommended breakpoint location".

Oh, are you suggesting that locations of call arguments shouldn't be eligible for "is_stmt"? .. This might be a naive question, but is there some relevant standard / source of truth about what constitutes a recommended breakpoint location, or is this just a subjective decision on the part of the compiler?

vedant

I was arguing that it should have an is_stmt, because otherwise the algorithm Dave suggested would move a breakpoint on line 3 to the is_stmt marked one on line 1.

Oh, are you suggesting that locations of call arguments shouldn't be eligible for "is_stmt"? .. This might be a naive question, but is there some relevant standard / source of truth about what constitutes a recommended breakpoint location, or is this just a subjective decision on the part of the compiler?

This is always going to be heuristic: what works well for stepping or for setting breakpoints. For instance, gcc used to only give one line table entry for a complex multi-line expression like the one above. When people were making the transition to clang in the early days, we got some bugs along the lines of "why does step-over at line 1 above stop at line 2?" And admittedly it is odd that if you do:

     x = foo(bar(), baz(),
             after_baz());

step-over on this line doesn't stop at bar or baz, but does stop at after_baz. That's why I was arguing for giving all these function calls "is_stmt" so this would be symmetric.

OTOH, you also get cases in C++ like:

     foo(10,
         20,
         30,
         40);

where when you are stepping along you don't stop at the 20 line, but you do stop at the 30 line because 30 gets converted to something that has a copy constructor, so there is code from the line there. That just makes the debugger look odd when it's stepping. This could be made better by using line 0 for this code, or by using is_stmt if we're going to take that more seriously.

Another odd one is why is there sometimes a line table entry for an end bracket and sometimes not? The user generally can't figure that out, and it is disconcerting not to know where a step is going to go exactly...

Anyway, because the line table information is incomplete, the compiler can't just shove every bit of information it knows in there and let the debugger sort it out. The debugger doesn't know enough to do that. And so, the line table ends up being part art whose goal is getting stepping and breaking to feel natural.

But of course the best thing is if lldb knew that there was a nesting, and the debug information represented the nesting, and knew what code was artificial, etc, from the debug information. Then it would be up to the debugger to make the stepping look right - and we could have a smarter set of gestures to navigate through this sort of code depending on what the user wanted to do.

Jim

I was arguing that it should have an is_stmt, because otherwise the algorithm Dave suggested would move a breakpoint on line 3 to the is_stmt marked one on line 1.

Got it, thanks.

OTOH, you also get cases in C++ like:

foo(10,
20,
30,
40);

where when you are stepping along you don’t stop at the 20 line, but you do stop at the 30 line because 30 gets converted to something that has a copy constructor, so there is code from the line there. That just makes the debugger look odd when it’s stepping. This could be made better by using line 0 for this code, or by using is_stmt if we’re going to take that more seriously.

It seems better to not step on “30” if we don’t step on “20”. That should be doable: we can disable is_stmt when emitting implicit constructors.

Another odd one is why is there sometimes a line table entry for an end bracket and sometimes not? The user generally can’t figure that out, and it is disconcerting not to know where a step is going to go exactly…

Yeah, this one is awkward because there might not be an instruction which obviously corresponds to the end brace. I think the “inline_me” example I shared at the beginning of the thread (https://godbolt.org/z/scwF20) would have this issue if you added braces to the “if”. I don’t think you can fix this by applying “is_stmt” anywhere? I’m not sure how to fix this if not by emitting a nop located at the end brace, or by emitting {DW_AT_closing_brace_line, DW_AT_ranges} in each lexical scope.

vedant

I’m not sure we always want to have the debugger ignore the inter-line line-table entries for complex expressions. In a case like:

1: x = foo(5,
2: bar(),
3: baz(),
4: after_baz());

if I want to step into “baz()”, it’s convenient to break on line 3 and do a step-in. If we move to the is_stmt line and that’s somewhere at the beginning of the function call, then that effort will be thwarted.

I’d figure some UI could improve that: pass a flag to break (or hold down shift when you click a line to set a breakpoint in a GUI) when you specifically want to break on a certain line. Or assume that a user setting a breakpoint on something that’s not the first source line of a statement means to break precisely on that line, and otherwise assume they mean to break before the first instruction of the whole statement. Here “first source line” is the lowest line number among all lines associated with the statement, i.e. all lines from all instructions between the nearest previous is_stmt instruction and the nearest following is_stmt instruction.
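
A rough sketch of that second heuristic, to show it is mechanical enough for a consumer to implement. Everything here is an assumption for illustration: Row is a simplified stand-in for a line table row (not LLDB’s or LLVM’s real type), rows are assumed to be sorted by address, and resolveBreakpoint and kNoLocation are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line table row.
struct Row {
  std::uint64_t address;
  unsigned line;
  bool is_stmt;
};

// Sentinel returned when no row matches the requested line.
constexpr std::uint64_t kNoLocation = ~std::uint64_t{0};

// Rows are assumed to be sorted by address. A "statement" is taken to be the
// run of rows from one is_stmt row up to (not including) the next is_stmt row.
std::uint64_t resolveBreakpoint(const std::vector<Row> &rows, unsigned line) {
  for (std::size_t begin = 0; begin < rows.size();) {
    std::size_t end = begin + 1;
    while (end < rows.size() && !rows[end].is_stmt)
      ++end;

    // Lowest line number among all rows of this statement.
    unsigned lowest = rows[begin].line;
    for (std::size_t i = begin; i < end; ++i)
      lowest = std::min(lowest, rows[i].line);

    for (std::size_t i = begin; i < end; ++i) {
      if (rows[i].line != line)
        continue;
      // First source line of the statement: break before the whole statement.
      // Any other line: break precisely on the first row for that line.
      return line == lowest ? rows[begin].address : rows[i].address;
    }
    begin = end;
  }
  return kNoLocation;
}
```

With a table shaped like the four-line foo(...) example, a breakpoint on line 1 would resolve to the start of the statement, while a breakpoint on line 3 would resolve to the row for baz(), so step-in still works there.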

The latter seems workable. Note also, the definition for “is_stmt” says:

A boolean indicating that the current instruction is a recommended breakpoint location. A recommended breakpoint location is intended to “represent” a line, a statement and/or a semantically distinct subpart of a statement.

So it seems reasonable if we’re going to start doing this to consider nested function calls in a statement semantically distinct subparts of a statement. Actually this would also be useful when you have:

foo (bar(), baz(),
after_bar(), after_baz());

You do get separate column entries for the function calls, but we have no idea what that means and if we just set a breakpoint on every individual line entry as currently emitted by clang we end up with annoyingly many breakpoints at present. Marking actually interesting subexpressions would help here too. Anyway, if we started using is_stmt for these subparts then we wouldn’t have to fix things up in the debugger.

I’ve some apprehension for having the compiler make particularly subjective judgments about “semantically distinct subpart of a statement” - especially around operator overloads, for instance. Not without merit/certainly something to consider, but my gut reaction is to lean away from that - because the judgment might vary & I could imagine different users wanting different experiences at different times/situations/etc - so that it’d be useful for consumers to be making some of those judgments, showing them to the user as options, etc.

Not sure about this.

Right now, if there are many line table entries that map to a given line lldb will choose the first one by address within a given block. That’s because the line tables are really noisy, and our experience was that if I set a breakpoint location per entry, stepping gets annoyingly jerky and you have to keep hitting step over and over. That’s really too draconian and you can’t get back to a really independent subsection of a line. Note to self - I should try playing around with one location per distinct column - I haven’t revisited the breakpoint by line setting since I was told clang was serious about column info. It would be interesting to see how that works.
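
For readers who want the “first one by address within a given block” rule spelled out, here is a small sketch. It is not LLDB’s actual code: Entry, its block field (a stand-in identifier for whichever lexical or inlined block contains the address) and locationsForLine are all hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified line table entry tagged with its containing block.
struct Entry {
  std::uint64_t address;
  unsigned line;
  int block;
};

// For the requested line, keep one breakpoint location per block: the entry
// with the lowest address among all entries for that line in that block.
std::vector<std::uint64_t> locationsForLine(const std::vector<Entry> &entries,
                                            unsigned line) {
  std::map<int, std::uint64_t> firstInBlock;
  for (const Entry &e : entries) {
    if (e.line != line)
      continue;
    auto it = firstInBlock.find(e.block);
    if (it == firstInBlock.end() || e.address < it->second)
      firstInBlock[e.block] = e.address;
  }

  // Results come out ordered by block identifier.
  std::vector<std::uint64_t> locations;
  for (const auto &kv : firstInBlock)
    locations.push_back(kv.second);
  return locations;
}
```

Trying one location per distinct column would amount to keying the map on (block, column) instead of block alone, assuming the entries carried a column.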

Not sure I quite follow all of that - yeah, having the debugger stop every time it hits every distinct line table entry for a given source line would be annoying. But allowing the user to do that if they want in certain situations could be good… but yeah, mostly comes back to wanting ranges/hierarchy rather than isolated locations.

But if you emitted the inter-line entries you currently do but added “I think these are the important ones” with is_stmt, that wouldn’t remove information, and the debugger could still offer different experiences directed by the user. It would just make the default behavior a little nicer.

Nod. It wouldn’t be worse than today, but it would be worse for a consumer that did want to do only full source level statements - with is_stmt on more subjective “interesting” things, that’d be baked in by the compiler & less flexibility for users than if it was per statement & the consumer/user could tweak as desired from there.

Given a single line with “foo(bar(), baz())” the user doesn’t have the ability to step into the call lines - I’m not sure that wrapping a line should make a huge difference to debuggability (admittedly the inverse is true - taking two statements and writing them on one line does degrade debugging experience) - seems like it’d provide awkward incentives for users to layout their code to play to these debugging issues.

Column info, especially for a GUI debugger, could be super helpful: you could set a breakpoint on specific call sites, which could be nice.

Massively long-term: it’d be awesome to be able to encode something like Clang’s source ranges into DWARF. Basically attributing source ranges with a “preferred location” (eg: the assignment operator’s range covers the whole “x = y” and the preferred location points to the “=” - as with Clang diagnostics) so that users can see the hierarchy of evaluation, etc. I think I remember throwing some ideas for this around with Chandler a few years ago when I corrected a bunch of source location stuff (there’s still a bug or two outstanding with that with regards to loops especially - Adrian and I spent some time discussing that - oh, right, some weird things about how/where GCC breaks and doesn’t… ). Dunno what that looks like - maybe something sort of like/related to/using/extending Cary’s two level line tables to include effectively scopes for expressions and subexpressions. Then a user could choose to step into an expression evaluation or skip over it & the debugger could highlight the source ranges rather than lines which would be more meaningful to the user about where in the expression evaluation the program is at.

The PhD thesis of Caroline Tice (who was one of the original lldb authors) had a section on expressing the nesting of operations - IIRC she called them atoms. We talked about this some in the early days but didn’t get much traction on the compiler side (at that time we were still using gcc as the front end). So this ended up being only talk. But it would be really handy to know the scope and not just the initial point of the expressions and subexpressions in the debug info.

Ah, cool.

  • Dave