lldb fails to hit breakpoint when line maps to multiple addresses

We have a case where a source breakpoint isn't hit because the source line maps
to 2 addresses in the debug info and lldb only sets 1 BP on the first address
which is in a basic block that is rarely executed. The codegen looks something
like this (in pseudo code):

        some_code
        br lbl2
    lbl1:
        some_more_code
        code_for_line_50_pt1 ; lldb sets BP here but code not executed
    lbl2:
        code_for_line_48
        code_for_line_50_pt2 ; this code is executed
        if (cond) br lbl1

BreakpointResolverFileLine::SearchCallback correctly finds both symbol contexts
for the line. The symbol contexts have different addresses but the same
lldb_private::Block. As a result, BreakpointResolver::SetSCMatchesByLine
thinks they are in the same "contiguous range" and removes the 2nd symbol
context. lldb sets the breakpoint at the 1st address and the user never hits
their breakpoint.

The check for a "contiguous range" in BreakpointResolver::SetSCMatchesByLine
seems wrong. Is it assuming a Block is actually a basic block (i.e. with no
branches)? What's supposed to happen here?

Thanks,
-Dawn

We currently coalesce breakpoints to avoid the user stopping multiple times on the same source line. This might have been done to avoid stepping issues we might have had in the past, but we have made many modifications to stepping such that the step thread plans now know how to step through two hits from the same source line. I would need Jim to confirm this, but he is out for 2 weeks.

So this is on purpose and is expected. Not sure what will happen if we disable this coalescing of locations that are contiguous. Feel free to try and disable it and run the test suite and see how things go. Don't check anything in, but you could try disabling it on your branch and see how things go. Then when Jim returns we can confirm with him what the right thing to do is.

Greg Clayton

We currently coalesce breakpoints to avoid the user stopping multiple times on the same source line. This might have been done to avoid stepping issues we might have had in the past, but we have made many modifications to stepping such that the step thread plans now know how to step through two hits from the same source line. I would need Jim to confirm this, but he is out for 2 weeks.

I (think I) understand the intent, but it is wrong as written. What I
think the code is trying to do is remove duplicate lines within a basic
block (or at least that's the only explanation I can come up with that
makes any sense), but the code appears to assume Block == basic block,
which is not true.

As long as there is branching into/out of an address range, you can miss
breakpoints, which is much worse than stepping to the same line twice,
wouldn't you agree? Note that this isn't even in optimized code - this
issue will be far more common after the optimizer has played with it.
    
As you say, we might have to wait for Jim to hear the real answer.
If this turns out to be intentional, OK to add an option to control it?

So this is on purpose and is expected. Not sure what will happen if we disable this coalescing of locations that are contiguous. Feel free to try and disable it and run the test suite and see how things go. Don't check anything in, but you could try disabling it on your branch and see how things go. Then when Jim returns we can confirm with him what the right thing to do is.

Sure - will do.

Thanks!
-Dawn

We currently coalesce breakpoints to avoid the user stopping multiple times on the same source line. This might have been done to avoid stepping issues we might have had in the past, but we have made many modifications to stepping such that the step thread plans now know how to step through two hits from the same source line. I would need Jim to confirm this, but he is out for 2 weeks.

I (think I) understand the intent, but it is wrong as written. What I
think the code is trying to do is remove duplicate lines within a basic
block (or at least that's the only explanation I can come up with that
makes any sense), but the code appears to assume Block == basic block,
which is not true.

As long as there is branching into/out of an address range, you can miss
breakpoints, which is much worse than stepping to the same line twice,
wouldn't you agree? Note that this isn't even in optimized code - this
issue will be far more common after the optimizer has played with it.

As you say, we might have to wait for Jim to hear the real answer.
If this turns out to be intentional, OK to add an option to control it?

I only want to add options if we have a split community on what the right thing to do is. I believe we should be able to come up with a solution that makes sense that everyone will like. You might check the SVN history on the line to see if that sheds any light on the issue. So lets first try to fix it correctly so everyone is happy without adding an option.

I only want to add options if we have a split community on what the right thing to do is. I believe we should be able to come up with a solution that makes sense that everyone will like. You might check the SVN history on the line to see if that sheds any light on the issue. So lets first try to fix it correctly so everyone is happy without adding an option.

Agreed.

>> So this is on purpose and is expected. Not sure what will happen if we disable this coalescing of locations that are contiguous. Feel free to try and disable it and run the test suite and see how things go. Don't check anything in, but you could try disabling it on your branch and see how things go. Then when Jim returns we can confirm with him what the right thing to do is.
>
> Sure - will do.

I ran tests with the code disabled locally on OSX. When tests are compiled
with clang-602.0.53 (based on llvm 3.6.0svn), we see an additional 34 failures
in the lldb tests, and when tested with clang-503.0.40 (based on llvm 3.4svn),
we see only 6 additional failures.

Most tests are failing because the num_expected_locations is no longer 1.
For that, we could change run_break_set_by_file_and_line to have
num_expected_locations=-1 mean "any number of locations", and if
num_expected_locations == -1, check that the location count is >= 1 (and of
course, fix the affected tests).

Other tests are harder to fix, because they'd need additional "continue"s added
as needed. But we can't hard-code the number of BP locations in those cases,
as we've already seen differences based on the version of clang, and llvm could
schedule the code differently based on optimizations or platform. So we'd need
a way to get the number of BP locations and conditionally add "continue"s, but
even that would also be dependant on llvm's codegen.

FYI - the 6 test which fail in both older and newer clangs are:
    CreateAfterAttachTestCase.test_create_after_attach_with_dsym
    CreateAfterAttachTestCase.test_create_after_attach_with_dwarf_and_fork
    CreateAfterAttachTestCase.test_create_after_attach_with_dwarf_and_popen
    ThreadExitTestCase.test_with_dsym
    TypeCompletionTestCase.test_with_dsym_and_run_command
    TypeCompletionTestCase.test_with_dwarf_and_run_command

-Dawn

We currently coalesce breakpoints to avoid the user stopping multiple times on the same source line. This might have been done to avoid stepping issues we might have had in the past, but we have made many modifications to stepping such that the step thread plans now know how to step through two hits from the same source line. I would need Jim to confirm this, but he is out for 2 weeks.

I (think I) understand the intent, but it is wrong as written. What I
think the code is trying to do is remove duplicate lines within a basic
block (or at least that's the only explanation I can come up with that
makes any sense), but the code appears to assume Block == basic block,
which is not true.

The debugger doesn’t really know anything about basic blocks. IIRC there is a way to specify this in the DWARF line tables but nobody sets it so we don’t try to use it.

Given that, the best lldb can do is use heuristics, and the best heuristic I had was Block == basic block…

The motivation is that compilers in general and certainly clang in particular love to put multiple line table entries in for a given line that are either contiguous or interrupted by artificial book-keeping code. So if we didn’t coalesce these line entries, when you set a breakpoint on such a line, you’’d have to hit continue some unpredictable number of times before you actually get past that line. You could figure out how many time by counting the number of locations, but nobody could be expected to do that… And if you are chasing multiple hits of the breakpoint through code it was really a pain since one “continue” didn’t result in one pass through the function containing the code. This happens very frequently and was a font of bugs for lldb early on.

Note, this doesn’t affect the stepping algorithms, since when we step we just look at where we land and if it has the same line number as we were stepping through we keep going. Of course, it also makes stepping over such a line annoying for the same reason that it made continue annoying...

Note also that gdb plays the same trick with setting breakpoints on multiple line table entries (or at least it did last time I looked.) This wasn’t something new in lldb.

I don’t think it will make people happy to turn this coalescing off by default, I would urge you not to do that. Yours is the first report we’ve had where this causes trouble, whereas it makes general stepping work much more nicely. So if you have some specific reason to need it either (a) if there’s some better heuristic you can come up with that detects when you should not coalesce, that would be awesome or (b) if there’s no way lldb can tell, you’ll have to add an option.

Jim

Given that, the best lldb can do is use heuristics, and the best heuristic I had was Block == basic block???

Can you at least check for branches then? (Yes, that would require dissassebly).

The motivation is that compilers in general and certainly clang in particular love to put multiple line table entries in for a given line that are either contiguous or interrupted by artificial book-keeping code. So if we didn???t coalesce these line entries, when you set a breakpoint on such a line, you???d have to hit continue some unpredictable number of times before you actually get past that line. You could figure out how many time by counting the number of locations, but nobody could be expected to do that??? And if you are chasing multiple hits of the breakpoint through code it was really a pain since one ???continue??? didn???t result in one pass through the function containing the code. This happens very frequently and was a font of bugs for lldb early on.

Understood - we get reports like this all the time, and I've also thought of
ways to workaround it, but for each idea I had, I could always find a way to
break it. So now I tell users it "works as designed", and that it's better
to hit a BP a couple times than none at all.

Note, this doesn???t affect the stepping algorithms, since when we step we just look at where we land and if it has the same line number as we were stepping through we keep going. Of course, it also makes stepping over such a line annoying for the same reason that it made continue annoying...

What about tail recursion? You must at least check the stack ptr, no?

Note also that gdb plays the same trick with setting breakpoints on multiple line table entries (or at least it did last time I looked.) This wasn???t something new in lldb.

No, gdb gets this case right.

Yours is the first report we???ve had where this causes trouble, whereas it makes general stepping work much more nicely.

I'm quite surprised. FWIW, the typical case we see this in is exception handling.

So if you have some specific reason to need it either (a) if there???s some better heuristic you can come up with that detects when you should not coalesce, that would be awesome

Better: check for branches out of the block.

But this would still fail for cases where code branches around the initial block
and into the blocks belonging to that line further down.

    ; Example pseudo code: Set BP at line 10 in the following:
          br @lbl2 ; belongs to line 9
    lbl1: insns_for_line10_part1 ; lldb sets BP here
    lbl2: insns_for_line10_part2 ; BP at line 10 never hit
          if (false_cond) br @lbl1

So best would be to also check for labels which lines into the block,
but that's unrealistic.
    

or (b) if there???s no way lldb can tell, you???ll have to add an option.

Sounds like we'll have to go with an option then.

Thanks,
-Dawn

Given that, the best lldb can do is use heuristics, and the best heuristic I had was Block == basic block???

Can you at least check for branches then? (Yes, that would require disassembly).

Breakpoint setting doesn't have to be blazingly fast, we look as disassembly and worse in other cases (e.g. resolver symbols actually have to be resolved - which involves a function call in the debugee - to figure out the target of the resolver...) So I'm not opposed to this in general. But I wouldn't want (you ;-)) to do this work if it isn't going to cover all the cases you care about, which it sounds from below like it wouldn't. And you would have to be careful since things like calls shouldn't cause extra locations to be generated...

Another way to do this - which I thought about originally but rejected as too much delicate machinery for the desired effect - is to add the notion of "clusters" of locations to the breakpoint. Instead of eliding all the segments with the same line number into one location, you'd make a location per segment but treat them as a cluster, where a hit on one location in the cluster would set a flag telling you to auto-continue the other locations in the cluster till "something happened to reset the cluster". You'd have to figure out good heuristics for that "something happened". You could probably get away with "frame changed" and "hit the location that I hit the first time I hit the cluster". But I'd have to think a bit harder about this to assure myself this was good enough. And you'd have to keep a side table of history for each breakpoint which you'd have to manage... Nothing impossible, but it didn't seem worth the effort at the time.

Anyway, if you are sufficiently motivated to give this a try, it would be a more general solution to the problem without requiring user intervention - either having to press continue some undetermined number of times, or create the breakpoints with some special option.

The motivation is that compilers in general and certainly clang in particular love to put multiple line table entries in for a given line that are either contiguous or interrupted by artificial book-keeping code. So if we didn???t coalesce these line entries, when you set a breakpoint on such a line, you???d have to hit continue some unpredictable number of times before you actually get past that line. You could figure out how many time by counting the number of locations, but nobody could be expected to do that??? And if you are chasing multiple hits of the breakpoint through code it was really a pain since one ???continue??? didn???t result in one pass through the function containing the code. This happens very frequently and was a font of bugs for lldb early on.

Understood - we get reports like this all the time, and I've also thought of
ways to workaround it, but for each idea I had, I could always find a way to
break it. So now I tell users it "works as designed", and that it's better
to hit a BP a couple times than none at all.

Note, this doesn???t affect the stepping algorithms, since when we step we just look at where we land and if it has the same line number as we were stepping through we keep going. Of course, it also makes stepping over such a line annoying for the same reason that it made continue annoying...

What about tail recursion? You must at least check the stack ptr, no?

Of course, and not just for tail recursion: the current line might have called something that calls the current function that gets you back to the current line. Stepping can't stop for that either. I wasn't giving a complete description of the stepping algorithm, just how it pertains to passing through blocks of code in the same function.

Note also that gdb plays the same trick with setting breakpoints on multiple line table entries (or at least it did last time I looked.) This wasn???t something new in lldb.

No, gdb gets this case right.

Interesting. Maybe they've changed how they do the line coalescing, I haven't looked in years.

Yours is the first report we???ve had where this causes trouble, whereas it makes general stepping work much more nicely.

I'm quite surprised. FWIW, the typical case we see this in is exception handling.

Maybe you have a different code generation model from clang (or swift?)

So if you have some specific reason to need it either (a) if there???s some better heuristic you can come up with that detects when you should not coalesce, that would be awesome

Better: check for branches out of the block.

But this would still fail for cases where code branches around the initial block
and into the blocks belonging to that line further down.

   ; Example pseudo code: Set BP at line 10 in the following:
         br @lbl2 ; belongs to line 9
   lbl1: insns_for_line10_part1 ; lldb sets BP here
   lbl2: insns_for_line10_part2 ; BP at line 10 never hit
         if (false_cond) br @lbl1

So best would be to also check for labels which lines into the block,
but that's unrealistic.

or (b) if there???s no way lldb can tell, you???ll have to add an option.

Sounds like we'll have to go with an option then.

Yeah, that sounds like the safest way to go as a short term solution.

Jim

Another way to do this - which I thought about originally but rejected as too much delicate machinery for the desired effect - is to add the notion of "clusters" of locations to the breakpoint. Instead of eliding all the segments with the same line number into one location, you'd make a location per segment but treat them as a cluster, where a hit on one location in the cluster would set a flag telling you to auto-continue the other locations in the cluster till "something happened to reset the cluster". You'd have to figure out good heuristics for that "something happened". You could probably get away with "frame changed" and "hit the location that I hit the first time I hit the cluster". But I'd have to think a bit harder about this to assure myself this was good enough. And you'd have to keep a side table of history for each breakpoint which you'd have to manage... Nothing impossible, but it didn't seem worth the effort at the time.

Anyway, if you are sufficiently motivated to give this a try, it would be a more general solution to the problem without requiring user intervention - either having to press continue some undetermined number of times, or create the breakpoints with some special option.

BTW, Greg points out that one could teach the Thread Plans to treat step over a source line that has a breakpoint on it to do this job as well. That will save you from having to store history information off to one side. This approach might be worth pursuing, though I bet it will end up being trickier than it seems at first glance...

Jim