[llvm] r343874 - DwarfDebug: Pick next location in case of missing location at block begin

Sadly, after this commit TestDataFormatterLibcxxList.py started failing with:


output: Process 67333 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0000000100000eb0 a.out`main at main.cpp:33:16
30 (text_list.push_back(std::string("smart")));
31 
32 printf("// Set second break point at this line.");
-> 33 (text_list.push_back(std::string("!!!")));
^
34 
35 std::list<int> countingList = {3141, 3142, 3142,3142,3142, 3142, 3142, 3141};
36 countingList.sort();

runCmd: frame variable text_list[0]
output: (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) [0] = "goofy"

Expecting sub string: goofy
Matched

runCmd: frame variable text_list[3]
output: None

Expecting sub string: !!!

URL: http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/10854

I confirmed that reverting this fixes the issue.

I think the problem is that we’re breaking on an instruction that looks like it’s on line 33, but it’s actually “before” the full effect of the push_back() is visible.

In which case the test is wrong, because it’s setting the breakpoint too early…

vedant

So what should we do? Revert the llvm commit, fix the LLDB test, xfail on lldb? I’d be fine with any but don’t want to learn how lldb tests work at this moment…

No worries, I’ve relaxed the test for now in r343899 to get the bots going again. Essentially, the test will just step one more time to make sure the full effect of the push_back() is visible.

If folks on lldb-dev (cc’d) have a better idea we can try it out.

vedant

So in the test scenario, we have this code:

    printf("// Set second break point at this line.");
    (text_list.push_back(std::string("!!!")));

and we have a breakpoint on the printf line. We've just continued to hit the breakpoint at printf. Then we do next twice. That should certainly get us past the push_back. If it does not that's either a bug in the line tables or in lldb's handling of them. You would not expect to have to next three times to go from the start of the printf line to past the push_back of !!!.

Considered as a test about stepping, we should applaud the test for having caught a bug, and go figure out whether the line tables are bogus in this instance or just innovative and lldb will have to cope...

Considered as a test about data formatters, it is a little silly to try to drive it using next's since the push_backs are going to introduce a bunch of inlining and the debug information for inlining is often a bit wonky...

Best course is to use the breakpoints in this test to drive from point to point, and make a reduced stepping test case that just shows this bad stepping behavior. Then we can fix the latter test failure either in clang or lldb as is appropriate.

Jim

Sorry for being late to the party, I have been away.

I am not persuaded that this patch is functionally correct.
It's lovely to reduce size, but not at the cost of telling
un-truths about source locations of instructions. If the
instructions at the top of the block have no source location,
and you don't like using line 0, the correct path forward is to
understand why those instructions do not have a source location,
not to just pick one from a nearby instruction.

My choice would be to revert the LLVM patch rather than make
the LLDB test accommodate the un-truth.
--paulr

So I haven’t worked much on debug info, but here’s the explanation for my patches:
My original motivation was getting rid of code some code in the llvm codegen that for spills and reloads just picked the next debug location around. That just seemed wrong to me, as spills and reloads really are bookkeeping, we just move values around between registers and memory and none of it is related to anything the user wrote in the source program. So not assigning any debug information felt right for these instructions. Then people noticed line table bloat because of this and I guess we assumed that having exact line-0 annotations isn’t that useful that it warrants bloating the debug information…

  • Matthias

So I haven’t worked much on debug info, but here’s the explanation for my patches:
My original motivation was getting rid of code some code in the llvm codegen that for spills and reloads just picked the next debug location around. That just seemed wrong to me, as spills and reloads really are bookkeeping, we just move values around between registers and memory and none of it is related to anything the user wrote in the source program. So not assigning any debug information felt right for these instructions. Then people noticed line table bloat because of this and I guess we assumed that having exact line-0 annotations isn’t that useful that it warrants bloating the debug information…

Right. This doesn’t seem any more arbitrary than reusing the previous instruction location, which we do all the time. I think it’s a reasonable tradeoff.

  • Matthias

Sorry for being late to the party, I have been away.

I am not persuaded that this patch is functionally correct.
It’s lovely to reduce size, but not at the cost of telling
un-truths about source locations of instructions. If the
instructions at the top of the block have no source location,
and you don’t like using line 0, the correct path forward is to
understand why those instructions do not have a source location,
not to just pick one from a nearby instruction.

I think we should do that regardless, but it’s a much bigger project which won’t fix the bloat issues the hwasan folks ran into.

My choice would be to revert the LLVM patch rather than make
the LLDB test accommodate the un-truth.

As Jim pointed out, the LLDB test is doing a whole lot more than it needs to. It should be rewritten to focus on data formatting, instead of stepping.

vedant

So I haven't worked much on debug info, but here's the explanation for my patches:
My original motivation was getting rid of code some code in the llvm codegen that for spills and reloads just picked the next debug location around. That just seemed wrong to me, as spills and reloads really are bookkeeping, we just move values around between registers and memory and none of it is related to anything the user wrote in the source program. So not assigning any debug information felt right for these instructions. Then people noticed line table bloat because of this and I guess we assumed that having exact line-0 annotations isn't that useful that it warrants bloating the debug information...

Right. This doesn't seem any more arbitrary than reusing the previous instruction location, which we do all the time. I think it's a reasonable tradeoff.

- Matthias

Sorry for being late to the party, I have been away.

I am not persuaded that this patch is functionally correct.
It's lovely to reduce size, but not at the cost of telling
un-truths about source locations of instructions. If the
instructions at the top of the block have no source location,
and you don't like using line 0, the correct path forward is to
understand why those instructions do not have a source location,
not to just pick one from a nearby instruction.

I think we should do that regardless, but it's a much bigger project which won't fix the bloat issues the hwasan folks ran into.

My choice would be to revert the LLVM patch rather than make
the LLDB test accommodate the un-truth.

As Jim pointed out, the LLDB test is doing a whole lot more than it needs to. It should be rewritten to focus on data formatting, instead of stepping.

I didn't quite say that. I said this test inadvertently showed itself a good test for stepping, as well as being a test for data formatting. So we should extract the part that proved its worth in catching stepping bugs into a stepping-only test, and then rewrite the data formatter test so it wasn't reliant on stepping behavior.

Jim

For spills and reloads, the next valid source location might be reasonable.
For top-of-block instructions, I really don't think so; we had this issue
in FastISel, some years back, and ultimately went with line-0 at top of
block because it caused way fewer problems than hoisting the next valid
source location.

Please revert, and make top-of-block not work the same as spills and reloads.
--paulr

I still do not understand why the new behavior would be a problem. What does the debugger do with line-0 location except for setting the breakpoints slightly later? Spills/reloads being attribute to a following line or not, how would that make a difference?

I don't really care about the commit itself, it was the third step in a yak-shaving situation. If you prefer bigger line tables then why not (though then you should just enable the "UnknownLocations=Enable" behavior in DwarfDebug.cpp to at least be consistent. But before I have to go hunting down lldb bugs as the next step in the yak shaving I will revert...

I assume what happened in the past was that the previous debug location spilled over to the next basic block when the top-of-the-block instruction had line-0 set. I can immediately see why that is a problem. And I assume that was the case that was overlooked in the past. I cannot see however how taking the following location in the same basic block is a problem.

From: Matthias Braun [mailto:matze@braunis.de]
Sent: Wednesday, October 10, 2018 3:48 PM
To: Robinson, Paul
Cc: jingham@apple.com; vsk@apple.com; llvm-commits@lists.llvm.org; lldb-
dev@lists.llvm.org
Subject: Re: [lldb-dev] [llvm] r343874 - DwarfDebug: Pick next location in
case of missing location at block begin

>
>
>
>> From: jingham@apple.com [mailto:jingham@apple.com]
>> Sent: Wednesday, October 10, 2018 2:20 PM
>> To: Vedant Kumar
>> Cc: Robinson, Paul; Vedant Kumar via llvm-commits; LLDB; Matthias Braun
>> Subject: Re: [lldb-dev] [llvm] r343874 - DwarfDebug: Pick next location
in
>> case of missing location at block begin
>>
>>
>>
>>>
>>>>
>>>> So I haven't worked much on debug info, but here's the explanation
for
>> my patches:
>>>> My original motivation was getting rid of code some code in the llvm
>> codegen that for spills and reloads just picked the next debug location
>> around. That just seemed wrong to me, as spills and reloads really are
>> bookkeeping, we just move values around between registers and memory
and
>> none of it is related to anything the user wrote in the source program.
So
>> not assigning any debug information felt right for these instructions.
>> Then people noticed line table bloat because of this and I guess we
>> assumed that having exact line-0 annotations isn't that useful that it
>> warrants bloating the debug information...
>>>
>>> Right. This doesn't seem any more arbitrary than reusing the previous
>> instruction location, which we do all the time. I think it's a
reasonable
>> tradeoff.
>
> For spills and reloads, the next valid source location might be
reasonable.
> For top-of-block instructions, I really don't think so; we had this
issue
> in FastISel, some years back, and ultimately went with line-0 at top of
> block because it caused way fewer problems than hoisting the next valid
> source location.
>
> Please revert, and make top-of-block not work the same as spills and
reloads.
I still do not understand why the new behavior would be a problem. What
does the debugger do with line-0 location except for setting the
breakpoints slightly later? Spills/reloads being attribute to a following
line or not, how would that make a difference?

I don't really care about the commit itself, it was the third step in a
yak-shaving situation. If you prefer bigger line tables then why not
(though then you should just enable the "UnknownLocations=Enable" behavior
in DwarfDebug.cpp to at least be consistent.

Temper, temper.

The line tables are there for a reason. Actually multiple reasons. I have
been fussing with them for years and it is not easy to get them to work
for all consumers. Work with me here, please.
--paulr

Because those instructions actually do something, and whether variables are
visible and expressions evaluate correctly may depend on those instructions
being executed before the debugger stops. If you mark them with line 0 the
debugger doesn't stop. If you hoist the next source location, it does.

Spills and reloads in the middle of a block *probably* can be associated
with the next source line without doing any real damage; although it may
turn out that reloads need to attach to the previous source line. I'd
need to see a variety of examples to be sure, one way or the other.

But long experience has taught me that instructions at the top of a block
really are better off at line 0 than at the next real source line. If
you can prove they are *always* better off with the next source line, for
all consumers, I'd be very interested to hear about it.

But, this thread is not the best place for that; bring it up on llvm-dev
so other interested parties can see what you're looking for.
--paulr

1) So I went and figured out why the lldb testcase at hand fails.

- It seems the debugger stepping logic will follow the program along until it finds a new source location. However if that new source location is part of a new DW_AT_abstract_location it is ignored in the step over mode.
- In the testcase at hand the .loc location of an inlined function was moved to an earlier place without the DW_AT_abstract_location entry getting extended. So the debugger mistook the inlined function as the same scope and stepping incorrectly halted there.

On the LLVM side DW_AT_abstract_location is generated by LexicalScopes which is created by lib/CodeGen/LexicalScopes.cpp / extractLexicalScopes() with completely separate code from the line table generation code in lib/CodeGen/AsmPrinter/DwarfDebug.cpp you have to be aware of that and keep the two algorithms in sync :-/

I fixed LexicalScopes.cpp to be in sync and the lldb test works fine again for me.

2) However talking to Adrian earlier he also expressed having a bad feeling about moving debug location upwards instead of emitting line-0 entries. So I think consensus here is to rather live with some line table bloat instead.

- Unless there are objections I will not go with option 1) but instead revert this commit tomorrow. Note that I will only revert r343874 (i.e. the behavior change for what to emit when the first instructions of a basic block have no location attached), but not r343895 (removing debuglocs from spill/reload instructions) as I still consider this a sensible commit. So in the end we may have bigger line tables than before my changes, but simpler code in the spill/reload generation and occasionally can avoid random source location when spill/reloads get rescheduled.

- Matthias

Thanks for looking into this!

When I was first working on inlined stepping, I found a bunch of cases where the line table info and the ranges for the inlined subroutines disagreed. I remember seeing cases, for instance, where the address given for foo.h:xxx in the line table was contained in one of the debug_info's inlined subroutine blocks from a different file. I tried for a little while to put in heuristics to try to work past the disagreement. But that was too easy to get wrong, and when I got that wrong it often had the result of turning a step into a continue. It is annoying to stop too early, but it is much worse to stop too late (or never). So I ended up bagging my attempts at heroics and whenever I get to a place where the line table and the inlined_subroutine bits of info are out of sync, I just stop.

Jim

I reverted things in r344318 now.

Thanks for looking into this!

When I was first working on inlined stepping, I found a bunch of cases where the line table info and the ranges for the inlined subroutines disagreed. I remember seeing cases, for instance, where the address given for foo.h:xxx in the line table was contained in one of the debug_info's inlined subroutine blocks from a different file. I tried for a little while to put in heuristics to try to work past the disagreement. But that was too easy to get wrong, and when I got that wrong it often had the result of turning a step into a continue. It is annoying to stop too early, but it is much worse to stop too late (or never). So I ended up bagging my attempts at heroics and whenever I get to a place where the line table and the inlined_subroutine bits of info are out of sync, I just stop.

Makes total sense from lldbs point of view. If anything this is something a verifier could catch during testing, but it's a small issue so I'm not sure it's worth starting this now.

- Matthias

I reverted things in r344318 now.

Thanks for looking into this!

When I was first working on inlined stepping, I found a bunch of cases where the line table info and the ranges for the inlined subroutines disagreed. I remember seeing cases, for instance, where the address given for foo.h:xxx in the line table was contained in one of the debug_info's inlined subroutine blocks from a different file. I tried for a little while to put in heuristics to try to work past the disagreement. But that was too easy to get wrong, and when I got that wrong it often had the result of turning a step into a continue. It is annoying to stop too early, but it is much worse to stop too late (or never). So I ended up bagging my attempts at heroics and whenever I get to a place where the line table and the inlined_subroutine bits of info are out of sync, I just stop.

Makes total sense from lldbs point of view. If anything this is something a verifier could catch during testing, but it's a small issue so I'm not sure it's worth starting this now.

Thanks for getting to the bottom of this!

-- adrian