Setting a breakpoint by file and function name doesn't work as expected.

Hello,

I noticed this behavior for LLDB under Linux when setting a breakpoint on a file and a function name:

When doing "breakpoint set --file <F> --name <FUNC>", <F> has to be the file name of the compile unit (CU), which is not necessarily the file where the function is defined. This is not what an end user expects.

Take this simple example program:

$ cat foo.h
int foo(){ return 42; }

$ cat main.c
#include "foo.h"
int main(){return foo();}

$ clang -g main.c

As you can see, the function foo is defined in foo.h so it seems natural to set a breakpoint on foo.h, doesn’t it?

$ lldb -x -b -o "breakpoint set --file foo.h --name foo" ./a.out

(lldb) target create "./a.out"
Current executable set to './a.out' (x86_64).
(lldb) breakpoint set --file foo.h --name foo
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.

Apparently, LLDB cannot find the symbol like this. Let’s try the only other file that we have in the project:

$ lldb -x -b -o "breakpoint set --file main.c --name foo" ./a.out
(lldb) target create "./a.out"
Current executable set to './a.out' (x86_64).
(lldb) breakpoint set --file main.c --name foo
Breakpoint 1: where = a.out`foo + 4 at foo.h:1:12, address = 0x0000000000401114

Isn’t that remarkable? LLDB uses main.c as the file to search in, but then reports the location in foo.h.

Let’s recall what the parameters --file and --name mean:

-n ( --name )
Set the breakpoint by function name. Can be repeated multiple times to make one breakpoint for multiple names

-f ( --file )
Specifies the source file in which to set this breakpoint. Note, by default lldb only looks for files that are #included if they use the standard include file extensions. To set breakpoints on .c/.cpp/.m/.mm files that are #included, set target.inline-breakpoint-strategy to “always”.

Let’s check if setting target.inline-breakpoint-strategy to “always” changes anything:

$ lldb -x -b -o "settings set target.inline-breakpoint-strategy always" -o "breakpoint set --file foo.h --name foo" ./a.out
(lldb) target create "./a.out"
Current executable set to './a.out' (x86_64).
(lldb) settings set target.inline-breakpoint-strategy always
(lldb) breakpoint set --file foo.h --name foo
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.

No, it didn’t change anything.

The only evidence for my assumption that LLDB uses the CU’s name for --file is the DWARF dump:

$ llvm-dwarfdump a.out
a.out: file format ELF64-x86-64

.debug_info contents:
0x00000000: Compile Unit: length = 0x00000060 version = 0x0004 abbr_offset = 0x0000 addr_size = 0x08 (next unit at 0x00000064)

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer ("clang version 8.0.0 (Fedora 8.0.0-3.fc30)")
              DW_AT_language (DW_LANG_C99)
              DW_AT_name ("main.c")
              DW_AT_stmt_list (0x00000000)
              DW_AT_comp_dir ("/home/kkleine")
              DW_AT_low_pc (0x0000000000401110)
              DW_AT_high_pc (0x000000000040113a)

0x0000002a:   DW_TAG_subprogram
                DW_AT_low_pc (0x0000000000401110)
                DW_AT_high_pc (0x000000000040111b)
                DW_AT_frame_base (DW_OP_reg6 RBP)
                DW_AT_name ("foo")
                DW_AT_decl_file ("/home/kkleine/./foo.h")
                DW_AT_decl_line (1)
                DW_AT_type (0x0000005c "int")
                DW_AT_external (true)

0x00000043:   DW_TAG_subprogram
                DW_AT_low_pc (0x0000000000401120)
                DW_AT_high_pc (0x000000000040113a)
                DW_AT_frame_base (DW_OP_reg6 RBP)
                DW_AT_name ("main")
                DW_AT_decl_file ("/home/kkleine/main.c")
                DW_AT_decl_line (2)
                DW_AT_type (0x0000005c "int")
                DW_AT_external (true)

0x0000005c:   DW_TAG_base_type
                DW_AT_name ("int")
                DW_AT_encoding (DW_ATE_signed)
                DW_AT_byte_size (0x04)

As you can see, the DWARF is very small and simple. The function foo has a DW_AT_decl_file, which is probably used to report the breakpoint location, but for the actual filtering it seems as if the CU’s name is what matters for the --file argument.
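The suspected behavior can be made concrete with a minimal C++ model of the two subprograms in the dump above. This is not LLDB code: the `Subprogram` struct, the two predicates, and the basename-only path comparison are illustrative assumptions. `passesByCompUnit` mimics what LLDB appears to do (match --file against the CU’s DW_AT_name), while `passesByDeclFile` matches it against the subprogram’s own DW_AT_decl_file instead.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of one DW_TAG_subprogram and its enclosing CU.
struct Subprogram {
    std::string name;      // DW_AT_name of the subprogram
    std::string cuName;    // DW_AT_name of the DW_TAG_compile_unit
    std::string declFile;  // DW_AT_decl_file of the subprogram
};

// Compare only the last path component, so "/home/kkleine/./foo.h" matches "foo.h".
// Assumption: LLDB's real file matching is more involved than this.
inline std::string baseName(const std::string &path) {
    const auto pos = path.find_last_of('/');
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Apparent current behavior: --file is compared with the CU's name.
inline bool passesByCompUnit(const Subprogram &sp, const std::string &file) {
    return baseName(sp.cuName) == baseName(file);
}

// Proposed alternative: --file is compared with the function's declaration file.
inline bool passesByDeclFile(const Subprogram &sp, const std::string &file) {
    return baseName(sp.declFile) == baseName(file);
}

// Collects the declaration files of all functions named `name` that pass `passes`.
template <typename Pred>
std::vector<std::string> resolve(const std::vector<Subprogram> &dies,
                                 const std::string &name,
                                 const std::string &file, Pred passes) {
    std::vector<std::string> hits;
    for (const auto &sp : dies)
        if (sp.name == name && passes(sp, file))
            hits.push_back(baseName(sp.declFile));
    return hits;
}

// The two subprograms from the llvm-dwarfdump output above.
inline std::vector<Subprogram> exampleDies() {
    return {{"foo", "main.c", "/home/kkleine/./foo.h"},
            {"main", "main.c", "/home/kkleine/main.c"}};
}
```

With this data, the CU-based predicate accepts foo only for main.c, reproducing the two sessions above, while the decl_file-based predicate accepts it only for foo.h.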

To me, --file only behaves reasonably when it is combined with a line number:

$ lldb -x -b -o "breakpoint set --file foo.h --line 1" ./a.out
(lldb) target create "./a.out"
Current executable set to './a.out' (x86_64).
(lldb) breakpoint set --file foo.h --line 1
Breakpoint 1: where = a.out`foo + 4 at foo.h:1:12, address = 0x0000000000401114

This works as expected.

Personally, I think the --file --name combination does not work the way an end user expects: in bigger projects you typically look at the definition of foo and want execution to pause when it reaches it. You don’t care whether the function is inlined or which CU it is located in. Moreover, I think the DWARF already provides more than enough information in the DW_TAG_subprogram DIE’s attributes for the file and line number, enough to filter by file and line number.

IMHO we would need a very strong argument to justify that --file filters by a CU’s DW_AT_name. To me the only reasonable argument is speed, but speed doesn’t justify such a drastic limitation. Shouldn’t we rather change what --file currently does and keep the old behavior when using --cu instead of --file?

Konrad

I read the LLDB troubleshooting page [1] and found some interesting quotes:

When setting breakpoints in implementation source files (.c, cpp, cxx, .m, .mm, etc), LLDB by
default will only search for compile units whose filename matches.

[…]
% echo "settings set target.inline-breakpoint-strategy always" >> ~/.lldbinit
This tells LLDB to always look in all compile units and search for breakpoint locations
by file and line even if the implementation file doesn’t match. Setting breakpoints in header
files always searches all compile units because inline functions are commonly defined in
header files and often cause multiple breakpoints to have source line information that matches
many header file paths.

In my previous email I ran:

$ lldb -x -b -o "breakpoint set --file foo.h --name foo" ./a.out

I have now added the breakpoint strategy setting and run the above command without -x so that the LLDB init file is picked up. Still no luck.

That doc is obsolete; LLDB now has this setting by default:

(lldb) settings show target.inline-breakpoint-strategy
target.inline-breakpoint-strategy (enum) = always

It has been changed:
https://github.com/llvm/llvm-project/commit/ad6eee639952090684aa84c35218ec327a017ca1

This setting does work but only for the --line option:

==> inc.C <==
#include "inc2.C"
int main() {
  func();
}

==> inc2.C <==
static volatile int i;
static void func() {
  i++;
}
$ clang++ -o inc inc.C -Wall -g;lldb ./inc
(lldb) target create "./inc"
Current executable set to './inc' (x86_64).
(lldb) settings set target.inline-breakpoint-strategy headers
(lldb) breakpoint set --file inc2.C --line 3
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) settings set target.inline-breakpoint-strategy always
(lldb) breakpoint set --file inc2.C --line 3
Breakpoint 2: where = inc`func() + 4 at inc2.C:3:4, address = 0x0000000000401124
(lldb)
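The transcript above can be summarized as a small decision function. This is a sketch, not LLDB’s implementation: the `InlineStrategy` enum mirrors the setting’s values (never, headers, always), and the extension list in `isImplementationFile` is an assumption based on the troubleshooting quote, plus `.C`, which Jan’s example treats as an implementation file. The function decides whether a file-and-line breakpoint on `file` looks inside a CU named `cuName`; per this thread, the setting does not change how -n and -f combine.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical mirror of the values of target.inline-breakpoint-strategy.
enum class InlineStrategy { Never, Headers, Always };

// Assumed list of implementation-file extensions (.c, .C, .cpp, .cxx, .m, .mm);
// LLDB's actual list may differ.
inline bool isImplementationFile(const std::string &file) {
    static const std::vector<std::string> kExts = {"c", "C", "cpp", "cxx", "m", "mm"};
    const auto dot = file.find_last_of('.');
    if (dot == std::string::npos)
        return false;
    const std::string ext = file.substr(dot + 1);
    return std::find(kExts.begin(), kExts.end(), ext) != kExts.end();
}

// Should a file+line breakpoint on `file` look into every compile unit?
inline bool searchesAllCompUnits(InlineStrategy strategy, const std::string &file) {
    switch (strategy) {
    case InlineStrategy::Always:
        return true;                          // always look everywhere
    case InlineStrategy::Headers:
        return !isImplementationFile(file);   // only for header-like files
    case InlineStrategy::Never:
        return false;                         // only CUs whose name matches
    }
    return false;
}

// A CU is searched if its own name matches, or if all CUs are searched anyway.
inline bool compUnitIsSearched(InlineStrategy strategy, const std::string &file,
                               const std::string &cuName) {
    return cuName == file || searchesAllCompUnits(strategy, file);
}
```

Under this model, `headers` skips the inc.C unit for a breakpoint on inc2.C (the pending breakpoint above), while `always` searches it and finds func().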

Making it work for the -n option as well will require some fix, maybe somewhere around line 500 of lldb/source/Commands/CommandObjectBreakpoint.cpp; I'm not sure.

Jan

Note first that "break set -f foo.c -l 10" and "break set -f foo.c -n bar" are two very different breakpoints. In the first case, we are looking for a specific file & line combination in the line table or inlined functions info. In the latter case, we are searching for functions named bar and then restricting the results to the matching functions whose definition is in foo.c. It seemed neat not to make "break set" a three-level command, like:

(lldb) break set file-and-line
(lldb) break set function-name
etc...

and use options for everything, but it does make it a little harder to document the cases where one option means something slightly different when used in combination with one or the other option.

Anyway, setting a breakpoint by file name and function name was meant to support the case where you have:

$ cat foo.c
static int baz() {}
...
$ cat bar.c
static int baz() {}
...

and you want to limit the breakpoint to only the version of baz that was in foo.c. You would do:

(lldb) b s -n baz -f foo.c
Breakpoint 1: where = a.out`baz + 4 at foo.c:5:10, address = 0x0000000100000ee4

In that case searching in the CU makes sense, and is efficient.

Adding the file name to a function name breakpoint is really only useful if you have multiple different functions with the same name that you need to disambiguate. In your example, you gain nothing by providing the name of the header file in which the symbol is defined.

I don't think I designed the behavior for the case where the colliding functions are actually defined in separate header files; I don't remember considering that possibility. So what's happening is just an accidental outcome. It would certainly be possible, though a little trickier, to do the filtering based on the defining header. You would have to look up the symbol, then chase down its declaration records to make sure they are in the right .h file, and also look for any inlined instances. You probably also want to take some care not to regress the performance of the case where the function is defined in a CU, since that's the more common usage.
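The approach outlined above could look roughly like the following C++ sketch. All names are hypothetical and paths are compared by basename only. In real DWARF, the declaration file of an inlined copy would come from the DIE referenced by the DW_TAG_inlined_subroutine’s DW_AT_abstract_origin. As one possible design choice, the existing CU-name match is kept as a cheap accept path for out-of-line definitions, so uses like `-n baz -f foo.c` keep working, and only otherwise does the filter consult the declaration site.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one place a function appears in the debug info:
// either an out-of-line definition or an inlined copy.
struct FunctionInstance {
    std::string name;
    std::string cuName;    // DW_AT_name of the containing compile unit
    std::string declFile;  // DW_AT_decl_file (for an inlined copy, taken from
                           // the DIE its DW_AT_abstract_origin points to)
    bool inlined;
};

inline std::string baseName(const std::string &path) {
    const auto pos = path.find_last_of('/');
    return pos == std::string::npos ? path : path.substr(pos + 1);
}

// Widened "-n NAME -f FILE" check.
inline bool definedIn(const FunctionInstance &fi, const std::string &file) {
    // Fast path: out-of-line definition whose CU name matches. In a real
    // debugger this avoids extra DWARF parsing; here both checks are plain
    // string comparisons.
    if (!fi.inlined && baseName(fi.cuName) == baseName(file))
        return true;
    // Slow path: compare against the function's declaration file.
    return baseName(fi.declFile) == baseName(file);
}

// Name lookup first, then the file filter, over all out-of-line and inlined copies.
inline std::vector<FunctionInstance> resolveByNameAndFile(
        const std::vector<FunctionInstance> &all, const std::string &name,
        const std::string &file) {
    std::vector<FunctionInstance> hits;
    for (const auto &fi : all)
        if (fi.name == name && definedIn(fi, file))
            hits.push_back(fi);
    return hits;
}
```

For example, given a hypothetical out-of-line copy and an inlined copy of a function declared in foo.h, both inside a main.cpp CU, `-f foo.h` would match both, while `-f main.cpp` would still match the out-of-line copy through the fast path.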

Jim

Jim,

thank you for the explanation. I’m trying to see the situation more from an end user’s perspective. When --file or -f have two different meanings depending on how they are combined, that’s bad IMHO.

From what I read in your response, I get the feeling that you assume a user knows about the difference between a CU and his or her source file, and the implications it can have when, for example, LTO is enabled and we make heavy use of inlining. I see this as a problem because, to an end user, source-level debugging by function name and file means exactly that, no matter where the function is inlined. Do you agree?

Konrad

> Jim,
>
> thank you for the explanation. I'm trying to see the situation more from an end user's perspective. When --file or -f have two different meanings depending on how they are combined, that's bad IMHO.

I don't think that it is bad that the file parameter in a "file and line" breakpoint and the file parameter in a function name breakpoint have different meanings. That might very well make sense when you think about the kind of search the breakpoint is likely to do. But this does raise a problem with the documentation.

One way to do it is to try to list all the meanings for each option ("when used in conjunction with..."). I don't think there are actually enough variants that this will bloat the documentation overmuch, but that's something to watch out for.

Another thing I've thought about doing is adding the ability to have help include one of the non-optional, non-overlapping options to the command, so you could say:

(lldb) help break set -n

and that would tell you that this is a "by function name" breakpoint, and in that case -n means... That might help reduce the information overload, and give a better sense of what these complex commands do.

As I said, it would have been better from a documentation standpoint to make all these different breakpoint commands sub-commands of "break set" ("break set function", "break set file-and-line", etc...), but I think people would find that too verbose.

> From what I read in your response I get the feeling that you assume a user knows about the difference between CU and his or her source file and the implications it can have when for example LTO is enabled and we make heavy use of inlining. I see this as a problem because source-level debugging for a function name and a file to an end user means exactly that, no matter where the function is inlined. Do you agree?

I am not sure what you are asking me to agree to.

(lldb) break set -n foo -f bar.*

means "set the breakpoint on functions named foo DEFINED in the file 'bar.*'".

It could mean other things in the context of inlining; for instance, you might want to tell lldb to break on the function "foo" whenever it is inlined INTO the CU bar.*. That's also a perfectly valid thing to do, and you might think "-n -f" was the combination to do that, but it is not what it does. Again, the feature was intended to disambiguate between different functions with the same name by definition site, which the current definition does. So in this sense the user will have to know what -f means (and we do need some good solution for documenting this more clearly.)

Back to your original query... If the function is defined in a .h file, or gets inlined by LTO, this filtering is trickier, and I didn't implement that behavior when I implemented this breakpoint type. So in that case, and in the case where LTO inlines a function, the feature isn't implemented correctly. The -n search always looks for out-of-line and inline instances when doing the search, so we already get the searcher to all the instances. You would just have to widen the search beyond "Does the CU match" to try to figure out where the inlined instance was defined.

Jim

Hi Jim,

>> Jim,
>>
>> thank you for the explanation. I’m trying to see the situation more from an end user’s perspective. When --file or -f have two different meanings depending on how they are combined, that’s bad IMHO.

> I don’t think that it is bad that the file parameter in a “file and line” breakpoint and the file parameter in a function name breakpoint have different meanings. That might very well make sense when you think about the kind of search the breakpoint is likely to do. But this does raise a problem with the documentation.

I think it is dangerous to make too many assumptions. Given that people are lazy, I think even with a good piece of documentation people would still get it wrong. To me this is similar to documentation that says: when you cross the street in this direction, go when the light is green; when you cross the street in the other direction, go when the light is red. No matter how accurate the documentation is, some people will not read it entirely. For example, I used -f -n in a certain way and it worked. Then I enabled LTO and my debugging habits began to break.

> One way to do it is to try to list all the meanings for each option (“when used in conjunction with…”). I don’t think there are actually enough variants that this will bloat the documentation overmuch, but that’s something to watch out for.

For the time being, with the code working the way it does, I’m not at all against documentation updates.

> Another thing I’ve thought about doing is adding the ability to have help include one of the non-optional, non-overlapping options to the command, so you could say:
>
> (lldb) help break set -n
>
> and that would tell you that this is a “by function name” breakpoint, and in that case -n means… That might help reduce the information overload, and give a better sense of what these complex commands do.
>
> As I said, it would have been better from a documentation standpoint to make all these different breakpoint commands sub-commands of “break set” (“break set function”, “break set file-and-line”, etc…), but I think people would find that too verbose.

>> From what I read in your response I get the feeling that you assume a user knows about the difference between CU and his or her source file and the implications it can have when for example LTO is enabled and we make heavy use of inlining. I see this as a problem because source-level debugging for a function name and a file to an end user means exactly that, no matter where the function is inlined. Do you agree?

> I am not sure what you are asking me to agree to.
>
> (lldb) break set -n foo -f bar.*
>
> means “set the breakpoint on functions named foo DEFINED in the file ‘bar.*’”.

My example with foo() was very contrived but not unusual. Take any template function for example:

// foo.h

int foo(){ return 42; }

template <typename T>
T twice(T arg) { return arg+arg; }

// main.cpp

#include "foo.h"
int main(){return twice(foo());}

When I want to break on the function twice() defined in foo.h, I would go for “-f foo.h -n twice”, but I have to go for “-f main.cpp -n twice”. And in terms of DWARF, there’s enough information to let me do the first variant:

DW_AT_name ("twice")
DW_AT_decl_file ("/home/kkleine/./foo.h")
DW_AT_decl_line (4)

Why don’t we respect DW_AT_decl_file and DW_AT_decl_line? They are always there: for inlined functions (with LTO), for templates, and for regular functions.

> It could mean other things in the context of inlining, for instance you might want to tell lldb to break on the function “foo” whenever it is inlined INTO the CU bar.*. That’s also a perfectly valid thing to do, and you might think “-n -f” was the combination to do that, but it is not what it does.

I understand that. The question, as I asked it above, is why we only search by CU.

> Again, the feature was intended to disambiguate between different functions with the same name by definition site, which the current definition does. So in this sense the user will have to know what the -f means (and we do need some good solution for documenting this more clearly.)

> Back to your original query… If the function is defined in a .h file, or gets inlined by LTO, this filtering is trickier, and I didn’t implement that behavior when I implemented this breakpoint type. So in that case, and in the case where LTO inlines a function, the feature isn’t implemented correctly.

Okay, that’s good to know.

> The -n search always looks for out of line and inline instances when doing the search. So we already get the searcher to all the instances. You would just have to widen the search beyond “Does the CU match” to try to figure out where the inlined instance was defined.

I will take a look at the code, but I already noticed the call to CompUnitPasses, which by default returns true and is only overridden in some circumstances.

Thank you for taking the time to go through this.

Konrad