test system, when using clang, flags

Hey all,

When running lldb tests with clang, on non-Apple platforms, we really need this flag set:

-fstandalone-debug

It addresses issues where we end up missing complete debuginfo without it in some cases. That ends up breaking the debugger’s ability to perform some operations. So, we’re going to want that flag set when running tests on a host other than MacOSX.

I suppose we have a couple of ways to go about that: modify all the test build files to check whether CC/CXX is clang and add the flag if so, or do something in the test framework build functions that tries to figure that out and adds the flag if appropriate.

Any preferences on the way to handle this?

> Hey all,
>
> When running lldb tests with clang, on non-Apple platforms, we really need
> this flag set:
> -fstandalone-debug

We think this is a bug in LLDB, and would *really* like to see it fixed.
gdb works fine without this flag, and it *drastically* reduces the amount
of debug info stored in object files. My understanding is that the debug
info that LLDB needs is there, but it's in a different compilation unit.

Usually the missing information is type information about a class with a
vtable where the first virtual method (aka the key function) is defined in
a different TU. That TU will emit the vtable and all type information for
that class.
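
To make the key-function case concrete, here is a minimal C++ sketch. The names `Shape`, `sides`, `id` and `count_sides`, and the file split shown in the comments, are hypothetical and not taken from any failing test. Under the Itanium C++ ABI the key function is the first virtual function that is neither pure nor inline at the point of the class definition, and the TU that defines it is the one that emits the vtable. As described above, that is also the TU expected to carry the full type information when -fstandalone-debug is off.

```cpp
// --- shape.h (hypothetical header, included by both TUs) ---
struct Shape {
    virtual ~Shape() = default;           // defined in-class, so inline: not the key function
    virtual int sides() const;            // first non-inline, non-pure virtual: the key function
    virtual int id() const { return 1; }  // inline as well
};

// --- shape.cpp (hypothetical TU that "owns" Shape) ---
// Defining the key function here is what makes this TU emit the vtable.
int Shape::sides() const { return 0; }

// --- user.cpp (hypothetical TU that only uses Shape) ---
// Compiled on its own, this TU sees the whole class in the header,
// but it does not emit the vtable for Shape.
int count_sides(const Shape &s) { return s.sides(); }
```

If the three parts really were separate files, a debugger stopped inside `count_sides` would have to look at the debug info from shape.cpp to learn the layout of `Shape`.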

That said, it's fine if LLDB has to add the flag as a short-term way to
stabilize the test suite. I just want to make sure we're on the same page
here: this is probably an LLDB bug, not a Clang bug.

> Hey all,
>
> When running lldb tests with clang, on non-Apple platforms, we really need this flag set:
> -fstandalone-debug
>
> We think this is a bug in LLDB, and would *really* like to see it fixed. gdb works fine without this flag, and it *drastically* reduces the amount of debug info stored in object files. My understanding is that the debug info that LLDB needs is there, but it's in a different compilation unit.

I don't know what tests are failing, but very few of our tests have more than one file in them, and they don't tend to have very sophisticated classes in them. I'd bet it has more to do with the fact that you have to be a little tricky when writing test cases to convince the compiler that you are really using stuff that you are defining just so you can test it. If you don't do that right, the compiler outsmarts you and elides the debug info for the thing you're trying to test. Rather than getting into an arms race with the compiler, it's far easier to just tell the compiler not to try to be smart about this.

BTW, lldb handles finding types from a compile unit that has the real definition and applying them to cases where they are only present as references in another compilation unit just fine. I am pretty sure we even have tests for that. If you have any cases where this is failing please file bugs about that.

The problems we've seen recently come when the compiler plays tricks like emitting some of the debug information for a class in a given compile unit, but eliding the debug information for one of the classes contained in that class because it thinks the sub-class isn't used. That becomes kind of a nightmare because you now potentially have a whole array of "broken" class definitions in different compilation units where some or other of the contained class definitions are omitted, and the debugger has to figure out how to assemble the "best version" from the pieces. Or you have to present different broken views of the class that don't contain the elided parts and make sure that never causes problems. We don't do any of that work at present.

Jim

> Hey all,
>
> When running lldb tests with clang, on non-Apple platforms, we really need this flag set:
> -fstandalone-debug
>
> We think this is a bug in LLDB, and would *really* like to see it fixed.
>
> gdb works fine without this flag, and it *drastically* reduces the amount of debug info stored in object files. My understanding is that the debug info that LLDB needs is there, but it's in a different compilation unit.

There are indeed some cases where this is true and LLDB does need to get better about being able to find full definitions of classes that aren't fully defined in some places, but it can also mean some class definitions might never be available. Also, if we get one shared library that has a definition for class A which inherits from B and we try to write an expression that uses a version of A from another shared library, and one library has a full definition and another doesn't, then we run into problems. The main issue is

> Usually the missing information is type information about a class with a vtable where the first virtual method (aka the key function) is defined in a different TU. That TU will emit the vtable and all type information for that class.

This isn't always true. We have kernel sources here at Apple where header files define base classes, but these base classes are never compiled in their own TU. This means we end up with just a forward declaration for this other class and we never get it fully defined. I consider it a compiler bug when I must find debug info for shared library "B" in order to debug shared library "A".

> That said, it's fine if LLDB has to add the flag as a short-term way to stabilize the test suite. I just want to make sure we're on the same page here: this is probably an LLDB bug, not a Clang bug.

I agree there are bugs in LLDB. I also don't like the compiler just assuming debug info will be available from somewhere else. It isn't always available, we have code here at Apple that proves it, and this is the reason -fstandalone-debug is the default for us.

I believe the real solution is to do proper type uniquing in the compiler and linker. This is definitely a hard problem to solve correctly without increasing .o file size, but it is an effort I do believe is well worth it.

A few possible solutions:
1 - if the compiler uses precompiled headers, emit all types from the precompiled headers with their full definitions once into a standalone PCH DWARF file. Then refer to these full types in this external file with new DWARF tags. This keeps the .o files small, and would keep the types properly organized inside a DW_TAG_compile_unit for each header file. When the linker links the final executable and is going to make the DWARF for the linked executable, it will copy the DWARF for the precompiled header over into the final binary, and then link each .o file and fix up any external type references to the types in the PCH DWARF. Then we get full debug info and no type duplication whatsoever. The downside is that this requires PCH.
2 - emit full types for everything in the .o files and then have a quick way to unique them. DWARF has a .debug_types section that does this using special ELF sections that the linker knows how to unique, but I don't really like the DWARF output for this as it spreads everything into separate sections. This also makes the .o files much larger as they all carry full type info.
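
The uniquing idea in option 2 can be sketched as a toy model in C++. This does not parse DWARF or touch ELF sections; `TypeUnit`, `ObjectFile` and `unique_types` are hypothetical names made up for illustration. The only fact borrowed from DWARF is that each type unit in .debug_types is identified by a 64-bit type signature, so identical types emitted by different object files can be recognized and kept once.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one full type description in an object file.
struct TypeUnit {
    std::uint64_t signature;  // plays the role of the 64-bit type signature
    std::string name;         // stands in for the actual type description
};

using ObjectFile = std::vector<TypeUnit>;

// Keep the first unit seen for each signature and drop later copies,
// which is roughly what the link step is expected to do.
std::vector<TypeUnit> unique_types(const std::vector<ObjectFile> &objects) {
    std::set<std::uint64_t> seen;
    std::vector<TypeUnit> linked;
    for (const ObjectFile &object : objects) {
        for (const TypeUnit &unit : object) {
            if (seen.insert(unit.signature).second) {
                linked.push_back(unit);
            }
        }
    }
    return linked;
}
```

Every object file still pays for a full copy of each type it uses, which is the .o size cost mentioned above; only the linked output shrinks.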

So we know the current limitations, and many of these are driven by how we re-create types in Clang ASTs: we run into problems when we have two differing definitions for a type. Imagine one shared library that has a full definition for A, which inherits from B, where B is just a forward declaration because the compiler thought it didn't have to emit it. Now another shared library has a full definition of both A and B, and we try to mix those types by using them in an expression. Clang gets unhappy when two types in the same decl context differ.

Also, if you only have a forward declaration for B and can't find a full definition, you end up not being able to call any functions from B or use any instance variables from B.

After running into these very issues, we decided to always emit full debug info for things that are inherited from. It is OK for pointers and references to have forward declarations, but when A inherits from B, we do expect to be able to find a full definition in the current binary. If not, we have no idea what other binary to load in order to get the debug info for any classes we can't find.
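
The distinction between inheriting from a type and merely pointing at one mirrors what C++ itself requires. A minimal sketch, with hypothetical names (`Opaque`, `Base`, `Derived`, `total`):

```cpp
struct Opaque;  // forward declaration only; never completed in this file

struct Base {
    int base_value = 7;
    int get() const { return base_value; }
};

struct Derived : Base {        // inheriting requires Base to be complete here
    Opaque *handle = nullptr;  // a pointer to an incomplete type is fine
    int extra = 1;
};

// Uses both an inherited member function and a member of Derived itself.
int total(const Derived &d) { return d.get() + d.extra; }
```

The compiler cannot lay out `Derived` without the full definition of `Base`, while `Opaque` can stay incomplete forever. A debugger rebuilding `Derived` from debug info is in the same position: with only a declaration of `Base` available, neither `d.get()` nor `d.base_value` can be evaluated in an expression.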

Greg

> Hey all,
>
> When running lldb tests with clang, on non-Apple platforms, we really
> need this flag set:
> -fstandalone-debug
>
> We think this is a bug in LLDB, and would *really* like to see it fixed.
> gdb works fine without this flag, and it *drastically* reduces the
> amount of debug info stored in object files. My understanding is that the
> debug info that LLDB needs is there, but it's in a different compilation
> unit.
>
> There are indeed some cases where this is true and LLDB does need to get
> better about being able to find full definitions of classes that aren't
> fully defined in some places, but it can also mean some class definitions
> might never be available. Also, if we get one shared library that has a
> definition for class A which inherits from B and we try to write an
> expression that uses a version of A from another shared library, and one
> library has a full definition and another doesn't, then we run into
> problems. The main issue is
>
> Usually the missing information is type information about a class with a
> vtable where the first virtual method (aka the key function) is defined in
> a different TU. That TU will emit the vtable and all type information for
> that class.

> This isn't always true. We have kernel sources here at Apple where header
> files define base classes, but these base classes are never compiled in
> their own TU. This means we end up with just a forward declaration for this
> other class and we never get it fully defined. I consider it a compiler
> bug when I must find debug info for shared library "B" in order to debug
> shared library "A".

Is gdb able to debug in this situation, so long as you don't look inside
the base class? I suppose this gets to the core issue, which is that LLDB
is trying to build a full-fledged Clang AST, which requires a definition of
the base class, and now we've come full circle: either the compiler needs
to accept an AST with less information, or it needs to produce all the
debug info it will need. Definitely worth thinking about. It might be
easier to solve this by accepting incomplete base class types on the clang
side.

> That said, it's fine if LLDB has to add the flag as a short-term way to
> stabilize the test suite. I just want to make sure we're on the same page
> here: this is probably an LLDB bug, not a Clang bug.
>
> I agree there are bugs in LLDB. I also don't like the compiler just
> assuming debug info will be available from somewhere else. It isn't always
> available, we have code here at Apple that proves it, and this is the
> reason -fstandalone-debug is the default for us.
>
> I believe the real solution is to do proper type uniquing in the compiler
> and linker. This is definitely a hard problem to solve correctly without
> increasing .o file size, but it is an effort I do believe is well worth it.
>
> A few possible solutions:
> 1 - if the compiler uses precompiled headers, emit all types from the
> precompiled headers with their full definitions once into a standalone PCH
> DWARF file. Then refer to these full types in this external file with new
> DWARF tags. This keeps the .o files small, and would keep the types
> properly organized inside a DW_TAG_compile_unit for each header file. When
> the linker links the final executable and is going to make the DWARF for
> the linked executable, it will copy the DWARF for the precompiled header
> over into the final binary, and then link each .o file and fix up any
> external type references to the types in the PCH DWARF. Then we get full
> debug info and no type duplication whatsoever. The downside is that this
> requires PCH.

It's an interesting idea. I wonder if MSVC's PCH step adds type info to
the PDB. Unfortunately, PCH is kind of a non-starter for Google's usage.

> 2 - emit full types for everything in the .o files and then have a quick
> way to unique them. DWARF has a .debug_types section that does this using
> special ELF sections that the linker knows how to unique, but I don't
> really like the DWARF output for this as it spreads everything into
> separate sections. This also makes the .o files much larger as they all
> carry full type info.

Yep, I believe we currently do this at Google to reduce the size of the
final linked image. I don't know which flags control it. But we also need
-fno-standalone-debug because if the object files are too large, the link
step will overflow the memory quota and die. That's why this is important
to us: without this cleverness, programs actually fail to link.

In theory, smaller object files and faster links are useful to all the
other consumers of Clang, so all this work has been upstream and
on-by-default, but obviously it isn't working for LLDB.

Anyway, we should try to figure something out. I understand if you're not
interested in pursuing this work, I just hope that patches to make LLDB
smarter about this are welcome, and that we can help out as necessary on
the Clang side.

I'd really like to see this get sorted out. Right now on FreeBSD we
also have -fstandalone-debug enabled by default due to this and other
reasons; I'd really like to be able to turn it back off. I haven't
seen anything to suggest there would be an objection to improving this
in LLDB.

Again, I'd like to separate the concerns of "supporting code built with -fno-standalone-debug" in real life and supporting it in the testsuite. If what's mostly going on in the testsuite is that this option makes the compiler's "you're not using it so you get no debug info for it" optimization too aggressive for the style of code you write in test suites (which is quite artificial, not much significant code is 30 or 40 lines long in toto, mostly defining things so we can poke at them) then it doesn't seem like getting ourselves into a fight with the compiler in this venue is really worth the effort. My suspicion is that this is what is going on, just because we don't have a lot of complex classes in the testsuite, but I don't have time to chase this down right now.

This is not to take a stance one way or the other about support for -fno-standalone-debug. I'm just saying that in the past it has always been tricky to write test cases that the compiler's desire to reduce debug info doesn't fight against. In "normal" code this generally sorts itself out because there isn't much of interest which gets introduced to your code that somebody doesn't use somewhere... But having to write more complex than necessary test cases just so we can win this fight against the compiler seems a waste of time.

And of course as we work to support -fno-standalone-debug we could add some test cases that exercise the kinds of elisions that cause trouble IRL. Just maybe not impose that burden on the whole testsuite.

Jim

I’m actually kind of curious what kinds of test code this is breaking. I’m imagining this program:

#include <stdint.h>

struct A {
  virtual void foo();  // declared but never defined
  int x;
};
uintptr_t buf[2] = { 0xdeadbeef, 0x1234 };
int main() {}

(lldb) print *(A *)buf

In this case, there will be no debug info for A, and the program still compiles and links, because nobody ever created an A.

It isn’t really here or there, though, since it sounds like nobody has time to work on this at the moment.

I haven't kept track of clang's debug reduction flags. There used to be an -funused-types or -funused-declarations or something like that that covered not emitting type information declarations (originally in gcc it was just types, then it was made into all declarations.) That is pretty much always on, and our test cases work with that on (for instance we'd never write a test like the one you cite, since it would pretty much never work.) I don't know what extra -fno-standalone-debug does that is tripping up the testsuite. But again, I don't have any time to look into it either.

Jim

I was looking to have this flag set in the test suite to just avoid cases where MacOSX test results differ from Linux test results because of the different clang behaviors on the two systems w/r/t debug info. Right now we have a large number of tests marked XFAIL on Linux, and I’m looking for easy ways to reduce differences between the MacOSX and Linux test suite. Debug info presence/absence/differences, in a debugger test suite, seem a pretty significant difference.

However, Jim has convinced me that for our relatively small main.cpp files, this is probably not a significant enough cause of difference to change it across the board without some proof that it is the issue. Besides, if the flag does turn out to be the issue, having one platform that runs with the flag one way and another that runs with it the other way is perhaps a better test of lldb anyway, assuming we get them all to pass in the end :-)

-Todd

-fstandalone-debug was introduced last January in r198655, and
replaced the previous flag:

    Implement a new -fstandalone-debug option. rdar://problem/15685848
    It controls everything that -flimit-debug-info used to, plus the
    vtable type optimization. The old -fno-limit-debug-info option is now an
    alias to -fstandalone-debug and vice versa.

    Standalone is the default on Darwin until dtrace is updated to work with
    non-standalone debug info (rdar://problem/15758808).

I'm not aware of any more granular control. On Darwin and FreeBSD
-fstandalone-debug is the default and everything is emitted. On Linux
-fno-standalone-debug is the default, and as much as possible is
omitted.

To the original point, I don't think this has much effect on the
testsuite -- there's a decent correlation between Linux and FreeBSD
wrt XFAIL tests, with opposite defaults for the flag. It's the
real-life problems with -fno-standalone-debug that I'm actually
interested in, but we (FreeBSD) currently won't encounter them with
the default options.

Thanks, Ed.

I think we’ll only bother diving into this difference if we find it to be the cause of a test behavior difference.

Good to know that FreeBSD goes the same way as MacOSX on this since, as you mentioned Ed, we do tend to have very similar XFAIL correlation between FreeBSD and Linux even with differences in this flag.

-Todd