[BUG] Many lookup failures

Hi,

Basic things are failing.

(lldb) p lhs
(CG::VarExpr *) $0 = 0x000000010d445ca0
(lldb) p lhs->rootStmt()
(CG::ExprStmt *) $1 = 0x000000010d446290
(lldb) p cg_pp_see_it(lhs->rootStmt())
(const char *) $2 = 0x000000010d448020 "%A = $3;"
(lldb) p cg_pp_see_it(def->rootStmt())
error: no member named 'rootStmt' in 'CG::Node'
error: 1 errors parsing expression
(lldb) p cg_pp_see_it(def)
error: no matching function for call to 'cg_pp_see_it'
note: candidate function not viable: no known conversion from
'CG::Node *' to 'CG_Obj *' for 1st argument
error: 1 errors parsing expression

It's total junk; why can't it see the inheritance VarExpr -> Node ->
CG_Obj? The worst part is that rootStmt() is a function defined on
Node!

Ram

So be sure to enable -fno-limit-debug-info to make sure the compiler isn't emitting lame debug info.

If things are still failing, check to see what we think "CG::Node" contains by dumping the type info for it:

(lldb) image lookup -t CG::Node

This will print out the complete class definition that we have for "CG::Node" including ivars and methods. You should be able to see the inheritance structure and you might need to also dump the type info for each inherited class.

Compilers have been trying to not output a bunch of debug info and in the process they started to omit class info for base classes. So if you have:

class A : public B
{
};

where class "B" has all sorts of interesting methods, the debug info will often look like:

class B; // Forward declaration for class B

class A : public B
{
};

When this happens, we must make class A in a clang::ASTContext in DWARFASTParserClang. If "B" is a forward declaration, we can't leave it as a forward declaration or clang will assert and kill the debugger. So currently we just say "oh well, the compiler gave us lame debug info, and clang will crash if we don't fix this, so I am going to pretend we have a definition for class B and it contains nothing".
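As a rough illustration of when a compiler might decide that a forward declaration of the base class is enough, here is a minimal sketch. It assumes the common heuristic of emitting a class's full debug info only in the translation unit that holds its out-of-line virtual function (and therefore its vtable). The file split described in the comments and the names `describe`, `id`, and `call_through_base` are made up for the example and are not taken from the CG code above.

```cpp
#include <cassert>
#include <string>

// --- imagine this part is b.h + b.cpp, linked into libb.so ---
class B {
public:
    virtual ~B() = default;
    virtual std::string describe() const;  // declared here, defined out of line
    int id() const { return id_; }

private:
    int id_ = 7;
};

// The out-of-line definition pins B's vtable to this translation unit, and a
// size-reducing compiler may emit B's full debug info only here.
std::string B::describe() const { return "B"; }

// --- imagine this part is a.cpp, linked into liba.so ---
// The language still needs B's full definition to compile A; only the debug
// info for this file may be reduced to "class B;".
class A : public B {
public:
    std::string describe() const override { return "A:" + B::describe(); }
};

std::string call_through_base(const B& b) { return b.describe(); }

// Small self-check: the program behaves the same whichever debug info is emitted.
void check_inheritance_example() {
    A a;
    assert(call_through_base(a) == "A:B");
    assert(a.id() == 7);
}
```

The running program is unaffected either way. What changes is whether a debugger that only reads the debug info for the second part can see `describe()` and `id()` on the base class.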

I really don't like that the compiler thinks this is OK to do, but that is the reality and we have to deal with it.

So the best thing I can offer is that you must use -fno-limit-debug-info when compiling to stop the compiler from doing this, and things should be back to normal for you. If this isn't what is happening, let us know what the "image lookup -t" output looks like and we can see what we can do.

Greg Clayton

So be sure to enable -fno-limit-debug-info to make sure the compiler isn’t emitting lame debug info.

Greg cannot be more wrong here. There are some limitations to be aware of when using limit debug info, but if the debug info exists for the type in the object and debug info then it’s the fault of the debugger. The limitations are pretty well defined, which is “if you ship the debug info for all parts of the project you’ve just built then it should work just fine”. It isn’t clear whether or not this is the case here, but the compiler isn’t “emitting lame debug info”. (Also, it’s not clear which compiler you’re using anyhow so Greg’s advice is doubly bad).

-eric

So the main question is probably this: you have a "CG::Node" that inherits from one or more other classes, and we probably didn't find definitions for one or more of these classes. I would like to verify this to make sure we understand the problem. Can you do a:

(lldb) image lookup -t "CG::Node"

And then see what classes it inherits from by looking at the textual version of the class that gets dumped. Then repeat the "image lookup -t <typename>" for each class that "CG::Node" inherits from, and let me know if any of these classes are empty (no methods or ivars) when they shouldn't be empty.

The theory behind the compiler debug info reduction that is enabled by default (-flimit-debug-info) is that there should be an actual definition of the base classes somewhere and the debugger should be able to find it. We need to determine if this is the case for this binary. There could be an LLDB bug where there is debug info for the base class, but LLDB doesn't find it. The main issue the debugger has with the reduced debug info is that we don't currently look across shared library boundaries when creating the type from the debug info. So if you have "CG::Node" in one shared library (liba.so) and it inherits from "Bar" that is in another shared library (libb.so), we will run into this problem, as the compiler might emit a forward declaration for "Bar" in the debug info for liba.so. When we debug, we might have the debug info from libb.so or we might not. So even if we fix LLDB to mark classes that it has completed just to keep clang from asserting, we might not be able to fix up the debug info to make it work when debugging later. This is why I prefer to have the information up front even though it might be duplicated elsewhere. There are many valid reasons for reducing the size of debug info: compile times and link times are shorter, debug info size is smaller, and more.

Greg

If things are still failing, check to see what we think "CG::Node"
contains by dumping the type info for it:

(lldb) image lookup -t CG::Node

This will print out the complete class definition that we have for
"CG::Node" including ivars and methods. You should be able to see the
inheritance structure and you might need to also dump the type info for
each inherited class.

Compilers have been trying to not output a bunch of debug info and in the
process they started to omit class info for base classes. So if you have:

class A : public B
{
};

where class "B" has all sorts of interesting methods, the debug info will
often look like:

class B; // Forward declaration for class B

class A : public B
{
};

When this happens, we must make class A in a clang::ASTContext in
DWARFASTParserClang and if "B" is a forward declaration, we can't leave it
as a forward declaration or clang will assert and kill the debugger, so
currently we just say "oh well, the compiler gave us lame debug info, and
clang will crash if we don't fix this, so I am going to pretend we have a
definition for class B and it contains nothing".

Why not lookup the definition of B in the debug info at this point rather
than making a stub/empty definition? (& if there is none, then, yes, I
suppose an empty definition of B is as good as anything, maybe - it's going
to produce some weird results, maybe)

I really don't like that the compiler thinks this is OK to do, but that is
the reality and we have to deal with it.

GCC's been doing it for a while longer than Clang & it represents a
substantial space savings in debug info size - it'd be hard to explain to
users why Clang's debug info is so much (20% or more) larger than GCC's
when GCC's contains all the information required and GDB gives a good user
experience with that information and LLDB does not.

So be sure to enable -fno-limit-debug-info to make sure the compiler isn't emitting lame debug info.

Greg cannot be more wrong here. There are some limitations to be aware of when using limit debug info, but if the debug info exists for the type in the object and debug info then it's the fault of the debugger.

This is true, and it is what we are going to determine. My follow-up email has instructions to help us determine this.

The limitations are pretty well defined, which is "if you ship the debug info for all parts of the project you've just built then it should work just fine". It isn't clear whether or not this is the case here, but the compiler isn't "emitting lame debug info". (Also, it's not clear which compiler you're using anyhow so Greg's advice is doubly bad).

I shouldn't have said "lame", sorry for that. We do need to determine the root cause of this, so hopefully we can find out what the main issue is, again, see my follow up email that preceded this email.

LLDB can start marking up the classes that it completes just to keep clang from asserting (we do this for base classes that are forward declarations and for member variables that are structs/classes whose type is a forward declaration). Then, when we copy the type from one AST to another, we can possibly find the real version of the type from the other shared library and use the real definition. This does mean that if you have debug info for one shared library but not the other, we will still have reduced debugging abilities.

On MacOSX, we don't enable -flimit-debug-info by default, so we tend not to run into this issue. We don't require people to have debug info for everything in order to debug, because kernel debugging on darwin is badly affected by this. All other platforms have it on by default.

This will print out the complete class definition that we have for "CG::Node" including ivars and methods. You should be able to see the inheritance structure and you might need to also dump the type info for each inherited class.

Compilers have been trying to not output a bunch of debug info and in the process they started to omit class info for base classes. So if you have:

class A : public B
{
};

where class "B" has all sorts of interesting methods, the debug info will often look like:

class B; // Forward declaration for class B

class A : public B
{
};

When this happens, we must make class A in a clang::ASTContext in DWARFASTParserClang and if "B" is a forward declaration, we can't leave it as a forward declaration or clang will assert and kill the debugger, so currently we just say "oh well, the compiler gave us lame debug info, and clang will crash if we don't fix this, so I am going to pretend we have a definition for class B and it contains nothing".

Why not lookup the definition of B in the debug info at this point rather than making a stub/empty definition? (& if there is none, then, yes, I suppose an empty definition of B is as good as anything, maybe - it's going to produce some weird results, maybe)

LLDB creates types using only the debug info from the current shared library, and we don't take a copy of a type from another shared library when creating the types for a given shared library. Why? LLDB has a global repository of modules (the class that represents an executable or shared library in LLDB). If Xcode, or any other IDE that can debug more than one thing at a time, has two targets, "a.out" and "b.out", they share all of the shared library modules. If debug info has already been parsed in the target for "a.out" for the shared library "liba.so" (or any other shared library), then the "b.out" target has the debug info already loaded for "liba.so" because "a.out" already loaded that module (LLDB runs in the same address space as our IDE). This means that all debug info in LLDB currently creates types using only the info in the current shared library.

When we debug "a.out" again, we might have recompiled "liba.so" but not "libb.so". We don't need to reload the debug info for "libb.so" if it hasn't changed; we just reload "liba.so" and its debug info. When we rerun a target (run a.out again), we don't need to spend any time reloading any shared libraries that haven't changed, since they are still in our global shared library cache. So to keep this global library cache clean, we don't allow types from one shared library (libb.so) to be loaded into another (liba.so). Otherwise we wouldn't be able to reap the benefits of our shared library cache, as we would always need to reload debug info every time we run.
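To make the caching argument concrete, here is a toy model of a process-wide module cache. It is only a sketch under stated assumptions: `ToyModule`, `ToyModuleCache`, and the `build_stamp` field are hypothetical names invented for this example and do not correspond to LLDB's real classes or to how LLDB actually detects a rebuilt file.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a parsed executable or shared library
// (not lldb_private::Module).
struct ToyModule {
    std::string path;
    long build_stamp;  // hypothetical marker that changes when the file is rebuilt
};

// One cache shared by every target in the process.
class ToyModuleCache {
public:
    std::shared_ptr<ToyModule> get(const std::string& path, long build_stamp) {
        auto it = cache_.find(path);
        if (it != cache_.end() && it->second->build_stamp == build_stamp)
            return it->second;  // unchanged on disk: reuse the parsed module
        ++parse_count_;         // new or rebuilt: parse debug info again
        auto mod = std::make_shared<ToyModule>(ToyModule{path, build_stamp});
        cache_[path] = mod;
        return mod;
    }
    int parse_count() const { return parse_count_; }

private:
    std::map<std::string, std::shared_ptr<ToyModule>> cache_;
    int parse_count_ = 0;
};

void check_cache_example() {
    ToyModuleCache cache;
    auto liba_for_a_out = cache.get("liba.so", 1);
    auto libb_for_a_out = cache.get("libb.so", 1);
    auto liba_for_b_out = cache.get("liba.so", 1);  // second target, same file
    assert(liba_for_a_out == liba_for_b_out);
    assert(cache.parse_count() == 2);

    auto liba_rebuilt = cache.get("liba.so", 2);    // liba.so was recompiled
    assert(liba_rebuilt != liba_for_a_out);
    assert(cache.get("libb.so", 1) == libb_for_a_out);
    assert(cache.parse_count() == 3);
}
```

In this model each entry depends only on its own file. If the cached types for liba.so embedded a copy of a type taken from libb.so, a rebuild of libb.so would also have to invalidate the liba.so entry, which is the cost described above.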

LLDB does have the ability, when displaying types, to grab types from the best source (other shared libraries); we just don't transplant those types into the versions of the types held by the LLDB shared library objects (lldb_private::Module). We do currently assume that all classes that aren't pointers or references (or other types that can legally have forward declarations of structs or classes) are complete in our current model.

There are modifications we can make to LLDB to deal with partial debug info, and with its possible absence when the debug info for other shared libraries is not present, but we haven't done this yet in LLDB.

I really don't like that the compiler thinks this is OK to do, but that is the reality and we have to deal with it.

GCC's been doing it for a while longer than Clang & it represents a substantial space savings in debug info size - it'd be hard to explain to users why Clang's debug info is so much (20% or more) larger than GCC's when GCC's contains all the information required and GDB gives a good user experience with that information and LLDB does not.

LLDB currently recreates types in a clang::ASTContext, and this imposes much stricter rules on how we represent types. That is one of the weaknesses of the LLDB approach to type representation, as the clang codebase often asserts when it is not happy with how things are represented. This does pay off IMHO in the complex expressions we can evaluate, where we can use flow control, define and use C++ lambdas, and write more than one statement when writing expressions. But it is definitely a tradeoff. GDB has its own custom type representation which can be better for dealing with the different kinds and completeness of debug info, but I am comfortable with our approach.

So we need to figure out what the root problem is here before we can go further and talk about any additional solutions or fixes that may be required.

Greg

>
> This will print out the complete class definition that we have for
"CG::Node" including ivars and methods. You should be able to see the
inheritance structure and you might need to also dump the type info for
each inherited class.
>
> Compilers have been trying to not output a bunch of debug info and in
the process they started to omit class info for base classes. So if you
have:
>
> class A : public B
> {
> };
>
> where class "B" has all sorts of interesting methods, the debug info
will often look like:
>
> class B; // Forward declaration for class B
>
> class A : public B
> {
> };
>
> When this happens, we must make class A in a clang::ASTContext in
DWARFASTParserClang and if "B" is a forward declaration, we can't leave it
as a forward declaration or clang will assert and kill the debugger, so
currently we just say "oh well, the compiler gave us lame debug info, and
clang will crash if we don't fix this, so I am going to pretend we have a
definition for class B and it contains nothing".
>
> Why not lookup the definition of B in the debug info at this point
rather than making a stub/empty definition? (& if there is none, then, yes,
I suppose an empty definition of B is as good as anything, maybe - it's
going to produce some weird results, maybe)

LLDB creates types using only the debug info from the currently shared
library and we don't take a copy of a type from another shared library when
creating the types for a given shared library. Why? LLDB has a global
repository of modules (the class that represents an executable or shared
library in LLDB). If Xcode, or any other IDE that can debug more that one
thing at a time has two targets: "a.out" and "b.out", they share all of the
shared library modules so that if debug info has already been parsed in the
target for "a.out" for the shared library "liba.so" (or any other shared
library), then the "b.out" target has the debug info already loaded for
"liba.so" because "a.out" already loaded that module (LLDB runs in the same
address space as our IDE). This means that all debug info in LLDB currently
creates types using only the info in the current shared library. When we
debug "a.out" again, we might have recompiled "liba.so", but not "libb.so"
and when we debug again, we don't need to reload the debug info for
"libb.so" if it hasn't changed, we just reload "liba.so" and its debug
info. When we rerun a target (run a.out again), we don't need to spend any
time reloading any shared libraries that haven't changed since they are
still in our global shared library cache. So to keep this global library
cache clean, we don't allow types from another shared library (libb.so) to
be loaded into another (liba.so), otherwise we wouldn't be able to reap the
benefits of our shared library cache as we would always need to reload
debug info every time we run.

Ah, right - I do remember you describing this to me before. Sorry I forgot.

Wouldn't it be sufficient to just copy the definition when needed? If the
type changes in an incompatible way in a dependent library, the user is up
a creek already, aren't they? (eg: libb.so is rebuilt with a new,
incompatible version of some type that liba.so uses, but liba.so is not
rebuilt) Perhaps you wouldn't be responsible for rebuilding the liba.so
cache until it's actually recompiled. Maybe?

LLDB does have the ability, when displaying types, to grab types from the
best source (other shared libraries), we just don't transplant types in the
LLDB shared library objects (lldb_private::Module) versions of the types.
We do currently assume that all classes that aren't pointers or references
(or other types that can legally have forward declarations of structs or
classes) are complete in our current model.

There are modifications we can do to LLDB to deal with the partial debug
info and possible lack thereof when the debug info for other shared
libraries are not present, but we haven't done this yet in LLDB.

>
> I really don't like that the compiler thinks this is OK to do, but that
is the reality and we have to deal with it.
>
> GCC's been doing it for a while longer than Clang & it represents a
substantial space savings in debug info size - it'd be hard to explain to
users why Clang's debug info is so much (20% or more) larger than GCC's
when GCC's contains all the information required and GDB gives a good user
experience with that information and LLDB does not.

LLDB currently recreates types in a clang::ASTContext and this imposes
much stricter rules on how we represent types which is one of the
weaknesses of the LLDB approach to type representation as the clang
codebase often asserts when it is not happy with how things are represented.

Sure, but it seems like it's the cache that's the real issue/stumbling
block here, rather than Clang's AST requirements. As Eric said, the DWARF
is (usually) available (unless you aren't building your whole program with
debug info, when the -fstandalone-debug (aka -fno-limit-debug-info) is
intended for "hey, I need this object file to have debug info that doesn't
depend on any other file"), LLDB just isn't using it.

This does payoff IMHO in the complex expressions we can evaluate where we
can use flow control, define and use C++ lambdas, and write more than one
statement when writing expressions. But it is definitely a tradeoff. GDB
has its own custom type representation which can be better for dealing with
the different kinds and completeness of debug info, but I am comfortable
with our approach.

So we need to figure out what the root problem is here before we can go
further and talk about any additional solutions or fixes that may be
required.

For sure, for this particular user - perhaps there's some other reason
they're seeing this behavior that's got nothing to do with this tangent.
(but, as you say, judging by the specific situation/behavior, it's a fair
guess/bet that it's this quirk/bug/mismatch of expectations)

- Dave

>
> This will print out the complete class definition that we have for "CG::Node" including ivars and methods. You should be able to see the inheritance structure and you might need to also dump the type info for each inherited class.
>
> Compilers have been trying to not output a bunch of debug info and in the process they started to omit class info for base classes. So if you have:
>
> class A : public B
> {
> };
>
> where class "B" has all sorts of interesting methods, the debug info will often look like:
>
> class B; // Forward declaration for class B
>
> class A : public B
> {
> };
>
> When this happens, we must make class A in a clang::ASTContext in DWARFASTParserClang and if "B" is a forward declaration, we can't leave it as a forward declaration or clang will assert and kill the debugger, so currently we just say "oh well, the compiler gave us lame debug info, and clang will crash if we don't fix this, so I am going to pretend we have a definition for class B and it contains nothing".
>
> Why not lookup the definition of B in the debug info at this point rather than making a stub/empty definition? (& if there is none, then, yes, I suppose an empty definition of B is as good as anything, maybe - it's going to produce some weird results, maybe)

LLDB creates types using only the debug info from the currently shared library and we don't take a copy of a type from another shared library when creating the types for a given shared library. Why? LLDB has a global repository of modules (the class that represents an executable or shared library in LLDB). If Xcode, or any other IDE that can debug more that one thing at a time has two targets: "a.out" and "b.out", they share all of the shared library modules so that if debug info has already been parsed in the target for "a.out" for the shared library "liba.so" (or any other shared library), then the "b.out" target has the debug info already loaded for "liba.so" because "a.out" already loaded that module (LLDB runs in the same address space as our IDE). This means that all debug info in LLDB currently creates types using only the info in the current shared library. When we debug "a.out" again, we might have recompiled "liba.so", but not "libb.so" and when we debug again, we don't need to reload the debug info for "libb.so" if it hasn't changed, we just reload "liba.so" and its debug info. When we rerun a target (run a.out again), we don't need to spend any time reloading any shared libraries that haven't changed since they are still in our global shared library cache. So to keep this global library cache clean, we don't allow types from another shared library (libb.so) to be loaded into another (liba.so), otherwise we wouldn't be able to reap the benefits of our shared library cache as we would always need to reload debug info every time we run.

Ah, right - I do remember you describing this to me before. Sorry I forgot.

Wouldn't it be sufficient to just copy the definition when needed? If the type changes in an incompatible way in a dependent library, the user is up a creek already, aren't they? (eg: libb.so is rebuilt with a new, incompatible version of some type that liba.so uses, but liba.so is not rebuilt) Perhaps you wouldn't be responsible for rebuilding the liba.so cache until it's actually recompiled. Maybe?

The fix to LLDB I want to do is to complete the type when we need to for base classes, but mark it with metadata. When we run expressions we create a new clang::ASTContext for each expression, and copy types over into it. The ASTImporter can be taught to look for the metadata on the class that says "I completed this class because I had to", and when copying it, we would grab the right type from the current version of libb.so. This keeps everyone happy: modules get their types with some classes completed but marked, and the expressions get the best version available in their AST contexts where if a complete version of the type is available we find it and copy it in place of the completed but incomplete version from the module AST.
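The idea can be sketched with a toy model. This is not Clang's ASTImporter or any LLDB API: `ToyType`, `ToyModuleTypes`, `import_for_expression`, and the `forcefully_completed` flag are hypothetical names used only to show the lookup order being proposed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of a class type as one module sees it.
struct ToyType {
    std::string name;
    std::vector<std::string> methods;
    bool forcefully_completed = false;  // "I completed this class because I had to"
};

using ToyModuleTypes = std::map<std::string, ToyType>;  // type name -> definition

// Copy `name` from its home module into an expression context. If the home
// module only holds a placeholder, prefer a real definition from another module.
ToyType import_for_expression(const std::string& name,
                              const ToyModuleTypes& home,
                              const std::vector<const ToyModuleTypes*>& others) {
    const ToyType& local = home.at(name);
    if (!local.forcefully_completed)
        return local;  // already a real definition
    for (const ToyModuleTypes* module : others) {
        auto it = module->find(name);
        if (it != module->end() && !it->second.forcefully_completed)
            return it->second;  // best source available
    }
    return local;  // nothing better found: stays complete but empty
}

void check_import_example() {
    ToyType stub;
    stub.name = "B";
    stub.forcefully_completed = true;   // liba.so only had "class B;"

    ToyType real;
    real.name = "B";
    real.methods.push_back("interesting_method");

    ToyModuleTypes liba, libb;
    liba["B"] = stub;
    libb["B"] = real;

    ToyType with_libb = import_for_expression("B", liba, {&libb});
    assert(with_libb.methods.size() == 1);

    ToyType without_libb = import_for_expression("B", liba, {});
    assert(without_libb.methods.empty());
    assert(without_libb.forcefully_completed);
}
```

The module's own table is never modified, so the per-module cache stays valid; only the copy made for the expression picks up the better definition.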

LLDB does have the ability, when displaying types, to grab types from the best source (other shared libraries), we just don't transplant types in the LLDB shared library objects (lldb_private::Module) versions of the types. We do currently assume that all classes that aren't pointers or references (or other types that can legally have forward declarations of structs or classes) are complete in our current model.

There are modifications we can do to LLDB to deal with the partial debug info and possible lack thereof when the debug info for other shared libraries are not present, but we haven't done this yet in LLDB.

>
> I really don't like that the compiler thinks this is OK to do, but that is the reality and we have to deal with it.
>
> GCC's been doing it for a while longer than Clang & it represents a substantial space savings in debug info size - it'd be hard to explain to users why Clang's debug info is so much (20% or more) larger than GCC's when GCC's contains all the information required and GDB gives a good user experience with that information and LLDB does not.

LLDB currently recreates types in a clang::ASTContext and this imposes much stricter rules on how we represent types which is one of the weaknesses of the LLDB approach to type representation as the clang codebase often asserts when it is not happy with how things are represented.

Sure, but it seems like it's the cache that's the real issue/stumbling block here, rather than Clang's AST requirements. As Eric said, the DWARF is (usually) available (unless you aren't building your whole program with debug info, when the -fstandalone-debug (aka -fno-limit-debug-info) is intended for "hey, I need this object file to have debug info that doesn't depend on any other file"), LLDB just isn't using it.

So that problem goes away with my ASTImporter changes as mentioned above. When we import a type from liba.so into the expression AST, we copy all complete types. Any types marked with the "I was completed just to keep clang happy" metadata get imported from the best source available, or are just left complete but empty if the debug info is missing, since that is the best we can do.

This does payoff IMHO in the complex expressions we can evaluate where we can use flow control, define and use C++ lambdas, and write more than one statement when writing expressions. But it is definitely a tradeoff. GDB has its own custom type representation which can be better for dealing with the different kinds and completeness of debug info, but I am comfortable with our approach.

So we need to figure out what the root problem is here before we can go further and talk about any additional solutions or fixes that may be required.

For sure, for this particular user - perhaps there's some other reason they're seeing this behavior that's got nothing to do with this tangent. (but, as you say, judging by the specific situation/behavior, it's a fair guess/bet that it's this quirk/bug/mismatch of expectations)

Yes, something is failing and we need to fix the problem so users don't need to worry about it. It should just work, with efficiently stored debug info.

Greg

So that problem goes away with my ASTImporter changes as mentioned above
where when we import a type from liba.so into the expression AST, we copy
all complete types and any types marked with the "I was completed just to
keep clang happy" metadata get imported from the best source available or
just left complete but empty if the debug info is missing since that is the
best we can do.

Yep, seems plausible to me. Looking forward to it, some day - maybe the
Windows guys'll get to this before you do, not sure. But good to have a
plan described/to work from whenever anyone decides this is their longest
pole.

What? Didn't we just fix this (for the case where there's conflicting
debug info from two different libraries)?

(lldb) image lookup -t ...

prints expected results, with no empty classes. Let me emphasize that
the strange behavior is seen with _some_ variables: in the debugging
session referenced in the original email, the problem was particularly
bad.

I didn't try the expensive experiment (-fno-limit-debug-info) on
account of lldb finding all the classes in full.

Is there some systematic way to test lldb? [Looks in the unittests/
and test/ directories]

> When we debug "a.out" again, we might have recompiled "liba.so", but not
"libb.so" and when we debug again, we don't need to reload the debug info
for "libb.so" if it hasn't changed, we just reload "liba.so" and its debug
info. When we rerun a target (run a.out again), we don't need to spend any
time reloading any shared libraries that haven't changed since they are
still in our global shared library cache. So to keep this global library
cache clean, we don't allow types from another shared library (libb.so) to
be loaded into another (liba.so), otherwise we wouldn't be able to reap the
benefits of our shared library cache as we would always need to reload
debug info every time we run.

Tangential: gdb starts up significantly faster than lldb. I wonder
what lldb is doing wrong.

Oh, this is if I use the lldb that Apple supplied. If I compile my own
lldb with llvm-release, clang-release, and lldb-release, it takes like
20x the time to start up: why is this? And if I use llvm-debug,
clang-debug, lldb-debug, the time it takes is completely unreasonable.

If you built your own you probably built a +Asserts build which slows
things down a lot. You'll want to make sure you're building Release-Asserts
(Release "minus" Asserts) builds if you want them to be usable.

> LLDB currently recreates types in a clang::ASTContext and this imposes
much stricter rules on how we represent types which is one of the
weaknesses of the LLDB approach to type representation as the clang
codebase often asserts when it is not happy with how things are
represented. This does pay off IMHO in the complex expressions we can
evaluate where we can use flow control, define and use C++ lambdas, and
write more than one statement when writing expressions. But it is
definitely a tradeoff. GDB has its own custom type representation which can
be better for dealing with the different kinds and completeness of debug
info, but I am comfortable with our approach.

Yeah, about that. I question the utility of evaluating crazy
expressions in lldb: I've not felt the need to do that even once, and
I suspect a large userbase is with me on this. What's important is
that lldb should _never_ fail to inspect a variable: isn't this the #1
job of the debugger?

Depends on the language - languages with more syntactic sugar basically
need crazy expression evaluation to function very well in a debugger for
the average user. (evaluating operator overloads in C++ expressions, just
being able to execute non-trivial pretty-printers for interesting types
(std::vector being a simple example, or a small-string optimized
std::string, etc - let alone examples in ObjC or even Swift))

- Dave

What do you mean by startup speed, and how do you measure it? I use a Release+Asserts build of ToT LLDB on Linux, and it takes significantly less time to start up when debugging a large application (I usually test with a debug clang) than what you mentioned.

For me, just starting LLDB is almost instantaneous (~100ms), as it doesn't parse any symbol or debug information at that time. If I trigger some debug info parsing/indexing (by setting a breakpoint), then the startup time is around 3-5 seconds (40 cores + SSD machine), which includes indexing all DIEs (it should be faster on Darwin, as the index is already in the executable). On the other hand, doing the same with gdb takes ~30 seconds (regardless of whether I set a breakpoint) because gdb parses all symbol info at startup.

I would like to understand why you are seeing such a slow startup time, as I have worked on optimizing symbol parsing quite a bit in the last few months. Can you send me some information about how you measure the startup time (lldb commands, some info about the inferior), and can you do a quick profile to see where the time is spent?

If you just want to inspect the contents of a variable, then I suggest using the “frame variable” command, as it requires significantly less context than evaluating an expression. Unfortunately, it can still fail in some cases with the same lookup failure you see, but it happens in far fewer cases.

When we debug "a.out" again, we might have recompiled "liba.so", but not "libb.so" and when we debug again, we don't need to reload the debug info for "libb.so" if it hasn't changed, we just reload "liba.so" and its debug info. When we rerun a target (run a.out again), we don't need to spend any time reloading any shared libraries that haven't changed since they are still in our global shared library cache. So to keep this global library cache clean, we don't allow types from another shared library (libb.so) to be loaded into another (liba.so), otherwise we wouldn't be able to reap the benefits of our shared library cache as we would always need to reload debug info every time we run.

Tangential: gdb starts up significantly faster than lldb. I wonder
what lldb is doing wrong.

LLDB loads all shared libraries that it can when it first launches and GDB doesn't. In GDB you pay for this when you run your program. So if you want to compare things, do something like:

- get the current time as time1
- set an executable file in GDB and LLDB
- set a breakpoint
- run and hit the breakpoint
- get the current time as time2, subtract time1 from it, and note the time measurement.

Load GDB and LLDB up with a nice fat clang with debug info and time the run to a breakpoint as a nice example.
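
If it helps, the measurement above could be scripted with something like the following small helper. This is only a sketch: seconds_to_run is a made-up name, and it measures wall-clock time for whatever shell command you hand it, so the debugger command lines are up to you.

```cpp
#include <chrono>
#include <cstdlib>
#include <string>

// Run a shell command to completion and return the elapsed wall-clock
// seconds (time2 - time1 in the steps above). The command's exit status
// is ignored here; a real harness would probably want to check it.
double seconds_to_run(const std::string &command) {
    const auto time1 = std::chrono::steady_clock::now();
    const int status = std::system(command.c_str());
    const auto time2 = std::chrono::steady_clock::now();
    (void)status;
    return std::chrono::duration<double>(time2 - time1).count();
}
```

For the comparison, the commands passed in would be batch-mode invocations along the lines of `lldb --batch -o "breakpoint set --name main" -o "run" <binary>` and `gdb -batch -ex "break main" -ex "run" <binary>`. Treat those exact command lines as assumptions to check against your installed versions, and run each one a few times so disk caching doesn't skew the first measurement.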

Oh, this is if I use the lldb that Apple supplied. If I compile my own
lldb with llvm-release, clang-release, and lldb-release, it takes like
20x the time to start up: why is this?

Not sure on this.

Are you saying you build on Mac OS X using the Xcode project and build the "Release" build configuration, and it runs slower than the LLDB that is shipped by Apple? This shouldn't happen. One reason that this might be the case is that the binaries you build are not stripped and have a TON of C++ names that are not exported through the LLDB.framework, but are in the symbol table. You can run "strip -Sx" on your LLDB.framework and it should remove these extra symbols.

And if I use llvm-debug,
clang-debug, lldb-debug, the time it takes is completely unreasonable.

Simple: optimized code is much faster.

LLDB currently recreates types in a clang::ASTContext and this imposes much stricter rules on how we represent types which is one of the weaknesses of the LLDB approach to type representation as the clang codebase often asserts when it is not happy with how things are represented. This does pay off IMHO in the complex expressions we can evaluate where we can use flow control, define and use C++ lambdas, and write more than one statement when writing expressions. But it is definitely a tradeoff. GDB has its own custom type representation which can be better for dealing with the different kinds and completeness of debug info, but I am comfortable with our approach.

Yeah, about that. I question the utility of evaluating crazy
expressions in lldb: I've not felt the need to do that even once, and
I suspect a large userbase is with me on this. What's important is
that lldb should _never_ fail to inspect a variable: isn't this the #1
job of the debugger?

I agree with you on this. "frame variable" should never fail you as it doesn't use the expression parser. "frame variable" and its rock solid functionality is what IDEs use to display variables in variable views. So use "frame variable" and all will be good.

The expression parser is more complex and requires some special handling due to how we represent the ASTs as a single AST for all compile units within a module (executable/shared library). We use clang as the expression parser and this won't change so even if we tried to limit the expression parser to only do what the GDB expression parser does it won't help. So we need to tame the beast and we need to track down why things are not evaluating correctly when they do go wrong.

So please help us to track any issues down that we are having as one bug fix can often lead to fixing a whole variety of different issues.

So one other issue with removing debug info from the current binary for base classes that are virtual: if the definition for the base class changes in libb.so, but liba.so was linked against an older version of class B from libb.so, like for example:

class A : public B
{
    int m_a;
};

If A was linked against a B that looked like this:

class B
{
    virtual ~B();
    int m_b;
};

Then libb.so was rebuilt and B now looks like:

class B
{
    virtual ~B();
    virtual int foo();
    int m_b;
    int m_bb;
};

Then, when displaying an instance of "A" in liba.so that was linked against the first version of B, we would actually show you the new version of "B", and everything would look like it was using the new definition for B. But liba.so is actually linked against the old definition, and the code in class A would probably crash at some point due to the compilation mismatch. The user would never see what the original program was actually linked against, so they would have no chance to spot the issue and realize they need to recompile liba.so against libb.so. If full debug info is emitted, we would be able to show the original structure for B. Not an issue that people are always going to run into, but it is a reason that I like to have all the info complete in the current binary.

Greg

Sure - pretty substantial cost to pay (disk usage, link time, etc) & more
targeted features might be able to diagnose this more directly (& actually
diagnose it, rather than leaving it to the user to happen to look at the
debug info in a very specific way).

A DWARF linter (possibly built into a debugger) could catch /some/ cases of
the mismatch even with the minimal debug info (eg: if the offsets of the
derived class's members don't make sense for the base class (if they
overlap with the base class's members because the base class got bigger, or
left a big gap because the base class got smaller, for example) it could
produce a warning)
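
As a rough illustration of that kind of check, here is a sketch under some loud assumptions: Member, LayoutCheck, data_end and check_derived_layout are made-up names, the offsets and sizes are assumed to have been pulled out of the DWARF already, there is a single base class at offset 0, and max_padding is an arbitrary threshold for what still counts as alignment padding.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical summary of one data member as recorded in debug info.
struct Member {
    std::string name;
    std::uint64_t offset;  // byte offset within the complete object
    std::uint64_t size;    // byte size of the member
};

enum class LayoutCheck { Ok, OverlapsBase, GapAfterBase };

// One past the last byte used by any member (0 for an empty list).
std::uint64_t data_end(const std::vector<Member> &members) {
    std::uint64_t end = 0;
    for (const Member &m : members)
        end = std::max(end, m.offset + m.size);
    return end;
}

// base_members: the base class as described by libb.so's current debug info.
// derived_members: the derived class's own members as described by liba.so.
// The base's data end is used rather than its byte size, because a derived
// member may legitimately be placed in the base's tail padding.
LayoutCheck check_derived_layout(const std::vector<Member> &base_members,
                                 const std::vector<Member> &derived_members,
                                 std::uint64_t max_padding = 8) {
    if (derived_members.empty())
        return LayoutCheck::Ok;
    std::uint64_t first = derived_members.front().offset;
    for (const Member &m : derived_members)
        first = std::min(first, m.offset);
    const std::uint64_t base_end = data_end(base_members);
    if (first < base_end)
        return LayoutCheck::OverlapsBase;  // base got bigger since liba.so was built
    if (first - base_end > max_padding)
        return LayoutCheck::GapAfterBase;  // base got smaller since liba.so was built
    return LayoutCheck::Ok;
}
```

With the B example from earlier in the thread, on a typical 64-bit layout the new B's m_bb would occupy the bytes where liba.so expects A's m_a, so the check would report an overlap. It cannot catch changes that keep the data end the same, such as adding only a virtual function.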

A more tailored tool might just produce a table of type hashes of some kind.

- Dave