[BUG?] Confusion between translation units?

Hi,

At one point in the debugging session, I get this when I try to print
a particular value:

error: field '__r_' declared with incompatible types in different
translation units
('std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >' vs.
'std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >')
error: field '__r_' declared with incompatible types in different
translation units
('std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >' vs.
'std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >')
error: field '__r_' declared with incompatible types in different
translation units
('std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >' vs.
'std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >')
error: field '__r_' declared with incompatible types in different
translation units
('std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >' vs.
'std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >')
note: declared here with type
'std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >'
note: declared here with type
'std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >'
note: declared here with type
'std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >'
note: declared here with type
'std::__1::__compressed_pair<std::__1::basic_string<char,
std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
std::__1::allocator<char> >'

(which makes no sense at all; lhs and rhs are identical)

After that point, whatever I try to print returns this error.

What is going on?

Thanks.

Ram

In LLDB we create clang::ASTContext objects for the modules (executable and shared libraries), one for the target to contain the expression results, and one for each expression.

When we evaluate an expression we might do something like:

(lldb) expr a + b

where "a" is from liba.so and "b" is from libb.so. We must copy types from the clang::ASTContext for each module, so we copy the type of "a" from the liba.so clang::ASTContext into the expression clang::ASTContext, and likewise copy the type of "b" from the libb.so clang::ASTContext. Many times the two modules have the same type, but one copy has more information in it. Let's say both "a" and "b" are of type "foo<int>". We can often end up with different definitions of "foo<int>" in liba.so and libb.so. When we copy the types, we first copy "foo<int>" from liba.so into the expression AST, and then we do the same with "b" from libb.so, but the importer notices that a type with the same name already exists at the same level, so it tries to verify that the two types are the same. This often fails due to debug info being more complete in one of the shared libraries. One example is that the compiler might omit the complete definition for a base class in libb.so where it has a complete definition for the base class in liba.so. When parsing types we must always give clang something it is happy with, even if we run into debug info that has a complete definition for "foo<int>" that inherits from a class "C" which is only forward declared. Say the definition for "C" in liba.so is:

class C
{
public:
    C();
    ~C();
    int callme();
};

and "C" in "libb.so" is just a forward declaration:

class C;

But then in libb.so we must create a type for foo<int>, which we can't do properly since C isn't complete. We do it anyway by pretending that C looks like:

class C
{
};

So now we have two types that differ, and importing both foo<int> types into the expression clang::ASTContext will fail. This happens a lot for C++ template classes because of the haphazard way that compilers generate debug info for templates. It could be a bug in the type importer where the two types are actually the same, but the type importer thinks they are different, but often it is because the types actually do differ.
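To make the mismatch concrete, here is a minimal toy sketch. It is not clang's data model or its IsStructurallyEquivalent implementation; ToyRecord, structurallyEquivalent, fooFromLibA and fooFromLibB are hypothetical names invented for illustration, and the "int value" member is made up. It only shows how two records with the same name can still compare as different once the base class is complete in one module and an empty placeholder in the other.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a record type rebuilt from one module's debug info.
// Hypothetical model for illustration only.
struct ToyRecord {
    std::string name;
    std::vector<std::string> members;  // member declarations, kept as text
    std::vector<ToyRecord> bases;      // base classes, by value for simplicity
};

// Recursive check: same name, same members, equivalent bases in the same order.
bool structurallyEquivalent(const ToyRecord& a, const ToyRecord& b) {
    if (a.name != b.name || a.members != b.members ||
        a.bases.size() != b.bases.size())
        return false;
    for (std::size_t i = 0; i < a.bases.size(); ++i)
        if (!structurallyEquivalent(a.bases[i], b.bases[i]))
            return false;
    return true;
}

// foo<int> as seen through liba.so: base class C is fully described.
ToyRecord fooFromLibA() {
    ToyRecord c{"C", {"C()", "~C()", "int callme()"}, {}};
    return ToyRecord{"foo<int>", {"int value"}, {c}};
}

// foo<int> as seen through libb.so: C was only forward declared, so an
// empty placeholder definition was substituted for it.
ToyRecord fooFromLibB() {
    ToyRecord c{"C", {}, {}};
    return ToyRecord{"foo<int>", {"int value"}, {c}};
}
```

In this sketch structurallyEquivalent(fooFromLibA(), fooFromLibB()) is false even though both records are named "foo<int>", which would be consistent with an error message that prints the same type name on both sides.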

One way to get around the compiler emitting forward declarations to base classes is to specify: -fno-limit-debug-info

This will disable the debug info minimizing feature and make the compiler emit more complete debug info and it might fix your problem.

Greg Clayton

Thanks for an excellent explanation.

Unfortunately, -fno-limit-debug-info did not fix the problem, and
I don't see the problem with a gcc/gdb setup.

So what I'm doing is forward-declaring LLVM IR entities (like `Value',
`Type', `Function'), so that multiple downstream modules don't include
those LLVM headers potentially double-including global statics. I'm
trying to look inside an llvm::Function * in the debugger now, and it
fails.

I'm going to try building LLVM itself with -fno-limit-debug-info now.
Let me know if there are other things I can try.

Thanks.

Ram

Alright, let's try to fix the bug.

Let's work backward from the leaves: clang's ASTImporter.cpp:2979 and
ASTImporter.cpp:3044. In the backtrace, what seems to be most relevant
is a call inside layoutRecordType, namely ClangASTSource.cpp:1709. The
codebase clearly shows efforts to emit "Please retry with
-fno-limit-debug-info", so I can infer that we intend to catch every
non-IsStructurallyEquivalent before it goes to clang, and emit a good
error message if best-effort fails. ClangASTContext.cpp is littered
with `omit_empty_base_classes`, so some machinery to handle forward
declarations properly is already in place.

Back to where we were debugging. GetCompleteDecl seems relevant, and
we aren't using its return value, so we have no way of telling if it's
a complete definition, right? Why am I guessing instead of
interactively debugging? Because the debugger is useless at this
stage, thanks to the same bug :slight_smile:

I think the bug is just a matter of missing a corner case, but I could
be wrong. Let me know your thoughts.

Ram

Alright, let's try to fix the bug.

Let's work backward from the leaves: clang's ASTImporter.cpp:2979 and
ASTImporter.cpp:3044. In the backtrace, what seems to be most relevant
is a call inside layoutRecordType, namely ClangASTSource.cpp:1709. The
codebase clearly shows efforts to emit "Please retry with
-fno-limit-debug-info", so I can infer that we intend to catch every
non-IsStructurallyEquivalent before it goes to clang, and emit a good
error message if best-effort fails. ClangASTContext.cpp is littered
with `omit_empty_base_classes`, so some machinery to handle forward
declarations properly is already in place.

No, this is just so you don't see a mess in the variable view. If you have empty base classes A, B, and C, and you have class D:

class D : public A, public B, public C
{
    int m_a;
};

You don't want to have your variable view look like:

d-.
  >- A
  >- B
  >- C
  \- m_a

You would rather see:

d-.
  \- m_a

So this is what omit_empty_base_classes aims to fix: showing tons of class structure that doesn't contribute to efficient variable viewing in the debugger. C++ classes in the "std" namespace have a ton of useless stuff that shows up if you don't do this:

my_str-.
       >- <std::allocator <blah blah>
       >- <std::allocator2 <blah blah>
       >- <std::allocator3 <blah blah>
       \- m_data
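As a rough illustration of the idea (not LLDB's actual code), the filtering could be sketched like this. ToyType, isEmpty, visibleChildren and makeD are hypothetical names; only the flag name omit_empty_base_classes and the class D example come from the discussion above.

```cpp
#include <string>
#include <vector>

// Toy description of a class for a variable view. Hypothetical model.
struct ToyType {
    std::string name;
    std::vector<ToyType> bases;
    std::vector<std::string> fields;
};

// A type is "empty" here if it has no fields and all of its bases are empty.
bool isEmpty(const ToyType& t) {
    if (!t.fields.empty())
        return false;
    for (const ToyType& base : t.bases)
        if (!isEmpty(base))
            return false;
    return true;
}

// Names of the children a variable view would list for a value of type t.
std::vector<std::string> visibleChildren(const ToyType& t,
                                         bool omit_empty_base_classes) {
    std::vector<std::string> children;
    for (const ToyType& base : t.bases)
        if (!(omit_empty_base_classes && isEmpty(base)))
            children.push_back(base.name);
    for (const std::string& field : t.fields)
        children.push_back(field);
    return children;
}

// class D : public A, public B, public C { int m_a; };
ToyType makeD() {
    ToyType a{"A", {}, {}}, b{"B", {}, {}}, c{"C", {}, {}};
    return ToyType{"D", {a, b, c}, {"m_a"}};
}
```

With the flag set, visibleChildren(makeD(), true) lists only m_a; with it cleared, A, B and C show up as well. The point is that this is a display concern and has nothing to do with how forward declarations are completed.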

Back to where we were debugging. GetCompleteDecl seems relevant, and
we aren't using its return value, so we have no way of telling if it's
a complete definition, right? Why am I guessing instead of
interactively debugging? Because the debugger is useless at this
stage, thanks to the same bug :slight_smile:

Yes, I have seen a bunch of problems like this on Linux due to types being incomplete in the debug info (my guess). But I would like to verify that the manual DWARF indexing isn't to blame for this. We have great accelerator tables that clang makes for us that actually have all of the info we need to find types and functions quickly, whereas all other platforms must run SymbolFileDWARF::Index() to manually index the DWARF.

I think the bug is just a matter of missing a corner case, but I could
be wrong. Let me know your thoughts.

I should be able to tell if you can send me an ELF file and say where you were and what wasn't showing up correctly (which variables) in an exact code context (which file + line or exact line in a function). Then I can verify that SymbolFileDWARF::Index() is correctly indexing things so that we can find types and functions when we need them.

Greg

Greg Clayton wrote:

Yes, I have seen a bunch of problems like this on Linux due to types being incomplete in the debug info (my guess). But I would like to verify that the manual DWARF indexing isn't to blame for this. We have great accelerator tables that clang makes for us that actually have all of the info we need to find types and functions quickly, whereas all other platforms must run SymbolFileDWARF::Index() to manually index the DWARF.

I'm on OS X, so none of this applies?

I should be able to tell if you can send me an ELF file and say where you were and what wasn't showing up correctly (which variables) in an exact code context (which file + line or exact line in a function). Then I can verify that SymbolFileDWARF::Index() is correctly indexing things so that we can find types and functions when we need them.

I've been mulling over this problem: do you want to be able to run the
Mach-O, or do you just want to inspect it? The transitive closure of
the dependencies is at least 30 .dylibs, and I can't take out that much
IP.

So what are we looking for exactly?

Thanks.

Ram

Greg Clayton wrote:

Yes, I have seen a bunch of problems like this on Linux due to types being incomplete in the debug info (my guess). But I would like to verify that the manual DWARF indexing isn't to blame for this. We have great accelerator tables that clang makes for us that actually have all of the info we need to find types and functions quickly, whereas all other platforms must run SymbolFileDWARF::Index() to manually index the DWARF.

I'm on OS X, so none of this applies?

Yes, then you are using good accelerator tables.

I should be able to tell if you can send me an ELF file and say where you were and what wasn't showing up correctly (which variables) in an exact code context (which file + line or exact line in a function). Then I can verify that SymbolFileDWARF::Index() is correctly indexing things so that we can find types and functions when we need them.

I've been mulling over this problem: do you want to be able to run the
Mach-O, or do you just want to inspect it? The transitive closure of
the dependencies is at least 30 .dylibs, and I can't take out that much
IP.

I would just inspect a type for a variable that isn't showing up from a specific shared library. If you can send just the dSYM file for a library, and give me a specific function from a specific file and what variables were not showing up, I can inspect the DWARF and see why the type isn't showing up. So just a single dylib + its dSYM file. If you don't have a dSYM file next to your libfoo.dylib, you can easily create one:

% dsymutil libfoo.dylib

This will create a libfoo.dylib.dSYM file, which is linked DWARF from all the .o files that made the dylib.

So if you can send me a copy of the dSYM file and a file + line (foo.cpp:11), or function + compile unit (function is "int foo(int)" inside "foo.cpp") and let me know which variable wasn't able to be expanded (name of variable), I should be able to tell you more.

Greg

[Quoting entire email for the benefit of everyone else]

Ok, so try this on all of your dSYM files:

1 - load the dsym file into lldb:

% xcrun lldb libmwcgir_vm_rt.dylib.dSYM/Contents/Resources/DWARF/libmwcgir_vm_rt.dylib
(lldb) image lookup -t "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >"
2 matches found in /Volumes/work/gclayton/Downloads/libmwcgir_vm_rt.dylib.dSYM/Contents/Resources/DWARF/libmwcgir_vm_rt.dylib:
id = {0x000211dc}, name = "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >", qualified = "llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >", byte-size = 24, decl = ilist.h:313, clang_type = "class iplist : public llvm::ilist_traits<llvm::Function> {
    llvm::Function *Head;
    llvm::Function *getTail();
    const llvm::Function *getTail() const;
    void setTail(llvm::Function *) const;
    void CreateLazySentinel() const;
    static bool op_less(llvm::Function &, llvm::Function &);
    static bool op_equal(llvm::Function &, llvm::Function &);
    iplist(const llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &);
    void operator=(const llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &);
    iplist();
    ~iplist();
    iterator begin();
    const_iterator begin() const;
    iterator end();
    const_iterator end() const;
    reverse_iterator rbegin();
    const_reverse_iterator rbegin() const;
    reverse_iterator rend();
    const_reverse_iterator rend() const;
    size_type max_size() const;
    bool empty() const;
    reference front();
    const_reference front() const;
    reference back();
    const_reference back() const;
    void swap(llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &);
    iterator insert(iterator, llvm::Function *);
    iterator insertAfter(iterator, llvm::Function *);
    llvm::Function *remove(iterator &);
    llvm::Function *remove(const iterator &);
    iterator erase(iterator);
    void clearAndLeakNodesUnsafely();
    void transfer(iterator, llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &, iterator, iterator);
    size_type size() const;
    iterator erase(iterator, iterator);
    void clear();
    void push_front(llvm::Function *);
    void push_back(llvm::Function *);
    void pop_front();
    void pop_back();
    void splice(iterator, llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &);
    void splice(iterator, llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &, iterator);
    void splice(iterator, llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &, iterator, iterator);
    void erase(const llvm::Function &);
    void unique();
    void merge(llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &);
    void sort();
}
"
id = {0x001a658a}, name = "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >", qualified = "llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >", byte-size = 24, decl = ilist.h:313, clang_type = "class iplist : public llvm::ilist_traits<llvm::Function> {
    llvm::Function *Head;
    llvm::Function *getTail();
    const llvm::Function *getTail() const;
    void setTail(llvm::Function *) const;
    void CreateLazySentinel() const;
    static bool op_less(llvm::Function &, llvm::Function &);
    static bool op_equal(llvm::Function &, llvm::Function &);
    iplist(const llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &);
    void operator=(const llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &);
    iplist();
    ~iplist();
    iterator begin();
    const_iterator begin() const;
    iterator end();
    const_iterator end() const;
    reverse_iterator rbegin();
    const_reverse_iterator rbegin() const;
    reverse_iterator rend();
    const_reverse_iterator rend() const;
    size_type max_size() const;
    bool empty() const;
    reference front();
    const_reference front() const;
    reference back();
    const_reference back() const;
    void swap(llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &);
    iterator insert(iterator, llvm::Function *);
    iterator insertAfter(iterator, llvm::Function *);
    llvm::Function *remove(iterator &);
    llvm::Function *remove(const iterator &);
    iterator erase(iterator);
    void clearAndLeakNodesUnsafely();
    void transfer(iterator, llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &, iterator, iterator);
    size_type size() const;
    iterator erase(iterator, iterator);
    void clear();
    void push_front(llvm::Function *);
    void push_back(llvm::Function *);
    void pop_front();
    void pop_back();
    void splice(iterator, llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &);
    void splice(iterator, llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &, iterator);
    void splice(iterator, llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &, iterator, iterator);
    void erase(const llvm::Function &);
    void unique();
    void merge(llvm::iplist<llvm::Function, llvm::ilist_traits<llvm::Function> > &);
    void sort();
}
"

Do the same thing for any other shared libraries that you have: for each match, save the text inside the quotes of 'clang_type = "<copy>"' to a file, then compare the files and see if any of them differ from each other.

What is interesting here is that we have two copies of this type in the same file, which shouldn't happen. I looked into why this is happening and found the reason.

The two types that were found above are:

id = {0x000211dc}, name = "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >", ...
id = {0x001a658a}, name = "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >", ...

The "id" in this case is actually the DWARF offset: 0x000211dc and 0x001a658a which we can use to dump the DWARF in the dSYM file:

% dwarfdump --show-parents --debug-info=0x000211dc libmwcgir_vm_rt.dylib.dSYM/
----------------------------------------------------------------------
File: libmwcgir_vm_rt.dylib.dSYM/Contents/Resources/DWARF/libmwcgir_vm_rt.dylib (x86_64)
----------------------------------------------------------------------
.debug_info[0x000211dc]:

0x00017fa3: TAG_compile_unit [1] *
             AT_producer( "Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)" )
             AT_language( DW_LANG_C_plus_plus )
             AT_name( "vm/CgInstDbgPrint.cpp" )
             AT_stmt_list( 0x000012a1 )
             AT_comp_dir( "/mathworks/devel/sbs/34/rramacha.idivide-final-lap/matlab/src/cgir_vm_rt" )
             AT_low_pc( 0x0000000000002ef0 )

0x00017fbe: TAG_namespace [2] *
                 AT_name( "llvm" )
                 AT_decl_file( "../../../3p_mirror/maci64/LLVM/include/llvm/ADT/iterator_range.h" )
                 AT_decl_line( 24 )

0x000211dc: TAG_class_type [3] *
                     AT_name( "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >" )
                     AT_byte_size( 0x18 )
                     AT_decl_file( "../../../3p_mirror/maci64/LLVM/include/llvm/ADT/ilist.h" )
                     AT_decl_line( 313 )

% dwarfdump --show-parents --debug-info=0x001a658a libmwcgir_vm_rt.dylib.dSYM/
----------------------------------------------------------------------
File: libmwcgir_vm_rt.dylib.dSYM/Contents/Resources/DWARF/libmwcgir_vm_rt.dylib (x86_64)
----------------------------------------------------------------------
.debug_info[0x001a658a]:

0x00172c2a: TAG_compile_unit [185] *
             AT_producer( "Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)" )
             AT_language( DW_LANG_C_plus_plus )
             AT_name( "/mathworks/devel/sandbox/rramacha/3p-tmw-osx/3p/derived/maci64/LLVM/llvm/lib/Transforms/ObjCARC/ObjCARCOpts.cpp" )
             AT_low_pc( 0x0000000000027460 )
             AT_stmt_list( 0x00013f74 )
             AT_comp_dir( "/mathworks/devel/sandbox/rramacha/3p-tmw-osx/3p/derived/maci64/LLVM/build/lib/Transforms/ObjCARC" )

0x001a601c: TAG_namespace [2] *
                 AT_name( "llvm" )
                 AT_decl_file( "/mathworks/devel/sandbox/rramacha/3p-tmw-osx/3p/derived/maci64/LLVM/llvm/include/llvm/IR/SymbolTableListTraits.h" )
                 AT_decl_line( 30 )

0x001a658a: TAG_class_type [3] *
                     AT_name( "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >" )
                     AT_byte_size( 0x18 )
                     AT_decl_file( "/mathworks/devel/sandbox/rramacha/3p-tmw-osx/3p/derived/maci64/LLVM/llvm/include/llvm/ADT/ilist.h" )

LLDB uniques types based on decl file + decl line + fully qualified name + byte size. Note that the decl file is "../../../3p_mirror/maci64/LLVM/include/llvm/ADT/ilist.h" for one type and "/mathworks/devel/sandbox/rramacha/3p-tmw-osx/3p/derived/maci64/LLVM/llvm/include/llvm/ADT/ilist.h" for the other. This is why we incorrectly create two types: the keys (decl file) we are using to unique the types are not the same.
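A minimal sketch of uniquing on such a key, assuming a plain ordered map. ToyDeclKey and ToyTypeMap are hypothetical names and this is not how LLDB stores its types; it only shows why two spellings of the same header path yield two separate entries.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// The four fields named above, bundled into one hypothetical key.
struct ToyDeclKey {
    std::string decl_file;
    std::uint32_t decl_line;
    std::string qualified_name;
    std::uint64_t byte_size;
};

// Lexicographic ordering over all four fields so the key can live in a map.
bool operator<(const ToyDeclKey& a, const ToyDeclKey& b) {
    return std::tie(a.decl_file, a.decl_line, a.qualified_name, a.byte_size) <
           std::tie(b.decl_file, b.decl_line, b.qualified_name, b.byte_size);
}

class ToyTypeMap {
public:
    // Returns the id of the type for this key, creating one on first sight.
    int getOrCreate(const ToyDeclKey& key) {
        auto it = m_types.find(key);
        if (it != m_types.end())
            return it->second;
        int id = static_cast<int>(m_types.size());
        m_types.emplace(key, id);
        return id;
    }

    std::size_t size() const { return m_types.size(); }

private:
    std::map<ToyDeclKey, int> m_types;
};
```

Looking up the same key twice returns the same id, but a key that differs only in decl_file (a relative path versus an absolute one) gets a fresh id, so the map ends up holding two "different" types for what is really one declaration.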

So the easy way to fix your problem is fix your build system so that it doesn't do this. It seems that your build is using a relative path for one file "vm/CgInstDbgPrint.cpp" and a full path for another: "/mathworks/devel/sandbox/rramacha/3p-tmw-osx/3p/derived/maci64/LLVM/llvm/lib/Transforms/ObjCARC/ObjCARCOpts.cpp". If you can fix your build system to always use full paths for source files this problem might go away (if no other competing versions of "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >" exist anywhere).

The other bad thing is even after you normalize the paths you are comparing:

/mathworks/devel/sbs/34/rramacha.idivide-final-lap/3p_mirror/maci64/LLVM/include/llvm/ADT/ilist.h
/mathworks/devel/sandbox/rramacha/3p-tmw-osx/3p/derived/maci64/LLVM/llvm/include/llvm/ADT/ilist.h

Are you pulling in data from two different copies of LLVM in your project? Or is something in here a symlink to the other somewhere?

Excellent find. Yes, 3p_mirror is a symlink to the 3p-tmw-osx location.

So to sum up: LLDB uniques types by decl file + decl line + byte size + fully qualified typename and that is failing because the decl files are different for these two types from the debug info's point of view. And these types could actually differ since they come from different files and we need to allow this so that we can display these types.

I'm slightly confused: can't we ask Clang to tell us if the two types
are structurally equivalent? Is this some short-cut? We need to
account for symlinks then, it seems.

Thanks!

Ram

I'm slightly confused: can't we ask Clang to tell us if the two types
are structurally equivalent? Is this some short-cut? We need to
account for symlinks then, it seems.

Yep. Try replacing Declaration::Compare() in lldb/source/Symbol/Declaration.cpp. You will need to include:

#include "lldb/Host/FileSystem.h"

Then replace Declaration::Compare() with this:

int
Declaration::Compare(const Declaration& a, const Declaration& b)
{
    int result = FileSpec::Compare(a.m_file, b.m_file, true);
    if (result)
    {
        int symlink_result = result;
        if (a.m_file.GetFilename() == b.m_file.GetFilename())
        {
            // Check if the directories in a and b are symlinks to each other
            FileSpec resolved_a;
            FileSpec resolved_b;
            if (FileSystem::ResolveSymbolicLink(a.m_file, resolved_a).Success() &&
                FileSystem::ResolveSymbolicLink(b.m_file, resolved_b).Success())
            {
                symlink_result = FileSpec::Compare(resolved_a, resolved_b, true);
            }
        }
        if (symlink_result != 0)
            return symlink_result;
    }
    if (a.m_line < b.m_line)
        return -1;
    else if (a.m_line > b.m_line)
        return 1;
#ifdef LLDB_ENABLE_DECLARATION_COLUMNS
    if (a.m_column < b.m_column)
        return -1;
    else if (a.m_column > b.m_column)
        return 1;
#endif
    return 0;
}

Then try running and let me know what your results are!

I'm slightly confused: can't we ask Clang to tell us if the two types
are structurally equivalent?

How would you temporarily make a new version of this type so that you can compare it to the one in the clang::ASTContext for the DWARF file? You would have to make another clang::ASTContext for each type, construct the type in there along with any other types it needs, and then compare the types in the two different clang::ASTContext objects. If they compare equal, you don't copy the type; if they don't, you copy the type from one AST to the other. That would be way too expensive and time consuming.

Is this some short-cut? We need to account for symlinks then, it seems.

So in LLDB we have one AST context per executable file and we create _one_ instance of a type in each AST context based on the equivalent of the C++ ODR (only one copy of a type at a given decl context). Why? Let's see how many copies of "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >" you actually have in your one shared library: 152 to be exact (see output below). All full definitions of the same thing, over and over. This is how current compilers emit debug info: one copy per source file. So you end up with millions of copies of types all over the place. To deal with this, since we have one AST context per DWARF file, we create the type once and only once. See the results:

% dwarfdump --apple-types="iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >" libmwcgir_vm_rt.dylib.dSYM/ -r0

Hm, there seems to be something seriously wrong. I triple-checked
everything, but Declaration::Compare is not even called when the error
is triggered! How should we proceed now?

So first, an addendum: I found a way to make the project build without
using a symlink, and use a direct reference instead. The problem still
persists. It may be that symlink is one of the problems, but it is
certainly not the only problem.

int
Declaration::Compare(const Declaration& a, const Declaration& b)
{
    int result = FileSpec::Compare(a.m_file, b.m_file, true);
    if (result)

Wait, won't FileSpec::Compare be true iff a.m_file is the same as
b.m_file (excluding symlink resolution)? If so, why are we putting the
symlink-checking logic in the true branch of the original
FileSpec::Compare? Aren't we expanding the scope of what we match,
instead of narrowing it?

    {
        int symlink_result = result;
        if (a.m_file.GetFilename() == b.m_file.GetFilename())
        {
            // Check if the directories in a and b are symlinks to each other
            FileSpec resolved_a;
            FileSpec resolved_b;
            if (FileSystem::ResolveSymbolicLink(a.m_file, resolved_a).Success() &&
                FileSystem::ResolveSymbolicLink(b.m_file, resolved_b).Success())
            {
                symlink_result = FileSpec::Compare(resolved_a, resolved_b, true);

I'm confused. Shouldn't the logic be "check literal equality; if true,
return immediately; if not, check equality with symlink resolution"?

            }
        }
        if (symlink_result != 0)
            return symlink_result;
    }
    if (a.m_line < b.m_line)
        return -1;
    else if (a.m_line > b.m_line)
        return 1;
#ifdef LLDB_ENABLE_DECLARATION_COLUMNS
    if (a.m_column < b.m_column)
        return -1;
    else if (a.m_column > b.m_column)
        return 1;
#endif
    return 0;
}

Here's my version of the patch, although I'm not sure when the code
will be reached.

int
Declaration::Compare(const Declaration& a, const Declaration& b)
{
    int result = FileSpec::Compare(a.m_file, b.m_file, true);
    if (result)
        return result;
    if (a.m_file.GetFilename() == b.m_file.GetFilename()) {
        // Check if one of the directories is a symlink to the other
        int symlink_result = result;
        FileSpec resolved_a;
        FileSpec resolved_b;
        if (FileSystem::ResolveSymbolicLink(a.m_file, resolved_a).Success() &&
            FileSystem::ResolveSymbolicLink(b.m_file, resolved_b).Success())
        {
            symlink_result = FileSpec::Compare(resolved_a, resolved_b, true);
            if (symlink_result)
                return symlink_result;
        }
    }
    if (a.m_line < b.m_line)
        return -1;
    else if (a.m_line > b.m_line)
        return 1;
#ifdef LLDB_ENABLE_DECLARATION_COLUMNS
    if (a.m_column < b.m_column)
        return -1;
    else if (a.m_column > b.m_column)
        return 1;
#endif
    return 0;
}

If you're confident that this solves a problem, I can send it as a
code review or something (and set up git-svn, sigh).

I have seen very similar error messages when debugging an application compiled with fission (split/dwo) debug info on Linux with a release version of LLDB compiled from ToT. When I tested the same with a debug or with a release+assert build I hit some assertion inside clang. It might be worth a try to check if the same is happening in your case, as it might help find the root cause.

In my case the issue is that we somehow end up with two FieldDecl objects for a given field inside one of the CXXRecordDecl objects, and then a pointer-based lookup goes wrong. I haven't figured out why it is happening and haven't managed to reproduce it reliably either, but I plan to look into it in the near future if nobody beats me to it.

Tamas

So first, an addendum: I found a way to make the project build without
using a symlink, and use a direct reference instead. The problem still
persists. It may be that symlink is one of the problems, but it is
certainly not the only problem.

int
Declaration::Compare(const Declaration& a, const Declaration& b)
{
   int result = FileSpec::Compare(a.m_file, b.m_file, true);
   if (result)

Wait, won't FileSpec::Compare be true iff a.m_file is the same as
b.m_file (excluding symlink resolution)?

No, it returns -1 for less than, 0 for equal and +1 for greater than.
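To spell that convention out, here is a small standalone sketch of a three-way compare on file and then line. ToyDecl, compareStrings and compareDecls are hypothetical names; the sketch only mirrors the shape of the patch quoted above and does not reproduce FileSpec::Compare itself.

```cpp
#include <string>

// Toy declaration: just a file name and a line. Hypothetical model.
struct ToyDecl {
    std::string file;
    unsigned line;
};

// Three-way string compare normalized to exactly -1, 0 or +1.
int compareStrings(const std::string& a, const std::string& b) {
    int r = a.compare(b);
    return (r > 0) - (r < 0);
}

// Three-way compare on file first, then line.
int compareDecls(const ToyDecl& a, const ToyDecl& b) {
    int result = compareStrings(a.file, b.file);
    if (result)          // non-zero means the files DIFFER, not that they match
        return result;
    if (a.line < b.line)
        return -1;
    if (a.line > b.line)
        return 1;
    return 0;            // same file and same line
}
```

So "if (result)" is the branch taken when the two files are not literally equal, which is exactly where any extra check for symlinked directories would have to go.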

If so, why are we putting the symlink-checking logic in the true branch of the original
FileSpec::Compare?

My concern is that it is expensive to be stat'ing files all the time and that it dirties the file cache on your system.

Also many times the FileSpec objects are remote paths. What happens when you build your project and send someone the dSYM file? You sent me your dSYM file and I have no way to know if your two types were the same since they had different paths and I have no way to resolve your symbolic links. I would consider the types different.

We could modify FileSpec::Compare, but I would like to try to limit this to only happen for symbolic links if we do. We have also had problems with paths like:

/tmp/foo/../bar.txt
/tmp/bar.txt

If they aren't resolved they won't compare correctly. Same with:

/tmp/./bar.txt
/tmp/bar.txt

We don't want to go changing the FileSpec object on people by resolving the path all the time, because sometimes people don't want the path to change: they might have other FileSpec objects that are encoded with "/tmp/./bar.txt", and they will expect a FileSpec object they create to keep what they put into it. So if we can't update the FileSpec objects, our compares would constantly have to "stat()" paths that may or may not have come from the current system. So we actually can't resolve them.

Aren't we expanding the scope of what we match,
instead of narrowing it?

   {
       int symlink_result = result;
       if (a.m_file.GetFilename() == b.m_file.GetFilename())
       {
           // Check if the directories in a and b are symlinks to each other
           FileSpec resolved_a;
           FileSpec resolved_b;
           if (FileSystem::ResolveSymbolicLink(a.m_file, resolved_a).Success() &&
               FileSystem::ResolveSymbolicLink(b.m_file, resolved_b).Success())
           {
               symlink_result = FileSpec::Compare(resolved_a, resolved_b, true);

I'm confused. Shouldn't the logic be "check literal equality; if true,
return immediately; if not, check equality with symlink resolution"?

These are compare routines that return -1, 0 or 1.

           }
       }
       if (symlink_result != 0)
           return symlink_result;
   }
   if (a.m_line < b.m_line)
       return -1;
   else if (a.m_line > b.m_line)
       return 1;
#ifdef LLDB_ENABLE_DECLARATION_COLUMNS
   if (a.m_column < b.m_column)
       return -1;
   else if (a.m_column > b.m_column)
       return 1;
#endif
   return 0;
}
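
The ordering being debated above ("literal compare first; consult the resolved paths only when the literal compare is nonzero") can be sketched as follows. This is a minimal illustration and not LLDB code: LinkTable, ComparePaths, Resolve and CompareWithLinks are hypothetical names, and symlink resolution is replaced by a lookup table so that nothing touches the file system.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for symlink resolution: a lookup table instead of
// stat()/readlink(), so the sketch never touches the file system.
using LinkTable = std::map<std::string, std::string>;

// Three-way compare: -1 for less than, 0 for equal, +1 for greater than.
int ComparePaths(const std::string &a, const std::string &b) {
  int c = a.compare(b);
  return c < 0 ? -1 : (c > 0 ? 1 : 0);
}

// Returns the link target if `path` is in the table, else `path` itself.
std::string Resolve(const LinkTable &links, const std::string &path) {
  auto it = links.find(path);
  return it == links.end() ? path : it->second;
}

// Literal compare first. Only a nonzero result (paths differ literally)
// leads to a second compare on the resolved paths.
int CompareWithLinks(const LinkTable &links, const std::string &a,
                     const std::string &b) {
  int result = ComparePaths(a, b);
  if (result == 0)
    return 0;
  return ComparePaths(Resolve(links, a), Resolve(links, b));
}

void Example() {
  // Made-up paths: the first is treated as a link to the second.
  LinkTable links{{"/sandbox/derived/List.h", "/sandbox/install/List.h"}};
  assert(ComparePaths("/a", "/b") == -1);
  assert(CompareWithLinks(links, "/sandbox/derived/List.h",
                          "/sandbox/install/List.h") == 0);
  assert(CompareWithLinks(links, "/a", "/b") == -1);
}
```

Because a nonzero result means "different", the extra resolution step belongs in the nonzero branch; once the literal compare returns 0 there is nothing left to resolve.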

Here's my version of the patch, although I'm not sure when the code
will be reached.

int
Declaration::Compare(const Declaration& a, const Declaration& b)
{
   int result = FileSpec::Compare(a.m_file, b.m_file, true);
   if (result)
       return result;

The code in the if statement below is useless. If we reach this location, "result" is zero and the two file specs are equal.

   if (a.m_file.GetFilename() == b.m_file.GetFilename()) {
       // Check if one of the directories is a symlink to the other
       int symlink_result = result;
       FileSpec resolved_a;
       FileSpec resolved_b;
       if (FileSystem::ResolveSymbolicLink(a.m_file, resolved_a).Success() &&
           FileSystem::ResolveSymbolicLink(b.m_file, resolved_b).Success())
       {
           symlink_result = FileSpec::Compare(resolved_a, resolved_b, true);
           if (symlink_result)
               return symlink_result;
       }
   }
   if (a.m_line < b.m_line)
       return -1;
   else if (a.m_line > b.m_line)
       return 1;
#ifdef LLDB_ENABLE_DECLARATION_COLUMNS
   if (a.m_column < b.m_column)
       return -1;
   else if (a.m_column > b.m_column)
       return 1;
#endif
   return 0;
}

If you're confident that this solves a problem, I can send it as a
code review or something (and set up git-svn, sigh).

We actually can't really do this, because we might have a dSYM file from another system that we are debugging locally, so we can't rely on symlink resolving. We could try to ignore the path to the file and make the decl file use only the basename. But this can cause problems when you actually have two copies of "Point.h" that have differing versions of "Point", like "struct Point { int x; int y; };" and "struct Point { int XXX; int YYY; };". They would both be defined in a file Point.h and, let's say, both on line 10, but they do differ.

So the code you want to look at to see where the uniquing happens is in DWARFASTParserClang.cpp:

                    if (type_name_const_str &&
                        dwarf->GetUniqueDWARFASTTypeMap().Find (type_name_const_str,
                                                                die,
                                                                decl,
                                                                byte_size_valid ? byte_size : -1,
                                                                *unique_ast_entry_ap))
                    {
                        // We have already parsed this type or from another
                        // compile unit. GCC loves to use the "one definition
                        // rule" which can result in multiple definitions
                        // of the same class over and over in each compile
                        // unit.
                        type_sp = unique_ast_entry_ap->m_type_sp;
                        if (type_sp)
                        {
                            dwarf->GetDIEToType()[die.GetDIE()] = type_sp.get();
                            return type_sp;
                        }
                    }

You can debug why this code fails. It is actually probably calling:

bool
lldb_private::operator == (const Declaration &lhs, const Declaration &rhs)
{
#ifdef LLDB_ENABLE_DECLARATION_COLUMNS
    if (lhs.GetColumn () == rhs.GetColumn ())
        if (lhs.GetLine () == rhs.GetLine ())
            return lhs.GetFile() == rhs.GetFile();
#else
    if (lhs.GetLine () == rhs.GetLine ())
        return FileSpec::Equal(lhs.GetFile(),rhs.GetFile(), true, true);
#endif
    return false;
}

So avoid any fixes to Declaration::Compare(). The FileSpec::Equal() will compare the following two correctly:

/tmp/foo/../bar.txt
/tmp/bar.txt

But probably wouldn't catch:

/tmp/./bar.txt
/tmp/bar.txt

Or your two paths... I would rather not start doing any extra file stat calls on systems since we can't trust the file paths originated on the current system, so I am not sure what the right solution is and would leave things as is.
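
For comparison, the two path pairs above can be made equal by purely lexical normalization, with no stat() call. The sketch below uses C++17 std::filesystem rather than LLDB's FileSpec, so it says nothing about what FileSpec::Equal() actually does; LexicallyEqual is a hypothetical helper name.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Returns true when the two paths are identical after purely lexical
// normalization ("." removed, "dir/.." collapsed). The file system is
// never consulted, so no symlink is followed.
bool LexicallyEqual(const std::string &a, const std::string &b) {
  return fs::path(a).lexically_normal() == fs::path(b).lexically_normal();
}

void Example() {
  assert(LexicallyEqual("/tmp/foo/../bar.txt", "/tmp/bar.txt"));
  assert(LexicallyEqual("/tmp/./bar.txt", "/tmp/bar.txt"));
  // Two genuinely different directories stay different.
  assert(!LexicallyEqual("/sandbox/rramacha/3p/derived/List.h",
                         "/sandbox/rramacha/3p/install/List.h"));
}
```

Lexical normalization works on paths from another machine, but it cannot tell that one directory is a symlink to another, and collapsing "foo/.." gives the wrong answer when "foo" is itself a symlink.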

It sounds like you fixed your symlink issue. So a few questions:
1 - do you have just one type now in your libmwcgir_vm_rt.dylib.dSYM when you type:

(lldb) image lookup -t "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >"

If so, then you will need to find other competing definitions in other shared libraries and see if any of them differ by comparing the full "clang_type" value.

Hi,

....

At least, can we have lldb report a nicer error?

There is conflicting DWARF information for type ilist...:
/sandbox/rramacha/3p/derived/List.h
/sandbox/rramacha/3p/install/List.h

/sandbox/rramacha/idivide/bin/libmwcgir_vm.so is to blame.

This is likely a problem with your build scripts. In any case, the
compiler is responsible for this mess.

It sounds like you fixed your symlink issue. So a few questions:
1 - do you have just one type now in your libmwcgir_vm_rt.dylib.dSYM when you type:

(lldb) image lookup -t "iplist<llvm::Function, llvm::ilist_traits<llvm::Function> >"

If so, then you will need to find other competing definitions in other shared libraries and see if any of them differ by comparing the full "clang_type" value.

Yeah, after resolving the symlink, I realized that there are two
different paths. I'm attempting to fix my build system.

I guess LLDB was just helping you resolve build issues and make your product better... :-)

Let us know how things go once you get your build fixed.

Greg

Okay, I'm stuck again. Let's back up and see what's happening:

~/src$ git clone llvm/
~/src$ mkdir llvm-build/
~/src/llvm-build$ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug ../llvm
~/src/llvm-build$ ninja

Now, ~/src/llvm-build/lib/libLLVMCore.a contains DWARF information
that points to files ~/src/llvm/include/llvm/ADT/ilist.h,
~/src/llvm/lib/IR/Core.cpp etc.

~/src/llvm-build$ ninja install

The *.a files are copied to /usr/local/lib, but the *.h files are also
copied to /usr/local/include/llvm. The DWARF information is not
rewritten as part of the "install".

~/src/fooapp$ clang++ -g -I/usr/local/include -L/usr/local/lib ...

The fooapp binary is going to contain DWARF information pointing to
/usr/local/include/llvm/ADT/ilist.h (because I did -I) _and_
~/src/llvm/include/llvm/ADT/ilist.h (because of libLLVMCore.a).

lldb crashes. gdb hums along just fine in the face of this conflict
(the codebase is enormous; sorry, I couldn't find out how exactly).

Now, I cannot "fix" my build by -I'ing ~/src/llvm/include because some
essential headers are build artifacts. The only thing I can do is to
try and put a plist into the dSYM (which doesn't seem to work either,
or I'm doing something wrong). In the general case, there's nothing
special about my build: this problem needs to be solved in lldb for
the general audience.

Please advise.

Thanks.

Ram

So when LLDB parses the DW_AT_decl_file attributes, it uses the files from the line table for the current compile unit. Each of those files is passed through the module source remapping function:

bool
Module::RemapSourceFile (const char *path, std::string &new_path) const
{
    Mutex::Locker locker (m_mutex);
    return m_source_mappings.RemapPath(path, new_path);
}

So if you have a plist, it should be added to this m_source_mappings list. You might want to debug what is happening by stepping through:

SymbolVendorMacOSX::CreateInstance()

for the dSYM file that is being used by your build. Inside the "if (XMLDocument::XMLEnabled())" statement is where we get the path remappings.

A quick note on the plist files in the dSYM: you must create one plist, named after the UUID, for each architecture slice inside the dSYM bundle:

/tmp/foo.dSYM/Contents/Resources/9FE9CADA-7460-3F80-B881-42443C5FA2E1.plist
/tmp/foo.dSYM/Contents/Resources/DF977301-4A63-32ED-9939-1EE3122D18D4.plist

And an example plist for you would need to look like:

% cat /tmp/foo.dSYM/Contents/Resources/9FE9CADA-7460-3F80-B881-42443C5FA2E1.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>DBGBuildSourcePath</key>
  <string>~/src/llvm/include/llvm</string>
  <key>DBGSourcePath</key>
  <string>/usr/local/include/llvm</string>
</dict>
</plist>
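
The effect such a mapping is meant to have on a DW_AT_decl_file path can be sketched as a plain prefix rewrite. This is a simplified illustration, not the code behind m_source_mappings: PathMappings and this free-standing RemapPath are hypothetical, and matching here is a raw string prefix that ignores path-component boundaries.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified mapping list: ordered pairs of
// (build-time prefix, local prefix), as in DBGBuildSourcePath and
// DBGSourcePath.
using PathMappings = std::vector<std::pair<std::string, std::string>>;

// Rewrites `path` when it starts with one of the build-time prefixes.
// Returns true and fills `new_path` on the first match, false otherwise.
bool RemapPath(const PathMappings &mappings, const std::string &path,
               std::string &new_path) {
  for (const auto &m : mappings) {
    const std::string &from = m.first;
    if (path.compare(0, from.size(), from) == 0) {
      new_path = m.second + path.substr(from.size());
      return true;
    }
  }
  return false;
}

void Example() {
  PathMappings mappings{
      {"~/src/llvm/include/llvm", "/usr/local/include/llvm"}};
  std::string out;
  assert(RemapPath(mappings, "~/src/llvm/include/llvm/ADT/ilist.h", out));
  assert(out == "/usr/local/include/llvm/ADT/ilist.h");
  // Paths outside every build prefix are left alone.
  assert(!RemapPath(mappings, "/opt/other/ilist.h", out));
}
```

If the remapping is applied, both copies of ilist.h end up with the same decl file, which is what the uniquing code compares.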

After speaking with Adrian Prantl over here we came up with a solution. Currently we do this when uniquing types:

// Only try and unique the type if it has a name.
if (type_name_const_str &&
    dwarf->GetUniqueDWARFASTTypeMap().Find (type_name_const_str,
                                            die,
                                            decl,
                                            byte_size_valid ? byte_size : -1,
                                            *unique_ast_entry_ap))
{
    // We have already parsed this type or from another
    // compile unit. GCC loves to use the "one definition
    // rule" which can result in multiple definitions
    // of the same class over and over in each compile
    // unit.
    type_sp = unique_ast_entry_ap->m_type_sp;
    if (type_sp)
    {
        dwarf->GetDIEToType()[die.GetDIE()] = type_sp.get();
        return type_sp;
    }
}

We try to find a type by name and Declaration. We need to do this for all languages except C++, because C++ has the one definition rule, under which a type must be unique in a decl context (all types llvm::foo::bar must be the same). Since this only applies to C++, we could do something like:

if (type_name_const_str)
{
  LanguageType die_language = die.GetLanguage();
  bool handled = false;
  if (Language::LanguageIsCPlusPlus(die_language))
  {
    std::string qualified_name;
    if (die.GetQualifiedName (qualified_name))
    {
      handled = true;
      ConstString const_qualified_name(qualified_name);
      if (dwarf->GetUniqueDWARFASTTypeMap().Find (const_qualified_name,
                                                  die,
                                                  Declaration(),
                                                  byte_size_valid ? byte_size : -1,
                                                  *unique_ast_entry_ap))
      {
        type_sp = unique_ast_entry_ap->m_type_sp;
        if (type_sp)
        {
          dwarf->GetDIEToType()[die.GetDIE()] = type_sp.get();
          return type_sp;
        }
      }
    }
  }
  
  if (!handled)
  {
    if (dwarf->GetUniqueDWARFASTTypeMap().Find (type_name_const_str,
                                                die,
                                                decl,
                                                byte_size_valid ? byte_size : -1,
                                                *unique_ast_entry_ap))
    {
      type_sp = unique_ast_entry_ap->m_type_sp;
      if (type_sp)
      {
        dwarf->GetDIEToType()[die.GetDIE()] = type_sp.get();
        return type_sp;
      }
    }
  }
}

Note that for C++ we get the fully qualified name and we pass in an empty Declaration() so they all will compare to the same thing. This would solve our current issue. We would also need to add the items to this map in the same way: for C++ get the fully qualified name and add the entry to the map with the fully qualified name and an empty Declaration...
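
To make the lookup and the insertion sides concrete, here is a small self-contained model of the idea. It is only a sketch under assumptions: Decl, UniqueTypeMap and UniqueType are hypothetical stand-ins for the real LLDB classes, types are represented by integer ids, the line number 10 is made up, and the byte size and DIE arguments are left out.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Simplified stand-in for a declaration: file and line only.
struct Decl {
  std::string file;
  int line;
  bool operator<(const Decl &o) const {
    return std::tie(file, line) < std::tie(o.file, o.line);
  }
};

// Hypothetical unique type map keyed on (name, declaration).
class UniqueTypeMap {
public:
  // Returns the id already stored for this key, or stores and returns
  // new_id when the key is not present yet.
  int FindOrInsert(const std::string &name, const Decl &decl, int new_id) {
    auto result = m_types.emplace(std::make_pair(name, decl), new_id);
    return result.first->second;
  }

private:
  std::map<std::pair<std::string, Decl>, int> m_types;
};

// For C++, key on the qualified name with an empty Decl, for lookup and
// insertion alike. Other languages keep name + decl.
int UniqueType(UniqueTypeMap &map, bool is_cplusplus,
               const std::string &name, const std::string &qualified_name,
               const Decl &decl, int new_id) {
  if (is_cplusplus && !qualified_name.empty())
    return map.FindOrInsert(qualified_name, Decl(), new_id);
  return map.FindOrInsert(name, decl, new_id);
}

void Example() {
  UniqueTypeMap map;
  Decl build{"~/src/llvm/include/llvm/ADT/ilist.h", 10};
  Decl installed{"/usr/local/include/llvm/ADT/ilist.h", 10};
  // C++: same qualified name, different decl files, one entry.
  assert(UniqueType(map, true, "iplist", "llvm::iplist", build, 1) == 1);
  assert(UniqueType(map, true, "iplist", "llvm::iplist", installed, 2) == 1);
  // Non-C++: the differing decl file keeps the entries apart.
  assert(UniqueType(map, false, "Point", "", build, 3) == 3);
  assert(UniqueType(map, false, "Point", "", installed, 4) == 4);
}
```

The point of the model is that both sides must agree on the key: if entries were still inserted under the short name and the real Declaration, the qualified-name lookup would never find them.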

Can you try this solution out and see if it fixes our issues?

Greg

Greg,

Note that for C++ we get the fully qualified name and we pass in an empty Declaration() so they all will compare to the same thing. This would solve our current issue. We would also need to add the items to this map in the same way: for C++ get the fully qualified name and add the entry to the map with the fully qualified name and an empty Declaration...

I didn't realize that the solution would be this simple, conceptually.

Can you try this solution out and see if it fixes our issues?

This does seem to work for one library but not another.

My best guess would be that the solution works when dynamically
linking to conflicting symbols, but not when the conflicting symbols
are statically linked. Does that make sense?