DWARFASTParserClang and DW_TAG_typedef for anonymous structs

Hi All

I'm hoping that someone might be able to give me some direction
regarding `Type` resolution from DWARF information for functions taking
anonymous structs hidden behind a typedef.

e.g.

typedef struct {
    int i;
    float f;
} my_untagged_struct;

void __attribute__((noinline)) myfunc(my_untagged_struct *s)
{
    s->i = 0;
    s->f = 3.14f;
}

int main()
{
    my_untagged_struct s;
    myfunc(&s);
    return 0;
}

I [recently reported a
bug](https://llvm.org/bugs/show_bug.cgi?id=26790) relating to the
clang expression evaluator no longer being able to resolve calls to
functions with arguments to typedefed anonymous structs, after a cleanup
to the expression parsing code.
I was perfectly wrong in my assumptions about the cause of the bug, and
after some more digging, I think I've tracked it down to a section of
code in `DWARFASTParserClang::ParseTypeFromDWARF`.

(DWARFASTParserClang::ParseTypeFromDWARF:254)

switch (tag)
{
    case DW_TAG_typedef:
        // Try to parse a typedef from the DWO file first as modules
        // can contain typedef'ed structures that have no names like:
        //
        //  typedef struct { int a; } Foo;
        //
        // In this case we will have a structure with no name and a
        // typedef named "Foo" that points to this unnamed structure.
        // The name in the typedef is the only identifier for the struct,
        // so always try to get typedefs from DWO files if possible.
        //
        // The type_sp returned will be empty if the typedef doesn't exist
        // in a DWO file, so it is cheap to call this function just to check.
        //
        // If we don't do this we end up creating a TypeSP that says this
        // is a typedef to type 0x123 (the DW_AT_type value would be 0x123
        // in the DW_TAG_typedef), and this is the unnamed structure type.
        // We will have a hard time tracking down an unnamed structure
        // type in the module DWO file, so we make sure we don't get into
        // this situation by always resolving typedefs from the DWO file.
        type_sp = ParseTypeFromDWO(die, log);
        if (type_sp)
            return type_sp;
    LLVM_FALLTHROUGH

In my case, the type information for the typedef is included within the
main executable's DWARF rather than an external .dwo file (a snippet from
the DWARF is included at the end of this message), and therefore the `case`
for `DW_TAG_typedef` falls through, as `ParseTypeFromDWO` returns a NULL
value.

As this is code I'm not familiar with, I'd appreciate it if anyone on the
list was able to give some guidance as to the best way to resolve this
issue, so that `ClangExpressionDeclMap::FindExternalVisibleDecls` can
correctly resolve calls to functions taking typedef names to anonymous
structs. I'm happy to take a whack at implementing this feature, but
I'm a bit stuck as to how to resolve this type given the current DIE
object.

Any help or guidance on where to start with this would be really
helpful.

All the best

Luke

So we ran into a problem where we had anonymous structs in modules. They have no name, so we had no way to say "module A, please give me a struct named... nothing in the namespace 'foo'". Obviously this doesn't work, so we always check first whether a typedef comes from a module, by trying to get the typedef from the DWO file:

type_sp = ParseTypeFromDWO(die, log);

If this fails, it just means we have the typedef in hand. If I compile your example I end up with:

0x0000000b: TAG_compile_unit [1] *
             AT_producer( "Apple LLVM version 8.0.0 (clang-800.0.5.3)" )
             AT_language( DW_LANG_C99 )
             AT_name( "main.c" )
             AT_stmt_list( 0x00000000 )
             AT_comp_dir( "/tmp" )
             AT_low_pc( 0x0000000100000f60 )
             AT_high_pc( 0x0000000100000fb0 )

0x0000002e: TAG_subprogram [2] *
                 AT_low_pc( 0x0000000100000f60 )
                 AT_high_pc( 0x0000000100000f85 )
                 AT_frame_base( rbp )
                 AT_name( "myfunc" )
                 AT_decl_file( "/private/tmp/main.c" )
                 AT_decl_line( 6 )
                 AT_prototyped( 0x01 )
                 AT_external( 0x01 )

0x00000049: TAG_formal_parameter [3]
                     AT_location( fbreg -8 )
                     AT_name( "s" )
                     AT_decl_file( "/private/tmp/main.c" )
                     AT_decl_line( 6 )
                     AT_type( {0x0000008c} ( my_untagged_struct* ) )

0x00000057: NULL

0x00000058: TAG_subprogram [4] *
                 AT_low_pc( 0x0000000100000f90 )
                 AT_high_pc( 0x0000000100000fb0 )
                 AT_frame_base( rbp )
                 AT_name( "main" )
                 AT_decl_file( "/private/tmp/main.c" )
                 AT_decl_line( 12 )
                 AT_type( {0x00000085} ( int ) )
                 AT_external( 0x01 )

0x00000076: TAG_variable [5]
                     AT_location( fbreg -16 )
                     AT_name( "s" )
                     AT_decl_file( "/private/tmp/main.c" )
                     AT_decl_line( 14 )
                     AT_type( {0x00000091} ( my_untagged_struct ) )

0x00000084: NULL

0x00000085: TAG_base_type [6]
                 AT_name( "int" )
                 AT_encoding( DW_ATE_signed )
                 AT_byte_size( 0x04 )

0x0000008c: TAG_pointer_type [7]
                 AT_type( {0x00000091} ( my_untagged_struct ) )

0x00000091: TAG_typedef [8]
                 AT_type( {0x0000009c} ( struct ) )
                 AT_name( "my_untagged_struct" )
                 AT_decl_file( "/private/tmp/main.c" )
                 AT_decl_line( 4 )

0x0000009c: TAG_structure_type [9] *
                 AT_byte_size( 0x08 )
                 AT_decl_file( "/private/tmp/main.c" )
                 AT_decl_line( 1 )

0x000000a0: TAG_member [10]
                     AT_name( "i" )
                     AT_type( {0x00000085} ( int ) )
                     AT_decl_file( "/private/tmp/main.c" )
                     AT_decl_line( 2 )
                     AT_data_member_location( +0 )

0x000000ae: TAG_member [10]
                     AT_name( "f" )
                     AT_type( {0x000000bd} ( float ) )
                     AT_decl_file( "/private/tmp/main.c" )
                     AT_decl_line( 3 )
                     AT_data_member_location( +4 )

0x000000bc: NULL

0x000000bd: TAG_base_type [6]
                 AT_name( "float" )
                 AT_encoding( DW_ATE_float )
                 AT_byte_size( 0x04 )

0x000000c4: NULL

Note that the typedef is at 0x00000091, and it is a typedef to 0x0000009c. Also note that the DWARF DIE at 0x0000009c is a complete definition as it has children describing its members and 0x0000009c doesn't have a DW_AT_declaration(1) attribute. Is this how your DWARF looks for your stuff? The DWARF you had looked like:

0x0000005c: DW_TAG_typedef [6]
               DW_AT_name( "my_untagged_struct" )
               DW_AT_decl_file("/home/luke/main.cpp")
               DW_AT_decl_line(4)
               DW_AT_type({0x0000002d})

What did the type at 0x0000002d look like? Similar to 0x0000009c in my DWARF I presume?
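For illustration, the "complete definition" test described above (the DIE has children describing its members and carries no DW_AT_declaration attribute) could be sketched as follows. `SketchDie` and `is_complete_definition` are hypothetical names made up for this sketch, not LLDB APIs, and the check only mirrors the two criteria named in the post.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a DWARF DIE (not an LLDB class).
struct SketchDie {
    std::uint64_t offset;                      // DIE offset, e.g. 0x9c
    bool has_declaration_attr;                 // true if DW_AT_declaration(1) is present
    std::vector<std::uint64_t> child_offsets;  // offsets of child DIEs (members)
};

// Assumption: "complete" means what the post describes, i.e. the DIE has
// member children and is not marked as a declaration.
bool is_complete_definition(const SketchDie &die) {
    return !die.has_declaration_attr && !die.child_offsets.empty();
}
```

With the dump above, the structure at 0x0000009c (children at 0x000000a0 and 0x000000ae, no DW_AT_declaration) would count as complete under this check.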

The DWARFASTParserClang class is responsible for making up a clang type in the clang::ASTContext for this typedef. What will happen in the code where the flow falls through is that we will make a lldb_private::Type that says "I am a typedef to type whose user ID is 0x0000002d (in your example)". A NULL pointer should not be returned from the DWARFASTParserClang::ParseTypeFromDWARF() function. If it is, please step through and figure out why. I compiled your example and did the following:

% lldb a.out
(lldb) b main
(lldb) r
Process 89808 launched: '/private/tmp/a.out' (x86_64)
Process 89808 stopped
* thread #1: tid = 0xf7473, 0x0000000100000fa3 a.out main + 19, stop reason = breakpoint 1.1, queue = com.apple.main-thread
    frame #0: 0x0000000100000fa3 a.out main + 19 at main.c:15
   12 int main()
   13 {
   14 my_untagged_struct s;
-> 15 myfunc(&s);
   16 return 0;
   17 }
(lldb) p myfunc(&s)
(lldb)

So I was able to call this function. Are you not able to call it?

Likewise if I step into this function I can see the variable:

(lldb) s
(lldb) fr var s
(my_untagged_struct *) s = 0x00007fff5fbff8d0
(lldb) fr var *s
(my_untagged_struct) *s = (i = 0, f = 3.1400001)

So to sum up: when we parse the DW_TAG_typedef in DWARFASTParserClang::ParseTypeFromDWARF(), we should return a valid TypeSP that contains a valid pointer. If that isn't happening, that is a bug. Feel free to send me the example binary and I can figure things out if you have any trouble. I wrote all of this code so I am quite familiar with it.
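As a rough sketch of the control flow summarized above, the typedef case can be modelled as "try the DWO file first, otherwise build a typedef that refers to the underlying type by its DIE offset". Everything here (`FakeDie`, `FakeType`, `parse_typedef_from_dwo`, `parse_typedef`) is a hypothetical stand-in written for this illustration; it is not the LLDB implementation and only mirrors the behaviour described in this thread.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-ins for LLDB's DIE and Type classes.
struct FakeDie {
    std::string name;        // DW_AT_name of the typedef
    std::uint64_t type_ref;  // DW_AT_type: offset of the underlying type DIE
    bool in_dwo;             // true if a matching typedef exists in a DWO file
};

struct FakeType {
    std::string name;
    std::uint64_t underlying_id;  // "typedef to type whose user ID is ..."
    bool from_dwo;
};
using FakeTypeSP = std::shared_ptr<FakeType>;

// Models ParseTypeFromDWO: an empty pointer means "no DWO copy".
FakeTypeSP parse_typedef_from_dwo(const FakeDie &die) {
    if (!die.in_dwo)
        return nullptr;
    return std::make_shared<FakeType>(FakeType{die.name, die.type_ref, true});
}

// Models the DW_TAG_typedef case: try the DWO file, then fall through and
// still return a valid (non-null) type that references the underlying DIE.
FakeTypeSP parse_typedef(const FakeDie &die) {
    if (FakeTypeSP type_sp = parse_typedef_from_dwo(die))
        return type_sp;
    return std::make_shared<FakeType>(FakeType{die.name, die.type_ref, false});
}
```

In this model, a typedef with no DWO copy still yields a non-null result, which is the invariant the summary above asks to be checked in a debugger.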

Greg Clayton

Hi Greg

First of all thanks for taking the time to help out with this.

What did the type at 0x0000002d look like? Similar to 0x0000009c in my DWARF I presume?

In the case of C89/C99, yes, but regrettably, when you compile my example as C++ or use __attribute__((overloadable)), the DWARF does not include the DW_AT_name for the typedef in the formal parameter[0] of myfunc:

COMPILE_UNIT<header overall offset = 0x00000000>:
< 0><0x0000000b> DW_TAG_compile_unit
                     DW_AT_producer "GNU C++ 4.8.4 -mtune=generic -march=x86-64 -g -fstack-protector"
                     DW_AT_language DW_LANG_C_plus_plus
                     DW_AT_name "main.cpp"
                     DW_AT_comp_dir "/tmp"
                     DW_AT_low_pc 0x004004ed
                     DW_AT_high_pc <offset-from-lowpc>60
                     DW_AT_stmt_list 0x00000000

LOCAL_SYMBOLS:
< 1><0x0000002d> DW_TAG_structure_type
                       DW_AT_byte_size 0x00000008
                       DW_AT_decl_file 0x00000001 /tmp/main.cpp
                       DW_AT_decl_line 0x00000001
                       DW_AT_linkage_name "18my_untagged_struct"
                       DW_AT_sibling <0x0000004e>
< 2><0x00000039> DW_TAG_member
                         DW_AT_name "i"
                         DW_AT_decl_file 0x00000001 /tmp/main.cpp
                         DW_AT_decl_line 0x00000002
                         DW_AT_type <0x0000004e>
                         DW_AT_data_member_location 0
< 2><0x00000043> DW_TAG_member
                         DW_AT_name "f"
                         DW_AT_decl_file 0x00000001 /tmp/main.cpp
                         DW_AT_decl_line 0x00000003
                         DW_AT_type <0x00000055>
                         DW_AT_data_member_location 4
< 1><0x0000004e> DW_TAG_base_type
                       DW_AT_byte_size 0x00000004
                       DW_AT_encoding DW_ATE_signed
                       DW_AT_name "int"
< 1><0x00000055> DW_TAG_base_type
                       DW_AT_byte_size 0x00000004
                       DW_AT_encoding DW_ATE_float
                       DW_AT_name "float"
< 1><0x0000005c> DW_TAG_typedef
                       DW_AT_name "my_untagged_struct"
                       DW_AT_decl_file 0x00000001 /tmp/main.cpp
                       DW_AT_decl_line 0x00000004
                       DW_AT_type <0x0000002d>
< 1><0x00000067> DW_TAG_subprogram
                       DW_AT_external yes(1)
                       DW_AT_name "myfunc"
                       DW_AT_decl_file 0x00000001 /tmp/main.cpp
                       DW_AT_decl_line 0x00000006
                       DW_AT_linkage_name "_Z6myfuncP18my_untagged_struct"
                       DW_AT_low_pc 0x004004ed
                       DW_AT_high_pc <offset-from-lowpc>33
                       DW_AT_frame_base len 0x0001: 9c: DW_OP_call_frame_cfa
                       DW_AT_GNU_all_call_sites yes(1)
                       DW_AT_sibling <0x00000095>
< 2><0x00000088> DW_TAG_formal_parameter
                         DW_AT_name "s"
                         DW_AT_decl_file 0x00000001 /tmp/main.cpp
                         DW_AT_decl_line 0x00000006
                         DW_AT_type <0x00000095>
                         DW_AT_location len 0x0002: 9168: DW_OP_fbreg -24
< 1><0x00000095> DW_TAG_pointer_type
                       DW_AT_byte_size 0x00000008
                       DW_AT_type <0x0000005c>
< 1><0x0000009b> DW_TAG_subprogram
                       DW_AT_external yes(1)
                       DW_AT_name "main"
                       DW_AT_decl_file 0x00000001 /tmp/main.cpp
                       DW_AT_decl_line 0x0000000c
                       DW_AT_type <0x0000004e>
                       DW_AT_low_pc 0x0040050e
                       DW_AT_high_pc <offset-from-lowpc>27
                       DW_AT_frame_base len 0x0001: 9c: DW_OP_call_frame_cfa
                       DW_AT_GNU_all_tail_call_sites yes(1)
< 2><0x000000b8> DW_TAG_lexical_block
                         DW_AT_low_pc 0x00400516
                         DW_AT_high_pc <offset-from-lowpc>17
< 3><0x000000c9> DW_TAG_variable
                           DW_AT_name "s"
                           DW_AT_decl_file 0x00000001 /tmp/main.cpp
                           DW_AT_decl_line 0x0000000e
                           DW_AT_type <0x0000005c>
                           DW_AT_location len 0x0002: 9160: DW_OP_fbreg -32

So I was able to call this function. Are you not able to call it?

I tried compiling with standard C99, and as you note, this works fine; however, C++ fails:

$ lldb a.out -o 'b 15' -o 'process launch'
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) b 15
Breakpoint 1: where = a.out`main + 8 at main.cpp:15, address = 0x0000000000400516
(lldb) process launch
Process 18718 stopped
* thread #1: tid = 18718, 0x0000000000400516 a.out`main + 8 at main.cpp:15, name = 'a.out', stop reason = breakpoint 1.1
     frame #0: 0x0000000000400516 a.out`main + 8 at main.cpp:15

Process 18718 launched: '/tmp/a.out' (x86_64)
(lldb) expr myfunc(&s)
error: Couldn't lookup symbols:
   myfunc($_0*)
(lldb)

Likewise if I step into this function I can see the variable:

(lldb) s
(lldb) fr var s
(my_untagged_struct *) s = 0x00007fff5fbff8d0
(lldb) fr var *s
(my_untagged_struct) *s = (i = 0, f = 3.1400001)

This does indeed seem to work:

(lldb) s
Process 18769 stopped
* thread #1: tid = 18769, 0x00000000004004f5 a.out`myfunc(s=0x00007fffffffe2b0) + 8 at main.cpp:8, name = 'a.out', stop reason = step in
frame #0: 0x00000000004004f5 a.out`myfunc(s=0x00007fffffffe2b0) + 8 at main.cpp:8
(lldb) fr var s
(my_untagged_struct *) s = 0x00007fffffffe2b0
(lldb) fr var *s
(my_untagged_struct) *s = (i = -7264, f = 0.0000000000000000000000000000000000000000459163468)
(lldb)

So to sum up: when we parse the DW_TAG_typedef in DWARFASTParserClang::ParseTypeFromDWARF(), we should return a valid TypeSP that contains a valid pointer. If that isn't happening, that is a bug. Feel free to send me the example binary and I can figure things out if you have any trouble. I wrote all of this code so I am quite familiar with it.

I've confirmed you're absolutely right about returning a non-null TypeSP after fallthrough in DWARFASTParserClang::ParseTypeFromDWARF, but it seems that because the struct has an empty name, clang cannot resolve the type, and the lookup of the mangled function fails because the type name is wrong (_Z6myfuncP3$_0).
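To make the mismatch concrete, here is a minimal sketch of how the two mangled names arise under the Itanium C++ ABI for a global function taking a single pointer to a global class type. The helpers `source_name` and `mangle_pointer_param` are hypothetical and cover only this one shape of signature; they are not a general mangler and not what clang or LLDB call internally.

```cpp
#include <string>

// Itanium <source-name>: decimal length followed by the identifier.
std::string source_name(const std::string &id) {
    return std::to_string(id.size()) + id;
}

// Mangles `void func(type *)` for a global function and a global,
// non-template class type: _Z <function name> P <pointee type name>.
std::string mangle_pointer_param(const std::string &func,
                                 const std::string &type) {
    return "_Z" + source_name(func) + "P" + source_name(type);
}
```

Passing "my_untagged_struct" as the type gives the DW_AT_linkage_name recorded in the DWARF above, while passing the made-up anonymous name "$_0" gives the symbol the expression evaluator is actually searching for.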

A colleague took a look at this today, and as a quick sanity test, threw together this hack:

--- a/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -553,6 +553,19 @@ DWARFASTParserClang::ParseTypeFromDWARF (const SymbolContext& sc,
                          }
                      }

+                    {
+                        uint32_t list_size = type_list->GetSize();
+                        for (uint32_t i = 0; i < list_size; ++i)
+                        {
+                            TypeSP t = type_list->GetTypeAtIndex(i);
+                            if (t->IsTypedef())
+                            {
+                                type_name_const_str = t->GetName();
+                                type_name_cstr = t->GetName().AsCString();
+                            }
+                        }
+                    }

It seems to fix our problem here and expression evaluation works again for the presented case, but unfortunately, a few other tests break, which is a little frustrating. If you have time to take another look at why this might be the case, it'd be very much appreciated.

I've attached an example Mac binary of this issue in action, built with an older Apple clang++ (it's simply the test above), but the result is the same for me on Linux with upstream clang++ and g++5.3, so I don't think the age of the compiler is a problem here.

Thanks again

Luke

mac-expr-anon-struct-example.tar.gz (20.5 KB)

Thanks for the example, this is indeed a new regression. It used to work (Xcode 7.2), but with top of tree it doesn't. Sean Callanan recently pulled out a bunch of workarounds we used to have in the expression/JIT so that we can avoid certain issues that were caused by said workarounds, and that is causing problems now. I looked at the old expression parser and it was still making up the name _Z6myfuncP3$_0, but it would first try the mangled name, and if it didn't find that, it would fall back to just looking for the demangled basename ("myfunc"). We removed this workaround because we are now trying to be more correct, and that caused this regression. Sean Callanan will take a look at this and get a fix for it sometime soon.

What is probably happening is that we are removing the typedef sugar from the function arguments somewhere that we shouldn't be (like maybe in the clang::ASTImporter or our lldb_private::ClangASTImporter). We should be trying to look up the mangled name "_Z6myfuncP18my_untagged_struct", but somehow when we go to look up the type we have lost the my_untagged_struct and are looking for an anonymous struct "$_0" instead.
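The old lookup order described above (mangled name first, then the demangled basename) can be sketched roughly like this. The `SymbolTable` alias and both functions are hypothetical names for this illustration; the real expression parser is considerably more involved.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical symbol table: name -> address.
using SymbolTable = std::unordered_map<std::string, std::uint64_t>;

// Strict lookup: only an exact mangled-name match succeeds.
std::optional<std::uint64_t> lookup_exact(const SymbolTable &by_mangled,
                                          const std::string &mangled) {
    auto it = by_mangled.find(mangled);
    if (it == by_mangled.end())
        return std::nullopt;
    return it->second;
}

// Old behaviour as described: try the mangled name, then fall back to the
// demangled basename if the mangled name is not found.
std::optional<std::uint64_t> lookup_with_fallback(const SymbolTable &by_mangled,
                                                  const SymbolTable &by_basename,
                                                  const std::string &mangled,
                                                  const std::string &basename) {
    if (auto addr = lookup_exact(by_mangled, mangled))
        return addr;
    auto it = by_basename.find(basename);
    if (it == by_basename.end())
        return std::nullopt;
    return it->second;
}
```

With only "_Z6myfuncP18my_untagged_struct" in the mangled table, a strict lookup of "_Z6myfuncP3$_0" fails, whereas the fallback still finds "myfunc". That matches the difference in behaviour between the old and new expression parser reported in this thread.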

Greg Clayton

Please file a bug for this and I will relate it to our internal Apple bug that tracks this issue.