LTO debug info

Hello,

We noticed an issue with dwarf parsing when debugging a large application compiled with LTO.

Since an LTO application’s debug info can have inter-compile-unit references, (for example: a type defined in one compile-unit and used in another) we noticed that the “m_type_index” vector in SymbolFileDWARF can get corrupted. This is because many index structures in SymbolFileDWARF are generated in parallel.

Is is a bug or is it expected to be the compiler’s/linker’s job to generate self-contained DWARF data without inter-CU references?

-Philippe

TEach SymbolFileDWARF should generate its own accelerator tables in a self contained way. No SymbolFileDWARF should be adding references to their accelerator maps that actually refer to DIEs from another SymbolFileDWARF. We might need to look at the indexing function that make sure if they see and DW_FORM_ref_addr attributes, that they don't add any of them to their index. That seems like the bug?

Greg

What made me suspect a data corruption issue were asserts triggered in the VC++ vector implementation when we use an LTO binary with a DEBUG lldb build on Windows.

   For example,
   at source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.cpp:624,

   File: C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\INCLUDE\vector
   Line: 240
   Expression: vector iterators incompatible

Digging deeper, the assertion is triggered by accesses to DWARFCompileUnit::m_die_array

In the 'parser_fn' lambda in SymbolFileDWARF::Index() that indexes each compile unit in parallel, DWARFCompileUnit::ExtractDIEsIfNeeded(..) , DWARFCompileUnit::Index(..) and DWARFCompileUnit::ClearDIEs() are called in sequence.

'ExtractDIEsIfNeeded' and 'ClearDIEs' respectively populates and clears DWARFCompileUnit::m_die_array.

But when DWARFCompileUnit::Index(..) looks up a DIE in another CU and that CU is running 'ExtractDIEsIfNeeded' or 'ClearDIEs', it will access a stale or invalid DWARFCompileUnit::m_die_array ...

-Philippe

What made me suspect a data corruption issue were asserts triggered in the VC++ vector implementation when we use an LTO binary with a DEBUG lldb build on Windows.

  For example,
  at source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.cpp:624,

  File: C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\INCLUDE\vector
  Line: 240
  Expression: vector iterators incompatible

Digging deeper, the assertion is triggered by accesses to DWARFCompileUnit::m_die_array

In the 'parser_fn' lambda in SymbolFileDWARF::Index() that indexes each compile unit in parallel, DWARFCompileUnit::ExtractDIEsIfNeeded(..) , DWARFCompileUnit::Index(..) and DWARFCompileUnit::ClearDIEs() are called in sequence.

'ExtractDIEsIfNeeded' and 'ClearDIEs' respectively populates and clears DWARFCompileUnit::m_die_array.

But when DWARFCompileUnit::Index(..) looks up a DIE in another CU and that CU is running 'ExtractDIEsIfNeeded' or 'ClearDIEs', it will access a stale or invalid DWARFCompileUnit::m_die_array ...

So the main question is why is anything that is indexing the current CU only, accessing anything from another CU? It can't and shouldn't. That is the bug.

So the main question is why is anything that is indexing the current CU only, accessing anything from another CU? It can't and shouldn't. That is the bug.

Indexing is accessing another CU because there is a reference (offset) to another CU.

For example, in the 2nd compile unit below, the DW_AT_type of DW_TAG_formal_parameter at offset 0x1772 is referencing the 1st compile unit (offset 0x1741).

Compilation Unit @ offset 0x16f9:
   Length: 0x4c (32-bit)
   Version: 4
   Abbrev Offset: 0x763
   Pointer Size: 4
<0><1704>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <1705> DW_AT_producer : (indirect string, offset: 0x0): clang version 3.9.0 (trunk)
    <1709> DW_AT_language : 12 (ANSI C99)
    <170b> DW_AT_name : (indirect string, offset: 0x4be): main.c
    <170f> DW_AT_stmt_list : 0x1329
    <1713> DW_AT_comp_dir : (indirect string, offset: 0x4c5): C:\Users\phlav\Documents\opus studio 2013\Projects\OpusProject9
    <1717> Unknown AT value: 3fe1: 1
    <1717> DW_AT_low_pc : 0x400000
    <171b> DW_AT_high_pc : 0x8
<1><171f>: Abbrev Number: 2 (DW_TAG_subprogram)
    <1720> DW_AT_low_pc : 0x400000
    <1724> DW_AT_high_pc : 0x8
    <1728> Unknown AT value: 3fe7: 1
    <1728> DW_AT_frame_base : 1 byte block: 51 (DW_OP_reg1 (r1))
    <172a> DW_AT_name : (indirect string, offset: 0xab): main
    <172e> DW_AT_decl_file : 1
    <172f> DW_AT_decl_line : 4
    <1730> DW_AT_type : <0x1741>
    <1734> DW_AT_external : 1
    <1734> Unknown AT value: 3fe1: 1
<2><1734>: Abbrev Number: 3 (DW_TAG_variable)
    <1735> DW_AT_const_value : 55
    <1736> DW_AT_name : (indirect string, offset: 0xd9): x
    <173a> DW_AT_decl_file : 1
    <173b> DW_AT_decl_line : 7
    <173c> DW_AT_type : <0x1741>
<1><1741>: Abbrev Number: 4 (DW_TAG_base_type)
    <1742> DW_AT_name : (indirect string, offset: 0x8f): int
    <1746> DW_AT_encoding : 5 (signed)
    <1747> DW_AT_byte_size : 4

  Compilation Unit @ offset 0x1749:
   Length: 0x32 (32-bit)
   Version: 4
   Abbrev Offset: 0x763
   Pointer Size: 4
<0><1754>: Abbrev Number: 5 (DW_TAG_compile_unit)
    <1755> DW_AT_producer : (indirect string, offset: 0x0): clang version 3.9.0 (trunk)
    <1759> DW_AT_language : 12 (ANSI C99)
    <175b> DW_AT_name : (indirect string, offset: 0x505): Source1.c
    <175f> DW_AT_stmt_list : 0x1362
    <1763> DW_AT_comp_dir : (indirect string, offset: 0x50f): C:\Users\phlav\Documents\opus studio 2013\Projects\OpusLibrary
    <1767> Unknown AT value: 3fe1: 1
<1><1767>: Abbrev Number: 6 (DW_TAG_subprogram)
    <1768> DW_AT_name : (indirect string, offset: 0x54e): lib
    <176c> DW_AT_decl_file : 1
    <176d> DW_AT_decl_line : 3
    <176e> DW_AT_prototyped : 1
    <176e> DW_AT_type : <0x1741>
    <1772> DW_AT_external : 1
    <1772> Unknown AT value: 3fe1: 1
<2><1772>: Abbrev Number: 7 (DW_TAG_formal_parameter)
    <1773> DW_AT_name : (indirect string, offset: 0xd9): x
    <1777> DW_AT_decl_file : 1
    <1778> DW_AT_decl_line : 3
    <1779> DW_AT_type : <0x1741>

I don't know if this is related or not, but on OS X it seems that using LTO causes debug info to be lost altogether (though IIRC binaries *are* significantly larger).

R.

The code accessing another CU while indexing is in DWARFCompileUnit::GetDIE.
If the DIE is not in the current CU, it finds the CU that has it and calls its ::GetDIE method, accessing its m_die_array data...

So the question is: is it a producer (CU should be self-contained) or a consumer (the indexing) bug ?

//----------------------------------------------------------------------
// GetDIE()
//
// Get the DIE (Debug Information Entry) with the specified offset by
// first checking if the DIE is contained within this compile unit and
// grabbing the DIE from this compile unit. Otherwise we grab the DIE
// from the DWARF file.
//----------------------------------------------------------------------
DWARFDIE
DWARFCompileUnit::GetDIE (dw_offset_t die_offset)
{
    if (die_offset != DW_INVALID_OFFSET)
    {
        if (m_dwo_symbol_file)
            return m_dwo_symbol_file->GetCompileUnit()->GetDIE(die_offset);

        if (ContainsDIEOffset(die_offset))
        {
            ExtractDIEsIfNeeded (false);
            DWARFDebugInfoEntry::iterator end = m_die_array.end();
            DWARFDebugInfoEntry::iterator pos = lower_bound(m_die_array.begin(), end, die_offset, CompareDIEOffset);
            if (pos != end)
            {
                if (die_offset == (*pos).GetOffset())
                    return DWARFDIE(this, &(*pos));
            }
        }
        else
        {
            // Don't specify the compile unit offset as we don't know it because the DIE belongs to
            // a different compile unit in the same symbol file.
            return m_dwarf2Data->DebugInfo()->GetDIEForDIEOffset(die_offset);
        }
    }
    return DWARFDIE(); // Not found
}

The indexing needs to be fixed. We probably need to start checking the DW_FORM of an attribute before we try to get the DIE for it. So the problem seems to be in DWARFCompileUnit::IndexPrivate() where when the tag is a DW_TAG_subprogram it eventually does:

if (specification_die_form.IsValid())
{
    DWARFDIE specification_die = dwarf_cu->GetSymbolFileDWARF()->DebugInfo()->GetDIE (DIERef(specification_die_form));
    if (specification_die.GetParent().IsStructOrClass())
        is_method = true;
}

This is the only call to GetDIE in the entire function. It is getting the DIE so it can look to see who the parent it to see if the parent of the function is a struct/class, which means it is a method, or anything else and the function is just the basename of a raw function. So it seems like this should be OK to access other DIEs as it isn't adding any information from the other DWARF file, it is just using it to check who the parent of function (DW_TAG_subprogram) is.

So the real problem is that "ClearDIEs()" is being called too soon before everyone is done using the DIEs. The reason this was being done is that you might have 10000 compile units and we want to partially parse the the DWARF and only use what we need, but if we index everything we want to load the DIEs for a compile unit and clear them after indexing if they weren't loaded before we did the index so we don't take up loads of memory.

So what we need to do to fix this is:

1 - Run through all of the compile units first and see if each CU has the DIEs parsed and remember this
2 - Index all compile units
3 - Clear any DIEs in any compile units that didn't have their DIEs parsed _after_ all compile units have been indexed

A proposed patch is attached:

dwarf.patch (4.74 KB)

Thanks for the explanation and the patch.

I tried it and the exception is gone.
The debug info also seems complete. As complete as can be for an LTO build…

I attached a modified patch with the calls to ExtractDIEsIfNeeded() on separate threads.

Should I submit it for review ?

-Philippe

dwarf.patch (5.64 KB)

Please do submit this for review, yes.

Just a bump to observe that debug information still behaves as if it's not there
in binaries built with LTO on Mac (except possibly in very simple executables
that don't link to any shared libraries).

R.