DWARF Generator

I have recently been modifying the DWARF parser and have more patches planned. I want to add unit tests that exercise the internal LLVM DWARF APIs, both to ensure they continue to work and to validate the changes that I am making. Currently there are few DWARF unit tests beyond very simple ones that test DWARF forms, and I would like to add many more.

I had submitted a patch that I aborted as it was too large. One of the issues with the patch was a standalone DWARF generator that turns a few API calls into the section data that the DWARFContextInMemory class needs in order to load DWARF. The idea is to generate a small blurb of DWARF, parse it using our built-in DWARF parser, and validate that the API calls we make when consuming the DWARF return what we expect. The original standalone DWARF generator class is in unittests/DebugInfo/DWARF/DWARFGenerator2.{h,cpp} in the attached patch. The original review suggested that I try to use the AsmPrinter and many of its associated classes to generate the DWARF. I attempted to do so, and the AsmPrinter version is in lib/CodeGen/DwarfGenerator.{h,cpp} in the attached patch. This AsmPrinter-based code steals code from DwarfLinker.cpp.
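The generate, parse, validate idea can be shown in miniature without any LLVM classes. The sketch below is only an illustration, not code from the patch: `UnitHeader`, `generateHeader` and `parseHeader` are hypothetical names, and it assumes a 32-bit DWARF unit header in the version 2 to 4 layout, a little-endian target, and a unit that contains no DIEs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a 32-bit DWARF v2-v4 unit header.
struct UnitHeader {
  uint32_t Length;       // unit_length: number of bytes after this field
  uint16_t Version;      // DWARF version
  uint32_t AbbrevOffset; // debug_abbrev_offset
  uint8_t AddrSize;      // address_size
};

// Append Size bytes of Value in little-endian order.
static void emitLE(std::vector<uint8_t> &Out, uint64_t Value, unsigned Size) {
  for (unsigned I = 0; I < Size; ++I)
    Out.push_back(static_cast<uint8_t>(Value >> (8 * I)));
}

// Read Size little-endian bytes at Offset and advance Offset past them.
static uint64_t readLE(const std::vector<uint8_t> &In, size_t &Offset,
                       unsigned Size) {
  assert(Offset + Size <= In.size() && "read past end of section data");
  uint64_t Value = 0;
  for (unsigned I = 0; I < Size; ++I)
    Value |= static_cast<uint64_t>(In[Offset + I]) << (8 * I);
  Offset += Size;
  return Value;
}

// "Generator" side: emit the header bytes for a unit with no DIEs.
std::vector<uint8_t> generateHeader(uint16_t Version, uint8_t AddrSize) {
  assert(Version >= 2 && Version <= 4 && "DWARF 5 uses a different layout");
  std::vector<uint8_t> Out;
  emitLE(Out, 2 + 4 + 1, 4); // version + abbrev offset + address size
  emitLE(Out, Version, 2);
  emitLE(Out, 0, 4);         // abbreviations start at offset 0
  emitLE(Out, AddrSize, 1);
  return Out;
}

// "Parser" side: decode the same fields back out of the bytes.
UnitHeader parseHeader(const std::vector<uint8_t> &In) {
  size_t Offset = 0;
  UnitHeader H;
  H.Length = static_cast<uint32_t>(readLE(In, Offset, 4));
  H.Version = static_cast<uint16_t>(readLE(In, Offset, 2));
  H.AbbrevOffset = static_cast<uint32_t>(readLE(In, Offset, 4));
  H.AddrSize = static_cast<uint8_t>(readLE(In, Offset, 1));
  return H;
}
```

A unit test in this style would call `generateHeader(4, 8)`, feed the bytes to `parseHeader`, and check that the version and address size come back unchanged. Note that the address size here is just a parameter and does not depend on which targets are built.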

dwarfgen.patch (88.2 KB)


I am having trouble getting things to work with the AsmPrinter. I was able to get simple DWARF to be emitted with the AsmPrinter version of the DWARF generator with code like:

   initLLVM();
   DwarfGen DG;
   Triple Triple("x86_64--");
   StringRef Path("/tmp/test.elf");
   bool DwarfInitSuccess = DG.init(Triple, Path);
   EXPECT_TRUE(DwarfInitSuccess);
   uint16_t Version = 4;
   uint8_t AddrSize = 8;
   DwarfGenCU &CU = DG.appendCompileUnit(Version, AddrSize);
   DwarfGenDIE CUDie = CU.getUnitDIE();

   CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c");
   CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C);

   DwarfGenDIE SubprogramDie = CUDie.addChild(DW_TAG_subprogram);
   SubprogramDie.addAttribute(DW_AT_name, DW_FORM_strp, "main");
   SubprogramDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
   SubprogramDie.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x2000U);

   DwarfGenDIE IntDie = CUDie.addChild(DW_TAG_base_type);
   IntDie.addAttribute(DW_AT_name, DW_FORM_strp, "int");
   IntDie.addAttribute(DW_AT_encoding, DW_FORM_data1, DW_ATE_signed);
   IntDie.addAttribute(DW_AT_byte_size, DW_FORM_data1, 4);

   DwarfGenDIE ArgcDie = SubprogramDie.addChild(DW_TAG_formal_parameter);
   ArgcDie.addAttribute(DW_AT_name, DW_FORM_strp, "argc");
   //ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref_addr, IntDie); // Crashes here...

   DG.generate();

   auto Obj = object::ObjectFile::createObjectFile(Path);
   if (Obj) {
     DWARFContextInMemory DwarfContext(*Obj.get().getBinary());
     uint32_t NumCUs = DwarfContext.getNumCompileUnits();
     for (uint32_t i = 0; i < NumCUs; ++i) {
       DWARFCompileUnit *U = DwarfContext.getCompileUnitAtIndex(i);
       if (U)
         U->getUnitDIE(false)->dump(llvm::outs(), U, -1u);
     }
   }

But things fall down if I uncomment the DW_FORM_ref_addr line above. The problem is that AsmPrinter really expects a full stack of supporting objects to be there and expects people to use the DwarfDebug class and all of its associated classes. These associated classes really want to use the “DI” objects (DICompileUnit, etc.), so to create a compile unit we would need to create a DICompileUnit object and then make an AsmPrinter/DwarfCompileUnit. That stack is pretty heavy: code like that shown above would have to create many classes just to represent the simple output we wish to emit. Another downside of the AsmPrinter method is that we don’t know which targets people are going to build into their binaries, and thus we don’t know which triples we will be able to use when generating DWARF info. Adrian Prantl attempted to help me get things working over here and we kept running into roadblocks.

It’d be great to have more detail about the roadblocks you hit to better understand how bad/what the issues are.

Even if we end up adding another set of code to generate DWARF (which I’d really like to avoid) we’d want to, at some point, coalesce them back together. Given the goal is to try to coalesce the DWARF parsing code in LLDB and LLVM, it’d seem unfortunate if that effort just created another similar (or larger) amount of work for DWARF generation.

I wanted to pass this patch along in case someone wants to take a look at how we can possibly fix the lib/CodeGen/DwarfGenerator.cpp and lib/CodeGen/DwarfGenerator.h. The code that sets up all the required classes for the AsmPrinter method is in the DwarfGen class from lib/CodeGen/DwarfGenerator.cpp in the following function:

bool DwarfGen::init(Triple TheTriple, StringRef OutputFilename);

The code in this function was looted from existing DwarfLinker.cpp code. This function requires a valid triple, and that triple is used to create a lot of the classes required to make the AsmPrinter. I am not sure if any other code uses the AsmPrinter like this besides the DwarfLinker.cpp code, and that code uses its own magic to actually link the DWARF. It does reuse some of the functions as I did, but the DwarfLinker doesn’t use DwarfDebug, DwarfCompileUnit or any of the other classes that the compiler/assembler uses when making DWARF.

What’s the DwarfLinker code missing that you need? If that code is generating essentially arbitrary DWARF, what’s blocking using the same technique for generating DWARF for parsing tests?

The amount of work required for refactoring the AsmPrinter exceeds the time I am going to have, but I would still like to have DWARF API testing in the unit tests.

So my question is whether anyone would object to using the standalone DWARF generator in unittests/DebugInfo/DWARF until the YAML tools are able to produce DWARF, at which point we can switch to testing the DWARF data that way. Chris Bieneman has expressed interest in getting a DWARF/YAML layer going.

Those tools would still want to use pretty similar (conceptually) abstractions to LLVM’s codegen and llvm-dsymutil. I’d still strongly prefer to generalize/keep common APIs here - or better understand why it’s not practical now (& what it will take/how we make sure we have a plan and resources to get there eventually).

My reasoning is:

  • I want to be able to test DWARF APIs we have to ensure they work correctly as there are no Dwarf API tests right now. I will be adding code that changes many things in the DWARF parser and it will be essential to verify that there are no regressions in the DWARF APIs.
  • Not sure which targets would be built into LLVM so it might be hard to write tests that cover 32/64 bit addresses and all the variants if we have to do things legally via AsmPrinter and valid targets

Seems like it might be plausible to refactor out whatever features of the AsmPrinter these APIs require (so we just harvest that data out of AsmPrinter and pass it down in a struct, say - so that other users can pass their own struct without needing an AsmPrinter). Though, again, interested to know how dsymutil is working in these situations.

  • Not enough time to modify AsmPrinter to not require the full DebugInfo stack and the classes that it uses (llvm::DwarfCompileUnit which must use llvm::DICompileUnit, llvm::DIE class which uses many local classes that all depend on the full DwarfDebug stack).

Will you have time at some later date to come back and revisit this? It’s understandable that we may choose to incur short term technical debt with an understanding that it will be paid off in some timely manner. It’d be less desirable if there’s no such plan/possibility and we incur a fairly clear case of technical debt (redundant DWARF generation libraries - especially when this effort is to remove a redundant DWARF parser).

It looks like DIE.cpp needs DwarfDebug (via AsmPrinter) for basically two things: finding location-list entries, because the location list is managed by DwarfDebug; and, handling DW_FORM_ref_addr, which requires (a) the DWARF version, because the ref_addr size depends on the version, and (b) looking at other units, because units are managed by DwarfDebug.
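To make point (a) concrete: in DWARF version 2 a DW_FORM_ref_addr value is the size of a target address, while from version 3 onward it is the size of a section offset (4 bytes in 32-bit DWARF, 8 bytes in 64-bit DWARF). A minimal sketch of that rule follows; `refAddrSize` is a hypothetical helper written for this note, not a function in DIE.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of a DW_FORM_ref_addr value.
//   DWARF 2:  the size of an address on the target.
//   DWARF 3+: the size of a section offset, which depends on the
//             32-bit vs 64-bit DWARF format rather than on the target.
unsigned refAddrSize(uint16_t Version, uint8_t AddrSize,
                     bool IsDwarf64 = false) {
  assert(Version >= 2 && "DWARF 1 is not handled in this sketch");
  if (Version == 2)
    return AddrSize;
  return IsDwarf64 ? 8 : 4;
}
```

So the only inputs needed for the size are the version, the address size and the format; none of them inherently requires a DwarfDebug object.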

It might be feasible to refactor DwarfDebug into (broadly speaking) the MD-to-DWARF translation part and the hunks-of-DWARF management part. Then the MD translator and the DWARF-generator could use the hunks-of-DWARF API to achieve their respective goals. I spent about 30 seconds thinking about that, so the opinion might not be worth much.

--paulr


It'd be great to have more detail about the roadblocks you hit to better understand how bad/what the issues are.

A few blocks:

- DIEString doesn't support DW_FORM_string. DW_FORM_string support might have been pulled so that we never emit it from clang, but we would want to have a unit test that covers being able to read an inlined C string from a DIE. Support won't be that hard to add, but we might not want it so that people can't use it by accident and make less efficient DWARF.
- Asserts, asserts, asserts. As we tried to emit DWARF, we hit an assert in bool AsmPrinter::doInitialization(Module &M), on the first line:

  MMI = getAnalysisIfAvailable<MachineModuleInfo>();

This asserts when you use the AsmPrinter the way the DwarfLinker and the AsmPrinter-based DwarfGen do. You must call doInitialization to get the DwarfDebug. If you get past this by installing a Pass, then we assert at:

  GCModuleInfo *MI = getAnalysisIfAvailable<GCModuleInfo>();
  assert(MI && "AsmPrinter didn't require GCModuleInfo?");

If we don't have this, we don't get a DwarfDebug.

Even if we end up adding another set of code to generate DWARF (which I'd really like to avoid) we'd want to, at some point, coalesce them back together. Given the goal is to try to coalesce the DWARF parsing code in LLDB and LLVM, it'd seem unfortunate if that effort just created another similar (or larger) amount of work for DWARF generation.

This DWARF generator could just live in the unittests/DebugInfo/DWARF directory so it wouldn't pollute anything in LLVM if we do choose to use it.

I wanted to pass this patch along in case someone wants to take a look at how we can possibly fix the lib/CodeGen/DwarfGenerator.cpp and lib/CodeGen/DwarfGenerator.h. The code that sets up all the required classes for the AsmPrinter method is in the DwarfGen class from lib/CodeGen/DwarfGenerator.cpp in the following function:

bool DwarfGen::init(Triple TheTriple, StringRef OutputFilename);

The code in this function was looted from existing DwarfLinker.cpp code. This function requires a valid triple, and that triple is used to create a lot of the classes required to make the AsmPrinter. I am not sure if any other code uses the AsmPrinter like this besides the DwarfLinker.cpp code, and that code uses its own magic to actually link the DWARF. It does reuse some of the functions as I did, but the DwarfLinker doesn't use DwarfDebug, DwarfCompileUnit or any of the other classes that the compiler/assembler uses when making DWARF.

What's the DwarfLinker code missing that you need? If that code is generating essentially arbitrary DWARF, what's blocking using the same technique for generating DWARF for parsing tests?

They don't use any of the DwarfDebug, DwarfCompileUnit classes. They also don't use any of the DI classes when making up the debug info. So both the DWARF linker and the generator have similar needs: make DWARF that isn't tied too closely to the clang internal classes and DI classes.

The amount of work required for refactoring the AsmPrinter exceeds the time I am going to have, but I would still like to have DWARF API testing in the unit tests.

So my question is whether anyone would object to using the standalone DWARF generator in unittests/DebugInfo/DWARF until the YAML tools are able to produce DWARF, at which point we can switch to testing the DWARF data that way. Chris Bieneman has expressed interest in getting a DWARF/YAML layer going.

Those tools would still want to use pretty similar (conceptually) abstractions to LLVM's codegen and llvm-dsymutil. I'd still strongly prefer to generalize/keep common APIs here - or better understand why it's not practical now (& what it will take/how we make sure we have a plan and resources to get there eventually).

My reasoning is:
- I want to be able to test DWARF APIs we have to ensure they work correctly as there are no Dwarf API tests right now. I will be adding code that changes many things in the DWARF parser and it will be essential to verify that there are no regressions in the DWARF APIs.
- Not sure which targets would be built into LLVM so it might be hard to write tests that cover 32/64 bit addresses and all the variants if we have to do things legally via AsmPrinter and valid targets

Seems like it might be plausible to refactor out whatever features of the AsmPrinter these APIs require (so we just harvest that data out of AsmPrinter and pass it down in a struct, say - so that other users can pass their own struct without needing an AsmPrinter). Though, again, interested to know how dsymutil is working in these situations.

I can try that method if indeed the only places that use the DwarfDebug are the DW_FORM_ref_addr and location lists. I'll let you know how that goes.
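As a rough illustration of the "pass it down in a struct" suggestion, the sketch below computes form sizes from a small parameter bundle instead of from a printer object. Everything in it is hypothetical: `EmitParams`, `MockPrinter` and `formSize` are names invented for this example, `MockPrinter` merely stands in for a fully configured AsmPrinter, and 32-bit DWARF is assumed. Only the numeric form codes come from the DWARF standard.

```cpp
#include <cassert>
#include <cstdint>

// DWARF form codes as assigned by the DWARF standard.
enum Form : uint16_t {
  DW_FORM_addr = 0x01,
  DW_FORM_data4 = 0x06,
  DW_FORM_ref_addr = 0x10,
};

// Hypothetical bundle of the only facts the size computation needs.
struct EmitParams {
  uint16_t Version; // DWARF version of the unit being emitted
  uint8_t AddrSize; // target address size in bytes
};

// Hypothetical stand-in for a fully configured printer: it can hand out
// the parameters, but callers without a printer can build them directly.
struct MockPrinter {
  uint16_t DwarfVersion;
  uint8_t PointerSize;
  EmitParams getEmitParams() const {
    return EmitParams{DwarfVersion, PointerSize};
  }
};

// Size in bytes of a fixed-size form, assuming 32-bit DWARF.
unsigned formSize(Form F, const EmitParams &P) {
  switch (F) {
  case DW_FORM_addr:
    return P.AddrSize;
  case DW_FORM_data4:
    return 4;
  case DW_FORM_ref_addr:
    // Address-sized in DWARF 2, offset-sized afterwards.
    return P.Version == 2 ? P.AddrSize : 4;
  }
  assert(false && "form not handled in this sketch");
  return 0;
}
```

The compiler path would call `formSize(F, Printer.getEmitParams())`, while a unit test could construct `EmitParams{2, 4}` by hand and cover 32-bit addresses regardless of which targets were built.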

- Not enough time to modify AsmPrinter to not require the full DebugInfo stack and the classes that it uses (llvm::DwarfCompileUnit which must use llvm::DICompileUnit, llvm::DIE class which uses many local classes that all depend on the full DwarfDebug stack).

Will you have time at some later date to come back and revisit this? It's understandable that we may choose to incur short term technical debt with an understanding that it will be paid off in some timely manner. It'd be less desirable if there's no such plan/possibility and we incur a fairly clear case of technical debt (redundant DWARF generation libraries - especially when this effort is to remove a redundant DWARF parser).

Not sure anyone else will need to generate DWARF manually. The two clients currently are the DWARF unittests and the DwarfLinker. The DwarfLinker worked around these issues. If the AsmPrinter wasn't such an integral part of the entire compiler stack, I could take a stab at refactoring it, but I don't believe I am the right person to do this at this point as I have no experience or knowledge of the various ways that this class is used, or how it interacts with other support classes (DwarfDebug, and many many other classes).

Things that still worry me:
- not being able to generate DWARF for 32/64 if targets are missing
- DIEString not supporting DW_FORM_string. I can add support, but I don't know if we want it, because once it is there people might start using it.
- hacking around asserts by constructing classes and copying code from places that properly use the AsmPrinter the way it is supposed to be used, so that we can use it in a way that it wasn't designed to be used.

It looks like DIE.cpp needs DwarfDebug (via AsmPrinter) for basically two things: finding location-list entries, because the location list is managed by DwarfDebug; and, handling DW_FORM_ref_addr, which requires (a) the DWARF version, because the ref_addr size depends on the version, and (b) looking at other units, because units are managed by DwarfDebug.

I will see if I can work around this if this truly is the only place where DwarfDebug was being used in the AsmPrinter. There is a whole bunch of code that is built into DwarfCompileUnit, which uses DICompileUnit and other DI and MC classes, that emits the DWARF that I ported over into more generic areas. Not sure what else I will run into.

It might be feasible to refactor DwarfDebug into (broadly speaking) the MD-to-DWARF translation part and the hunks-of-DWARF management part. Then the MD translator and the DWARF-generator could use the hunks-of-DWARF API to achieve their respective goals. I spent about 30 seconds thinking about that, so the opinion might not be worth much.

I will take a fresh look and see if I can factor this out.

From: Greg Clayton [mailto:gclayton@apple.com]
Sent: Thursday, November 17, 2016 5:01 PM
To: David Blaikie
Cc: llvm-dev@lists.llvm.org; Robinson, Paul; Eric Christopher; Adrian
Prantl
Subject: Re: [llvm-dev] DWARF Generator

> It'd be great to have more detail about the roadblocks you hit to better
understand how bad/what the issues are.

A few blocks:

- DIEString doesn't support DW_FORM_string. DW_FORM_string support might
have been pulled so that we never emit it from clang, but we would want to
have a unit test that covers being able to read an inlined C string from a
DIE. Support won't be that hard to add, but we might not want it so that
people can't use it by accident and make less efficient DWARF.

Seems to me we originally supported only DW_FORM_string, and then at some
point it was tossed in favor of DW_FORM_strp in order to get space savings
from string pooling. In fact using DW_FORM_string for small strings would
save some more space (admittedly not much) and a bunch of relocations.
(I found data from an old experiment, in a debug build of Clang it saved
~0.7MB out of a total 340MB of debug-info size, and >360K ELF relocations.)

I'd favor an API that passed the string down and let the DIE generator
(as opposed to the DWARF generator) pick the form.
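One way such an API could look is sketched below. The caller hands over only the string, and a helper picks the form. The size-based threshold is purely an assumption made for illustration (it is not the policy from the experiment mentioned above), `pickStringForm` is a hypothetical name, and 32-bit DWARF is assumed so that a DW_FORM_strp reference costs 4 bytes in .debug_info.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// DWARF form codes as assigned by the DWARF standard.
enum StringForm : uint16_t {
  DW_FORM_string = 0x08, // NUL-terminated string stored inline in the DIE
  DW_FORM_strp = 0x0e,   // offset into the .debug_str string pool
};

// Bytes a DW_FORM_strp reference occupies in .debug_info (32-bit DWARF).
constexpr unsigned StrpRefSize = 4;

// Bytes an inline DW_FORM_string occupies, including the terminating NUL.
unsigned inlineStringSize(const std::string &S) {
  return static_cast<unsigned>(S.size()) + 1;
}

// Hypothetical policy: store the string inline only when doing so is no
// larger than the pool reference; otherwise use the string pool.
StringForm pickStringForm(const std::string &S) {
  assert(S.find('\0') == std::string::npos && "embedded NUL cannot be inline");
  return inlineStringSize(S) <= StrpRefSize ? DW_FORM_string : DW_FORM_strp;
}
```

With this policy "int" would be emitted inline, while "main" and "/tmp/main.c" would go to the string pool. A test generator would still want a way to force a specific form so that the parser's handling of both can be covered.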


Things that still worry me:
- not being able to generate DWARF for 32/64 if targets are missing

You mean DWARF-32 and DWARF-64 formats? LLVM doesn't do DWARF-64.
If you mean 64-bit target-machine addresses, I guess I don't understand
the problem. If you have target-dependent tests, then they only work
when the right targets are there. This is extremely common and I'm
not clear why it would be a problem for the DWARF tests.

- DIEString not supporting DW_FORM_string. I can add support, but I don't
know if we want it as if we add it people might start using it.

See above. If the API picked the form this would not be a concern.

- hacking around asserts by constructing classes and copying code from
places that properly use the AsmPrinter the way it is supposed to be used
so that we can use it in a way that it wasn't designed to be used.

>
> I made a large effort to try and get things working with the AsmPrinter,
so I wanted everyone to know that I tried to get that solution working.
Let me know what anyone thinks.
>
> Greg Clayton

--paulr

From: Greg Clayton [mailto:gclayton@apple.com]
Sent: Thursday, November 17, 2016 5:01 PM
To: David Blaikie
Cc: llvm-dev@lists.llvm.org; Robinson, Paul; Eric Christopher; Adrian
Prantl
Subject: Re: [llvm-dev] DWARF Generator

I have recently been modifying the DWARF parser and have more patches
planned and I want to be able to add unit tests that test the internal
llvm DWARF APIs to ensure they continue to work and also validate the
changes that I am making. There are not many DWARF unit tests other than
very simple ones that test DWARF forms currently. I would like to expand
this to include many more tests.

I had submitted a patch that I aborted as it was too large. One of the
issues with the patch was a stand alone DWARF generator that can turn a
few API calls into the section data required for the DWARFContextInMemory
class to be able to load DWARF from. The idea is to generate a small blurb
of DWARF, parse it using our built in DWARF parser and validate that the
API calls we do when consuming the DWARF match what we expect. The
original standalone DWARF generator class is in
unittests/DebugInfo/DWARF/DWARFGenerator2.{h,cpp} in the patch attached.
The original review suggested that I try to use the AsmPrinter and many of
its associated classes to generate the DWARF. I attempted to do so and the
AsmPrinter version is in lib/CodeGen/DwarfGenerator.{h,cpp} in the patch
attached. This AsmPrinter based code steals code from the DwarfLinker.cpp.

I am having trouble getting things to work with the AsmPrinter. I was
able to get simple DWARF to be emitted with the AsmPrinter version of the
DWARF generator with code like:

  initLLVM();
  DwarfGen DG;
  Triple Triple("x86_64--");
  StringRef Path("/tmp/test.elf");
  bool DwarfInitSuccess = DG.init(Triple, Path);
  EXPECT_TRUE(DwarfInitSuccess);
  uint16_t Version = 4;
  uint8_t AddrSize = 8;
  DwarfGenCU &CU = DG.appendCompileUnit(Version, AddrSize);
  DwarfGenDIE CUDie = CU.getUnitDIE();

  CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c");
  CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C);

  DwarfGenDIE SubprogramDie = CUDie.addChild(DW_TAG_subprogram);
  SubprogramDie.addAttribute(DW_AT_name, DW_FORM_strp, "main");
  SubprogramDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
  SubprogramDie.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x2000U);

  DwarfGenDIE IntDie = CUDie.addChild(DW_TAG_base_type);
  IntDie.addAttribute(DW_AT_name, DW_FORM_strp, "int");
  IntDie.addAttribute(DW_AT_encoding, DW_FORM_data1, DW_ATE_signed);
  IntDie.addAttribute(DW_AT_byte_size, DW_FORM_data1, 4);

  DwarfGenDIE ArgcDie = SubprogramDie.addChild(DW_TAG_formal_parameter);
  ArgcDie.addAttribute(DW_AT_name, DW_FORM_strp, "argc");
  //ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref_addr, IntDie); // Crashes here...

  DG.generate();

  auto Obj = object::ObjectFile::createObjectFile(Path);
  if (Obj) {
    DWARFContextInMemory DwarfContext(*Obj.get().getBinary());
    uint32_t NumCUs = DwarfContext.getNumCompileUnits();
    for (uint32_t i=0; i<NumCUs; ++i) {
      DWARFCompileUnit *U = DwarfContext.getCompileUnitAtIndex(i);
      if (U)
        U->getUnitDIE(false)->dump(llvm::outs(), U, -1u);
    }
  }

But things fall down if I try to uncomment the DW_FORM_ref_addr line
above. The problem is that AsmPrinter really expects a full stack of stuff
to be there and expects people to use the DwarfDebug class and all of its
associated classes. These associated classes really want to use the "DI"
objects (DICompileUnit, etc) so to create a compile unit we would need to
create DICompileUnit object and then make a AsmPrinter/DwarfCompileUnit.
That stack is pretty heavy and requires the code shown above to create
many many classes just to represent the simple output we wish to emit.
Another downside of the AsmPrinter method is we don't know which targets
people are going to build into their binaries and thus we don't know which
triples we will be able to use when generating DWARF info. Adrian Prantl
attempted to help me get things working over here and we kept running into
roadblocks.

It'd be great to have more detail about the roadblocks you hit to better
understand how bad/what the issues are.

A few blocks:

- DIEString doesn't support DW_FORM_string. DW_FORM_string support might
have been pulled so that we never emit it from clang, but we would want to
have a unit test that covers being able to read an inlined C string from a
DIE. Support won't be that hard to add, but we might not want it so that
people can't use it by accident and make less efficient DWARF.

Seems to me we originally supported only DW_FORM_string, and then at some
point it was tossed in favor of DW_FORM_strp in order to get space savings
from string pooling. In fact using DW_FORM_string for small strings would
save some more space (admittedly not much) and a bunch of relocations.
(I found data from an old experiment, in a debug build of Clang it saved
~0.7MB out of a total 340MB of debug-info size, and >360K ELF relocations.)

This is true, but it also adversely affects DWARF parsing speed as you will need to manually skip each C string when parsing the DIEs.
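To make the parsing-speed point concrete, here is a minimal sketch of what a parser has to do to step over each kind of string attribute. The helper names are hypothetical and this is not LLVM's DWARF parser; it assumes 32-bit DWARF, where a DW_FORM_strp value is a fixed 4-byte offset into .debug_str.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers (not LLVM APIs). Each returns the offset just past one
// attribute value that starts at Offset in a .debug_info-like byte buffer.

// DW_FORM_strp (32-bit DWARF): a fixed 4-byte offset into .debug_str, so the
// parser can skip it in constant time without looking at the bytes.
inline std::size_t skipStrp(const std::vector<std::uint8_t> &Data,
                            std::size_t Offset) {
  assert(Offset + 4 <= Data.size() && "truncated DW_FORM_strp value");
  return Offset + 4;
}

// DW_FORM_string: an inline NUL-terminated string, so the parser has to scan
// byte by byte until it finds the terminator.
inline std::size_t skipInlineString(const std::vector<std::uint8_t> &Data,
                                    std::size_t Offset) {
  assert(Offset <= Data.size() && "offset past end of buffer");
  while (Offset < Data.size() && Data[Offset] != 0)
    ++Offset;
  // Step over the NUL if one was found; otherwise stop at the end of the data.
  return Offset < Data.size() ? Offset + 1 : Offset;
}
```

The cost of skipInlineString grows with the string length, while skipStrp does not depend on the string at all. That is the trade-off against the size and relocation savings mentioned above.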

I'd favor an API that passed the string down and let the DIE generator
(as opposed to the DWARF generator) pick the form.

I have currently added a DIEInlinedString class that can be used for DW_FORM_string attributes.
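One possible reading of "let the DIE generator pick the form" is sketched below. The policy and the function name pickStringForm are assumptions made for illustration, not anything in the patch or in LLVM; only the two form codes come from the DWARF specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Form codes as defined by the DWARF specification.
enum Form : std::uint8_t { DW_FORM_string = 0x08, DW_FORM_strp = 0x0e };

// Hypothetical policy: the caller passes only the string. If the inline
// encoding (characters plus the NUL terminator) is no larger than a
// .debug_str offset, emit it inline; otherwise use the string pool.
inline Form pickStringForm(const std::string &S, unsigned OffsetSize = 4) {
  assert((OffsetSize == 4 || OffsetSize == 8) && "unexpected offset size");
  return S.size() + 1 <= OffsetSize ? DW_FORM_string : DW_FORM_strp;
}
```

With this policy "int" would be emitted inline and "main" would go to the string pool. A unit test that needs to force a particular form would still want an override, which this sketch does not provide.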

- Asserts, asserts, asserts. As we tried to emit DWARF, we got an assert
in bool AsmPrinter::doInitialization(Module &M). On the first line:

MMI = getAnalysisIfAvailable<MachineModuleInfo>();

This asserts if you call it while using the AsmPrinter the way the
DwarfLinker and the AsmPrinter based DwarfGen do. You must call this to
get a DwarfDebug. If you get past this by installing a Pass then we
assert at:

GCModuleInfo *MI = getAnalysisIfAvailable<GCModuleInfo>();
assert(MI && "AsmPrinter didn't require GCModuleInfo?");

If we don't have this, we don't get a DwarfDebug.

Even if we end up adding another set of code to generate DWARF (which
I'd really like to avoid) we'd want to, at some point, coalesce them back
together. Given the goal is to try to coalesce the DWARF parsing code in
LLDB and LLVM, it'd seem unfortunate if that effort just created another
similar (or larger) amount of work for DWARF generation.

This DWARF generator could just live in the unittests/DebugInfo/DWARF
directory so it wouldn't pollute anything in LLVM if we do choose to use
it.

I wanted to pass this patch along in case someone wants to take a look
at how we can possibly fix the lib/CodeGen/DwarfGenerator.cpp and
lib/CodeGen/DwarfGenerator.h. The code that sets up all the required
classes for the AsmPrinter method is in the DwarfGen class from
lib/CodeGen/DwarfGenerator.cpp in the following function:

bool DwarfGen::init(Triple TheTriple, StringRef OutputFilename);

The code in this function was looted from existing DwarfLinker.cpp code.
This function requires a valid triple and that triple is used to create a
lot of the classes required to make the AsmPrinter. I am not sure if any
other code uses the AsmPrinter like this besides the DwarfLinker.cpp code
and that code uses its own magic to actually link the DWARF. It does reuse
some of the functions as I did, but the DwarfLinker doesn't use any of the
DwarfDebug, DwarfCompileUnit or any of the classes that the
compiler/assembler uses when making DWARF.

What's the DwarfLinker code missing that you need? If that code is
generating essentially arbitrary DWARF, what's blocking using the same
technique for generating DWARF for parsing tests?

They don't use any of the DwarfDebug, DwarfCompileUnit classes. They also
don't use any of the DI classes when making up the debug info. So both the
DWARF linker and the generator have similar needs: make DWARF that isn't
tied too closely to the clang internal classes and DI classes.

The amount of work required for refactoring the AsmPrinter exceeds the
time I am going to have, but I would still like to have DWARF API testing
in the unit tests.

So my question is if anyone would have objections to using the standalone
DWARF generator in unittests/DebugInfo/DWARF until we can later get
the YAML tools to be able to produce DWARF and we can switch to testing
the DWARF data that way? Chris Bieneman has expressed interest in getting
a DWARF/YAML layer going.

Those tools would still want to use pretty similar (conceptually)
abstractions to LLVM's codegen and llvm-dsymutil. I'd still strongly
prefer to generalize/keep common APIs here - or better understand why it's
not practical now (& what it will take/how we make sure we have a plan and
resources to get there eventually).

My reasoning is:
- I want to be able to test DWARF APIs we have to ensure they work
correctly as there are no Dwarf API tests right now. I will be adding code
that changes many things in the DWARF parser and it will be essential to
verify that there are no regressions in the DWARF APIs.

- Not sure which targets would be built into LLVM so it might be hard to
write tests that cover 32/64 bit addresses and all the variants if we have
to do things legally via AsmPrinter and valid targets

Seems like it might be plausible to refactor out whatever features of
the AsmPrinter these APIs require (so we just harvest that data out of
AsmPrinter and pass it down in a struct, say - so that other users can
pass their own struct without needing an AsmPrinter). Though, again,
interested to know how dsymutil is working in these situations.

I can try that method if indeed the only places that use the DwarfDebug
are the DW_FORM_ref_addr and location lists. I'll let you know how that
goes.

- Not enough time to modify AsmPrinter to not require the full DebugInfo
stack and the classes that it uses (llvm::DwarfCompileUnit which must use
llvm::DICompileUnit, llvm::DIE class which uses many local classes that
all depend on the full DwarfDebug stack).

Will you have time at some later date to come back and revisit this?
It's understandable that we may choose to incur short term technical debt
with an understanding that it will be paid off in some timely manner. It'd
be less desirable if there's no such plan/possibility and we incur a
fairly clear case of technical debt (redundant DWARF generation libraries
- especially when this effort is to remove a redundant DWARF parser).

Not sure anyone else will need to generate DWARF manually. The two clients
currently are the DWARF unittests and the DwarfLinker. The DwarfLinker
worked around these issues. If the AsmPrinter wasn't such an integral part
of the entire compiler stack, I could take a stab at refactoring it, but I
don't believe I am the right person to do this at this point as I have no
experience or knowledge of the various ways that this class is used, or
how it interacts with other support classes (DwarfDebug, and many many
other classes).

Things that still worry me:
- not being able to generate DWARF for 32/64 if targets are missing

You mean DWARF-32 and DWARF-64 formats? LLVM doesn't do DWARF-64.
If you mean 64-bit target-machine addresses, I guess I don't understand
the problem. If you have target-dependent tests, then they only work
when the right targets are there. This is extremely common and I'm
not clear why it would be a problem for the DWARF tests.

I wasn't aware that there were target-dependent tests. Do you know of one in the unittest directory you can point me to? I did mean 32 bit address targets versus 64 bit address targets. I am not sure how I can test 4 and 8 byte addresses reliably. What triple do I use in the unittest? I can't assume x86_64 as we may have been built on a 32 bit ARM system with only the 32 bit ARM targets.
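For illustration, this is roughly what a target-independent generator could do to cover both address sizes: the address size is just a parameter, with no triple involved. The helpers appendAddr and readAddr are hypothetical names, and little-endian byte order is assumed. This sketch does not show whether the AsmPrinter route can do the same.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: append Addr as a little-endian DW_FORM_addr value that
// occupies AddrSize bytes (4 or 8), independent of any configured target.
inline void appendAddr(std::vector<std::uint8_t> &Out, std::uint64_t Addr,
                       unsigned AddrSize) {
  assert((AddrSize == 4 || AddrSize == 8) && "unsupported address size");
  for (unsigned I = 0; I < AddrSize; ++I)
    Out.push_back(static_cast<std::uint8_t>(Addr >> (8 * I)));
}

// Hypothetical helper: read back an AddrSize-byte little-endian address.
inline std::uint64_t readAddr(const std::vector<std::uint8_t> &In,
                              std::size_t Offset, unsigned AddrSize) {
  assert((AddrSize == 4 || AddrSize == 8) && "unsupported address size");
  assert(Offset + AddrSize <= In.size() && "truncated address");
  std::uint64_t Value = 0;
  for (unsigned I = 0; I < AddrSize; ++I)
    Value |= static_cast<std::uint64_t>(In[Offset + I]) << (8 * I);
  return Value;
}
```

A test could write 0x1000 with AddrSize 4 and 0x2000 with AddrSize 8 into the same buffer and check that each one round-trips, regardless of which targets were built.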

- DIEString not supporting DW_FORM_string. I can add support, but I don't
know if we want it as if we add it people might start using it.

See above. If the API picked the form this would not be a concern.

For DWARF parsing speed I still like the DW_FORM_strp.

I have recently been modifying the DWARF parser and have more patches planned and I want to be able to add unit tests that test the internal llvm DWARF APIs to ensure they continue to work and also validate the changes that I am making. There are not many DWARF unit tests other than very simple ones that test DWARF forms currently. I would like to expand this to include many more tests.

I had submitted a patch that I aborted as it was too large. One of the issues with the patch was a stand alone DWARF generator that can turn a few API calls into the section data required for the DWARFContextInMemory class to be able to load DWARF from. The idea is to generate a small blurb of DWARF, parse it using our built in DWARF parser and validate that the API calls we do when consuming the DWARF match what we expect. The original standalone DWARF generator class is in unittests/DebugInfo/DWARF/DWARFGenerator2.{h,cpp} in the patch attached. The original review suggested that I try to use the AsmPrinter and many of its associated classes to generate the DWARF. I attempted to do so and the AsmPrinter version is in lib/CodeGen/DwarfGenerator.{h,cpp} in the patch attached. This AsmPrinter based code steals code from the DwarfLinker.cpp.

I am having trouble getting things to work with the AsmPrinter. I was able to get simple DWARF to be emitted with the AsmPrinter version of the DWARF generator with code like:

  initLLVM();
  DwarfGen DG;
  Triple Triple("x86_64--");
  StringRef Path("/tmp/test.elf");
  bool DwarfInitSuccess = DG.init(Triple, Path);
  EXPECT_TRUE(DwarfInitSuccess);
  uint16_t Version = 4;
  uint8_t AddrSize = 8;
  DwarfGenCU &CU = DG.appendCompileUnit(Version, AddrSize);
  DwarfGenDIE CUDie = CU.getUnitDIE();

  CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c");
  CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C);

  DwarfGenDIE SubprogramDie = CUDie.addChild(DW_TAG_subprogram);
  SubprogramDie.addAttribute(DW_AT_name, DW_FORM_strp, "main");
  SubprogramDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
  SubprogramDie.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x2000U);

  DwarfGenDIE IntDie = CUDie.addChild(DW_TAG_base_type);
  IntDie.addAttribute(DW_AT_name, DW_FORM_strp, "int");
  IntDie.addAttribute(DW_AT_encoding, DW_FORM_data1, DW_ATE_signed);
  IntDie.addAttribute(DW_AT_byte_size, DW_FORM_data1, 4);

  DwarfGenDIE ArgcDie = SubprogramDie.addChild(DW_TAG_formal_parameter);
  ArgcDie.addAttribute(DW_AT_name, DW_FORM_strp, "argc");
  //ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref_addr, IntDie); // Crashes here...

  DG.generate();

  auto Obj = object::ObjectFile::createObjectFile(Path);
  if (Obj) {
    DWARFContextInMemory DwarfContext(*Obj.get().getBinary());
    uint32_t NumCUs = DwarfContext.getNumCompileUnits();
    for (uint32_t i=0; i<NumCUs; ++i) {
      DWARFCompileUnit *U = DwarfContext.getCompileUnitAtIndex(i);
      if (U)
        U->getUnitDIE(false)->dump(llvm::outs(), U, -1u);
    }
  }

But things fall down if I try to uncomment the DW_FORM_ref_addr line above. The problem is that AsmPrinter really expects a full stack of stuff to be there and expects people to use the DwarfDebug class and all of its associated classes. These associated classes really want to use the “DI” objects (DICompileUnit, etc) so to create a compile unit we would need to create DICompileUnit object and then make a AsmPrinter/DwarfCompileUnit. That stack is pretty heavy and requires the code shown above to create many many classes just to represent the simple output we wish to emit. Another downside of the AsmPrinter method is we don’t know which targets people are going to build into their binaries and thus we don’t know which triples we will be able to use when generating DWARF info. Adrian Prantl attempted to help me get things working over here and we kept running into roadblocks.

It’d be great to have more detail about the roadblocks you hit to better understand how bad/what the issues are.

A few blocks:

  • DIEString doesn’t support DW_FORM_string. DW_FORM_string support might have been pulled so that we never emit it from clang, but we would want to have a unit test that covers being able to read an inlined C string from a DIE. Support won’t be that hard to add, but we might not want it so that people can’t use it by accident and make less efficient DWARF.

The codepaths for generating strings in DWARF in LLVM are pretty streamlined (see DwarfUnit::addString) - it’s not likely the presence of support for DW_FORM_string will suddenly cause anyone to mistakenly create such an attribute because they all go through that API because it’s convenient to do so & handles all the cases.

If we ever wanted to support DW_FORM_string (perhaps for users who care a lot more about size than the performance characteristics you’re concerned with (I don’t know if GDB does the same optimizations - so perhaps we’re paying the size for little/no benefit there, for example)) we would do it in that one place, quite deliberately/explicitly.

  • Asserts, asserts, asserts. As we tried to emit DWARF, we got an assert in bool AsmPrinter::doInitialization(Module &M). On the first line:

MMI = getAnalysisIfAvailable<MachineModuleInfo>();

This asserts if you call it while using the AsmPrinter the way the DwarfLinker and the AsmPrinter based DwarfGen do. You must call this to get a DwarfDebug. If you get past this by installing a Pass then we assert at:

GCModuleInfo *MI = getAnalysisIfAvailable<GCModuleInfo>();
assert(MI && "AsmPrinter didn't require GCModuleInfo?");

If we don’t have this, we don’t get a DwarfDebug.

Right - my general thinking is: DwarfLinker uses these APIs somehow so I expect you could use them similarly. If there are differences in your use case that make DwarfLinker’s usage not applicable to yours, it’d be good to understand that. If it’s just that DwarfLinker’s putting up with pain you’d rather not - well, seems like it’d be good to fix/ease that pain for both users.

Even if we end up adding another set of code to generate DWARF (which I’d really like to avoid) we’d want to, at some point, coalesce them back together. Given the goal is to try to coalesce the DWARF parsing code in LLDB and LLVM, it’d seem unfortunate if that effort just created another similar (or larger) amount of work for DWARF generation.

This DWARF generator could just live in the unittests/DebugInfo/DWARF directory so it wouldn’t pollute anything in LLVM if we do choose to use it.

It’s not so much about pollution - much like the DWARF parser in LLDB and LLVM didn’t pollute each other - it’s still two systems that end up growing their own features, bugs, fixes, etc, which is best avoided.

I wanted to pass this patch along in case someone wants to take a look at how we can possibly fix the lib/CodeGen/DwarfGenerator.cpp and lib/CodeGen/DwarfGenerator.h. The code that sets up all the required classes for the AsmPrinter method is in the DwarfGen class from lib/CodeGen/DwarfGenerator.cpp in the following function:

bool DwarfGen::init(Triple TheTriple, StringRef OutputFilename);

The code in this function was looted from existing DwarfLinker.cpp code. This function requires a valid triple and that triple is used to create a lot of the classes required to make the AsmPrinter. I am not sure if any other code uses the AsmPrinter like this besides the DwarfLinker.cpp code and that code uses its own magic to actually link the DWARF. It does reuse some of the functions as I did, but the DwarfLinker doesn’t use any of the DwarfDebug, DwarfCompileUnit or any of the classes that the compiler/assembler uses when making DWARF.

What’s the DwarfLinker code missing that you need? If that code is generating essentially arbitrary DWARF, what’s blocking using the same technique for generating DWARF for parsing tests?

They don’t use any of the DwarfDebug, DwarfCompileUnit classes. They also don’t use any of the DI classes when making up the debug info. So both the DWARF linker and the generator have similar needs: make DWARF that isn’t tied too closely to the clang internal classes and DI classes.

Right - I don’t think either of these use cases should use the DI classes. So how is the DwarfLinker working to generate DWARF in ways that don’t work for your needs?

The amount of work required for refactoring the AsmPrinter exceeds the time I am going to have, but I would still like to have DWARF API testing in the unit tests.

So my question is if anyone would have objections to using the standalone DWARF generator in unittests/DebugInfo/DWARF until we can later get the YAML tools to be able to produce DWARF and we can switch to testing the DWARF data that way? Chris Bieneman has expressed interest in getting a DWARF/YAML layer going.

Those tools would still want to use pretty similar (conceptually) abstractions to LLVM’s codegen and llvm-dsymutil. I’d still strongly prefer to generalize/keep common APIs here - or better understand why it’s not practical now (& what it will take/how we make sure we have a plan and resources to get there eventually).

My reasoning is:

  • I want to be able to test DWARF APIs we have to ensure they work correctly as there are no Dwarf API tests right now. I will be adding code that changes many things in the DWARF parser and it will be essential to verify that there are no regressions in the DWARF APIs.
  • Not sure which targets would be built into LLVM so it might be hard to write tests that cover 32/64 bit addresses and all the variants if we have to do things legally via AsmPrinter and valid targets

Seems like it might be plausible to refactor out whatever features of the AsmPrinter these APIs require (so we just harvest that data out of AsmPrinter and pass it down in a struct, say - so that other users can pass their own struct without needing an AsmPrinter). Though, again, interested to know how dsymutil is working in these situations.
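A rough sketch of what "pass it down in a struct" could look like, using a compile unit header as the example. The struct EmitParams and both functions are hypothetical and are not taken from AsmPrinter or the patch; the header layout assumed is the 32-bit DWARF v2 to v4 one (unit_length, version, debug_abbrev_offset, address_size).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bundle of the few facts an emitter needs. A real refactoring
// would fill this from AsmPrinter; a unit test would fill it by hand.
struct EmitParams {
  std::uint16_t Version;  // DWARF version, 2 through 4 in this sketch
  std::uint8_t AddrSize;  // target address size in bytes
  bool LittleEndian;      // byte order of the emitted data
};

// Append the low Size bytes of Value in the requested byte order.
inline void emitInt(std::vector<std::uint8_t> &Out, std::uint64_t Value,
                    unsigned Size, bool LittleEndian) {
  for (unsigned I = 0; I < Size; ++I) {
    unsigned Shift = 8 * (LittleEndian ? I : Size - 1 - I);
    Out.push_back(static_cast<std::uint8_t>(Value >> Shift));
  }
}

// Emit a 32-bit DWARF compile unit header. BodySize is the size in bytes of
// the DIE data that will follow the header.
inline std::vector<std::uint8_t> emitCUHeader(const EmitParams &P,
                                              std::uint32_t AbbrevOffset,
                                              std::uint32_t BodySize) {
  assert(P.Version >= 2 && P.Version <= 4 && "layout differs in DWARF v5");
  std::vector<std::uint8_t> Out;
  // unit_length counts everything after itself: version (2 bytes),
  // debug_abbrev_offset (4 bytes), address_size (1 byte), then the DIEs.
  emitInt(Out, 2 + 4 + 1 + BodySize, 4, P.LittleEndian);
  emitInt(Out, P.Version, 2, P.LittleEndian);
  emitInt(Out, AbbrevOffset, 4, P.LittleEndian);
  emitInt(Out, P.AddrSize, 1, P.LittleEndian);
  return Out;
}
```

Because nothing here depends on a target being registered, a test could build EmitParams{4, 8, true} and EmitParams{4, 4, false} side by side and check both headers.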

I can try that method if indeed the only places that use the DwarfDebug are the DW_FORM_ref_addr and location lists. I’ll let you know how that goes.

  • Not enough time to modify AsmPrinter to not require the full DebugInfo stack and the classes that it uses (llvm::DwarfCompileUnit which must use llvm::DICompileUnit, llvm::DIE class which uses many local classes that all depend on the full DwarfDebug stack).

Will you have time at some later date to come back and revisit this? It’s understandable that we may choose to incur short term technical debt with an understanding that it will be paid off in some timely manner. It’d be less desirable if there’s no such plan/possibility and we incur a fairly clear case of technical debt (redundant DWARF generation libraries - especially when this effort is to remove a redundant DWARF parser).

Not sure anyone else will need to generate DWARF manually.

I think all 3 cases (unit tests, dsymutil, llvm codegen) all generate DWARF about as ‘manually’ as each other & would likely benefit from using the same APIs/commonality as much as possible.

The two clients currently are the DWARF unittests and the DwarfLinker. The DwarfLinker worked around these issues. If the AsmPrinter wasn’t such an integral part of the entire compiler stack, I could take a stab at refactoring it, but I don’t believe I am the right person to do this at this point as I have no experience or knowledge of the various ways that this class is used, or how it interacts with other support classes (DwarfDebug, and many many other classes).

Right - I’d generally just suggest doing whatever DwarfLinker’s doing, as a baseline - if that situation can be improved (in small or large ways) that’d be cool too. But I don’t think creating a second DWARF generation library is exactly forward progress.

Things that still worry me:

  • not being able to generate DWARF for 32/64 if targets are missing

How’s DwarfLinker deal with this? Or does it punt on it?

In any case, again, hopefully things can be extracted/refactored a bit to simplify this.

  • DIEString not supporting DW_FORM_string. I can add support, but I don’t know if we want it as if we add it people might start using it.

I think it’ll be fine - we only generate them in one place, if we ever support DW_FORM_string it’ll be a very explicit choice/explicit code to do so.

  • hacking around asserts by constructing classes and copying code from places that properly use the AsmPrinter the way it is supposed to be used so that we can use it in a way that it wasn’t designed to be used.

Right - refactoring is probably required (or at least cribbing whatever techniques DwarfLinker’s already using to do the same thing), which is how these things tend to go.

From: Greg Clayton [mailto:gclayton@apple.com]
Sent: Thursday, November 17, 2016 5:01 PM
To: David Blaikie
Cc: llvm-dev@lists.llvm.org; Robinson, Paul; Eric Christopher; Adrian
Prantl
Subject: Re: [llvm-dev] DWARF Generator

I have recently been modifying the DWARF parser and have more patches
planned and I want to be able to add unit tests that test the internal
llvm DWARF APIs to ensure they continue to work and also validate the
changes that I am making. There are not many DWARF unit tests other than
very simple ones that test DWARF forms currently. I would like to expand
this to include many more tests.

I had submitted a patch that I aborted as it was too large. One of the
issues with the patch was a stand alone DWARF generator that can turn a
few API calls into the section data required for the DWARFContextInMemory
class to be able to load DWARF from. The idea is to generate a small blurb
of DWARF, parse it using our built in DWARF parser and validate that the
API calls we do when consuming the DWARF match what we expect. The
original standalone DWARF generator class is in
unittests/DebugInfo/DWARF/DWARFGenerator2.{h,cpp} in the patch attached.
The original review suggested that I try to use the AsmPrinter and many of
its associated classes to generate the DWARF. I attempted to do so and the
AsmPrinter version is in lib/CodeGen/DwarfGenerator.{h,cpp} in the patch
attached. This AsmPrinter based code steals code from the DwarfLinker.cpp.

I am having trouble getting things to work with the AsmPrinter. I was
able to get simple DWARF to be emitted with the AsmPrinter version of the
DWARF generator with code like:

  initLLVM();
  DwarfGen DG;
  Triple Triple("x86_64--");
  StringRef Path("/tmp/test.elf");
  bool DwarfInitSuccess = DG.init(Triple, Path);
  EXPECT_TRUE(DwarfInitSuccess);
  uint16_t Version = 4;
  uint8_t AddrSize = 8;
  DwarfGenCU &CU = DG.appendCompileUnit(Version, AddrSize);
  DwarfGenDIE CUDie = CU.getUnitDIE();

  CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c");
  CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C);

  DwarfGenDIE SubprogramDie = CUDie.addChild(DW_TAG_subprogram);
  SubprogramDie.addAttribute(DW_AT_name, DW_FORM_strp, "main");
  SubprogramDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
  SubprogramDie.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x2000U);

  DwarfGenDIE IntDie = CUDie.addChild(DW_TAG_base_type);
  IntDie.addAttribute(DW_AT_name, DW_FORM_strp, "int");
  IntDie.addAttribute(DW_AT_encoding, DW_FORM_data1, DW_ATE_signed);
  IntDie.addAttribute(DW_AT_byte_size, DW_FORM_data1, 4);

  DwarfGenDIE ArgcDie = SubprogramDie.addChild(DW_TAG_formal_parameter);
  ArgcDie.addAttribute(DW_AT_name, DW_FORM_strp, "argc");
  //ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref_addr, IntDie); // Crashes here...

  DG.generate();

  auto Obj = object::ObjectFile::createObjectFile(Path);
  if (Obj) {
    DWARFContextInMemory DwarfContext(*Obj.get().getBinary());
    uint32_t NumCUs = DwarfContext.getNumCompileUnits();
    for (uint32_t i=0; i<NumCUs; ++i) {
      DWARFCompileUnit *U = DwarfContext.getCompileUnitAtIndex(i);
      if (U)
        U->getUnitDIE(false)->dump(llvm::outs(), U, -1u);
    }
  }

But things fall down if I try to uncomment the DW_FORM_ref_addr line
above. The problem is that AsmPrinter really expects a full stack of stuff
to be there and expects people to use the DwarfDebug class and all of its
associated classes. These associated classes really want to use the “DI”
objects (DICompileUnit, etc) so to create a compile unit we would need to
create DICompileUnit object and then make a AsmPrinter/DwarfCompileUnit.
That stack is pretty heavy and requires the code shown above to create
many many classes just to represent the simple output we wish to emit.
Another downside of the AsmPrinter method is we don’t know which targets
people are going to build into their binaries and thus we don’t know which
triples we will be able to use when generating DWARF info. Adrian Prantl
attempted to help me get things working over here and we kept running into
roadblocks.

It’d be great to have more detail about the roadblocks you hit to better
understand how bad/what the issues are.

A few blocks:

  • DIEString doesn’t support DW_FORM_string. DW_FORM_string support might
    have been pulled so that we never emit it from clang, but we would want to
    have a unit test that covers being able to read an inlined C string from a
    DIE. Support won’t be that hard to add, but we might not want it so that
    people can’t use it by accident and make less efficient DWARF.

Seems to me we originally supported only DW_FORM_string, and then at some
point it was tossed in favor of DW_FORM_strp in order to get space savings
from string pooling. In fact using DW_FORM_string for small strings would
save some more space (admittedly not much) and a bunch of relocations.
(I found data from an old experiment, in a debug build of Clang it saved
~0.7MB out of a total 340MB of debug-info size, and >360K ELF relocations.)

This is true, but it also adversely affects DWARF parsing speed as you will need to manually skip each C string when parsing the DIEs.

I’d favor an API that passed the string down and let the DIE generator
(as opposed to the DWARF generator) pick the form.

I have currently added a DIEInlinedString class that can be used for DW_FORM_string attributes.

  • Asserts, asserts, asserts. As we tried to emit DWARF, we got an assert
    in bool AsmPrinter::doInitialization(Module &M). On the first line:

MMI = getAnalysisIfAvailable<MachineModuleInfo>();

This asserts if you call it while using the AsmPrinter the way the
DwarfLinker and the AsmPrinter based DwarfGen do. You must call this to
get a DwarfDebug. If you get past this by installing a Pass then we
assert at:

GCModuleInfo *MI = getAnalysisIfAvailable<GCModuleInfo>();
assert(MI && "AsmPrinter didn't require GCModuleInfo?");

If we don’t have this, we don’t get a DwarfDebug.

Even if we end up adding another set of code to generate DWARF (which
I’d really like to avoid) we’d want to, at some point, coalesce them back
together. Given the goal is to try to coalesce the DWARF parsing code in
LLDB and LLVM, it’d seem unfortunate if that effort just created another
similar (or larger) amount of work for DWARF generation.

This DWARF generator could just live in the unittests/DebugInfo/DWARF
directory so it wouldn’t pollute anything in LLVM if we do choose to use
it.

I wanted to pass this patch along in case someone wants to take a look
at how we can possibly fix the lib/CodeGen/DwarfGenerator.cpp and
lib/CodeGen/DwarfGenerator.h. The code that sets up all the required
classes for the AsmPrinter method is in the DwarfGen class from
lib/CodeGen/DwarfGenerator.cpp in the following function:

bool DwarfGen::init(Triple TheTriple, StringRef OutputFilename);

The code in this function was looted from existing DwarfLinker.cpp code.
This function requires a valid triple and that triple is used to create a
lot of the classes required to make the AsmPrinter. I am not sure if any
other code uses the AsmPrinter like this besides the DwarfLinker.cpp code
and that code uses its own magic to actually link the DWARF. It does reuse
some of the functions as I did, but the DwarfLinker doesn’t use any of the
DwarfDebug, DwarfCompileUnit or any of the classes that the
compiler/assembler uses when making DWARF.

What’s the DwarfLinker code missing that you need? If that code is
generating essentially arbitrary DWARF, what’s blocking using the same
technique for generating DWARF for parsing tests?

They don’t use any of the DwarfDebug, DwarfCompileUnit classes. They also
don’t use any of the DI classes when making up the debug info. So both the
DWARF linker and the generator have similar needs: make DWARF that isn’t
tied too closely to the clang internal classes and DI classes.

The amount of work required for refactoring the AsmPrinter exceeds the
time I am going to have, but I would still like to have DWARF API testing
in the unit tests.

So my question is if anyone would have objections to using the
standalone DWARF generator in unittests/DebugInfo/DWARF until we can later get
the YAML tools to be able to produce DWARF and we can switch to testing
the DWARF data that way? Chris Bieneman has expressed interest in getting
a DWARF/YAML layer going.

Those tools would still want to use pretty similar (conceptually)
abstractions to LLVM’s codegen and llvm-dsymutil. I’d still strongly
prefer to generalize/keep common APIs here - or better understand why it’s
not practical now (& what it will take/how we make sure we have a plan and
resources to get there eventually).

My reasoning is:

  • I want to be able to test DWARF APIs we have to ensure they work
    correctly as there are no Dwarf API tests right now. I will be adding code
    that changes many things in the DWARF parser and it will be essential to
    verify that there are no regressions in the DWARF APIs.
  • Not sure which targets would be built into LLVM so it might be hard to
    write tests that cover 32/64 bit addresses and all the variants if we have
    to do things legally via AsmPrinter and valid targets

Seems like it might be plausible to refactor out whatever features of
the AsmPrinter these APIs require (so we just harvest that data out of
AsmPrinter and pass it down in a struct, say - so that other users can
pass their own struct without needing an AsmPrinter). Though, again,
interested to know how dsymutil is working in these situations.

I can try that method if indeed the only places that use the DwarfDebug
are the DW_FORM_ref_addr and location lists. I’ll let you know how that
goes.

  • Not enough time to modify AsmPrinter to not require the full DebugInfo
    stack and the classes that it uses (llvm::DwarfCompileUnit which must use
    llvm::DICompileUnit, llvm::DIE class which uses many local classes that
    all depend on the full DwarfDebug stack).

Will you have time at some later date to come back and revisit this?
It’s understandable that we may choose to incur short term technical debt
with an understanding that it will be paid off in some timely manner. It’d
be less desirable if there’s no such plan/possibility and we incur a
fairly clear case of technical debt (redundant DWARF generation libraries -
especially when this effort is to remove a redundant DWARF parser).

Not sure anyone else will need to generate DWARF manually. The two clients
currently are the DWARF unittests and the DwarfLinker. The DwarfLinker
worked around these issues. If the AsmPrinter wasn’t such an integral part
of the entire compiler stack, I could take a stab at refactoring it, but I
don’t believe I am the right person to do this at this point as I have no
experience or knowledge of the various ways that this class is used, or
how it interacts with other support classes (DwarfDebug, and many many
other classes).

Things that still worry me:

  • not being able to generate DWARF for 32/64 if targets are missing

You mean DWARF-32 and DWARF-64 formats? LLVM doesn’t do DWARF-64.
If you mean 64-bit target-machine addresses, I guess I don’t understand
the problem. If you have target-dependent tests, then they only work
when the right targets are there. This is extremely common and I’m
not clear why it would be a problem for the DWARF tests.

I wasn’t aware that there were target-dependent tests. Do you know of one in the unittest directory you can point me to? I did mean 32 bit address targets versus 64 bit address targets. I am not sure how I can test 4 and 8 byte addresses reliably. What triple do I use in the unittest? I can’t assume x86_64 as we may have been built on a 32 bit ARM system with only the 32 bit ARM targets.

  • DIEString not supporting DW_FORM_string. I can add support, but I don’t
    know if we want it as if we add it people might start using it.

See above. If the API picked the form this would not be a concern.

For DWARF parsing speed I still like the DW_FORM_strp.

FWIW I’m with Greg here. I don’t find that the “inlined small strings” optimization is really worth it for size, but could be convinced to add it.

-eric

After Paul Robinson's comments on how few places were using DwarfDebug in AsmPrinter, I switched from trying to create a DwarfDebug to a solution that works without requiring a DwarfDebug. There were in fact only 2 places that used the DD (DwarfDebug) variable in AsmPrinter and they are easy to work around.

I added "uint16_t DwarfVersion;" and "bool SplitDwarf;" member variables to AsmPrinter. DwarfVersion is initialized to zero and set to a valid version if the AsmPrinter creates a DwarfDebug object in AsmPrinter::doInitialization(...), or if clients manually call "void AsmPrinter::setDwarfInfo(uint16_t Version, bool UseSplitDwarf);". I then added accessors in AsmPrinter:

  uint16_t getDwarfVersion() const;
  bool getUseSplitDwarf() const;

"AsmPrinter::getDwarfVersion()" will assert if DwarfVersion is zero, and the assert message explains that either a DwarfDebug must be created via AsmPrinter::doInitialization() or users should manually call "AsmPrinter::setDwarfInfo(...)".
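The pattern is roughly the following. This is only a standalone sketch: DwarfInfoHolder is a hypothetical stand-in for the relevant part of AsmPrinter, and the assert text is illustrative rather than the wording in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the AsmPrinter members described above;
// the real class has far more state.
class DwarfInfoHolder {
  std::uint16_t DwarfVersion = 0; // Zero means "not configured yet".
  bool SplitDwarf = false;

public:
  // Lets a client configure DWARF emission without creating a DwarfDebug.
  void setDwarfInfo(std::uint16_t Version, bool UseSplitDwarf) {
    DwarfVersion = Version;
    SplitDwarf = UseSplitDwarf;
  }

  // Asserts if nobody has configured the version yet.
  std::uint16_t getDwarfVersion() const {
    assert(DwarfVersion != 0 &&
           "create a DwarfDebug via doInitialization() or call "
           "setDwarfInfo() first");
    return DwarfVersion;
  }

  bool getUseSplitDwarf() const { return SplitDwarf; }
};
```

A unit test would call setDwarfInfo(4, false) once up front, after which the two DwarfDebug users in AsmPrinter can query the accessors instead.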

I also added DW_FORM_string support.

So I believe I have a workable solution using the AsmPrinter.

Thanks for the help. Will post a patch to review soon.

Greg

From: Greg Clayton [mailto:gclayton@apple.com]
Sent: Thursday, November 17, 2016 5:01 PM
To: David Blaikie
Cc: llvm-dev@lists.llvm.org; Robinson, Paul; Eric Christopher; Adrian
Prantl
Subject: Re: [llvm-dev] DWARF Generator

I have recently been modifying the DWARF parser and have more patches
planned and I want to be able to add unit tests that test the internal
llvm DWARF APIs to ensure they continue to work and also validate the
changes that I am making. There are not many DWARF unit tests other than
very simple ones that test DWARF forms currently. I would like to expand
this to include many more tests.

I had submitted a patch that I aborted as it was too large. One of the
issues with the patch was a standalone DWARF generator that can turn a
few API calls into the section data required for the DWARFContextInMemory
class to be able to load DWARF from. The idea is to generate a small blurb
of DWARF, parse it using our built-in DWARF parser and validate that the
API calls we do when consuming the DWARF match what we expect. The
original standalone DWARF generator class is in
unittests/DebugInfo/DWARF/DWARFGenerator2.{h,cpp} in the patch attached.
The original review suggested that I try to use the AsmPrinter and many of
its associated classes to generate the DWARF. I attempted to do so and the
AsmPrinter version is in lib/CodeGen/DwarfGenerator.{h,cpp} in the patch
attached. This AsmPrinter based code steals code from the DwarfLinker.cpp.

I am having trouble getting things to work with the AsmPrinter. I was
able to get simple DWARF to be emitted with the AsmPrinter version of the
DWARF generator with code like:

initLLVM();
DwarfGen DG;
Triple Triple("x86_64--");
StringRef Path("/tmp/test.elf");
bool DwarfInitSuccess = DG.init(Triple, Path);
EXPECT_TRUE(DwarfInitSuccess);
uint16_t Version = 4;
uint8_t AddrSize = 8;
DwarfGenCU &CU = DG.appendCompileUnit(Version, AddrSize);
DwarfGenDIE CUDie = CU.getUnitDIE();

CUDie.addAttribute(DW_AT_name, DW_FORM_strp, "/tmp/main.c");
CUDie.addAttribute(DW_AT_language, DW_FORM_data2, DW_LANG_C);

DwarfGenDIE SubprogramDie = CUDie.addChild(DW_TAG_subprogram);
SubprogramDie.addAttribute(DW_AT_name, DW_FORM_strp, "main");
SubprogramDie.addAttribute(DW_AT_low_pc, DW_FORM_addr, 0x1000U);
SubprogramDie.addAttribute(DW_AT_high_pc, DW_FORM_addr, 0x2000U);

DwarfGenDIE IntDie = CUDie.addChild(DW_TAG_base_type);
IntDie.addAttribute(DW_AT_name, DW_FORM_strp, "int");
IntDie.addAttribute(DW_AT_encoding, DW_FORM_data1, DW_ATE_signed);
IntDie.addAttribute(DW_AT_byte_size, DW_FORM_data1, 4);

DwarfGenDIE ArgcDie = SubprogramDie.addChild(DW_TAG_formal_parameter);
ArgcDie.addAttribute(DW_AT_name, DW_FORM_strp, "argc");
//ArgcDie.addAttribute(DW_AT_type, DW_FORM_ref_addr, IntDie); // Crashes here...

DG.generate();

auto Obj = object::ObjectFile::createObjectFile(Path);
if (Obj) {
  DWARFContextInMemory DwarfContext(*Obj.get().getBinary());
  uint32_t NumCUs = DwarfContext.getNumCompileUnits();
  for (uint32_t i = 0; i < NumCUs; ++i) {
    DWARFCompileUnit *U = DwarfContext.getCompileUnitAtIndex(i);
    if (U)
      U->getUnitDIE(false)->dump(llvm::outs(), U, -1u);
  }
}

But things fall down if I try to uncomment the DW_FORM_ref_addr line
above. The problem is that AsmPrinter really expects a full stack of stuff
to be there and expects people to use the DwarfDebug class and all of its
associated classes. These associated classes really want to use the “DI”
objects (DICompileUnit, etc) so to create a compile unit we would need to
create a DICompileUnit object and then make an AsmPrinter/DwarfCompileUnit.
That stack is pretty heavy and requires the code shown above to create
many many classes just to represent the simple output we wish to emit.
Another downside of the AsmPrinter method is we don’t know which targets
people are going to build into their binaries and thus we don’t know which
triples we will be able to use when generating DWARF info. Adrian Prantl
attempted to help me get things working over here and we kept running into
roadblocks.

It’d be great to have more detail about the roadblocks you hit to better
understand how bad/what the issues are.

A few blocks:

  • DIEString doesn’t support DW_FORM_string. DW_FORM_string support might
    have been pulled so that we never emit it from clang, but we would want to
    have a unit test that covers being able to read an inlined C string from a
    DIE. Support won’t be that hard to add, but we might not want it so that
    people can’t use it by accident and make less efficient DWARF.

Seems to me we originally supported only DW_FORM_string, and then at some
point it was tossed in favor of DW_FORM_strp in order to get space savings
from string pooling. In fact using DW_FORM_string for small strings would
save some more space (admittedly not much) and a bunch of relocations.
(I found data from an old experiment, in a debug build of Clang it saved
~0.7MB out of a total 340MB of debug-info size, and >360K ELF relocations.)

This is true, but it also adversely affects DWARF parsing speed as you will need to manually skip each C string when parsing the DIEs.

Sorry, thread got derailed a bit - the main point was that Greg wants support for generating DW_FORM_string to test the parsing support for it (for use in the debugger with debug info generated by other sources that do use that form), even if LLVM never generates it.

Greg was hesitant to add support for generating it to the existing LLVM DWARF code (lib/CodeGen/AsmPrinter) on the risk that someone might make the mistake of writing code in LLVM that would generate that form in its object output. I’m pretty comfortable that this risk is quite low (as I have mentioned above in the thread), given the way the code works there: we have one common API for generating string forms that picks between strp and str_index, for example, so I don’t think _string would sneak in by accident. If it were added, it’d be done explicitly/intentionally in that code, with whatever tradeoffs appropriately considered.

I think it’s reasonable to add the support in the common APIs so we can use it to generate sample DWARF for testing the DWARF parsing APIs.

Agreed :slight_smile:

-eric

Re DW_FORM_string the savings is small but since it came up I thought
I'd mention the idea. I agree it's not worth pursuing any further.
An API to use that form explicitly (that isn't the normal debug-info
generation API) to facilitate testing is fine.

Regarding DWARF parsing speed where strings are concerned: DWARF 5 will
ruin this because DW_FORM_strx is a ULEB; so, every DIE that has a
DW_AT_name or DW_AT_linkage_name will become variable-size (just as bad
as DW_FORM_string). In a RelWithDebInfo build of Clang, I got just over
100 million DIEs and 40.2% had DW_AT_name. I didn't try to count DIEs
that had DW_AT_linkage_name without DW_AT_name; it can happen, though, so
the number of variable-size DIEs in DWARF 5 will actually be higher than
that.
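To illustrate why a ULEB-encoded form makes a DIE variable-size, here is a minimal decoder sketch. The names (ULEB128Result, decodeULEB128) are made up for this example and are not the LLVM helpers; the point is only that the encoded length depends on the value, so it cannot be known from the abbreviation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration; not the LLVM Support API.
struct ULEB128Result {
  std::uint64_t Value; // Decoded value.
  std::size_t Length;  // Number of bytes consumed from the buffer.
};

// Decodes one unsigned LEB128 value: 7 payload bits per byte, with the
// high bit set on every byte except the last.
inline ULEB128Result decodeULEB128(const std::uint8_t *Data,
                                   std::size_t Size) {
  assert((Data || Size == 0) && "null buffer with non-zero size");
  std::uint64_t Value = 0;
  unsigned Shift = 0;
  std::size_t I = 0;
  while (I < Size) {
    std::uint8_t Byte = Data[I++];
    if (Shift < 64)
      Value |= static_cast<std::uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0)
      break; // Last byte of this value.
  }
  return {Value, I};
}
```

A string index of 127 takes one byte while 128 takes two, so the offset of the next attribute is only known after decoding.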

Maybe we should propose a fixed-size variant of FORM_strx, or that strx
just be fixed size? The public-comment review period for DWARF 5 is not
over yet, and this seems like a valid concern.
--paulr

More data from my RelWithDebInfo build of Clang:

102,654,608 total DIEs
95,864,960 fixed-size in DWARF 4
58,189,225 fixed-size in DWARF 5, assuming
            all FORM_strp attributes become FORM_strx

That is, fixed-size DIEs go from 93% to 57% of all DIEs.
If it's really faster to load fixed-size DIEs, this is likely to have
a noticeable effect.
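For anyone who wants to check the percentages, the arithmetic can be reproduced with a small helper. percentOf is a hypothetical name used only for this sketch; the inputs are the DIE counts quoted above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: Part as a percentage of Total, rounded to the
// nearest whole percent.
inline long percentOf(std::uint64_t Part, std::uint64_t Total) {
  assert(Total != 0 && "cannot take a percentage of zero");
  return std::lround(100.0 * static_cast<double>(Part) /
                     static_cast<double>(Total));
}
```

percentOf(95864960, 102654608) gives 93 and percentOf(58189225, 102654608) gives 57, and the difference between the two counts is 37,675,735 DIEs that stop being fixed-size.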

Maybe we should propose a fixed-size variant of FORM_strx, or that strx
just be fixed size? The public-comment review period for DWARF 5 is not
over yet, and this seems like a valid concern.

I have a little more data to look at but I am going to propose this to
the DWARF committee.

Agreed. Let’s talk offline about a proposal here if you’d like.

-eric