Adding accelerator tables to existing linked DWARF files

I am looking to create a tool that can add Apple or DWARF5 accelerator tables to fully linked executables that contain DWARF. This will help us benchmark how much accelerator tables can improve the debugging experience as debuggers don’t need to manually index all of the debug info during debugging.

Looking at how accelerator tables are currently emitted, they seem to be built up as DWARF is being created or linked, and then emitted using a subclass of DWARFEmitter. The only subclass of this that I see right now is the one in dsymutil, which ends up emitting everything using an AsmPrinter by eventually calling emitAppleAccelTable(…) from llvm/include/llvm/CodeGen/AccelTable.h.

I spoke briefly with Shoaib on this subject and he suggested adding code to llvm-objcopy. I briefly looked through the code and from what I can tell, llvm-objcopy doesn’t seem to have any DWARF abilities other than compressing DWARF sections. If we do add functionality to llvm-objcopy, are we ok pulling in DebugInfoDWARF and the LLVM object model? AFAICT the code for this tool has its own object file layer which doesn’t match the full layer inside of LLVM (llvm::ObjectFile and DWARFContext). Also, no AsmPrinter objects are used in this codebase either.

Looking at lld sources, it seems to use the DebugInfoDWARF library to some extent already. Not sure if this tool uses the standard LLVM object model or has all of its own emitters. Does lld use AsmPrinter at all? I don’t see any mention of it in there.

dsymutil has a --update feature which seems to load all of the DWARF and pretend to link it, all the while generating the new accelerator table data, but I fear using this would pull in way too much code (AsmPrinter, all targets required to load all object file types, the standard LLVM object file model (not the lld or llvm-objcopy versions), etc).

My initial thoughts are:
1 - load a DWARFContext and iterate through the DWARF and build accelerator table data
2 - create the sections for the accelerator tables and either keep in memory or save to disk
3 - call functions to add the newly created sections to the binary

#1 should be easy as long as I can use a DWARFContext from DebugInfoDWARF.
#2 might need to be re-implemented using something other than an AsmPrinter?
#3 can use llvm-objcopy code if needed since it can add sections?
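
As a rough illustration of steps 1 and 2, here is a minimal sketch of how name entries could be hashed and grouped into buckets without an AsmPrinter. It is not the LLVM implementation: the `(name, DIE offset)` pairs are hypothetical stand-ins for what a walk over a DWARFContext would produce, `build_buckets` and the bucket count are invented for illustration, and the real on-disk layouts of the Apple and DWARF5 tables carry more data than this. The hash is the DJB hash used by the Apple tables; as far as I know DWARF5 .debug_names additionally case-folds the name first, which is omitted here.

```python
from collections import defaultdict


def djb_hash(name: str) -> int:
    """DJB hash: start at 5381, then h = h * 33 + byte, truncated to 32 bits."""
    h = 5381
    for byte in name.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def build_buckets(entries, bucket_count):
    """Group (name, die_offset) pairs into hash buckets.

    Returns a dict mapping bucket index -> list of (hash, name, [die_offsets]),
    with each bucket's list sorted by hash. This is an illustrative in-memory
    shape, not the serialized section format.
    """
    # Collect every DIE offset that shares a name.
    by_name = defaultdict(list)
    for name, die_offset in entries:
        by_name[name].append(die_offset)

    # Place each unique name in the bucket selected by hash % bucket_count.
    buckets = defaultdict(list)
    for name, offsets in by_name.items():
        h = djb_hash(name)
        buckets[h % bucket_count].append((h, name, sorted(offsets)))
    for chain in buckets.values():
        chain.sort()
    return dict(buckets)


# Hypothetical input: names and DIE offsets as an indexing pass might collect them.
entries = [("main", 0x2A), ("foo", 0x51), ("foo", 0x9C)]
table = build_buckets(entries, bucket_count=2)
```

A debugger looking up a name would hash it the same way, go to the bucket, and compare hashes before comparing strings, which is what lets it skip indexing all of the debug info up front.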

Any advice on how this can or should be implemented would be appreciated from anyone with experience.

Greg Clayton

I am looking to create a tool that can add Apple or DWARF5 accelerator tables to fully linked executables that contain DWARF. This will help us benchmark how much accelerator tables can improve the debugging experience as debuggers don't need to manually index all of the debug info during debugging.

Is it for ELF, Mach-O, wasm, COFF, or any of the combinations?

Looking at how accelerator tables are currently emitted, they seem to be built up as DWARF is being created or linked, and then emitted using a subclass of DWARFEmitter. The only subclass of this that I see right now is the one in dsymutil, which ends up emitting everything using an AsmPrinter by eventually calling emitAppleAccelTable(...) from llvm/include/llvm/CodeGen/AccelTable.h.

I spoke briefly with Shoaib on this subject and he suggested adding code to llvm-objcopy. I briefly looked through the code and from what I can tell, llvm-objcopy doesn't seem to have any DWARF abilities other than compressing DWARF sections. If we do add functionality to llvm-objcopy, are we ok pulling in DebugInfoDWARF and the LLVM object model? AFAICT the code for this tool has its own object file layer which doesn't match the full layer inside of LLVM (llvm::ObjectFile and DWARFContext). Also, no AsmPrinter objects are used in this codebase either.

llvm-objcopy supports various ad-hoc binary manipulation features where each feature does a very
simple task. Neither llvm-objcopy nor GNU objcopy knows DWARF. --strip-debug,
--compress-debug-sections, --add-gnu-debuglink and --only-keep-debug have "debug" in their names but
these features don't need to parse DWARF. (GNU objcopy has a --debugging but that only works for
a.out and coff, not elf).

Do we have a more suitable tool for such debugging functionality? dsymutil for ELF?

Looking at lld sources, it seems to use the DebugInfoDWARF library to some extent already. Not sure if this tool uses the standard LLVM object model or has all of its own emitters. Does lld use AsmPrinter at all? I don't see any mention of it in there.

--gdb-index and diagnostics (line tables) use DebugInfoDWARF. I have a plan to implement .debug_names, which is similar to --gdb-index.

FWIW I’m with Ray here :)

-eric

Is it for ELF, Mach-O, wasm, COFF, or any of the combinations?

Yes, for any object files that LLVM currently supports. But I am looking to support ELF first as MachO already has these tables available since dsymutil already creates either Apple or DWARF accelerator tables. COFF and Wasm can come later.

Do we have a more suitable tool for such debugging functionality? dsymutil for ELF?

dsymutil is such a tool, but it uses the llvm::ObjectFile layer and the LLVM targets, so if you open a file that contains the “armv7” architecture and the ARM target hasn’t been built into your distro, it will fail to open this binary with an error that says:

No available targets are compatible with triple “arm-unknown-unknown”

I ran into this with a recent gsym patch that is trying to fix the buildbots for testing, but it fails when the ARM targets are not enabled and I try to load the DWARF from an object file:

http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/63473/steps/test-check-all/logs/stdio

And this is part of the reason for this email. I would love for llvm-gsymutil not to require all LLVM targets to be there. DebugInfoDWARF doesn’t need the targets; it just needs to know the address byte size and endianness, and it can parse the debug info in the DWARF.
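
To illustrate how little is needed, the two values can be read from the first bytes of an ELF file without any target support. This is a minimal sketch, not how llvm::ObjectFile does it; the function name `elf_dwarf_params` is made up for the example, and the address size taken from the ELF class is only a file-level default, since each DWARF unit header carries its own address size.

```python
ELF_MAGIC = b"\x7fELF"


def elf_dwarf_params(header: bytes):
    """Return (address_byte_size, is_little_endian) from the ELF e_ident bytes.

    Only e_ident[EI_CLASS] (byte 4) and e_ident[EI_DATA] (byte 5) are read,
    so no knowledge of the machine type or target is required.
    """
    if len(header) < 6 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2):  # 1 = ELFCLASS32, 2 = ELFCLASS64
        raise ValueError("unknown ELF class")
    if ei_data not in (1, 2):  # 1 = ELFDATA2LSB, 2 = ELFDATA2MSB
        raise ValueError("unknown ELF data encoding")
    address_byte_size = 4 if ei_class == 1 else 8
    return address_byte_size, ei_data == 1


# Synthetic 16-byte e_ident for a 32-bit little-endian file (as an armv7 binary would have).
ident_32le = ELF_MAGIC + bytes([1, 1, 1]) + bytes(9)
params = elf_dwarf_params(ident_32le)  # (4, True)
```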

So my main question stands: do we want all tools that must manipulate DWARF to require the llvm::ObjectFile layer and all of the targets to be enabled just so that the object files can be parsed, or do we want to make a lighter layer available, akin to what llvm-objcopy has, so more tools can take advantage of this lighter-weight layer?

--gdb-index and diagnostics (line tables) use DebugInfoDWARF. I have a plan to implement .debug_names, which is similar to --gdb-index.

That is great, and we will of course share code for this between the tool I write and the modifications to lld. Does lld use the llvm::ObjectFile layer, or does it have its own lighter-weight layer?

One other option would be to make a new “llvm-dwarfld” tool, where most of the functionality would live in llvm/lib/DwarfLinker and other locations. The idea would be to do any post-processing of DWARF using this tool. For accelerator tables, it could just create the new sections and then call “llvm-objcopy” to add them to the binary.
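
The hand-off to llvm-objcopy could look roughly like the sketch below, which only builds the command line. The helper name and all file names are hypothetical; the one real piece is the `--add-section <section>=<file>` option, which both llvm-objcopy and GNU objcopy accept. It assumes the section contents have already been written to a file.

```python
import shlex


def add_section_command(binary, section_name, payload_path, output,
                        objcopy="llvm-objcopy"):
    """Build the argv for adding one pre-built section to a linked binary."""
    return [
        objcopy,
        "--add-section", f"{section_name}={payload_path}",
        binary,
        output,
    ]


# Hypothetical file names: the payload holds the serialized accelerator table.
cmd = add_section_command("a.out", ".debug_names", "debug_names.bin", "a.out.indexed")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it, assuming the tool is on PATH.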

This tool could eventually be used to optimize DWARF (dead strip code, remove unused types, unique types with ODR like llvm-dsymutil, etc).

Which seems like what we’d want dsymutil to do anyhow?

-eric

Yes. I am fine with adding ELF support to llvm-dsymutil if that is the way people think we should go?

I’d like it… Adrian? Fred?

-eric

We could specify ELF support for the --update feature only for now, which adds the accelerator tables. For full linking to work, we will need some sort of ELF-specific or standalone debug map file that can be read for real linking, though that won’t be too hard.

I would be fine with adding ELF support to dsymutil.

dsymutil is such a tool, but it uses the llvm::ObjectFile layer and the LLVM targets, so if you open a file that contains the “armv7” architecture and the ARM target hasn’t been built into your distro, it will fail to open this binary with an error that says:

This is unexpected. Are you saying that libObject depends on targets? I would have expected it to be the other way around.

– adrian

Is there/could you further explain the use-case for adding an index to an existing binary? Certainly not the worst idea/could come in handy sometimes, but you mention benchmarking - is the benefit of not recompiling/relinking that significant to such experiments?

If it’s not for use in a common workflow, but only in a compiler/debugger development workflow, it doesn’t seem so important to me.

We could specify ELF support for the --update feature only for now, which adds the accelerator tables. For full linking to work, we will need some sort of ELF-specific or standalone debug map file that can be read for real linking, though that won’t be too hard.

Not sure I follow here - --update would be given a fully linked binary, yes? So why would it need a debug map? It’d have the debug info & the linked executable code available, so you’d be able to see which bits of the executable code are referred to by which bits of debug info.

I’d like it… Adrian? Fred?

-eric

Yes. I am fine with adding ELF support to llvm-dsymutil if that is the way people think we should go?

Feels like a bit of a weird fit to me (equally llvm-objcopy seems like a weird fit too) - given the specific name & nature of Darwin debug info distribution being a bit different (reading object files, having input from the linker, etc) & the specific name being pretty uniquely applied to that model/output.

(does that way lie moving dwp functionality to llvm-dsymutil too? )

But don’t feel super strongly about any of it.

Is there/could you further explain the use-case for adding an index to an existing binary? Certainly not the worst idea/could come in handy sometimes, but you mention benchmarking - is the benefit of not recompiling/relinking that significant to such experiments?

If it’s not for use in a common workflow, but only in a compiler/debugger development workflow, it doesn’t seem so important to me.

It could be useful if you want to, say, minimize the maximum RAM usage during a link and you’re okay with the link taking effectively two steps?

(does that way lie moving dwp functionality to llvm-dsymutil too? )

Yep. That’s the idea.

-eric

Is there/could you further explain the use-case for adding an index to an existing binary? Certainly not the worst idea/could come in handy sometimes, but you mention benchmarking - is the benefit of not recompiling/relinking that significant to such experiments?

It is hard to get people to adopt new toolchains into their workflows. So the idea here is to allow people to use whatever toolchains they want to produce a binary, then add accelerator tables to their linked binaries as a “try before you buy” kind of thing. It also allows accelerator tables to be updated in case there were bugs in older versions of the tools, or as new accelerator tables come out or newer versions become available. Once we prove that the accelerator tables are viable and worth it, we get people to want to migrate to newer toolchains that have this functionality built in.

If it’s not for use in a common workflow, but only in a compiler/debugger development workflow, it doesn’t seem so important to me.

It is for production workflows for people that are on older toolchains and are not able to upgrade for stability or business purposes. So it isn’t just for compiler/debugger development.

We could specify ELF support for the --update feature only for now, which adds the accelerator tables. For full linking to work, we will need some sort of ELF-specific or standalone debug map file that can be read for real linking, though that won’t be too hard.

Not sure I follow here - --update would be given a fully linked binary, yes? So why would it need a debug map?

It wouldn’t. If you take a look at the dsymutil code though, they did some hackery to make it pretend like it is relinking the DWARF. So no debug map is needed, but dsymutil will run it through the same kind of machinery. But that being said, it would be very easy to repurpose this code to take a linker map for ELF and become the linker for DWARF in .o files in LLD (where it does smart linking, not just concatenate and relocate like most linkers currently do).

It’d have the debug info & the linked executable code available, so you’d be able to see which bits of the executable code are referred to by which bits of debug info.

--update was in the original dsymutil that I wrote and it would just update the accelerator tables in the dSYM file and rewrite the binary and exit. There were a few iterations of the accelerator tables and this allowed us to quickly update older and out-of-date versions. The llvm-dsymutil has this feature and also allows you to specify DWARF5 or Apple accelerator tables.

It is for production workflows for people that are on older toolchains and are not able to upgrade for stability or business purposes. So it isn’t just for compiler/debugger development.

Fair enough - thanks for the framing!

It wouldn’t. If you take a look at the dsymutil code though, they did some hackery to make it pretend like it is relinking the DWARF. So no debug map is needed, but dsymutil will run it through the same kind of machinery. But that being said, it would be very easy to repurpose this code to take a linker map for ELF and become the linker for DWARF in .o files in LLD (where it does smart linking, not just concatenate and relocate like most linkers currently do).

Yeah, that work’s already being done, by the sounds of it (using DwarfLinker (the implementation of dsymutil) in lld for ELF and such).

--update was in the original dsymutil that I wrote and it would just update the accelerator tables in the dSYM file and rewrite the binary and exit. There were a few iterations of the accelerator tables and this allowed us to quickly update older and out-of-date versions. The llvm-dsymutil has this feature and also allows you to specify DWARF5 or Apple accelerator tables.

Ah, good to know there’s precedent there, then.

Sounds like you’ve got the right people involved/signing off on the dsymutil direction - so I’ll leave you folks to it!

- Dave

On 3/3/2020 12:48 AM, Greg Clayton via llvm-dev wrote:

And this is part of the reason for this email. I would love for llvm-gsymutil not to require all LLVM targets to be there. DebugInfoDWARF doesn’t need the targets; it just needs to know the address byte size and endianness, and it can parse the debug info in the DWARF.

Hi Greg, the dsymutil code was recently refactored so that its linking part was moved into the lib/DWARFLinker library.

As an advantage, this makes it possible for DWARFLinker not to depend on ObjectFile.

It might make sense for your tool to use the DWARFLinker library (probably extending it with new functionality) to work with DWARF.

Alexey.