Questions about DWARF handling (esp. re: file and line data)

Dear LLVM devs,

I'm just working on the LLVM assembler's handling of .file and .line
directives. I've noticed some strange things and am wondering if there is a
reason the code works the way it does.

1. MCContext keeps a map of number -> MCDwarfLineTable called
MCDwarfLineTablesCUMap. However, as far as I can see, the only compilation
unit number which is *ever* used is zero! If this is true, then the map is
not serving any useful purpose.

What was the intention of the CU ID numbers? If multiple .file directives
appear in the asm, each one should open a new "CU", is that not so?

2. Although it appears that each MCDwarfLineTable should be for a single
compilation unit, MCDwarfLineTable has a *vector* called MCDwarfFiles (so
each MCDwarfLineTable can have multiple files).

Although you can have references to multiple files in the same
MCDwarfLineTable, it appears that the only one which is ever used is the
first one! MCDwarfLineTable doesn't do any recordkeeping to remember which
line number data belongs to which file, either.

------------------------

I can go through the code and clean up some of these inconsistencies, but
I need to know how it is *intended* to work. Can someone explain?

I /think/ a lot of the complexity you're seeing is there for the non-asm
debug info case (look into how these data structures are used when emitting
debug info for LTO, for example, where the debug info metadata in the IR
describes multiple compile units, multiple files, etc.).

Thanks for the reply! To show you how clueless I am (just started reading
the source a couple days ago): what is LTO?

Link Time Optimization - where we merge multiple LLVM modules together
and optimize/codegen them together for better optimization opportunities.

In this case, the merged module (the result of merging multiple other
modules) would contain a CU for each of the original modules.
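
To make the CU-keyed map concrete, here is a rough C++ sketch of the shape
described in this thread: a context holding one line table per compile-unit
ID, each table holding a list of files. All names here (Context, LineTable,
FileEntry, recordAsmFile, recordMergedModule) are made up for illustration
and are not LLVM's actual classes or API. The sketch only models the
observation from the question (asm input touches CU 0 only) next to the
merged-module case, where there is one CU per original module.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not LLVM's real classes.
struct FileEntry {
  std::string Name;
};

struct LineTable {
  std::vector<FileEntry> Files; // one CU may reference several files
};

struct Context {
  // Keyed by compile-unit ID, like the map described in the question.
  std::map<unsigned, LineTable> LineTablesByCU;

  // Creates the table on first use, then returns it.
  LineTable &getLineTable(unsigned CUID) { return LineTablesByCU[CUID]; }
};

// Assembly input, as observed in the question: everything lands in CU 0.
inline void recordAsmFile(Context &Ctx, const std::string &Name) {
  Ctx.getLineTable(0).Files.push_back(FileEntry{Name});
}

// Merged-module input: one CU (and so one line table) per original module.
// MainFiles[i] is assumed to be the main source file of module i.
inline void recordMergedModule(Context &Ctx,
                               const std::vector<std::string> &MainFiles) {
  for (unsigned CUID = 0; CUID < MainFiles.size(); ++CUID)
    Ctx.getLineTable(CUID).Files.push_back(FileEntry{MainFiles[CUID]});
}
```

With this model, two calls to recordAsmFile leave the map with a single
entry (key 0) holding two files, while recordMergedModule with three file
names leaves three entries, which is the case where keying by CU ID starts
to matter.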

I guess the dichotomy between CUs and files is for situations where (for
example) several .c files are compiled into a single .o file, and then
several such .o files are linked into a single executable?

Multiple files in a CU occur commonly due to #includes in C code - a single
CU has entities from multiple files - types defined in headers, subprograms
from inline functions in headers, plus types/functions/etc in the main
source file too.
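
As a small illustration of several files inside one CU, the following C++
sketch models a single CU's line table in which every row carries the number
of the file it came from, so a row from a header can be told apart from a
row from the main source file. The types and member names (CULineTable,
LineRow, addFile, addRow, fileOf) are hypothetical, and the 1-based file
numbering is an assumption chosen for the sketch; this is not a description
of how MCDwarfLineTable is implemented.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one CU's line table; names are illustrative only.
struct LineRow {
  unsigned FileIndex; // 1-based file number (an assumption of this sketch)
  unsigned Line;
};

struct CULineTable {
  std::vector<std::string> Files; // Files[0] is file number 1
  std::vector<LineRow> Rows;

  // Registers a file and returns its 1-based number.
  unsigned addFile(const std::string &Name) {
    Files.push_back(Name);
    return static_cast<unsigned>(Files.size());
  }

  // Records that some code came from the given file and line.
  void addRow(unsigned FileIndex, unsigned Line) {
    Rows.push_back(LineRow{FileIndex, Line});
  }

  // Resolves a row back to the file it came from; throws
  // std::out_of_range if the row's file number was never registered.
  const std::string &fileOf(const LineRow &Row) const {
    return Files.at(Row.FileIndex - 1);
  }
};
```

For example, registering "main.c" and then "util.h" and adding one row for
each gives a table where the first row resolves to "main.c" and the second
to "util.h", even though both rows belong to the same CU.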

- David