Compile units in debugging intrinsics / globals

I have a question about the llvm debugging records, especially wrt compile
units.

In the non-LLVM sense, a compile unit is essentially everything contained
within a single .o file, and it is derived from one or more source and
header files. Included in a compile unit are functions and global data.

DWARF records treat compile units the same way: a compile unit record
has children that define subprograms and variables, along with definitions
of the types they use.

In the LLVM sense, there is a compile unit record for _every_ source file.
The globals in the llvm.metadata section and the debug intrinsics reference
the appropriate compile unit record to indicate where in the source the item
they describe appears.

By the time the llvm linker has finished its work, all the functions and
global data, along with their debugging records, may have been rearranged,
and I want to pull them all together into the appropriate DWARF compile
units. To do that I can look at the (llvm) compile unit records, but this
only works when everything is defined in the same source file. If data or a
function is defined in an included file, then it appears to be in a
different compile unit.

Suppose I have the following source:

file1:
  #include "file2"
  #include "file3"
  int fn1(void) ...

file2:
  int a;

file3:
  int fn2(void) ...

then fn1, along with all the base types etc., appears to be in compile unit
"file1", the variable a appears to be in compile unit "file2" (and there are
no base types in file2, so int is not defined), and fn2 appears to be in
compile unit "file3". My DWARF records are therefore incorrect, appearing
something like

TAG_compile_unit "file1"
  TAG_subprogram "fn1" ...
    ...
  TAG_base_type "int" ...

TAG_compile_unit "file2"
  TAG_variable "a" ...

TAG_compile_unit "file3"
  TAG_subprogram "fn2" ...
    ...

In fact, the compile units "file2" and "file3" are bogus, and everything
should be part of compile unit "file1".
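The grouping problem can be sketched as a toy C model. It is not LLVM's API: `struct debug_record`, its hypothetical `main_file` field (naming the file actually passed to the compiler, which the per-source-file compile unit records described above do not provide), and `count_compile_units` are all invented for illustration. Grouping the example's records (fn1, a, fn2) by source file yields three DWARF compile units; grouping by the originating main file yields the single one that is wanted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one debug record after linking. */
struct debug_record {
    const char *name;      /* e.g. "fn1", "a" */
    const char *file;      /* source file named by the LLVM compile unit */
    const char *main_file; /* ASSUMED field: the .c file that was compiled */
};

/* Which field decides the DWARF compile unit a record belongs to. */
enum cu_key { BY_SOURCE_FILE, BY_MAIN_FILE };

static const char *key_of(const struct debug_record *r, enum cu_key k)
{
    return k == BY_SOURCE_FILE ? r->file : r->main_file;
}

/* Count the distinct DWARF compile units that would be emitted if records
 * were grouped by the chosen key (quadratic, which is fine for a sketch). */
size_t count_compile_units(const struct debug_record *recs, size_t n,
                           enum cu_key k)
{
    size_t units = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if (strcmp(key_of(&recs[i], k), key_of(&recs[j], k)) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            units++;
    }
    return units;
}
```

With records {"fn1", "file1", "file1"}, {"a", "file2", "file1"} and {"fn2", "file3", "file1"}, grouping by source file gives 3 units and grouping by main file gives 1.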

My question is: can I tell that these three (llvm) compile units are in fact
components of the single (non-LLVM) compile unit? Or is there some other way
I should be determining which (non-LLVM) compile unit the records are part
of?

Many thanks!

Hi,

> When, in fact, these compile units "file2" and "file3" are bogus and
> everything should be part of compile_unit "file1".

This is not clear to me. Isn't it useful to know where to find the
definition of fn2 (in file3)? I'm pretty sure this is how gcc does
things too: the debugger seems to know that some objects were defined
in header files.

> My question is: can I tell that these three (llvm) compile units are in fact
> components of the single (non-LLVM) compile unit? Or is there some other way
> I should be determining which (non-LLVM) compile unit the records are part
> of?

If you compile file1 into an LLVM module M, then by definition all debug info
in M is for the compile unit file1. So as long as you're not doing link-time
optimization, can't you just grab all debug info from M?

Ciao,

Duncan.

Hi, thanks for responding. I think I did not explain my problem well. To
illustrate it further, consider these two modules which I will compile and
link together using gcc:

Module 1 consists of one source file:

main.c:
  static int a = 1;
  extern int fn1(void);

  int main (int argc, char **argv) {
    return fn1();
  }

I compile this with the command-line

gcc main.c -g -c -o main.o

Module 2 consists of three source files:

file1.c:
  #include "file2.h"
  #include "file3.h"
  int fn1(void) {
    return fn2(a);
  }

file2.h:
  static int a = 2;

file3.h:
  int fn2(int p) {
    return p * 2;
  }

I compile this with the command-line

gcc file1.c -g -c -o file1.o

Finally I link the modules

gcc main.o file1.o -o main

In the non-llvm sense, each of these two modules is a compile unit.

To see the debug records I use:

objdump -W main > objdump.gcc.txt

Looking at this file, I see two compile units as I would expect (plus the C
libraries):

Compilation Unit @ offset 0x1a1:
...
<0><1ac>: Abbrev Number: 1 (DW_TAG_compile_unit)
...
DW_AT_name : main.c
...
<1><208>: Abbrev Number: 2 (DW_TAG_subprogram)
...
DW_AT_name : main
...
<1><25a>: Abbrev Number: 6 (DW_TAG_variable)
DW_AT_name : a

And

Compilation Unit @ offset 0x25b:
...
<0><266>: Abbrev Number: 1 (DW_TAG_compile_unit)
...
DW_AT_name : file1.c
...
<1><2c3>: Abbrev Number: 2 (DW_TAG_subprogram)
...
DW_AT_name : fn2
...
<1><2f4>: Abbrev Number: 5 (DW_TAG_subprogram)
...
DW_AT_name : fn1
...
<1><30d>: Abbrev Number: 6 (DW_TAG_variable)
DW_AT_name : a

The problem I have is that llvm considers a _source file_ to be a compile
unit. My code generator (a back end I have built for llc) uses the compile
unit information in the llvm debug records, *but an llvm compile unit is
indistinguishable from a source file*. It is true that I _also_ want to know
which source file each declaration is in, but with the information I have,
my code generator erroneously emits debug records for _four_ different
compile units: "main.c" contains the definitions of main and variable a,
"file1.c" contains the definition of fn1, "file2.h" contains the definition
of variable a, and "file3.h" contains the definition of fn2.

The problem with using the module information you suggested is that at code
generation time the linker has already created a single module, so with this
technique you only get _one_ compile unit, which is also wrong. The 2.2
release seems to have this problem. If I compile my sources as follows:

llvm-gcc -c -g main.c -o main.o
llvm-gcc -c -g file1.c -o file1.o
llvm-ld -disable-opt main.o file1.o -o main
llc main.bc -f -o main.s -march=x86
gcc main.s -o main
objdump -W main > objdump.llvm.txt

I find that the debug records claim that everything is contained in a single
compile unit named "file1.c". I also note that because both compile units
contain a variable named a, llvm has emitted only one debug record for such
a variable, and no matter where I query its value while debugging I always
get the value of the variable in main.c.
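One plausible way two file-scope statics named a can collapse into a single record: if global variable records are keyed by name alone, the second a is treated as a duplicate of the first. The toy C model below uses hypothetical names (`struct global_var`, `add_global`, `find_global`); it is not LLVM's data structure and makes no claim about how llc actually works internally. It contrasts keying by name with keying by (compile unit, name).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical debug record for a global variable. */
struct global_var {
    const char *unit; /* compile unit (translation unit) it belongs to */
    const char *name; /* source-level name, e.g. "a" */
    int value;        /* stand-in for its location / contents */
};

/* Append v to table unless an entry with the same key already exists.
 * use_unit == 0: key is the name alone, so a second "a" is dropped.
 * use_unit == 1: key is (unit, name), so both statics survive.
 * Returns the new number of entries. */
size_t add_global(struct global_var *table, size_t n,
                  struct global_var v, int use_unit)
{
    for (size_t i = 0; i < n; i++) {
        int same_name = strcmp(table[i].name, v.name) == 0;
        int same_unit = strcmp(table[i].unit, v.unit) == 0;
        if (same_name && (!use_unit || same_unit))
            return n; /* treated as a duplicate */
    }
    table[n] = v;
    return n + 1;
}

/* Look up name as seen from unit; returns NULL if there is no match. */
const struct global_var *find_global(const struct global_var *table,
                                     size_t n, const char *unit,
                                     const char *name, int use_unit)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0 &&
            (!use_unit || strcmp(table[i].unit, unit) == 0))
            return &table[i];
    }
    return NULL;
}
```

Adding main.c's a (value 1) and then file1.c's a (value 2) to a name-keyed table keeps only the first, so a lookup of a from file1.c still returns 1; with the (unit, name) key the same lookup returns 2.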

As the "standard" code generators get this wrong, I suspect the answer is
"no", but what I wanted to establish was whether I could determine the
actual compile units (in the non-llvm sense) that the debug records are part
of, not simply the source files. It appears that the llvm records are
incorrect in not distinguishing between compile units and source files, but
this could be resolved if there were some way of linking the source files
(llvm compile units) together to determine the modules (non-llvm compile
units).

> Hi, thanks for responding. I think I did not explain my problem well. To illustrate it further

You might be interested in:

   gcc -combine file1.c main.c -S -o t.s -g -dA

:)

> no matter where I query the value of it when debugging I always get given the value of the variable in main.c.

gcc does the same thing. Yeah, it seems like a bug; it would be nice to fix it.