LLVM DWARF emission

I wanted to check that I’m headed in the right direction before I work more on LLVM debug info. What I’d like to do is update DIBuilder to expose exactly the facilities represented in modern DWARF, and to store that in the bitcode. In LLVM we would have a DwarfOpts that specifies the major version of DWARF we’re targeting and some additional compatibility flags to work around debugger deficiencies.

So for example, DIBuilder’s interface has no createPtrToMemberType(), which is odd since DW_TAG_ptr_to_member_type has been there since DWARF 2, but gcc didn’t emit it until very recently, so neither does llvm. Instead clang lowers it to a difference of two other pointer types, which effectively throws away information. My plan is to always emit the bitcode with the DWARF 4 representation of a pointer to member, then in codegen look at DwarfOpts.PtrToMember to decide whether to emit it as DW_TAG_ptr_to_member_type or the way gcc does, for compatibility with older GDBs, etc.

My premise is that storing the DWARF 4 equivalent data in the bitcode retains more information, so we can safely lower to another format later.
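To make the plan concrete, here is a minimal sketch of the codegen-time decision. DwarfOpts and its PtrToMember flag are the ones proposed above; everything else (PtrToMemberNode, Lowering, chooseLowering, the MajorVersion field name) is a hypothetical name made up for illustration and is not existing LLVM API. The gcc-compatible fallback is only represented as an enum value, not modeled.

```cpp
#include <cassert>
#include <string>

// Proposed options struct. Field names are assumptions for this sketch.
struct DwarfOpts {
  unsigned MajorVersion = 4; // DWARF major version being targeted
  bool PtrToMember = true;   // false: use the gcc-compatible lowering
};

// Hypothetical in-bitcode record: always carries the full DWARF 4
// information for a pointer to member, whatever the target debugger is.
struct PtrToMemberNode {
  std::string MemberType;     // type of the pointed-to member
  std::string ContainingType; // class that the member belongs to
};

// What codegen will emit for the node.
enum class Lowering {
  PtrToMemberTag, // emit DW_TAG_ptr_to_member_type
  CompatFallback  // emit the older gcc-style encoding (not modeled here)
};

// The bitcode stays the same; only the emitted form depends on the options.
Lowering chooseLowering(const PtrToMemberNode &N, const DwarfOpts &Opts) {
  assert(!N.ContainingType.empty() && "pointer to member needs a class");
  return Opts.PtrToMember ? Lowering::PtrToMemberTag
                          : Lowering::CompatFallback;
}
```

Since the node keeps both the member type and the containing class, either lowering can be produced from the same bitcode.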

Does this sound sensible?

Nick

Hi Nick,

I wanted to check that I'm headed in the right direction before I work more on
LLVM debug info. What I'd like to do is update DIBuilder to expose exactly the
facilities represented in modern DWARF,

wouldn't it be better in the long term to make the debug info layer more
abstract rather than a direct mapping onto dwarf?

Ciao, Duncan.

  and to store that in the bitcode. In

And to have Dwarf/COFF layers on top?

And have the code generators output the appropriate dwarf or whatever.

Ciao, Duncan.

Duncan Sands wrote:

I wanted to check that I'm headed in the right direction before I work more on
LLVM debug info. What I'd like to do is update DIBuilder to expose exactly the
facilities represented in modern DWARF,

wouldn't it be better in the long term to make the debug info layer more
abstract rather than a direct mapping onto dwarf?

And to have Dwarf/COFF layers on top?

And have the code generators output the appropriate dwarf or whatever.

There's a good split of responsibilities between what's in the .bc file and what llvm has to output. We simply use the dwarf numberings to indicate things like "this is a pointer type", the type hierarchy, and the hierarchy of line numbers in files and functions ("subprograms") etc., while leaving llvm to do the work of encoding how instructions map to line numbers or how to recompute the value in a variable from registers/memory.

I don't see any reason we can't store DWARF in the bitcode and then lower it to anything else later, even STABS if you want.

The reason for using dwarf is that it's very expressive (it's likely that anybody who is interested in debug info is working/has worked with the dwarf committee to get the necessary extensions into dwarf) and we don't need to invent our own language.

Which leads to my next point: if you want us to emit to an abstract layer on top, please propose a spec for that system, and be sure to explain why it's better than just using dwarf.

The sort of thing I'd like to fix is that we currently don't allow encoding some data, even if dwarf allows it, just because it couldn't be expressed in C or C++, so clang didn't need it. For example, DWARF permits all types to have names. In C++, "int&" can't have a name (you can have a typedef to it, but that's a separate type declaration), so DIBuilder's createReferenceType simply hard-codes NULL in the name field, unlike createClassType, which does take a Name parameter. That's silly, and we should stop doing it; we should expose the functionality actually available in DWARF. LLVM is useful for languages other than C/C++.
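A rough sketch of the difference, using a simplified stand-in rather than the real DIBuilder: TypeRecord, createReferenceTypeToday, and the signatures below are hypothetical and only mirror the shape of the problem. The tag values are the standard DWARF numbers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Tag numbers as assigned by the DWARF standard.
constexpr std::uint16_t DW_TAG_class_type = 0x02;
constexpr std::uint16_t DW_TAG_reference_type = 0x10;

// Hypothetical, simplified type record. An empty Name means "no name".
struct TypeRecord {
  std::uint16_t Tag;
  std::string Name;
};

// Shape of the current API: the caller has no way to supply a name.
TypeRecord createReferenceTypeToday() {
  return TypeRecord{DW_TAG_reference_type, ""};
}

// Shape of the proposed API: the name is optional, as DWARF allows.
TypeRecord createReferenceType(const std::string &Name = "") {
  return TypeRecord{DW_TAG_reference_type, Name};
}

// Class types already accept a name.
TypeRecord createClassType(const std::string &Name) {
  return TypeRecord{DW_TAG_class_type, Name};
}

// Usage sketch: a frontend for a language whose reference types have names.
inline void example() {
  TypeRecord R = createReferenceType("IntRef");
  assert(R.Tag == DW_TAG_reference_type && R.Name == "IntRef");
}
```

A C or C++ frontend would keep calling createReferenceType() with no argument and get the same output as today.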

Nick

Yep. Sounds fine. The current metadata (MD) layout is very tied to dwarf as it is. A new format is possible, but it would have to be designed from scratch, whereas this is just an extension of the existing format.

-eric