More DIFactory questions - still stumped

I hate to be a nag, but after several days of working on this I am still utterly stumped.

Let me recap the situation as it currently stands: I’m trying to write code that generates DWARF debugging information for my compiler using DIFactory and friends. Unfortunately the information I am generating appears to be invalid, but I can’t figure out the cause.

Based on the advice in the earlier thread, I’ve been using dwarfdump to try and isolate the problem. This was helpful in solving the earlier problem, but isn’t helping me with the current problem.

When I run dwarfdump -a, it prints a couple hundred pages of debug info, and then segfaults. The last few lines before the segfault look like this:

.debug_inlined contents:
< EMPTY >

.debug_frame contents:

0x00000000: CIE
    length: 0x00000010
    CIE_id: 0xffffffff
    version: 0x01
    augmentation: ""
    code_align: 1
    data_align: -4
    ra_register: 0x08
    DW_CFA_def_cfa (esp, 4)
    DW_CFA_offset (eip, 0)
    DW_CFA_nop
    DW_CFA_nop
    Instructions: Init State: CFA=esp+4 eip=[esp+4]

0x00000014: FDE
    length: 0x00000028
    CIE_pointer: 0x00000000
Segmentation fault

If I grep through the output of dwarfdump, there are no other CIE or FDE definitions that occur before this point, so I assume that the problem isn’t just this particular FDE.
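For anyone repeating the grep step above on a very large listing, here is a minimal sketch of the same check in C++. It assumes the entry headers look exactly like the ones in the excerpt (a hex offset, a colon, then `CIE` or `FDE`); the names `FrameEntryCount` and `countFrameEntries` are made up for illustration and are not part of dwarfdump or LLVM.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Tally of call-frame entry headers found in a textual dwarfdump listing.
struct FrameEntryCount {
  int cie = 0;
  int fde = 0;
};

// Returns true if `line` starts (after optional whitespace) with
// "0x<hex digits>: <kind>", e.g. "0x00000014: FDE".
static bool isEntryHeader(const std::string &line, const std::string &kind) {
  std::size_t i = 0;
  while (i < line.size() && std::isspace(static_cast<unsigned char>(line[i])))
    ++i;
  if (line.compare(i, 2, "0x") != 0) return false;
  i += 2;
  std::size_t digits = 0;
  while (i < line.size() && std::isxdigit(static_cast<unsigned char>(line[i]))) {
    ++i;
    ++digits;
  }
  if (digits == 0) return false;
  if (line.compare(i, 2, ": ") != 0) return false;
  i += 2;
  return line.compare(i, kind.size(), kind) == 0;
}

// Counts CIE and FDE headers in the whole dump, one line at a time.
FrameEntryCount countFrameEntries(const std::string &dump) {
  FrameEntryCount count;
  std::istringstream in(dump);
  std::string line;
  while (std::getline(in, line)) {
    if (isEntryHeader(line, "CIE")) ++count.cie;
    else if (isEntryHeader(line, "FDE")) ++count.fde;
  }
  return count;
}
```

Lines such as `CIE_pointer: 0x00000000` are not counted, because they do not begin with a hex offset.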

One difficulty here is that I don’t know which calls to DIFactory produce these data structures. Usually my solution of last resort when dealing with intractable debugging problems like this is to start commenting out code until the problem goes away, but in this case I don’t know where to even start. If I comment out all the DWARF-generating code, then obviously the problem goes away. :-)

I did in fact discover that if I comment out all calls to DIFactory::CreateSubprogram, the problem disappears - but then I don’t have any debugging info. (Well, I still have all the DINodes for my data structures, just not functions.) I’ve also commented out all of the declarations of parameters and local variables, which doesn’t prevent the problem from occurring. (Since my current understanding is that CIE and FDE are used to describe the call frame, I’m trying to simplify the problem as much as possible.)

I’ve carefully studied the source code of CGDebugInfo in clang as a working example. One puzzlement is that there’s a discrepancy between what the “source level debugging with LLVM” docs say and what clang does: According to the docs, DW_TAG_formal_parameter is used to specify a formal parameter in a function type descriptor, but according to a code search, the name “DW_TAG_formal_parameter” does not appear anywhere in the clang source code. Instead, the argument array that is used when creating a function type descriptor contains only the bare types, not types wrapped in a formal parameter DIE.

However, since I’ve tried it both ways (wrapped and unwrapped) and the dwarfdump crash occurs either way, this latter issue is of lesser concern.

At the moment I’m experimenting with the parameters to CreateSubprogram, trying every possible permutation of inputs that I can think of in hope of stumbling on the right answer. I can’t think of what else to do.

Note that I am calling assert(diNode.Verify()) on every DINode after it’s created, so I know that it’s valid up to that point at least. However, the checks in Verify() aren’t very extensive. (Also, I’ve observed in the past that it’s kind of inconvenient that Verify() only returns a boolean result when it fails, with no indication of what you did wrong.)
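Since Verify() only gives back a boolean, one workaround is to attach a label at each call site and collect the labels of the nodes that fail, instead of asserting immediately. The following is only a sketch: `VerifyLog` and `FakeNode` are hypothetical names, and the only thing assumed about the real descriptors is that they expose a `Verify()` method returning `bool` and are cheap to copy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects a caller-supplied label for every descriptor whose Verify() fails.
class VerifyLog {
public:
  // Works with any type that has a Verify() method returning bool.
  // The node is taken by value on the assumption that it is a small handle.
  template <typename Node>
  bool check(Node node, const std::string &label) {
    if (node.Verify()) return true;
    failures_.push_back(label);
    return false;
  }

  const std::vector<std::string> &failures() const { return failures_; }

private:
  std::vector<std::string> failures_;
};

// Hypothetical stand-in for a debug-info descriptor, used only to
// exercise the helper without depending on LLVM headers.
struct FakeNode {
  bool valid;
  bool Verify() const { return valid; }
};
```

A call site would then read something like `log.check(sp, "subprogram " + name)`, and the accumulated `failures()` list can be printed once at the end of code generation. This does not make the checks in Verify() any more thorough; it only records which node tripped them.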

All of this is with the current LLVM head, although I was getting the same problems with the version from about 2 months ago.

Hi Talin,

Like in CGDebugInfo, you have to use the Subprogram type only for the
return type. What gives you the parameters is passing the Function* as
the last parameter on DIFactory.CreateSubprogram().

I suppose DIFactory was done tailored to C-like languages using Clang
as the primary driver for changes. I'd not be surprised if you could
do things that it didn't expect and then it'd generate images with bad
Dwarf (enough to cause segfault in dwarfdump).

I'd try to mimic exactly what CGDebugInfo does, even if that makes
your "info func" look like C functions in GDB, or if the parameters
are all mixed up. At least you get something out of it and can, then,
work your way to fix it in LLVM.

I'm putting together some help with using DIFactory, maybe I can turn
that into a proper doc. I'll keep you posted.

I understand about passing the Function* as the last argument. I’m not sure I understand the first sentence (“You have to use the Subprogram type only for the return type”).

Here’s what my code for creating function descriptors currently looks like (note that some parts are commented out for debugging purposes):

DISubprogram CodeGenerator::genDISubprogram(const FunctionDefn * fn, Function * f) {
  DASSERT(fn != NULL);
  // Look up in the map to see if already generated.
  DISubprogram & sp = dbgSubprograms_[fn];
  if (!sp.isSubprogram()) {
    DIType dbgFuncType = genDIType(fn->functionType());
    DASSERT(dbgFuncType.Verify());
    DASSERT(dbgCompileUnit_.Verify());
    sp = dbgFactory_.CreateSubprogram(
        dbgCompileUnit_,
        fn->name(),
        fn->qualifiedName(),
        fn->linkageName(),
        dbgFile_, // genDIFile(fn),
        1, // getSourceLineNumber(fn->location()),
        dbgFuncType,
        fn->isSynthetic() /* isLocalToUnit */,
        false /* isDefinition */,
        0, 0 /* VK, Index */,
        DIType(),
        false /* isArtificial */,
        false /* isOptimized */,
        f);
    DASSERT(sp.Verify());
  }
  return sp;
}

And here’s the code that generates function type descriptors:

DISubprogram CodeGenerator::genDISubprogram(const FunctionDefn * fn,

(...)

    false /* isDefinition */,

(...)

Hi Talin,

The only difference from what I'm doing is that I only export debug
symbols in definitions, not declarations. I may be doing it wrong,
though, for multi-file compilation (haven't tested thoroughly).

DICompositeType CodeGenerator::genDIFunctionType(const FunctionType * type)

(...)

for (ParameterList::const_iterator it = params.begin(); it !=
params.end(); ++it) {
const ParameterDefn * param = *it;
args.push_back(genDIParameterType(param->type()));
}

Don't do that. I know it looks right, but it's broken in DIFactory.

DICompositeType fnType = dbgFactory_.CreateCompositeType(
dwarf::DW_TAG_subroutine_type,
dbgCompileUnit_,

I use the file here, not the compile unit... But again, I could be wrong.

Hope that puts you in the right direction.

OK I made 3 changes:
– changed isDefinition to true. (I’m also only generating debug info for definitions.)
– commented out the code that pushes parameter types to the arg list. (Still push the return type however.)
– changed the call that creates the subroutine type to use DIFile rather than DICompileUnit as the context param.

It still segfaults, however. :-(

I should mention that I don’t actually know if the CreateSubprogram call is even related to the problem. I know that when I comment out the call, the segfault goes away - however, that just might mean that the problem is still there but is not being triggered.

I have to admit I am rather confused about the proper usage of DIFile and DICompileUnit. Both of these are DIScopes, but it’s not clear to me whether the symbols within a module should be the children of one or the other. Many of the DIFactory parameters take an explicit DIFile, so those cases are clear - but many of the other context params only have DIDescriptor as their type, so there’s not a lot of guidance as to which is the right type of DIDescriptor to use.

If the LLVM compiler miscompiles some code, the bug is unlikely to be in IRBuilder. Most likely it is a bug in the FE’s use of IRBuilder, or a codegen/optimization bug. In either case IRBuilder won’t save you. The same is true for DIFactory. It is a utility to construct MDNodes; it does not strictly enforce the semantic correctness of debug info. (In fact, it is on my list somewhere to absorb DIFactory into IRBuilder.) BTW, DIFactory should be independent of the debugging format used by the code generator, but until a target that implements a format other than DWARF arrives, this remains theory only.

In your case, most likely you’re running into a bug in DwarfDebug (or your encoding is violating hidden assumptions made by DwarfDebug, which is also not good.) Your best bet is to reduce the test case as much as possible and watch DIEs (DIE.cpp). I have seen this symptom once where a constructed DIE was not emitted in the end due to a bug.

The recent changes you mention below would change the correctness of the debug info, but they would be unlikely to affect the structure of the DWARF generated. And somehow, this structure is invalid in your case.

I was hoping for a quick-fix on the assumptions of DwarfDebug about
Subprograms' MDNodes, but it might be anywhere.

Reducing the test case is the best solution, but it might not be easy.

Validating the MDNodes in DIFactory (or anywhere before DwarfDebug)
would be a good step to ensure IR consistency and isolate problems.
Unfortunately, it is the kind of thing that is not fundamental to get
things working, so it always gets left behind... ;-)
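One way to make the test-case reduction mentioned above less painful is to bisect over the functions that get debug info, rather than commenting things out by hand. The sketch below assumes two things that may not hold in practice: that the front end can be told to emit subprogram debug info for only the first k functions, and that the failure is monotonic (once it appears for some k, it stays for larger k). The name `firstFailingPrefix` and the callback are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Finds the smallest k in [1, total] for which failsWithFirst(k) is true,
// where failsWithFirst(k) means "the output is broken when debug info is
// emitted only for the first k functions". Returns 0 if the full set passes.
// Assumes the failure is monotonic in k.
std::size_t firstFailingPrefix(
    std::size_t total,
    const std::function<bool(std::size_t)> &failsWithFirst) {
  if (total == 0 || !failsWithFirst(total)) return 0;
  std::size_t lo = 1;
  std::size_t hi = total;  // invariant: failsWithFirst(hi) is true
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (failsWithFirst(mid)) {
      hi = mid;       // failure already present with fewer functions
    } else {
      lo = mid + 1;   // need more functions to trigger it
    }
  }
  return lo;
}
```

In practice the callback would rebuild the module with the given limit and run dwarfdump on the result; the function at index k - 1 is then the first one whose debug info triggers the problem. With a few hundred functions this takes under ten builds.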

I understand your point and certainly acknowledge the need for better documentation.

There are a couple of wrinkles to note here:

  • While constructing IR using DIFactory, you are not seeing the entire picture. When you see a declaration, you’re not sure whether you’ll see a matching definition or not. When you build an inlined subprogram, you don’t know whether you’ll see an out-of-line definition of this function or not, etc.
  • DwarfDebug must be able to handle malformed debug info gracefully and make the best of whatever it is fed. This is because the optimizer may have chewed on the IR mercilessly, and the optimizer must not be influenced by the presence of encoded debug info in the IR.

BTW, the reason I stopped responding to this thread is not because I solved the problem, but because I simply gave up and decided to work on other things for a while since I was making no progress. Having finished those other things (the stack crawler, for one), I’m hoping that time and a fresh start will yield better results. Unfortunately after about a day spent reviewing old llvm-dev threads and trying different permutations of calls to DIFactory, I have not discovered anything that I didn’t already know.

With respect to the suggestion of building my own copy of dwarfdump (so that I could run it under gdb and see where it breaks), I never did get it to compile and run on OS X, since it requires ELF headers and libs. I thought about doing the same thing under Linux, however dwarfdump doesn’t segfault on Linux when I feed it my LLVM-generated binary. (It does report errors, however: “dwarf_srclines: DW_DLE_ATTR_FORM_BAD”. The weird part is that it does this even when I completely disable the code that calls IRBuilder.SetCurrentDebugLocation()).

As per usual, this is with a recent LLVM head (like about a week old).

Interestingly enough, I just upgraded to the latest Ubuntu (10.10 - Maverick Meerkat), and the LLVM-generated code no longer builds: I get the following error in the assembler stage (after the bitcode is converted to assembly):

SwitchStmtTest.s: Assembler messages:
SwitchStmtTest.s:294899: Fatal error: duplicate .debug_line sections

Note that this is still with calls to IRBuilder.SetCurrentDebugLocation() disabled - My FE is not emitting any debug line information at all at this time.
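To see what the generated assembly actually contains before handing it to the assembler, a quick scan for the relevant directives can help. This is a rough sketch, not a diagnosis of the error: it assumes the `.s` file spells the directives as `.section ... .debug_line` and `.loc`, and the names `LineInfoDirectives` and `scanAssembly` are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Counts of line-table related directives found in an assembly listing.
struct LineInfoDirectives {
  int debugLineSections = 0;  // ".section" lines that name .debug_line
  int locDirectives = 0;      // ".loc" lines
};

// Scans assembly text line by line and tallies the directives above.
LineInfoDirectives scanAssembly(const std::string &text) {
  LineInfoDirectives out;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    std::size_t start = line.find_first_not_of(" \t");
    if (start == std::string::npos) continue;  // blank line
    if (line.compare(start, 8, ".section") == 0 &&
        line.find(".debug_line", start) != std::string::npos) {
      ++out.debugLineSections;
    } else if (line.compare(start, 5, ".loc ") == 0 ||
               line.compare(start, 5, ".loc\t") == 0) {
      ++out.locDirectives;
    }
  }
  return out;
}
```

Comparing the two counts for a failing file against a file that assembles cleanly may show whether the line-table section is being emitted even though the front end sets no debug locations.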

This is a known Linux binutils bug. There is an LLVM PR in the Bugzilla database, but I don’t remember the number.

-Devang

Direct .o file writing support for ELF is nearing functionality; it will define away this sort of issue.

-Chris

While that is great news, I’d like to also keep the ability to build via assembly language, as the ability to examine the assembly has been useful in solving many otherwise difficult bugs. (Especially given the difficulties I’ve had getting source-level debugging to work.)

For now, however, do you know if there is a workaround for this issue?

Searching for “debug_line” in all LLVM PRs at llvm.org/bugs would immediately lead you to
http://llvm.org/bugs/show_bug.cgi?id=8210
Follow the trails and you’ll have all the info for this Fatal error.

Devang

[BTW, for your original DWARF error, focusing on the uses of DIFactory is unlikely to lead you towards the real underlying issue. Your approach is equivalent to focusing on IRBuilder to find the cause of a miscompilation.]

I’m not sure I understand what you are saying. I’m not claiming that DIFactory is incorrect. I’m saying I don’t know how to use it properly.

At the moment, I’m testing my frontend on both Ubuntu (Maverick Meerkat) and OS X (Leopard). They both fail, but in completely different ways.

On OS X, a lot of the debug info seems to be missing entirely, even though I am running dsymutil. That is, when I link the LLVM-generated modules with my runtime library (which is compiled by gcc) I only get debug information for the modules that were compiled by gcc. However, I’ve inspected the .bc files carefully and it appears as though the debug metadata is in fact there.

I’m guessing at this point that there is something wrong with my linker. (Because I don’t want to ship LLVM binaries around with my compiler, I have written a linker program which combines parts of ‘opt’ and ‘llc’, the GCStrategy pass, and the Reflection generator pass.)

(Grr, I didn’t mean to hit send…I wasn’t finished.)