CloneFunction during LTO leads to seg fault?

Hello,

I'm writing an LTO pass and I'd like to be able to duplicate a function (with debugging info). I'm trying to accomplish this with CloneFunction but it's leading to a seg fault in ld.

I've whittled down my problem so that it occurs in this small pass [1].

If I run this pass with opt, I get the expected result (i.e. a valid program that calls main twice). If I run the pass during LTO, ld seg faults. Here is a pastebin of when ld seg faults in lldb [2].

If I set the third parameter of CloneFunction (ModuleLevelChanges) to false, then compilation completes -- but I lose debugging information in the cloned function. I really want to preserve the debug info.

Any idea what is wrong? Is this a bug? Usually when I screw up the module the verifier catches the problem. This one seems to get as far as CodeGen and then crash.

I've found that the module itself needs to be non-trivial to cause the crash. Here is the module I'm testing with [3].

In case it is relevant: to get my pass to run during LTO I added "PM.add(createHelloPass())" to PassManagerBuilder::populateLTOPassManager. I'm using binutils-gold on Linux (Ubuntu 14.04 LTS).

Thank you,
Scott A. Carr
PhD Candidate
Purdue University

[1] namespace { // Hello - The first implementation, without getAnalysisUsage. - Pastebin.com

[2] * thread #1: tid = 8065, 0x00007ffff5488dc7 libLLVMAsmPrinter.so`llvm::DwarfComp - Pastebin.com

[3] #include <stdio.h>#include <time.h>#include <stdlib.h>#include <string.h> - Pastebin.com

Any idea what is wrong? Is this a bug? Usually when I screw up the module the verifier catches the problem. This one seems to get as far as CodeGen and then crash.

What could help is to pass -mllvm -print-after-all to ld and get the IR before and after your pass ran, and also just before CodeGen.

In case it is relevant: to get my pass to run during LTO I added "PM.add(createHelloPass())" to PassManagerBuilder::populateLTOPassManager. I'm using binutils-gold on Linux (Ubuntu 14.04 LTS).

You may want to try to add it at the end of the pipeline, in case something does not play well with optimizations (just trying to pinpoint where the issue is).

Hi Mehdi,

Thanks for your reply. Here is the full output of -print-after-all [1] and just the module itself after my pass [2].

I've looked over the IR, but I can't see anything obviously wrong.

I'm not sure what you meant by:

You may want to try to add it at the end of the pipeline

My pass is the last one added inside populateLTOPassManager. Should I add it to the PassManager somewhere else?

Thanks,
Scott

[1] print_all_after.ll · GitHub
[2] target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"target triple = "x8 - Pastebin.com

Hello again,

I've narrowed down my issue to a small test case. The core of the issue is that CloneFunction (with ModuleLevelChanges=true) on a function that has had other functions inlined into it generates orphan debug info metadata nodes. Then when this module is emitted, DwarfDebug crashes.

The C program is here [1]. Note that the inliner should inline add into main.

If I compile that C program with "-g -S -emit-llvm" and run the result through opt with -std-link-opts, I get [2]. The optimizer does in fact inline add into main.

Then I run my pass on [2] and I get [3].

All my pass does is:

     ValueToValueMapTy vMap;
     auto main = M.getFunction("main");
     auto mainDup = CloneFunction(main, vMap, true);
     M.getFunctionList().push_back(mainDup);

Note that [2] and [3] are virtually the same except: 1) there is a new function "main.1" which is a copy of "main" and 2) there are more debug info metadata nodes at the end of the module. If you look at "!44" you'll see that it is not in the subprograms of any DICompileUnit. I think this is the problem.

If I compile [3], I get a seg fault at [4]. I traced the crash back to [5]. There's no check that SP is actually in SPMap. My guess is that it is missing in the crash case.
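
To make the suspected failure mode concrete, here is a minimal, self-contained C++ sketch. It is a toy model, not LLVM code: Subprogram, CompileUnit, buildSPMap and findOwner are hypothetical names invented for this illustration, and it assumes that the map is filled only from each compile unit's subprogram list. Under that assumption, a cloned subprogram that no compile unit lists has no entry, and an unchecked lookup would hand back nothing usable.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-ins for debug info nodes; these are NOT the LLVM classes.
struct Subprogram {
  std::string name;
};

struct CompileUnit {
  std::vector<const Subprogram *> subprograms;
};

// Hypothetical analogue of SPMap: subprogram -> owning compile unit.
using SPMapTy = std::unordered_map<const Subprogram *, const CompileUnit *>;

// Assumption: the map is populated only from the compile units' lists.
SPMapTy buildSPMap(const std::vector<CompileUnit> &CUs) {
  SPMapTy Map;
  for (const CompileUnit &CU : CUs)
    for (const Subprogram *SP : CU.subprograms)
      Map[SP] = &CU;
  return Map;
}

// Checked lookup: returns nullptr for an orphan subprogram so the caller
// can bail out or assert instead of dereferencing a missing entry.
const CompileUnit *findOwner(const SPMapTy &Map, const Subprogram *SP) {
  auto It = Map.find(SP);
  return It == Map.end() ? nullptr : It->second;
}
```

In this model, a subprogram standing in for "main" that is listed in a compile unit resolves to that unit, while one standing in for the clone "main.1" that was never added to any list resolves to nullptr. Whether the real code should assert or skip in that case is a design question this sketch does not answer.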

Is this a bug? Is there an assumption that a function won't be cloned after inlining? Is there an assumption that the debug info metadata nodes are well formed when they get to DwarfDebug?

I'm outside of my sliver of LLVM knowledge here. I mostly write transform passes. Is there a workaround aside from disabling inlining or making sure my pass runs before it? If this is a bug, I'd be happy to try to help fix it, but I'd need some guidance.

Thanks,
Scott

[1] #include <stdio.h>#include <string.h>int add(int x, y) { int z = x + y; - Pastebin.com
[2] ; ModuleID = '<stdin>'target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S1 - Pastebin.com
[3] ; ModuleID = '<stdin>'target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S1 - Pastebin.com
[4] https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp#L610
[5] https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L369

CC'ing Adrian for the debug info question.

Is this a bug? Is there an assumption that a function won't be cloned after inlining? Is there an assumption that the debug info metadata nodes are well formed when they get to DwarfDebug?

There shouldn't be such assumptions, and if there is such an assumption the compiler should assert instead of segfaulting (assuming you built with assertions).
My guess is that CloneFunction hasn't been tested with this case and it is a bug.

The problem is that the Clone* functions don't clean up debug info at all. This also affects split codegen in LTO (as is, it won't work if you have debug info). A work-around is to run something similar to StripDeadDebugInfo after cloning - Sergei (cc'ed) has seen some success using an approach based on that.

Tobias

In general I use DebugInfoFinder and clear out Metadata if GV is null or GV->isDeclaration().

If there is any interest, I can post that patch...

Sergei
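
The filtering rule described above (drop a record when its global is null or is only a declaration) can be sketched in isolation. This is a toy model under stated assumptions, not the patch itself and not LLVM code: Function, SubprogramRecord and pruneStaleRecords are hypothetical names, and a plain vector stands in for whatever DebugInfoFinder would enumerate.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins; these are NOT the LLVM classes.
struct Function {
  std::string name;
  bool isDeclaration; // true if the function has no body in this module
};

struct SubprogramRecord {
  std::string name;
  const Function *fn; // nullptr models a record whose function is gone
};

// Drop every record whose function is missing or is only a declaration,
// keeping the relative order of the surviving records.
void pruneStaleRecords(std::vector<SubprogramRecord> &Records) {
  Records.erase(std::remove_if(Records.begin(), Records.end(),
                               [](const SubprogramRecord &R) {
                                 return R.fn == nullptr ||
                                        R.fn->isDeclaration;
                               }),
                Records.end());
}
```

Note that this only models removing stale records. It does not address the concern raised in the reply below, where the new nodes are still referenced from the cloned function.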

Hi Sergei,

I would like to see a patch if you have one.

StripDeadDebugInfo doesn’t work for me because the new debug info nodes (after clone) are not the child of any llvm.dbg.cu node.

I’m not sure how deleting the new metadata nodes would work because they are referenced in the cloned functions.

Thanks,
Scott

I'll work on that in the next couple of days.

Sergei

Scott,

  You can see my solution for a specific problem I face with stale debug info here:

http://reviews.llvm.org/D17338

I am the first to admit that this treats the effect but not the cause of the issue. In my mind CloneModule should not have created those records in the first place, but I am not sure how to prevent it from doing so elegantly. Maybe Mehdi or Peter can suggest a more direct way...

Without this patch, stale debug info can retain references to objects not cloned by CloneModule, which might produce relocations to non-existent objects, and the link then fails.

Thanks.

Sergei