Line number and merged calls

Dear all,

I’m having an issue with line number information in LLVM IR. Apparently codegen is merging two calls (with different line numbers) into one. This is not a GVN optimization: the calls are different, and they are virtual calls. The optimization does not generate wrong code; it just invalidates accurate line number information. Briefly, the IR looks like this:

BB1:
  do virtual call on obj1 !dbg (line 1)
  branch BB3

BB2:
  do virtual call on obj2 !dbg (line 2)
  branch BB3

BB3:
  do something

The LLVM JIT generates something that looks similar to this code:

BB1:
  place arguments in registers and stack
  place function to call in register R1
  branch BB3

BB2:
  place arguments in registers and stack
  place function to call in register R1
  branch BB3

BB3:
  call R1 !dbg (line 1) <------ what about line 2?
  do something

That’s with standard codegen. “Fast” codegen does not perform this optimization.

Is this a bug with respect to accurate line number information with optimizations? If not, is there a way to disable this optimization?

I browsed the source code to find out where this optimization occurs during code generation, but couldn’t find anything. Can someone point me where this optimization is performed?

Thanks!
Nicolas

> Is this a bug with respect to accurate line number information with optimizations?

No. The way LLVM is designed, debug information is not allowed to interfere with code generation. In fact, if including debug information does change the generated code in any way, it is considered a bug.

> If not, is there a way to disable this optimization?

Yes.

> I browsed the source code to find out where this optimization occurs during code generation, but couldn't find anything. Can someone point me where this optimization is performed?

The transformation you describe is called tail merging or branch folding. The opposite transformation also occurs. It is called tail duplication.
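To see why the line-2 location vanishes, here is a toy Python model of tail merging. It is not LLVM's BranchFolding code: `Instr`, `common_tail_length`, `tail_merge` and the instruction strings are hypothetical, made up for illustration, and the real pass differs in details (newer LLVM versions may, for instance, combine the two locations instead of keeping one). The model compares instructions only by what they do and ignores their line numbers, because debug info must not influence codegen. After register allocation both blocks end in an identical `call R1`, so that instruction is emitted once and carries only one of the two lines.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    text: str  # the operation itself; the only field compared when merging
    line: int  # debug line number; ignored by the merge decision

def common_tail_length(a, b):
    """Count identical trailing instructions (compared by text only)."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n].text == b[-1 - n].text:
        n += 1
    return n

def tail_merge(pred1, pred2, succ):
    """Toy tail merge for two blocks that both branch to `succ`.

    Assumes `succ` has no other predecessors, so the shared tail can simply
    be prepended to it (a real pass would split the tail into its own block
    when needed). Only pred1's copy survives, so pred2's lines are lost.
    """
    n = common_tail_length(pred1, pred2)
    if n == 0:
        return pred1, pred2, succ
    shared = pred1[len(pred1) - n:]  # pred1's copies, with pred1's lines
    return pred1[:len(pred1) - n], pred2[:len(pred2) - n], shared + succ

# The CFG from the question after register allocation (line 3 is hypothetical).
bb1 = [Instr("set up args for obj1", 1), Instr("R1 = method of obj1", 1), Instr("call R1", 1)]
bb2 = [Instr("set up args for obj2", 2), Instr("R1 = method of obj2", 2), Instr("call R1", 2)]
bb3 = [Instr("do something", 3)]

bb1, bb2, bb3 = tail_merge(bb1, bb2, bb3)
print([(i.text, i.line) for i in bb3])  # [('call R1', 1), ('do something', 3)]
```

Swapping the argument order of `tail_merge` would keep line 2 instead of line 1; either way, one of the two call sites loses its location.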

Try -disable-branch-fold and perhaps -disable-tail-duplicate.

Note that debug information on optimized code is a best-effort thing. If you depend on certain information to be present, you will almost certainly run into more problems like this.

/jakob

Hi Jakob,

Thanks for the reply!

>> Is this a bug with respect to accurate line number information with optimizations?
>
> No. The way LLVM is designed, debug information is not allowed to interfere with code generation. In fact, if including debug information does change the generated code in any way, it is considered a bug.

OK.

>> If not, is there a way to disable this optimization?
>
> Yes.

Great! :-)

> Note that debug information on optimized code is a best-effort thing. If you depend on certain information to be present, you will almost certainly run into more problems like this.

So what I currently depend on is that each call instruction has accurate line number information. I don’t care about the line numbers of other instructions. Will I still run into trouble? If the debug info is attached to the call, how can the LLVM optimizers (codegen or IR) mess with it, apart from the branch folding optimization you just mentioned?

Thanks!
Nicolas

Inlining can make that call disappear entirely.

Partial inlining can introduce new calls.

I don't know if there are optimizations like the codegen branch folding in the IR optimizer. It wouldn't surprise me.

Other than that, you should be perfectly safe ;-)
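Given the hazards listed above, one cheap safeguard for the "every call has a line number" requirement is to scan the optimized IR (for example the textual output of `opt -S`) for calls that have lost their `!dbg` attachment. This is a minimal line-based sketch, assuming textual LLVM IR as input; `calls_without_dbg` and `CALL_RE` are hypothetical names, not an LLVM API, and the regex is not a full IR parser. It runs on IR, before codegen, so it cannot see machine-level merging such as branch folding, and it only flags calls with no location at all, not calls whose location was replaced by another one.

```python
import re

# Hypothetical pattern: an instruction whose opcode is `call` or `invoke`,
# optionally with a result (`%x = `) and a tail-call marker. Not a full parser.
CALL_RE = re.compile(
    r"^\s*(?:%[-\w.$]+\s*=\s*)?(?:(?:tail|musttail|notail)\s+)?(?:call|invoke)\b"
)

def calls_without_dbg(ir_text):
    """Return (line_no, text) for every call/invoke lacking a !dbg attachment."""
    missing = []
    for line_no, line in enumerate(ir_text.splitlines(), start=1):
        code = line.split(";", 1)[0]  # ignore trailing IR comments
        if CALL_RE.match(code) and "!dbg" not in code:
            missing.append((line_no, line.strip()))
    return missing

sample = "\n".join([
    "define void @test(i8* %a, i8* %b) {",
    "entry:",
    "  call void @f(i8* %a), !dbg !10",
    "  %r = tail call i32 @g(i8* %b)",
    "  ret void",
    "}",
])
print(calls_without_dbg(sample))  # [(4, '%r = tail call i32 @g(i8* %b)')]
```

Running such a check after each optimization pipeline you use would at least catch IR-level passes that drop call locations, although codegen-level effects still need a look at the generated line tables.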