Debug Info and Multi-Line statements (particularly "do" statements)

Hello all,

While I understand that debug data is best-effort, I am running into some trouble with an analysis tool I’m building, part of which is an LLVM pass. There are two (seemingly closely related) cases where the debug data is not missing, which wouldn’t be a problem for me, but is instead very misleading.

First, in the case of multi-line statements, clang assigns every expression the line number of the first line of the statement, but keeps incrementing the column number past the number of columns on that line. The following code gives an example:
[1] int foo(){
[2]   int x, y, z;
[3]   if(x < y ||
[4]      y < z){
[5]     x++;
[6]   }
[7] }
In this example, when my LLVM pass receives the bitcode, both the (x<y) and (y<z) expressions are assigned DebugLocs with line number 3 (but with increasing column numbers).
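One way a tool might spot this first case is to compare the reported column against the length of the reported source line. The following is only a minimal sketch, assuming the column really does run past the end of the line as described above; the helper name `column_past_line_end` is hypothetical, and it works on the source text, not on any LLVM API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of LLVM or clang): returns 1 if a 1-based
 * column number points past the last character of the given source line,
 * and 0 otherwise. Column 0 is treated as "no column information" and is
 * never flagged. */
int column_past_line_end(const char *line_text, unsigned column)
{
    assert(line_text != NULL);
    /* Length of the line, not counting any trailing newline characters. */
    size_t len = strcspn(line_text, "\r\n");
    return column != 0 && (size_t)column > len;
}
```

For line 3 of the example, `column_past_line_end("if(x < y ||", 20)` returns 1, which a tool could take as a hint that the expression actually sits on a later line of the same statement.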

Second, in the case of a do-while statement where the closing } and the while condition are not on the same line, all expressions in the while condition are assigned the line number of the }. The following code gives an example:
[1] int foo(){
[2]   int x;
[3]   do
[4]   {
[5]     x++;
[6]   }
[7]   while(x < 10);
[8] }
In this case, the DebugLoc for the (x<10) expression lists line number 6.
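A tool that has access to the source text could try to recognize this second case with a purely textual heuristic. The sketch below is only an illustration under stated assumptions: the function name `remap_do_while_line` is hypothetical, it ignores comments and preprocessor lines, and it cannot tell a do-while tail from an unrelated while loop that happens to follow a closing brace.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Skip leading whitespace and return a pointer to the first other character. */
static const char *skip_ws(const char *s)
{
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;
    return s;
}

/* Hypothetical heuristic: if the reported 1-based line holds only a closing
 * brace and the next non-blank line begins with the keyword "while", return
 * that later line number; otherwise return the reported line unchanged. */
size_t remap_do_while_line(const char *const *lines, size_t count,
                           size_t reported)
{
    assert(lines != NULL);
    if (reported == 0 || reported > count)
        return reported;                 /* out of range, nothing to do */

    const char *p = skip_ws(lines[reported - 1]);
    if (*p != '}' || *skip_ws(p + 1) != '\0')
        return reported;                 /* line is not a lone '}' */

    /* i is the 0-based index of the line following the reported one. */
    for (size_t i = reported; i < count; i++) {
        const char *q = skip_ws(lines[i]);
        if (*q == '\0')
            continue;                    /* blank line, keep looking */
        if (strncmp(q, "while", 5) == 0 &&
            !isalnum((unsigned char)q[5]) && q[5] != '_')
            return i + 1;                /* convert back to 1-based */
        break;                           /* some other statement follows */
    }
    return reported;
}
```

Applied to the example above, a reported line of 6 would be remapped to 7, while lines that are not a lone closing brace are returned unchanged.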

The first case is not so difficult for me to handle, but the second seems much more difficult to recognize and deal with. Note that these problems will also show up in a normal gdb session if the debug symbols are generated by clang, and one will be unable to set a breakpoint on line 7 in example 2 above.

Does anyone have any insight into this issue? I am willing to look into it myself if no one is closely familiar with debug data generation, but I have very limited knowledge of and experience with the clang codebase. I would definitely appreciate any pointers to the relevant source locations, and/or any ideas about precisely what causes this behavior.

Thanks,
Peter

Hi Peter,

When I try this, in the IR produced by 'clang -g', they both have line
3 column 0. If I use 'clang -g -gcolumn-info', they both have line 3
column 3 (the location of the 'if' token). That seems to make somewhat
more sense, but is still probably not what we want.

Yeah, not so much.

Peter: If you get a chance to file a bug in bugzilla so we can track it I'd
appreciate it.

Thanks!

-eric

Bug #15076.

Thanks,
Peter