Why LLVM doesn't have debug information of function right parentheses?

Simple Case:

1. int main()
2. {
3. int i = 0;
4. return 0;
5. }

compile command: clang -g a.c

In LLVM IR, we have one attribute named “scopeLine” to indicate the left parentheses. But we don’t have one attribute to indicate the right parentheses (line 5 in this example).

So if we use gdb to debug it:
(gdb) b main
Breakpoint 1 at 0x100005c8: file a.c, line 3.
(gdb) r
Breakpoint 1, main () at a.c:3
3 int i = 0;
Missing separate debuginfos,
(gdb) n
4 return 0;
(gdb) n
0x00003fffb7db4580 in generic_start_main.isra.0
We can not stop at line 5.

But GCC can stop at line 5
(gdb) b main
Breakpoint 1 at 0x100005e0: file a.c, line 3.
(gdb) r
Breakpoint 1, main () at a.c:3
3 int i = 0;
Missing separate debuginfos
(gdb) n
4 return 0;
(gdb) n
5 }
(gdb)

I’ve had this request from my users as well, but it has never been high enough on the priority list to look at closely.

I think it would be feasible to have the actual ret instruction associated with the closing brace, while the load/computation of the return value would be associated with the return statement; but that’s as far as I got when I looked at this before.
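To make that split concrete, here is a small C sketch. The function names `compute` and `answer` are made up for illustration, and the comments describe the attribution proposed above, not what Clang currently emits.

```c
/* Hypothetical helper, present only so the return value needs real work. */
static int compute(int x)
{
    return x * 2 + 1;
}

int answer(int x)
{
    int i = x;
    return compute(i); /* proposed: the call and the move of its result
                          stay attributed to this line */
}                      /* proposed: the bare return instruction is
                          attributed to this closing brace */
```

With that attribution, stepping with `next` from the `return` statement would presumably land on the closing brace before leaving the function, as in the GCC session above.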

–paulr

P.S. The word “parenthesis” (plural “parentheses”) refers specifically to these characters: ( )

Generally [ ] are “square brackets” or sometimes just “brackets” while { } are called “braces” or “curly brackets.”

I have implemented this exact behavior in an out of tree LLVM fork I
maintain, where one of my users needed this behavior, and it seems to
work well. What we have done is extend the definition of DISubprogram to
contain a new field "endLine" which holds the line number of the closing
brace. A pass late in our backend uses this information to set the
DebugLoc of return instructions in our programs.
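As a rough illustration of that idea, here is a toy model in C. It is not the code from the fork and uses no LLVM API: the `toy_` types and `toy_retarget_returns` are invented stand-ins that only model "re-attribute every return to the line stored in endLine".

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT LLVM types: just enough to model the idea. */
struct toy_subprogram {
    unsigned scope_line; /* line of the opening brace (like scopeLine) */
    unsigned end_line;   /* line of the closing brace (the proposed endLine) */
};

enum toy_opcode { TOY_OTHER, TOY_RET };

struct toy_instr {
    enum toy_opcode op;
    unsigned line; /* source line the instruction is attributed to */
};

/* Re-attribute every return instruction to the closing brace.
 * Returns the number of instructions whose line was changed. */
size_t toy_retarget_returns(const struct toy_subprogram *sp,
                            struct toy_instr *instrs, size_t n)
{
    size_t changed = 0;
    assert(sp != NULL && (instrs != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (instrs[i].op == TOY_RET && instrs[i].line != sp->end_line) {
            instrs[i].line = sp->end_line;
            changed++;
        }
    }
    return changed;
}
```

For the `main` in the original example (braces on lines 2 and 5, `return 0;` on line 4), running this over the body would move only the return from line 4 to line 5.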

I haven't yet tidied this up and submitted it upstream for review, since
without a consumer of this information the extension itself is rather
dead, but I could if some backends would find that information useful and
make use of it.

Thanks,
Simon

Interesting. I'd have thought the front end could generate the
return-value expression with the source location of the expression,
and the return instruction itself with source location of the closing
brace. Then it would automatically apply to all targets (and all
debug-info formats).

Or, if that distinction got lost during some optimization, the
separate source location could be attached during instruction
selection? Hopefully that could also be done in a target-neutral
way.
--paulr

Having the front end attach the two locations works in most cases. It
feels like the cleaner solution because it is target-independent, and it
was the first way that I tried to solve this problem. It works for
everything apart from the most trivial of functions.

The case I'm thinking of is a function that just returns a constant: we
only have one instruction to attach a source location to, but ideally
would like to associate the lowering of the return value with one
location, and the actual return with the other. This is what caused me
to switch from a location attached to the return instruction to something
that I unconditionally set later on in my backend. I suppose the same
could occur during some optimizations, and we would still want to
recover the location.
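For reference, the trivial case looks like this in C. The function name `forty_two` is made up, and the comments restate the problem described above; how many instructions actually result depends on the target and optimization level.

```c
/* A function whose whole body is "return a constant". */
int forty_two(void)
{
    return 42; /* in the IR this is a single return of a constant, so
                  there is only one place to hang a source location:
                  this line or the closing brace, but not both */
}
```

Only once the return is lowered, and the constant is materialised separately from the return itself, do two distinct locations become possible again.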

Aah, good idea. Having this as part of instruction selection/return
lowering sounds like a more appropriate place to set the location
generically if it needs resetting, as it should be safe from thereon in.

Thanks,
Simon

Simon, I had the same approach in mind. :) And from my initial investigation, Clang would also need some work (i.e. providing the end location for the “endLine” field of DISubprogram), right?

BTW, Simon, is your fork of LLVM open source? If so, could you share its address?

Thanks

Simon Cook via llvm-dev wrote on 2017-08-04 00:10:59:

I should also point out that clang does generate instructions associated with closing curly braces for calls to destructors of stack-allocated C++ objects. Check out test/CodeGenCXX/linetable-cleanup.cpp for examples of this behavior.

-- adrian

What happened with this? Is there something for review in Phabricator? Or was this put-on-hold/forgotten?

I also have some users who complain that some gdb test suite tests don't work with LLVM due to missing debug info for the closing brace.
So getting the location for the "endLine" field of DISubprogram etc. should at least be a step in the right direction,
and if someone already has a fix for that, I would be interested in it.

/Björn

I added an attribute named EndLine to LLVM IR before. The LLVM part is not hard, but it requires modifying many places in Clang. It worked for me, so you could try this approach.

Hi,

I really wouldn’t be sure this is the right direction to go anyway - as pointed out, there could be a return of a constant, which would be a single instruction, and it would make more sense to me to attribute that to the line where “return” is written than to the closing brace.

I think this is, for my money, a legitimate difference in implementations between GCC and Clang - not a case of one being right/better than the other.

(adding echristo@ so he can speak for some of this if he wants to, since they were choices made a while back)

Remembering a discussion from a few years ago about why we do this: (makes note to start writing design decisions like this down somewhere)

I was in favor of the current method for a few reasons:

a) ensuring that it would “always happen” that we had either a unified return block or a well propagated location onto return instructions seemed like it would be difficult to maintain and very subject to optimization differences.
b) The “always stop at the end brace of the function” seemed weirder to me than just being able to say “finish” when stepping in a debugger
c) I worried about profiling information that used debug information being accurate or ascribing cycles to the closing brace of a function which seemed to be a loss of fidelity.

Thoughts?

-eric

(Re-adding llvm-dev)