Optimizations and debug info

[Moving discussion to LLVMdev]

Hi Török,

@llvm.dbg.stoppoint actually does read and write memory, in a
sense. It's a point where a user could stop in a debugger, and
use the debugger to both read and write memory. If the optimizers
are allowed to reorder or delete memory operations, these
intrinsics will become inconsistent with the actual program.
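
A source-level illustration of the point (plain C++ rather than IR; the
comment marks where the stop point would sit, and the function is made up):

  int example(int a) {
    int x = 1;   // a store the optimizer would normally delete as dead
    // <-- conceptual llvm.dbg.stoppoint for the next line: a user stopped
    //     here expects to print x and see 1, or even to assign a new value
    //     to x, so the intrinsic behaves as if it read and wrote x
    x = a * 2;
    return x;
  }

Deleting the dead store is only safe if the stop point is allowed to become
inconsistent with what the source says.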

If it's desirable to do optimizations and there are debug
intrinsics preventing that, it would be preferable to modify or
eliminate the debug intrinsics to get them out of the way,
rather than leaving them in place and letting them become
inconsistent with the actual program state. This way, a debugger
could correctly tell the user "this is optimized code, I don't
know what's going on", which is fine, rather than presenting
bogus information, which is something we'd like to avoid. I
know several other people are thinking about how to do this;
it might be a good thing to bring up on llvmdev.
  
Hi Dan,

Indeed, there are optimizations that would invalidate debug info, but
there are really simple optimizations that could preserve debug info.

I am actually more interested in analysis passes working in the presence of
debug info, but in order to get any meaningful results you need to run some
transformations, at least GVN; otherwise loops aren't transformed to
canonical form either.

So we could:
- teach GVN that a dependency on a debug instruction is not a real one and
should be ignored (see the sketch after this list)
- teach GVN to remove debug info that its transformations have made stale
- if we cannot preserve dbg.stoppoint, we may try to preserve
region.begin() and region.end() [or convert stoppoints to regions]
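
To make the first point concrete, here is a minimal illustration (plain C++
standing in for the IR; the comment marks where the stop point would sit):

  int sum_twice(const int *p) {
    int a = *p;
    // <-- debug stop point: if it is modeled as possibly reading and
    //     writing memory, GVN cannot prove *p is unchanged across it
    int b = *p;   // redundant load that GVN would otherwise fold into 'a'
    return a + b;
  }

If GVN knew the stop point touches no real memory, it could fold the second
load and, if needed, update or drop the stop point instead of silently
keeping it wrong.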

Suppose you are debugging a crash in optimized code:

Hey, your program crashed somewhere in function foo in foo.o

vs.

Hey, your program crashed in function foo in region foo1.c:100 - foo1.c:120

Also preserving debug info about variable names can be useful.

http://nondot.org/sabre/LLVMNotes/EmbeddedMetadata.txt

For llvm.readcyclecounter, I think it's fine to inhibit
optimizations. It's hard to understand what it means if it
doesn't :-). Do you have an example where being able to
do optimizations would be useful?

Dan
  
I don't have a real example for readcyclecounter, but I can imagine that a
profiler wants to time something, inserts the counter reads, and ends up with:

loop
  ...
  read_cycle_counter
  call_to_function
  read_cycle_counter
  ...

Now, the presence of read_cycle_counter would prevent GVN from working
in that loop, since read_cycle_counter would be seen as a dependency of
every load and store (including the one from the induction variable). So
by inserting read_cycle_counter the profiler pessimizes the code and gets
wrong measurements.
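
Roughly like this in C++, using Clang's __builtin_readcyclecounter as a
stand-in for the intrinsic (work() and record() are made-up helpers):

  extern int work(int);                    // function being profiled (hypothetical)
  extern void record(unsigned long long);  // profiler sink (hypothetical)

  int profiled_loop(const int *data, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
      unsigned long long start = __builtin_readcyclecounter();
      total += work(data[i]);
      unsigned long long end = __builtin_readcyclecounter();
      record(end - start);                 // per-iteration measurement
    }
    // If the counter reads are treated as touching all memory, every load
    // and store in the body is pinned between them, so the loop cannot be
    // cleaned up and the measurement includes the lost optimization.
    return total;
  }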

Best regards,
--Edwin

Hi Dan,
I am actually more interested in analysis passes working in the presence of
debug info, but in order to get any meaningful results you need to run some
transformations, at least GVN; otherwise loops aren't transformed to
canonical form either.

Right.

So we could:
- teach GVN that a dependency on a debug instruction is not a real one and
should be ignored
- teach GVN to remove debug info that its transformations have made stale
- if we cannot preserve dbg.stoppoint, we may try to preserve
region.begin() and region.end() [or convert stoppoints to regions]

I'd really like to avoid removing stoppoints unless they are redundant: two stoppoints right next to each other represent the exact same state. Currently I'm thinking it would make sense to have instcombine zap one of a pair of adjacent stoppoints, but I'd like to avoid zapping them in other cases unless there is something that is really hard to deal with otherwise.
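
A source-level sketch of how adjacent stoppoints arise (hypothetical example,
plain C++ rather than IR):

  void example(int a, int *out) {
    int dead = a + 1;   // <-- stoppoint A precedes this; the statement is
                        //     deleted entirely as dead code
    *out = a * 2;       // <-- stoppoint B; after the deletion, A and B sit
                        //     back to back and describe the same state
  }

Keeping only one of A and B loses nothing, which is exactly the redundant
case above.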

OTOH, I'm not sure how best to handle global memory updates w.r.t. stoppoints. Dan is absolutely right that stoppoints represent a point where we'd like to preserve observable memory state, so it makes sense to treat them as memory read points at the least. However, if we want to preserve stoppoints and still optimize aggressively, then we have to give up on preserving memory consistency at those stoppoints.

This is a really tricky area with lots of tradeoffs; I don't claim to have all the answers :-)

Suppose you are debugging a crash in optimized code:

Hey, your program crashed somewhere in function foo in foo.o

vs.

Hey, your program crashed in function foo in region foo1.c:100 - foo1.c:120

I think it's a bit worse than this. One of the reasons for my fondness for line number info has to do with profiling tools like shark/oprofile. These tools really do benefit a lot from having debug info for highly optimized code. They don't care at all about variable values, but it is very useful to have them attribute time samples to source lines (even if inherently fuzzy).

Also preserving debug info about variable names can be useful.

http://nondot.org/sabre/LLVMNotes/EmbeddedMetadata.txt

See also:
http://nondot.org/sabre/LLVMNotes/DebugInfoImprovements.txt

:-)

For llvm.readcyclecounter, I think it's fine to inhibit
optimizations. It's hard to understand what it means if it
doesn't :-). Do you have an example where being able to
do optimizations would be useful?

I don't have a real example for readcyclecounter, but I can imagine that a
profiler wants to time something, inserts the counter reads, and ends up with:

loop
  ...
  read_cycle_counter
  call_to_function
  read_cycle_counter
  ...

Now, the presence of read_cycle_counter would prevent GVN from working
in that loop, since read_cycle_counter would be seen as a dependency of
every load and store (including the one from the induction variable). So
by inserting read_cycle_counter the profiler pessimizes the code and gets
wrong measurements.

I agree with Dan here. These are quite different from debug info: they really should serve as optimization barriers of a sort. In your case, you could always move the timing out of the loop, which is probably more precise anyway.
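
For instance (same made-up work()/record() helpers as in the sketch above),
hoisting the measurement out of the loop leaves the body free to optimize:

  extern int work(int);                    // function being profiled (hypothetical)
  extern void record(unsigned long long);  // profiler sink (hypothetical)

  int hoisted_timing_loop(const int *data, int n) {
    unsigned long long start = __builtin_readcyclecounter();
    int total = 0;
    for (int i = 0; i < n; ++i)
      total += work(data[i]);              // no barriers inside the loop
    unsigned long long end = __builtin_readcyclecounter();
    record(end - start);                   // one aggregate measurement
    return total;
  }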

-Chris

  

Hi Dan,
I am actually more interested in analysis passes working in the presence of
debug info, but in order to get any meaningful results you need to run some
transformations, at least GVN; otherwise loops aren't transformed to
canonical form either.
    
Right.

So we could:
- teach GVN that a dependency on a debug instruction is not a real one and
should be ignored
- teach GVN to remove debug info that its transformations have made stale
- if we cannot preserve dbg.stoppoint, we may try to preserve
region.begin() and region.end() [or convert stoppoints to regions]
    
I'd really like to avoid removing stoppoints unless they are redundant:
two stoppoints right next to each other represent the exact same state.
Currently I'm thinking it would make sense to have instcombine zap one
of a pair of adjacent stoppoints, but I'd like to avoid zapping them in
other cases unless there is something that is really hard to deal with
otherwise.

OTOH, I'm not sure how best to handle global memory updates w.r.t.
stoppoints. Dan is absolutely right that stoppoints represent a point
where we'd like to preserve observable memory state, so it makes sense
to treat them as memory read points at the least. However, if we want
to preserve stoppoints and still optimize aggressively, then we have to
give up on preserving memory consistency at those stoppoints.

This is a really tricky area with lots of tradeoffs; I don't claim to
have all the answers :-)

I think we should somehow distinguish between the three cases of debug
info use:
- you single-step in gdb
- you just want to get a meaningful stacktrace when the program crashes
- you just want to get source:line info associated with an instruction
(e.g. your shark/oprofile situation below)

For the first case you need the memory consistency, and everything you
described above.
For the last two I think we could use a "best effort" approach of
preserving debug info where possible during optimizations,
but I'd rather see the debug info removed than my code pessimized ;-)

Suppose you are debugging a crash in optimized code:

Hey, your program crashed somewhere in function foo in foo.o

vs.

Hey, your program crashed in function foo in region foo1.c:100 - foo1.c:120
    
I think it's a bit worse than this. One of the reasons for my
fondness for line number info has to do with profiling tools like
shark/oprofile. These tools really do benefit a lot from having debug
info for highly optimized code. They don't care at all about variable
values, but it is very useful to have them attribute time samples to
source lines (even if inherently fuzzy).
  
Yes, I need something similar to that too, see above (an analysis pass
that can give messages containing the original source:line info).

Also preserving debug info about variable names can be useful.

http://nondot.org/sabre/LLVMNotes/EmbeddedMetadata.txt
    
See also:
http://nondot.org/sabre/LLVMNotes/DebugInfoImprovements.txt

:-)

Thanks for the pointer.

For llvm.readcyclecounter, I think it's fine to inhibit
optimizations. It's hard to understand what it means if it
doesn't :-). Do you have an example where being able to
do optimizations would be useful?

I don't have a real example for readcyclecounter, but I can imagine that a
profiler wants to time something, inserts the counter reads, and ends up
with:

loop
  ...
  read_cycle_counter
  call_to_function
  read_cycle_counter
  ...

Now, the presence of read_cycle_counter would prevent GVN from working in
that loop, since read_cycle_counter would be seen as a dependency of every
load and store (including the one from the induction variable). So by
inserting read_cycle_counter the profiler pessimizes the code and gets
wrong measurements.
    
I agree with Dan here. These are quite different from debug info:
they really should serve as optimization barriers of a sort. In your
case, you could always move the timing out of the loop, which is
probably more precise anyway.
  
Ok.

Best regards,
--Edwin

OTOH, I'm not sure how best to handle global memory updates w.r.t.
stoppoints. Dan is absolutely right that stoppoints represent a point
where we'd like to preserve observable memory state, so it makes sense
to treat them as memory read points at the least. However, if we want
to preserve stoppoints and still optimize aggressively, then we have to
give up on preserving memory consistency at those stoppoints.

This is a really tricky area with lots of tradeoffs; I don't claim to
have all the answers :-)

I think we should somehow distinguish between the three cases of debug
info use:
- you single-step in gdb
- you just want to get a meaningful stacktrace when the program crashes
- you just want to get source:line info associated with an instruction
(e.g. your shark/oprofile situation below)

For the first case you need the memory consistency, and everything you
described above.

Yes, you need it for correct output, but in practice we can't provide that at high optimization levels. At least, we can't if we want to preserve the guarantee that turning on debug info does not affect the generated executable code.

For the last two I think we could use a "best effort" approach of
preserving debug info where possible during optimizations,
but I'd rather see the debug info removed than my code pessimized ;-)

Yes, I think it is very important that the presence of debug info not affect generated code.

-Chris

What we could do is inform the client about the 'damage' done by the optimizer. If an optimization pass cannot preserve debug info, it could at least replace llvm.dbg.stoppoint with an llvm.dbg.optimized_away_stoppoint. Then clients could handle it in a user-friendly way: "Hey, you cannot set a breakpoint here because the optimizer optimized away this expression/statement."
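
A minimal sketch of the client side of that idea (plain C++ with made-up
types; llvm.dbg.optimized_away_stoppoint is only a proposal here):

  #include <cstdio>

  enum class StopPointKind { Accurate, OptimizedAway };

  struct StopPoint {
    int line;
    StopPointKind kind;
  };

  // A debugger front end deciding how to answer a breakpoint request.
  void requestBreakpoint(const StopPoint &sp) {
    if (sp.kind == StopPointKind::OptimizedAway)
      std::printf("cannot set a breakpoint at line %d: the optimizer "
                  "removed this statement\n", sp.line);
    else
      std::printf("breakpoint set at line %d\n", sp.line);
  }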

I am just thinking out loud here...