Optimization passes and debug info

Hi all,

I've been fiddling around with debug info generated by clang, with the goal of
propagating line number info to our custom backend (which is not an llvm
backend, but does use llvm IR as its input).

I've created a small pass which strips all debug info except for stop points,
which are currently the only things we're interested in. Leaving only stop
points in actually works surprisingly well; most transformation passes keep
doing their work properly.

In our particular case, we found that SimplifyCFG fails to merge two basic
blocks. There is one basic block which contains only a few phi nodes and an
unconditional branch. Normally, this block would be merged into its successor
(when it does not cause conflicts in the phi nodes). However, when debugging
info is enabled, a stop point shows up in this block. We really need to have
this block simplified away, but SimplifyCFG currently doesn't know about
stop points so leaves it be.

It would be simple to tell SimplifyCFG, for this particular transformation,
that a near-empty basic block (which can currently only contain phi nodes and
a single unconditional branch) can also contain stop points. This would simply
propagate the phi nodes where needed and throw away the stop point (which
really can't be propagated anywhere).
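To make this concrete, the block in question looks roughly like the following LLVM IR. This is only a sketch: the block and value names are made up, and the arguments to the llvm.dbg.stoppoint intrinsic (line, column, and a pointer to the compile unit descriptor) are abbreviated here.

```llvm
; A "near-empty" block: only phi nodes, a stop point, and an
; unconditional branch. Without the stop point, SimplifyCFG would
; fold this block into %succ, rewriting the phi nodes as needed.
merge:
  %x = phi i32 [ %a, %bb1 ], [ %b, %bb2 ]
  call void @llvm.dbg.stoppoint(i32 10, i32 3, %desc* @compile.unit)
  br label %succ
```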

If your focus is on the optimizations, this is the right way to go (this
allows one to debug an optimized binary). However, the removal of the stop
point does slightly degrade the debugging capabilities (this particular case
mostly means you can't break just before a loop that's directly preceded by other
control flow). The debugging information will not become incorrect, only
incomplete.

If your focus is on getting complete debugging information, you would simply
not want to do this transformation, so you preserve all debugging
capabilities.

From this observation, I think it might be useful to have some kind of global
flag that tells transformations whether they are allowed to remove debugging code
in favour of optimizations. When we start making transformation passes
debug-info aware, I think the need for something like this might increase.

Gr.

Matthijs

I envision a three-level flag that the transformation passes can use to make appropriate decisions.

Level 1 Preserve all debug info and ability to single step in a debugger.
  If the transformation pass is not able to preserve these then the pass should skip the transformation.

Level 2 Preserve minimum debug info to help investigate crash reports from the field (stack traces etc).
  Here, it is ok if single stepping capabilities are degraded.

Level 3 Feel free to destroy debug info if it gets in your way.
  Here, the pass should not leave misleading debug info in the stream. If the info cannot be fixed as part of the transformation then it should just be removed.

Well, technically you can move it back up before the merging, transform it into a stop point, and then single-step machine instructions until it hits the merged point. One would just need a way to express this in the debug info and have a debugger that would do that for you. :slight_smile:

Hi all,

I've been fiddling around with debug info generated by clang, with the goal of
propagating line number info to our custom backend (which is not an llvm
backend, but does use llvm IR as its input).

Cool.

I've created a small pass which strips all debug info except for stop points,
which are currently the only things we're interested in. Leaving only stop
points in actually works surprisingly well; most transformation passes keep
doing their work properly.

Ok

From this observation, I think it might be useful to have some kind of global
flag that tells transformations whether they are allowed to remove debugging code
in favour of optimizations. When we start making transformation passes
debug-info aware, I think the need for something like this might increase.

I think that the right answer for llvm-gcc at "-O3 -g" is (eventually) for debug info to be updated where possible but discarded when necessary as you describe. For llvm-gcc we really really want the non-debug related output of the compiler to be identical between "-O3" and "-O3 -g", and discarding this information is a reasonable way to do this.

If you're interested in a path forward, I describe one reasonable one here:
http://nondot.org/sabre/LLVMNotes/DebugInfoImprovements.txt

-Chris

Hi Chris,

> From this observation, I think it might be useful to have some kind of
> global flag that tells transformations whether they are allowed to remove
> debugging code in favour of optimizations. When we start making
> transformation passes debug-info aware, I think the need for something
> like this might increase.

I think that the right answer for llvm-gcc at "-O3 -g" is (eventually)
for debug info to be updated where possible but discarded when
necessary as you describe. For llvm-gcc we really really want the non-
debug related output of the compiler to be identical between "-O3" and
"-O3 -g", and discarding this information is a reasonable way to do
this.

You explicitly mention -O3 here, so I assume you will be wanting different
behaviour at -O2 and below? For this to work, some kind of global is required,
unless you want different passes to look at the -O command line directly (which
sounds really, really bad).

Any suggestions on where to put this setting? Making it a property of the
PassManager seems to make sense to me: that way, the different tools can decide
how to expose the different settings to the user, and each transformation pass
can easily access the setting.

If you're interested in a path forward, I describe one reasonable one here:
http://nondot.org/sabre/LLVMNotes/DebugInfoImprovements.txt

There, you mainly describe a single goal: Prevent debugging info from
interfering with optimizations at all. When looking at Devang's proposal for a
three level debugging preservation setting, this would probably correspond to
level 3.

To recap:

Level 1 Preserve all debug info and ability to single step in a debugger.
        If the transformation pass is not able to preserve these then the pass
        should skip the transformation.

Level 2 Preserve minimum debug info to help investigate crash reports
        from the field (stack traces etc). Here, it is ok if single stepping
        capabilities are degraded.

Level 3 Feel free to destroy debug info if it gets in your way.
        Here, the pass should not leave misleading debug info in the stream.
        If the info cannot be fixed as part of the transformation then it
        should just be removed.

Are these three levels acceptable? They look so to me.

I think it is important to define these levels and a setting for them now,
before any work starts on making transformations preserve any debugging info.
Alternatively (which I think your proposal implicitly assumes) passes could
just assume level 3, and once we get that working properly, we can worry about
the other levels. However, in a lot of cases it should be fairly trivial to
support all three levels, so it would be stupid not to implement all of them
at the same time. For the cases where level 3 is significantly easier, we
could implement just level 3 and add TODOs for the other levels.

Gr.

Matthijs

I think that the right answer for llvm-gcc at "-O3 -g" is (eventually)
for debug info to be updated where possible but discarded when
necessary as you describe. For llvm-gcc we really really want the non-
debug related output of the compiler to be identical between "-O3" and
"-O3 -g", and discarding this information is a reasonable way to do
this.

You explicitly mention -O3 here, so I assume you will be wanting different
behaviour at -O2 and below? For this to work, some kind of global is required,
unless you want different passes to look at the -O command line directly (which
sounds really, really bad).

I just meant -O3 as an example. I'd expect all -O levels to have the same behavior. -O3 may run passes which are more "lossy" than -O1 does though, and I'd expect us to put the most effort into making the passes that run at -O1 update debug info.

Any suggestions on where to put this setting? Making it a property of the
PassManager seems to make sense to me: that way, the different tools can decide
how to expose the different settings to the user, and each transformation pass
can easily access the setting.

No new global needed :slight_smile:

If you're interested in a path forward, I describe one reasonable one here:
http://nondot.org/sabre/LLVMNotes/DebugInfoImprovements.txt

There, you mainly describe a single goal: Prevent debugging info from
interfering with optimizations at all.

Right, the first goal is to make it so that enabling -g doesn't affect codegen in any way. I consider this to be a very important goal.

When looking at Devang's proposal for a
three level debugging preservation setting, this would probably correspond to
level 3.

To recap:

Level 1 Preserve all debug info and ability to single step in a debugger.
        If the transformation pass is not able to preserve these then the pass
        should skip the transformation.

Level 2 Preserve minimum debug info to help investigate crash reports
        from the field (stack traces etc). Here, it is ok if single stepping
        capabilities are degraded.

Level 3 Feel free to destroy debug info if it gets in your way.
        Here, the pass should not leave misleading debug info in the stream.
        If the info cannot be fixed as part of the transformation then it
        should just be removed.

Are these three levels acceptable? They look so to me.

These three levels are actually a completely different approach, on an orthogonal axis (reducing the size of debug info). I actually disagree strongly with these three levels, as the assumption is that we are willing to allow different codegen to get better debug info. I think that codegen should be controlled with -O (and friends) and that -g[123] should affect the size of debug info (e.g. whether macros are included, etc). If the default "-g" option corresponded to "-g2", then I think it would make sense for "-g1" to never emit location lists for example, just to shrink debug info.

I think debug info absolutely must not affect the generated code, and based on that, has some desirable features:

1. I agree with Alexandre Oliva that the compiler should never "lie" in debug info. If it has some information that is not accurate, it should respond with "I don't know" instead of making a wrong guess. This means that the debugger either says "your variable is certainly here" or it says "I don't know", it never says "your variable might be here".

2. Based on #1, it is preferable for all optimizations to update debug info. This improves the QoI of the debugging experience when the optimizations are enabled. However, based on #1, if they can't or haven't been updated yet, they should just discard it.

3. On an orthogonal axis (related to -g[123]), if an optimization is capable of updating information, but doing so would generate large debug info and the user doesn't want it - then it might choose to just discard the debug info instead of doing the update.

Does this seem reasonable?

-Chris

Hi Chris,

I just meant -O3 as an example. I'd expect all -O levels to have the
same behavior. -O3 may run passes which are more "lossy" than -O1
does though, and I'd expect us to put the most effort into making
passes run at -O1 update debug info.

I'm not really sure that you could divide passes into "lossy" and "not so
lossy" that easily.

For example, SimplifyCFG will be run at every -O level. This would imply that
it must be a "not so lossy" pass, since we don't want to completely trash
debugging info at -O1.

However, being "not so lossy" would probably mean that SimplifyCFG will have
to skip a few simplifications. So, you will have to choose between two goals:
  1) -g should not affect the generated code
  2) Transformations should preserve as much debug info as possible

I can't think of any way to properly combine these goals. It seems that goal
1) is more important to you, so giving 1) more priority than 2) could work out,
at least for the llvm-gcc code which defines optimization levels in this way.

However, I can imagine that having a -preserve-debugging flag, in addition to
the -O levels, would be very much welcome for developers (which would then
make goal 2) more important than 1)). Perhaps not so much as an option to
llvm-gcc, but even more so when using llvm as a library to create a custom
compiler.

Do you agree that goal 2) should be possible (even on the long term), or do
you think that llvm should never need it? In the latter case, I'll stop
discussing this, because for our project we don't really need it (though I
would very much like it myself, as an individual developer).

Say we do want goal 2) to be possible (of course not at the same time as goal
1)), some kind of debugging preservation level is required AFAICS (Can't
think of any other solutions anyway). Now, even if we think that goal 1) is
more important on the short term, I would still suggest implementing this
level right now. Even though support for goal 2) will not be complete right
away (we can focus on 1) first), easy cases could be caught right away. I'm
afraid that only focusing on 1) now and later adding 2) might cause a lot of
extra work and missed corner cases. However, I might be completely
miscalculating this. If you think that this will not be a problem, or not a
significant problem, I'll stop discussing it as well, and just commit my
changes to get us to goal 1).

These three levels are actually a completely different approach, on an
orthogonal axis (reducing the size of debug info).

I'm not really sure what you mean by this. The idea behind the levels is to
find the balance in the optimization vs debug info completeness tradeoff.

I totally agree with keeping debug info consistent in all cases. Problems
occur when an optimization can't keep the debug info fully consistent: it must
then either remove debug info or refrain from performing the optimization.

These levels determine the balance between those two options. Throwing
away more debug info will obviously reduce the size of the debug info, but
that's in no way a goal and only a side product.

I actually disagree strongly with these three levels, as the assumption is
that we are willing to allow different codegen to get better debug info.

Yes, this is indeed a tradeoff that I want to be able to make (see above).
This seems to be the fundamental point in this discussion :slight_smile:

I think that codegen should be controlled with -O (and friends) and that
-g[123] should affect the size of debug info (e.g. whether macros are
included, etc). If the default "-g" option corresponded to "-g2", then I
think it would make sense for "-g1" to never emit location lists for
example, just to shrink debug info.

I think that having the multiple -g options you describe is yet another axis,
that is related to which debug info is generated in the first place.

3. On an orthogonal axis (related to -g[123]), if an optimization is
capable of updating information, but doing so would generate large
debug info and the user doesn't want it - then it might choose to just
discard the debug info instead of doing the update.

I'm not sure when an optimization would be generating "large debug info", but
I'm not talking about any such thing.

Does this seem reasonable?

I think we're at least getting closer to making our points of view clear :slight_smile:

Gr.

Matthijs

1. I agree with Alexandre Oliva that the compiler should never "lie"
in debug info. If it has some information that is not accurate, it
should respond with "I don't know" instead of making a wrong guess.
This means that the debugger either says "your variable is certainly
here" or it says "I don't know", it never says "your variable might be
here".

I agree.

2. Based on #1, it is preferable for all optimizations to update debug
info. This improves the QoI of the debugging experience when the
optimizations are enabled. However, based on #1, if they can't or
haven't been updated yet, they should just discard it.

What about the possibility of skipping the optimization if it cannot update debug info successfully?

When I presented the three levels, I did not focus on the size of debug info. In fact, I focused on feature #1 that you mention above.

The levels I presented are intended to match the choices the optimizer can take if debug information is available:

choice 1: I know how to update debug info correctly.
choice 2: I'm a vectorizer; I do not know how to let the debugger single-step through a vectorized loop, but I know how to update the other debug information in this function.
choice 3: I do not know how to update debug info correctly. Should I do the optimization or skip it? Who wins, the optimization level or the debug-info level on the llvm-gcc command line?

Hi Chris,

I just meant -O3 as an example. I'd expect all -O levels to have the
same behavior. -O3 may run passes which are more "lossy" than -O1
does though, and I'd expect us to put the most effort into making
passes run at -O1 update debug info.

I'm not really sure that you could divide passes into "lossy" and "not so
lossy" that easily.

For example, SimplifyCFG will be run at every -O level. This would imply that
it must be a "not so lossy" pass, since we don't want to completely trash
debugging info at -O1.

Totally agreed,

However, being "not so lossy" would probably mean that SimplifyCFG will have
to skip a few simplifications. So, you will have to choose between two goals:
  1) -g should not affect the generated code
  2) Transformations should preserve as much debug info as possible

I can't think of any way to properly combine these goals. It seems that goal
1) is more important to you, so giving 1) more priority than 2) could work out,
at least for the llvm-gcc code which defines optimization levels in this way.

I don't see how choosing between the two goals is necessary, you can have both. Take a concrete example, turning:

  if (c) {
     x = a;
   } else {
     x = b;
   }

into:

   x = c ? a : b

This is a case where our debug info won't be able to represent the xform correctly: if the select instruction is later expanded back out to a diamond in the code generator, we lose the line # info for the two assignments to x and the user won't be able to step into it. If the code generator doesn't expand it, you still have the same experience and there is no way to represent (in machine code) the original behavior.

That said, it doesn't really matter. This is an example where simplifycfg can just discard the line # info and we accept the loss of debug info. Even when run at -O1, I consider this to be acceptable. My point is that presence of debug info should not affect what xforms get done, and that (as a Quality of Implementation issue) xforms should ideally update as much debug info as they can. If they can't (or it is too much work to) update the debug info, they can just discard it.
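At the IR level, this xform looks roughly like the following. This is only a sketch: names are made up, and the llvm.dbg.stoppoint arguments (line, column, compile unit descriptor) are abbreviated.

```llvm
; Before: each assignment to x lives in its own block with its own
; stop point, so the debugger can break on either source line.
then:
  call void @llvm.dbg.stoppoint(i32 2, i32 5, %desc* @cu)
  br label %merge
else:
  call void @llvm.dbg.stoppoint(i32 4, i32 5, %desc* @cu)
  br label %merge
merge:
  %x = phi i32 [ %a, %then ], [ %b, %else ]

; After: a single select. There is no longer a separate location for
; each assignment, so the per-branch stop points must be discarded.
merge:
  %x = select i1 %c, i32 %a, i32 %b
```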

However, I can imagine that having a -preserve-debugging flag, in addition to
the -O levels, would be very much welcome for developers (which would then
make goal 2) more important than 1)). Perhaps not so much as an option to
llvm-gcc, but even more so when using llvm as a library to create a custom
compiler.

Why? Is this an "optimize as hard as you can without breaking debug info" flag? Who would use it (what use case)?

Do you agree that goal 2) should be possible (even on the long term), or do
you think that llvm should never need it? In the latter case, I'll stop
discussing this, because for our project we don't really need it (though I
would very much like it myself, as an individual developer).

I won't block such progress from being implemented, but I can't imagine llvm-gcc using it. I can see how it would make sense in the context of a JVM, when debugging hooks are enabled. Assuming that running at -O0 is not acceptable, this is a potential use case.

These three levels are actually a completely different approach, on an
orthogonal axis (reducing the size of debug info).

I'm not really sure what you mean by this. The idea behind the levels is to
find the balance in the optimization vs debug info completeness tradeoff.

There is no balance here, the two options are:

1) debug info never changes generated code.
2) optimization never breaks debug info.

The two are contradictory (unless all optimizations can perfectly update debug info, which they can't), so it is hard to balance them :). My perspective follows from use cases I imagine for C family of languages: I'll admit that other languages may certainly want #2. Can you talk about why you want this?

I totally agree with keeping debug info consistent in all cases. Problems
occur when an optimization can't keep the debug info full consistent: It must
then either remove debug info or refrain from performing the optimization.

In my proposal, the answer is to just remove the debug info, as above with the simplifycfg case.

I think that codegen should be controlled with -O (and friends) and that
-g[123] should affect the size of debug info (e.g. whether macros are
included, etc). If the default "-g" option corresponded to "-g2", then I
think it would make sense for "-g1" to never emit location lists for
example, just to shrink debug info.

I think that having the multiple -g options you describe is yet another axis,
that is related to which debug info is generated in the first place.

Fair enough.

Does this seem reasonable?

I think we're at least getting closer to making our points of view clear :slight_smile:

:slight_smile:

-Chris

In my opinion, for llvm-gcc, this would be extremely bad. Enabling -g should not affect the machine code generated by the compiler. One reason why: compiler bugs do happen, and some users like enabling -g to try to understand what the optimizer is doing and find the bug. If enabling -g changes the code being generated, this can't work.

-Chris

They will use level 3 in that case.

OK, I understand your point. I'll stop :slight_smile:

Hi guys,

However, I can imagine that having a -preserve-debugging flag, in addition to
the -O levels, would be very much welcome for developers (which would then
make goal 2) more important than 1)). Perhaps not so much as an option to
llvm-gcc, but even more so when using llvm as a library to create a custom
compiler.

Why? Is this an "optimize as hard as you can without breaking debug info" flag? Who would use it (what use case)?

For those of us writing parallel and concurrent code this would be useful. Races may only manifest themselves under certain conditions that are triggered by optimized code, and tracking them down is really hard in the absence of debug information.

If one of the debug info preserving optimizations is the one triggering the race then having this option would help out. Of course, if it's a destructive optimization then we are out of luck, but one can always hope.

As a side note, I'm talking about both races introduced by reordering optimizations due to missing fences, and races that always exist but occur much more frequently because of the increase in speed in the optimized code.

Luke

Hi Chris,

> 1) -g should not affect the generated code
> 2) Transformations should preserve as much debug info as possible

I don't see how choosing between the two goals is necessary, you can
have both. Take a concrete example, turning:

  if (c) {
     x = a;
   } else {
     x = b;
   }

into:

   x = c ? a : b

This is a case where our debug info won't be able to represent the
xform correctly

So that directly means choosing between two goals: You can either do the
transformation, but change debug info, or you can keep debug info (and thus
single stepping capabilities, for example) intact but that changes the
resulting code output.

Why? Is this is an "optimize as hard as you can without breaking debug
info" flag? Who would use it (what use case)?

The use case I see is when a bug is introduced or triggered by a
transformation. Ie, I observe a bug in my program. I compile with -g -O0
(since I want full stepping capabilities), but now the bug is gone.

So, I compile with -g -O2 and the bug is back, but my debugging info is
severely crippled, making debugging a lot less fun.

In this case, having the option of making the compiler try a bit harder to
preserve debugging info would be useful to ease debugging. As Luke pointed
out, there are areas in which this is particularly important (he names
parallel programming and synchronization). I do think that the weak point of
this argument is that the best it gets you is that debugging might get easier,
if you're lucky, but it might also make the bug vanish again.

To make this more specific, however, say that I have two nested loops.
  do {
    do {
      foo();
    } while (a());
    bar();
  } while (b());

When compiled, the loop header of the inner loop is in a lot of cases an empty
block, containing only phi nodes and an unconditional branch instruction. (Not
sure if the above example does this, I don't have clang or llvm-gcc at hand atm).

There is code in simplifycfg to remove such a block, which is possible in a
lot of cases. However, when debugging info is enabled, a stoppoint will be
generated inside such a block. This stoppoint represents the start of the
inner loop (ie, just before the inner loop is executed for the first time, not
the beginning of every iteration).

By default (and in your approach, always) the basic block is removed and the
stoppoint thrown away. This means that a fairly useful stoppoint is removed,
even at -O1 (since simplifycfg will run then).
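For the nested-loop example, the block that gets removed would look something like this in LLVM IR (again only a sketch with made-up names and abbreviated llvm.dbg.stoppoint arguments; the block is entered once per entry to the inner loop, before the first iteration, matching the stoppoint semantics described above):

```llvm
; Entry block of the inner loop: only phi nodes, the stop point for
; the start of the inner loop, and an unconditional branch.
; SimplifyCFG would normally fold this block into its successor.
inner.entry:
  %x = phi i32 [ 0, %entry ], [ %x.bar, %outer.latch ]
  call void @llvm.dbg.stoppoint(i32 3, i32 5, %desc* @cu)
  br label %inner.header
```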

I can see that in most cases, debugging at -O0 is probably sufficient.
However, I can't help thinking that even in a debugging build, perfect
debugging info should be combinable with (partial) optimization.

I'm no longer sure that it is as important as I initially thought, though it
still feels like a shame if we would have no way whatsoever to be a bit more
conservative about throwing away debug info.

There is no balance here, the two options are:

1) debug info never changes generated code.
2) optimization never breaks debug info.

The two are contradictory (unless all optimizations can perfectly
update debug info, which they can't), so it is hard to balance
them :).

Especially because these two options are so contradictory, I can see a third
option in the middle. The above two options, corresponding to the outer
levels, are easy. If you can't update debug information through a
transformation, you either ignore that (option 1) or leave the code unchanged
(option 2). An extra middle level would try to find a balance. If the loss of
debug info is "small", you go ahead with the transformation, but if you lose
"a lot" of debug information, leave the code. The tricky part here is to
define where the border between "small" and "a lot" is, but that could be left
a little vague.

My perspective follows from use cases I imagine for C family
of languages: I'll admit that other languages may certainly want #2.
Can you talk about why you want this?

As stated above, I don't have a particularly solid reason, other than a decent
hunch of usefulness.

Since Devang originally proposed the three-level scheme (I originally thought
of having two levels only), perhaps he has some particular motivation to add
to this discussion? :slight_smile:

How would it be to add the proposed debugging levels, update some of the
passes and see how it turns out? I'm not sure I can invest enough time to
fully see this one through, though, since I'm going from full-time to one day
per week after next week...

If we would add such a level, would you agree that the PassManager is a good
place to store it?

Gr.

Matthijs