Disabling certain optimizations at -O1?

Hi,

there are optimizations, mostly dealing with hoisting/merging common
code (including function calls), that break stack trace symbolization
in a very bad way.

int f(int x) {
  if (x == 1)
    report("a");
  else if (x == 2)
    report("b");
  return 0;
}

For example, in the above function (at -O1) both calls to report() are
made from the same PC. As a result, a stack trace (from inside
report()) will point to the same source line no matter which branch was
actually taken. In practice, these two (or more) lines may be very far
from each other. This makes stack traces misleading and very hard to
reason about.
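To make the effect concrete, here is a hedged sketch of what the merged code effectively behaves like (f_merged and the stub report() are illustrative only, not actual compiler output):

```cpp
#include <cassert>
#include <string>

static std::string last_reported;
static void report(const char *msg) { last_reported = msg; } // stand-in stub

// Roughly what the binary behaves like after tail merging: both source
// lines reach report() through a single call instruction, so the return
// PC (and therefore the symbolized stack frame) is identical whether
// the "a" or the "b" branch was taken.
int f_merged(int x) {
  const char *s = nullptr;
  if (x == 1)
    s = "a";
  else if (x == 2)
    s = "b";
  if (s)
    report(s); // one call site shared by both branches
  return 0;
}
```

The two source-level call sites have collapsed into one, which is exactly why the unwinder can no longer tell them apart.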

Mostly, we are running into this issue with sanitizers, where we care
about performance (and therefore don't use -O0), but still want stack
traces to be mostly sane.

I've prepared two patches that deal with these issues by disabling
this and similar optimizations when building with sanitizers:

http://llvm-reviews.chandlerc.com/D2214
http://llvm-reviews.chandlerc.com/D2215

Would it be reasonable to disable these optimizations at -O1 instead?

I'd support disabling tail merging of calls at -O1. The other CFG simplification doesn't seem like that big of a deal for general debugging, though.

> I'd support disabling tail merging of calls at -O1. The other CFG simplification doesn't seem like that big of a deal for general debugging, though.

I agree,

-Chris

So, would we have two ways of running SimplifyCFG? One for O1 and one
for O2+, selectively disabling parts of it?

--renato

Reid,

by the other CFG simplification, do you mean this case in
http://llvm-reviews.chandlerc.com/D2214?

if (x < 0)
  if (y < 0)
    do_something();

Yes, this case is only bad for MSan (and probably for sampling-based
PGO, but that's a whole different story). But the current version of
D2214 disables this optimization at the SimplifyUncondBranch() point,
which also covers HoistThenElseCodeToIf(). And that one is exactly as
bad as tail merging of calls:

void f(int x) {
  if (x == 1) {
    g();
    g1();
  } else {
    g();
    g2();
  }
}

Calls to g() are hoisted out of if/else blocks.
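A hedged sketch of the hoisted form (g/g1/g2 are stubs; this shows what the pass effectively produces, not literal compiler output):

```cpp
#include <cassert>

static int g_calls = 0, g1_calls = 0, g2_calls = 0;
static void g()  { ++g_calls; }  // illustrative stubs
static void g1() { ++g1_calls; }
static void g2() { ++g2_calls; }

// Roughly what HoistThenElseCodeToIf produces: the common leading call
// to g() is lifted above the branch, so the two source-level calls now
// share one call instruction and, in a stack trace, one return PC.
void f_hoisted(int x) {
  g(); // hoisted: single call site for both branches
  if (x == 1)
    g1();
  else
    g2();
}
```

The behavior is unchanged (g() still runs exactly once per call), but the stack trace from inside g() can no longer distinguish the two branches.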

Also note that tail merging of calls happens in CodeGen, not in SimplifyCFG.

Hi Evgeniy,

What we need is the general information that we want to avoid too much code
motion, from choosing the passes, to choosing steps on the passes, to
lowering code differently.

On the front-end layer, it's as simple as dealing with -O levels.

On the middle-end, we could have front-ends set a "debug-illusion"
flag on each individual pass, so that passes could use this information
to take decisions locally, independent of the -O level (which they
don't have access to). This flag should only be set if the user
requests -g and the optimization level is not greater than 1.

On the back-end, I think the only place global enough that the front-end
has access to is the Target description, which could have a similar flag to
avoid folding too much during codegen.

cheers,
--renato

AFAIU, it's not OK for -g to affect code generation. I agree with the
rest of your plan.

> AFAIU, it's not OK for -g to affect code generation. I agree with the
> rest of your plan.

That's correct, -g must not affect code generation. This is a
fundamental mantra among debug-info people.

>> Also note that tail merging of calls happens in CodeGen, not in
>> SimplifyCFG.
>
> Hi Evgeniy,
>
> What we need is the general information that we want to avoid too much
> code motion, from choosing the passes, to choosing steps on the passes,
> to lowering code differently.
>
> On the front-end layer, it's as simple as dealing with -O levels.
>
> On the middle-end, we could have front-ends set a "debug-illusion"
> flag on each individual pass, so that they could use this information
> to take decisions locally, independent of the -O level (which they
> don't have access to). This flag should only be set if the user
> requests -g and the optimization level is not greater than 1.

Intuitively I'd expect that the set of passes to be run would vary with
opt level, and much less often would a pass want to vary its behavior
according to opt level. But "much less often" isn't "never" and so it
seems very weird that a pass wouldn't know the opt level.

Some indication of "be sanitizer/debugger friendly" that can guide
pass internal behavior seems like a good plan. I had this in a previous
compiler I worked on, and it was extremely helpful in producing code
that was easy to debug.
--paulr

> That's correct, -g must not affect code generation. This is a
> fundamental mantra among debug-info people.

I think you both got me wrong, though I admit my email wasn't clear. I
didn't mean to suggest changing codegen for debug purposes only, but to
make -O1 less aggressive in some areas than it is today. Some flag
saying "optimize-for-debug".

It's not easy to know what kind of optimization you can do that won't
change how the program runs, and thus change how the program breaks, so
maybe the -g special flag was a bad idea to begin with. But the need to
make -O1 debuggable could very well be just the thing I needed to give
names to the optimization options.

Earlier this year I proposed we have names, rather than numbers, that
would represent our optimization levels:

0 = Debug
1 = FastDebug
2 = Speed
3 = Aggressive
S = Space
Z = SpaceAggressive

I'm assuming there is little value in -O1 beyond a faster debug
experience, so why not make it take decisions on the debug illusion as
well? I.e. ignore the -g/-O1 dependency I proposed.

> Intuitively I'd expect that the set of passes to be run would vary
> with opt level, and much less often would a pass want to vary its
> behavior according to opt level. But "much less often" isn't "never"
> and so it seems very weird that a pass wouldn't know the opt level.

As far as I know (and I may be wrong), the passes only have access to
things like "optimize-for-space/speed".

Because optimization levels don't mean anything concrete, it'd be a
bit silly to have an "if (opt == 3) do this" in a pass.

> Some indication of "be sanitizer/debugger friendly" that can guide
> pass internal behavior seems like a good plan. I had this in a
> previous compiler I worked on, and it was extremely helpful in
> producing code that was easy to debug.

That's the plan.

If the optimization levels (at least in LLVM) have names, we can
easily make -OFastDebug and -ODebug have the same "debug" flag set,
and so on.

enum OptLevel {
  Debug = 1,      // Debug Illusion
  Speed = 2,      // Performance
  Space = 4,      // Size
  Aggressive = 8,
};

-O0 = Debug
-O1 = Debug+Aggressive
-O2 = Speed
-O3 = Speed+Aggressive
-Os = Space
-Oz = Space+Aggressive

Then passes only need to know which flags are set...

In our specific case, deciding whether to run SimplifyCFG and friends
would only take "if (OptLevel.isSpeed()) simplify();". At least the
code would make more sense to the reader...
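As a sketch of what that could look like (isSpeed/isDebug and the -O mappings below are hypothetical, not existing LLVM API):

```cpp
#include <cassert>

// Hypothetical flag-style optimization levels, per the proposal above.
enum OptLevel : unsigned {
  Debug      = 1, // Debug Illusion
  Speed      = 2, // Performance
  Space      = 4, // Size
  Aggressive = 8,
};

// Illustrative helpers a pass could query instead of a raw -O number.
static bool isSpeed(unsigned Level) { return (Level & Speed) != 0; }
static bool isDebug(unsigned Level) { return (Level & Debug) != 0; }

// The proposed mapping of command-line levels to flag sets.
static const unsigned O1 = Debug | Aggressive;
static const unsigned O2 = Speed;
```

A pass would then guard itself with `if (isSpeed(Level)) simplify();` rather than comparing against a magic -O number.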

cheers,
--renato

From: Renato Golin [mailto:renato.golin@linaro.org]
Sent: Wednesday, November 27, 2013 12:02 PM
To: Robinson, Paul
Cc: Evgeniy Stepanov; LLVM Developers Mailing List
Subject: Re: [LLVMdev] Disabling certain optimizations at -O1?

>> That's correct, -g must not affect code generation. This is a
>> fundamental mantra among debug-info people.

> I think you both got me wrong, though I admit my email wasn't clear.
> I didn't mean to suggest changing codegen for debug purposes only, but
> to make -O1 less aggressive in some areas than it is today. Some flag
> saying "optimize-for-debug".

Ah, okay. Sounds good.

> It's not easy to know what kind of optimization you can do that won't
> change how the program runs, and thus change how the program breaks,

Any optimization ought to preserve the overall behavior of the program.
The point where optimizations interfere with debugging is when they
take a simple mapping of instructions/values back to source code and
make that mapping more complicated. When is it "too" complicated?
That's not a question with a completely objective answer, and it is too
large a question to get into here. In my experience, to a first
approximation, anything that changes the CFG or that reorders generated
code across source statement boundaries is likely to make things more
difficult for the debugger.

> so maybe the -g special flag was a bad idea to begin with. But the
> need to make -O1 debuggable could very well be just the thing I
> needed to give names to the optimization options.
>
> Earlier this year I proposed we have names, rather than numbers, that
> would represent our optimization levels:
>
> 0 = Debug
> 1 = FastDebug
> 2 = Speed
> 3 = Aggressive
> S = Space
> Z = SpaceAggressive
>
> I'm assuming there is little value in -O1 beyond a faster debug
> experience, so why not make it take decisions on the debug illusion
> as well? I.e. ignore the -g/-O1 dependency I proposed.

Okay. I worked on compilers where -O1 was the default, actually, and
it was "generally fast enough" but still very easy to produce very
good debug info.

>> Intuitively I'd expect that the set of passes to be run would vary
>> with opt level, and much less often would a pass want to vary its
>> behavior according to opt level. But "much less often" isn't "never"
>> and so it seems very weird that a pass wouldn't know the opt level.

> As far as I know (and I may be wrong), the passes only have access to
> things like "optimize-for-space/speed".
>
> Because optimization levels don't mean anything concrete, it'd be a
> bit silly to have an "if (opt == 3) do this" in a pass.

Hm okay, I could see that passes would rather have some kind of more
semantically significant criteria.

>> Some indication of "be sanitizer/debugger friendly" that can guide
>> pass internal behavior seems like a good plan. I had this in a
>> previous compiler I worked on, and it was extremely helpful in
>> producing code that was easy to debug.

> That's the plan.
>
> If the optimization levels (at least in LLVM) have names, we can
> easily make -OFastDebug and -ODebug have the same "debug" flag set,
> and so on.
>
> enum OptLevel {
>   Debug = 1,      // Debug Illusion
>   Speed = 2,      // Performance
>   Space = 4,      // Size
>   Aggressive = 8,
> };
>
> -O0 = Debug
> -O1 = Debug+Aggressive
> -O2 = Speed
> -O3 = Speed+Aggressive
> -Os = Space
> -Oz = Space+Aggressive
>
> Then passes only need to know which flags are set...

I see what you're driving at although I'd quibble about two things.
- "Debug" is kind of overloaded, maybe "Simple" would express the right
semantic without being so ambiguous (avoid the -g thing!).
- I wonder whether Simple/Speed/Space are better modeled as a single
three-way setting and not as flags. Is it sensible to have more than
one of those three on at the same time? I wouldn't think so...
--paulr

> In my experience, to a first approximation, anything that changes the
> CFG or that reorders generated code across source statement boundaries
> is likely to make things more difficult for the debugger.

I agree.

> Okay. I worked on compilers where -O1 was the default, actually, and
> it was "generally fast enough" but still very easy to produce very
> good debug info.

Me too, but if you dig into what that means, it's actually maintaining
the debug illusion. To keep things simple enough for the user, the
default should be a local minimum of all three functions: speed,
simplicity, debuggability.

$ cc -g foo.c -o foo
$ dbg foo

It doesn't get simpler than that, if foo is "fast enough". But for the
debug session to *also* be simple, you must do your best to keep
program order in the resulting binary and to generate the best debug
info you possibly can.

That's -O1.

There are many local minima for all three functions, so different
compilers make different choices, but that, IMO, is the Holy Grail of
-O1.

> I see what you're driving at although I'd quibble about two things.
> - "Debug" is kind of overloaded, maybe "Simple" would express the
> right semantic without being so ambiguous (avoid the -g thing!).

I agree. Debug is specific. Simple is generic.

> - I wonder whether Simple/Speed/Space are better modeled as a single
> three-way setting and not as flags. Is it sensible to have more than
> one of those three on at the same time? I wouldn't think so...

Yes, the idea is that they are mutually exclusive. What that means
underneath the bonnet doesn't matter, but as a *user* decision, it's
quite simple and powerful.

Just for context, this conversation started after a few emails were
sent asking the same question: at which opt level do I put my new pass?
What I found is that people have different concepts of what each level
should be, and that's because every compiler treats the matter slightly
differently from all the others. If the policy is clear, then we can
create a map of opt levels to passes that most people agree on.

cheers,
--renato

I'm not sure where the simplicity came in, nor why it's a particularly
important goal for debugging. (Clearly making it excessively "not
simple" would be bad, but if I'm trying to debug something, having it
be simple to set up/invoke isn't particularly important. Indeed, most
of the bugs which really need a debugger manifest in big applications
where even a non-debug build can be very "not simple".)

On the other hand, deciding on the precise trade-off between execution
speed and "interpretability of machine states in terms of the original
source" is indeed a significant problem for "source level" bugs which
manifest after a huge number of instructions have been executed (either
because the program is big, or it runs intensively before hitting the bug).

My example was a very crude example of simplicity. But the more
complex your application is, the simpler you want the compiler to be
for a debug session.

cheers,
--renato

Could we move this setting to function attributes?
We already have OptimizeForSize / MinSize there, but not the other opt
levels. We also have OptimizeNone, which seems to be completely
unused.
This would let us support __attribute__((optimize())) in the future,
which is currently ignored.
Another example would be an LTO link of objects compiled with
different optimization settings. I'm not sure if anyone would want
this in practice.

> Could we move this setting to function attributes?

I think this is a good idea.

> This would let us support __attribute__((optimize())) in the future,
> which is currently ignored.

I'm adding #pragma vectorize enable, which does more or less the same
thing as __attribute__((optimize("loop-vectorize"))) or #pragma GCC
optimize loop-vectorize, but at a loop/block level. (Note to self: I
could do that at a function level, too.)

For this change, I'll have to always add the vectorizer pass, but with
a flag on the constructor specifying whether I want it to always run,
or only with a pragma. The same thing can be done with #pragma GCC
optimize (which we should support as-is, but also call it #pragma
optimize and ask GCC to support both).

Same thing with optimization levels (#pragma optimize 3): we can embed
the knowledge of the flags and optimization level, so that when we
pass them to the passes, we already know what we want to run without
having to change things in many places.

In the end, populateModulePassManager() should be about building the
flags table and (almost) unconditionally adding all passes with the
respective flags.
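A hedged sketch of that shape (none of these names are actual LLVM API; it only illustrates the "always add the pass, carry a flag" idea):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Each pass is always added to the pipeline; a per-pass flag records
// whether it runs unconditionally or only when a pragma requests it.
enum RunPolicy { AlwaysRun, OnlyWithPragma };

struct PassEntry {
  std::string Name;
  RunPolicy Policy;
};

// Hypothetical stand-in for populateModulePassManager(): build the
// flags table once and add every pass with its respective flag.
static std::vector<PassEntry> populatePipeline(bool OptimizeForSpeed) {
  std::vector<PassEntry> Pipeline;
  Pipeline.push_back({"loop-vectorize",
                      OptimizeForSpeed ? AlwaysRun : OnlyWithPragma});
  Pipeline.push_back({"simplifycfg",
                      OptimizeForSpeed ? AlwaysRun : OnlyWithPragma});
  return Pipeline;
}
```

The pipeline contents never vary; only the per-pass flags do, so a pragma (or a future attribute) just flips a flag instead of restructuring the pass manager.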

cheers,
--renato

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Evgeniy Stepanov
Sent: Sunday, December 01, 2013 2:45 AM
To: Renato Golin
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Disabling certain optimizations at -O1?

> Could we move this setting to function attributes?
> We already have OptimizeForSize / MinSize there, but not the other opt
> levels. We also have OptimizeNone, which seems to be completely
> unused.

Two of the three LLVM patches to implement OptimizeNone are in trunk,
the third is pending review. Once all of the LLVM patches are in, I
will commit the Clang patch to map __attribute__((optnone)) to the
OptimizeNone IR attribute.

> This would let us support __attribute__((optimize())) in the future,
> which is currently ignored.
> Another example would be an LTO link of objects compiled with
> different optimization settings. I'm not sure if anyone would want
> this in practice.

We at least want LTO with some functions marked OptimizeNone.
Converting the optimization levels to per-function attributes is a
different debate.
--paulr

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Renato Golin
Sent: Sunday, December 01, 2013 4:39 AM
To: Evgeniy Stepanov
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Disabling certain optimizations at -O1?

>> Could we move this setting to function attributes?

> I think this is a good idea.

You are now getting into behavior affecting the pass manager which
is a different debate.

>> This would let us support __attribute__((optimize())) in the future,
>> which is currently ignored.

> I'm adding #pragma vectorize enable, which does more or less the same
> thing as __attribute__((optimize("loop-vectorize"))) or #pragma GCC
> optimize loop-vectorize, but at a loop/block level. (Note to self: I
> could do that at a function level, too.)
>
> For this change, I'll have to always add the vectorizer pass, but with
> a flag on the constructor specifying whether I want it to always run,
> or only with a pragma. The same thing can be done with #pragma GCC
> optimize (which we should support as-is, but also call it #pragma
> optimize and ask GCC to support both).

No, because then we're saying Clang and GCC optimization levels track
each other. I haven't seen anybody say that degree of consistency
between GCC and Clang is a wonderful plan.

> Same thing with optimization levels (#pragma optimize 3): we can
> embed the knowledge of the flags and optimization level, so that when
> we pass them to the passes, we already know what we want to run
> without having to change things in many places.
>
> In the end, populateModulePassManager() should be about building the
> flags table and (almost) unconditionally adding all passes with the
> respective flags.

You need to engage Chandler about his long-term Pass Manager plan
about that. This is a rather larger topic than we started with on
this thread.
--paulr

> No, because then we're saying Clang and GCC optimization levels track
> each other. I haven't seen anybody say that degree of consistency
> between GCC and Clang is a wonderful plan.

That's a good point.

> You need to engage Chandler about his long-term Pass Manager plan
> about that. This is a rather larger topic than we started with on
> this thread.

Will do, thanks!

--renato

This is a bit of a tangent, but there are other optimizations which exhibit similar problems with stack trace preservation. For example, sibling call optimization and self tail call elimination are both problematic, as are all forms of basic block commoning.

I have a use case which is similar to that of the sanitizers, but where correct stack traces are strictly required for correctness. For now, we're using an alternate mechanism, but we'd eventually like to move to relying on debug information for our stack traces.

Would it make sense to separate out a flag for preserving full and exact stack traces? Using -O1 is one option, but it would be nice to move beyond -O1 with reasonable confidence that stack traces would be preserved. Would others be interested in such a feature?

Philip

I can't say I'm interested in that, but it shouldn't be too different
from a module-level #pragma optimize (no_this, no_that), which could
be supported if the table of flags has rich enough semantics.

cheers,
--renato