[PROPOSAL] per-function optimization level control

Hello,

We've had a high priority feature request from a number of our customers to
provide per-function optimization in our Clang/LLVM compiler.
I would be interested in working with the community to implement this.
The idea is to allow the optimization level to be overridden
for specific functions.

The rest of this proposal is organized as follows:
- Section 1. describes this new feature and explains why and when
   per-function optimization options are useful;
- Sections 2. and 3. describe how the optimizer could be adapted/changed
   to allow the definition of per-function optimizations;
- Section 4. tries to outline a possible workflow for implementing this
   new feature.

I am looking for any feedback or suggestions etc.

Thanks!
Andrea Di Biagio
SN Systems Ltd.
http://www.snsys.com

1. Description

Since the intent is to provide overrides on a per-function basis, have you
considered using a function attribute instead of a pragma?

Especially since we have support for per function code gen attributes now.

-eric

Hi,

Especially since we have support for per function code gen attributes now.

I think having function attributes would certainly be useful.
GCC, for example, provides support for both pragmas and function attributes to
control per-function optimizations. Also, the effect of using pragma
optimize to control optimizations on a per-function basis is equivalent in
GCC to specifying the 'optimize' function attribute (followed by a string
describing an optimization option) for that function.
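
For reference, here is a minimal sketch of the two GCC forms mentioned above.
The function names are made up for illustration. The GCC-specific spellings
are compiled out under Clang, which to my knowledge does not implement them.

```cpp
// Sketch only: two ways of lowering the optimization level of a single
// function under GCC. The guards keep the file buildable elsewhere.
#if defined(__GNUC__) && !defined(__clang__)
#define HAVE_GCC_OPTIMIZE 1
#define UNOPTIMIZED __attribute__((optimize("O0")))
#else
#define HAVE_GCC_OPTIMIZE 0
#define UNOPTIMIZED
#endif

// Form 1: the 'optimize' function attribute.
UNOPTIMIZED int sumWithAttribute(const int *values, int count) {
  int sum = 0;
  for (int i = 0; i < count; ++i)
    sum += values[i];
  return sum;
}

// Form 2: the pragma, scoped with push_options/pop_options so that only
// the functions in between are affected.
#if HAVE_GCC_OPTIMIZE
#pragma GCC push_options
#pragma GCC optimize("O0")
#endif
int sumWithPragma(const int *values, int count) {
  int sum = 0;
  for (int i = 0; i < count; ++i)
    sum += values[i];
  return sum;
}
#if HAVE_GCC_OPTIMIZE
#pragma GCC pop_options
#endif
```

Both functions compute the same result. Only the way the override is
attached differs.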

In my opinion we could also have both pragma and function attributes:
having a pragma does not mean that we cannot have function attributes.

A very common pattern that we see in our customers' code is a unity build.
That is, in order to reduce debug data overhead and to improve link time,
they'll group large numbers of their source files together into single
"unity files". e.g.

// ------------------
// unity01.cpp
#include <physics.cpp>
#include <textures.cpp>
#include <renderer.cpp>
etc.
// ------------------

In some cases if they've narrowed down a problem, they'll want to debug
individual functions inside one of these files in which case an attribute
may be enough.
However, if they just know, for example, that they have a problem
somewhere in their texture code they'll often want to do something like
this:

// ------------------
// unity01.cpp
#include <physics.cpp>

#pragma clang optimize push
#pragma clang optimize "O0"
#include <textures.cpp>
#pragma clang optimize pop

#include <renderer.cpp>
etc.
// ------------------

Thanks,
-- Andrea Di Biagio

In conclusion, we could teach PassManagers how to retrieve constraints on
passes and which passes to run taking into account both:
- the information stored on Pass Constraints and
- the optimization level associated to single functions (if available);

I like this approach. Today, the way to know which passes are added is to
look at the functions and follow the branches for O1, O2, etc. Your
proposal is way cleaner and allows for a table-based approach. It also
makes it simpler to experiment with passes in different optimization levels
on randomized benchmarks.

I often tried to comment out passes to identify bugs (that bugpoint wouldn't
find) and realized that doing so could generate many segmentation faults in
the compiler, which is worrying...

3.1 How pass constraints can be used to select passes to run
------------------------------------------------------------
It is the responsibility of the pass manager to check the effective
optimization level for all passes with a registered set of constraints.

There is a catch here. Passes generally have unwritten dependencies which
you cannot tell just by looking at the code. Things like "run DCE after
PassFoo only if state of variable Bar is Baz" can sometimes only be found
out by going back on the commits that introduced them and finding that they
were indeed, introduced together and it's not just an artefact of code
movement elsewhere.

The table I refer to above would have to list the dependencies (backwards and
forwards), with possible condition code (a virtual method) to decide whether
a pass has to run or not based on some context, in addition to the
optimization levels at which it should run. In theory, with that in place, it
would be just a matter of listing all passes for O-N which nobody depends on
and following all the dependencies to get the list of passes for the
PassManager.

Removing a pass from the O3 level would have to remove all orphaned passes
that it would create, too. Just like Linux package management. :wink:
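
A minimal sketch of that package-manager style resolution, assuming a
hypothetical table that maps each pass to the passes it depends on. The pass
names and dependencies used with it are illustrative only, and ordering and
the conditional (virtual method) part are left out.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical dependency table: pass name -> passes it depends on.
using PassDeps = std::map<std::string, std::vector<std::string>>;

// Collects every pass reachable from the requested top-level passes.
// A pass that no remaining root depends on is simply never reached,
// which is how orphans disappear when a root is removed.
std::set<std::string> collectPasses(const PassDeps &deps,
                                    const std::vector<std::string> &roots) {
  std::set<std::string> selected;
  std::vector<std::string> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::string pass = worklist.back();
    worklist.pop_back();
    if (!selected.insert(pass).second)
      continue;  // already visited
    auto it = deps.find(pass);
    if (it == deps.end())
      continue;  // no recorded dependencies
    for (const std::string &dep : it->second)
      worklist.push_back(dep);
  }
  return selected;
}
```

With roots {A, B}, where A depends on C and D and B depends on D, dropping A
from the roots also drops C, while D stays because B still needs it.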

Pass Constraints should allow the definition of constraints on both
the optimization level and the size level.

Yes, AND to run, OR to not run.
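
One possible reading of "AND to run, OR to not run", as a sketch. The struct
and field names are hypothetical, and the size levels assume the usual
0 = none, 1 = -Os, 2 = -Oz encoding.

```cpp
#include <string>
#include <vector>

// Hypothetical constraint record for one pass.
struct PassConstraint {
  std::string pass;
  unsigned minOptLevel;   // lowest optimization level at which it may run
  unsigned maxSizeLevel;  // highest size level at which it may still run
};

// The pass runs only if both constraints hold (AND). Failing either one
// is enough to skip it (OR).
bool shouldRun(const PassConstraint &c, unsigned optLevel,
               unsigned sizeLevel) {
  return optLevel >= c.minOptLevel && sizeLevel <= c.maxSizeLevel;
}

// Table-driven selection for one effective (opt, size) level pair.
std::vector<std::string> selectPasses(const std::vector<PassConstraint> &table,
                                      unsigned optLevel, unsigned sizeLevel) {
  std::vector<std::string> selected;
  for (const PassConstraint &c : table)
    if (shouldRun(c, optLevel, sizeLevel))
      selected.push_back(c.pass);
  return selected;
}
```

The same table can then be queried once per function with that function's
effective levels instead of once per module.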

In order to support per-function optimizations, we should modify the
existing SimpleInliner to allow adapting the Threshold dynamically based
on changes in the effective optimization level.

This is a can of worms. A few years back, when writing our front-end we
figured that since there weren't tests on inline thresholds of any other
value than the hard-coded one, anything outside a small range around the
hard-coded values would create codegen problems, segfaults, etc. It could
be much better now, but I doubt it's well tested yet.
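
To make the idea concrete, a sketch of a threshold chosen per function rather
than once per module. The type, the function, and all numbers are placeholders
for illustration. They are not LLVM's values, and the testing concern above
still applies to whatever values end up being used.

```cpp
// Hypothetical effective levels of a single function.
struct EffectiveLevel {
  unsigned opt;   // 0..3, as in -O0..-O3
  unsigned size;  // 0 = none, 1 = -Os, 2 = -Oz
};

// Returns a placeholder inlining threshold for the given function level.
int inlineThresholdFor(EffectiveLevel level) {
  if (level.opt == 0)
    return 0;    // no cost-based inlining for unoptimized functions
  if (level.size == 2)
    return 10;   // strongest size constraint
  if (level.size == 1)
    return 50;
  if (level.opt >= 3)
    return 150;
  return 100;    // default for -O1 and -O2
}
```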

As a future development, we might allow setting the inlining threshold
using the optimize pragma.

Again, it would be good to write randomized tests for this. But before we
have some coverage, I wouldn't venture on doing that in real code.

Unfortunately changing how code generator passes are added to pass
managers requires that we potentially make changes on target specific parts
of the backend.

Shouldn't be too hard, but you'll have to look closely if there is any
back-end that depends on optimization levels to define other specific
properties (cascading dependencies).

4. Proposed Implementation Workflow

I think your workflow makes sense, and I agree that this is a nice feature
(for many uses). Thanks for looking into this!

cheers,
--renato

Just a quick note/wish (apologies if it was already mentioned): please add support for the 'hot' and 'cold' attributes...
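
For readers unfamiliar with them, a small sketch of the GCC spellings of these
two attributes. GCC documents 'cold' as optimizing the function for size
rather than speed, which is not the same as leaving it unoptimized. The macro
and function names are made up for illustration.

```cpp
// Sketch only: the attributes are compiled out on compilers that do not
// define __GNUC__.
#if defined(__GNUC__)
#define ATTR_HOT __attribute__((hot))
#define ATTR_COLD __attribute__((cold))
#else
#define ATTR_HOT
#define ATTR_COLD
#endif

// Rarely executed path: saturate an out-of-range value.
ATTR_COLD int handleOutOfRange(int value) {
  return value > 255 ? 255 : 0;
}

// Frequently executed path.
ATTR_HOT int clampToByte(int value) {
  if (value < 0 || value > 255)
    return handleOutOfRange(value);
  return value;
}
```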

Hello,

We've had a high priority feature request from a number of our customers to
provide per-function optimization in our Clang/LLVM compiler.
I would be interested in working with the community to implement this.
The idea is to allow the optimization level to be overridden
for specific functions.

Have you looked at "Noise", presented at this year's European LLVM Conference?

Noise is a language extension that allows a programmer to create custom optimization
strategies and apply them to specific code segments. This enables fine-grained control
over the optimizations applied by the compiler to conveniently tune code without
actually rewriting it.

<http://llvm.org/devmtg/2013-04/#poster6>

This seems to go into the same direction as your proposal. It's not yet open source,
but it's planned: <http://www.cdl.uni-saarland.de/projects/noise/>

CU,
Jonathan

Hi Jonathan,

From: Jonathan Sauer <jonathan.sauer@gmx.de>

> Have you looked at "Noise", presented at this year's European LLVM
> Conference?
>
> Noise is a language extension that allows a programmer to create custom
> optimization strategies and apply them to specific code segments. This
> enables fine-grained control over the optimizations applied by the
> compiler to conveniently tune code without actually rewriting it.
>
> <http://llvm.org/devmtg/2013-04/#poster6>
>
> This seems to go into the same direction as your proposal. It's not yet
> open source, but it's planned:
> <http://www.cdl.uni-saarland.de/projects/noise/>

Thanks for the feedback.
During the poster session I had a quick chat with Ralf Karrenberg who gave
a lightning talk on project Noise.

In my understanding, their approach consists in running a sequence of
extra optimization passes on
functions and/or blocks of code (either generic compound statements or
loop statements) guarded by the NOISE keyword.
Those extra passes are run on the IR produced by Clang and before the
optimizer takes place.

Their approach does not allow for example to selectively disable passes or
in general to override optimization options for specific functions.

Also, the sequence of extra passes is always run in order based on the
sequence specified by the user through the NOISE keyword.
No changes are required for the optimizer which still works as before:
1) pass managers are still populated based on the global optimization
level;
2) there is no way to dynamically select passes to run based on the
per-function optimization level.

The only use case (partially) in common between my proposal and their
approach seems to be the case where the user tries to run extra
optimizations on specific functions.

Hi,

I just wanted to bump this discussion in case anyone had any more comments
to make.

We're in a bit of a bind here as we've now had requests for this feature
from 10 separate customers, so we're going to be required to implement
this feature somehow in our private branch at least (all of the other
compilers they use already support some form of this feature so it is very
heavily used in our field). Obviously we don't want to significantly
diverge from the mainline so it would be great to work with the community
to implement this in such a way that it could be incorporated into the
mainline and be beneficial to all of the other users too :-).

Andrea Di Biagio
SN Systems - Sony Computer Entertainment Group

What is the common use case? Making sure some function is always
optimized or making sure it is never optimized? If the second one, I
wonder if marking it cold would be a good enough approximation.

If we do need to enable/disable passes run in each function, I would
suggest starting by proposing which attributes should be added to the
language reference.

Cheers,
Rafael

Wasn't this already proposed?
http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-January/058112.html

LLVM already has optsize. Maybe it's just a matter of hooking up gcc's
attr(optimize) to it in clang, as a first approximation.

Looks like it, yes!

Chandler, what were your thoughts on the pass manager? Should it select
the set of passes for a function based on the function's attributes?

Cheers,
Rafael

In reply to the question about what would be the common use case:

What is the common use case? Making sure some function is always
optimized or making sure it is never optimized? If the second one, I
wonder if marking it cold would be a good enough approximation.

Although both cases would be nice and our users have expressed some
interest in both, the critical one is the second case: making sure that
some functions are never optimized. The major
use-case for this is ease of debugging optimized builds. Generally,
the type of programs that our users are writing run so slowly in
unoptimized builds that they are essentially unusable for
testing/debugging purposes. Unfortunately in fully optimized builds, as
we all know, the debugging experience is not always entirely pleasant. Our
users generally build against multiple targets each with their own
compiler and have adopted the typical workflow of marking functions and
ranges of functions that need closer inspection in the debugger with a
pragma to prevent the optimizer from coming along and hurting the
debuggability of them whilst still running with everything else fully
optimized and at a usable speed.

> Wasn't this already proposed?
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-January/058112.html
>
> LLVM already has optsize. Maybe it's just a matter of hooking up gcc's
> attr(optimize) to it in clang, as a first approximation.

Looks like it, yes!

Chandler, what were your thoughts on the pass manager? Should it select
the set of passes for a function based on the function's attributes?

Yes, Chandler's proposal goes in the same direction as our proposal. In
fact the declared goal was
"to allow a specific function to have its optimization level overridden
from the command line based level".
Our proposal also tries to focus more on how function attributes could be
used to guide pass managers in the process of selecting which passes to
run etc.

Andrea Di Biagio
SN Systems - Sony Computer Entertainment Group

Hi Andrea,

please excuse the very delayed response, this thread somehow got out of my sight before I found the time to respond.

As Jonathan and you have mentioned already, "Noise" is indeed going into the same direction. Let me comment on a few of your observations:

In my understanding, their approach consists in running a sequence of
extra optimization passes on
functions and/or blocks of code (either generic compound statements or
loop statements) guarded by the NOISE keyword.
Those extra passes are run on the IR produced by Clang and before the
optimizer takes place.

Their approach does not allow for example to selectively disable passes or
in general to override optimization options for specific functions.

This is not entirely true.
We actually *do* override the general optimization options set via command line for that specified piece of code. The intent of the noise attribute is exactly to give the programmer *full* control over what optimizations are applied to a given code segment (a function, loop, or compound statement). This includes the fact that using an empty noise attribute results in no optimization being applied.
The only thing we currently do not support is something along the lines of "please do -O3 but exclude passes X, Y, and Z". However, this is not a conceptual shortcoming but simply not implemented yet.

Also, the sequence of extra passes is always run in order based on the
sequence specified by the user through the NOISE keyword.
No changes are required for the optimizer which still works as before:
  1) pass managers are still populated based on the global optimization
level;

To further clarify what I stated above: This is only true for all code *except* the parts marked with noise attributes.

  2) there is no way to dynamically select passes to run based on the
per-function optimization level.

I don't understand what "dynamically" means here.

The only use case (partially) in common between my proposal and their
approach seems to be the case where the user tries to run extra
optimizations on specific functions.

To sum it up, frankly, I don't think so ;).

Take a look at the examples on our webpage if you like:
http://www.cdl.uni-saarland.de/projects/noise

All of those only show what happens to the attributed functions. The rest of the program is compiled as it would have been with an unmodified Clang (e.g. all code that is not marked is optimized with -O3 if that is supplied via command line).

Best,
Ralf

The pass manager, as it is designed now, doesn't have the capability to dynamically change pass configurations. Until that's fixed, the only way to do this would be for clang to build multiple pass managers. That would open a can of worms though.

Evan
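
A rough sketch of the "multiple pass managers" fallback, with one pipeline
built lazily per optimization level and picked per function. The types here
are stand-ins (a pipeline is just a list of pass names) and every name is
hypothetical. This is not how Clang or LLVM are structured.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a populated pass manager.
using Pipeline = std::vector<std::string>;

struct FunctionInfo {
  std::string name;
  int optLevelOverride;  // -1 means "use the global level"
};

// Builds each pipeline at most once and reuses it for later functions.
class PipelineCache {
public:
  explicit PipelineCache(std::function<Pipeline(unsigned)> builder)
      : builder_(std::move(builder)) {}

  const Pipeline &get(unsigned level) {
    auto it = cache_.find(level);
    if (it == cache_.end())
      it = cache_.emplace(level, builder_(level)).first;
    return it->second;
  }

  std::size_t size() const { return cache_.size(); }

private:
  std::function<Pipeline(unsigned)> builder_;
  std::map<unsigned, Pipeline> cache_;
};

// The per-function override wins over the command-line level.
unsigned effectiveLevel(const FunctionInfo &f, unsigned globalLevel) {
  return f.optLevelOverride < 0 ? globalLevel
                                : static_cast<unsigned>(f.optLevelOverride);
}
```

The cost is one pipeline per distinct level actually used in the translation
unit. Interprocedural passes that need to see the whole module at once are
not covered by this sketch.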

Although both cases would be nice and our users have expressed some
interest in both, the critical one is the second case: making sure that
some functions are never optimized. The major
use-case for this is for ease of debugging optimized builds.

I have a similar usage case: I work on code that tends to show up optimiser
bugs, possibly because it is very thoroughly tested. Optimization control
pragmas are invaluable for locating optimizer bugs in a particular function;
the lack of them is one of the reasons why my GCC and Clang builds don't have
optimization turned up so high as on some other compilers.

GCC's optimize attribute should work fine (at least with trunk):

__attribute__((optimize("O3","no-tree-pre"))) int foo( ...)
{
    ...
}

will turn on -O3 for 'foo', but disable PRE pass for it.

If you see any problems there, you should file a bug.

Regarding Andrea's proposal -- the new #pragma can be useful (in rare
cases when there is a compiler bug), but the intended use cases are
questionable:
1) it should not be used as a mechanism to triage compiler bugs -- the
compiler backend should have mechanism to allow any pass to be
disabled for any (range of) function(s) via command line options so
that it can be automated -- you should not expect doing this via
source modification
2) Improve debuggability of optimized code. GCC has -Og option that
can be used to generate well optimized code with good debuggability.
3) there is a much bigger issue if the customer needs to resort to
these pragmas frequently to hide optimizer bugs.
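
To illustrate the automation meant in point 1), a sketch of bisecting over a
range of functions. The callback is hypothetical: it stands for "rebuild with
optimization enabled only for functions in [lo, hi) and rerun the failing
test". No particular command line option is assumed to exist, and the sketch
assumes a single function is responsible for the failure.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Returns the index of the one function whose optimization triggers the
// failure. failsWhenOptimized(lo, hi) must report whether the failure
// reproduces when only functions in [lo, hi) are optimized.
std::size_t bisectCulprit(
    std::size_t numFunctions,
    const std::function<bool(std::size_t, std::size_t)> &failsWhenOptimized) {
  assert(numFunctions > 0);
  std::size_t lo = 0;
  std::size_t hi = numFunctions;
  while (hi - lo > 1) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (failsWhenOptimized(lo, mid))
      hi = mid;  // culprit is in the lower half
    else
      lo = mid;  // otherwise it must be in the upper half
  }
  return lo;
}
```

Each step halves the candidate range, so the number of rebuilds grows with the
logarithm of the number of functions.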

David

Hi,

GCC's optimize attribute should work fine (at least with trunk):

__attribute__((optimize("O3","no-tree-pre"))) int foo( ...)
{
     ...
}

will turn on -O3 for 'foo', but disable PRE pass for it.

Indeed, the optimize attribute should do the job if you require optimization control on function level only.

If you need finer-grained control mechanisms, you need to resort to a pragma or attributes (or whatever kind of annotation) approach.
For instance, noise allows you to annotate loops or compound statements.

Regarding Andrea's proposal -- the new #pragma can be useful (in rare
cases when there is a compiler bug), the intended use cases are
questionable:
1) it should not be used as a mechanism to triage compiler bugs -- the
compiler backend should have mechanism to allow any pass to be
disabled for any (range of) function(s) via command line options so
that it can be automated -- you should not expect doing this via
source modification
2) Improve debuggability of optimized code. GCC has -Og option that
can be used to generate well optimized code with good debuggability.
3) there is a much bigger issue if the customer needs to resort to
these pragmas frequently to hide optimizer bugs.

I agree, these are indeed questionable.
However, there is a very important use case that I would call "performance tuning of critical code segments".

Cheers,
Ralf

Hi,

GCC's optimize attribute should work fine (at least with trunk):

__attribute__((optimize("O3","no-tree-pre"))) int foo( ...)
{
     ...
}

will turn on -O3 for 'foo', but disable PRE pass for it.

Indeed, the optimize attribute should do the job if you require optimization
control on function level only.

If you need finer-grained control mechanisms, you need to resort to a pragma
or attributes (or whatever kind of annotation) approach.
For instance, noise allows you to annotate loops or compound statements.

yes -- I like the fine grain control capability provided by Noise --
it will be a very useful tool for compiler engineers to find
opportunities and tune the default compiler behavior. Using the
annotations in the source to 'permanently' override the compiler's
decisions won't be good, as they may get stale/invalid over time or be
invalid for different targets (e.g. unroll decisions).

Regarding Andrea's proposal -- the new #pragma can be useful (in rare
cases when there is a compiler bug), the intended use cases are
questionable:
1) it should not be used as a mechanism to triage compiler bugs -- the
compiler backend should have mechanism to allow any pass to be
disabled for any (range of) function(s) via command line options so
that it can be automated -- you should not expect doing this via
source modification
2) Improve debuggability of optimized code. GCC has -Og option that
can be used to generate well optimized code with good debuggability.
3) there is a much bigger issue if the customer needs to resort to
these pragmas frequently to hide optimizer bugs.

I agree, these are indeed questionable.
However, there is a very important use case that I would call "performance
tuning of critical code segments".

yes, and that.

thanks,

David

1) it should not be used as a mechanism to triage compiler bugs -- the
compiler backend should have mechanism to allow any pass to be
disabled for any (range of) function(s) via command line options so
that it can be automated -- you should not expect doing this via
source modification

That would be just fine; I'm not seeking pragmas specifically, just a
way to control optimisation with finer distinction than a whole file.

3) there is a much bigger issue if the customer needs to resort to
these pragmas frequently to hide optimizer bugs.

I'm not using them to hide optimiser bugs, but to help isolate them so
that they can be reported effectively.