[RFC] Removing optimization size level from LLVM and relying on minsize/optsize

Currently in the LLVM IR optimization pipelines we pass around an OptimizationLevel, which consists of a speedup level and a size level (e.g. -O1 is {1, 0}, -Oz is {2, 2}). We use the size level to turn on/off some passes and also to determine inliner thresholds.
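
For concreteness, a minimal sketch of what that pair looks like (simplified from llvm/Passes/OptimizationLevel.h; the comments reflect my reading and may be slightly off):

    #include "llvm/Passes/OptimizationLevel.h"
    using namespace llvm;

    void example() {
      // Each predefined level is a {speedup, size} pair.
      OptimizationLevel O1 = OptimizationLevel::O1; // {1, 0}
      OptimizationLevel Oz = OptimizationLevel::Oz; // {2, 2}

      // Pipeline code queries the two components separately; the size
      // component is the part this RFC proposes to remove.
      unsigned Speed = Oz.getSpeedupLevel(); // 2
      unsigned Size = Oz.getSizeLevel();     // 2
    }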

When attempting to add support for -Os/-Oz in https://reviews.llvm.org/D113738, I got some pushback saying that we should be relying on the function attributes minsize and optsize. The logical extension of that is to completely remove the size level from OptimizationLevel and rely on frontends to set minsize/optsize for -Os/-Oz. Passes that are disabled with -Os/-Oz can check those attributes instead.
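
As a sketch of what checking the attributes would look like (Function::hasOptSize()/hasMinSize() are the existing accessors for those attributes; the guard itself is hypothetical):

    #include "llvm/IR/Function.h"
    using namespace llvm;

    // Hypothetical guard for a pass that today keys off the pipeline's
    // size level. IIRC hasOptSize() already returns true for minsize
    // functions as well; both are checked here for clarity.
    static bool allowCodeGrowth(const Function &F) {
      return !F.hasOptSize() && !F.hasMinSize();
    }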

There are some tests (e.g. inline-optsize.ll) checking that if we have both optsize and -Oz, the lower of the two inlining thresholds (-Oz's in this case) wins, but perhaps we can revisit that and calculate inline thresholds purely from the function attributes.
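
A minimal sketch of attribute-only threshold selection (the InlineConstants names exist in llvm/Analysis/InlineCost.h, but the policy below is an assumption, not necessarily the final form):

    #include "llvm/Analysis/InlineCost.h"
    #include "llvm/IR/Function.h"
    using namespace llvm;

    // Pick the threshold from the callee's attributes alone, instead of
    // taking the minimum of the attribute-derived and size-level-derived
    // values as today.
    static int thresholdFor(const Function &Callee, int DefaultThreshold) {
      if (Callee.hasMinSize())
        return InlineConstants::OptMinSizeThreshold; // -Oz-like
      if (Callee.hasOptSize())
        return InlineConstants::OptSizeThreshold;    // -Os-like
      return DefaultThreshold; // derived from the speedup level only
    }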

Any thoughts?

I do not believe in encoding optimization levels in the IR. The optimization level is an option for the machinery of the compiler, and not part of the semantics of the program.

-Matt


While I agree it is not semantics, we already encode similar things, e.g. toolchain and architecture information, optnone, ..., in different places of the IR.
Maybe I'm missing why having such information in the IR is inherently bad.

I say this because I would very much like to encode all optimization levels in the IR, incl. O0/1/2/3/z/s/..., such that we can select the level per function rather than per file.
We have a prototype for that, incl. some pass manager work, but it's not ready for prime time yet (IIRC).

~ Johannes

We defined "optnone" in order to allow selectively disabling
optimization at the source level; this is very useful to users.
You can argue about whether it conveys "IR semantics" but it
certainly reflects a choice made by the programmer, and to implement
that choice it needed to be recorded in the IR. We don't have any
other mechanism for conveying that kind of information to LLVM.

"optnone" was then leveraged to allow compiling "-flto -O0" on
some modules to be preserved through the LTO stage, which is why
Clang puts optnone on all functions at -O0. Chandler was quite
clear at the time that "no optimization" was different in kind
from "some level of optimization" and resisted encoding levels
other than optnone into the IR (although I believe optsize/minsize
predate optnone).

Encoding these things in IR means defining rules for how they
interact when IPOs find functions with differing optimization
controls. We punted on this for optnone, instead defining a rule
that said optnone functions had to be marked noinline, so the
inliner didn't need to learn a special rule about optnone functions.
I see things crossing the lists about how floating-point controls
get interprocedural behavior wrong all the time. My takeaway is:
    It's complicated, and we don't want to go there.
I'd be very hesitant to start throwing lots more combinations into
the mix.

--paulr


It's complicated, yes, but we already got it wrong. You said so yourself,
and any `no-ipa` discussion/bug will attest to that as well. The idea that
we should therefore *not* change anything anymore is somewhat
counter-intuitive to me. If we actually sat down to define what the
interactions are, we could fix existing problems and make progress.
However, using the `no-ipa` discussions as an example again: without a new
attribute there is no way to make everyone happy. The problem is not that
we "add too much"; the problem (often) is that we overload what we have
rather than properly differentiate what we want. (Why we do that is a
different question.)

`optnone`/`noinline` mean different things to different people,
as did `dereferenceable` and other attributes. The solution to these problems
was, and will continue to be, differentiation through new attributes/options.

Long story short, it's not the number of attributes/options that is the
problem, but nailing down their semantics and interplay. That is not to say
we get it right simply by adding more, but history shows that problems are
solved through new attributes/options if the shortcomings of the existing
ones are taken into account during their definition.

~ Johannes

How would `getInlineParamsFromOptLevel` in PassBuilderPipelines.cpp be implemented? Note that the params affect a module-wide policy. Or would there be module-wide attributes and a priority scheme? (I'm trying to understand the full proposal.)
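
For context, the current implementation just forwards both components of the level (quoting roughly from memory, so treat the details as approximate):

    static InlineParams getInlineParamsFromOptLevel(OptimizationLevel Level) {
      return getInlineParams(Level.getSpeedupLevel(), Level.getSizeLevel());
    }

Under the proposal, the second argument would presumably go away, and the size handling would have to live entirely in the attribute-based logic in InlineCost.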

Hi Arthur,

If -Oz/-Os are implemented via the frontend setting the minsize/optsize attributes, would that have any impact on code-size-related IPOs like machine outlining?

~ Todd

With respect to Os/Oz, it seems to me that what is controlled by the attributes can largely be orthogonal to the actual passes that are executed.

In this sense the attributes (optnone, minsize, optsize) convey “optimization hints”, which can largely be orthogonal to how the pass pipeline is set up.
Of course, with this view the “minsize” attribute shouldn’t have much effect when setting up an O0 kind of pass pipeline.
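
As a sketch of that “hint” view (shouldOptimizeForSize is the existing helper declared in llvm/Transforms/Utils/SizeOpts.h; using it as the single gate like this is my assumption, not current policy):

    #include "llvm/Analysis/BlockFrequencyInfo.h"
    #include "llvm/Analysis/ProfileSummaryInfo.h"
    #include "llvm/Transforms/Utils/SizeOpts.h"
    using namespace llvm;

    // A transform asks the function (and, if available, PGO data) whether
    // to favor size, independently of which -O pipeline is running.
    static bool preferSmallCode(const Function &F, ProfileSummaryInfo *PSI,
                                BlockFrequencyInfo *BFI) {
      return shouldOptimizeForSize(&F, PSI, BFI);
    }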

On the other hand, I have a hard time seeing O1/O2/O3 encoded in the IR: I see these very much as pass pipeline setup. Encoding this in the IR would also be hard to reconcile with how we’ve been doing LTO. But I’d be interested to see a more detailed RFC if anyone wanted to look into this.

(sorry, haven’t read all the replies)

I think there was a fair bit of discussion on this when the optnone and minsize attributes were added way back when. Chandler at least expressed a pretty firm opinion that these more “fundamental” requests that are currently attributes were distinct from -O1/2/3: like noinline, there's some specific property we want from the code with those attributes that's hard to describe with -O1/2/3, which are a vague sliding scale of size/perf tradeoffs.

But for things we already have attributes for: yeah, naively (i.e. take my opinion with lots of salt) it seems reasonable to have them be the only way to convey that information. (I wonder if/how bad it'd be to do that with -O0/optnone… maybe not much value in it? Not sure. It could make LTO and non-LTO more similar, and it would exercise the “is optnone respected even in a single file” path more frequently.)