Redefining optnone to help LTO

In D28404, Mehdi wanted to use the 'optnone' attribute as a way to record
"I was compiled with -O0" in the IR, because it seems like a good idea to
remember that fact in an LTO compilation and there is no way to remember
that fact currently. A couple of people felt it might be better to have
this idea discussed on the dev list, where it might get better exposure,
so I'm volunteering to get that discussion started.

While 'optnone' does cause lots of optimizations to bypass a function,
exactly matching -O0 was not the motivation and never a hard requirement.
The implementation makes a distinct effort to get close to the behavior
of -O0, but it's not an exact match and for the intended purpose (allowing
a given function to be un-optimized to help debugging) it worked fine.

Using 'optnone' to convey -O0 to LTO is something of a redefinition, or
at least a re-purposing, of the attribute. To get there from here, I
think we would need a couple of things to happen, separately from the
minor grunt work of adding 'optnone' to function IR at -O0.

1) Update the LangRef definition of 'optnone' to reflect this intent.
The current definition doesn't provide a motivation, and the description
is (deliberately) a bit vague. If we want 'optnone' to intentionally
match -O0, that should be tightened up.

2) Make a concerted effort to teach 'optnone' to targets. Currently
I know the X86 target is aware of it, but I'm not so sure about others.

3) Take another look at what 'optnone' currently does *not* turn off,
and see if there is something we can do about that. In some cases this
will not be practical, and we may just have to live with that.

(Okay, we need 3 things to happen.)
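
For reference, here is a small IR sketch of what the attribute looks like
today (the function name is hypothetical; per the current LangRef, 'optnone'
must be accompanied by 'noinline'):

```llvm
; Roughly what clang emits for a function marked __attribute__((optnone));
; the LangRef requires 'noinline' to accompany 'optnone'.
define i32 @debug_me(i32 %x) #0 {
entry:
  %add = add nsw i32 %x, 1
  ret i32 %add
}

attributes #0 = { noinline optnone }
```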

I won't say this is blocking Mehdi's work, but it would remove a
point of contention and allow the review to proceed more smoothly.
--paulr

As someone who spent rather a lot of effort getting optnone done in the
first place, let me say explicitly that I support this redefinition.

> it seems like a good idea to remember that fact in an LTO compilation

In fact the use case is rather more compelling than I said. One goal is
supporting cross-component IR analyses such as those needed for Control-Flow
Integrity, while still allowing one or more components to be compiled with
-O0 (and having LTO comply with the -O0 request).

I don't plan to do any of the actual work, but the idea WFM.
--paulr

I would prefer we introduce a new attribute for this purpose. I regularly use optnone for debugging/reduction purposes, or when trying to understand the interaction of our pass pipeline. Having an attribute that tells the optimizer to ignore a function (more or less) is really useful, and I'd rather not lose that functionality.

Philip

Can you clarify what the semantics of this new attribute would be, compared to optnone?

Thanks,

What is the end goal? If the goal is roughly “if a user passes -O0 when compiling a TU in LTO mode, their final binary should have functions that look like the result of -O0 noLTO compilation”, then the frontend can just emit a normal -O0 object file I think. What is the advantage of passing bitcode all the way to the linker only to jump through hoops to prevent LTO code generation from modifying it?

– Sean Silva

That’s a good point; it addresses the “I want LTO for optimization” case, where using -flto -O0 seems contradictory and could even argue for forbidding it (or at least warning).

However, LTO has uses other than optimization: for instance, instrumentation that needs whole-program access.
One of them is CFI: http://clang.llvm.org/docs/ControlFlowIntegrity.html
If you want to debug and rebuild only part of the program, CFI still requires LTO, IIUC.

> What is the end goal? If the goal is roughly "if a user passes -O0 when
> compiling a TU in LTO mode, their final binary should have functions that
> look like the result of -O0 noLTO compilation", then the frontend can just
> emit a normal -O0 object file I think. What is the advantage of passing
> bitcode all the way to the linker only to jump through hoops to prevent LTO
> code generation from modifying it?
>
> That’s a good point, that addresses the “I want LTO for optimization”,
> which seems contradictory to use `-flto -O0`, and could even advocate for
> forbidding (or warning).
>
> However LTO has other uses than optimizations: for instance
> instrumentations that needs full program access.
> One of them is CFI: http://clang.llvm.org/docs/ControlFlowIntegrity.html
> If you want to debug and rebuild only part of the program, CFI still
> requires to use LTO, IIUC.

That's a good point. I wonder if there are any commonalities between this
problem and the recent "hosted"/"freestanding" issues? These all seem to
tie into a common theme of "when I use LTO, various per-TU settings don't
make it to the LTO code generator"; can we adopt a uniform solution for
this class of problems, like always using function attributes or something?
This probably won't be the last of such issues, and we should have a
"standard solution" for them.

Taking a step back, consider the "trend" as we try to persist more per-TU
options to LTO: we will have more and more attributes (or whatever) telling
the optimizer and code generator what to do in greater and greater detail.
In such a world, what is the role of the frontend setting up the pass
pipeline, target info, etc. using calls into the LLVM libraries? If, for
LTO, we have to serialize those things anyway, then should frontends prefer
to simply add the annotations into the IR instead of making calls into the
LLVM libraries to configure the code generation?

Things like -mllvm options suggest that we're never really going to persist
"everything affecting codegen" into the IR on a per-TU basis during LTO
(can't control -mllvm options per-function). So is our approach here
basically to persist compilation options into the IR on an as-needed (i.e.
ad-hoc) basis? E.g. we go out of our way to persist -O0 (using e.g.
optnone) but don't do anything special for -O1. Also what is the
interaction with --lto-O[0123]? (i.e. linker options controlling the
optimization level used during LTO codegen)

I'm just trying to understand the bigger picture here.

Personally, my mental model has always been that the flags that you pass to
per-TU compilation are instructions for that compilation, and should not
influence things like optimization level for LTO code generation (which
will run at a different time in a different program). That's at least easy
to document.

-- Sean Silva

Yes, the optnone thing is indeed part of the work to bring “everything” needed to set up TLI, CodeGen, and so on into function attributes.
I was working on the TLI and, pulling on that string, I ended up starting with optnone.

Yes, as much as possible. This is in line with the work Eric has been doing on having subtargets selected/configured via function attributes. We’re also trying to get rid of global flags in SelectionDAG and other phases of codegen in favor of function attributes, or individual instruction flags for fast-math.

I believe -mllvm options are supposed to be “developer options” only and are not “supported” or meant to be exposed to the end user. For example, I don’t believe the clang docs reference -mllvm, right?

The discussion in the revision addressed this; in particular, Chandler’s last comment there should answer it: https://reviews.llvm.org/D28404

"Unlike the differences between -O[123], all of -O0, -Os, and -Oz have non-threshold semantic implications. So with this change, I think we will have all the -O flags covered, because I view ‘-O[123]’ as a single semantic space with a threshold modifier that we don’t need to communicate to LTO. We model that state as the absence of any attribute. And -O0, -Os, and -Oz have dedicated attributes.

If we ever want to really push on -Og, that might indeed require an attribute to distinguish it."

I reproduced only part of it above, but I encourage you to read the full comment :)
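
For concreteness, the mapping Chandler describes would correspond roughly to these function attributes (a sketch; the -O0 marking is the one proposed here, the others clang already emits):

```llvm
; -Os          -> optsize
; -Oz          -> minsize optsize
; -O0          -> noinline optnone   (the proposal under discussion)
; -O1/-O2/-O3  -> no attribute at all
define void @built_Os() optsize { ret void }
define void @built_Oz() minsize optsize { ret void }
define void @built_O0() noinline optnone { ret void }
define void @built_O2() { ret void }
```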

It’s not always totally clear to me either :)
Some options make sense “per TU” while others can make sense “per function” as well!
This can be inherent to what the option describes, or due to the user interface not being uniform:

  • for some options, the interface is both the command line and source-code attributes like optnone, noinline;
  • for other options, only the command line is available, like freestanding or nobuiltin (not sure about this one?);
  • and some options are only available in the source: disabling a sanitizer, for example.

This changes over time as well: at some point the subtarget options were a property of the TU, but they have since been promoted to per-function attributes.

Finally, this is also a very “clang-centric” vision of the user interface; other clients of LLVM can bring more into the mix.

My impression is that function attributes, where possible, are the most flexible option for all clients overall.

> What is the end goal? If the goal is roughly "if a user passes -O0 when
> compiling a TU in LTO mode, their final binary should have functions that
> look like the result of -O0 noLTO compilation", then the frontend can just
> emit a normal -O0 object file I think. What is the advantage of passing
> bitcode all the way to the linker only to jump through hoops to prevent LTO
> code generation from modifying it?
>
> That’s a good point, that addresses the “I want LTO for optimization”,
> which seems contradictory to use `-flto -O0`, and could even advocate for
> forbidding (or warning).
>
> However LTO has other uses than optimizations: for instance
> instrumentations that needs full program access.
> One of them is CFI: http://clang.llvm.org/docs/ControlFlowIntegrity.html
> If you want to debug and rebuild only part of the program, CFI still
> requires to use LTO, IIUC.
>
> That's a good point. I wonder if there are any commonalities of this
> problem and the "hosted"/"freestanding" issues recently? These all seem to
> tie into a common theme of "when I use LTO, various per-TU settings don't
> make it to the LTO code generator"; can we adopt a uniform solution for
> this class of problems, like always using function attributes or something?
> This probably won't be the last of such issues, and we should have a
> "standard solution" for them.
>
> Yes, the optnone thing is indeed part of the work to bring “everything”
> needed to setup TLI, CodeGen and so on into function attributes.
> I was working on the TLI and "pulling strings" I ended-up starting with
> optnone.
>
> Taking a step back, consider the "trend" as we try to persist more per-TU
> options to LTO: we will have more and more attributes (or whatever) telling
> the optimizer and code generator what to do in greater and greater detail.
> In such a world, what is the role of the frontend setting up the pass
> pipeline, target info, etc. using calls into the LLVM libraries? If, for
> LTO, we have to serialize those things anyway, then should frontends prefer
> to simply add the annotations into the IR instead of making calls into the
> LLVM libraries to configure the code generation?
>
> Yes, as much as possible. This is inline with the work Eric has been doing
> about having subtargets selected/configured with function attributes. We’re
> also trying to get rid of global flags in SelectionDAG or other phases of
> the codegen in favor of function attributes or individual instruction flags
> for Fast-Math.
>
> Things like -mllvm options suggest that we're never really going to
> persist "everything affecting codegen" into the IR on a per-TU basis during
> LTO (can't control -mllvm options per-function).
>
> I believe -mllvm options are supposed to be “developer options” only and
> are not “supported” or supposed to be exposed to the end user. For example
> I don’t believe the clang docs is referencing the -mllvm, right?

Oh, derp. You're totally right.

> So is our approach here basically to persist compilation options into the
> IR on an as-needed (i.e. ad-hoc) basis? E.g. we go out of our way to
> persist -O0 (using e.g. optnone) but don't do anything special for -O1.
> Also what is the interaction with --lto-O[0123]? (i.e. linker options
> controlling the optimization level used during LTO codegen)
>
> The discussion in the revision addressed this, but in particular the last
> comment of Chandler in the revision should answer this somehow:
> https://reviews.llvm.org/D28404
>
> "Unlike the differences between -O[123], all of -O0, -Os, and -Oz have
> non-threshold semantic implications. So with this change, I think we will
> have *all* the -O flags covered, because I view '-O[123]' as a single
> semantic space with a threshold modifier that we *don't* need to
> communicate to LTO. We model that state as the absence of any attribute.
> And -O0, -Os, and -Oz have dedicated attributes.
>
> If we ever want to really push on -Og, that might indeed require an
> attribute to distinguish it."
>
> I reproduced above only one sentence, but I encourage to read the full
> comment :)

Very nice.

> I'm just trying to understand the bigger picture here.

> It’s not always totally clear to me either :)
> Some options are making sense “per TU” while others can make sense
> “per-function” as well!
> This can be inherent to what the option describes, or because the user
> interface is not uniform:
> - for some options, the interface is both the command line and the source
> code attributes like `optnone`, `noinline`.
> - for other options only the command line is available, like
> `freestanding`, `nobuiltin` (not sure for this one?)
> - and some options are only available in the source: disabling the
> sanitizer for example.
>
> It changes over time as well, at some point the subtargets options were a
> property of the TU, while it has been promoted to a per-function attribute.
>
> Finally, this is also a very “clang-centric” vision of the user-interface,
> other client of LLVM can bring more into the mix.
>
> I have the impression that function attributes when possible seems the
> more flexible for all the clients overall.

That's my thinking too.

-- Sean Silva

Hi Philip,
I might have incorrectly given the impression that the attribute would
become something used _only_ for LTO. That isn't the intent at all.
In *addition* to all the ways you can set 'optnone' now, it would *also*
be set on more-or-less all functions at -O0. Then the optimization
passes (whether they run immediately in a normal compilation, or later
in an LTO compilation) would continue to notice the attribute and do the
usual thing. At -O0 in normal compilation, the attribute would have no
practical effect because no pass that looks for 'optnone' would actually
run in the first place; that is, for -O0 in a normal compilation, adding
'optnone' to everything is really a no-op.

The "more-or-less" qualification comes up because optnone is not allowed
in combination with certain other attributes. Don't want to be creating
invalid IR, after all! But that's the only exceptional case.

Does that help?
--paulr

My impression is that we *do* run some minimal cleanup passes at O0. Is that correct?

If so, the biggest difference between what I believe the current optnone semantics to be and what you're proposing would be the behavior of those passes. Currently, optnone would direct those passes not to touch the function and skip execution. Under your proposal, those passes would run normally.

I will freely admit that I'm not familiar with the O0 implementation, so it's possible my assumptions are wrong.

Philip

Honestly, instead of optnone I’d prefer to do something else around storing optimization levels, but I think the implications of that for LTO are going to be painful:

  1. What does it mean to merge two modules of different optimization levels?
  2. What does it mean to inline two functions of different optimization levels?
  3. What code generator should be used?

Etc.

Honestly, I think this is a lot of stuff we don’t have any good ideas for, and that isn’t really within the current LTO scope. If there’s really a need for it, though…

(Also, if we don’t want to store the actual optimization levels, then replace “different optimization levels” above with “optnone and not-optnone”.) :)

-eric

If we’re not storing actual optimization levels (which is Chandler’s expressed preference, in the review), and just looking at optnone/not-optnone, then all those questions have straightforward answers (2 & 3 are already things the existing optnone attribute has to deal with).

  1. every function would be marked as optnone or not, so there is no concept of a “module” optimization level.

  2. inlining is already well-defined for optnone v. not-optnone:

2a) calls made by an optnone function are never inlined except for calls to always_inline functions

2b) optnone functions are always noinline, therefore calls to an optnone function are never inlined

  3. optnone functions use FastISel (assuming that’s what you meant by “code generator”)
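
The inlining rules in 2a/2b can be illustrated with a small IR sketch (function names here are hypothetical):

```llvm
; 2b: calls *to* an optnone function are never inlined,
;     because optnone implies noinline.
define void @callee_optnone() #0 {
  ret void
}

; 2a: calls made *by* an optnone function are not inlined either,
;     except calls to always_inline functions such as @must_inline.
define void @must_inline() #1 {
  ret void
}

define void @caller_optnone() #0 {
  call void @callee_optnone()  ; stays a call (2b)
  call void @must_inline()     ; still inlined: always_inline wins (2a)
  ret void
}

attributes #0 = { noinline optnone }
attributes #1 = { alwaysinline }
```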

–paulr

> My impression is that we *do* run some minimal cleanup passes at O0. Is that correct?

We only run the always_inliner (and forceattrs?), since it is needed for correctness AFAICT:

$ echo "int main() {}" | clang -x c - -O0 -mllvm -debug-pass=Arguments -emit-llvm -c
Pass Arguments: -tti -targetlibinfo -add-discriminators
Pass Arguments: -tti -assumption-cache-tracker -profile-summary-info -targetlibinfo -forceattrs -basiccg -always-inline -barrier -write-bitcode

> If so, the biggest difference between what I believe the current optnone semantics to be and what you're proposing would be the behavior of those passes. Currently, optnone would direct those passes not to touch the function and skip execution. Under your proposal, those passes would run normally.
>
> I will freely admit that I'm not familiar with the O0 implementation, so it's possibly my assumptions are wrong.

I may be missing some subtleties as well, so I’m happy to get other input on this!

My intention is not to cause any disruption; I don’t know of any non-required (from a semantic point of view) pass that we would want to intentionally run on optnone functions. Since you’re using the attribute on individual functions, you may have examples I haven’t thought of.

Ultimately, pushing as much stuff as possible into function attributes seems the most natural approach. Think -ffast-math, -Os, -Oz, or subtarget attributes. Handling all of this at the function level has proven the most straightforward during LTO, and the most flexible for “advanced” use cases when mixing modes in a single module.

The reasons not to address O1 vs. O2 vs. O3 are, IMO, that 1) it is complicated, and 2) there isn’t much of a use case for it; I haven’t heard of anyone really caring a lot about it. So, pragmatically, it makes sense to me to address the low-hanging fruit first and solve what is valuable (e.g. I can’t build our kernel with ThinLTO because we don’t encode -ffreestanding correctly).

Now, if someone has a great idea on how to handle O1/O2/O3 consistently in a way that includes LTO, I’m willing to help on the implementation side!