[RFC] Adding function attributes to represent codegen optimization level

All,
A recent commit, D43040/r324557, changed the behavior of the gold plugin when compiling with LTO. The change now causes the codegen optimization level to default to CodeGenOpt::Default (i.e., -O2) rather than using the LTO optimization level. The argument was made that the LTO optimization level should control the amount of cross-module optimization done by LTO, but that it should not control the codegen optimization level; that should be based on the optimization level used during the initial compilation phase (i.e., bitcode generation).

Assuming the argument is reasonable (it makes sense to me), I was hoping to solicit feedback on how to proceed. The suggestion in D43040/r324557 was to add function attributes to represent the compile-time optimization level (which also seems reasonable to me).

As a first step, I've put together two patches: 1) an LLVM patch that adds the function attributes to the LLVM IR and 2) a clang patch that attaches these attributes to each function based on the codegen optimization level. I then use the function-level attributes to "reconstruct" the codegen optimization level used with LTO.
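For illustration, the per-function encoding might look something like the following in textual IR. The attribute name and value spelling here are hypothetical, chosen just to show the shape; they are not necessarily what D45225 actually uses:

```llvm
; Hypothetical string attributes recording the codegen optimization
; level each function was originally compiled with.
define void @compiled_at_O3() #0 {
  ret void
}

define void @compiled_at_O1() #1 {
  ret void
}

attributes #0 = { "opt-level"="3" }
attributes #1 = { "opt-level"="1" }
```

At LTO time, the code generator could then read these back per function instead of relying on a single module-wide level.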

Please understand this is very much a WIP and just a very small step towards a final solution.

Here are the patches for reference:
Clang: D45226
LLVM: D45225

Regards,
  Chad


At least as I recall, the past discussion around optnone/optsize was that these were in some way special, semantically distinct properties (optnone being "good for debugging", or good for debugging compilers: what's the baseline behavior before optimizations are applied; optsize being "make this fit into something it wouldn't otherwise fit into"), but that the gradations of -ON didn't fit into this kind of model and wouldn't ever be implemented as function attributes.

CC’d Chandler & Eric who I think had opinions/were involved in those previous discussions.

Sorry, my reply “to all” left out LLVM-Dev

Thanks, David. I believe at least part of the discussion you’re referring to can be found here:

The issue I’m running into is that after r324557 the codegen optimization level during LTO defaults to CodeGenOpt::Default irrespective of what the user may have specified, either during the initial compilation or during the LTO stage.

This can have an impact on how the codegen pipeline is built (e.g., for AArch64 the SeparateConstOffsetFromGEPPass is only added to the pipeline with CodeGenOpt::Aggressive), or it may impact the way a function pass works (e.g., MachineBlockPlacement tail duplication is only run with CodeGenOpt::Aggressive).

I’m particularly interested in fixing the latter case, which I think can be addressed by adding function attributes and then modifying the codegen passes to use the function attributes rather than the TargetPass optimization level. However, I’m open to alternative solutions/feedback.
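As a sketch of what the latter could look like, here is a minimal, self-contained model (plain C++, not actual LLVM code; the attribute lookup, helper names, and enum are invented for illustration) of a codegen pass preferring a per-function level over the target-wide one:

```cpp
#include <cassert>
#include <optional>

// Illustrative stand-in for CodeGenOpt::Level.
enum class OptLevel { None, Less, Default, Aggressive };

// A pass would prefer the function's own recorded level (from a
// hypothetical per-function attribute) and fall back to the
// target-wide level when the attribute is absent.
OptLevel effectiveLevel(std::optional<OptLevel> fnAttrLevel,
                        OptLevel tmLevel) {
  return fnAttrLevel ? *fnAttrLevel : tmLevel;
}

// Example gate: MachineBlockPlacement only tail-duplicates at -O3.
bool shouldTailDuplicate(std::optional<OptLevel> fnAttrLevel,
                         OptLevel tmLevel) {
  return effectiveLevel(fnAttrLevel, tmLevel) == OptLevel::Aggressive;
}
```

With this shape, a function compiled at -O3 would keep tail duplication even if the LTO-wide codegen level defaulted to -O2.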

Chad

Hi Martin,

I think this is another example of why we might consider having such function level attributes… yes.

Chad

I actually don’t understand this clearly.

Unless we’re saying that we would change the IR optimization level using the -OX flag during LTO (which is clumsy, because what is a “cross-module optimization” alone?), why would the -OX flag change the codegen optimization level when passed to clang without LTO, but not during LTO?
Are we encoding O1/O2/O3 optimization level into function attributes and trying to honor these during the LTO IR optimization pipeline as well?

Thanks,

Yes, this is a very useful RFC for being able to control the optimization level at the function level. We may also want to provide fp-model=<fast|precise …> control at the function level via attributes as well. However, the “merge” actions of all these attributes need to be carefully defined and designed for LTO and IPO optimizations.

Thanks,

Xinmin


I actually don't understand this clearly.

Unless we're saying that we would change the IR optimization level
either using the -OX flag during LTO (which is clumsy, because what is
a "cross-module optimization" alone?), why would the `-OX` flag change
the Codegen optimization level when passed to clang without LTO, but
it wouldn't during LTO?

I'm simply stating the argument made by Peter in r324557; this is not my opinion. Personally, I think it seems reasonable to allow the optimization flag used during the link step to control the codegen optimization level. However, this is no longer the case after r324557.

FWIW, I would be very much on-board with reverting r324557 and then changing lld to mirror the behavior of the gold plugin, but I don't know if that's the consensus in the community.

Are we encoding O1/O2/O3 optimization level into function attributes
and trying to honor these during the LTO IR optimization pipeline as
well?

No. The intent of these attributes is to control the codegen pipeline only. Of course, this is all based on the assumption that the optimization level used during bitcode generation should also be used with LTO in the codegen pipeline.

I don't have a strong opinion either way. I just want codegen to respect the fact that I specified -O3 during both the bitcode generation and link steps, but that's not the case anymore. :)

  Chad


To answer your question, Mehdi: what I mean by "cross-module optimization"
is simply a series of passes that operate on a module after parts of other
modules have been linked into it, resulting in IPO between modules.
For example, an inlining pass followed by scalar optimization passes.

The way I think about LTO is that it effectively splits the pass pipeline
in two, which lets us put cross-module optimizations in the middle.

What this means semantically is that LTO opt level 0 would essentially run
the two parts of the pipeline one after the other, giving you essentially
the same binary as not-LTO, but it would allow for LTO-only features such
as CFI to work. One might have also chosen to compile parts of one's
program with different optimization levels, and those levels would need to
be respected by the code generator. For this to work, we must at least use
the same CG opt level that was used at compile time.

Higher LTO opt levels would result in more passes being run in the middle,
perhaps at more aggressive settings, which would result in more
cross-module optimizations. But we still should at least try to approximate
the optimization level requested for each particular function.

Ideally, we would use the same optimization level that would have been used
at compile time. Such an optimization level would be communicated via an
attribute, as proposed here. However, in the absence of that information,
it does seem reasonable to make a guess about the user intent from the LTO
opt level. If a user specifies an LTO opt level of 3, it probably means
that the user cares a lot about performance, so we can guess a CG opt level
of CodeGenOpt::Aggressive. Otherwise, we can guess a CG opt level of
CodeGenOpt::Default since this would seem to provide the best balance of
performance, code size and debuggability.

So this is the direction that I would propose:
- Remove the ability to override the CG opt level via the LTO API. For now, we
can infer it from the LTO opt level as mentioned above.
- Add function attributes for signaling the compile-time opt level, and start
moving towards using them in preference to TargetMachine::OptLevel.
- Remove the code for inferring the CG opt level from the LTO opt level, as it
is now redundant with the function attribute.

This would seem to get us to a desired state without regressing users who
might depend on being able to use the aggressive CG opt level from LTO.
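The interim inference described above is simple enough to sketch directly. This is a plain C++ model with invented names (it is not the actual LTO API, just the guess Peter describes):

```cpp
#include <cassert>

// Illustrative stand-in for CodeGenOpt::Level.
enum class CodeGenOptLevel { None, Less, Default, Aggressive };

// Until per-function attributes carry the compile-time level, guess the
// CG level from the LTO opt level: -O3 at link time suggests the user
// cares a lot about performance; otherwise CodeGenOpt::Default balances
// performance, code size, and debuggability.
CodeGenOptLevel inferCGOptLevel(unsigned ltoOptLevel) {
  return ltoOptLevel >= 3 ? CodeGenOptLevel::Aggressive
                          : CodeGenOptLevel::Default;
}
```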

Thoughts?

Peter

Hi,

Long term, what opt level would older IR get? (I.e. IR missing an explicit opt level)

– Sean Silva


Long term, what opt level would older IR get? (I.e. IR missing an explicit
opt level)

I imagine that we would use CodeGenOpt::Default. Once we are at the point
where clang can communicate opt levels to LTO, we can probably count on the
majority of the IR consumed by LTO having associated opt levels, so using a
default CG opt level on old IR at LTO opt level 3 would probably not cause
a significant regression.

Peter