Revisiting/refining the definition of optnone with interprocedural transformations

While trying to reproduce a debug info issue, I ran into something like this (I don’t have the exact example at the moment; I think it was more aggressive than this one):

__attribute__((optnone)) int f1() {
  return 3;
}
int main() {
  return f1();
}

(Actually, I think in my case I stored the return value of f1 in a variable, intending that the variable’s location couldn’t be described as a constant. A load from a volatile variable would probably have served the same purpose here.)

LLVM (specifically Sparse Conditional Constant Propagation, llvm/lib/Transforms/Scalar/SCCP.cpp) notices that f1 always returns 3. So rather than using the return value of the call to f1, it hardcodes that value:

define dso_local i32 @main() local_unnamed_addr #1 {
entry:
  %call = tail call i32 @_Z2f1v()
  ret i32 3
}
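To make the failure mode concrete, here is a toy C++ model of what happens to f1's return value, with a switch for the proposed rule. Everything in it is hypothetical and only illustrative: ToyFunction, resolveCall and the treatOptnoneAsOpaque flag are not LLVM's API or SCCP's real implementation.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for an IR function (not llvm::Function).
struct ToyFunction {
  std::optional<int> constantReturn;  // set if every return yields this value
  bool optnone = false;
};

// What the caller ends up using for `call f()`.
struct CallResult {
  bool folded;  // true: a constant replaced the call's result
  int value;    // meaningful only when folded is true
};

// Toy return-value propagation in the spirit of (IP)SCCP. With
// treatOptnoneAsOpaque set, an optnone callee is handled as if it were only a
// declaration whose body lives in another object file.
CallResult resolveCall(const std::map<std::string, ToyFunction> &module,
                       const std::string &callee, bool treatOptnoneAsOpaque) {
  auto it = module.find(callee);
  if (it == module.end())
    return {false, 0};  // declaration only: nothing to propagate
  const ToyFunction &f = it->second;
  if (treatOptnoneAsOpaque && f.optnone)
    return {false, 0};  // proposed rule: don't look inside the body
  if (f.constantReturn)
    return {true, *f.constantReturn};  // current behaviour: `ret i32 3`
  return {false, 0};
}
```

With the flag off, the model folds the call's result to 3, as in the IR above. With it on, main has to use %call.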

I consider this a bug. optnone is used to implement -O0 under LTO, so it seemed to me that an optnone function should behave as though it were compiled in another object file, outside the purview of both interprocedural and intraprocedural optimizations.

So I sent https://reviews.llvm.org/D100353 to fix that.

Florian pointed out that this wasn’t quite specified in the LangRef, which says this about optnone:

This function attribute indicates that most optimization passes will skip this function, with the exception of interprocedural optimization passes. Code generation defaults to the “fast” instruction selector. This attribute cannot be used together with the alwaysinline attribute; this attribute is also incompatible with the minsize attribute and the optsize attribute.

This attribute requires the noinline attribute to be specified on the function as well, so the function is never inlined into any caller. Only functions with the alwaysinline attribute are valid candidates for inlining into the body of this function.

So the spec of optnone is unclear about (or arguably explicitly rules out) whether interprocedural optimizations should treat optnone functions in any particular way.

So I was going to update the wording to say “Interprocedural optimizations should treat this function as though it were defined in an isolated module/object.” (Or perhaps “interprocedural optimizations should treat optnone functions as opaque”, or “as though they were only declarations”.)

The choice of this direction was based on my (possibly incorrect or debatable) understanding of optnone: that it is equivalent to the function being in a separate, non-LTO object. This seems consistent with the way optnone is used to implement -O0 under LTO. Imagine a user debugging a binary, using -O0 for the code they’re interested in, and using an interactive debugger to change some state in the function so that it returns a different value. That would get quite confusing if the return value had effectively been hardcoded into the caller.

What are folks’ thoughts on this?

- Dave

There's a 'noipa' attribute in GCC; currently it is not supported by clang.
Theoretically, how would one implement it?

With your proposal, a clang `noipa` attribute could be lowered
to `optnone` on the whole function. Should that be the path forward,
it seems to me like too much of a hammer.

Would it not be best to not conflate the two,
and just introduce the `noipa` attribute?

Roman

There’s a ‘noipa’ attribute in GCC; currently it is not supported by clang.
Theoretically, how would one implement it?

If we wanted to do this really robustly, I guess we might have to introduce two separate queries. One would be the usual “is this a definition/give me the body of the function” check, which for noipa would answer “there is no body, don’t look here”. The other would be a “no, really, I need the definition” query, used for actual code generation.
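Here is a minimal sketch of that two-query idea. It uses a hypothetical ToyFunction class; getBodyForIPO and getBodyForCodegen are made-up names for illustration, not LLVM API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a function body.
struct Body {
  std::string ir;
};

// Hypothetical function wrapper exposing two kinds of body access.
class ToyFunction {
public:
  ToyFunction(Body body, bool noipa) : body_(std::move(body)), noipa_(noipa) {}

  // The "usual" query that analyses and IPO passes would use. For a noipa
  // function it answers "there is no body, don't look here".
  const Body *getBodyForIPO() const { return noipa_ ? nullptr : &body_; }

  // The "no, really, I need the definition" query, meant for code generation
  // (and for optimizing the function's own body).
  const Body &getBodyForCodegen() const { return body_; }

private:
  Body body_;
  bool noipa_;
};
```

The robustness would come from making the default query the restrictive one. A pass that has never heard of noipa simply sees no body, instead of needing to remember to check an attribute.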

Though I’m not advocating for that. I’m OK with a more ad-hoc, best-effort implementation targeting the -O0/debugging-aid __attribute__((optnone)) kind of use case, and happy to fix cases as they come up to improve the user experience in these situations.

Maybe we could get away with generalizing this by having an optnone (or noipa) function appear “interposable” even though it doesn’t have a real interposable linkage? That should hinder/disable any IPA.

Hmm, looks like it would be best for GlobalValue::isDefinitionExact to return false in this case (whatever we end up naming it). Since isDefinitionExact is just !mayBeDerefined(), that /maybe/ amounts to having mayBeDerefined return true.
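As a rough illustration of the exactness route, here is a simplified C++ model. The linkage enum is a small illustrative subset and the noipa flag stands in for the hypothetical attribute. The one relationship taken from LLVM is that GlobalValue::isDefinitionExact() is the negation of mayBeDerefined(). Everything else is simplified.

```cpp
#include <cassert>

// Illustrative subset of linkage kinds (not LLVM's full LinkageTypes enum).
enum class Linkage { External, Internal, LinkOnceODR, WeakODR, LinkOnceAny, WeakAny };

struct ToyGlobal {
  Linkage linkage;
  bool noipa = false;  // hypothetical attribute under discussion

  // Can the symbol be replaced at link time by an arbitrary definition?
  // (Simplified: ignores semantic interposition and other linkages.)
  bool isInterposable() const {
    return linkage == Linkage::LinkOnceAny || linkage == Linkage::WeakAny;
  }

  // Could the visible definition be swapped for a differently optimized
  // variant? ODR linkages qualify; so, under the proposal, does noipa.
  bool mayBeDerefined() const {
    if (noipa)
      return true;
    if (linkage == Linkage::LinkOnceODR || linkage == Linkage::WeakODR)
      return true;
    return isInterposable();
  }

  // Same relationship as in LLVM: exact == !mayBeDerefined().
  bool isDefinitionExact() const { return !mayBeDerefined(); }
};
```

In this model, a noipa function with external linkage becomes inexact without becoming interposable. That is roughly the "looks interposable to IPO, but isn't really" effect described above.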

Yeah, I guess if we can implement such a robust generalization, then it’d probably be OK/easy enough to implement both noipa and “optnone implies noipa”, the same way optnone implies noinline. (Well, I guess noipa would subsume the noinline implication: if the function isn’t exact, the inliner won’t inline it, so there wouldn’t be any need for the explicit noinline.)

With your proposal, a clang noipa attribute could be lowered
to optnone on the whole function. Should that be the path forward,
it seems to me like too much of a hammer.

I agree that lowering noipa to optnone would be a very aggressive form of noipa. If we want to support noipa, we would likely support it separately. Then we could either lower -O0 (& maybe __attribute__((optnone))) to optnone+noipa+noinline (optnone already implies noinline), or make optnone implicitly imply noipa/be a superset of it. (If we do have noipa, it’s probably best to have “optnone requires noipa” the same way as “optnone requires noinline”, rather than an implicit superset sort of thing.)
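Under the “optnone requires noipa” option, the new check would sit next to the existing optnone rules. Below is a hypothetical, verifier-style sketch. The noinline and alwaysinline rules restate the LangRef text quoted at the top of the thread. The noipa rule is only the proposal, so it sits behind a flag. FnAttrs, verifyFnAttrs and attrsForOptnoneFunction are made-up names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical attribute bundle for one function.
struct FnAttrs {
  bool optnone = false;
  bool noinline = false;
  bool alwaysinline = false;
  bool noipa = false;
};

// Verifier-style checks. The first two follow the LangRef wording; the third
// is the proposed "optnone requires noipa" rule.
std::vector<std::string> verifyFnAttrs(const FnAttrs &a, bool requireNoIPA) {
  std::vector<std::string> errors;
  if (a.optnone && !a.noinline)
    errors.push_back("optnone requires noinline");
  if (a.optnone && a.alwaysinline)
    errors.push_back("optnone is incompatible with alwaysinline");
  if (requireNoIPA && a.optnone && !a.noipa)
    errors.push_back("optnone requires noipa");
  return errors;
}

// What a frontend might attach for -O0 or __attribute__((optnone)) under the
// proposal (hypothetical helper).
FnAttrs attrsForOptnoneFunction() {
  FnAttrs a;
  a.optnone = a.noinline = a.noipa = true;
  return a;
}
```

With the flag set, the existing optnone+noinline pair is reported as missing noipa. The frontend helper's optnone+noinline+noipa bundle passes both ways.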

I think that’d certainly be appropriate for -O0. I’d argue it’d also be appropriate for __attribute__((optnone)), because I think it’s what people expect and is consistent with the motivation for the attribute: debuggability. You wouldn’t want a caller to skip filling in parameters (or pass in garbage) because it knows the implementation doesn’t use them. Nor would you want it to ignore the result because it knows what the result will be.

Would it not be best to not conflate the two,
and just introduce the noipa attribute?

I think we’d still want to conflate them for user-facing functionality, even if they were separable at the IR level.

- Dave

Prototyping the idea of “isDefinitionExact” returning false for optnone (whether or not we split it out into noipa), I’ve tripped over something it seems I created 5 years ago:

I added some optnone support to GlobalsModRef (https://github.com/llvm/llvm-project/commit/c662b501508200076e581beb9345a7631173a1d8#diff-55664e96a7ce3533b46f12c6906acecb2bd9a599e2b79c97506af4b1b4873fa1) so that it wouldn’t conclude properties of an optnone function.

But I then made a follow-up commit (without a lot of context as to why, unfortunately) that allowed GlobalsModRef to use existing attributes on an optnone function: https://github.com/llvm/llvm-project/commit/7a9b788830da0a426fb0ff0a4cec6d592bb026e9#diff-55664e96a7ce3533b46f12c6906acecb2bd9a599e2b79c97506af4b1b4873fa1

But making the function definition inexact seems to break the unit tests added in the latter commit. I suppose it’s then an open question whether existing attributes on an inexact definition should be used at all. (I don’t know what motivated me to support them for optnone.)

Oh, and here’s a change from Chandler around the same time that similarly blocks some IPO for optnone, in this case preventing FunctionAttrs from deriving attributes for an optnone function: https://github.com/llvm/llvm-project/commit/0fb998110abcf3d67495d12f854a1576b182d811#diff-cc618a9485181a9246c4e0367dc9f1a19d3cb6811d1e488713f53a753d3da60c

That functionality looks like it can be subsumed by the inexact approach: applying inexact to optnone and removing the change in Chandler’s patch should still pass the tests. (Hmm, having actually tested it: not quite, so there’s more work to do there.)

I'm very much in favor of `noipa`. It comes up every few months
and it would be widely useful. I'd expose it via Clang and -O0 could
set it as well (for the LTO case).

When it comes to inexact definitions, optnone functions, and existing attributes,
I'd be in favor of 1) always allowing the use of existing attributes,
and 2) not deriving new ones for an inexact or optnone definition.

This is how the Attributor determines whether a function-level attribute could
be derived or if we should only stick with the existing information:

  /// Determine whether the function \p F is IPO amendable
  ///
  /// If a function is exactly defined or it has alwaysinline attribute
  /// and is viable to be inlined, we say it is IPO amendable
  bool isFunctionIPOAmendable(const Function &F) {
    return F.hasExactDefinition() || InfoCache.InlineableFunctions.count(&F);
  }

So, if the above check doesn't hold we will not add new attributes but we will
still use existing ones. This seems to me the right way to allow users/frontends
to provide information selectively.

That said, right now the Attributor will not propagate any information from an
optnone function or derive new information. Nevertheless, I'd be in favor of allowing
existing information to be used for IPO.
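The two rules above (always use attributes that are already present; only derive new ones from trusted bodies) can be modelled in a few lines. This is a toy sketch, not the Attributor or FunctionAttrs. ToyFn, mayDeriveAttributes and the "readnone" string label are hypothetical stand-ins, and treating optnone as non-amendable reflects the direction discussed in this thread.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a function and its attributes.
struct ToyFn {
  bool exactDefinition;         // cf. hasExactDefinition() above
  bool optnone;
  bool bodyAccessesMemory;      // what inspecting the body would reveal
  std::set<std::string> attrs;  // attributes currently on the function
};

// Only deduce from bodies we trust. Mirrors the first half of
// isFunctionIPOAmendable (the inlineable-function case is omitted); also
// excluding optnone is the proposal, not current behaviour.
bool mayDeriveAttributes(const ToyFn &f) {
  return f.exactDefinition && !f.optnone;
}

// Deduction may only ADD attributes, and only to trusted bodies.
void deduceAttributes(ToyFn &f) {
  if (mayDeriveAttributes(f) && !f.bodyAccessesMemory)
    f.attrs.insert("readnone");
}

// Call-site query: attributes already present are always honoured,
// regardless of how the callee is defined.
bool callMayAccessMemory(const ToyFn &callee) {
  return callee.attrs.count("readnone") == 0;
}
```

The last test case mirrors the pure_optnone/optnone example later in the thread. Information a user attaches is honoured at call sites, but nothing new is inferred from an untrusted body.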

~ Johannes

I’m very much in favor of noipa. It comes up every few months
and it would be widely useful.

Out of curiosity, what sort of uses do you have in mind for it?

I’d expose it via Clang and -O0 could
set it as well (for the LTO case).

When it comes to inexact definitions, optnone functions, and existing attributes,
I’d be in favor of 1) always allowing the use of existing attributes,

I’m not sure what you mean by this ^ - could you rephrase/elaborate?

and 2) not deriving new ones for an inexact or optnone definition.

Also this ^ I’m similarly confused/unclear about.

This is how the Attributor determines whether a function-level attribute could
be derived or if we should only stick with the existing information:

/// Determine whether the function \p F is IPO amendable
///
/// If a function is exactly defined or it has alwaysinline attribute
/// and is viable to be inlined, we say it is IPO amendable
bool isFunctionIPOAmendable(const Function &F) {
  return F.hasExactDefinition() ||
         InfoCache.InlineableFunctions.count(&F);
}

So, if the above check doesn’t hold we will not add new attributes but we will
still use existing ones. This seems to me the right way to allow users/frontends
to provide information selectively.

Yep, that sounds right to me (if you put attributes on an optnone/noipa function, they should be usable/used - but none should be discovered/added later by inspection of the implementation of such a function) - currently doesn’t seem to be the case for the (old pass manager?) FunctionAttrs pass, so I have to figure some things out there.

That said, right now the Attributor will not propagate any information from an
optnone function or derive new information. Nevertheless, I’d be in favor of allowing
existing information to be used for IPO.

*nod* I think I’m with you there.

- Dave

I'm very much in favor of `noipa`. It comes up every few months
and it would be widely useful.

Out of curiosity, what sort of uses do you have in mind for it?

Most times people basically want `noinline` to also mean "no
interprocedural optimization", but without `optnone`. So, your
function is optimized but actually called and the call result
is used, no constants are propagated etc.

Example:

__attribute__((noipa))
int foo() { return 1 + 2; }
int bar() { return foo(); }

should become

__attribute__((noipa))
int foo() { return 3; }
int bar() { return foo(); }

which it does not right now.
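The difference between the two attributes in this example can be written down as a tiny source-to-source model. Flags, Result and optimize are hypothetical, and the strings simply mirror the foo/bar snippets above. The optnoneImpliesNoIPA switch stands for the open question of whether optnone should also imply noipa.

```cpp
#include <cassert>
#include <string>

// Hypothetical flags for foo; in IR these would be independent attributes.
struct Flags {
  bool optnone = false;  // don't optimize foo's own body
  bool noipa = false;    // don't let callers depend on foo's body
};

struct Result {
  std::string foo;  // body of foo after "optimization"
  std::string bar;  // body of bar after "optimization"
};

// foo starts as `return 1 + 2;` and bar as `return foo();`.
Result optimize(const Flags &foo, bool optnoneImpliesNoIPA) {
  Result r{"return 1 + 2;", "return foo();"};
  if (!foo.optnone)
    r.foo = "return 3;";  // intraprocedural constant folding inside foo
  bool opaque = foo.noipa || (optnoneImpliesNoIPA && foo.optnone);
  if (!opaque)
    r.bar = "return 3;";  // interprocedural: bar relies on foo's known result
  return r;
}
```

With noipa alone, foo is optimized to `return 3;` while bar keeps the call, which is the desired outcome above. With optnone and no noipa implication, bar is still folded, which is the SCCP behaviour from the start of the thread.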

I'd expose it via Clang and -O0 could
set it as well (for the LTO case).

When it comes to inexact definitions, optnone functions, and existing attributes,
I'd be in favor of 1) always allowing the use of existing attributes,

I'm not sure what you mean by this ^ - could you rephrase/elaborate?

and 2) not deriving new ones for an inexact or optnone definition.

Also this ^ I'm similarly confused/unclear about.

So if you have a call of F, and F has attribute A, we can use
that fact at the call site, regardless of the definition of F.
F could be `optnone` or with non-exact linkage, but the information
attached to it is still usable.

If we go for the above, we can never derive/attach information
for non-exact linkage definitions. That way we prevent IPO from
using information that might be invalid if the definition is replaced.

It is all about where you interrupt the IPO deduction in this case. I think
it is more beneficial to not attach new things, but an argument could be
made to allow attaching them but not propagating them. Both have benefits; it's not 100%
clear what is more desirable at the end of the day.

This is how the Attributor determines whether a function-level attribute
could be derived or if we should only stick with the existing information:

      /// Determine whether the function \p F is IPO amendable
      ///
      /// If a function is exactly defined or it has alwaysinline attribute
      /// and is viable to be inlined, we say it is IPO amendable
      bool isFunctionIPOAmendable(const Function &F) {
        return F.hasExactDefinition() ||
               InfoCache.InlineableFunctions.count(&F);
      }

So, if the above check doesn't hold we will not add new attributes but
we will still use existing ones. This seems to me the right way to allow
users/frontends to provide information selectively.

Yep, that sounds right to me (if you put attributes on an optnone/noipa
function, they should be usable/used - but none should be discovered/added
later by inspection of the implementation of such a function) - currently
doesn't seem to be the case for the (old pass manager?) FunctionAttrs pass,
so I have to figure some things out there.

That is what I tried to say above, I think.

In the end, I want to know that foo does not access memory but
bar could for all we know:

__attribute__((pure, optnone))         // or non-exact linkage
void pure_optnone() { /* empty */ }

__attribute__((optnone))               // or non-exact linkage
void optnone() { /* empty */ }

void foo() { pure_optnone(); }

void bar() { optnone(); }

~ Johannes

The thread is long and I haven’t read it all, but I like the approach of:

  • add a new noipa LLVM IR attribute (feel free to bikeshed the name)
  • make clang optnone imply noipa (maybe in LLVM too, but I haven’t thought hard about it)

+1 from me on this FWIW!

>
>> I'm very much in favor of `noipa`. It comes up every few months
>> and it would be widely useful.
>
> Out of curiosity, what sort of uses do you have in mind for it?

Most times people basically want `noinline` to also mean "no
interprocedural optimization", but without `optnone`. So, your
function is optimized but actually called and the call result
is used, no constants are propagated etc.

Example:

__attribute__((noipa))
int foo() { return 1 + 2; }
int bar() { return foo(); }

should become

__attribute__((noipa))
int foo() { return 3; }
int bar() { return foo(); }

which it does not right now.

I'm curious what the use case is you've come across (the justification
for the GCC implementation of noipa was mostly for compiler testing -
which is my interest in having these semantics (under optnone or
otherwise) - so just curious what other use cases I should have in
mind, etc)

>> I'd expose it via Clang and -O0 could
>> set it as well (for the LTO case).
>>
>> When it comes to inexact definitions, optnone functions, and existing attributes,
>> I'd be in favor of 1) always allowing the use of existing attributes,
>>
> I'm not sure what you mean by this ^ - could you rephrase/elaborate?
>
>
>> and 2) not deriving new ones for an inexact or optnone definition.
>>
> Also this ^ I'm similarly confused/unclear about.

So if you have a call of F, and F has attribute A, we can use
that fact at the call site, regardless of the definition of F.
F could be `optnone` or with non-exact linkage, but the information
attached to it is still usable.

+1 SGTM.

If we go for the above, we can never derive/attach information
for non-exact linkage definitions. That way we prevent IPO from
using information that might be invalid if the definition is replaced.

Yup, sounds good.

It is all about where you interrupt the IPO deduction in this case. I think
it is more beneficial to not attach new things, but an argument could be
made to allow attaching them but not propagating them.

Allow adding them, but never using them? Yeah, that doesn't seem
especially helpful/useful - the attributes are entirely for IPO, so if
you want to block IPO it seems best not to add them.

Both have benefits; it's not 100%
clear what is more desirable at the end of the day.

>
>
>> This is how the Attributor determines whether a function-level attribute
>> could be derived or if we should only stick with the existing information:
>>
>> /// Determine whether the function \p F is IPO amendable
>> ///
>> /// If a function is exactly defined or it has alwaysinline attribute
>> /// and is viable to be inlined, we say it is IPO amendable
>> bool isFunctionIPOAmendable(const Function &F) {
>>   return F.hasExactDefinition() ||
>>          InfoCache.InlineableFunctions.count(&F);
>> }
>>
>> So, if the above check doesn't hold we will not add new attributes but
>> we will still use existing ones. This seems to me the right way to allow
>> users/frontends to provide information selectively.
>>
> Yep, that sounds right to me (if you put attributes on an optnone/noipa
> function, they should be usable/used - but none should be discovered/added
> later by inspection of the implementation of such a function) - currently
> doesn't seem to be the case for the (old pass manager?) FunctionAttrs pass,
> so I have to figure some things out there.

That is what I tried to say above, I think.

In the end, I want to know that foo does not access memory but
bar could for all we know:

__attribute__((pure, optnone))         // or non-exact linkage
void pure_optnone() { /* empty */ }

__attribute__((optnone))               // or non-exact linkage
void optnone() { /* empty */ }

void foo() { pure_optnone(); }

void bar() { optnone(); }

Got it,

I'll see about posting an implementation of noipa and switching
__attribute__((optnone)) over to lower to LLVM's optnone+noipa rather
than optnone+noinline.

Happy if someone wants to add clang support for an
__attribute__((noipa)) lowering to that LLVM noipa once it's in (maybe
I'll do it, guess it's probably fairly cheap/easy).

- Dave

I'm very much in favor of `noipa`. It comes up every few months
and it would be widely useful.

Out of curiosity, what sort of uses do you have in mind for it?

Most times people basically want `noinline` to also mean "no
interprocedural optimization", but without `optnone`. So, your
function is optimized but actually called and the call result
is used, no constants are propagated etc.

Example:

__attribute__((noipa))
int foo() { return 1 + 2; }
int bar() { return foo(); }

should become

__attribute__((noipa))
int foo() { return 3; }
int bar() { return foo(); }

which it does not right now.

I'm curious what the use case is you've come across (the justification
for the GCC implementation of noipa was mostly for compiler testing -
which is my interest in having these semantics (under optnone or
otherwise) - so just curious what other use cases I should have in
mind, etc)

I looked for `noipa` in my inbox; here are some results that
show different use cases people brought up since March 2020:

https://reviews.llvm.org/D75815#1939277
https://bugs.llvm.org/show_bug.cgi?id=46463
https://reviews.llvm.org/D93838#2472155
https://reviews.llvm.org/D97971#2608302

Another use case is runtime call detection in the presence of definitions.
We detect `malloc` and also various OpenMP runtime calls, which works
fine because those are usually declarations. However, sometimes they are
not, and then we can easily end up with signatures that no longer match what we
expect. At least, that happens if we link the OpenMP GPU runtime
into an application.
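Here is a toy model of the runtime-call problem. It assumes a hypothetical IPO step that drops an unused trailing parameter from a defined function, loosely in the spirit of dead argument elimination. The names, the string signatures and the `__rt_example` function are all made up for illustration. The point is only that a recognizer keyed on name plus expected signature keeps working for declarations and for noipa definitions, but not for definitions that IPO has rewritten.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy signature: return type and parameter types as strings.
struct Signature {
  std::string ret;
  std::vector<std::string> params;
  bool operator==(const Signature &o) const {
    return ret == o.ret && params == o.params;
  }
};

// Hypothetical callee as seen at a call site.
struct ToyCallee {
  std::string name;
  Signature sig;
  bool isDeclaration;
  bool noipa;
};

// Hypothetical IPO step: drop an unused trailing parameter, but only from a
// definition that IPO is allowed to reason about.
void maybeDropUnusedParam(ToyCallee &f, bool lastParamUnused) {
  if (!f.isDeclaration && !f.noipa && lastParamUnused && !f.sig.params.empty())
    f.sig.params.pop_back();
}

// Recognize a known runtime function by name *and* expected signature.
bool isKnownRuntimeCall(const ToyCallee &c, const std::string &name,
                        const Signature &expected) {
  return c.name == name && c.sig == expected;
}
```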

I'd expose it via Clang and -O0 could
set it as well (for the LTO case).

When it comes to inexact definitions, optnone functions, and existing attributes,
I'd be in favor of 1) always allowing the use of existing attributes,

I'm not sure what you mean by this ^ - could you rephrase/elaborate?

and 2) not deriving new ones for an inexact or optnone definition.

Also this ^ I'm similarly confused/unclear about.

So if you have a call of F, and F has attribute A, we can use
that fact at the call site, regardless of the definition of F.
F could be `optnone` or with non-exact linkage, but the information
attached to it is still usable.

+1 SGTM.

If we go for the above, we can never derive/attach information
for non-exact linkage definitions. That way we prevent IPO from
using information that might be invalid if the definition is replaced.

Yup, sounds good.

It is all about where you interrupt the IPO deduction in this case. I think
it is more beneficial to not attach new things, but an argument could be
made to allow attaching them but not propagating them.

Allow adding them, but never using them? Yeah, that doesn't seem
especially helpful/useful - the attributes are entirely for IPO, so if
you want to block IPO it seems best not to add them.

We could use them *inside* the function, but we can make that work
differently as well. IPO seems the more important target.

Both have benefits; it's not 100%
clear what is more desirable at the end of the day.

This is how the Attributor determines whether a function-level attribute
could be derived or if we should only stick with the existing information:

       /// Determine whether the function \p F is IPO amendable
       ///
       /// If a function is exactly defined or it has alwaysinline attribute
       /// and is viable to be inlined, we say it is IPO amendable
       bool isFunctionIPOAmendable(const Function &F) {
         return F.hasExactDefinition() ||
                InfoCache.InlineableFunctions.count(&F);
       }

So, if the above check doesn't hold we will not add new attributes but
we will still use existing ones. This seems to me the right way to allow
users/frontends to provide information selectively.

Yep, that sounds right to me (if you put attributes on an optnone/noipa
function, they should be usable/used - but none should be discovered/added
later by inspection of the implementation of such a function) - currently
doesn't seem to be the case for the (old pass manager?) FunctionAttrs pass,
so I have to figure some things out there.

That is what I tried to say above, I think.

In the end, I want to know that foo does not access memory but
bar could for all we know:

__attribute__((pure, optnone))         // or non-exact linkage
void pure_optnone() { /* empty */ }

__attribute__((optnone))               // or non-exact linkage
void optnone() { /* empty */ }

void foo() { pure_optnone(); }

void bar() { optnone(); }

Got it,

I'll see about posting an implementation of noipa and switching
__attribute__((optnone)) over to lower to LLVM's optnone+noipa rather
than optnone+noinline.

FWIW, I think `noipa` should not imply `noinline`, unsure if you
had that in mind or not.

Happy if someone wants to add clang support for an
__attribute__((noipa)) lowering to that LLVM noipa once it's in (maybe
I'll do it, guess it's probably fairly cheap/easy).

Agreed. I won't volunteer right now; I doubt that I'll get to it
anytime soon. That said, I actually would like to use `noipa` (see
above).

~ Johannes

Implemented a first pass at adding noipa to IR/bitcode with the basic
functionality, noipa implying "may be derefined"/not
is-definition-exact: D101011 [Attr] Add "noipa" function attribute

>>>
>>>> I'm very much in favor of `noipa`. It comes up every few months
>>>> and it would be widely useful.
>>> Out of curiosity, what sort of uses do you have in mind for it?
>> Most times people basically want `noinline` to also mean "no
>> interprocedural optimization", but without `optnone`. So, your
>> function is optimized but actually called and the call result
>> is used, no constants are propagated etc.
>>
>> Example:
>>
>> ```
>> __attribute__((noipa))
>> int foo() { return 1 + 2; }
>> int bar() { return foo(); }
>> ```
>> should become
>>
>> ```
>> __attribute__((noipa))
>> int foo() { return 3; }
>> int bar() { return foo(); }
>> ```
>> which it does not right now.
> I'm curious what the use case is you've come across (the justification
> for the GCC implementation of noipa was mostly for compiler testing -
> which is my interest in having these semantics (under optnone or
> otherwise) - so just curious what other use cases I should have in
> mind, etc)

I looked for `noipa` in my inbox; here are some results that
show different use cases people brought up since March 2020:

D75815 [InstCombine] Simplify calls with "returned" attribute
Bug 46463: __attribute((noinline)) not respected
D93838 [SCCP] Add Function Specialization pass
D97971 [IPSCCP] don't propagate constant in section when caller/callee sections mismatch

Another use case is runtime call detection in the presence of definitions.
So, we detect `malloc` and also various OpenMP runtime calls, which works
fine because those are usually declarations. However, sometimes they are
not and then we can easily end up with signatures that do not match what we
expect anymore. At least that happens if we link in the OpenMP GPU runtime
into an application.

Ah, thanks for all the links/context!

>>>> I'd expose it via Clang and -O0 could
>>>> set it as well (for the LTO case).
>>>>
>>>> When it comes to inexact definitions, optnone functions, and existing attributes,
>>>> I'd be in favor of 1) always allowing the use of existing attributes,
>>>>
>>> I'm not sure what you mean by this ^ - could you rephrase/elaborate?
>>>
>>>
>>>> and 2) not deriving new ones for an inexact or optnone definition.
>>>>
>>> Also this ^ I'm similarly confused/unclear about.
>> So if you have a call of F, and F has attribute A, we can use
>> that fact at the call site, regardless of the definition of F.
>> F could be `optnone` or with non-exact linkage, but the information
>> attached to it is still usable.
> +1 SGTM.
>
>> If we go for the above, we can never derive/attach information
>> for non-exact linkage definitions. That way we prevent IPO from
>> using information that might be invalid if the definition is replaced.
> Yup, sounds good.
>
>> It is all about where you interrupt the IPO deduction in this case. I think
>> it is more beneficial to not attach new things, but an argument could be
>> made to allow attaching them but not propagating them.
> Allow adding them, but never using them? Yeah, that doesn't seem
> especially helpful/useful - the attributes are entirely for IPO, so if
> you want to block IPO it seems best not to add them.

We could use them *inside* the function, but we can make that work
differently as well. IPO seems the more important target.

Ah, right. Yeah, agreed.

>> Both have benefits; it's not 100%
>> clear what is more desirable at the end of the day.
>>
>>
>>>> This is how the Attributor determines whether a function-level attribute
>>>> could be derived or if we should only stick with the existing information:
>>>>
>>>> /// Determine whether the function \p F is IPO amendable
>>>> ///
>>>> /// If a function is exactly defined or it has alwaysinline attribute
>>>> /// and is viable to be inlined, we say it is IPO amendable
>>>> bool isFunctionIPOAmendable(const Function &F) {
>>>>   return F.hasExactDefinition() ||
>>>>          InfoCache.InlineableFunctions.count(&F);
>>>> }
>>>>
>>>> So, if the above check doesn't hold we will not add new attributes but
>>>> we will still use existing ones. This seems to me the right way to allow
>>>> users/frontends to provide information selectively.
>>>>
>>> Yep, that sounds right to me (if you put attributes on an optnone/noipa
>>> function, they should be usable/used - but none should be discovered/added
>>> later by inspection of the implementation of such a function) - currently
>>> doesn't seem to be the case for the (old pass manager?) FunctionAttrs pass,
>>> so I have to figure some things out there.
>> That is what I tried to say above, I think.
>>
>> In the end, I want to know that foo does not access memory but
>> bar could for all we know:
>>
>> ```
>> __attribute__((pure, optnone)) // or non-exact linkage
>> void pure_optnone() { /* empty */ }
>>
>> __attribute__((optnone)) // or non-exact linkage
>> void optnone() { /* empty */ }
>>
>> void foo() { pure_optnone(); }
>>
>> void bar() { optnone(); }
>> ```
> Got it,
>
> I'll see about posting an implementation of noipa and switching
> __attribute__((optnone)) over to lower to LLVM's optnone+noipa rather
> than optnone+noinline.

FWIW, I think `noipa` should not imply `noinline`, unsure if you
had that in mind or not.

Do you think it should require that noipa also carries noinline? (the
way optnone currently requires noinline) Or should we let the
non-inlining fall out naturally from the non-exact definition
property?

So, non-exact definitions do not prevent inlining. You can even
create an internal copy and use that for IPO; think of it as
"inline-then-outline".
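To illustrate why inexactness alone doesn't block inlining or IPO, here is a toy sketch of the internal-copy idea. One call site gets a private, internal (and therefore exact) copy of a linkonce_odr callee, and the original symbol is left alone. ToyFn, ToyModule, internalizeFor and the ".internalized" suffix are all hypothetical; this is not how any particular LLVM pass is written.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative subset of linkages.
enum class Linkage { External, Internal, LinkOnceODR };

// Hypothetical function and module stand-ins.
struct ToyFn {
  Linkage linkage = Linkage::External;
  std::string body;
};

struct ToyModule {
  std::map<std::string, ToyFn> functions;
  std::map<std::string, std::string> callTargets;  // call site -> callee name
};

// Give one call site its own internal copy of an inexact (here: linkonce_odr)
// callee. The copy is exact, so IPO may reason about it, while the original
// symbol stays untouched for everyone else.
std::string internalizeFor(ToyModule &m, const std::string &callSite) {
  const std::string callee = m.callTargets.at(callSite);
  const ToyFn &orig = m.functions.at(callee);
  if (orig.linkage != Linkage::LinkOnceODR)
    return callee;  // nothing to do in this toy: keep the original target
  const std::string copyName = callee + ".internalized";  // made-up suffix
  m.functions.emplace(copyName, ToyFn{Linkage::Internal, orig.body});
  m.callTargets[callSite] = copyName;
  return copyName;
}
```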

That said, I believe it is a mistake that `optnone` requires
`noinline`. There is no reason for it to do so at the IR level.
If you argue that C-level `optnone` should imply `noinline`, that is
something worth discussing, but at the IR level we can
decouple them. One use case: the unoptimized version of a function
is called from functions that are `optnone` themselves, while at
other call sites it is inlined and optimized. So
you can use the two attributes to get context-sensitive `optnone`.

Circling back to `noipa`, I'm very much in favor of letting it
compose freely with the others, at least in the IR. So, it neither
requires nor implies `noinline` or `optnone`. Similarly,
`noinline` does not imply `noipa`, and neither does `optnone`. The
latter might be surprising, but I imagine I can use the function
attributes of an `optnone` function at the call site, whereas I will
not if the function is `noipa`.

Others might have different opinions though.

~ Johannes

That said, I believe it is a mistake that `optnone` requires
`noinline`. There is no reason for it to do so at the IR level.
If you argue that C-level `optnone` should imply `noinline`, that is
something worth discussing, but at the IR level we can
decouple them. One use case: the unoptimized version of a function
is called from functions that are `optnone` themselves, while at
other call sites it is inlined and optimized. So
you can use the two attributes to get context-sensitive `optnone`.

The original intent for `optnone` was to imitate the -O0 pipeline
to the extent that was feasible. The -O0 pipeline (as constructed
by Clang) runs just the always-inliner, not the regular inliner;
so, functions marked `optnone` should not be inlined. The way
to achieve that effect most simply is to have `optnone` require
`noinline` and that's what we did.

If we have `optnone` stop requiring `noinline` and teach the
inliner to inline an `optnone` callee only into an `optnone` caller,
then we are violating the intent that `optnone` imitate -O0, because
that inlining would not have happened at -O0.

--paulr

That said, I believe it is a mistake that `optnone` requires
`noinline`. There is no reason for it to do so at the IR level.
If you argue that C-level `optnone` should imply `noinline`, that is
something worth discussing, but at the IR level we can
decouple them. One use case: the unoptimized version of a function
is called from functions that are `optnone` themselves, while at
other call sites it is inlined and optimized. So
you can use the two attributes to get context-sensitive `optnone`.

The original intent for `optnone` was to imitate the -O0 pipeline
to the extent that was feasible. The -O0 pipeline (as constructed
by Clang) runs just the always-inliner, not the regular inliner;
so, functions marked `optnone` should not be inlined. The way
to achieve that effect most simply is to have `optnone` require
`noinline` and that's what we did.

If we have `optnone` stop requiring `noinline` and teach the
inliner to inline an `optnone` callee only into an `optnone` caller,
then we are violating the intent that `optnone` imitate -O0, because
that inlining would not have happened at -O0.

I think I initially read this wrong, hence the part below.
After reading it again, I have one question: Why would the
inliner inline something that is not `always_inline` into
an `optnone` caller? That would violate the idea of `optnone`,
IMHO, regardless of whether the callee is `optnone`. That is
why I don't believe `noinline` on the callee is necessary
for your use case.

--- I misread and wrote the following; it might still be useful ---

Let's look at an example. I show it in C, but what I am arguing
about is still the IR; as described earlier, C is different.

__attribute__((optnone))
void foo() { ... }
__attribute__((optnone, noinline))
void bar() { foo(); ... }
void baz() { foo(); bar(); ... }

Here, the user has combined optnone and noinline to get distinct
effects, all of which you might want:
- foo is not optimized, not inlined into bar, but inlined into baz
- bar is not optimized and not inlined into baz
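
To make the proposed decoupled semantics concrete, here is a small C model of which call edges in the example above an inliner could inline. It is a hypothetical sketch, not LLVM's inliner: `Func` and `may_inline` are made-up names, cost heuristics are ignored, and it assumes IR in which `optnone` no longer requires `noinline` (which the current LangRef does not allow).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function attribute summary (not an LLVM type). */
typedef struct {
    bool optnone;      /* do not optimize the code in this symbol */
    bool noinline;     /* do not copy this symbol's code into another symbol */
    bool alwaysinline; /* must be inlined wherever possible */
} Func;

/* Returns true if the modeled inliner is *permitted* to inline `callee`
 * into `caller` under the decoupled semantics; a real inliner would
 * still apply its cost model on top of this. */
static bool may_inline(const Func *caller, const Func *callee) {
    if (callee->noinline)
        return false;                /* noinline forbids copying the body */
    if (caller->optnone)
        return callee->alwaysinline; /* optnone callers take only alwaysinline */
    return true;                     /* ordinary caller: inlining allowed */
}
```

With foo = {optnone}, bar = {optnone, noinline} and baz = {}, the model gives "no" for foo into bar, "yes" for foo into baz and "no" for bar into baz, matching the two bullets.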

I hope this makes sense.

~ Johannes

>> That said, I believe it is a mistake that `optnone` requires
>> `noinline`. There is no reason for it to do so on the IR level.
>> If you argue C-level `optnone` should imply `noinline`, that is
>> something worth discussing, though on the IR level we can
>> decouple them. One use case: the non-optimized version
>> is called from functions that are `optnone` themselves, while
>> at other call sites the function is inlined and optimized. So
>> you can use the two attributes to get context-sensitive `optnone`.
> The original intent for `optnone` was to imitate the -O0 pipeline
> to the extent that was feasible. The -O0 pipeline (as constructed
> by Clang) runs just the always-inliner, not the regular inliner;
> so, functions marked `optnone` should not be inlined. The way
> to achieve that effect most simply is to have `optnone` require
> `noinline` and that's what we did.
>
> If we have `optnone` stop requiring `noinline` and teach the
> inliner to inline an `optnone` callee only into an `optnone` caller,
> then we are violating the intent that `optnone` imitate -O0, because
> that inlining would not have happened at -O0.

I think I initially read this wrong, hence the part below.
After reading it again, I have one question: Why would the
inliner inline something that is not `always_inline` into
an `optnone` caller? That would violate the idea of `optnone`,
IMHO, regardless of whether the callee is `optnone`. That is
why I don't believe `noinline` on the callee is necessary
for your use case.

The inliner should be ignoring `optnone` callers, so it would
never inline *anything* into an `optnone` caller. (Other than
an `alwaysinline` function.)

I had read this:

>> I believe it is a mistake that `optnone` requires `noinline`.

and the case that came to mind is inlining an `optnone` callee
into a not-`optnone` caller. The inlined copy would then be
subjected to further optimization, which violates the idea of
`optnone`.

Now, the inliner already knows to avoid `noinline` callees, so
attaching `noinline` to `optnone` functions was (at the time)
considered an optimal way to avoid the problematic case. We
could instead teach the inliner to skip `optnone` callees, and
that would allow us to eliminate the requirement that `optnone`
functions must also be `noinline`. I am unclear why redefining
`optnone` to _imply_ `noinline` (rather than _require_ `noinline`)
is better, but then I don't work much with attributes.

The notion of allowing an `optnone` caller to inline an `optnone`
callee sounds like it would also violate the intent of `optnone`
in that it should imitate -O0, where inlining is confined to
`alwaysinline` callees, and `optnone` is defined to conflict with
`alwaysinline` (because if you always inline something, you are
allowing it to have subsequent optimizations same as the caller,
which conflicts with `optnone`).
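
For reference, the constraints in the LangRef text quoted at the start of the thread can be written as a small validity check. This is a hypothetical C sketch (`Attrs` and `check_optnone` are illustrative names, not the IR Verifier's code); the final `noinline` check is the requirement under debate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute set; field names are illustrative only. */
typedef struct {
    bool optnone, noinline, alwaysinline, minsize, optsize;
} Attrs;

/* Returns NULL if the set satisfies the quoted LangRef rules for
 * optnone, otherwise a short description of the first violation. */
static const char *check_optnone(const Attrs *a) {
    if (!a->optnone)
        return NULL;                 /* rules only apply to optnone */
    if (a->alwaysinline)
        return "optnone conflicts with alwaysinline";
    if (a->minsize || a->optsize)
        return "optnone conflicts with minsize/optsize";
    if (!a->noinline)
        return "optnone requires noinline"; /* the coupling being debated */
    return NULL;
}
```

Dropping the last `if` is, roughly, the IR-level change Johannes is asking for; the `alwaysinline` conflict described above would remain.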

So, if you want to undo the _requirement_ that `optnone` must
have `noinline`, but then redefine `optnone` such that it can't
be inlined anywhere, you've done something that seems to have no
practical effect. Maybe that helps Attributor in some way, but
I don't see any other reason to be making this change.
--paulr

That said, I believe it is a mistake that `optnone` requires
`noinline`. There is no reason for it to do so on the IR level.
If you argue C-level `optnone` should imply `noinline`, that is
something worth discussing, though on the IR level we can
decouple them. One use case: the non-optimized version
is called from functions that are `optnone` themselves, while
at other call sites the function is inlined and optimized. So
you can use the two attributes to get context-sensitive `optnone`.

The original intent for `optnone` was to imitate the -O0 pipeline
to the extent that was feasible. The -O0 pipeline (as constructed
by Clang) runs just the always-inliner, not the regular inliner;
so, functions marked `optnone` should not be inlined. The way
to achieve that effect most simply is to have `optnone` require
`noinline` and that's what we did.

If we have `optnone` stop requiring `noinline` and teach the
inliner to inline an `optnone` callee only into an `optnone` caller,
then we are violating the intent that `optnone` imitate -O0, because
that inlining would not have happened at -O0.

I think I initially read this wrong, hence the part below.
After reading it again, I have one question: Why would the
inliner inline something that is not `always_inline` into
an `optnone` caller? That would violate the idea of `optnone`,
IMHO, regardless of whether the callee is `optnone`. That is
why I don't believe `noinline` on the callee is necessary
for your use case.

The inliner should be ignoring `optnone` callers, so it would
never inline *anything* into an `optnone` caller. (Other than
an `alwaysinline` function.)

My point is, it already does ignore `optnone` callers
and inlines only `alwaysinline` calls into them:

I had read this:

I believe it is a mistake that `optnone` requires `noinline`.

and the case that came to mind is inlining an `optnone` callee
into a not-`optnone` caller. The inlined copy would then be
subjected to further optimization, which violates the idea of
`optnone`.

But that is a composition issue. If you do not want to
inline an `optnone` callee into a non-`optnone` caller,
then add `noinline` to the callee. If you don't mind it
being inlined into non-`optnone` callers and optimized
there, then don't. My last email contained an example
to show the different cases; you can mix and match IR
attributes to get what you want. Tying them together
does not improve anything; it just restricts the
options.

Now, the inliner already knows to avoid `noinline` callees, so
attaching `noinline` to `optnone` functions was (at the time)
considered an optimal way to avoid the problematic case. We
could instead teach the inliner to skip `optnone` callees, and
that would allow us to eliminate the requirement that `optnone`
functions must also be `noinline`. I am unclear why redefining
`optnone` to _imply_ `noinline` (rather than _require_ `noinline`)
is better, but then I don't work much with attributes.

The inliner will not inline calls into an `optnone` caller
if it is not necessary. As said before, that would violate
the `optnone` idea for the caller, no matter what the callee
looks like. So requiring `noinline` on the callee seems
to me like a workaround or an oversight.

It is better to not require them together because you can
actually describe more distinct scenarios. Please take
another look at my example in the last email, it shows
what is possible if you split them. Furthermore, `optnone`
does by design imply `noinline` for the call sites in the
caller, or at least nobody argued that it shouldn't. Thus,
requiring `noinline` on the callee is simply unnecessary
as it does not add any value.

The notion of allowing an `optnone` caller to inline an `optnone`
callee sounds like it would also violate the intent of `optnone`
in that it should imitate -O0, where inlining is confined to
`alwaysinline` callees, and `optnone` is defined to conflict with
`alwaysinline` (because if you always inline something, you are
allowing it to have subsequent optimizations same as the caller,
which conflicts with `optnone`).

Nobody said `optnone` callers should inline calls that are
not `alwaysinline`; at least, I have not seen that argument
made anywhere so far. I'll just skip this paragraph.

So, if you want to undo the _requirement_ that `optnone` must
have `noinline`, but then redefine `optnone` such that it can't
be inlined anywhere, you've done something that seems to have no
practical effect. Maybe that helps Attributor in some way, but
I don't see any other reason to be making this change.

I do not want to say `optnone` cannot be inlined. `noinline`
says it cannot be inlined. If you want it not to be inlined,
use `noinline`; if you want it not to be optimized in its
own function, use `optnone`; if you want both, use both.

The practical effect was literally showcased in my last email,
please go back and look at the example.

I don't know what the Attributor has to do with this; I'm happy
to hear your thoughts on that though :)

~ Johannes

Let's look at an example. I show it in C, but what I am arguing
about is still the IR; as described earlier, C is different.

__attribute__((optnone))
void foo() { ... }
__attribute__((optnone, noinline))
void bar() { foo(); ... }
void baz() { foo(); bar(); ... }

Here, the user has combined optnone and noinline to get distinct
effects, all of which you might want:
- foo is not optimized, not inlined into bar, but inlined into baz

foo's non-inlined instance is not optimized; but, the instance that
is inlined into baz *is* optimized. How does that obey `optnone`?

- bar is not optimized and not inlined into baz

I hope this makes sense.

~ Johannes

The use-case for `optnone` is to allow selectively not-optimizing
a function, which I've seen used only to permit better debugging
of that function. Inlining optimizes (some instances of) the
function, against the coder's express wishes, and interferes with
the better debugging enabled by not-optimizing. I don't see how
that is beneficial to the coder, or any other use-case. If you
have a practical use-case I would love to hear it.

Yes, I do see that separating the concerns allows this weird case
of a sometimes-optimized function, but I don't see any benefit.
Certainly it would be super confusing to the coder, and at the
Clang level I would strenuously oppose decoupling these.

Apologies for mentioning Attributor; I have no idea how it works,
and I was rather idly speculating why you want to decouple the
optnone and noinline attributes.
--paulr

There seems to be a bunch of confusion and probably some
conflation/ambiguity about whether we're talking about IR constructs
or the C attributes.

Johannes, I assume your claim is restricted mostly to the IR? That is,
having optnone neither require nor imply noinline improves the
orthogonality of features, and there are reasonable use cases where
one might want optnone while allowing inlining (of the optnone
function) or optnone while disallowing inlining (of the optnone function).

Paul, I think you're mostly thinking about/interested in the specific
source-level/end-user use case that motivated the initial
implementation of optnone. There I tend to agree: inlining an
optnone function is not advantageous to the user. Though it's possible
Johannes's argument could be generalized from IR to C and still
apply: orthogonal features are more powerful, and the user can always
compose them to get what they want. (Good chance they're
using attributes behind macros for ease of use anyway; they're a bit
verbose to write by hand all the time.)

There's also the -O0 use of optnone these days: Clang puts optnone on
all functions when compiling with -O0, the intent being to treat such
functions as though they were compiled in a separate object file
without optimizations (that's me projecting what I /think/ the mental
model should be). That use, similarly, I think will probably want to
keep the current behavior (no IPA/inlining and no optimization,
however that's phrased).

Essentially, the high-level use cases of optnone all look like "imagine
if I compiled this in a separate object file without LTO". Apparently
even noipa+optnone isn't enough for that, maybe? (I need to test that
more, based on an earlier comment from Johannes that inexact
definitions don't stop the inliner.) It sounds like what I'm thinking
of may be closer to "what if this function had weak linkage"
(i.e., it could be replaced by a totally different one)?
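
One rough way to phrase the "separate object file" model is as a rule for when an interprocedural pass may fold a callee's constant return value into the caller, as SCCP does for `f1`/`main` at the start of the thread. The C sketch below is hypothetical: `CalleeInfo` and `may_fold_return` are invented names, it is not how IPSCCP is written, and it encodes the proposed behavior (optnone and noipa block folding) rather than what LLVM currently does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what an interprocedural pass knows about a
 * callee; none of these names are LLVM APIs. */
typedef struct {
    bool optnone;
    bool noipa;
    bool exact_definition; /* false if the body may be replaced at link time */
    bool returns_constant; /* the visible body always returns `constant` */
    int constant;
} CalleeInfo;

/* Under the "compiled in a separate object file" model, the caller may
 * fold the call's result only if the callee's body is the one that will
 * run and it is not shielded by optnone or noipa. On success the folded
 * value is stored in *out; otherwise *out is left untouched. */
static bool may_fold_return(const CalleeInfo *c, int *out) {
    if (!c->exact_definition || c->optnone || c->noipa || !c->returns_constant)
        return false;
    *out = c->constant;
    return true;
}
```

Under this rule, `f1` (optnone, always returns 3) would keep its call result live in `main` instead of `main` hardcoding `ret i32 3`.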

Let's look at an example. I show it in C, but what I am arguing
about is still the IR; as described earlier, C is different.

__attribute__((optnone))
void foo() { ... }
__attribute__((optnone, noinline))
void bar() { foo(); ... }
void baz() { foo(); bar(); ... }

Here, the user has combined optnone and noinline to get distinct
effects, all of which you might want:
   - foo is not optimized, not inlined into bar, but inlined into baz

foo's non-inlined instance is not optimized; but, the instance that
is inlined into baz *is* optimized. How does that obey `optnone`?

`optnone` -> make sure the code in this symbol is not optimized.
`noinline` -> make sure the code in this symbol is not copied
into another symbol.

These are two separate ideas. If you want both, use both attributes;
nobody argues against that use case. See below for a "real world"
use case.

   - bar is not optimized and not inlined into baz

I hope this makes sense.

~ Johannes

The use-case for `optnone` is to allow selectively not-optimizing
a function, which I've seen used only to permit better debugging
of that function. Inlining optimizes (some instances of) the
function, against the coder's express wishes, and interferes with
the better debugging enabled by not-optimizing. I don't see how
that is beneficial to the coder, or any other use-case. If you
have a practical use-case I would love to hear it.

Now you bring in the C level. I explicitly, and multiple times,
said I am arguing about the IR level. If you want C `__attribute__((optnone))`
to imply `noinline`, that would be fine with me. However, on the
IR level there is no reason to tie them together.

Even at the C level it is not clear. Think of a context-sensitive problem
in a large application. You want pristine code for some calling
contexts but fast code for others. Right now, there is no way to
do that, except maybe using `__attribute__((flatten))` on all
callees that need to be fast. However, once you decouple the two
attributes, you can say that for some call sites you don't want the
function inlined but for others you do. The call sites where you don't
want it inlined are probably in functions that are `optnone` themselves,
so no inlining happens there anyway and there is no need to say
anything special for them.

Yes, I do see that separating the concerns allows this weird case
of a sometimes-optimized function, but I don't see any benefit.
Certainly it would be super confusing to the coder, and at the
Clang level I would strenuously oppose decoupling these.

I'd assume coders are capable of understanding the difference
between `optnone` and `noinline` and how they compose. That said,
I am only arguing about the IR level anyway, and the conversation about
what `__attribute__((optnone))` should be is a different one.

Apologies for mentioning Attributor; I have no idea how it works,
and I was rather idly speculating why you want to decouple the
optnone and noinline attributes.

No worries.

~ Johannes