always_inline and noinline attributes

Hi All,

For our compiler, we could very much use the always_inline attribute to force
a function to be inlined. Currently, we have a local hack that always inlines
functions marked "inline" (by giving them linkonce linkage and inlining all
linkonce functions), but that's not really the way to go.

Gcc and llvm-gcc support the always_inline attribute for exactly this reason.
Clang currently completely ignores this attribute. What would be the correct
way of implementing this?
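For reference, here is a minimal sketch of the attribute as GNU C spells it. The function names are hypothetical and only there for illustration; the point is that the attribute asks for inlining even when no optimization level is given.

```c
/* Sketch only: add_one and twice_add_one are hypothetical names. */

/* "static inline" plus always_inline asks the compiler to inline every
   direct call to this function, even without optimization. */
static inline __attribute__((always_inline)) int add_one(int x) {
    return x + 1;
}

/* An ordinary caller. When the attribute is honoured, the body of
   add_one is expected to be expanded here instead of being called. */
int twice_add_one(int x) {
    return add_one(add_one(x));
}
```

Calling twice_add_one(0) gives 2 whether or not the compiler honours the attribute; only the generated code differs.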

Currently, llvm-gcc supports always_inline by inlining those functions in the
gcc part, even before the llvm inliner comes along to look at the program. We
could take a similar approach in clang, though it seems like double work to
me. Perhaps it would be better to take a route similar to what the noinline
attribute does in llvm-gcc: Output a global variable in the llvm.metadata
section with a list of all the force_inline functions. This ensures we can
reuse the llvm inliner and could also be useful for other frontends. Any
details I'm missing?

Furthermore, I was also looking at the "noinline" attribute (mainly as an
example). It seems that llvm-gcc emits a llvm.noinline global variable
containing references to functions with the noinline attribute. Clang, however,
parses the noinline keyword but does not emit anything for it. Is this
intentional, or did nobody get around to writing this yet?
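To make the "global variable holding references" idea concrete, here is a rough C analogue. It is only a model: the real list is an LLVM IR global, not a C array, and the names generic_fn, noinline_list and is_marked_noinline are made up for this sketch.

```c
#include <stddef.h>

/* Source-level side: GNU C noinline asks that calls to square() stay
   real calls. */
__attribute__((noinline)) int square(int x) {
    return x * x;
}

int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}

/* Model of the list: one global that references every function
   carrying the attribute (hypothetical names, not an LLVM API). */
typedef void (*generic_fn)(void);

static const generic_fn noinline_list[] = {
    (generic_fn)square,
};

/* Returns 1 if f appears in the list, 0 otherwise. */
int is_marked_noinline(generic_fn f) {
    for (size_t i = 0; i < sizeof noinline_list / sizeof noinline_list[0]; ++i)
        if (noinline_list[i] == f)
            return 1;
    return 0;
}
```

An inliner that consults such a list can skip square() without the front end having to do any inlining work itself.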

Gr.

Matthijs

Gcc and llvm-gcc support the always_inline attribute for exactly this reason.
Clang currently completely ignores this attribute. What would be the correct
way of implementing this?

I think the right way would be to add a hook in
CodeGenModule::~CodeGenModule() that iterates over every always_inline
function, inlines them, then erases them. This shouldn't be too hard
to implement; you can use the existing inlining code by including
llvm/Transforms/Utils/Cloning.h and adding libLLVMTransformUtils.a to
clang's link line.

Currently, llvm-gcc supports always_inline by inlining those functions in the
gcc part, even before the llvm inliner comes along to look at the program. We
could take a similar approach in clang, though it seems like double work to
me.

Perhaps it would be better to take a route similar to what the noinline
attribute does in llvm-gcc: Output a global variable in the llvm.metadata
section with a list of all the force_inline functions. This ensures we can
reuse the llvm inliner and could also be useful for other frontends. Any
details I'm missing?

The issue is that always_inline means that inlining is a requirement,
not just a suggestion. We have to make sure the functions are inlined
even if we don't run the inliner pass. Any other way puts too much
burden on the users of emitted bitcode files.

Furthermore, I was also looking at the "noinline" attribute (mainly as an
example). It seems that llvm-gcc emits a llvm.noinline global variable
containing references to functions with the noinline attribute. Clang however,
parses the noinline keyword, but does not emit anything for it. Is this
intentional, or did nobody get around to writing this yet?

AFAIK, nobody's written it yet. Patches welcome.

-Eli

The correct approach is to encode these function properties in LLVM IR. Stay tuned, I'll send out a proposal on the LLVM dev list in the next few days.

Once the information is encoded in the IR, the remaining issue is how to invoke the LLVM inlining pass when at least one function is marked as always_inline. There are two possible approaches:

1) Teach FE tools (e.g. clang, llvm-gcc) to insert an inlining pass in the PassManager while requesting (opt + code generation) when at least one function with attribute always_inline is seen.

2) Teach the LLVM PassManager to sniff always_inline property encoded in the LLVM IR and do the right thing.

I prefer 2), but would be OK with 1) until 2) is ready.
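Approach 1) boils down to a scan over what the front end has emitted. A toy sketch in C, under the assumption that each function carries a simple flag; struct toy_function and needs_inline_pass are hypothetical and not part of any LLVM or clang API:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a function in a module (hypothetical type). */
struct toy_function {
    const char *name;
    bool always_inline;
};

/* Approach 1): the front end checks whether any emitted function is
   marked always_inline and only then schedules an inlining pass. */
bool needs_inline_pass(const struct toy_function *fns, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (fns[i].always_inline)
            return true;
    return false;
}
```

Approach 2) would move the same check out of every front end and into the pass manager.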

Hi Devang,

The correct approach is to encode these function properties in LLVM IR.
Stay tuned, I'll send out proposal on LLVM dev list in next few days.

Any chance of this proposal getting finished? :)

Once the information is encoded in the IR, the remaining issue is how to
invoke the LLVM inlining pass when at least one function is marked as
always_inline. There are two possible approaches:

1) Teach FE tools (e.g. clang, llvm-gcc) to insert inlining pass in the
PassManager while requesting (opt + code generation) when at least one
function with attribute always_inline is seen.

Perhaps there could be a specific inliner that does only this? This prevents
surprises when people didn't expect inlining?

2) Teach the LLVM PassManager to sniff always_inline property encoded in
the LLVM IR and do the right thing.

This sounds like it might also surprise people, if they only add a single pass
and still things are inlined. Might not be a big problem, though.

Another reason why I think encoding in the IR is necessary: If I have an
always inline function defined in one module and referenced in another, I
think it should still be inlined after I link the two modules together, right?

Gr.

Matthijs

Hi Matthijs,

Hi Devang,

The correct approach is to encode these function properties in LLVM IR.
Stay tuned, I'll send out proposal on LLVM dev list in next few days.

Any chance of this proposal getting finished? :)

I got sidetracked on other things. I'll try to get this finished soon.

Once the information is encoded in the IR, the remaining issue is how to
invoke the LLVM inlining pass when at least one function is marked as
always_inline. There are two possible approaches:

1) Teach FE tools (e.g. clang, llvm-gcc) to insert inlining pass in the
PassManager while requesting (opt + code generation) when at least one
function with attribute always_inline is seen.

Perhaps there could be a specific inliner that does only this? This prevents
surprises when people didn't expect inlining?

2) Teach the LLVM PassManager to sniff always_inline property encoded in
the LLVM IR and do the right thing.

This sounds like it might also surprise people, if they only add a single pass
and still things are inlined. Might not be a big problem, though.

It makes sense to have an inliner mode (or a separate pass) that only handles always_inline.

Another reason why I think encoding in the IR is necessary: If I have an
always inline function defined in one module and referenced in another, I
think it should still be inlined after I link the two modules together, right?

That is also my understanding. I hope that will not surprise users.

Once the information is encoded in the IR, the remaining issue is how to
invoke the LLVM inlining pass when at least one function is marked as
always_inline. There are two possible approaches:

1) Teach FE tools (e.g. clang, llvm-gcc) to insert inlining pass in the
PassManager while requesting (opt + code generation) when at least one
function with attribute always_inline is seen.

Perhaps there could be a specific inliner that does only this? This prevents
surprises when people didn't expect inlining?

I'd strongly prefer to do this in the LLVM IR level instead of on clang ASTs.

2) Teach the LLVM PassManager to sniff always_inline property encoded in
the LLVM IR and do the right thing.

This sounds like it might also surprise people, if they only add a single pass
and still things are inlined. Might not be a big problem, though.

GCC runs its inliner, even at -O0. I agree it is ugly, but seems necessary.

Another reason why I think encoding in the IR is necessary: If I have an
always inline function defined in one module and referenced in another, I
think it should still be inlined after I link the two modules together, right?

No, it should only happen within a translation unit. The semantics of the program should not change based on whether you're using LTO or not, and inlining can sometimes change semantics for (arguably very broken) uses.

-Chris

I'd strongly prefer to do this in the LLVM IR level instead of on clang
ASTs.

That was the plan, AFAIU. Yet it would make sense to have clang do it at the
end of its compilation (by running an inliner over the generated IR).

No, it should only happen within a translation unit. The semantics of the
program should not change based on whether you're using LTO or not, and
inlining can sometimes change semantics for (arguably very broken) uses.

Woah, you're saying to only do inlining within a translation unit? Does this
mean that inlining does not happen at all in LTO? I would say that inlining is
one of the more obvious uses for LTO and IMHO it would be a waste not to do it
because it would change broken programs. Could you perhaps give an example of
a use that would be semantically changed by inlining?

At the very least, I would make it an option whether to strip the
always_inline info after compiling a single translation unit, so people can
explicitly choose to still do (forced) inlining at link time.

Gr.

Matthijs

Unless I'm mistaking the semantics of always_inline, it should be a
compile-time error to take the address of an always_inline function,
and always_inline functions shouldn't be exposed as external symbols.
Otherwise, it's impossible for the compiler to honor the
"always_inline" attribute.

In any case, the issue isn't whether to do inlining with LTO, but
whether to force inlining with LTO.

-Eli

I'd strongly prefer to do this in the LLVM IR level instead of on clang
ASTs.

That was the plan, AFAIU. Yet it would make sense to have clang do it at the
end of its compilation (by running an inliner over the generated IR).

No, it should only happen within a translation unit. The semantics of the
program should not change based on whether you're using LTO or not, and
inlining can sometimes change semantics for (arguably very broken) uses.

Chris, that's why I used the word "hope" :). I think that, in the real world, if a function is marked as always_inline in one translation unit, then it is always (:)) marked as always_inline in all translation units.

However, if the semantics require that always_inline not cross translation unit boundaries during LTO, then the optimizer needs a notion of inlining scope or something like that.

Woah, you're saying to only do inlining within a translation unit? Does this
mean that inlining does not happen at all in LTO?

Matthijs, this means that always_inline is not enforced while doing inlining at LTO. In other words, inlining still happens, but always_inline does not override the normal inline heuristics used by the inliner.

I would say that inlining is
one of the more obvious uses for LTO and IMHO it would be a waste not to do it
because it would change broken programs. Could you perhaps give an example of
a use that would be semantically changed by inlining?

At the very least, I would make it an option whether to strip the
always_inline info after compiling a single translation unit, so people can
explicitly choose to still do (forced) inlining at link time.

This may not be possible in all cases.

1) inlining a function and 2) throwing away a function definition when function is inlined at all call sites are two separate operations.

always_inline does not enforce 2) and always_inline does change symbol visibility.

In gcc, -fvisibility-inlines-hidden enforces what you're describing.

Unless I'm mistaking the semantics of always_inline, it should be a
compile-time error to take the address of an always_inline function,
and always_inline functions shouldn't be exposed as external symbols.
Otherwise, it's impossible for the compiler to honor the
"always_inline" attribute.

1) inlining a function and 2) throwing away a function definition when
function is inlined at all call sites are two separate operations.

always_inline does not enforce 2) and always_inline does change symbol
visibility.

Oops, I meant: "always_inline does not change symbol visibility"

Unless I'm mistaking the semantics of always_inline, it should be a
compile-time error to take the address of an always_inline function,
and always_inline functions shouldn't be exposed as external symbols.
Otherwise, it's impossible for the compiler to honor the
"always_inline" attribute.

1) inlining a function and 2) throwing away a function definition when
function is inlined at all call sites are two separate operations.

always_inline does not enforce 2) and always_inline does change symbol
visibility.

Oh... okay, thanks for clarifying. That said, it seems sort of silly.
If the function can't be removed, doesn't always_inline just force
the compiler to bloat the code?

Is there some attribute that would enforce throwing away the
definition? If not, would it be useful to add one? Or is
always_inline along with static generally good enough?

In gcc, -fvisibility-inlines-hidden enforces what you're describing.

That isn't quite the same thing; it affects all inlines, and it
doesn't prevent taking the address.

-Eli

I *think* using "static" is the only option. Note, this is a gcc extension.

I have a different take. gcc allows one to do:

static int foo(int) __attribute__ ((always_inline));

inline int foo(int i) { return i + 1; }

volatile void *vp = foo;

int main() {
   return foo(0);
}

Because they allow this, they have to emit the always_inline function. This goes against common sense, so I'd argue it is a bug that should be fixed, and once it is fixed, there can't be any references to the always_inline function. If you want one in the file, you have to have one without always_inline; pedantically, even this is probably wrong (ODR).

While I don't mind being slavishly compatible with gcc/g++, I do think we should draw the line at bug-for-bug compatibility. If there is a question as to whether this is a bug or a feature, certainly a gcc bug report should be able to answer that.

Hi all,

I think that a lot of confusion arises from a lack of clarity about what
always_inline actually means.

The gcc docs [1] say:
  Generally, functions are not inlined unless optimization is specified. For
  functions declared inline, this attribute inlines the function even if no
  optimization level was specified.

This is not very specific about taking function pointers or about different
translation units, though. I see roughly two options:

1) A function marked always_inline must always be inlined. This means it
cannot be used in any other way, so taking the address of an always_inline
function is an error.

2) A function marked always_inline must be inlined whenever possible. This
means that any other uses simply won't get inlined, but are allowed.
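A small sketch of what option 2) looks like from the source side, assuming GNU C; answer, direct_use and indirect_use are hypothetical names. The direct call is the one that must be inlined; the address-taken use is simply allowed and served by an out-of-line copy.

```c
/* Sketch only; all names are hypothetical. */
static inline __attribute__((always_inline)) int answer(void) {
    return 42;
}

/* Direct call: this is the use that option 2) requires to be inlined. */
int direct_use(void) {
    return answer();
}

/* Address-taken use: the pointer is volatile, so the compiler cannot
   see through it and has to keep a standalone copy of answer(). */
int indirect_use(void) {
    int (*volatile fp)(void) = answer;
    return fp();
}
```

Under option 1), the initialisation of fp would instead be rejected at compile time.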

Neither of these options would actually limit the visibility of the function,
as far as I can see. When the function is static, the function can normally be
DCE'd after inlining (which will always happen for 1), but might not happen
for 2)).

We could make always_inline work only with static functions, but I can't
really see why that would be necessary. IMHO, it would even greatly reduce the
usefulness of always_inline.

When the function is not static, it will still be inlined at all the callsites
within the same translation unit. We then have again two interpretations:

a) An always_inline attribute only works within the same translation unit.
Functions that are visible outside of the unit, are treated just as any other
function at link time, as if the always_inline attribute was not specified.

b) An always_inline attribute stays with a function, regardless of its
visibility. Any call site of the function, regardless of the translation unit
in which it lives, is inlined.
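The difference between a) and b) can be written down as a single predicate over a call site. This is a toy model only; the enum, struct and function names are invented for the sketch and do not correspond to anything in LLVM.

```c
#include <stdbool.h>

/* Hypothetical names for the two interpretations. */
enum always_inline_scope {
    SCOPE_TRANSLATION_UNIT, /* interpretation a) */
    SCOPE_WHOLE_PROGRAM     /* interpretation b) */
};

/* Toy call-site record: which unit the caller and callee came from. */
struct toy_call_site {
    int caller_unit;
    int callee_unit;
    bool callee_always_inline;
};

/* Returns true when inlining is mandatory at this call site. */
bool inlining_is_forced(struct toy_call_site cs, enum always_inline_scope scope) {
    if (!cs.callee_always_inline)
        return false;              /* normal heuristics apply */
    if (scope == SCOPE_WHOLE_PROGRAM)
        return true;               /* b): the attribute follows the function */
    return cs.caller_unit == cs.callee_unit; /* a): same unit only */
}
```

With a), a cross-unit call site falls back to the ordinary inlining heuristics at link time; with b), it is forced just like a call site in the defining unit.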

Option a) would be easiest to implement, but again loses a lot of usefulness
compared to option b). Option b) does require that inlining happens again at
link time, so any call sites in other translation units are inlined as well.

Also, option b) would be slightly tricky to combine with option 1), unless the
always_inline attribute is (can be) present on the definition as well as the
declaration of the function. Option b) combined with option 2) shouldn't have
this problem.

I think that Chris is really in favour of option a) here, because "inlining
can sometimes change semantics". However, I still don't really understand what
kind of cases we are talking about. Chris, could you give an example? Also, is
this changing of semantics specific to always_inline? From what you wrote, I
would assume that this changing of semantics can happen with any inline, so
you shouldn't be doing any inlining at all at link time. Nor at optimization
time, for that matter, so I'm probably missing an essential point here :)

Also, Devang pointed out that inlining at link time might not be possible in
all cases. Again, the only case I can think of is taking the address of a
function, but this could easily be solved by using option 2), ie inline
whenever possible and leave the original function otherwise.

Furthermore, for our project we do actually need to have option b). AFAICS
option b) is fine as the default, but apparently I'm missing something there.
In any case, it would be very convenient to at least support option b), even
when it is not the default. This could perhaps be done through commandline
options to clang and/or the linker, or at the very least through options that
we can set in our own compiler driver (which links against clang,
transformation passes and the linker libraries).

So, which semantics does always_inline imply? What does gcc do?

Gr.

Matthijs

[1]: Function Attributes - Using the GNU Compiler Collection (GCC)

Hi,

1) A function marked always_inline must always be inlined. This means it
cannot be used in other way, so taking the address of an always_inline
function is an error.

I have been thinking about this. Would it be too hard to make this
rule more flexible, so that for instance the following case would be
possible:

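const int CONFIG_OPTIMIZED = 1;

int foo_optimized(int x);
int foo_fast(int x);

/* The attribute goes before the declarator; gcc does not accept it
   after the declarator of a function definition. */
__attribute__ ((always_inline)) int foo(int x)
{
    if(CONFIG_OPTIMIZED)
        return foo_optimized(x);
    else
        return foo_fast(x);
}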

Taking the address of foo() would return the address of either
foo_optimized or foo_fast. Of course this could be done with the
preprocessor, but using the preprocessor has several disadvantages (in
terms of maintenance).

Hi Luís,

I have been thinking about this. Would it be too hard to make this
rule more flexible, so that for instance the following case would be
possible:

The following code would be valid when you use the other interpretation, that
an always_inline function is inlined whenever possible.

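const int CONFIG_OPTIMIZED = 1;

int foo_optimized(int x);
int foo_fast(int x);

/* The attribute goes before the declarator; gcc does not accept it
   after the declarator of a function definition. */
__attribute__ ((always_inline)) int foo(int x)
{
    if(CONFIG_OPTIMIZED)
        return foo_optimized(x);
    else
        return foo_fast(x);
}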

Taking the address of foo() would return the address of either
foo_optimized or foo_fast.

Interesting case. A possible solution would be to mark foo_optimized and
foo_fast as always_inline as well, so that they will be inlined here and the
unused one will be removed by simplifycfg or some other pass.
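A sketch of that suggestion, assuming GNU C. The bodies of foo_optimized and foo_fast are invented here just so the example is complete, and call_foo is a hypothetical caller.

```c
/* Sketch only: helper bodies and call_foo are hypothetical. */
static const int CONFIG_OPTIMIZED = 1;

static inline __attribute__((always_inline)) int foo_optimized(int x) {
    return x * 2;
}

static inline __attribute__((always_inline)) int foo_fast(int x) {
    return x + x;
}

/* Both helpers are always_inline too, so after inlining the branch on
   the constant can be folded and the unused arm dropped. */
static inline __attribute__((always_inline)) int foo(int x) {
    if (CONFIG_OPTIMIZED)
        return foo_optimized(x);
    else
        return foo_fast(x);
}

int call_foo(int x) {
    return foo(x);
}
```

Whether the dead arm actually disappears depends on later cleanup passes; the attribute only takes care of the inlining part.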

However, if you really want the address of foo to be replaced with the address
of either foo_optimized or foo_fast, I guess you would need to use some pass
to turn foo into
  int foo (int x) {
    return foo_optimized(x);
  }

and then have some other pass (not sure which, really), to specifically look
for functions looking like this and replace all uses of foo with foo_optimized
(possibly also direct calls).

In any case, I don't think that this optimization is directly related to
always_inline or inlining at all. It would be cool to support this
transformation, though there might be a few tricky exception cases to be
handled.

Gr.

Matthijs

Consider this example,
--- a.c ---
int foo() __attribute__((always_inline));
int foo() { return 42; }
int bar() {
  return foo() + foo();
}
--- b.c ---
#include <stdio.h>
extern int foo();
int main() {
  int (*fp)() = &foo;
  printf("%d\n", fp());
  return 0;
}

1) A function marked always_inline must always be inlined.

Yes, though it seems reasonable to let this inlining happen at link time as well.

b) An always_inline attribute stays with a function, regardless of its
visibility. Any call site of the function, regardless of the translation unit
in which it lives, is inlined.

Yes. One way to look at this is that the always_inline body is never used to generate a standalone function.

Hi Devang,

1) A function marked always_inline must always be inlined. This means it
cannot be used in other way, so taking the address of an always_inline
function is an error.

2) A function marked always_inline must be inlined whenever possible. This
means that any other uses simply won't get inlined, but are allowed.

Consider this example,
--- a.c ---
int foo() __attribute__((always_inline));
int foo() { return 42; }
int bar() {
  return foo() + foo();
}
--- b.c ---
#include <stdio.h>
extern int foo();
int main() {
  int (*fp)() = &foo;
  printf("%d\n", fp());
  return 0;
}
---

This works.

With plain gcc I assume? Or also clang and/or llvm-gcc?

This would suggest that gcc uses option 2) instead of option 1) and we should
probably too.

While compiling and optimizing b.c, the compiler does not know that foo()
is marked as always_inline. However, during LTO, the optimizer will know.

Even if it did know, it wouldn't change anything as long as b.c didn't have
the function body. During LTO, the optimizer will know if we keep the
always_inline attribute in the IR (which corresponds to b) below and which we
should do IMO).

Neither of these options would actually limit the visibility of
the function, as far as I can see. When the function is static, the function
can normally be DCE'd after inlining (which will always happen for 1), but
might not happen for 2)).

In case of 1) if a library exports always_inline function then it can not
be DCE'd while building library.

Yeah, but I was talking about a static function (which is identical to
internal and thus not exported, right?)

a) An always_inline attribute only works within the same translation unit.
Functions that are visible outside of the unit, are treated just as any
other function at link time, as if the always_inline attribute was not
specified.

b) An always_inline attribute stays with a function, regardless of its
visibility. Any call site of the function, regardless of the translation
unit in which it lives, is inlined.

I think option b) is easier to implement.

My guess was that throwing away the always_inline attribute and leaving the
linker unchanged would be easier, though it doesn't really matter much.

Also, Devang pointed out that inlining at link time might not be possible
in all cases.

I did not mean this.

--- quote begin ---

At the very least, I would make it an option whether to strip the
always_inline info after compiling a single translation unit, so people can
explicitly choose to still do (forced) inlining at link time.

This may not be possible in all cases.

--- quote end ---

I meant that it may not be possible to guarantee that always_inline info is
stripped in an LLVM module presented to the link time optimizer. Sorry for
the confusion.

I was thinking of doing this stripping in the tool generating the LLVM module,
so opt (or perhaps clang?). It should always be able to just throw away the
always_inline global and the function should then be just a normal function.

I'm not sure if having such a strip-always_inline option is needed, at least
not if the default is to preserve always_inline information. I was proposing
this option, because IIRC someone proposed or implied throwing away the
always_inline info by default.

Devang, if I see this right, you agree with me that the combination of 2) and
b) is the "right" one?

Gr.

Matthijs