[RFC] inlining and mismatched function attributes

Hey folks,
It was suggested that I bring up a peculiarity from an individual code
review to the list.

We have a couple function attributes (in C) where inlining can produce
unexpected or surprising results.

Let's take for example, stack protectors, which help me prevent stack
buffer overruns from rewriting a return address pushed on the stack
(sometimes). Now let's say I'm an operating system kernel and I need
to restore register state after suspend (ie. during resume). That
register state would conflict with the generated stack protector code,
so I use a function attribute like __attribute__((no_stack_protector)
to prevent my compiler from inserting such code in my resume handler
which has to initialize a new stack canary. But I wasn't careful to
check that all of the functions I call from said handler have a
matching function attribute, and due to inlining, I wind up with a
stack protector, even though I requested a function not have a stack
protector! Suddenly I load the incorrect stack canary value from
garbage register state, fail the stack check guard, and now I fail to
resume due to a kernel panic. (True story)

Another stack protector issue that came up recently has to do with
forking "zygote-like" processes, and having unique stack protector
values per process. Point being, there are a few cases where we want
stack protectors generally, but need the fine grain control provided
by function attributes. Moving code into separate translation units
in order to use different flags, is tedious and commonly runs into
issues with LTO in LLVM when flags that aren't encoded in IR are
dropped by LTO.

Similar problems crop up again with coverage, and profiling
instrumentation in "delicate" code. Sanitizers, shadow call stack,
and a few others run into similar situations.

As a developer without knowledge of my toolchain, how do I go about
debugging this myself? (Debugging the suspend/resume issue was not
fun).

What are some ways developers can fix this? Well, if I know my call
chain, I can go through and start marking my callees
__attribute__((no_stack_protector)) or __attribute__((noinline))
recursively. Eventually, I should get to the point where none of my
callees have stack protectors or are inlined. But that strips off more
stack protectors perhaps than I'd like, and leaves a lot of code
unprotected. This begins to feel like "what color is your function?"
"Red" functions may only call other "red" functions; "blue" functions
may only call other "blue" functions.
https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/

Maybe my callees have different callers with different constraints.
Aliases can be used for callees. Example:

// has a stack protector
void callee_bad(void) { ... }

__attribute__((alias("callee_bad"),noinline))
void callee_good(void);

// requires no stack protector
__attribute__((no_stack_protector))
void caller(void) {
  callee_good();
}

However, caller() could be inlined into yet another (not defined
above) call site that does have/need a stack protector.

And other call sites (that don't care about stack protectors) can just
call callee_bad like normal. If I don't have too many callees, it's
not too bad. But very quickly it can become difficult as a developer
to tell which callee was problematic.

Another possibility is that the compiler could have knowledge of
certain conflicting function attributes, and skip inline substitution
of callee into caller upon mismatch. We provide "remarks" to help
developers understand why inline substitution failed, if they care to
know why. ie.
$ clang -O1 -O1 -Rpass-missed=inline foo.c
foo.c:6:10: remark: foo not inlined into bar because it should never
be inlined (cost=never): <really good reason why here>

Indeed, that's exactly the tact I took with no_stack_protector in
commit bc044a88ee3c ("[Inline] prevent inlining on stack protector mismatch").

It's what GCC is proposing for
__attribute__((no_profile_instrument_function)) (coverage and
profiling) in:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573511.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223
and I'm looking to match in
https://reviews.llvm.org/D104810 .

We already have precedence for this in LLVM; checkout `CompatRule` in
llvm/include/llvm/IR/Attributes.td and
`llvm::getAttributeBasedInliningDecision` in
llvm/lib/Analysis/InlineCost.cpp.

I'd argue it's also less surprising for a function not to be inlined
than for a stack protector or profile/coverage instrumentation to show
up when a function is explicitly attributed to not have those things.
These attributes describe a desired property of the function. If
inlined, we wish the new copies have the same property. However,
function attribute applies to the whole function, not a subset of
instructions.

But, now we're making IR function attributes that could have been
orthogonal (nossp, noinline) entangled. That hurts the composition of
such attributes. (I will note, nossp does not unconditionally imply
noinline; it implies "noinline for mismatched callers, and noinline on
callees that are mismatched, on a per call site basis."
__attribute__((always_inline)) will override this exception, and we
don't provide helpful diagnostics in such case; good luck! :-/ ) We
likely will have such conflicts with additional function attributes in
the future. Should the inliner scan the caller and callee at every
call site for such attribute lists? Should the LangRef document such
inlining behavior?

Another concern is diverging from GCC here; while we're both
discussing no_profile_instrument_function it would be good to gather
more feedback.

Sometimes, I wish in C we had the ability to express at a given
callsite that inlining the callee should not occur.

You can see more discussion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223 . Surely there are
additional solutions we have not yet conceived of. At this point, I
don't particularly care what color we paint this bikeshed; I just need
a bikeshed built so that my kernel boots. We all want all software to
be better, and can have a gentle-person's disagreement. Thoughts?

Nick,

Thanks for bringing up this topic.

Another issue to consider is the performance impact of the interaction between no_stack_protector and inlining. If the translation unit is compiled with stack protectors and you want to remove them from a specific function you have to prevent all inlining into the function in question or recursively annotate all called functions with no_stack_protector. This can lead to disastrous consequences for performance if it is a hot function.

Sometimes, I wish in C we had the ability to express at a given callsite that inlining the callee should not occur.

This would solve the usability problem that you are discussing (you wouldn’t have to go chasing down all callees that are marked as alwaysinline) but it wouldn’t solve the performance problem. Another solution might be to have an annotation that says “my no_stack_protector or no_profile_instrument_function annotation overrides the values from inlined functions”. This would allow the inlining to occur but gives the callee control over their own behavior. If we had had this annotation it would have made one of the several solutions to stack protectors in the Zygote that we tried actually viable.

  • Chris

Hey folks,
It was suggested that I bring up a peculiarity from an individual code
review to the list.

Hello.

You raise bunch of very interesting question I must say.

We have a couple function attributes (in C) where inlining can produce
unexpected or surprising results.

Let's take for example, stack protectors, which help me prevent stack
buffer overruns from rewriting a return address pushed on the stack
(sometimes). Now let's say I'm an operating system kernel and I need
to restore register state after suspend (ie. during resume). That
register state would conflict with the generated stack protector code,
so I use a function attribute like __attribute__((no_stack_protector)
to prevent my compiler from inserting such code in my resume handler
which has to initialize a new stack canary. But I wasn't careful to
check that all of the functions I call from said handler have a
matching function attribute, and due to inlining, I wind up with a
stack protector, even though I requested a function not have a stack
protector! Suddenly I load the incorrect stack canary value from
garbage register state, fail the stack check guard, and now I fail to
resume due to a kernel panic. (True story)

Just adding a note, where Jakub disagrees with blocking of the inlining
for the stact protector attribute:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722#c9

Another stack protector issue that came up recently has to do with
forking "zygote-like" processes, and having unique stack protector
values per process. Point being, there are a few cases where we want
stack protectors generally, but need the fine grain control provided
by function attributes. Moving code into separate translation units
in order to use different flags, is tedious and commonly runs into
issues with LTO in LLVM when flags that aren't encoded in IR are
dropped by LTO.

Similar problems crop up again with coverage, and profiling
instrumentation in "delicate" code. Sanitizers, shadow call stack,
and a few others run into similar situations.

As a developer without knowledge of my toolchain, how do I go about
debugging this myself? (Debugging the suspend/resume issue was not
fun).

What are some ways developers can fix this? Well, if I know my call
chain, I can go through and start marking my callees
__attribute__((no_stack_protector)) or __attribute__((noinline))
recursively. Eventually, I should get to the point where none of my
callees have stack protectors or are inlined. But that strips off more
stack protectors perhaps than I'd like, and leaves a lot of code
unprotected. This begins to feel like "what color is your function?"
"Red" functions may only call other "red" functions; "blue" functions
may only call other "blue" functions.
https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/

Maybe my callees have different callers with different constraints.
Aliases can be used for callees. Example:

// has a stack protector
void callee_bad(void) { ... }

__attribute__((alias("callee_bad"),noinline))
void callee_good(void);

// requires no stack protector
__attribute__((no_stack_protector))
void caller(void) {
   callee_good();
}

However, caller() could be inlined into yet another (not defined
above) call site that does have/need a stack protector.

And other call sites (that don't care about stack protectors) can just
call callee_bad like normal. If I don't have too many callees, it's
not too bad. But very quickly it can become difficult as a developer
to tell which callee was problematic.

Another possibility is that the compiler could have knowledge of
certain conflicting function attributes, and skip inline substitution
of callee into caller upon mismatch. We provide "remarks" to help
developers understand why inline substitution failed, if they care to
know why. ie.
$ clang -O1 -O1 -Rpass-missed=inline foo.c
foo.c:6:10: remark: foo not inlined into bar because it should never
be inlined (cost=never): <really good reason why here>

Indeed, that's exactly the tact I took with no_stack_protector in
commit bc044a88ee3c ("[Inline] prevent inlining on stack protector mismatch").

It's what GCC is proposing for
__attribute__((no_profile_instrument_function)) (coverage and
profiling) in:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573511.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223
and I'm looking to match in
https://reviews.llvm.org/D104810 .

We already have precedence for this in LLVM; checkout `CompatRule` in
llvm/include/llvm/IR/Attributes.td and
`llvm::getAttributeBasedInliningDecision` in
llvm/lib/Analysis/InlineCost.cpp.

I'd argue it's also less surprising for a function not to be inlined
than for a stack protector or profile/coverage instrumentation to show
up when a function is explicitly attributed to not have those things.
These attributes describe a desired property of the function. If
inlined, we wish the new copies have the same property. However,
function attribute applies to the whole function, not a subset of
instructions.

But, now we're making IR function attributes that could have been
orthogonal (nossp, noinline) entangled. That hurts the composition of
such attributes. (I will note, nossp does not unconditionally imply
noinline; it implies "noinline for mismatched callers, and noinline on
callees that are mismatched, on a per call site basis."
__attribute__((always_inline)) will override this exception, and we
don't provide helpful diagnostics in such case; good luck! :-/ ) We
likely will have such conflicts with additional function attributes in
the future. Should the inliner scan the caller and callee at every
call site for such attribute lists? Should the LangRef document such
inlining behavior?

Another concern is diverging from GCC here; while we're both
discussing no_profile_instrument_function it would be good to gather
more feedback.

About this attribute, it's definitely something different as no_stack_protector
is a release feature, while no_profile_instrument_function is used for PGO.

My understanding of the attribute is that it's used for super-busy functions
which would slow down a training run significantly. So GCC approach would
be to allow inlining of a function after PGO instrumentation.

Martin