retpoline mitigation and 6.0

Hi all,

I saw the retpoline mitigation landed in r323155. Are we ready to merge this to 6.0, or are there any open issues that we’re waiting for? Also, were there any followups I should know about? Also, release notes please :slight_smile:

Thanks,
Hans

There was one open issue that i landed the fix for today.

I was letting bots clear out before i ping you with the patch series to merge.

Sounds great, thanks!

Hi all,

I saw the retpoline mitigation landed in r323155. Are we ready to merge this to 6.0, or are there any open issues that we're waiting for? Also, were there any followups I should know about? Also, release notes please :slight_smile:

There was a follow up commit for lld: r323288

-Tom

Eep, please can we keep the command line option for clang and the thunk
ABI matching what GCC, the Linux kernel and Xen are all doing?

To say that I am not stunningly keen on
LKML: Guenter Roeck: [RFC] x86/retpoline: Add clang support for 64-bit builds would be a bit of an
understatement...

Two aspects to this…

One, we’re somewhat reluctant to guarantee an ABI here. At least I am. While we don’t expect rampant divergence here, I don’t want this to become something we cannot change if there are good reasons to do so. We’ve already changed the thunks once based on feedback (putting LFENCE after the PAUSE).

Given that we don’t want this to be a guaranteed part of the ABI, I really want the thunk names to be specific about which implementation is being used. For example, if we come up with a new, better thunk interface, we would want to give it a new name (even if it were just a suffix of “2”) so that we don’t get runtime failures. Since LLVM and GCC can’t possible release in sync, IMO they should use different names. I asked the GCC developers to include ‘gcc’ in the name, but at least the person I asked was not at all receptive.

Two, I actually agree with you about the command line flags. I asked for it to be ‘-mretpoline’. I think a short, clear flag is really best, and we’ve very publicly documented this technique as ‘retpoline’. But the GCC community has a fairly different design AFAICT… The only embedded thunks the offer are inline (ours are always out-of-line, even if they aren’t external). Also, they support thunking indirect branches that we don’t: ret. Our feature really is oriented around exposing the retpoline mitigation, and I don’t think there is any demand for the complexity that would be entailed by generalizing it further. But I’m sure the GCC folks have a very reasonably different perspective here that make them prefer a quite different feature (they can thunk rets) and naming scheme.

So I don’t really know how to fix this. I think there are reasonable arguments in each direction. IMO, the easiest improvement would be for GCC to add command line aliases spelled using ‘retpoline’ for the specific use case, but keep the more general commandline interface for which they have reasonable use cases.

The naming thing I think is essentially by-design so that if we (or you) need to split apart the implementation, there are distinct names available. Until then, aliases seem very cheap to maintain?

A minor note on this specific patch:

You don’t need ‘-mretpoline -mretpoline-external-thunk’. The second flag implies the first. (Or should, if not, its a bug.) Our goal was that you needed exactly one flag to enable this in whatever form.

Surely adding the lfence was changing your implementation, not the ABI?

And if we really are talking about the *ABI* not the implementation,
I'm not sure I understand your concern.

The ABI for each thunk is that it is identical in all respects, apart
from speculation, to 'jump *\reg'. There's not a lot of scope for a
'v2' of that, surely?

I could live with the command-line divergence, although it's
suboptimal. But *please* since these things also need to be symbols
exported to loadable modules, can't we keep the thunk names identical?

You can even use __x86_indirect_thunk_\reg for *now* and reserve the
right to use a different name if you ever do revise the ABI.

Two aspects to this…

One, we’re somewhat reluctant to guarantee an ABI here. At least I
am. While we don’t expect rampant divergence here, I don’t want
this to become something we cannot change if there are good reasons
to do so. We’ve already changed the thunks once based on feedback
(putting LFENCE after the PAUSE).

Surely adding the lfence was changing your implementation, not the ABI?

And if we really are talking about the ABI not the implementation,
I’m not sure I understand your concern.

The ABI for each thunk is that it is identical in all respects, apart
from speculation, to ‘jump *\reg’. There’s not a lot of scope for a
‘v2’ of that, surely?

While I hope we have converged and never need to chaneg them, when we started, there was actually a substantially different ABI proposed, and multiple different ones. So some changes already were needed.

We already have at least one change that has been proposed, but we haven’t decided to pursue yet: a thunk which includes the load of the address, and specifically a collection to handle common repeated patterns of loads that before retpoline would have folded into addressing modes. And these would definitely have a different ABI.

I could live with the command-line divergence, although it’s
suboptimal. But please since these things also need to be symbols
exported to loadable modules, can’t we keep the thunk names identical?

At least for LLVM when not using external thunks, we specifically do not export the symbol. As a consequence, DSOs built with two versions of LLVM (or LLVM and GCC) with different thunk behavior and each will find the correct thunk. That was a specific part of the design.

While you can export your external thunk, that’s a choice of the code defining the thunk.

You can even use _x86_indirect_thunk\reg for now and reserve the
right to use a different name if you ever do revise the ABI.

Maybe I don’t understand, but I’m surprised that there is a significant burden to having aliases for now instead?

While you can export your external thunk, that’s a choice of the code defining the thunk.

The driving force in the kernel is to be able to runtime patch the thunks away, when running on a CPU or in a mode that doesn’t need them. We really want to have central implementations and have everything use them.

You can even use _x86_indirect_thunk\reg for now and reserve the
right to use a different name if you ever do revise the ABI.

Maybe I don’t understand, but I’m surprised that there is a significant burden to having aliases for now instead?

You can make that work within the kernel image itself, and export only the existing names. It gets somewhat harder to support loadable modules which attempt to use the thunks by different names to the function that’s exported. I’m not sure how we’d hack up the unresolved symbols in the module objects to match the exported symbol names.

While you can export your external thunk, that’s a choice of the code defining the thunk.

The driving force in the kernel is to be able to runtime patch the thunks away, when running on a CPU or in a mode that doesn’t need them. We really want to have central implementations and have everything use them.

Sure, I can see why for the kernel, this is important. It also gets to essentially define its ABI however it wants.

For our other users, they specifically don’t want this in the ABI (because they actually have DSOs and other junk being linked together). And yes, we actually have a decent number of users of this in userspace and outside the kernel. =/

You can even use _x86_indirect_thunk\reg for now and reserve the
right to use a different name if you ever do revise the ABI.

Maybe I don’t understand, but I’m surprised that there is a significant burden to having aliases for now instead?

You can make that work within the kernel image itself, and export only the existing names. It gets somewhat harder to support loadable modules which attempt to use the thunks by different names to the function that’s exported. I’m not sure how we’d hack up the unresolved symbols in the module objects to match the exported symbol names.

I had actually wanted to originally have the ‘-mretpoline-external-thunk’ flag take a completely custom name for the thunks on the command line. Unfortunately, that proved remarkably annoying to implement in LLVM (for pretty silly reasons, but it is what it is). I didn’t do that in large part because it seemed easy for users of external thunks to alias things themselves.

If the way loadable modules in the kernel work actually makes that quite hard, we can look again at allowing the external thunks to have a user controlled name. This seems strictly superior to any kind of agreed-to naming scheme as it gives you much more control. But it’ll be really messy in LLVM to do[1], so I’d appreciate understanding how important this is for the kernel.

-Chandler

[1]: Yeah, it really shouldn’t be. We are aware that this is a nasty aspect to how various components of LLVM are wired together and want to have better infrastructure here. But today, we don’t. And especially since all of this needs to be backported to old versions of Clang and LLVM, we can’t fix that and instead need to work around the existing infrastructure.

The llvm commit log says:

"... They can write this custom thunk and use `-mretpoline-external-thunk`
*in addition* to `-mretpoline`. In this case, on x86-64 thu thunk names
must be: ... "

which I understand to mean that _both_ are needed. I'll be happy to stand
corrected if that is guaranteed not to be the case.

Thanks,
Guenter

I saw the retpoline mitigation landed in r323155. Are we ready to
merge this to 6.0, or are there any open issues that we’re waiting
for? Also, were there any followups I should know about? Also,
release notes please :slight_smile:

Eep, please can we keep the command line option for clang and the thunk
ABI matching what GCC, the Linux kernel and Xen are all doing?

To say that I am not stunningly keen on
https://lkml.org/lkml/2018/2/2/975 would be a bit of an
understatement…

A minor note on this specific patch:

You don’t need ‘-mretpoline -mretpoline-external-thunk’. The second flag implies the first. (Or should, if not, its a bug.) Our goal was that you needed exactly one flag to enable this in whatever form.

The llvm commit log says:

"… They can write this custom thunk and use -mretpoline-external-thunk
in addition to -mretpoline. In this case, on x86-64 thu thunk names
must be: … "

which I understand to mean that both are needed. I’ll be happy to stand
corrected if that is guaranteed not to be the case.

Gah, sorry. I thought I had updated that before landing, but I must have missed it.

The intent is that you do not need both. That changed during review, but that was the intent. The test seems to confirm that this should work. If you see anything that makes you think it isn’t working, please give a shout (or file a bug) and we’ll fix.

Using the *external* thunk in order to provide their own? While
claiming it isn't an ABI? That seems... odd.

Nobody cares what you call it when you're doing it inline or in a
COMDAT section. It's *only* the external thunk where it absolutely
*does* become an ABI, where we care about consistency.

So, I was waiting to hear a definitive response on whether using aliases is hard, and didn’t see one here, which is why I haven’t responded further.

However, a colleauge pointed me at an LKML thread where it seems there is a definitive response?

I’m really looking for clear direction: we can try to implement custom naming, but it will add undesirable complexity to the compiler. Do we need it for the kernel? I have to ask because I genuinely don’t know what is or isn’t reasonable in the kernel.

So, I was waiting to hear a definitive response on whether using aliases is hard, and didn’t see one here, which is why I haven’t responded further.

However, a colleauge pointed me at an LKML thread where it seems there is a definitive response?

Aliases are hard when the compiler is directly emitting calls to a function which is exported to modules, yet the compiler and the kernel disagree on what the symbol is actually called.

I spent a happy Sunday evening in a hotel room a few weeks ago, trying to make them work before telling the GCC folk “screw it this is too horrible please keep the symbol name as it is”.

If I was insufficiently definitive, that’s because I was inviting someone to prove me wrong — which I was still doing in the LKML thread you saw.

I’m really looking for clear direction: we can try to implement custom naming, but it will add undesirable complexity to the compiler. Do we need it for the kernel? I have to ask because I genuinely don’t know what is or isn’t reasonable in the kernel.

It’s only the external thunk where it absolutely
does become an ABI, where we care about consistency.

In the future, if there is such a powerful need for consistency, it would be good to actually engage with more than one compiler community. =/ As I said, I tried to talk to the GCC developers and made no progress but also heard no strong arguments that this kind of consistency was actually necessary.

Yes, it would be good to engage with more than one compiler community. In the grand scheme of things that’s probably one of the smaller things that didn’t get handled right during this whole fiasco.

You’ll note that when I added 16-bit support to LLVM/clang, I did make sure we did things consistently, and now the kernel boot code builds with -m16 with both clang and GCC.
On this occasion, I didn’t get that right. I could have turned up here sooner after the embargo was broken. I take responsibility for that delay; I’m sorry.

I really do want to produce a feature that addresses the kernel’s needs, but we need to know what they are and have some chance to figure out how to find a solution that also doesn’t cause problems for the compiler. This is just a note for the future though, the retpoline stuff is above.

At this point, what we really want is for identical thunks to have identical names — just like we do for builtins and other stuff, to avoid having differences between clang and GCC which just end up seeming capricious and being hard to work around. Having matching command line options would be a bonus, but isn’t imperative.

If you reserve the right to invent new thunk ABIs later and thus have different names for those, that’s absolutely fine. You’d break our build if you added those unconditionally anyway; we can choose to support them or not. And hopefully if GCC did the same, then again the thunk symbols would match.

So, I was waiting to hear a definitive response on whether using aliases is hard, and didn’t see one here, which is why I haven’t responded further.

However, a colleauge pointed me at an LKML thread where it seems there is a definitive response?

Aliases are hard when the compiler is directly emitting calls to a function which is exported to modules, yet the compiler and the kernel disagree on what the symbol is actually called.

I spent a happy Sunday evening in a hotel room a few weeks ago, trying to make them work before telling the GCC folk “screw it this is too horrible please keep the symbol name as it is”.

Nah, this is plenty definitive for me. ;]

At this point, what we really want is for identical thunks to have identical names — just like we do for builtins and other stuff, to avoid having differences between clang and GCC which just end up seeming capricious and being hard to work around. Having matching command line options would be a bonus, but isn’t imperative.

After talking to several others (to make sure we don’t have to do this whole thing yet again) we’ll change the external thunk names to match what GCC is using. Hopefully this doesn’t come back to bite us. =]

We’ll also make sure those patches get backported too so that no released versions have the old behavior.

Thank you.

For reference, is there a way to turn off the retpoline which has been enabled on the command line?

For init functions which run only at startup before any attacker can be in the system, we currently mark the function attribute((indirect_thunk(“keep”))). Is there a clang equivalent?

It’s not particularly important; a minor optimisation we can live without if we have to.

Can you file a bug to track either explaining how to do that or implementing it?

My guess is that it doesn’t work yet, will be made to work by a future change that has already been proposed but not implemented…

Also, I’m going to hypothesize this will introduce yet another divergence between GCC and Clang. Not sure there is anyway to avoid that at this point. (We still have the significant divergence in that GCC supports thunking in ways that LLVM doesn’t, and that in turn changes several aspects of the feature design.)

What name do we use for when the target address is pushed onto the stack? What are the semantics? Is there a spec anywheere?

LLVM only needs this on 32-bit x86, but we do kind of need an answer before we update all of our branches with new names…