RFC: Speculative Load Hardening (a Spectre variant #1 mitigation)

Hello all,

I’ve been working for the last month or so on a comprehensive mitigation approach to variant #1 of Spectre. There are a bunch of reasons why this is desirable:

  • Critical software that is unlikely to be easily hand-mitigated (or where the performance tradeoff isn’t worth it) will have a compelling option.
  • It gives us a baseline on performance for hand-mitigation.
  • Combined with opt-in or opt-out, it may give simpler hand-mitigation.
  • It is instructive to see how to mitigate code patterns.

A detailed design document is available for commenting here:
https://docs.google.com/document/d/1wwcfv3UV9ZnZVcGiGuoITT_61e_Ko3TmoCS3uXLcJR0/edit
(I pasted this in markdown format at the bottom of the email as well.)

I have also published a very early prototype patch that implements this design:
https://reviews.llvm.org/D44824
This is the patch I’ve used to collect the performance data on the approach. It should be fairly functional but is a long way from being ready to review in detail, much less land. I’m posting it so folks can start seeing the overall approach and can play with it if they want.

Comments are very welcome! I’d like to keep the doc and this thread focused on discussion of the high-level technique for hardening, and the code review thread for discussion of the techniques used to implement this in LLVM.

Thanks all!
-Chandler

Hi Chandler,

Thank you very much for sharing this!

The RFC is pretty lengthy, but the vast majority of it makes sense to me. I’m sure I’m forgetting to react to some aspects below, but I thought I’d summarize some initial thoughts and questions I had after reading the RFC end-to-end.

  • I believe the same high-level principles you outline can also be used to implement the same protection on the Arm instruction sets (AArch64 and AArch32). The technique you describe depends on being able to do an “unpredicted conditional update of a register’s value”. For the Arm architecture, the guarantee that the conditional update is unpredicted can come from using the new CSDB instruction – see documentation at https://developer.arm.com/support/security-update/download-the-whitepaper.

  • It seems you suggest two ways to protect against side-channel attacks leaking speculatively-loaded secret data: either protect by zeroing out address bits that may represent secret data, or by zeroing out the loaded data. In the first case (zeroing out address bits) – wouldn’t you have to apply that to addresses used in stores too, in addition to addresses used in loads?

  • IIUC, you state that constant-offset stack locations and global variables don’t need protection. For option 1 (zeroing out the address bits that may represent secret data) – I can understand the rationale for why constant-offset stack locations and global variables don’t need protection. But I’m wondering what the detailed rationale is for not needing protection under option 2 (zeroing out the loaded value): what guarantees that no secret info can be located on the stack or in a global variable? Or did I misunderstand the proposal?

  • For x86 specifically, you explain how the low 2GB and high 2GB of address space should be protected by the OS. I wonder if this ±2GB range could be reduced sharply by having the compiler not generate 32-bit constant offsets in address calculations, but only much smaller constant offsets? I assume that limit would have only a very small effect on code quality – and it might ease the requirements on the OS?

Thanks!

Kristof

Hi Chandler,

Thank you very much for sharing this!

The RFC is pretty lengthy, but the vast majority of it makes sense to me. I’m sure I’m forgetting to react to some aspects below, but I thought I’d summarize some initial thoughts and questions I had after reading the RFC end-to-end.

  • I believe the same high-level principles you outline can also be used to implement the same protection on the Arm instruction sets (AArch64 and AArch32). The technique you describe depends on being able to do an “unpredicted conditional update of a register’s value”. For the Arm architecture, the guarantee that the conditional update is unpredicted can come from using the new CSDB instruction – see documentation at https://developer.arm.com/support/security-update/download-the-whitepaper.

I think even without this the practical guarantee can be met. But let’s discuss that off line and in more depth as it doesn’t have too much to do with the compiler side of this and is really just an ARM architecture question.

  • It seems you suggest two ways to protect against side-channel attacks leaking speculatively-loaded secret data: either protect by zeroing out address bits that may represent secret data, or by zeroing out the loaded data. In the first case (zeroing out address bits) – wouldn’t you have to apply that to addresses used in stores too, in addition to addresses used in loads?

Stores don’t intrinsically leak data. Specifically, if you have never loaded secret data, nothing you store can leak secret data.

  1. You can’t be storing the secret data itself, because you never loaded it.
  2. You can’t be storing to an address influenced by the secret data; that too would require loading it.

So once loads are hardened, stores shouldn’t matter.
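
To make the two hardening options concrete, here is a rough sketch of the classic variant #1 gadget and of what the masking looks like. This is illustrative only – the function names, and the "pred" parameter standing in for the speculation predicate the pass actually tracks in a register, are made up for the example, not lifted from the patch.

  #include <cstddef>
  #include <cstdint>

  uint8_t probe[256 * 64];

  // Classic variant #1 gadget: under mis-speculation the bounds check is
  // bypassed, arr[i] reads out-of-bounds (potentially secret) memory, and the
  // dependent access into `probe` leaks that value through the cache.
  uint8_t gadget(const uint8_t *arr, size_t len, size_t i) {
    if (i < len)
      return probe[arr[i] * 64];
    return 0;
  }

  // Hardened sketch. `pred` stands in for the speculation predicate: all-ones
  // on the architecturally correct path, all-zeros whenever this branch was
  // mis-predicted (so it is only ever zero on paths that will be squashed).
  uint8_t gadget_hardened(const uint8_t *arr, size_t len, size_t i,
                          uintptr_t pred) {
    if (i < len) {
      // Option 2 from the RFC: mask the loaded value, so the dependent access
      // is keyed on zero under mis-speculation and leaks nothing.
      uint8_t v = arr[i];
      v &= static_cast<uint8_t>(pred);
      return probe[v * 64];

      // Option 1 would instead mask the address bits before the load, e.g.
      //   arr = (const uint8_t *)((uintptr_t)arr & pred);
      // so that a mis-speculated load can only touch a known-safe region.
    }
    return 0;
  }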

  • IIUC, you state that constant-offset stack locations and global variables don’t need protection. For option 1 (zeroing out the address bits that may represent secret data) – I can understand the rationale for why constant-offset stack locations and global variables don’t need protection. But I’m wondering what the detailed rationale is for not needing protection under option 2 (zeroing out the loaded value): what guarantees that no secret info can be located on the stack or in a global variable? Or did I misunderstand the proposal?

It isn’t that we don’t need protection. It is that we don’t provide protection and instead insist programs keep their secret data elsewhere. This will certainly require changes to software to achieve. But without this, the performance becomes much worse because reloads of spilled registers and the like would be impacted. Plus, it would preclude the address-based approach.

  • For x86 specifically, you explain how the low 2GB and high 2GB of address space should be protected by the OS. I wonder if this ±2GB range could be reduced sharply by having the compiler not generate 32-bit constant offsets in address calculations, but only much smaller constant offsets? I assume that limit would have only a very small effect on code quality – and it might ease the requirements on the OS?

Yes. Specifically, this seems like the only viable path for 32-bit architectures. I freely admit my only focus was on a 64-bit architecture and so I didn’t really spend any time on this.

For a 64-bit architecture, 2GB is as easy to protect as a smaller region, so it seems worth keeping the performance.
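
To spell out where the ±2GB comes from: with the address-masking option the hardened base register gets forced to zero under mis-speculation, but x86-64 addressing modes can still fold a constant offset in as a signed 32-bit displacement, so the resulting speculative access can land anywhere within roughly 2GB on either side of null. A made-up example of the kind of access involved (the struct and field names are purely illustrative):

  #include <cstdint>

  struct Big {
    char pad[0x40000000];  // ~1 GiB of leading fields
    uint8_t field;         // ends up at a large constant offset
  };

  // With address masking, a mis-speculated 'b' is forced to zero, but the
  // constant field offset is still folded into the addressing mode as a
  // signed 32-bit displacement (roughly: movzbl 0x40000000(%rdi), %eax), so
  // the speculative load lands 1 GiB above null. Capping the constant offsets
  // the compiler will fold, as you suggest, shrinks the range the OS has to
  // keep free of secrets.
  uint8_t read_field(const Big *b) {
    return b->field;
  }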

Hi Chandler,

I’ve just uploaded a sequence of patches that implement a similar technique for
AArch64.
A small difference of approach is that I went for introducing an intrinsic that
can make any integer or pointer value “speculation-safe”, i.e. the intrinsic
returns the value of its only parameter when correctly speculating, and returns
0 when mis-speculating.
The intrinsic is close to what Philip Reames suggested on
https://reviews.llvm.org/D41761.
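
Roughly, the intended source-level usage is something like the sketch below. The spelling of the intrinsic here is made up purely for illustration (the real name and signature are in the patches), and the plain fallback definition is only there so the example compiles – it obviously provides no protection by itself.

  #include <cstddef>
  #include <cstdint>

  // Stand-in for the intrinsic described above: returns its argument when the
  // program is speculating correctly and 0 when it is mis-speculating.
  static inline size_t speculation_safe_value(size_t v) { return v; }

  uint8_t table[256];

  uint8_t load_guarded(const uint8_t *arr, size_t len, size_t i) {
    if (i < len) {
      // Under mis-speculation the index collapses to 0, so the speculative
      // load stays in bounds and nothing secret reaches the dependent access.
      size_t safe_i = speculation_safe_value(i);
      return table[arr[safe_i]];
    }
    return 0;
  }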

Then a later patch (D49072) adds automatic mitigation by inserting the intrinsic
in necessary locations.

I believe this approach has the advantage that:
a) it makes it possible to only insert a mitigation in specific locations if
the programmer is capable of inserting intrinsics manually.
b) it becomes easier to explore different options for implementing automatic
protection - it’s just a matter of changing how the intrinsic is injected
into the program. See D49072 for how this is relatively
easy.

I’ve split the patches according to the following functionality:

I’ll be on a long holiday soon, so there may be delays in my reacting to review feedback.

Thanks,

Kristof

FYI to all: I’ve updated the design document to include the newly disclosed variants 1.1 and 1.2 (collectively called Bounds Check Bypass Store or BCBS).

There is no change to the proposed implementation, which can already robustly mitigate these variants.
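
For reference, the BCBS shape is roughly a bounds check followed by a store that, under mis-speculation, lands out of bounds (e.g. on a spilled register or a return address). A purely illustrative snippet of that shape:

  #include <cstddef>
  #include <cstdint>

  // Bounds Check Bypass Store (variant 1.1) shape: under mis-speculation the
  // check is bypassed and the store lands out of bounds – for example on a
  // spill slot or a return address – which an attacker can use to steer
  // further speculative execution toward a disclosure gadget.
  void bcbs_shape(uint8_t *buf, size_t len, size_t i, uint8_t attacker_byte) {
    if (i < len)
      buf[i] = attacker_byte;  // speculative out-of-bounds store
  }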

I’ve also updated my patch as we have very significant interest in getting at least an early “beta” version of this into the tree and available for experiments right away. Would really appreciate folks making review comments ASAP and bearing with us and tolerating some amount of post-commit iteration here.

Notably:

Hi Chandler,

I’ve just uploaded a sequence of patches that implement a similar technique for
AArch64.

This is awesome. =D I can’t wait to start wiring this together.

A small difference of approach is that I went for introducing an intrinsic that
can make any integer or pointer value “speculation-safe”, i.e. the intrinsic
returns the value of its only parameter when correctly speculating, and returns
0 when mis-speculating.
The intrinsic is close to what Philip Reames suggested on
https://reviews.llvm.org/D41761.

Cool, we’ll definitely need some intrinsic in the IR to help model source annotations. I still need to think a bit about the interface and model for this…

Then a later patch (D49072) adds automatic mitigation by inserting the intrinsic
in necessary locations.

I was never able to get automatic mitigation with an intrinsic to avoid really significant performance problems in the x86 backend. I’ll look through your approach to see if you figured out a technique that works better than the ones I tried here…

I believe this approach has the advantage that:

a) it makes it possible to only insert a mitigation in specific locations if
the programmer is capable of inserting intrinsics manually.

This is definitely an area of great interest long-term.

b) it becomes easier to explore different options for implementing automatic
protection - it’s just a matter of changing how the intrinsic is injected
into the program. See D49072 for how this is relatively
easy.

As above, I actually tried this and it backfired in terms of code quality. I’ll definitely look at this and either try to explain the problem I hit or, if you’ve dodged it nicely, we can rework things to move in this direction.

I’ve split the patches according to the following functionality:

I’ll be on a long holiday soon, so there may be delays in my reacting to review feedback.

Sure. Given the sudden but very strong interest we have from some users, I’m going to try and make progress landing at least the initial version of the x86 stuff. But I very much want to iterate on it and get it and the AArch64 stuff you’ve got here to line up and work together. I really like the overall direction.

Annotating specific loads that need to be protected seems like a trap to me. See . (And Bounds Check Bypass Store variants open up other possibilities, like overwriting a spill slot.)

Maybe we can come up with some workable approach to “whitelist” certain pointers: a pointer could be marked “speculatively-dereferenceable(N)” if it points to N bytes of non-secret data. (We could apply this as load metadata, like !dereferenceable, or it could be explicitly applied using an intrinsic.)

-Eli

FYI to all: I’ve updated the design document to include the newly disclosed variants 1.1 and 1.2 (collectively called Bounds Check Bypass Store or BCBS).

There is no change to the proposed implementation, which can already robustly mitigate these variants.

I’ve also updated my patch as we have very significant interest in getting at least an early “beta” version of this into the tree and available for experiments right away. Would really appreciate folks making review comments ASAP and bearing with us and tolerating some amount of post-commit iteration here.

Notably:

Hi Chandler,

I’ve just uploaded a sequence of patches that implement a similar technique for
AArch64.

This is awesome. =D I can’t wait to start wiring this together.

A small difference of approach is that I went for introducing an intrinsic that
can make any integer or pointer value “speculation-safe”, i.e. the intrinsic
returns the value of its only parameter when correctly speculating, and returns
0 when mis-speculating.
The intrinsic is close to what Philip Reames suggested on
https://reviews.llvm.org/D41761.

Cool, we’ll definitely need some intrinsic in the IR to help model source annotations. I still need to think a bit about the interface and model for this…

Then a later patch (D49072) adds automatic mitigation by inserting the intrinsic
in necessary locations.

I was never able to get automatic mitigation with an intrinsic to avoid really significant performance problems in the x86 backend. I’ll look through your approach to see if you figured out a technique that works better than the ones I tried here…

My guess is that the key is that the automatic intrinsic insertion happens only really late in my implementation - i.e. in the same pass that also inserts the speculation tracking.
One could say that there really isn’t that much difference between inserting an intrinsic this way and just inserting instructions directly, since the intrinsic that gets inserted gets lowered almost immediately by the same pass later on.
It still has the advantage that the automatic protection code remains simple and the actual lowering from intrinsic to instruction sequences is shared between the automatic mode and the user-inserted intrinsic mode.

I believe this approach has the advantage that:

a) it makes it possible to only insert a mitigation in specific locations if
the programmer is capable of inserting intrinsics manually.

This is definitely an area of great interest long-term.

b) it becomes easier to explore different options for implementing automatic
protection - it’s just a matter of changing how the intrinsic is injected
into the program. See D49072 for how this is relatively
easy.

As above, I actually tried this and it backfired in terms of code quality. I’ll definitely look at this and either try to explain the problem I hit or, if you’ve dodged it nicely, we can rework things to move in this direction.

I’ve split the patches according to the following functionality:

I’ll be on a long holiday soon, so there may be delays in my reacting to review feedback.

Sure. Given the sudden but very strong interest we have from some users, I’m going to try and make progress landing at least the initial version of the x86 stuff. But I very much want to iterate on it and get it and the AArch64 stuff you’ve got here to line up and work together. I really like the overall direction.

It sounds great that there are beta testers eager to experiment with this approach, so I agree that pressing on with committing something they can experiment with is the right call here. Their feedback can only make the end solution better.

Thanks!

Kristof

I understand where the whitelist idea comes from, but if you are executing in a protected domain you need to assume all data needs to be hidden. The only exception is something explicitly passed back across the domain boundary, per specification of some API. This is just a reality of system security.

I say this as someone who spent half of a long career in software security: People really excel at building insecure systems. Giving them more tools to make mistakes with is totally the wrong approach. I went back to compilers because dealing with those folks was giving me ulcers.

I hope this doesn’t come across as harsh or anything; I have very strong feelings on this and I’m not great at expressing them.

–paulr

Hi,

I've seen that the new LLVM version (7.0) includes some mitigations for Spectre v1, so the Clang compiler might have them implemented, but none of the command-line flags seems to work. I've downloaded the LLVM source code and have seen that there are 3 broken dependencies in lib/Target/X86/X86SpeculativeLoadHardening.cpp: lib/Target/X86/MCTargetDesc/ should contain the X86GenRegisterInfo.inc, X86GenInstrInfo.inc and X86GenSubtargetInfo.inc files, but it doesn't. Is this causing the problem?

Is there some way to apply the mitigation, or hasn't it been implemented yet?

Cheers,

Andrés