Spectre V1 Mitigation - Internals?

Hi all,

I understand how the speculative information flow attack works. I’m trying to get my head around LLVM's Spectre v1 mitigation, as described in the design document here: https://llvm.org/docs/SpeculativeLoadHardening.html#speculative-load-hardening


void leak(int data);
void example(int* pointer1, int* pointer2) {
  if (condition)

After applying the mitigation, the code looks like this:

void leak(int data);
void example(int* pointer1, int* pointer2) {
  uintptr_t predicate_state = all_ones_mask;
  if (condition) {
    predicate_state = !condition ? all_zeros_mask : predicate_state;
    pointer1 &= predicate_state;
  } else {
    int value2 = *pointer2 & predicate_state;

Let's assume that the branch is mispredicted and the if body is taken. The value of the predicate_state mask depends on the result of the condition, which is not yet available (hence the speculative execution). My question is whether the value of predicate_state is also guessed by the processor. If so, then predicate_state keeps its old value when the processor mispredicts the condition as true, i.e. predicate_state = !1 ? all_zeros_mask : predicate_state. Is my assumption correct?

Or does execution stall at "predicate_state = !condition ? all_zeros_mask : predicate_state" until the result of the condition becomes available? If so, why do we have to harden the pointers in the first place? Once the condition is actually computed, the processor will roll back the mispredicted execution trace anyway.

I know that I'm missing something fundamental here, and I would highly appreciate your help. Please let me know if you need more info!



There’s a difference between control flow speculation and data flow.

By entangling the control flow and the data, the CPU should stall the memory load until the condition is known, since the pointer value isn’t known until then either.

I assume that actual experts (not me) tested this and/or asked Intel whether this is sufficient to prevent the speculative memory load, given how speculation is actually implemented.


Thanks for your email. I understand that the execution stalls until the predicate state is computed, and that we then mask the pointers with all_zeros_mask if there was a misprediction. But my understanding is that as soon as the condition value is available, the processor can check its assumptions and roll back.

That is,

If the branch prediction is correct during speculation, we mask with all_ones, and the processor can follow the predicted branch to retirement.

But if the processor mispredicted the branch, it will roll back as soon as the condition becomes available. In that case we never speculatively execute the operations pointer1 &= predicate_state (if branch) and *pointer2 & predicate_state (else branch), right? Or do out-of-order processors allow such accesses?

Also, why do we mask with all_zeros_mask on a misprediction? Is there any particular reason for choosing all_zeros_mask?


The reverting of state doesn’t occur when the condition is available. The processor has to “execute” the branch uop and see that it was mispredicted. This occurs at least one cycle after the condition is available. The conditional move on the predicate state can execute at the same time as the branch. Or it can execute before it if the branch unit is busy. The load can do the same.
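
To make the data dependency concrete, here is a minimal C sketch of the masking from the simplified example above. The helper names (update_state_taken, harden_pointer, harden_value) are made up for illustration and are not LLVM APIs. The real pass works on machine instructions and uses a conditional move, and a C compiler gives no guarantee that this source compiles to branchless code. The sketch only shows that the hardened pointer or value is computed from the predicate state, so it cannot be produced before the condition itself.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of LLVM. */

/* Branchless form of
 *   predicate_state = !condition ? all_zeros_mask : predicate_state;
 * as used on entry to the "if" body. */
uintptr_t update_state_taken(uintptr_t predicate_state, int condition) {
    /* mask is all-ones when condition is true, all-zeros when it is false */
    uintptr_t mask = (uintptr_t)0 - (uintptr_t)(condition != 0);
    return predicate_state & mask;
}

/* Harden the pointer: under misspeculation the state is all-zeros,
 * so the resulting address is 0 rather than the original pointer. */
uintptr_t harden_pointer(const int *p, uintptr_t predicate_state) {
    return (uintptr_t)p & predicate_state;
}

/* Alternative: harden the loaded value instead of the pointer.
 * Under misspeculation the value becomes 0, so nothing that depends
 * on the loaded data can be leaked. */
uint32_t harden_value(uint32_t value, uintptr_t predicate_state) {
    return value & (uint32_t)predicate_state;
}
```

With an all-ones state and a true condition, the state, pointer and value pass through unchanged. With a false condition the state becomes all-zeros, and both the hardened pointer and the hardened value collapse to 0. That also suggests why zero is a convenient poison value in this example: the result no longer depends on the secret.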

Yeah, now I understand the problem here. Thanks.
But I have another question, about “Bounds check bypass store”.

Consider this example from the Speculative Load Hardening document:

unsigned char local_buffer[4];
unsigned char *untrusted_data_from_caller = ...;
unsigned long untrusted_size_from_caller = ...;
if (untrusted_size_from_caller < sizeof(local_buffer)) {
  // Speculative execution enters here with a too-large size.
  memcpy(local_buffer, untrusted_data_from_caller,
  // The stack has now been smashed, writing an attacker-controlled
  // address over the return address.
  // Control will speculate to the attacker-written address.

During speculative execution, this stores arbitrary data to stack memory (architectural state, I guess). Doesn't that change the architectural state of the processor? I learned that all stores are first written to the register file and only written back to memory if the speculation turns out to be correct; otherwise those registers are reverted. But in the above case the return address is loaded from (stack) memory, so the contents would have to be written to memory first, during speculative execution. That contradicts my assumptions about how stores work under speculative execution.

Am I wrong here? 

The store goes to the store buffer first and is only committed to memory at retirement. Loads can get data forwarded from the store buffer so that the load doesn’t have to wait for the store to retire.
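
As a rough mental model of that behavior, here is a toy C simulation. Everything in it (ToyCore, toy_store, toy_load, toy_retire, toy_squash, the sizes) is invented for illustration, and real hardware is far more involved. It only shows the three properties that matter for the question: a speculative store does not touch committed memory, a later load can still see the stored value via forwarding, and a squash discards the pending stores.

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 16 /* toy memory size (assumption) */
#define SB_CAP 8    /* toy store buffer capacity (assumption) */

typedef struct {
    size_t addr;
    uint8_t value;
} PendingStore;

typedef struct {
    uint8_t memory[MEM_SIZE];     /* committed (architectural) memory */
    PendingStore pending[SB_CAP]; /* not-yet-retired stores, oldest first */
    size_t count;
} ToyCore;

/* A store is only queued; committed memory is left untouched.
 * Returns 0 on success, -1 if the address or the buffer is out of range. */
int toy_store(ToyCore *c, size_t addr, uint8_t value) {
    if (addr >= MEM_SIZE || c->count >= SB_CAP) return -1;
    c->pending[c->count].addr = addr;
    c->pending[c->count].value = value;
    c->count++;
    return 0;
}

/* A load first checks the store buffer, youngest entry first
 * (store-to-load forwarding), and falls back to committed memory. */
uint8_t toy_load(const ToyCore *c, size_t addr) {
    if (addr >= MEM_SIZE) return 0;
    for (size_t i = c->count; i > 0; i--) {
        if (c->pending[i - 1].addr == addr) return c->pending[i - 1].value;
    }
    return c->memory[addr];
}

/* Retirement: pending stores are written to memory in program order. */
void toy_retire(ToyCore *c) {
    for (size_t i = 0; i < c->count; i++) {
        c->memory[c->pending[i].addr] = c->pending[i].value;
    }
    c->count = 0;
}

/* Misprediction: pending stores are dropped and never reach memory. */
void toy_squash(ToyCore *c) {
    c->count = 0;
}
```

In this model, a speculative overwrite of the return address slot would be visible to a following toy_load (so speculation can continue to the attacker-chosen address), yet after toy_squash the committed memory still holds the original value.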

Hi Craig,
Thanks a lot for your help.