Hoisting in the presence of volatile loads.

Hi Krzysztof,

Could I get some background info on reasoning about hoisting in the presence of volatile loads?
I was looking at this testcase: test/Transforms/LICM/volatile-alias.ll

Context: MemorySSA treats volatile loads as defs. I’m looking to better understand expected behavior in the presence of volatile accesses.
More context: https://reviews.llvm.org/D40375.

Thanks in advance,
Alina

MemorySSA disambiguates volatile loads even though it treats them as defs.
see test/Analysis/MemorySSA/volatile-clobber.ll

Last discussion was https://reviews.llvm.org/D16875

Hi Alina,
The testcase was inspired by a situation that a customer reported to us a while back.
There was a function taking two pointers, one was used to load something from device-mapped memory (volatile), while the other was accessing regular memory. These two pointers were aliased (in LLVM), but not because they were proven to alias, but because they weren't proved to be different.
However, according to the C rules, accessing a volatile object through a non-volatile pointer results in an undefined behavior, so we could have assumed that the program was well-formed and thus that the pointers didn't alias after all.
In the end, what was important was that the loop was running as fast as possible, and any invariant code should have been moved out of it. The non-volatile load was not (because of how AliasSetTracker was treating volatile loads), and the customer reported it to us as a missed performance opportunity.

-Krzysztof

Daniel,
Thanks a lot for the pointer, that’s very helpful! I’ll use that as a guide to update how we handle volatile accesses.

Mind if I ask for feedback when I update the patch?

Krzysztof,

Thanks for the answer, that was very informative! I appreciate it!

Best,
Alina

Hi,

The testcase was inspired by a situation that a customer reported to us a
while back.
There was a function taking two pointers, one was used to load something
from device-mapped memory (volatile), while the other was accessing regular
memory. These two pointers were aliased (in LLVM), but not because they
were proven to alias, but because they weren't proved to be different.
However, according to the C rules, accessing a volatile object through a
non-volatile pointer results in an undefined behavior, so we could have
assumed that the program was well-formed and thus that the pointers didn't
alias after all.

Fwiw, I was under the impression that regular loads could *not* be
reordered with volatile loads since we could have e.g.:

  int *normal = &global_variable;
  volatile int* ptr = 0;
  int k = *ptr; // segfaults, and the signal handler writes to *normal
  int value = *normal;

and that we'd have to treat volatile loads and stores essentially as
calls to unknown functions.

-- Sanjoy

For this to work, "normal" should be volatile as well.

-Krzysztof

+Philip to get his input too.
I’ve talked with George offline, and here’s a summary:

In D16875, the decision made was: “The LLVM spec is ambiguous about whether we can hoist a non-volatile load above a volatile load when the loads alias. It’s probably best not to exploit this ambiguity at the moment by unconditionally allowing the motion of nonvolatile loads above volatile loads (and vice versa)”

So the testcase: test/Analysis/MemorySSA/volatile-clobber.ll, is checking that a volatile load is the defining access of a load with which it may alias.

Snippet:

; Ensuring that we don’t automatically hoist nonvolatile loads around volatile
; loads
; CHECK-LABEL define void @volatile_only
define void @volatile_only(i32* %arg1, i32* %arg2) {

[…]

; MayAlias
; CHECK: 2 = MemoryDef(1)
; CHECK-NEXT: load volatile i32, i32* %arg1
load volatile i32, i32* %arg1
; CHECK: MemoryUse(2)
; CHECK-NEXT: load i32, i32* %arg2
load i32, i32* %arg2

ret void
}

The testcase: test/Transforms/LICM/volatile-alias.ll checks the opposite, that we do hoist the non-volatile load. This is currently ensured by the AliasSetTracker.

The conclusion I’m drawing is that right now AliasSetTracker and MemorySSA have different behaviors and replacing one with the either will naturally lead to different outcomes.
So, how can I make progress here?

I think it’s reasonable in D40375 to have pointerInvalidatedByLoopWithMSSA only check if the defining access is within the current loop or liveOnEntry, and rely on MemorySSA to either consider a volatile load a clobbering access or not. So, right now the LICM/volatile-alias.ll testcase will behave differently with MemorySSA enabled.

A separate decision, is whether to update getLoadReorderability in MemorySSA to remove the restriction that loads should not alias. That would give the same behavior as the AliasSetTracker.George’s recollection is that the reason MemorySSA didn’t reorder aggressively is because it was untested at the time. Now that MemorySSA is used more widely, it may be a good time to revisit the initial decision?

Thoughts?

Thanks,
Alina

I think that this is the right way to approach this: we should change MemorySSA to be less conservative in this regard. LLVM’s language reference is pretty explicit about reordering volatile and non-volatile operations: I see no reason to prevent this kind of reordering, even if the volatile and non-volatile accesses might alias. -Hal

Thank you for the lang ref pointer, Hal!
I sent out the change in D41525.

Best,
Alina

Just to chime in here since I was explicitly CCd, I agree with this conclusion and believe this thread reached the proper result per the appropriate specs.

Thank you for the reply and confirming the approach!

Best,
Alina

Hi all,

I think that this is the right way to approach this: we should change
MemorySSA to be less conservative in this regard. LLVM's language reference is
pretty explicit about reordering volatile and non-volatile operations:

The optimizers must not change the number of volatile operations or change
their order of execution relative to other volatile operations. The
optimizers /may/ change the order of volatile operations relative to
non-volatile operations. This is not Java’s “volatile” and has no
cross-thread synchronization behavior.

I see no reason to prevent this kind of reordering, even if the volatile and
non-volatile accesses might alias.

Just to chime in here since I was explicitly CCd, I agree with this conclusion
and believe this thread reached the proper result per the appropriate specs.

Please forgive my possible naive question, but I do not understand how
"accessing a volatile object through a non-volatile pointer is UB" is enough to
justify this optimization. Don't you also need "accessing a non-volatile object
through a volatile pointer is UB"?

In other words, if I understand correctly, this proposed reordering could change
the behavior of the following program: <https://godbolt.org/g/zH1Tey&gt;

static void foo(int *x, volatile int *y) {
    // These may be reordered now?
    *x = 4;
    *y = 5;
}

int bar() {
    int x = 0;
    foo(&x, (volatile int*)&x);
    return x; // MUST return 5
}

`x` and `y` alias in `foo` and both point to a non-volatile object; reordering
them changes what `bar` returns. Notice that casts to volatile pointers are
pretty common e.g. in the Linux kernel:

  #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

Am I missing something?

Kind regards,
Ralf

Hi all,

I think that this is the right way to approach this: we should change
MemorySSA to be less conservative in this regard. LLVM's language reference is
pretty explicit about reordering volatile and non-volatile operations:

The optimizers must not change the number of volatile operations or change
their order of execution relative to other volatile operations. The
optimizers /may/ change the order of volatile operations relative to
non-volatile operations. This is not Java’s “volatile” and has no
cross-thread synchronization behavior.

I see no reason to prevent this kind of reordering, even if the volatile and
non-volatile accesses might alias.

Just to chime in here since I was explicitly CCd, I agree with this conclusion
and believe this thread reached the proper result per the appropriate specs.

Please forgive my possible naive question, but I do not understand how
"accessing a volatile object through a non-volatile pointer is UB" is enough to
justify this optimization. Don't you also need "accessing a non-volatile object
through a volatile pointer is UB"?

In other words, if I understand correctly, this proposed reordering could change
the behavior of the following program: <https://godbolt.org/g/zH1Tey&gt;

static void foo(int *x, volatile int *y) {
     // These may be reordered now?

No. Unless we can somehow prove that x and y don't alias, then the ordering must be preserved (because, if they do alias, then the final result must be 5). volatile should have nothing to do with that.

  -Hal

Right. This conversation was primarily about loads, not stores.

Hi,

ah, I must have missed that. Sorry for the noise!

; Ralf

Hello.
This might be a trivial question, but is it correct if the signal handler aborts the program?
For example:

int normal = ADDRESS1;
volatile int
ptr = ADDRESS2;
int k = *ptr; // segfaults, and the signal handler calls abort()
int value = *normal; // this may be UB(undefined behavior).

Let’s assume that normal is dereferenceable if ptr1 does not raise SEGSEGV, but accessing normal raises UB if ptr1 raises SEGSEGV.
Before reordering is done, there’s no UB because the program will always abort if UB could happen, but UB may happen after they are reordered.

Best Regards,
Juneyoung Lee

Yes, because the actions of SEGSEGV signal handlers don’t fall within the abstract machine we’re modeling. If they did, we’d need to treat all volatile memory accesses as arbitrary function calls (because that’s exactly what they’d be). This would inhibit our ability to do almost any kind of pointer-aliasing reasoning about the volatile accesses (i.e., they’d alias with anything that might be visible to an external function), and moreover, they’d all be read/write (because that external function might do anything). Another way to look at this is: Once the SEGSEGV is generated, the program has already triggered UB (regardless of whether the access is volatile and regardless of what the handler might do). This UB is just as undefined as any other. When we’ve discussed this in the past, I believe the recommendation was that, if we want to support a mode in which volatile accesses have these kinds of semantics (and there are certainly environments in which this might make sense), then we should have Clang lower them into some kind of “volatile-access intrinsics” because it will be easy to make these look like arbitrary read/write function calls in the optimizer. -Hal