Volatile in WebAssembly

Hello,

Like most ISAs, the WebAssembly ISA doesn’t have volatile flags, just regular loads/stores, and soon regular atomics. WebAssembly is a virtual ISA though, and WebAssembly engines may optimize regular loads/stores in ways that are permitted by the C/C++ standards, but which aren’t found in typical hardware-implemented ISAs. This could potentially be observable, in a racy way, once WebAssembly supports threads.

For example, a hypothetical WebAssembly engine could hoist a seemingly loop-invariant non-atomic load out of a loop, something that traditional hardware CPUs typically wouldn’t do. Consider code like this:

http://jakob.engbloms.se/wp-content/uploads/2008/01/dekker.c

The code expects the volatile loads to eventually observe volatile stores from another thread. If a WebAssembly engine hoists the loads out of the loops, they may never see the stores. The C and C++ standards consider this a data race, so it’s undefined behavior, and we’re not required to do anything more here. However, such code works in practice on other platforms.
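As a rough illustration of the hoisting concern, the sketch below simulates it in a single thread. The names `spin_reloading`, `spin_hoisted`, and `set_after_three` are hypothetical, and the `other` callback is only a stand-in for a store made by a second thread. It is not code from dekker.c or from any actual engine.

```cpp
#include <cassert>

// Stand-in for "another thread": invoked once per loop iteration.
using OtherThread = void (*)(volatile int* flag, int spin);

// What the source asks for: a fresh load of *flag on every iteration.
int spin_reloading(volatile int* flag, int max_spins, OtherThread other) {
    assert(max_spins >= 0);
    int spins = 0;
    while (*flag == 0 && spins < max_spins) {
        other(flag, spins);
        ++spins;
    }
    return spins;
}

// What a hoisting engine would effectively run: one load, reused forever.
int spin_hoisted(volatile int* flag, int max_spins, OtherThread other) {
    assert(max_spins >= 0);
    const int snapshot = *flag;  // the hoisted load
    int spins = 0;
    while (snapshot == 0 && spins < max_spins) {
        other(flag, spins);
        ++spins;
    }
    return spins;
}

// Simulated writer: sets the flag during the fourth iteration (spin == 3).
void set_after_three(volatile int* flag, int spin) {
    if (spin == 3) *flag = 1;
}
```

With the flag initially 0 and `max_spins` set to 100, `spin_reloading` stops after 4 iterations, while `spin_hoisted` uses up its whole budget because it never looks at the flag again.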

Though the specifics are different, the same problem arises on other platforms, such as ARM, which effectively has weaker volatile semantics than x86, in ways that break legacy code in the real world. Clang and LLVM have never attempted to support such code, except for clang’s MSVC compatibility mode. And while MSVC has a flag to enable compatibility with such code, that’s regarded as an error, and MSVC’s flag is disabled by default on ARM. Nevertheless, some people are asking us if we can support such code on WebAssembly.

So we’re currently evaluating our options, and are interested in feedback.

  • Clang has an existing option, -fms-volatile, which addresses the broader problem of code using volatile and expecting atomic semantics, and it turns out to be sufficient to solve the problem we have here too. Currently it’s only available in MSVC mode though, so should we make it available for WebAssembly users too? This flag gives volatile accesses consistent atomic semantics through the entire compilation pipeline.

  • Or, should we make WebAssembly CodeGen translate volatile accesses into atomic accesses? This wouldn’t provide atomic semantics at the LLVM IR level, so it wouldn’t prevent LLVM’s optimizer from breaking code in general, but it would allow code that survives the optimizer to work. It would also present the code to the optimizer in a form closer to the one it may have been tested with, assuming the testing on legacy platforms was done with clang. That may increase the chances of the code surviving LLVM’s optimizer with the behavior and performance it was tested with (although if the testing was done with MSVC on x86, it would make that less likely). And if any non-C/C++ frontends similarly use volatile and mistakenly expect atomic semantics, this would let them work on WebAssembly in cases where their output similarly survives the optimizer.

  • Or, should we do nothing? This would mean that volatile loads and stores are translated to wasm as plain loads and stores, the same as in all other ISAs. The C/C++ standards don’t normatively require us to do anything else here (though there are differing views on the non-normative intent), other backends don’t do anything here despite having the same problems, and we don’t currently have any reports of this problem occurring in practice on WebAssembly.
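For comparison, here is a minimal sketch of what affected code could look like if it were rewritten to use real atomics instead of relying on volatile. It assumes acquire loads and release stores, which is the ordering MSVC-style volatile semantics are generally described as providing. The names `payload`, `ready`, `publish`, and `try_consume` are hypothetical.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                 // plain data guarded by `ready`
std::atomic<bool> ready{false};  // takes the place of `volatile bool ready`

// Writer side: fill in the data, then publish it with a release store.
void publish(int value) {
    // This sketch supports a single publication only.
    assert(!ready.load(std::memory_order_relaxed));
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Reader side: the acquire load pairs with the release store above, so
// once `ready` reads true the write to `payload` is visible too.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) return false;
    out = payload;
    return true;
}
```

A reader thread would typically call `try_consume` in a loop. Because the load is atomic, neither the compiler nor an engine is permitted to treat it as loop-invariant in the way a plain load could be.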

If we do something:

  • Should we enable it by default for WebAssembly?

  • Should we implement it in a way that can be used by other platforms too? The broader problem of people using volatile and expecting atomic semantics isn’t limited to WebAssembly.

WebAssembly backend developers

  • Or, should we make WebAssembly CodeGen translate volatile accesses into atomic accesses? This wouldn’t provide atomic semantics at the LLVM IR level, so it wouldn’t prevent LLVM’s optimizer from breaking code in general, but it would allow code that survives the optimizer to work. It would also present the code to the optimizer in a form closer to the one it may have been tested with, assuming the testing on legacy platforms was done with clang. That may increase the chances of the code surviving LLVM’s optimizer with the behavior and performance it was tested with (although if the testing was done with MSVC on x86, it would make that less likely). And if any non-C/C++ frontends similarly use volatile and mistakenly expect atomic semantics, this would let them work on WebAssembly in cases where their output similarly survives the optimizer.

I much prefer this approach because it’s closer to the treatment code already gets when it uses clang to target x86 or ARM. Any breakage of functionality or performance hit on WebAssembly will be similar to what the code sees on other targets. It’s not WebAssembly’s place to play standards pedant with code, even if that code is misguided about volatile.

At the same time, it preserves the C and C++ “no optimization” semantics which, while non-normative, are clearly the intent of both standards and their respective design books. I also expect the standards’ specification of volatile to change in the future. This approach keeps WebAssembly similar to other targets and means any standard change will Just Work for WebAssembly.
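A small sketch of what the “no optimization” reading of volatile means in practice: the compiler has to perform each volatile access the source spells out, without merging or dropping any. `pulse` and `sample_twice` are hypothetical names, and neither function provides any inter-thread ordering.

```cpp
// Both stores must be emitted, in order, even though the first looks dead.
void pulse(volatile int* reg) {
    *reg = 1;
    *reg = 0;
}

// Both loads must be emitted; the compiler may not reuse the first value.
int sample_twice(const volatile int* reg) {
    const int first = *reg;
    const int second = *reg;
    return first + second;
}
```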

I think the -fms-volatile option should be available as opt-in for developers migrating code from MSVC. That’s not unique to WebAssembly.