FYI, Intel folks might be looking to add the __iso_volatile_Xxx family for MSVC STL <atomic> soon

I think Clang has those already implemented for ARM (since our compiles there) but we are looking to use them instead of cmpxchg8b to implement load of atomic<8 bytes> on x86 soon.

Billy3

These? https://docs.microsoft.com/en-us/cpp/intrinsics/arm-intrinsics?view=vs-2017#IsoVolatileLoadStore

__int16 __iso_volatile_load16(const volatile __int16 * Location)

__int32 __iso_volatile_load32(const volatile __int32 * Location)

__int64 __iso_volatile_load64(const volatile __int64 * Location)

__int8 __iso_volatile_load8(const volatile __int8 * Location)

void __iso_volatile_store16(volatile __int16 * Location, __int16 Value)

void __iso_volatile_store32(volatile __int32 * Location, __int32 Value)

void __iso_volatile_store64(volatile __int64 * Location, __int64 Value)

void __iso_volatile_store8(volatile __int8 * Location, __int8 Value)

Seems fine, given how MSVC’s volatiles on x86 were traditionally seq_cst when they could. I think the above match the LLVM IR semantics precisely.

I think you should add type-generic versions as well. Then you’ve got a nice builtin for volatile_load / volatile_store, and who doesn’t love C++ library features that require compiler support? Right Billy? :grin:

MSVC’s volatiles on x86 were traditionally seq_cst when they could.

Normal loads are already seq_cst on x86 with no extra work. We still intend to use _InterlockedCompareExchange64 (on x86) or _InterlockedExchange64 (on amd64) to implement store. To be clear, this is in response to a specific customer bug where they want to put an atomic<8 bytes> into read only memory, and want load() to not break their program with STATUS_ACCESS_VIOLATION. ( https://developercommunity.visualstudio.com/content/problem/274938/vs2017-1567158p2-stdatomicload-causes-write-access.html )

I think you should add type-generic versions as well.

I don’t think C1XX or EDG have the ability to do that form of intrinsics right now, so to implement volatile_load/volatile_store we’re going to need library tech that calls the __iso_volatile family.

who doesn’t love C++ library features that require compiler support? Right Billy?

I don’t hate things that require compiler support. I hate things that require compiler support of the form nobody has demonstrated to be reasonably implementable. Things that can be implemented in terms of traditional builtins or intrinsics (like volatile_load / volatile_store) are fine.

Billy3

Hi ,

I have configure LLVM and LIT on my PC (lit-0.8.0.dev0 and clang -v clang version 3.8.0 )

I wan to know how to run or add specific unit test or test case from LIT ?
Can any one have inputs or idea ?

Regards,
Srinivas

Hi ,

I am following the below url .

https://llvm.org/docs/TestingGuide.html

make check-llvm-unit  - not working
make check-llvm  - not working

make check-llvm
make: *** No rule to make target 'check-llvm'.  Stop.

llvm-project/llvm/test$ make check-all
make: *** No rule to make target 'check-all'.  Stop.

Can you please let me know the problem.Is there any configuration or setup is missing.

Regards,
Srinivas

-----Srinivas Kakarla1/HYD/TCS wrote: -----

Hi Srinivas,

These ‘make’ commands should be run from the directory where you have configured and built LLVM. This is not very well stated on the web page you cited, although in one place it does say:

Use make check-all to run the unit and regression tests after building LLVM.

Configuring and building LLVM is described at: https://llvm.org/docs/GettingStarted.html

The description on the TestingGuide.html page assumes you are using ‘make’ to build LLVM; if you are using ninja or some other tool, you should use the appropriate command for that tool, for example ‘ninja check-llvm-unit’.

If you have done those steps, and running the tests from the build directory still does not work, let us know what steps you have taken and what the error messages are.

–paulr

Update: VS2019 Update 2 is likely to do this; who do I need to prod such that I get to use these unconditionally in our ? :slightly_smiling_face:

Billy3

Update: VS2019 Update 2 is likely to do this; who do I need to prod such that I get to use these unconditionally in our ?:slightly_smiling_face:

You want the same feature in clang-cl, so I suggest talking to Reid, sending a patch to clang, and having him review it :wink:

I think it’s just a matter of moving code from BuiltinsAArch64.def (and ARM) to Builtins.def, and maybe some followon changes.

The naming for these intrinsics is confusing. It reads as, “do an ISO C++ standard volatile load”, but it really means “do an atomic (not volatile) load”, but I suppose the time to address that is long gone.

I went ahead and started a patch to make them available.

It reads as, “do an ISO C++ standard volatile load”

That is the intended behavior.

it really means “do an atomic (not volatile) load”

Only if that happens to be the case on the indicated hardware. For example, MSVC++ implements seq_cst loads on ARM with an __iso_volatile_load## followed by __dmb. It only gets atomic behavior on x86 because a seq_cst load on that hardware is an ordinary load.

__iso_volatile_store is also affected, though the specific customer bug report that opened this can of worms doesn’t touch that.

Thanks for your help!

Billy3

Update: VS2019 Update 2 is likely to do this; who do I need to prod such that I get to use these unconditionally in our ?:slightly_smiling_face:

You want the same feature in clang-cl, so I suggest talking to Reid, sending a patch to clang, and having him review it :wink:

I think it’s just a matter of moving code from BuiltinsAArch64.def (and ARM) to Builtins.def, and maybe some followon changes.

The naming for these intrinsics is confusing. It reads as, “do an ISO C++ standard volatile load”, but it really means “do an atomic (not volatile) load”, but I suppose the time to address that is long gone.

Huh, maybe I’m the one confused, because all they do is emit a volatile load. What exactly do you want to use these for, then? I’m not sure we’ve implemented them correctly.

all they do is emit a volatile load

If I understand correctly the point of these is to make an ordinary volatile load/store happen on ARM under /volatile:ms. We asked for the intrinsic to also be provided on Intel because it reduces the amount of macro-tastic-ness we need in .

Billy3

Hmmmm now that I think about this maybe we should be using a totally different set of builtins on clang anyway.

We currently are doing this:

#define _Compiler_barrier() _ReadWriteBarrier()

#if defined(_M_ARM) || defined(_M_ARM64)
#define _Memory_barrier() __dmb(0xB) // inner shared data memory barrier
#define _Compiler_or_memory_barrier() _Memory_barrier()

#define _ISO_VOLATILE_STORE8(_Storage, _Value) __iso_volatile_store8(_Atomic_address_as(_Storage), _Value)
#define _ISO_VOLATILE_STORE16(_Storage, _Value) __iso_volatile_store16(_Atomic_address_as(_Storage), _Value)
#define _ISO_VOLATILE_STORE32(_Storage, _Value) __iso_volatile_store32(_Atomic_address_as(_Storage), _Value)
#define _ISO_VOLATILE_STORE64(_Storage, _Value) __iso_volatile_store64(_Atomic_address_as(_Storage), _Value)
#define _ISO_VOLATILE_LOAD8(_Storage) __iso_volatile_load8(_Atomic_address_as(_Storage))
#define _ISO_VOLATILE_LOAD16(_Storage) __iso_volatile_load16(_Atomic_address_as(_Storage))

#elif defined(_M_IX86) || defined(_M_X64)
// x86/x64 hardware only emits memory barriers inside _Interlocked intrinsics
#define _Compiler_or_memory_barrier() _Compiler_barrier()

#define _ISO_VOLATILE_STORE8(_Storage, _Value) (_Atomic_address_as(_Storage) = _Value)
#define _ISO_VOLATILE_STORE16(_Storage, _Value) (
_Atomic_address_as(_Storage) = _Value)
#define _ISO_VOLATILE_STORE32(_Storage, _Value) (_Atomic_address_as(_Storage) = _Value)
#define _ISO_VOLATILE_STORE64(_Storage, _Value) (
_Atomic_address_as(_Storage) = _Value)
#define _ISO_VOLATILE_LOAD8(_Storage) (_Atomic_address_as(_Storage))
#define _ISO_VOLATILE_LOAD16(_Storage) (
_Atomic_address_as(_Storage))

#else // ^^^ x86/x64 / unsupported hardware vvv
#error Unsupported hardware
#endif // hardware

// in _Atomic_storage<_Ty, 1>:

_NODISCARD _Ty load() const noexcept { // load with sequential consistency
char _As_bytes = _ISO_VOLATILE_LOAD8(_Storage);
_Compiler_or_memory_barrier();
return reinterpret_cast<_Ty&>(_As_bytes);
}

That makes sense, but I’m left wondering in what context a plain volatile load is useful for implementing . I went in assuming that, to implement atomic, you need to use intrinsics to emit atomic memory operations.

Are you sure we shouldn’t be marking these as atomic instead of volatile? Volatile is usually not suitable for anything except

There was also a corresponding change that had to be done to x86 in the backend to actually emit a 64 bit load/store as a single instruction through (v)movq and/or x87 tricks.

Billy3

Are you sure we shouldn’t be marking these as atomic instead of volatile? Volatile is usually not suitable for anything except

I am implementing :slightly_smiling_face:

Billy3

My concern is that volatile in LLVM has nothing to do with atomicity, and I think what you really want is LLVM atomic loads and stores, with real ordering markers of monotonic, seq_cst, release, acquire, etc. This is all described here:
https://llvm.org/docs/LangRef.html#volatile-memory-accesses
https://llvm.org/docs/LangRef.html#memory-model-for-concurrent-operations
In particular: “The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior.”

So, I’m concerned that there is something subtly incorrect about using _iso_volatile* to implement atomics.

After looking at the xatomic.h source, I think I have a better understanding of what is going on. I added some folks who probably have a better understanding of LLVM IR atomics, and maybe they can help guide the discussion to a simpler implementation of Visual C++ that plays well with LLVM. We had a similar discussion about over- or under-emitting dmb fences in this code review: https://reviews.llvm.org/D52807.

The Visual C++ headers currently add fences for ARM themselves. The code looks like this: https://reviews.llvm.org/P8137 In small test cases, this generates the appropriate code, a single load followed by a fence.

The optimizers may change the order of volatile operations relative to non-volatile operations. This is not Java’s “volatile” and has no cross-thread synchronization behavior.

Right, but if they want to link with MSVC++ generated code they need to at least respect _ReadWriteBarrier.

I can certainly mess with our to play more nicely with you folks if you want us calling different builtins instead. is entirely library tech for us.

Billy3

Hi ,

I am following the below url .

https://llvm.org/docs/TestingGuide.html

make check-llvm-unit - not working
make check-llvm - not working

make check-llvm
make: *** No rule to make target ‘check-llvm’. Stop.

llvm-project/llvm/test$ make check-all
make: *** No rule to make target ‘check-all’. Stop.

Can you please let me know the problem.Is there any configuration or setup is missing.

Regards,
Srinivas

-----Srinivas Kakarla1/HYD/TCS wrote: -----

Hi Srinivas,

Which parameters did you give to cmake?

Can you show the output of pwd and ls -l ?

I get the exact same error, because I choose ninja as the cmake generator.

$ ninja check-llvm-unit

$ ninja check-llvm

$ ninja check-all

all work for me.

Csaba