RFC: Atomics.h

Some of you may have noticed that I added include/llvm/System/Atomics.h to the repository briefly, which will be used for adding support for threading in LLVM.

I have tried to provide appropriate implementations of the atomic ops (currently memory fence and CAS) for the platforms we care about, but my knowledge of these, and my ability to test them, is limited. So, please, if you run on any less common platform, test out the file, and send me patches to improve it if it doesn't work verbatim on your system.

Thanks,

--Owen

Hi,

You might want to use this:

http://www.hpl.hp.com/research/linux/atomic_ops/

Zoltan

Surprisingly enough, libatomic_ops doesn’t define just a hardware memory fence call as far as I can tell.

--Owen

Actually, I take that back. The non-obviously named AO_nop_full() is what I want. :)

--Owen

Owen Anderson wrote:

Some of you may have noticed that I added include/llvm/System/Atomics.h to the repository briefly, which will be used for adding support for threading in LLVM.

Just out of curiosity, is there a design document somewhere for the plan for threading?

Also, atomic ops are usually pretty low level things used for nonblocking algorithms or to build higher level locking constructs. Is that the plan here too? It seems like you'd want to avoid anything too fancy since LLVM has to run on so many different architectures with their variety of memory semantics, etc.

Luke

What would you do with a just-hardware memory fence? If the compiler's
free to move operations over the hardware fence, that seems to defeat
the purpose.

C++0X provides a compiler-only fence, and a hardware+compiler fence,
but no hardware-only fence, I believe for this reason. See
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2857.pdf>,
section 29.8.

I meant as opposed to CAS+fence, though I have since found out I was incorrect.

--Owen

Jeffrey Yasskin wrote:

What would you do with a just-hardware memory fence? If the compiler's
free to move operations over the hardware fence, that seems to defeat
the purpose.

If your compiler memory fence (gcc's __sync_synchronize()) doesn't distinguish different kinds of fencing requirements, and your platform has different hardware fences with different costs (isync, lwsync, eieio, etc.), then it can make sense to separate the two, using something like asm volatile("" ::: "memory") for the compiler fence. It gets pretty architecture-dependent, though.

Luke

Luke Dalessandro wrote:

Jeffrey Yasskin wrote:

What would you do with a just-hardware memory fence? If the compiler's
free to move operations over the hardware fence, that seems to defeat
the purpose.

If your compiler memory fence (gcc's __sync_synchronize()) doesn't distinguish different kinds of fencing requirements, and your platform has different hardware fences with different costs (isync, lwsync, eieio, etc.), then it can make sense to separate the two, using something like asm volatile("" ::: "memory") for the compiler fence. It gets pretty architecture-dependent, though.

To reply to my reply, most sane implementations of these memory barriers would imply the compiler fence by slapping a memory clobber on the underlying asm, so your point is well taken. I have seen cross-platform code where they are separated, though, for whatever reason.
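To make the distinction in this exchange concrete, here is a minimal sketch, assuming GCC or Clang. The helper names (CompilerBarrier, FullFence, publish) are hypothetical and are not taken from Atomics.h. The first helper only stops the compiler from reordering memory accesses; the second also emits a hardware fence.

```cpp
#include <cassert>

// Compiler-only barrier: emits no instruction, but the "memory" clobber
// tells the compiler not to move loads or stores across this point.
inline void CompilerBarrier() { asm volatile("" ::: "memory"); }

// Hardware + compiler barrier: GCC's full-fence builtin (GCC 4.1 and later).
inline void FullFence() { __sync_synchronize(); }

int Data = 0;
int Ready = 0;

// Hypothetical publish step: write the payload, fence, then set the flag,
// so another processor that observes Ready == 1 (and fences on its side)
// also observes the payload.
void publish(int Value) {
  Data = Value;
  FullFence();
  Ready = 1;
}
```

A platform-specific variant would swap the body of FullFence for a cheaper instruction wrapped in asm with a "memory" clobber, which is the arrangement described above.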

Luke

OK, I’ve enhanced Atomic.h by pulling in a bunch of implementations from libatomic_ops, and others that I could figure out on my own.

Again, my plea: PLEASE TRY THIS OUT ON YOUR PLATFORM, AND SEND ME PATCHES IF IT DOESN’T WORK! Similarly, if you think the implementation could be improved for your platform, send me a patch.

I know that Sparc doesn’t work currently (no CAS implementation yet), and I’m a little unsure about the ARM version, so it’d be great if gurus for those platforms could look at them.

--Owen

Owen Anderson wrote:

Some of you may have noticed that I added include/llvm/System/Atomics.h
to the repository briefly, which will be used for adding support for
threading in LLVM.

Just out of curiosity, is there a design document somewhere for the plan
for threading?

Not as yet. Chris may have ideas in this direction, but I don't think they've been written down anywhere. For now, I'm just trying to enhance the thread-safety of some obviously unsafe pieces of code.

Also, atomic ops are usually pretty low level things used for
nonblocking algorithms or to build higher level locking constructs. Is
that the plan here too? It seems like you'd want to avoid anything too
fancy since LLVM has to run on so many different architectures with
their variety of memory semantics, etc.

I totally agree. However, at least one case of thread-unsafety (ManagedStatic) has proven very difficult, if not impossible, to implement correctly without using lower-level operations.

--Owen

Hi Owen, perhaps you could post a little testsuite for the atomic ops.
I would be happy to run them on various machines. I have access to a
bunch of multi-processor machines, unfortunately all x86 or x86-64. I
also have access to single-processor alpha, arm, g5, mips and sparc64
machines.

Ciao,

Duncan.

Owen Anderson wrote:

Also, atomic ops are usually pretty low level things used for
nonblocking algorithms or to build higher level locking constructs. Is
that the plan here too? It seems like you'd want to avoid anything too
fancy since LLVM has to run on so many different architectures with
their variety of memory semantics, etc.

I totally agree. However, at least one case of thread-unsafety (ManagedStatic), has proven very-difficult-to-impossible to implement correctly without using lower-level operations.

Yes, double-checked locking is a pain. There's a C++ safe implementation in The "Double-Checked Locking is Broken" Declaration in the "Making it work with explicit memory barriers" section. As far as I know, it is still considered to work.

Luke

Our problems are actually deeper than that, because we need to interact well with static constructors. This means that we can't use a mutex with a non-constant initializer, or else we can't depend on it being properly initialized before the ManagedStatic is accessed. While this would be possible with pthread mutexes, I know of no good way to do it for Windows CRITICAL_SECTION's.

--Owen

There is a static mutex implementation in Boost that uses
PTHREAD_MUTEX_INITIALIZER when pthread is available, and uses
InterlockedCompareExchange on win32 (no mutexes or critical sections):
https://svn.boost.org/trac/boost/browser/trunk/libs/regex/src/static_mutex.cpp

This may be useful too:
https://svn.boost.org/trac/boost/browser/trunk/boost/thread/win32/once.hpp
https://svn.boost.org/trac/boost/browser/trunk/boost/thread/pthread/once.hpp

It doesn't use any inline assembly.
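For illustration, a lock that needs no runtime initializer can be built directly on compare-and-swap. This is a minimal sketch assuming GCC or Clang and their __sync builtins; StaticSpinLock and ExampleLock are hypothetical names, and this is not Boost's implementation. On win32 the analogous primitive would be InterlockedCompareExchange.

```cpp
#include <cassert>

// Plain struct with no constructor: an object with static storage duration
// is zero-initialized at load time, so State starts out as 0 (unlocked)
// before any static constructor runs.
struct StaticSpinLock {
  volatile int State; // 0 = unlocked, 1 = locked

  // Atomically flip 0 -> 1; returns true if this caller took the lock.
  bool try_lock() { return __sync_bool_compare_and_swap(&State, 0, 1); }

  // Spin until the flip succeeds.
  void lock() {
    while (!try_lock()) {
      // busy-wait; a real implementation would yield or back off here
    }
  }

  // Write 0 with release semantics.
  void unlock() { __sync_lock_release(&State); }
};

static StaticSpinLock ExampleLock; // usable without any initialization code
```

Spinning is wasteful under contention, so a lock like this only suits short critical sections such as one-time initialization.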

Best regards,
--Edwin

OK, I've enhanced Atomic.h by pulling in a bunch of implementations from libatomic_ops, and others that I could figure out on my own.

Again, my plea: PLEASE TRY THIS OUT ON YOUR PLATFORM, AND SEND ME PATCHES IF IT DOESN'T WORK! Similarly, if you think the implementation could be improved for your platform, send me a patch.

I know that Sparc doesn't work currently (no CAS implementation yet), and I'm a little unsure about the ARM version, so it'd be great if gurus for those platforms could look at them.

Owen, I would really rather that you didn't take this path. Threading support in LLVM should always be optional: it should be possible to use LLVM on systems where we don't have support for threading operations. Indeed, some systems don't support threads!

Given that, I think it makes sense to start out the atomics operations very simple: just make them work for compilers that support GCC 4.2's atomics. Since things will be changing quickly initially, this makes it easy to prototype and build things out, and this also avoids pulling in an external library with a (compatible but) different license.

In practice, I think a huge chunk of the community will be served when LLVM supports GCC 4.2 atomics + a windows implementation. I don't see a reason to make things any more complex than that. Since llvm-gcc supports atomics, someone doing development on a supported architecture can just build llvm-gcc single threaded, which provides them with a compiler that supports atomics on their platform.

Our problems are actually deeper than that, because we need to interact well with static constructors. This means that we can't use a mutex with a non-constant initializer, or else we can't depend on it being properly initialized before the ManagedStatic is accessed. While this would be possible with pthread mutexes, I know of no good way to do it for Windows CRITICAL_SECTION's.

Actually, global static constructors are evil and should be eliminated. No static constructors should do anything non-trivial, and it is essential that ManagedStatic *not have a constructor*. That is its entire design point. However, ManagedStatic should theoretically be pretty simple with double-checked locking. The observation is that llvm_shutdown() can only be called on one thread, but that lazy initialization of data structures can happen from multiple threads. This means that the "get" operation should look something like this (a suitably fenced version of):

   if (Ptr == 0) {
      lock();
      if (Ptr == 0)
        init();
      unlock();
   }

Also, I see no reason why the lock needs to be per-object. Just use a heavy weight global "pthreads" lock in the .cpp file.

When you get back to hacking on ManagedStatic, please define this in one method, not duplicated in ->, *, etc.
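One way the suggestions above could fit together is sketched below. It is an illustration only, not LLVM's ManagedStatic: LazyStatic, InitLock and Counter are hypothetical names, and C++11 std::atomic (which postdates this thread) stands in for the explicit fences. The lock is a single global pthread mutex with a constant initializer, and the check-lock-check logic lives in one get() method that operator* and operator-> share.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>

// One heavyweight global lock. PTHREAD_MUTEX_INITIALIZER is a constant
// initializer, so no static constructor has to run before first use.
static pthread_mutex_t InitLock = PTHREAD_MUTEX_INITIALIZER;

// Hypothetical stand-in for ManagedStatic, with no user-written constructor.
template <typename T> struct LazyStatic {
  std::atomic<T *> Ptr; // null for objects with static storage duration

  // The only place the double-checked locking pattern appears.
  T *get() {
    // Fast path: the acquire load pairs with the release store below.
    T *P = Ptr.load(std::memory_order_acquire);
    if (P == nullptr) {
      pthread_mutex_lock(&InitLock);
      // Second check under the lock: another thread may have won the race.
      P = Ptr.load(std::memory_order_relaxed);
      if (P == nullptr) {
        P = new T();
        // Publish only after the object is fully constructed.
        Ptr.store(P, std::memory_order_release);
      }
      pthread_mutex_unlock(&InitLock);
    }
    return P;
  }

  T &operator*() { return *get(); }
  T *operator->() { return get(); }
};

static LazyStatic<int> Counter; // lazily creates a zero-valued int
```

The sketch leaves out destruction; in the scheme described above, that would happen from llvm_shutdown() on a single thread.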

-Chris

Owen, I would really rather that you didn't take this path. Threading
support in LLVM should always be optional: it should be possible to
use LLVM on systems where we don't have support for threading
operations. Indeed, some systems don't support threads!

I'm not trying to make it required. I had provided threads-disabled versions of all the operations.

Given that, I think it makes sense to start out the atomics
operations very simple: just make them work for compilers that support
GCC 4.2's atomics. Since things will be changing quickly initially,
this makes it easy to prototype and build things out, and this also
avoids pulling in an external library with a (compatible but)
different license.

In practice, I think a huge chunk of the community will be served when
LLVM supports GCC 4.2 atomics + a windows implementation. I don't see
a reason to make things any more complex than that. Since llvm-gcc
supports atomics, someone doing development on a supported
architecture can just build llvm-gcc single threaded, which provides
them with a compiler that supports atomics on their platform.

After thinking about this some more, I think you're right. Trying to support every platform anyone cares about is rapidly becoming a nightmare. Setting GCC 4.2 as a baseline requirement (and presumably providing a Windows implementation as well) is probably a good way to get this back to a sane amount of stuff to support.
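As a rough idea of how small such a baseline could be, here is a sketch of the two operations mentioned at the start of the thread (memory fence and CAS), built on GCC's __sync builtins with a threads-disabled fallback. The namespace, the SKETCH_ENABLE_THREADS macro and the function signatures are assumptions made for illustration, not the contents of the actual header.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical configuration switch; 1 means threading support is enabled.
#ifndef SKETCH_ENABLE_THREADS
#define SKETCH_ENABLE_THREADS 1
#endif

namespace sketch {

typedef uint32_t cas_flag;

// Full memory fence, or a no-op when threads are disabled.
inline void MemoryFence() {
#if SKETCH_ENABLE_THREADS
  __sync_synchronize();
#endif
}

// If *Ptr equals OldValue, store NewValue. Returns the value *Ptr held
// before the call, so the caller can tell whether the swap happened.
inline cas_flag CompareAndSwap(volatile cas_flag *Ptr, cas_flag NewValue,
                               cas_flag OldValue) {
#if SKETCH_ENABLE_THREADS
  return __sync_val_compare_and_swap(Ptr, OldValue, NewValue);
#else
  cas_flag Result = *Ptr;
  if (Result == OldValue)
    *Ptr = NewValue;
  return Result;
#endif
}

} // namespace sketch
```

A Windows build would replace the two builtin calls with the corresponding Win32 primitives behind the same interface.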

--Owen

In the current trunk, System/Atomic.[h,cpp] define void
llvm::sys::MemoryFence(). This conflicts with the MemoryFence macro in
<windows.h> and (since it's a preprocessor macro, and not a scoped
function definition) causes the sys::MemoryFence definition on
Atomic.cpp:23 to explode, as it's nonsensically expanded to a cl
intrinsic (_mm_mfence). This breaks the Visual Studio build.

The trivial fix is to #undef MemoryFence immediately after including
<windows.h>, since it's clearly assumed not to exist. A deeper and
safer fix might consider avoiding a name conflict with a core system
macro on a widely used platform.
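The trivial fix might look like the sketch below. The namespace sketch_sys is hypothetical, and std::atomic_thread_fence is used here only to keep the example portable; it is not what Atomic.cpp does.

```cpp
#include <atomic>
#include <cassert>

#if defined(_WIN32)
#include <windows.h>
// <windows.h> defines MemoryFence as a preprocessor macro, which would
// rewrite the function name below. Remove it before declaring our own.
#undef MemoryFence
#endif

namespace sketch_sys {

// With the macro gone, this is an ordinary scoped function again.
inline void MemoryFence() {
  std::atomic_thread_fence(std::memory_order_seq_cst);
}

} // namespace sketch_sys
```

The deeper fix, choosing a name that does not collide with the system macro, avoids depending on include order altogether.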

Wait, it defines MemoryFence() AND MemoryBarrier()??

Sheesh, they had to take all the reasonable names. :-/

--Owen

Yes, indeed.