Handling of __c11_atomic_is_lock_free({1, 2, 4, 8}) in compiler-rt atomic.c

Hi,

I think this more of a front-end issue than a compile-rt issue, but I’m also copying the llvm-dev list

In compiler-rt the file lib/builtins/atomic.c seems to rely on determining at compile time if an atomic operation of size 1, 2, 4, or 8 is always lock free.

For example, in the implementation of __atomic_load_8() we have something like this after macro expansion:

void __atomic_load_8(…)

{

If (__c11_atomic_is_lock_free(8))

return __c11_atomic_load_8(…)

/* otherwise lock-based implementation */

}

Let’s say a target supports lock-free atomics for 8-byte objects. Then the front-end will lower __c11_atomic_is_lock_free(8) to “true” in clang/lib/AST/ExprConstant.cpp VisitBuiltinCallExpr() and further __c11_atomic_load_8() will expand into an inlined sequence of instructions. Therefore, we get what we want which is an optimized implementation of __atomic_load_8().

But, let’s say the target is an ARM variant that does not support atomics for 8-byte objects. The front-end will lower __c11_atomic_is_lock_free(8) to __atomic_is_lock_free(8) (note the “is” vs “always”) which is not lowered to “false” but instead remains as a function call. __c11_atomic_load_8() is then lowered to a function call __atomic_load_8() since we are not inlining atomic operations on 8-byte objects. The result is code like this:

void __atomic_load_8(…)

{

If (__atomic_is_lock_free(8)). /* should always return false, And if it did return true we’d have an infinite recursive call. */

return __atomic_load_8(…)

/* otherwise lock-based implementation */

}

That is fine if an __atomic_is_lock_free() is provided and it always returns false for 8 (BTW __atomic_is_lock_free is not implemented in compiler-rt), but this wastes code size and is inefficient.

Should atomic.c be calling __atomic_always_lock_free instead of __c11_atomic_is_lock_free to check for lock-free cases? The “always” flavor is determined at compile time.

OR should the front-end always lower __c11_atomic_is_lock_free() to true or false for atomics on objects of size 1, 2, 4, or 8 bytes?

AND /OR should some sort of new TargetInfo routine be added to assert that certain sized operations are NEVER lock free?

There is a comment in the code in VisitBuiltinCallExpr() that seems to be biasing the implementation towards some X86-64 processor assumptions:

// For __atomic_is_lock_free(sizeof(_Atomic(T))), if the size is a power

// of two less than the maximum inline atomic width, we know it is

// lock-free. If the size isn’t a power of two, or greater than the

// maximum alignment where we promote atomics, we know it is not lock-free

// (at least not in the sense of atomic_is_lock_free). Otherwise,

// the answer can only be determined at runtime; for example, 16-byte

// atomics have lock-free implementations on some, but not all,

// x86-64 processors.

BTW, I think that should be: “If the size isn’t a power of two or IS greater than the maximum…”.

Comments?

Beast Regards,

Eric Stotzer

Hi,

I think this more of a front-end issue than a compile-rt issue, but I’m also copying the llvm-dev list

In compiler-rt the file lib/builtins/atomic.c seems to rely on determining at compile time if an atomic operation of size 1, 2, 4, or 8 is always lock free.

For example, in the implementation of __atomic_load_8() we have something like this after macro expansion:

void __atomic_load_8(…)
{
  If (__c11_atomic_is_lock_free(8))
    return __c11_atomic_load_8(..)
/* otherwise lock-based implementation */
}

Let’s say a target supports lock-free atomics for 8-byte objects. Then the front-end will lower __c11_atomic_is_lock_free(8) to “true” in clang/lib/AST/ExprConstant.cpp VisitBuiltinCallExpr() and further __c11_atomic_load_8() will expand into an inlined sequence of instructions. Therefore, we get what we want which is an optimized implementation of __atomic_load_8().

But, let’s say the target is an ARM variant that does not support atomics for 8-byte objects. The front-end will lower __c11_atomic_is_lock_free(8) to __atomic_is_lock_free(8) (note the “is” vs “always”) which is not lowered to “false” but instead remains as a function call. __c11_atomic_load_8() is then lowered to a function call __atomic_load_8() since we are not inlining atomic operations on 8-byte objects. The result is code like this:

void __atomic_load_8(…)
{
  If (__atomic_is_lock_free(8)). /* should always return false, And if it did return true we’d have an infinite recursive call. */
    return __atomic_load_8(..)
/* otherwise lock-based implementation */
}

That is fine if an __atomic_is_lock_free() is provided and it always returns false for 8 (BTW __atomic_is_lock_free is not implemented in compiler-rt), but this wastes code size and is inefficient.

Should atomic.c be calling __atomic_always_lock_free instead of __c11_atomic_is_lock_free to check for lock-free cases? The “always” flavor is determined at compile time.

The current code dates from 2012 in llvm.org/r153735 <http://llvm.org/r153735>, which way predates is_always_lock_free.

OR should the front-end always lower __c11_atomic_is_lock_free() to true or false for atomics on objects of size 1, 2, 4, or 8 bytes?

If is can determine this, sure. However, you want to make sure you’re not breaking guarantees that the code might be trying to make right now. Do we guarantee anything about ABI from one clang version to the next? Do we match ABI with GCC?

AND /OR should some sort of new TargetInfo routine be added to assert that certain sized operations are NEVER lock free?

That might be helpful because it’ll avoid the is_lock_free check.

There is a comment in the code in VisitBuiltinCallExpr() that seems to be biasing the implementation towards some X86-64 processor assumptions:

    // For __atomic_is_lock_free(sizeof(_Atomic(T))), if the size is a power
    // of two less than the maximum inline atomic width, we know it is
    // lock-free. If the size isn't a power of two, or greater than the
    // maximum alignment where we promote atomics, we know it is not lock-free
    // (at least not in the sense of atomic_is_lock_free). Otherwise,
    // the answer can only be determined at runtime; for example, 16-byte
    // atomics have lock-free implementations on some, but not all,
    // x86-64 processors.

BTW, I think that should be: “If the size isn’t a power of two or IS greater than the maximum…”.

Comments?

It might be helpful to see what each in-tree platform does to understand what the minimum landscape that we support is.