Possibility of implementing a low-level naive lock purely with LLVM atomics?

In our frontend we are attempting to build a lock mechanism without using system APIs such as pthreads, for internal reasons.
To achieve this we create an i32 global variable (GV) and use atomic loads, stores and comparisons on it. The generated IR looks like the following:

@Flag = private global i32 0, align 4
%0 = load atomic i32, i32* @Flag acquire, align 4
%1 = icmp eq i32 %0, 1
store atomic i32 1, i32* @Flag release, align 4

However, when inspecting the output on x86-64, the following assembly was generated:

mov qword [rbp+var_50], rcx
mov qword [rbp+var_48], rdx
mov rbx, rsi
mov r15, rdi
mov eax, dword [l_Flag] ; l_Flag
cmp eax, 0x1

To the best of my knowledge this is not atomic.
I’d like to know how to fix my frontend so that the locking mechanism works.
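
For what it's worth, an aligned 32-bit mov on x86-64 is already an atomic load with acquire semantics, so the assembly above is consistent with the IR. What is not atomic is the load/compare/store sequence taken as a whole: two threads can both read 0 before either one stores 1. Closing that gap needs a single read-modify-write operation, which in LLVM IR would be atomicrmw xchg or cmpxchg. Below is a minimal C++ sketch of a naive test-and-set spinlock built that way. The names lock, unlock and run_demo are hypothetical, and std::thread is used only to exercise the lock in the demo; the lock itself relies on nothing but the atomic.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical sketch, not the frontend's actual code.
// 0 = unlocked, 1 = locked.
std::atomic<int> Flag{0};

void lock() {
  // exchange() reads the old value and writes 1 in one atomic step
  // (roughly what `atomicrmw xchg` expresses in LLVM IR). Only the
  // thread that observes the old value 0 owns the lock.
  while (Flag.exchange(1, std::memory_order_acquire) != 0) {
    // Busy-wait until the current owner releases the lock.
  }
}

void unlock() {
  // A plain atomic release store is enough to hand the lock back.
  Flag.store(0, std::memory_order_release);
}

// Demo helper: several threads increment a plain int under the lock.
int run_demo(int threads, int iters) {
  assert(threads >= 0 && iters >= 0);
  int counter = 0; // protected by lock()/unlock()
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&counter, iters] {
      for (int i = 0; i < iters; ++i) {
        lock();
        ++counter;
        unlock();
      }
    });
  }
  for (auto &th : pool)
    th.join();
  return counter;
}
```

On x86-64 the exchange would typically be lowered to an xchg instruction with a memory operand, which is implicitly locked, while the unlock stays an ordinary mov.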



Thanks Andres.
What we are basically trying to achieve is to make sure that some BBs in the function are executed exactly once. Currently we use a GV to mark the execution status: at function start we load the value and compare it, and if the BBs have already been executed we branch directly past them. To the best of my knowledge atomicrmw does the modification in place, so we can't update the value only after the BBs have finished executing.

That probably has to involve more than just 2 states to work properly.
In C++ something like this:

enum {
  Uninitialized = 0,
  BeingInitialized = 1,
  Initialized = 2
};

std::atomic<int> Flag;

void foo() {
  if (Flag.load() != Initialized) {
    int CurrentStatus = Uninitialized;
    if (Flag.compare_exchange_strong(CurrentStatus, BeingInitialized)) {
      // We got the lock, do initialization here.
    } else {
      // Someone else beat us to it. Wait for the other thread to
      // finish initializing.
      while(Flag.load() != Initialized);