llvm.memory.barrier does not work

The intrinsic llvm.memory.barrier does not work as expected. Is it a bug,
or has it not been implemented yet?

(1) false arguments do not work

// pseudo code
void foo(int *x) {
  x[2] = 10;
  llvm.memory.barrier(0, 0, 0, 0, 0);
  x[2] = 20;
  return void
}

The barrier is actually a noop, but it prevents the dead store "x[2] = 10" from being deleted.

(2) True arguments do not work.

// pseudo code
void foo(int * restrict x) {
  x[2] = 10;
  llvm.memory.barrier(1, 1, 1, 1, 1);
  x[2] = 20;
  return void
}

"x[2] = 10" should not be deleted because the barrier is present. But it
is deleted anyway.

Here is the LLVM IR for the first case (the commented-out lines are for the second case).

declare void @llvm.memory.barrier(i1 , i1 , i1 , i1 , i1)

define void @foo(i32* %x) nounwind {
; define void @foo(i32* noalias %x) nounwind {
entry:
  %x.addr = alloca i32*, align 4
  store i32* %x, i32** %x.addr, align 4
  %tmp = load i32** %x.addr, align 4
  %arrayidx = getelementptr i32* %tmp, i32 2
  store i32 10, i32* %arrayidx, align 4
  call void @llvm.memory.barrier(i1 0, i1 0, i1 0, i1 0, i1 0) nounwind
; call void @llvm.memory.barrier(i1 1, i1 1, i1 1, i1 1, i1 1) nounwind
  %tmp1 = load i32** %x.addr, align 4
  %arrayidx1 = getelementptr i32* %tmp, i32 2
  store i32 20, i32* %arrayidx, align 4
  ret void
}

Running "opt -O3" on it gives this result:

declare void @llvm.memory.barrier(i1, i1, i1, i1, i1) nounwind

define void @foo(i32* nocapture %x) nounwind {
entry:
  %arrayidx = getelementptr i32* %x, i32 2
  store i32 10, i32* %arrayidx, align 4
  tail call void @llvm.memory.barrier(i1 false, i1 false, i1 false, i1 false, i1 false) nounwind
  store i32 20, i32* %arrayidx, align 4
  ret void
}

The intrinsic llvm.memory.barrier does not work as expected. Is it a bug,
or has it not been implemented yet?

It's going away in favor of the new fence instruction (and I'll remove
it as soon as dragonegg catches up). It should still work at the
moment, though.

(1) false arguments do not work

// pseudo code
void foo(int *x) {
x[2] = 10;
llvm.memory.barrier(0, 0, 0, 0, 0);
x[2] = 20;
return void
}

The barrier is actually a noop, but it prevents the dead store "x[2] = 10" from being deleted.

Don't do that. :) Really, why are you using a noop barrier?

(2) True arguments do not work.

// pseudo code
void foo(int * restrict x) {
x[2] = 10;
llvm.memory.barrier(1, 1, 1, 1, 1);
x[2] = 20;
return void
}

"x[2] = 10" should not be deleted because the barrier is present. But it
is deleted anyway.

The pointer is "restrict", therefore the compiler assumes nothing else
can touch it while the function runs.

Actually, the transformation in question is probably valid even if the
pointer isn't restrict (although LLVM won't actually do that); your
use of a barrier here doesn't really make sense.

-Eli

The intrinsic llvm.memory.barrier does not work as expected. Is it a bug,
or has it not been implemented yet?

It's going away in favor of the new fence instruction (and I'll remove
it as soon as dragonegg catches up). It should still work at the
moment, though.

(1) false arguments do not work

// pseudo code
void foo(int *x) {
x[2] = 10;
llvm.memory.barrier(0, 0, 0, 0, 0);
x[2] = 20;
return void
}

The barrier is actually a noop, but it prevents the dead store "x[2] = 10" from being deleted.

Don't do that. :) Really, why are you using a noop barrier?

Just to show it affects optimization, but it shouldn't.

(2) True arguments do not work.

// pseudo code
void foo(int * restrict x) {
x[2] = 10;
llvm.memory.barrier(1, 1, 1, 1, 1);
x[2] = 20;
return void
}

"x[2] = 10" should not be deleted because the barrier is present. But it
is deleted anyway.

The pointer is "restrict", therefore the compiler assumes nothing else
can touch it while the function runs.

Actually, the transformation in question is probably valid even if the
pointer isn't restrict (although LLVM won't actually do that); your
use of a barrier here doesn't really make sense.

If you think of multithreaded code, it does make sense. Again,
this is simplified code, used just to show the point.

The memory barrier requires that all memory operations prior to the
barrier point complete before any memory operations after the barrier
start. I think that this requires compilers not to optimize across
the barrier point, in addition to generating memory barrier
instructions if needed (I think LLVM only does the latter). Do you
agree on this?

In the following gcc code, the asm basically behaves like a barrier
that prevents optimization across it. You can see that both
writes to p[2] are in the gcc output.

void foo (int * __restrict__ p)
{
  p[2] = 10;
  __asm__ __volatile__ ("":::"memory");
  p[2] = 20;
}

Junjie
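
For comparison, here is a minimal sketch of the same shape written with the C++0x/C++11 compiler-only fence, std::atomic_signal_fence, which emits no hardware barrier instruction. The names foo_fenced and demo are hypothetical, and whether a particular optimizer keeps the first store is an assumption, not something the standard promises for plain (non-atomic) stores.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical counterpart of foo() above, using the standard
// compiler-only fence instead of the GCC inline-asm idiom.
void foo_fenced(int *p) {
    p[2] = 10;
    // Orders this thread against a signal handler running on the same
    // thread; it does not generate a hardware barrier instruction.
    std::atomic_signal_fence(std::memory_order_seq_cst);
    p[2] = 20;
}

// Small helper for the usage check: only the final value is observable
// from single-threaded code, with or without the fence.
int demo() {
    int buf[4] = {0, 0, 0, 0};
    foo_fenced(buf);
    assert(buf[2] == 20);
    return buf[2];
}
```

Calling demo() returns 20 either way; the only way to see whether the store of 10 survived is to inspect the generated assembly.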

memory.barrier's semantics aren't really all that well defined, which
is why we're replacing it; the new fence instruction is clearly
defined based on C++0x semantics.

-Eli
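
As an illustration of what "C++0x semantics" means for a fence, here is a minimal sketch of the usual message-passing pattern with std::atomic_thread_fence. A fence only orders things when it is paired with atomic operations that another thread actually observes, which is why a barrier sitting between two plain stores to the same location (as in foo above) gives the optimizer little to preserve. The names payload, ready, producer, consumer and demo_single_threaded are hypothetical.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // flag the other thread polls

// Intended to run on the writing thread.
void producer() {
    payload = 42;
    // Release fence: the store to payload may not be reordered after
    // the atomic store below.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Intended to run on the reading thread.
int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the flag is set
    }
    // Acquire fence: pairs with the release fence in producer(), so the
    // read of payload below is guaranteed to see 42.
    std::atomic_thread_fence(std::memory_order_acquire);
    return payload;
}

// Single-threaded walk-through for checking the logic; in real use,
// producer() and consumer() would run on different threads.
int demo_single_threaded() {
    payload = 0;
    ready.store(false);
    producer();
    int seen = consumer();
    assert(seen == 42);
    return seen;
}
```

In this pattern the first store cannot be dropped, because the reader synchronizes through ready and then legitimately reads payload. In foo there is no such flag, so no other thread can read x[2] between the two stores without a data race.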