We improved our instruction itineraries, and now our test cases for fence instructions are breaking.
For example, take this test case:
@write_me = external global i32
@read_me = external global i32
; Function Attrs: nounwind
define i32 @xstg_intrinsic(i32 %foo) #0 {
entry:
; CHECK: store r0, r1, 0, 32
; CHECK-NEXT: fence 2
%foo.addr = alloca i32, align 4
store i32 %foo, i32* %foo.addr, align 4
%0 = load i32* %foo.addr, align 4
store volatile i32 %0, i32* @write_me, align 4
call void @llvm.xstg.memory.barrier(i32 2, i8 0)
%1 = load volatile i32* @read_me, align 4
ret i32 %1
}

declare void @llvm.xstg.memory.barrier(i32, i8)

attributes #0 = { nounwind }
Prior to adding our instruction itineraries, the generated code was:
xstg_intrinsic: # @xstg_intrinsic
BB#0: # %entry
subI r509, r509, 16, 64
store r510, r509, 0, 64
bitop1 r510, r509, 0, OR, 64
store r0, r510, 12, 32
movimm r1, %hi(write_me), 64
movimmshf32 r1, r1, %lo(write_me)
store r0, r1, 0, 32
fence 2
movimm r0, %hi(read_me), 64
movimmshf32 r0, r0, %lo(read_me)
load r1, r0, 0, 32
bitop1 r509, r510, 0, OR, 64
load r510, r509, 0, 64
addI r509, r509, 16, 64
jabs r511
Note that the fence cleanly separates the store before it from the code that follows it.
With the itineraries in place, we now see:
subI r509, r509, 16, 64
store r510, r509, 0, 64
bitop1 r510, r509, 0, OR, 64
movimm r1, %hi(write_me), 64
store r0, r510, 12, 32
movimmshf32 r1, r1, %lo(write_me)
movimm r2, %hi(read_me), 64
store r0, r1, 0, 32
movimmshf32 r2, r2, %lo(read_me)
fence 2
load r1, r2, 0, 32
bitop1 r509, r510, 0, OR, 64
load r510, r509, 0, 64
addI r509, r509, 16, 64
jabs r511
The movimm/movimmshf32 pair that sets up the address for the load of read_me has been moved up above the fence.
Is there a way to indicate in the itinerary that the fence's position is fixed, i.e., that no instruction may be reordered “through” the fence/barrier?
Phil