Instruction itineraries and fence/barrier instructions

We improved our instruction itineraries and now we’re seeing our testcases for fence instructions break.

For example, we have this testcase:

@write_me = external global i32
@read_me = external global i32

; Function Attrs: nounwind
define i32 @xstg_intrinsic(i32 %foo) #0 {
entry:
; CHECK: store r0, r1, 0, 32
; CHECK-NEXT: fence 2
%foo.addr = alloca i32, align 4
store i32 %foo, i32* %foo.addr, align 4
%0 = load i32* %foo.addr, align 4
store volatile i32 %0, i32* @write_me, align 4
call void @llvm.xstg.memory.barrier(i32 2, i8 0)
%1 = load volatile i32* @read_me, align 4
ret i32 %1
}

declare void @llvm.xstg.memory.barrier(i32, i8)

attributes #0 = { nounwind }

Before we added the instruction itineraries, the generated code was:

xstg_intrinsic: # @xstg_intrinsic
# BB#0: # %entry

subI r509, r509, 16, 64
store r510, r509, 0, 64
bitop1 r510, r509, 0, OR, 64
store r0, r510, 12, 32
movimm r1, %hi(write_me), 64
movimmshf32 r1, r1, %lo(write_me)
store r0, r1, 0, 32
fence 2
movimm r0, %hi(read_me), 64
movimmshf32 r0, r0, %lo(read_me)
load r1, r0, 0, 32
bitop1 r509, r510, 0, OR, 64
load r510, r509, 0, 64
addI r509, r509, 16, 64
jabs r511

Note the separation between the store prior to the fence and the code that comes after.

Now that the itineraries are in place, we see:

subI r509, r509, 16, 64
store r510, r509, 0, 64
bitop1 r510, r509, 0, OR, 64
movimm r1, %hi(write_me), 64
store r0, r510, 12, 32
movimmshf32 r1, r1, %lo(write_me)
movimm r2, %hi(read_me), 64
store r0, r1, 0, 32
movimmshf32 r2, r2, %lo(read_me)
fence 2
load r1, r2, 0, 32
bitop1 r509, r510, 0, OR, 64
load r510, r509, 0, 64
addI r509, r509, 16, 64
jabs r511

The movimm/movimmshf32 pair that sets up the address for the load has been moved up above the fence.

Is there a way to indicate in the itinerary that the position of the fence is fixed, i.e. that no instruction may be reordered “through” the fence/barrier?

Phil
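Here is one way to see why better itineraries alone can move code across the fence. A list scheduler may emit instructions in any order that respects the dependence graph, and the itineraries (latencies and resource usage) only change which legal order it prefers. The sketch below assumes that the fence is ordered only against memory operations (store -> fence -> load), plus the data dependences through the address registers. That is consistent with both outputs above but is an assumption, not something taken from the xstg backend. Edge, fenceRegionEdges and respectsEdges are hypothetical helpers, not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the dependence edges for the tail of @xstg_intrinsic.
// Assumption: the fence is chained only to memory operations. The frame
// store (store r0, r510, 12, 32) is left out because it does not matter here.
struct Edge {
  std::string from;  // must be issued first
  std::string to;    // must be issued after `from`
};

std::vector<Edge> fenceRegionEdges() {
  return {
      // Address of write_me feeds the volatile store.
      {"movimm_w", "movimmshf32_w"},
      {"movimmshf32_w", "store_w"},
      // Memory-chain edges through the fence.
      {"store_w", "fence"},
      {"fence", "load_r"},
      // Address of read_me feeds the load; note there is no edge to the fence.
      {"movimm_r", "movimmshf32_r"},
      {"movimmshf32_r", "load_r"},
  };
}

// Returns true if every instruction in `order` comes after all of its
// predecessors, i.e. `order` is a legal schedule for `edges`.
bool respectsEdges(const std::vector<std::string>& order,
                   const std::vector<Edge>& edges) {
  std::map<std::string, std::size_t> pos;
  for (std::size_t i = 0; i < order.size(); ++i) pos[order[i]] = i;
  for (const Edge& e : edges) {
    auto f = pos.find(e.from), t = pos.find(e.to);
    if (f == pos.end() || t == pos.end()) return false;  // missing instruction
    if (f->second >= t->second) return false;            // edge violated
  }
  return true;
}
```

Both the pre- and post-itinerary orders satisfy these edges, while an order that hoists the load above the fence does not. That matches the observation that only the address setup moved.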

I don’t see a change relative to the memory instructions. Do you mean you want the fence to stop any instruction, not just memory operations, from being scheduled across it? Does the instruction have isSideEffects set on it? I think the fallback, if that isn’t enough, is to override TargetInstrInfo::isSchedulingBoundary.

-Matt
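The following is a minimal sketch of what overriding the hook buys. It assumes the scheduler treats each stretch of instructions between boundaries as a separate region and never moves an instruction from one region into another; as far as I know, that is how LLVM's schedulers use isSchedulingBoundary. Once the fence is a boundary, the address setup for the load can no longer be hoisted above it. splitAtBoundaries and xstgIsBoundary are hypothetical names. A real override would inspect the MachineInstr opcode rather than a string.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

using Region = std::vector<std::string>;

// Hypothetical model: split a block into scheduling regions at every
// instruction for which `isBoundary` returns true. The boundary itself is
// not part of any region, so it stays where it is.
std::vector<Region> splitAtBoundaries(
    const std::vector<std::string>& block,
    const std::function<bool(const std::string&)>& isBoundary) {
  std::vector<Region> regions;
  Region current;
  for (const std::string& mi : block) {
    if (isBoundary(mi)) {
      // Close the region that ends just before the boundary.
      if (!current.empty()) regions.push_back(current);
      current.clear();
      continue;
    }
    current.push_back(mi);
  }
  if (!current.empty()) regions.push_back(current);
  return regions;
}

// Hypothetical target predicate in the spirit of the override:
// any instruction whose text starts with "fence" is a boundary.
bool xstgIsBoundary(const std::string& mi) {
  return mi.rfind("fence", 0) == 0;
}
```

The trade-off is that a boundary stops every instruction from moving across the fence, including moves that would be harmless. Ordering edges on the fence, by contrast, only constrain the instructions they connect.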

> _______________________________________________
> LLVM Developers mailing list
> llvm-dev@lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

> I don’t see a change relative to the memory instructions.

True, we may be a bit too skittish about this. Perhaps the solution is to
change the testcase so that it only checks that the relative order of the
store and the fence has been preserved.
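Relaxing the test should be enough here. CHECK-NEXT requires the fence on the line immediately after the store, while a plain CHECK only requires it to appear somewhere later. In the itinerary-scheduled output the store and the fence are still in order. They are just no longer adjacent, because movimmshf32 r2 now sits between them. The sketch below is a simplified model of those two directives (plain substring matching, one match per line), not FileCheck itself. Directive and runChecks are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical model of two FileCheck directives.
struct Directive {
  bool next;            // true for CHECK-NEXT, false for CHECK
  std::string pattern;  // plain substring to look for
};

bool runChecks(const std::vector<std::string>& output,
               const std::vector<Directive>& checks) {
  std::size_t line = 0;  // first line the next directive may look at
  bool first = true;
  for (const Directive& d : checks) {
    if (d.next && !first) {
      // CHECK-NEXT: only the line right after the previous match counts.
      if (line >= output.size() ||
          output[line].find(d.pattern) == std::string::npos)
        return false;
      ++line;
    } else {
      // CHECK: scan forward for the first matching line.
      while (line < output.size() &&
             output[line].find(d.pattern) == std::string::npos)
        ++line;
      if (line == output.size()) return false;
      ++line;
    }
    first = false;
  }
  return true;
}
```

In the testcase, this would mean changing the second check line to "; CHECK: fence 2".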

> Do you mean you want the fence to stop any instruction, not just memory operations, from being scheduled across it? Does the instruction have isSideEffects set on it?

Where can I find information about isSideEffects? Googling "LLVM
isSideEffects" didn't turn up anything that looked relevant.

> I think the fallback, if that isn’t enough, is to override TargetInstrInfo::isSchedulingBoundary.

Thanks, I'll look at how other targets use that.

Phil

It's hasSideEffects. Look in Target.td / TargetInstrInfo.h. For a memory fence, I think it should be sufficient to set mayLoad = 1 and mayStore = 1 and not give the fence any memory operands.

Forgot to add the list.
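The mayLoad/mayStore suggestion relies on how the scheduler reasons about memory. With both flags set and no memory operands, the fence looks like an access to an unknown address. It must therefore stay ordered against every load and store, yet it still places no constraint on non-memory instructions such as movimm. Here is a toy model of that rule. ToyInstr, touchesMemory and needsOrdering are hypothetical names, not LLVM APIs, and the model ignores volatility and other details.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified instruction description (C++17 for std::optional).
struct ToyInstr {
  std::string name;
  bool mayLoad;
  bool mayStore;
  std::optional<std::string> memOperand;  // symbol accessed, if known
};

bool touchesMemory(const ToyInstr& i) { return i.mayLoad || i.mayStore; }

// Returns true if the two instructions need an ordering (chain) edge.
bool needsOrdering(const ToyInstr& a, const ToyInstr& b) {
  // Non-memory instructions such as movimm are never chained.
  if (!touchesMemory(a) || !touchesMemory(b)) return false;
  // No memory operand means "could be any address": assume it aliases.
  if (!a.memOperand || !b.memOperand) return true;
  // Otherwise only accesses to the same symbol are treated as aliasing.
  return *a.memOperand == *b.memOperand;
}
```

This is consistent with the original output, where the store/fence/load order is kept while the movimm/movimmshf32 address setup is free to move above the fence.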

OK, we do have hasSideEffects set on the fence instruction. I wonder if we should also set isBarrier?

Phil

It certainly wouldn’t hurt to try, and it seems like a better option for what you want, though now I’m curious about the limitations of hasSideEffects and exactly how the scheduler treats it. Let me know if this works out for you. Thanks.

-Ryan
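Before trying isBarrier, it may help to check what the flag means. In Target.td it is commented as "Can control flow fall through this instruction?", and it is used for unconditional branches and returns. It therefore describes control flow rather than memory ordering. The sketch below models only that rule, using hypothetical types (ToyMI, canFallThrough) rather than LLVM's API. It shows that a fence marked isBarrier would claim that control cannot continue past it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the isBarrier meaning: control flow does not
// fall through an instruction that has the flag set.
struct ToyMI {
  std::string name;
  bool isBarrier;
};

// Returns true if execution can continue into the next block in layout
// order, which requires that the block does not end in a barrier.
bool canFallThrough(const std::vector<ToyMI>& block) {
  return block.empty() || !block.back().isBarrier;
}
```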