Is there any guarantee that the NVPTX intrinsic “llvm.nvvm.barrier0” will not be moved around by opt?
In other words, can I expect all the instructions above “llvm.nvvm.barrier0” to remain above it, and those below it to remain below, after all the opt passes have run?
If that is not the case, is there a way to define such an intrinsic?
AFAIU, yes. Here's the definition:
def int_nvvm_barrier0 : GCCBuiltin<"__nvvm_bar0">,
      Intrinsic<[], [], [IntrNoDuplicate]>;
Note that IntrNoDuplicate is the only attribute on this intrinsic. It has no
other attributes (like IntrNoMem) that would make it permissible for LLVM
optimizations to reorder memory operations around it. By default, the
optimizers do not do this for function calls; they do so only if the calls
are marked with special attributes that permit it.
I wrote test.ll as below and ran ‘opt’ on it as
"opt -std-compile-opts test.ll -S -o -". But the output shows that there is code motion around the barrier intrinsics.
llvm.nvvm.barrier0 corresponds to __syncthreads in CUDA. Moving around the arithmetic instructions in your example should be fine, because they do not access memory.
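To make the distinction concrete, here is a minimal sketch of the kind of IR in question. It is not the poster's test.ll. The kernel name @k and its parameters are made up for illustration, and it uses current opaque-pointer syntax, which is newer than the opt invocation quoted above.

```llvm
; Hypothetical kernel, for illustration only.
target triple = "nvptx64-nvidia-cuda"

declare void @llvm.nvvm.barrier0()

define void @k(ptr addrspace(3) %smem, i32 %a, i32 %b, ptr %out) {
entry:
  ; Memory operation: the barrier call may read or write memory as far
  ; as the optimizer knows, so this store is expected to stay above it.
  store i32 %a, ptr addrspace(3) %smem
  ; Pure arithmetic: touches no memory, so opt is free to place it on
  ; either side of the barrier.
  %sum = add i32 %a, %b
  call void @llvm.nvvm.barrier0()
  ; Memory operation: expected to stay below the barrier.
  %v = load i32, ptr addrspace(3) %smem
  %r = mul i32 %v, %sum
  store i32 %r, ptr %out
  ret void
}
```

If the add ends up below the barrier after optimization, the program's observable behavior is unchanged, which is why that kind of motion is considered fine for __syncthreads.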
The actual reason I wanted such an intrinsic is to solve a similar problem on X86. Say I want to read the “mxcsr” register (which is the status register for SSE instructions) after a particular instruction; then I need a kind of barrier intrinsic that will not allow arithmetic instructions to move across it. Otherwise I will be reading the status of some other instruction.
I can’t think of any NVPTX intrinsic that disallows moving even arithmetic instructions.
If you are trying to read some special registers in PTX, can you use inline assembly and mark it as having side effects? I think LLVM’s optimizer is very conservative about inline assembly marked with sideeffect, and will probably solve your code motion issue.
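A rough sketch of that suggestion, assuming the special register of interest is PTX's %clock. The function name @timed and its operands are hypothetical, and the syntax is again current opaque-pointer IR.

```llvm
; Hypothetical example, for illustration only.
target triple = "nvptx64-nvidia-cuda"

define i32 @timed(float %x, float %y, ptr %out) {
entry:
  ; "sideeffect" keeps the asm from being deleted or merged, and the
  ; "~{memory}" clobber tells the optimizer it may touch memory.
  %t0 = call i32 asm sideeffect "mov.u32 $0, %clock;", "=r,~{memory}"()
  ; Pure arithmetic with no data dependence on the asm result.
  %p = fmul float %x, %y
  ; Memory operation: not expected to cross an asm with a memory clobber.
  store float %p, ptr %out
  %t1 = call i32 asm sideeffect "mov.u32 $0, %clock;", "=r,~{memory}"()
  %dt = sub i32 %t1, %t0
  ret i32 %dt
}
```

The two asm calls should keep their relative order, and the store should stay between them. The fmul is only anchored indirectly, through the store that consumes its result; arithmetic with no such dependence has nothing at the IR level tying it to either asm call.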
Have a look at how the ARM backend handles the CPSR register. It sounds like
what you're really looking for is liveness of that status register not to be
clobbered between the arithmetic instruction you're inspecting and the
instruction that reads that register.
I understand that. Once control reaches the target back-end, I can prevent instructions from moving around an intrinsic by defining an SDNode for the intrinsic, setting its properties appropriately, custom lowering it, etc. But my question was about how to stop the “opt” passes from moving arithmetic instructions around the intrinsic. For example, we have the “llvm.arm.set.fpscr” intrinsic to set the rounding mode of the arithmetic instructions following it. But if the “opt” passes move arithmetic instructions around it, then the results are wrong. I am trying to check whether anyone already has a solution for this.
This issue comes up every once in a while, see for example:
Or, for something much older:
The bottom line is that, unfortunately, there is no good way to enforce that at the IR level.