Hi all,
I'm currently working on implementing ACLE extensions for ARM. There
are some memory barrier intrinsics, namely __dsb and __isb, that require
the compiler not to reorder instructions around their corresponding
built-in intrinsics (__builtin_arm_dsb, __builtin_arm_isb), including
non-memory-access instructions.[1] This is currently not possible.
It is sometimes useful to prevent the compiler from reordering
memory-access instructions as well. The only way to do that in both
GCC and LLVM is an inline assembly hack:
asm volatile("" ::: "memory");
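For readers who haven't used the hack, here is a minimal sketch of how it is typically applied today, assuming GCC or Clang (the empty asm with a "memory" clobber is a GNU extension). The names compiler_barrier, publish, shared_data and shared_ready are illustrative only. The clobber makes the compiler assume memory may have changed, so loads and stores are not moved across it; no instruction is emitted, and neither non-memory instructions nor the hardware are constrained.

```c
/* Illustrative compiler-only barrier built on the GNU inline-asm hack.
   It emits no machine instruction; it only tells the compiler that
   memory may have been read or written, so memory accesses are not
   moved across it. It does NOT order non-memory instructions and does
   NOT stop the CPU from reordering anything. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

static int shared_data;
static int shared_ready;

/* Write the payload, then the flag. The barrier keeps the compiler
   from sinking the data store below the flag store. */
static void publish(int value) {
    shared_data = value;
    compiler_barrier();
    shared_ready = 1;
}
```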
I propose adding two compiler scheduling-barrier intrinsics to LLVM:
__schedule_barrier_memory and __schedule_barrier_full. The former only
prevents memory-access instructions from being reordered around it,
and the latter prevents all reordering. That way __isb, for example,
could be implemented along these lines:
inline void __isb() {
    __schedule_barrier_full();
    __builtin_arm_isb();
    __schedule_barrier_full();
}
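Until intrinsics like these exist, the closest approximation with today's tools seems to be wrapping the barrier instruction in memory-clobbering asm. Below is a sketch assuming GCC or Clang and, for the real instruction, AArch64 or ARMv7 and later; sched_barrier_memory, isb_approx and write_then_sync are hypothetical names, and on other targets the isb is simply omitted so the sketch still compiles. Note that this only provides the __schedule_barrier_memory half of the proposal: arithmetic that does not touch memory can still be scheduled across it, which is the gap __schedule_barrier_full is meant to close.

```c
#include <stdint.h>

/* Hypothetical stand-in for the proposed __schedule_barrier_memory():
   stops the compiler from moving loads/stores across this point only. */
#define sched_barrier_memory() __asm__ __volatile__("" ::: "memory")

/* Hypothetical approximation of __isb(). The "memory" clobber on the
   isb asm already acts as a compiler memory barrier; the explicit
   barriers mirror the shape of the proposed implementation. Nothing
   here prevents non-memory instructions from moving across the isb. */
static inline void isb_approx(void) {
    sched_barrier_memory();
#if defined(__aarch64__) || (defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 7)
    __asm__ __volatile__("isb" ::: "memory");
#endif
    sched_barrier_memory();
}

static uint32_t control_shadow;

/* Example use: update some state, then synchronize the instruction
   stream before reading it back. */
static uint32_t write_then_sync(uint32_t value) {
    control_shadow = value;
    isb_approx();
    return control_shadow;
}
```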
Given your examples are in C, I want to ask a clarification question. Are you proposing adding such intrinsics to the LLVM IR? Or to some runtime library? If the latter, *specifically* which one? Or at the MachineInstr layer?
I'm going to proceed under the assumption that you're using C pseudocode for IR. If this is not the case, the rest of this will be off base.
I'm not familiar with the exact semantics of an "isb" barrier, but I think you should look at the existing fence IR instructions. These restrict memory reorderings in the IR. Depending on the platform, they may imply hardware barriers, but they always imply compiler barriers.
If all you want is a compiler barrier with the existing fence semantics w.r.t. reordering, we could consider extending fence with a "compiler only" (bikeshed needed!) attribute.
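For comparison, C11 already exposes something close to a compiler-only fence: atomic_signal_fence, which only has to order accesses with respect to a signal handler running on the same thread, so compilers typically emit no hardware barrier for it. As far as I know, Clang lowers it to an IR fence with single-thread scope, which may be a useful reference point for a "compiler only" variant. A sketch (publish_signal_safe, sf_data and sf_ready are illustrative names):

```c
#include <stdatomic.h>

static int sf_data;          /* plain payload */
static atomic_int sf_ready;  /* flag a signal handler on this thread might poll */

static void publish_signal_safe(int value) {
    sf_data = value;
    /* Release-style fence that only needs to be observed by the same
       thread (e.g. a signal handler): the payload store may not be
       reordered after the flag store. Compilers typically implement
       this as a compiler-only barrier with no fence instruction. */
    atomic_signal_fence(memory_order_release);
    atomic_store_explicit(&sf_ready, 1, memory_order_relaxed);
}
```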
If you're describing a new memory ordering for existing fences, that would seem like a reasonable extension.
I'm not familiar with how we currently handle intrinsics for architecture specific memory barriers. Can anyone else comment on that? Is there a way to tag a particular intrinsic function as *also* being a full fence?
To implement these intrinsics, I think the best method is to add
target-independent pseudo-instructions with appropriate
properties (hasSideEffects for the memory barrier and isTerminator for
the full barrier) and a pseudo-instruction elimination pass after the
scheduling pass.
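To make the proposed structure concrete, here is a toy model in C. All type and function names are hypothetical, and this is not LLVM's MachineInstr API; data dependences are ignored so that only the barrier rules are modelled. The pseudo-instructions constrain which adjacent instructions a scheduler may swap, and a later pass deletes them so they never reach the emitted code. A memory barrier only blocks memory accesses from crossing it; a full barrier blocks everything, roughly what ending the scheduling region (as a terminator would) achieves.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds; not LLVM's real MachineInstr model. */
enum inst_kind {
    INST_ALU,          /* non-memory instruction */
    INST_MEM,          /* load or store */
    PSEUDO_SCHED_MEM,  /* proposed memory scheduling barrier */
    PSEUDO_SCHED_FULL  /* proposed full scheduling barrier */
};

static bool is_pseudo_barrier(enum inst_kind k) {
    return k == PSEUDO_SCHED_MEM || k == PSEUDO_SCHED_FULL;
}

/* May the scheduler swap adjacent instructions a and b?
   Only the barrier rules are modelled; data dependences are ignored. */
static bool can_swap(enum inst_kind a, enum inst_kind b) {
    /* A full barrier pins everything: nothing moves across it. */
    if (a == PSEUDO_SCHED_FULL || b == PSEUDO_SCHED_FULL)
        return false;
    /* A memory barrier lets only non-memory instructions cross it. */
    if (a == PSEUDO_SCHED_MEM)
        return b == INST_ALU;
    if (b == PSEUDO_SCHED_MEM)
        return a == INST_ALU;
    return true;
}

/* Post-scheduling elimination pass: remove the pseudo-instructions in
   place, preserving the order of the rest; returns the new count. */
static size_t eliminate_sched_barriers(enum inst_kind *insts, size_t n) {
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_pseudo_barrier(insts[i]))
            insts[out++] = insts[i];
    return out;
}
```

Running the elimination pass only after scheduling is what lets the barriers constrain the scheduler without costing anything in the final instruction stream.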
Why would your barrier need to be a basic block terminator? That doesn't parse for me. Could you explain?
What do people think of this idea?
I'm honestly unclear on what your problem is and what you're trying to propose. It may take a few rounds of conversation to clarify.
Philip