Attached is the target-independent llvm.atomic.barrier support, as
well as alpha and x86 (sse2) support. This matches Chandler's
definitions, and the LangRef patch will just restore that. A non-sse2
barrier will also be needed. I think it is "lock; mov %esp, %esp", but
I'm not sure.
Any objections? I'll take a hack at the front end support for
__sync_synchronize after this goes in.
Andrew
barrier.patch (6.47 KB)
This is the gcc side of the patch.
Index: gcc/llvm-convert.cpp
This looks good to me after a cursory scan, and seems to match what I had worked up, with a much cleaner form on the codegen/backend side. =D I'm glad you've had some time to start hacking on this; it may be several more weeks before I get any time to work on these and other LLVM projects.
Hopefully someone can chime in on appropriateness of the DAG and target implementations. One question: is "wmb" not actually useful on Alpha? My reading of docs had indicated it provided store-store memory barrier functionality, but I'm far from an expert on the architecture.
Again, glad to see some work on this, and glad you had some time!
-Chandler
Andrew Lenharth wrote:
> Hopefully someone can chime in on appropriateness of the DAG and target
> implementations. One question: is "wmb" not actually useful on Alpha? My
> reading of docs had indicated it provided store-store memory barrier
> functionality, but I'm far from an expert on the architecture.
wmb is probably useful and can be added should someone really want it
fairly easily, but since alpha isn't much used, I don't mind being
overly conservative there. The bigger problem is the pre-SSE2 x86
version.
> Again, glad to see some work on this, and glad you had some time!
We have similar extensions sitting around for some research projects
and I wanted to reduce our code base.
Andrew
> I'll take a hack at the front end support for
> __sync_synchronize after this goes in.
> This is the gcc side of the patch.
GCC 4.2 compiles this to a no-op on x86:
void foo() {
__sync_synchronize();
}
Are you seeing different behavior? What am I missing here?
-Chris
> I'll take a hack at the front end support for
> __sync_synchronize after this goes in.
> This is the gcc side of the patch.
Thanks for tackling this Andrew.
Please prepare a patch for LangRef.html that explains what this thing does :). What are all those bools? Once that is available I'll review the rest of the llvm patch.
In the call below, you don't have to pass in 4 i1's. Just passing in the intrinsic ID should be fine. The type list is for intrinsics that take 'any' as an argument.
Finally, don't forget the 80 column rule.
-Chris
> Please prepare a patch for LangRef.html that explains what this thing
> does :). What are all those bools? Once that is available I'll
> review the rest of the llvm patch.
> In the call below, you don't have to pass in 4 i1's. Just passing in
> the intrinsic ID should be fine. The type list is for intrinsics that
> take 'any' as an argument.
The intrinsic takes 4 bools.
> Finally, don't forget the 80 column rule.
Yea, already noticed that.
I see the same, and I don't know why. __sync_synchronize() is supposed
to be a full memory barrier, yet gcc doesn't generate a barrier on x86.
It does on alpha. X86 will do load-load reordering, so I would expect a
"full memory barrier" primitive in gcc to actually generate a barrier
(and at least the Linux kernel implements barriers on x86 for hardware
io). In any event, generating the intrinsic all the time should keep
the optimizations from reordering when the programmer doesn't want them
to, and the codegen for x86 can always remove unnecessary barriers.
Andrew
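For readers following along, here is a minimal sketch of the kind of code where the barrier matters. It is not part of the patch: the mailbox struct and both function names are hypothetical, and the only real API used is GCC's __sync_synchronize() builtin. The writer puts a full barrier between the data store and the flag store, and the reader puts one between the flag load and the data load.

```c
/* Hypothetical mailbox, for illustration only (not from the patch). */
struct mailbox {
    volatile int payload;  /* the data being handed over */
    volatile int ready;    /* set to 1 once payload is valid */
};

/* Writer side: store the payload, then the flag. The full barrier in
   between keeps the compiler, and the CPU where a fence instruction is
   emitted, from reordering the two stores. */
void publish(struct mailbox *m, int value)
{
    m->payload = value;
    __sync_synchronize();
    m->ready = 1;
}

/* Reader side: returns 1 and fills *out if a payload has been
   published, otherwise returns 0 and leaves *out untouched. */
int try_consume(struct mailbox *m, int *out)
{
    if (!m->ready)
        return 0;
    __sync_synchronize();  /* keep the payload load after the flag load */
    *out = m->payload;
    return 1;
}
```

In a real program publish() and try_consume() would run on different threads. Whether the builtin expands to an actual fence instruction depends on the compiler version and target, which is exactly the question raised in this thread.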
I know, but you don't have to pass them into Intrinsic::getDeclaration.
-Chris
+ // MEMBARRIER - This corresponds to the atomic.barrier intrinsic.
+ // it takes 4 operands to specify the type of barrier:
+ // ll, ls, sl, ss
+ MEMBARRIER,
This should specify the input and output values, like EH_RETURN does.
+def : Pat<(membarrier (i8 1), (i8 0), (i8 0), (i8 0)), (LFENCE)>;
+def : Pat<(membarrier (i8 imm:$ll), (i8 imm:$ls), (i8 imm:$sl), (i8 imm:$ss)),
+ (MFENCE)>;
Very nice. I'll trust Chandler that these are right. 
Looks great Andrew, please commit. Thanks!
-Chris
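As a reading aid, the two x86 patterns quoted above can be modelled in plain C roughly as follows. This is only a sketch of the selection logic as it appears in the quoted excerpt, not LLVM code: the enum and function names are hypothetical, and the flag order (ll, ls, sl, ss) is taken from the MEMBARRIER comment in the patch.

```c
#include <stdbool.h>

/* The two fence instructions that appear in the quoted patterns. */
enum x86_fence { X86_LFENCE, X86_MFENCE };

/* Hypothetical helper mirroring the two Pat<> entries:
   - (1, 0, 0, 0), a load-load-only barrier, selects LFENCE;
   - any other combination of the four flags falls through to MFENCE. */
enum x86_fence select_x86_fence(bool ll, bool ls, bool sl, bool ss)
{
    if (ll && !ls && !sl && !ss)
        return X86_LFENCE;
    return X86_MFENCE;
}
```

The second pattern acts as a catch-all, so any barrier request not matched by the more specific first pattern conservatively gets the strongest fence.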
> GCC 4.2 compiles this to a no-op on x86:
> void foo() {
> __sync_synchronize();
> }
> Are you seeing different behavior? What am I missing here?
Maybe the processor does a memory barrier when it executes
a call instruction.
Ciao,
Duncan.
I had tried several variants of that with loads and stores around the
barrier. GCC never generated a barrier. From what I've read, a barrier
is not needed on x86 (at least post-ppro) when you are accessing cached
memory, only for other kinds of memory, so I am assuming gcc is making
that assumption about loads and stores.
Andrew