I have just begun investigating LLVM because it seems like it might be an ideal tool for building tools for playing with software. However, my current project is related to parallel programming on shared memory multiprocessor systems, so I need thread support. As far as I can tell, LLVM currently has no intrinsic support for threads; is this correct? I saw the bug that indicates that LLVM's JIT needs some locks to protect its data structures, but are there any other issues that might be preventing LLVM from executing multithreaded applications?
That is correct. If you try to run threaded programs in the JIT, it might run into problems if multiple threads need to JIT functions at the same time. This should be very simple to deal with; we just haven't had anyone take up the task yet. Only the JIT is affected here, not the static code generator or C backend.
I'm pretty sure that this is the only issue with threaded programs.
To help me in my investigations, I wrote a small program that calls the Linux clone system call. It turns out that this small test program executes correctly with LLVM, which surprised me. Is this just because I got lucky, or is this expected behaviour?
That is expected behavior. You should be able to write programs that use clone(), native pthreads libraries, native uithreads, native win32 threads, etc. LLVM also fully supports the volatile attribute in C.
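As a minimal sketch of the kind of program this covers, assuming a POSIX system with a native pthreads library: the code below starts a few threads that update a shared counter under a mutex. The names run_workers, worker, and MAX_WORKERS are made up for this illustration and do not come from LLVM.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16  /* hypothetical upper bound for this sketch */

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

/* Each worker adds 1 to the shared counter `iterations` times. */
static void *worker(void *arg) {
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts `nthreads` workers, waits for all of them, and returns the
 * final counter value, or -1 if the arguments are out of range or a
 * thread could not be created. */
long run_workers(int nthreads, long iterations) {
    pthread_t threads[MAX_WORKERS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;

    counter = 0;
    for (; started < nthreads; started++) {
        if (pthread_create(&threads[started], NULL, worker, &iterations) != 0)
            break;
    }
    /* Join every thread that was actually started, even on failure. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    return started == nthreads ? counter : -1;
}
```

Calling run_workers(4, 1000) should return 4000. On many systems the pthreads library has to be named at link time with -lpthread.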
Does anyone have any thoughts about whether or how LLVM should support threads? I was thinking that it might be worthwhile to add a few thread primitives to the llvm.* collection of intrinsics. Something like llvm.spawn, llvm.join, llvm.lock, llvm.unlock, and llvm.cas would be sufficient for my purposes.
There has definitely been talk about this. We are slated to get very low-level primitives for compare&swap and atomic increment. I can't say when they will be added (I'm not doing the work myself) but I've been told it should be soon.
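For readers unfamiliar with these two operations, here is a rough sketch of their semantics. It uses C11's <stdatomic.h>, not any LLVM intrinsic, and says nothing about how the LLVM primitives will eventually look. The helper names cas_int and atomic_inc_via_cas are invented for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Compare-and-swap: if *target equals `expected`, store `desired`
 * and return true; otherwise leave *target alone and return false. */
bool cas_int(atomic_int *target, int expected, int desired) {
    return atomic_compare_exchange_strong(target, &expected, desired);
}

/* Atomic increment built from compare-and-swap.
 * Returns the value that was stored before the increment. */
int atomic_inc_via_cas(atomic_int *target) {
    int old = atomic_load(target);
    /* On failure, `old` is refreshed with the current value, so the
     * loop simply retries with the up-to-date value. */
    while (!atomic_compare_exchange_weak(target, &old, old + 1)) {
        /* retry */
    }
    return old;
}
```

The retry loop shows why compare-and-swap is usually treated as the more basic of the two: atomic increment can be expressed in terms of it.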
The other ones are higher level. There is a balance here between what we want to express directly in LLVM and what we want to defer to native libraries. The argument for putting things into LLVM is that it makes it easier for front-end people to produce portable bytecode files (they don't have to provide target-specific runtime libraries). The argument against is that we only want target- and language-independent capabilities to be added to LLVM.
I'm confident that this will eventually be added to LLVM; it's just a matter of finding the right things to add, ones that make sense in the context of LLVM.
Finally, how does LLVM interact with native code? I see in the disassembly that it declares the prototype for the clone function, but how does it locate the implementation? If it ends up calling "dlopen" and "dlsym," how does it know which library to look in? Additionally, is it possible to link against arbitrary native code libraries?
LLVM works exactly like a native compiler. If you call an external function, you have to link to the appropriate (possibly native) library with -lFOO options. Note that llvm-gcc accepts these options, which are particularly handy:
llvm-gcc x.c -o a.out -Wl,-native <- produce a native a.out file with our backend
llvm-gcc x.c -o a.out -Wl,-native-cbe <- produce a native a.out file with the C backend & GCC.
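To make the linking model concrete, here is a small sketch of a source file that only declares an external function and leaves its definition to a native library. It uses atoi from the standard C library; parse_port is a made-up name for this example.

```c
/* Prototype only: the definition lives in the native C library and is
 * resolved by the linker, not by the compiler. */
extern int atoi(const char *text);

/* Hypothetical helper that calls the external function. */
int parse_port(const char *text) {
    return atoi(text);
}
```

Because atoi lives in the C library, which is normally linked by default, no extra option is needed here. A function from another library would need the matching -lFOO option, for example -lm for the math library on typical Unix systems.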