Parallelism in HLVM

The HLVM project is a high-level VM optimized for scientific computing:

In November I implemented the first working version of a garbage collector
that can collect from threads running in parallel. Initial performance was
awful due to the overhead of accessing thread-local data via POSIX pthreads.
I have just finished optimizing HLVM so that thread-local data are now passed
as an auxiliary argument to every HLVM function. This has dramatically
improved performance: single-threaded code now runs within 25% of its
performance under the serial collector.
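
As a rough illustration of the difference between the two approaches, here is
a minimal C sketch. It is not HLVM's actual generated code: the names
thread_state, bind_state, alloc_via_tls, alloc_via_arg and count_both_ways are
hypothetical, and the per-thread state is reduced to a single counter. The
first style looks the state up through pthread_getspecific on every call; the
second receives it as an extra argument.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-thread state; HLVM's real layout is not shown here. */
typedef struct {
    long allocated; /* number of allocations made by this thread */
} thread_state;

static pthread_key_t state_key;
static pthread_once_t state_key_once = PTHREAD_ONCE_INIT;

/* Create the key exactly once, no matter how many threads call bind_state. */
static void make_key(void) {
    int rc = pthread_key_create(&state_key, NULL);
    assert(rc == 0);
}

/* Associate a state (or NULL) with the calling thread. */
static void bind_state(thread_state *ts) {
    pthread_once(&state_key_once, make_key);
    int rc = pthread_setspecific(state_key, ts);
    assert(rc == 0);
}

/* Style 1: every call pays for a lookup through the pthreads API. */
static void alloc_via_tls(void) {
    thread_state *ts = pthread_getspecific(state_key);
    ts->allocated++;
}

/* Style 2: the caller passes the state along as an auxiliary argument. */
static void alloc_via_arg(thread_state *ts) {
    ts->allocated++;
}

/* Perform n allocations in each style and return the combined count. */
static long count_both_ways(long n) {
    thread_state ts = { 0 };
    bind_state(&ts);
    for (long i = 0; i < n; i++) alloc_via_tls();
    for (long i = 0; i < n; i++) alloc_via_arg(&ts);
    bind_state(NULL); /* do not leave a pointer to a dead stack frame bound */
    return ts.allocated;
}
```

Both styles update the same counter, so count_both_ways(n) returns 2 * n. In
the second style the pointer can stay in a register across calls, which is
presumably where the saving comes from, at the cost of an extra parameter on
every function.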

However, one test fails with a segfault when JIT-compiled with tail call
optimization (TCO) enabled, and I don't know why.