HLVM performance and shadow stack overheads

The (new) HLVM project is continuing to improve and I have graphed and
analysed some performance-related data. Beating OCaml on numerical
performance using LLVM turned out to be quite easy on x86:


This was achieved using a single optimization pass in HLVM (unrolling) and
none of LLVM's own IR optimization passes. So the performance is essentially
due to LLVM's excellent x86 code gen and sane IR generation by HLVM itself.

Also, many people have criticized LLVM's support for garbage collectors and
were quick to dismiss the simple shadow stack approach that I have used with
HLVM. So I thought it would be interesting to quantify the overheads:


These results show that even a completely naive shadow stack and GC
implementation like the one currently in HLVM has quite reasonable
performance. In particular, suitable tweaking brings HLVM well within a
factor of two of OCaml's performance on the list-based 10-queens benchmark.
This is remarkable given that OCaml is one of the most highly optimized
single-threaded run-times in existence.
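
For readers unfamiliar with the technique, here is a minimal sketch in C of
what a shadow stack does. It is not HLVM's actual code, and every name in it
(Obj, push_root, restore_roots, mark_from_roots, example) is hypothetical and
chosen only for illustration. The idea is that each function registers the
addresses of its live references on an explicit side stack, so the collector
can find its roots without scanning the machine stack. The register and
restore steps on every call are where the overhead comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap object: a mark bit and one outgoing reference. */
typedef struct Obj {
    int marked;
    struct Obj *next;
} Obj;

#define SHADOW_STACK_MAX 1024

/* The shadow stack: addresses of local variables that hold references.
   It is maintained explicitly by the generated code. */
static Obj **shadow_stack[SHADOW_STACK_MAX];
static size_t shadow_top = 0;

/* Register a local reference as a GC root (done on function entry). */
static void push_root(Obj **slot) {
    assert(shadow_top < SHADOW_STACK_MAX);
    shadow_stack[shadow_top++] = slot;
}

/* Drop this function's roots again (done on function exit). */
static void restore_roots(size_t saved_top) {
    shadow_top = saved_top;
}

/* Mark phase: walk every root on the shadow stack and mark whatever is
   reachable from it. Returns the number of newly marked objects. */
static size_t mark_from_roots(void) {
    size_t count = 0;
    for (size_t i = 0; i < shadow_top; i++) {
        for (Obj *o = *shadow_stack[i]; o != NULL && !o->marked; o = o->next) {
            o->marked = 1;
            count++;
        }
    }
    return count;
}

/* A function as a compiler might emit it: save the stack depth, register
   the local root, do some work, then restore the depth before returning. */
static size_t example(Obj *a) {
    size_t saved = shadow_top;
    Obj *x = a;              /* local reference that must survive a GC */
    push_root(&x);
    /* Stands in for a collection triggered by an allocation. */
    size_t n = mark_from_roots();
    restore_roots(saved);
    return n;
}
```

Calling example on the head of a three-element chain marks all three objects
and leaves the shadow stack at its original depth, while an object that is
not reachable from any registered root stays unmarked and would be reclaimed
by a subsequent sweep.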

In the future, I intend to focus on optimizations that relieve GC stress
rather than on optimizing the GC itself. I also intend to add support for
parallelism which, although simple in design, should make multicores far more
useful for ML programmers.

Many thanks,