Heuristic for choosing between MCJIT and Interpreter

I’m facing a situation where I have generated IR that only needs to be executed once. I’ve noticed for simple IR it’s faster to run the interpreter on it, but for complex IR it’s much better to JIT compile and execute it. I’m seeking suggestions for a good heuristic to decide which approach to take for any given IR. I’m leaning in favor of deciding based on the presence/absence of loops.
-Josh

Hi Josh,

In the past, when facing a similar situation, I've decided based on
maintenance, not IR heuristics. Maintaining two execution paths and
making sure they both get updated whenever you change something may
be worse than making MCJIT the default and then improving the JIT
until it is fast and reliable. That seems better than avoiding the
issue now and muddying things up in the future.

cheers,
--renato

Hi Josh,

What are you generating IR from? You may find that an AST interpreter, although slow, will be faster than going to the effort of generating LLVM IR and then interpreting it. Interpreting LLVM IR is much slower than interpreting bytecodes for high-level languages, because you pay the interpretation overhead for operations that often map to a single machine instruction, whereas high-level bytecodes tend to amortise the interpretation cost by having complex operations.

I've found having a working AST interpreter to be good for testing an LLVM-based JIT, as you can run the same code with both and check that the same externally visible actions happen in the same order.
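As a rough illustration of that kind of differential testing, here is a minimal sketch. Everything in it is hypothetical: the toy AST (`Node`), the tree-walking `interpret`, and the `compile`/`run` pair, which is only a stand-in for a real LLVM-based JIT. The only externally visible action modelled is a "print", recorded into a trace so the two engines can be compared.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy AST: numbers, addition, and a "print" node whose output is
// the externally visible action both engines must agree on.
struct Node {
  enum Kind { Num, Add, Print };
  Kind kind = Num;
  long value = 0;                  // used by Num
  std::unique_ptr<Node> lhs, rhs;  // Add uses both, Print uses lhs only
};

using Trace = std::vector<long>;   // printed values, in order

inline std::unique_ptr<Node> num(long v) {
  auto n = std::make_unique<Node>();
  n->kind = Node::Num;
  n->value = v;
  return n;
}

inline std::unique_ptr<Node> add(std::unique_ptr<Node> a, std::unique_ptr<Node> b) {
  auto n = std::make_unique<Node>();
  n->kind = Node::Add;
  n->lhs = std::move(a);
  n->rhs = std::move(b);
  return n;
}

inline std::unique_ptr<Node> print(std::unique_ptr<Node> a) {
  auto n = std::make_unique<Node>();
  n->kind = Node::Print;
  n->lhs = std::move(a);
  return n;
}

// Reference engine: a direct tree walk.
inline long interpret(const Node &n, Trace &out) {
  switch (n.kind) {
  case Node::Num:
    return n.value;
  case Node::Add: {
    long a = interpret(*n.lhs, out);  // left operand first, then right
    long b = interpret(*n.rhs, out);
    return a + b;
  }
  case Node::Print: {
    long v = interpret(*n.lhs, out);
    out.push_back(v);                 // record the visible action
    return v;
  }
  }
  return 0;
}

// Engine under test (stand-in for a JIT): compile to a flat stack program.
struct Op {
  enum Code { Push, AddTop, PrintTop };
  Code code;
  long imm;
};

inline void compile(const Node &n, std::vector<Op> &prog) {
  switch (n.kind) {
  case Node::Num:
    prog.push_back({Op::Push, n.value});
    break;
  case Node::Add:
    compile(*n.lhs, prog);
    compile(*n.rhs, prog);
    prog.push_back({Op::AddTop, 0});
    break;
  case Node::Print:
    compile(*n.lhs, prog);
    prog.push_back({Op::PrintTop, 0});
    break;
  }
}

inline long run(const std::vector<Op> &prog, Trace &out) {
  std::vector<long> stack;
  for (const Op &op : prog) {
    switch (op.code) {
    case Op::Push:
      stack.push_back(op.imm);
      break;
    case Op::AddTop: {
      long b = stack.back();
      stack.pop_back();
      stack.back() += b;
      break;
    }
    case Op::PrintTop:
      out.push_back(stack.back());    // print leaves the value on the stack
      break;
    }
  }
  return stack.empty() ? 0 : stack.back();
}

// Differential check: same result and same printed values in the same order.
inline bool enginesAgree(const Node &program) {
  Trace expected, actual;
  long r1 = interpret(program, expected);
  std::vector<Op> prog;
  compile(program, prog);
  long r2 = run(prog, actual);
  return r1 == r2 && expected == actual;
}
```

In a real setup the trace would presumably cover whatever your language can observe from outside (I/O, calls into the host, stores to shared memory), and `run` would be replaced by executing the JIT-compiled code.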

David makes a good set of points here. The only reason not to take the approach he is suggesting is that it does involve writing and maintaining additional code. If you don't really care about the performance of your cold code, using the LLVM interpreter is definitely an option. (Keep in mind that there's little active development happening on the interpreter and it may have started to bitrot.)

Given your original question, I'm guessing you do care about the performance of your cold code. Given that, David has suggested the best approach.

Thanks to everyone for the helpful feedback. I was able to incorporate the interpreter and loop heuristic with very little additional code, so I’m not immediately concerned about the extra maintenance introduced. An AST interpreter does sound sensible in the long run, but is more effort than I can justify spending at the moment. If the IR interpreter does bitrot (a shame) the MCJIT alone is probably still fast enough for my use.
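For readers wondering what a loop-based heuristic of this kind could look like, here is a minimal sketch. It is not the code from this thread and it does not use LLVM's API: the control-flow graph is a hypothetical adjacency list (`Cfg`) with block 0 as the entry, and `chooseEngine` is an invented name. A loop is detected as a back edge found during a depth-first search; with real IR one would more likely rely on LLVM's own analyses such as LoopInfo.

```cpp
#include <cstddef>
#include <vector>

// Toy control-flow graph: cfg[b] lists the successor blocks of block b.
// This is a stand-in for real IR, not an LLVM data structure.
using Cfg = std::vector<std::vector<std::size_t>>;

enum class Engine { Interpreter, Jit };

namespace detail {
enum class Mark { Unvisited, OnStack, Done };

// Depth-first search: an edge to a block that is still on the DFS stack is a
// back edge, which means the graph contains a cycle (a loop).
inline bool hasBackEdge(const Cfg &cfg, std::size_t block,
                        std::vector<Mark> &marks) {
  marks[block] = Mark::OnStack;
  for (std::size_t succ : cfg[block]) {
    if (marks[succ] == Mark::OnStack)
      return true;
    if (marks[succ] == Mark::Unvisited && hasBackEdge(cfg, succ, marks))
      return true;
  }
  marks[block] = Mark::Done;
  return false;
}
} // namespace detail

// True if a cycle is reachable from the entry block (assumed to be block 0).
// Successor indices are assumed to be valid block numbers.
inline bool containsLoop(const Cfg &cfg) {
  if (cfg.empty())
    return false;
  std::vector<detail::Mark> marks(cfg.size(), detail::Mark::Unvisited);
  return detail::hasBackEdge(cfg, 0, marks);
}

// Heuristic from the thread: loop-free code goes to the interpreter,
// anything with a loop is handed to the JIT.
inline Engine chooseEngine(const Cfg &cfg) {
  return containsLoop(cfg) ? Engine::Jit : Engine::Interpreter;
}
```

Note that this only looks at one function's control flow. Recursion and loops inside callees would need to be accounted for separately, for example by treating any call to a function that itself contains a loop as a reason to JIT.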

v/r,
Josh

Longer term, you might want to look at the Alphabet Soup project from Oracle Labs. They apply type feedback to an AST interpreter and get very good performance out of it (>20% of compiled code - they did have JavaScript running faster than V8 for a while, but V8 is now quite a bit faster). The ASTs that they produce as a result of the type feedback are then ideal sources for feeding into your LLVM back end.
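To make the idea of type feedback in an AST interpreter a bit more concrete, here is a toy sketch. It is an assumption-laden illustration, not the Oracle Labs implementation: `AddNode`, its `State` enum, and the `Value` type are all invented for this example, and it needs C++17 for `std::variant`. The node records which operand types it has actually seen and keeps a guarded integer fast path until that observation is violated.

```cpp
#include <variant>

// Hypothetical dynamically typed value: either an integer or a double.
using Value = std::variant<long, double>;

inline double toDouble(const Value &v) {
  return std::visit([](auto x) { return static_cast<double>(x); }, v);
}

// Toy "add" node that specializes itself based on the operand types it sees.
struct AddNode {
  enum class State { Uninitialized, IntOnly, Generic };
  State state = State::Uninitialized;

  Value execute(const Value &a, const Value &b) {
    const bool bothInt =
        std::holds_alternative<long>(a) && std::holds_alternative<long>(b);
    switch (state) {
    case State::Uninitialized:
      // First execution: record the observed types, then re-dispatch.
      state = bothInt ? State::IntOnly : State::Generic;
      return execute(a, b);
    case State::IntOnly:
      if (bothInt)  // guard on the recorded assumption
        return Value(std::get<long>(a) + std::get<long>(b));
      state = State::Generic;  // assumption violated: generalize for good
      return execute(a, b);
    case State::Generic:
      if (bothInt)
        return Value(std::get<long>(a) + std::get<long>(b));
      return Value(toDouble(a) + toDouble(b));
    }
    return Value(0L);
  }
};
```

After a warm-up run, a node still in the `IntOnly` state tells a compiler back end that an integer add plus a guard is probably enough, which is the sense in which the specialized tree is a good input for code generation.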

Of course, this is without knowing anything about your source language. If it has a strict static type system, then this approach will not be relevant.

David