Deep JIT specialization

Hi all,

I’m attempting to use LLVM for run-time code specialization, but I’m facing a performance hurdle. I’m currently performing the specialization during the AST to LLVM IR translation, but unfortunately this leads to relatively slow recompiles as LLVM has to perform all the heavy (optimization) passes over and over again.

So I was hoping that by first creating unspecialized LLVM IR, optimizing that as much as possible, and then performing the specializations starting from this optimized IR, the recompiles would be significantly faster. Currently the mem2reg and instcombine passes take the majority of compilation time, which could be avoided using “deep” JIT specialization.

So my question is how do I get started with this? Currently tracking the specialization parameters and caching the generated specialized functions is done outside of LLVM. So I imagine I’ll have to somehow inform LLVM of the semi-constant values of the specialization parameters, without losing the original (optimized) IR. Can I add and remove specialized function instances at run-time?
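
For readers following along, the bookkeeping that lives "outside of LLVM" can be sketched roughly as below. This is a minimal sketch only: `SpecKey`, `CompiledFn` and `SpecializationCache` are hypothetical names, not LLVM APIs, and the `compile` callback is an assumed stand-in for the expensive specialize-and-codegen step.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical types: nothing in this block is part of LLVM.
using SpecKey = std::vector<int>;            // run-time values of the specialization parameters
using CompiledFn = std::function<int(int)>;  // stand-in for a pointer to JIT-compiled code

class SpecializationCache {
public:
    // `compile` stands in for the expensive step (IR specialization + code generation).
    explicit SpecializationCache(std::function<CompiledFn(const SpecKey&)> compile)
        : compile_(std::move(compile)) {
        assert(compile_ && "a compile callback is required");
    }

    // Return the specialization for `key`, compiling it on first use only.
    const CompiledFn& get(const SpecKey& key) {
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++compiles_;
            it = cache_.emplace(key, compile_(key)).first;
        }
        return it->second;
    }

    // Drop a specialization that is no longer needed; true if one was removed.
    bool evict(const SpecKey& key) { return cache_.erase(key) > 0; }

    std::size_t compileCount() const { return compiles_; }
    std::size_t size() const { return cache_.size(); }

private:
    std::function<CompiledFn(const SpecKey&)> compile_;
    std::map<SpecKey, CompiledFn> cache_;
    std::size_t compiles_ = 0;
};
```

Keying on the full parameter vector means two requests with the same semi-constant values share one compiled instance, and `evict` is the point where a real implementation would also release the machine code.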

Thanks for any pointers!

Kind regards,

Nicolas

Hi Nicolas,

Nate Begeman's "building an efficient JIT" talk at the LLVM developer meeting last year (or the year before) is a great place to start looking for this sort of thing.

-Chris

I don't think there is any infrastructure for this kind of
specialization. The closest thing I can think of is the
profile-guided optimization work that Andreas Neustifter has done.

Because compilation and optimization with LLVM is expensive, our
approach with Unladen Swallow has been to try to wait longer to
generate specialized code, so we don't have to recompile.
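
One way to read "wait longer" is a call-count threshold: keep running a generic path and only pay for specialization once a function has proven hot. The sketch below illustrates that idea under stated assumptions; it is not Unladen Swallow's actual mechanism, and `DeferredSpecializer` and its callbacks are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <utility>

using Fn = std::function<int(int)>;  // stand-in for a callable piece of code

// Hypothetical helper: runs the generic path until `threshold` calls have
// been seen, then compiles the specialized version exactly once.
class DeferredSpecializer {
public:
    DeferredSpecializer(int threshold, Fn generic, std::function<Fn()> specialize)
        : threshold_(threshold),
          generic_(std::move(generic)),
          specialize_(std::move(specialize)) {
        assert(threshold_ > 0);
    }

    int call(int x) {
        // The call that reaches the threshold triggers the one-time compile.
        if (!fast_ && ++calls_ >= threshold_) fast_ = specialize_();
        return fast_ ? fast_(x) : generic_(x);
    }

    bool isSpecialized() const { return static_cast<bool>(fast_); }

private:
    int threshold_;
    Fn generic_;
    std::function<Fn()> specialize_;  // stand-in for the expensive JIT step
    Fn fast_;                         // empty until the threshold is reached
    int calls_ = 0;
};
```

The trade-off is that early calls run at generic speed, which is only acceptable when the unspecialized path is fast enough to tolerate.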

If memory serves me, we also found that code generation (i.e. register
allocation and instruction selection) held the lion's share of the
compilation time, and I can't think of a way to introduce
specializations without paying that cost.

What do you think a good library for doing this kind of specialization
would look like? Most of our specializations are very complicated,
like replacing calls to the runtime with inline guards.

Reid

Hi Chris,

Thanks for pointing me to that presentation! It helped me come up with a strategy that I believe might work:

  1. Use CloneFunction() to make a copy of the original unspecialized (but optimized) function.
  2. Specialize it using a custom function pass which identifies the specialization parameters and substitutes them with given run-time constants.
  3. Run the function through a FunctionPassManager with some post-specialization optimization passes (dead code, etc).
  4. Use getPointerToFunction() to generate the machine code.
  5. Call freeMachineCodeForFunction() when I no longer need a specific specialization.
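
Steps 1 to 3 can be illustrated on a toy expression tree rather than real LLVM IR. Everything in this sketch (`Expr`, `cloneExpr`, `substitute`, `simplify`) is hypothetical and only mirrors the shape of the plan: copy the optimized original, replace the specialization parameters with constants, then run a cheap clean-up. Steps 4 and 5 (code generation and freeing machine code) are not modeled.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy expression IR standing in for LLVM IR; every name here is hypothetical.
struct Expr;
using ExprPtr = std::unique_ptr<Expr>;
struct Expr {
    enum Kind { Const, Param, Add, Mul } kind = Const;
    int value = 0;     // payload for Const
    std::string name;  // payload for Param
    ExprPtr lhs, rhs;  // operands for Add and Mul
};

ExprPtr constant(int v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Const; e->value = v;
    return e;
}
ExprPtr param(std::string n) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Param; e->name = std::move(n);
    return e;
}
ExprPtr binary(Expr::Kind k, ExprPtr l, ExprPtr r) {
    auto e = std::make_unique<Expr>();
    e->kind = k; e->lhs = std::move(l); e->rhs = std::move(r);
    return e;
}

// Step 1: deep copy, so the unspecialized original stays untouched.
ExprPtr cloneExpr(const Expr& e) {
    auto c = std::make_unique<Expr>();
    c->kind = e.kind; c->value = e.value; c->name = e.name;
    if (e.lhs) c->lhs = cloneExpr(*e.lhs);
    if (e.rhs) c->rhs = cloneExpr(*e.rhs);
    return c;
}

// Step 2: replace known parameters with their run-time constants.
ExprPtr substitute(ExprPtr e, const std::map<std::string, int>& known) {
    assert(e);
    if (e->kind == Expr::Param) {
        auto it = known.find(e->name);
        if (it != known.end()) return constant(it->second);
        return e;
    }
    if (e->lhs) e->lhs = substitute(std::move(e->lhs), known);
    if (e->rhs) e->rhs = substitute(std::move(e->rhs), known);
    return e;
}

// Step 3: cheap post-specialization clean-up (constant folding, x*0, x*1, x+0).
ExprPtr simplify(ExprPtr e) {
    assert(e);
    if (e->kind != Expr::Add && e->kind != Expr::Mul) return e;
    e->lhs = simplify(std::move(e->lhs));
    e->rhs = simplify(std::move(e->rhs));
    bool lc = e->lhs->kind == Expr::Const, rc = e->rhs->kind == Expr::Const;
    if (lc && rc)
        return constant(e->kind == Expr::Add ? e->lhs->value + e->rhs->value
                                             : e->lhs->value * e->rhs->value);
    int identity = (e->kind == Expr::Mul) ? 1 : 0;  // x*1 == x, x+0 == x
    if (e->kind == Expr::Mul &&
        ((lc && e->lhs->value == 0) || (rc && e->rhs->value == 0)))
        return constant(0);
    if (lc && e->lhs->value == identity) return std::move(e->rhs);
    if (rc && e->rhs->value == identity) return std::move(e->lhs);
    return e;
}
```

For instance, specializing `x * scale + mode * (x + 7)` with `scale = 1` and `mode = 0` via `simplify(substitute(cloneExpr(*generic), ...))` collapses the copy to the single parameter `x`, while the generic tree remains available for the next specialization.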

I’m not entirely sure yet how to implement some of these steps in practice, but does this sound like the right approach or would you suggest something else? Can I call eraseFromParent() on the specialized function after step 4 or only at 5?

Thank you!

Nicolas

Hi Reid,

I will check out Andreas’ profile-guided optimizations. Thanks for the suggestion.

I don’t think the approach from Unladen Swallow works for my use case. The specialization parameters are provided by the application at run-time, and not specializing the functions would result in unacceptable performance. In case you’re familiar with the concept, I’m doing something very similar to so-called uber-shaders.

Indeed register allocation and instruction selection take a fair bit of time, but in my experience it’s not the lion’s share of compilation time when running heavy optimizations. Either way, if I can reduce the compilation time even by a bit that would be a worthwhile result.

I’m not sure if an API for specialization would make sense. It has many faces and I don’t think there’s a one-size-fits-all solution. I’ll let you know though if I’ve implemented something successfully and it is reusable.

Cheers,

Nicolas

>   1. Use CloneFunction() to make a copy of the original unspecialized (but optimized) function.
>   2. Specialize it using a custom function pass which identifies the specialization parameters and substitutes them with given run-time constants.

Use llvm::CloneAndPruneFunctionInfo(). It lets you specify the constant replacements for values up front and therefore clones less by doing the constant folding as it goes.
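
The benefit of folding while cloning, instead of cloning first and cleaning up afterwards, can be shown on a toy tree with a conditional. This is a sketch of the idea only and does not use or reproduce the LLVM function's signature; `Node`, `cloneAndPrune` and the helper constructors are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy immutable IR with a conditional; every name here is hypothetical.
struct Node;
using NodePtr = std::shared_ptr<const Node>;
struct Node {
    enum Kind { Const, Param, Add, If } kind = Const;
    int value = 0;     // payload for Const
    std::string name;  // payload for Param
    NodePtr a, b, c;   // Add: a + b;  If: a ? b : c
};

NodePtr makeNode(Node::Kind k, int v, std::string name,
                 NodePtr a = nullptr, NodePtr b = nullptr, NodePtr c = nullptr) {
    auto n = std::make_shared<Node>();
    n->kind = k; n->value = v; n->name = std::move(name);
    n->a = std::move(a); n->b = std::move(b); n->c = std::move(c);
    return n;
}
NodePtr constantNode(int v) { return makeNode(Node::Const, v, ""); }
NodePtr paramNode(std::string n) { return makeNode(Node::Param, 0, std::move(n)); }
NodePtr addNode(NodePtr l, NodePtr r) { return makeNode(Node::Add, 0, "", std::move(l), std::move(r)); }
NodePtr ifNode(NodePtr cond, NodePtr t, NodePtr e) {
    return makeNode(Node::If, 0, "", std::move(cond), std::move(t), std::move(e));
}

int nodeCount(const Node& n) {
    int total = 1;
    if (n.a) total += nodeCount(*n.a);
    if (n.b) total += nodeCount(*n.b);
    if (n.c) total += nodeCount(*n.c);
    return total;
}

// Clone `n`, substituting known parameters and folding as we go, so that
// branches with a known condition never have their dead arm copied at all.
NodePtr cloneAndPrune(const Node& n, const std::map<std::string, int>& known) {
    switch (n.kind) {
    case Node::Const:
        return constantNode(n.value);
    case Node::Param: {
        auto it = known.find(n.name);
        return it != known.end() ? constantNode(it->second) : paramNode(n.name);
    }
    case Node::Add: {
        assert(n.a && n.b);
        NodePtr l = cloneAndPrune(*n.a, known), r = cloneAndPrune(*n.b, known);
        if (l->kind == Node::Const && r->kind == Node::Const)
            return constantNode(l->value + r->value);
        return addNode(l, r);
    }
    case Node::If: {
        assert(n.a && n.b && n.c);
        NodePtr cond = cloneAndPrune(*n.a, known);
        if (cond->kind == Node::Const)  // condition known: clone only the taken arm
            return cloneAndPrune(cond->value != 0 ? *n.b : *n.c, known);
        return ifNode(cond, cloneAndPrune(*n.b, known), cloneAndPrune(*n.c, known));
    }
    }
    return nullptr;  // not reached: every Kind is handled above
}
```

In this toy, `mode ? x + 100 : x + (bias + 1)` with `mode = 0` and `bias = 2` yields a three-node copy (`x + 3`) from a ten-node original, and the untaken arm is never visited.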

Nick