GSoC: Speculative compilation support in ORC v2, looking for mentors!

Hi all,

I would like to propose “Speculative compilation support in ORC v2 JIT API” for this year’s GSoC summer program.

Project Description:

Speculative compilation support. One of the selling points of the concurrent ORC APIs is that you can start compiling a function before you need it, in the hope that by the time that you do need it it is already compiled. However, if we just speculatively compile everything we’re quickly going to overload our memory/CPU resources, which will be lousy for performance. What we really want to do is to take into account any information we have (from high level program representations, e.g. ASTs or CFGs, or from profiling data from previous executions) to try to guess what functions are worth speculatively compiling next.

This idea was proposed by Lang Hames.

Current Status of ORC v2:

  1. ORC currently supports concurrent compilation and lazy (on-request) compilation; trunk contains the new default ORC JIT implementations, LLJIT and LLLazyJIT.

  2. [WIP] JITLink, a drop-in replacement for RuntimeDyld in ORC JIT, at least for MachO.

  3. Primitive code to handle speculation is in trunk, but it needs to be refactored and redesigned into generalized, simple APIs that support speculation.

  4. Currently there are no heuristics in trunk to guide speculation.

Proposed Solution:

  1. New APIs in ORC v2 to support speculative compilation of LLVM bitcode: LLSpecJIT, a subtype of LLJIT, as the default in-trunk JIT with speculation support.

  2. Heuristics to guide speculation:

Currently, I’m going through some literature/research papers to find good basic heuristics. We can derive useful information from control flow graphs, IR, etc. So far I have found that region-based compilation is the more appropriate fit for speculation. Since the unit of compilation (granularity) is key to good dynamic compilation, region-based compilation addresses it by forming frequently executed code into a region (unit), which can be a basic block, a function, or a group of functions. But since the LLVM JIT API needs to interoperate with the static compiler, the unit of compilation here is a complete module. The plan in mind is to break the module into regions and compile those regions speculatively on multiple lightweight JIT backend threads. I’m still not sure how this can be done, and I would highly appreciate feedback from the community in deciding the heuristics and the granularity of compilation.

  3. Practical example with benchmarks: to the best of my knowledge PostgreSQL uses the ORC API, so it should be straightforward to compile PostgreSQL queries speculatively and compare the results against a baseline (concurrent compilation without speculation).

  4. Unit tests + documentation.

Benefits:

  1. We can make full use of multiple compilation threads to compile code up front with a minimum of false positives.

  2. Stabilized JIT APIs, which is what most clients look for when transitioning to newer versions.

Further Opportunities:

We can also use ThinLTO summaries in the module to decide which code is most worth compiling next, based on this lightning LLVM Dev talk by Stefan.

Profile-guided optimization information will also be helpful here, and provides further room for improvement.

I would highly appreciate the community’s feedback & suggestions :slight_smile:

I will try my best to address all the comments in the proposal. I’m currently familiarizing myself with the new APIs and studying heuristics. I will try to have a draft proposal ready by next week.

References: HHVM JIT: A Profile-Guided, Region-Based Compiler for PHP & Hack [paper].

PostgreSQL [JIT]

Execution Engine [ORC]

Hi,

This idea sounds pretty cool.
Just some quick comments inlined below:


Currently, I’m going through some literature/research papers to find good basic heuristics. We can derive useful information from control flow graphs, IR, etc. So far I have found that region-based compilation is the more appropriate fit for speculation.

Can you justify your point? What do you mean by “appropriate”? Is it easier to implement, or does it produce better-performing generated code?

Just an FYI:
The JIT compilation community has been debating which compilation region should be chosen for quite some time.
The TL;DR answer is: there is NO single answer; it all depends on the usage and the engineering effort.
To give examples of two of the most widely adopted methods, consider trace-based (kind of similar to region-based) and function-based JITs.
Famous users of trace-based JITs are early versions of SpiderMonkey (i.e. the JS engine in Firefox) and the Dalvik VM (i.e. the old VM in Android). This approach can record hot paths across functions, and, as the old saying goes, global optimizations are always preferred. However, it requires more engineering effort, and profiling/tracing is almost mandatory.
Function-based JITs are used by V8 and ART (the Android Runtime, the current VM in Android), to name a few. The biggest advantage is that they are pretty easy to implement, and their performance is actually not bad in most cases. The downside is, of course, that they lose some global optimization opportunities.

To be honest, I’m pretty surprised you didn’t mention any of the above projects. Of course, none of them use LLVM, but all of them are outstanding JIT engines with huge numbers of users, tested by time; some are even important milestones in the history of dynamic compilation. I believe you can learn the key insights of choosing a specific compilation region from them.


Best,
Bekket

Hi Bekket,

Sorry for the delayed reply.

By appropriate, I mean the performance of the compiled native code.

I was referring to other JIT implementations such as LuaJIT and WebKit’s FTL JIT to see how they implement their JITs, and I have gone through the designs of the SpiderMonkey and Android Runtime (ART) JITs. As you said, region-based and method-based compilation units each have their own advantages and disadvantages, and both make use of profiles to find hot code regions/functions. I have now rethought my idea and plan to implement a “function-based” rather than a “region-based” compilation unit, because a trace-based JIT requires considerable development time, and the time needed to find hot code regions from IR with profile support is high. In my opinion these would be difficult to implement within the 11 weeks of summer while still making their way into trunk for real use.

Also, LLVM ORC v2 resembles a function-based approach. I think it would be easier to build a runtime profiler around it and use that profile data in ORC’s compile layer to recompile hot functions speculatively with more optimization. As far as I know, many JIT implementations use profilers rather than information from their intermediate representation to decide which functions are hot, because of the dynamic nature of the compiled/interpreted source language. This makes me focus more on runtime profilers than on IR analysis; of course, IR analysis will still be useful in the start-up phase of JIT compilation.

I found IntelJITEvents in ExecutionEngine; is there any documentation on its use cases? If you have thoughts on it, please share :slight_smile:

Thanks

Hi,


By appropriate, I mean the performance of the compiled native code.

I think you need more evaluation on that, as there are too many factors affecting the quality of the generated code.

I was referring to other JIT implementations such as LuaJIT and WebKit’s FTL JIT to see how they implement their JITs, and I have gone through the designs of the SpiderMonkey and Android Runtime (ART) JITs. As you said, region-based and method-based compilation units each have their own advantages and disadvantages, and both make use of profiles to find hot code regions/functions. I have now rethought my idea and plan to implement a “function-based” rather than a “region-based” compilation unit, because a trace-based JIT requires considerable development time, and the time needed to find hot code regions from IR with profile support is high. In my opinion these would be difficult to implement within the 11 weeks of summer while still making their way into trunk for real use.

I totally agree with your plan. A function-based JIT makes more sense for a 3-month project in terms of the engineering effort needed for a deliverable result. Also, LLVM already has plenty of handy, mature call graph and inter-procedural toolkits you can use.

Also, LLVM ORC v2 resembles a function-based approach. I think it would be easier to build a runtime profiler around it and use that profile data in ORC’s compile layer to recompile hot functions speculatively with more optimization. As far as I know, many JIT implementations use profilers rather than information from their intermediate representation to decide which functions are hot, because of the dynamic nature of the compiled/interpreted source language. This makes me focus more on runtime profilers than on IR analysis; of course, IR analysis will still be useful in the start-up phase of JIT compilation.

Regarding profiling, just as an FYI, you might want to take a look at the (PGO-guided) hot/cold code splitting toolkit that landed in LLVM just a few months ago. Although it takes a static PGO profile, I believe you can leverage LLVM’s static PGO framework as the foundation for your dynamic profiling.

I found IntelJITEvents in ExecutionEngine; is there any documentation on its use cases? If you have thoughts on it, please share :slight_smile:

I’d never used that, sorry.

Thanks

Best,
Bekket

Hi Bekket,

Thank you for your reply. Earlier I came across a paper called “Dynamic Look Ahead Compilation: Hide JIT compilation latencies”, which devised methods for JIT-compiling functions before the actual call takes place, using call graph analysis and branch probabilities to generate a list of functions with a high likelihood of execution in the near future. In my opinion it would be nice to use that paper as a base design, together with LLVM’s call graph and inter-procedural analyses, to form the heuristics. I am currently gathering information and planning the algorithm that will use it in the ORC layers.

Thanks for mentioning PGO (hot/cold code splitting); I will investigate it further in the coming days.

I found IntelJITEvents in ExecutionEngine; is there any documentation on its use cases? If you have thoughts on it, please share :slight_smile:

I’d never used that, sorry.

From what I recall, IntelJITEvents just tells the debugger to include debug symbols from JIT-compiled code. There are several similar classes, such as GDBJITRegistrationListener, for telling GDB about JIT-compiled code; IntelJITEvents is just one of them.

see https://github.com/llvm/llvm-project/blob/master/llvm/docs/DebuggingJITedCode.rst

Jacob Lifshay

Hi Jacob,

Thanks for the information, it is very helpful. Given that the ORC JIT API doesn’t currently support building a profiler, were there any past discussions about adding APIs for a profiler to the LLVM JIT infrastructure? If anyone has suggestions/ideas, please share :slight_smile:

Hi all,

I have attached my initial design plan here. It is still incomplete!

Community feedback is appreciated, here : link

Please share your feedback :slight_smile: