LLVM Weekly - #110, Feb 8th 2016

JavaScriptCore’s FTL JIT is moving away from using LLVM as its backend, towards B3 (Bare Bones Backend). This includes its own SSA IR, optimisations, and instruction selection backend.

In the end, what was the main motivation for creating a new IR?

Cheers,
Rafael

I can’t speak to the motivation of the WebKit team; that is outlined in https://webkit.org/blog/5852/introducing-the-b3-jit-compiler/. I’ll give you my personal perspective on using LLVM for JITs, which may be interesting to the LLVM community.

Most of the payoff for high-level languages comes from the language-specific optimizer. It was simpler for JavaScriptCore to perform loop optimization at that level, so it doesn’t even make use of LLVM’s most powerful optimizations, particularly SCEV-based optimization. There is a relatively small, finite amount of low-level optimization that is going to be important for JavaScript benchmarks (most of InstCombine is not relevant).

SelectionDAG ISel’s compile time makes it a very poor choice for a JIT. We never put the effort into making x86 FastISel competitive for WebKit’s needs. The focus now is on GlobalISel, but that won’t be ready for a while.

Even when LLVM’s compile time problems are largely solved, and I believe they can be, there will always be systemic compile time and memory overhead from design decisions that achieve generality, flexibility, and layering. These are software engineering tradeoffs.

It is possible to design an extremely lightweight SSA IR that works well in a carefully controlled, fixed optimization pipeline. You then benefit from basic SSA optimizations, which are not hard to write. You end up working with an IR of arrays, where identifiers are indices into the array. It’s a different way of writing passes, but very efficient. It’s probably worth it for WebKit, but not LLVM.
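As a concrete illustration (a minimal sketch with invented types, not B3’s actual implementation), such an IR can be little more than a vector of small value records whose operands are plain indices:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical array-based SSA IR; types and names are invented for
// illustration. An identifier is just an index into the value table.
using ValueId = uint32_t;

enum class Opcode : uint8_t { Const, Add, Mul, Return };

struct Value {
    Opcode op;
    int64_t imm = 0;           // payload for Const
    ValueId args[2] = {0, 0};  // operand indices; unused slots ignored
};

struct Function {
    std::vector<Value> values; // the whole IR is one contiguous array

    ValueId emit(Value v) {
        values.push_back(v);
        return static_cast<ValueId>(values.size() - 1);
    }
};

// A pass is a linear walk over the array. Because SSA operands are emitted
// before their uses, one forward sweep folds Add of two Consts.
void foldConstants(Function &f) {
    for (Value &v : f.values) {
        if (v.op != Opcode::Add)
            continue;
        Value &a = f.values[v.args[0]];
        Value &b = f.values[v.args[1]];
        if (a.op == Opcode::Const && b.op == Opcode::Const) {
            v.op = Opcode::Const;
            v.imm = a.imm + b.imm;
        }
    }
}
```

Because uses are indices rather than pointers, the whole function is a dense, cache-friendly array that is cheap to allocate, traverse, and throw away, which is where much of the compile-time win comes from.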

LLVM’s patchpoints and stackmaps features are critical for managed runtimes. However, directly supporting these features in a custom IR is simply more convenient. It takes more time to make design changes to LLVM IR vs. a custom IR. For example, LLVM does not yet support TBAA on calls, which would be very useful for optimizing around patchpoints and runtime calls.
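For context, emitting a stackmap through LLVM’s C++ API looks roughly like the sketch below; the helper function and its parameters are illustrative, though the intrinsic itself is real:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Record the live state at a potential deoptimization point. The runtime can
// later read the emitted stackmap section to find where LiveA/LiveB ended up
// (register or stack slot) at this instruction.
void emitDeoptPoint(Module &M, IRBuilder<> &B, uint64_t PatchId,
                    Value *LiveA, Value *LiveB) {
  Function *Stackmap =
      Intrinsic::getDeclaration(&M, Intrinsic::experimental_stackmap);
  // Arguments: stackmap id, number of shadow bytes to reserve for patching,
  // then the live values whose locations should be recorded.
  B.CreateCall(Stackmap, {B.getInt64(PatchId), B.getInt32(0), LiveA, LiveB});
}
```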

Prior to FTL, JavaScriptCore had no dependence on the LLVM project. Maintaining a dependence on an external project naturally has integration overhead.

So, while LLVM is not the perfect JIT IR, it is very useful for JIT developers who want a quick solution for low-level optimization and retargetable codegen. WebKit FTL was a great example of using it to bootstrap a higher tier JIT.

To that end, I think it is important for LLVM to have a well-supported -Ojit pipeline (compile fast) with the right set of passes for higher-level languages (e.g. Tail Duplication).
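No such preset exists today. As a rough sketch of the intent, the closest approximation with the current PassManagerBuilder would be something like the following (the inlining threshold is invented for illustration, and codegen-level passes such as Tail Duplication would still need separate plumbing):

```cpp
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"

using namespace llvm;

// Hypothetical "-Ojit" pipeline: favor compile time (O1-style cleanup and
// modest inlining) over maximal optimization.
void buildJitPipeline(legacy::PassManager &MPM,
                      legacy::FunctionPassManager &FPM) {
  PassManagerBuilder PMB;
  PMB.OptLevel = 1;   // cheap simplification passes only
  PMB.SizeLevel = 0;
  PMB.Inliner = createFunctionInliningPass(/*Threshold=*/50);
  PMB.populateFunctionPassManager(FPM);
  PMB.populateModulePassManager(MPM);
}
```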

-Andy

After reading https://webkit.org/blog/5852/introducing-the-b3-jit-compiler/, I jotted down a couple of thoughts of my own here: http://www.philipreames.com/Blog/2016/02/15/quick-thoughts-on-webkits-b3/

Philip

Thanks for sharing. I think it’s worth noting that what you are doing would be considered 5th tier for WebKit, since you already had a decent optimizing backend without LLVM. You also have more room for background compilation threads and aren’t benchmarking on a MacBook Air.

Andy

So, serious, but naive question: what are the other tiers for? My mental model is generally:

tier 0 - interpreter or splat compiler – (to deal with run once code)
tier 1 - a fast (above all else) but decent compiler which gets the obvious stuff – (does most of the compilation by methods)
tier 2 - a good, but fast, compiler which generates good quality code without burning too much time – (specifically for the hotish stuff)
tier 3 - “a compile time does not matter, get this hot method” compiler, decidedly optional – (compiles only really hot stuff)

(Profiling is handled by tier 0, and tier 1, in the above.)

It really sounds to me like FTL is positioned somewhere between tier 1 and tier 2 in the above. Is that about right?

So, serious, but naive question: what are the other tiers for? My mental model is generally:

tier 0 - interpreter or splat compiler – (to deal with run once code)

You combined two tiers in one, and I start at 1. So using my terminology inspired by WebKit:
tier 1: interpreter
tier 2: splat compiler

tier 1 - a fast (above all else) but decent compiler which gets the obvious stuff – (does most of the compilation by methods)

or tier 3: compiling methods into IR or bytecode, applying high-level optimization, splatting codegen

tier 2 - a good, but fast, compiler which generates good quality code without burning too much time – (specifically for the hotish stuff)

or tier 4: high-level optimization using profile data from tier 3, nontrivial codegen

tier 3 - “a compile time does not matter, get this hot method” compiler, decidedly optional – (compiles only really hot stuff)

or tier 5: bolt a C compiler onto the JIT.

(Profiling is handled by tier 0, and tier 1, in the above.)

Profiling needs to be done by all tiers up to and including at least the first round of high-level optimization where the optimizer registers some assumptions about runtime types (tier 3 in my case).

I’m not saying it’s a good idea to have all those tiers, it’s just a way to compare JIT levels. The point is, you are a tier higher than B3.
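For readers who have not built a tiered JIT: promotion decisions between such tiers are typically driven by simple execution counters, along the lines of this hypothetical sketch (the thresholds and policy are invented; real engines tune them per tier and per device):

```cpp
#include <cstdint>

// Each function tracks how often it runs; crossing a threshold queues it
// for compilation at the next tier.
struct FunctionProfile {
  uint32_t tier = 1;  // 1 = interpreter, 2 = splat JIT, 3+ = optimizing JITs
  uint32_t invocations = 0;
  uint32_t backEdges = 0;
};

// Hypothetical per-tier thresholds, indexed by the current tier.
constexpr uint32_t kTierUpThreshold[] = {0, 100, 1000, 10000, 100000};

bool shouldTierUp(const FunctionProfile &p) {
  // Loop back-edges count alongside calls: a function entered once but
  // looping millions of times is still hot.
  uint32_t heat = p.invocations + p.backEdges;
  return p.tier < 5 && heat >= kTierUpThreshold[p.tier];
}
```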

-Andy

And the fact that a company that has as much in-house LLVM expertise as Apple decided that this was a significant burden is something that we should take note of. LLVM is particularly unfriendly to out-of-tree developers, with no attempt made to provide API compatibility between releases. I maintain several out-of-tree projects that use LLVM, and the effort involved in moving between major releases is significant. It is not much more than the effort involved in moving between svn head revisions, so, like most other projects, I don’t test with head until there’s a release candidate (or often after the release, if I don’t have a few days to spend updating to the new APIs), which means that we lose out on a load of testing that other library projects get for free. Methods are removed or renamed with no deprecation warnings and often without any documentation indicating what their usage should be replaced with. Even for a fairly small project, upgrading between point releases of LLVM is typically a few days of effort.

David

Thanks David,

The integration burden is something to raise awareness of. I thought failing to mention it would be disingenuous. It needs to factor into anyone's plans to integrate LLVM into their runtime. I'll reiterate that I do not speak for the WebKit team or their motivation. I don't think integration burden is any less whether you work for one company or another, or have "in-house" expertise, and I know that API breakage can't be blamed on a particular company.

Bottom line (to risk stating the obvious):

- runtime compiler integration is even harder than static compiler integration

- don't expect to piggyback on LLVM's continual advances without continually engaging the LLVM open source community

I think either of these topics, MCJIT design and general API migration, would be great to discuss in separate threads.

Andy

I try to follow ToT closely. The amount of work required to keep things running is similar (some say slightly higher), but it gives you the advantage that the changes themselves are smaller and the set of commits you have to look at to find out what changed and what it was replaced by is much smaller (compensating for the lack of documentation of those changes and their replacements).

Thanks!

I found that during the weekend and it was a very nice read. I find it quite impressive what you guys manage to do in such a short time. Hope to see LLVM catch up some day.

Cheers,
Rafael

For llvmlite, we typically wait for a X.Y.1 release. Switching from X.Y-1.1 to X.Y.1 takes a few days on average. We ship binaries for several platforms (Linux, OS X, Windows) and using SVN head would have us suffer from whatever temporary instabilities or regressions have been introduced.

Regards

Antoine.

Ah, this was the piece I was missing. I didn’t realize you had both an interpreter and a splat compiler. That makes the numbering make a lot more sense. I had come up with the possible off-by-one myself, but that didn’t fully explain the difference. Can you say anything about the reasoning for having both? Do you see enough warmish code that the splat compiler is worthwhile? I’m used to interpreters and splat compilers being positioned as an either-or choice. When do you decide to promote something to the splat compiler, but not the “tier 3” compiler?

I’m not a WebKit developer (though I do use JSC as a case study in the course that I teach), so this may be wide of the mark, but it’s worth noting that the requirements of JavaScript in a web browser are quite different from those of most languages. A *lot* of JavaScript code has execution time completely dominated by the time spent in the DOM, and users notice if memory consumption for this code is high, but don’t notice if the JavaScript execution is slow (even a naïve AST interpreter will find performance massively dominated by the code in the DOM). Additionally, a lot of code is executed only once: JavaScript is purely imperative, so a lot of code that is declarative in Java / C# (for example, creating classes and attaching methods to them) is imperative code in JavaScript and is executed precisely once. Much of this code must be executed in the few tens of milliseconds that exist between the user clicking on a link and the user complaining that the browser is slow.

The baseline JIT is around an order of magnitude faster than the interpreter[1], but consumes memory for the generated code and does not give a user-noticeable speedup for code that is executed only once. The baseline JIT and the interpreter both use the same stack layout (this was one of the motivations for replacing the old C++ interpreter with one written in JSC’s custom macro assembly), so it’s comparatively cheap to move from the interpreter to the baseline JIT.

Finally, WebKit / JSC runs on a lot of mobile devices (from iPhones up to MacBook Pros) where power consumption is a vital design consideration. It is very important in these situations not to speculatively burn cycles optimising code where the user won’t notice the difference, because they will notice if their battery doesn’t last as long.

These constraints don’t exist in server workloads and are even quite rare on the desktop. Few people care if their desktop app takes a couple of seconds to start (and, if they do, they won’t mind if the second time it starts a lot faster because it has cached the generated code). If a web page takes a couple of seconds to load, then a lot of people will close the tab before it finishes.

David

[1] “Introducing the WebKit FTL JIT”, WebKit blog

Can you say anything about the reasoning for having both? Do you see enough warmish code that the splat compiler is worthwhile? I’m used to interpreters and splat compilers being positioned as an either-or choice. When do you decide to promote something to the splat compiler, but not the “tier 3” compiler?

I’m not a WebKit developer either, but I think David’s explanation is right on the mark. A JVM would not typically bother with both.
-Andy