Performance degradation when repeatedly exchanging JITted functions

Hi all,

For a research project, we need to repeatedly exchange functions in a program running under the JIT compiler.
We currently do this by calling recompileAndRelinkFunction() after changing the body of the function. Of course, we synchronize enough to ensure that the JIT doesn't concurrently compile the function (which should only happen if lazy compilation is enabled).

Currently, recompileAndRelinkFunction saves the old function pointer, runs the JIT, and then writes a jump to the new function at the start of the old function's code.
The problem with this implementation (and I verified that this really happens) is that it builds chains of jumps, which are traversed each time the function is called, because the call sites are never updated. There is actually a FIXME in the JITEmitter saying "FIXME: We could rewrite all references to this stub if we knew them.", but of course it would be hard to catch them all, given the variety of call instructions.
Another drawback is that the memory of old function versions can never be freed, since it is still part of the jump chain.
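The way the chain forms can be illustrated with a toy simulation in plain C++. It assumes, as described above, that each recompilation writes a jump only into the previously newest version and that call sites keep whatever address they were linked against. `JumpChainModel`, `recompile`, and `jumpsFrom` are hypothetical names; this is not LLVM code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (not LLVM code) of the current relinking behaviour.
// forward[i] == -1 means version i is live code; otherwise the start of
// version i has been overwritten with a jump to version forward[i].
struct JumpChainModel {
  std::vector<int> forward{-1};  // version 0 = the initial compilation

  // Recompile: emit a new version and patch the previously newest one.
  void recompile() {
    const int newest = static_cast<int>(forward.size()) - 1;
    forward.push_back(-1);
    forward[newest] = newest + 1;
  }

  // Jumps executed before reaching live code, for a call site that was
  // linked against `version`.
  std::size_t jumpsFrom(int version) const {
    assert(version >= 0 && static_cast<std::size_t>(version) < forward.size());
    std::size_t jumps = 0;
    while (forward[version] != -1) {
      version = forward[version];
      ++jumps;
    }
    return jumps;
  }
};
```

After five recompilations, a caller still linked to version 0 takes five jumps per call, and none of versions 0 to 4 can be freed because each is a link in the chain.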

To measure the performance impact, I wrote a small example program that recompiles the function every second and prints the number of calls executed per second (Mcalls = million calls). The performance degradation is quite impressive:
After 0 replacements: 335.724 Mcalls/sec
After 1 replacements: 274.735 Mcalls/sec ( 82.010% of initial)
After 2 replacements: 232.640 Mcalls/sec ( 69.445% of initial)
After 3 replacements: 201.898 Mcalls/sec ( 60.268% of initial)
After 4 replacements: 177.727 Mcalls/sec ( 53.053% of initial)
After 5 replacements: 158.765 Mcalls/sec ( 47.393% of initial)
After 10 replacements: 102.098 Mcalls/sec ( 30.477% of initial)
After 20 replacements: 60.197 Mcalls/sec ( 17.969% of initial)
After 50 replacements: 27.049 Mcalls/sec ( 8.074% of initial)
After 200 replacements: 7.438 Mcalls/sec ( 2.220% of initial)
After 460 replacements: 3.273 Mcalls/sec ( 0.977% of initial)
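Converting the throughput figures above into time per call makes the trend easier to see: at M Mcalls/sec, one call takes 1000 / M nanoseconds. The following is a small sketch of that arithmetic, using only numbers from the table; `Sample`, `nsPerCall`, and `nsPerReplacement` are hypothetical helper names.

```cpp
#include <cassert>

// One row of the measurement table: replacements performed so far and the
// observed throughput in million calls per second.
struct Sample {
  int replacements;
  double mcallsPerSec;
};

// Average time per call in nanoseconds: 1 s / (M * 1e6 calls) = 1000 / M ns.
double nsPerCall(double mcallsPerSec) {
  assert(mcallsPerSec > 0.0);
  return 1000.0 / mcallsPerSec;
}

// Extra time per call relative to the baseline, divided by the number of
// additional replacements. If every replacement adds one jump to the chain,
// this approximates the cost of a single extra jump.
double nsPerReplacement(const Sample& baseline, const Sample& s) {
  assert(s.replacements > baseline.replacements);
  return (nsPerCall(s.mcallsPerSec) - nsPerCall(baseline.mcallsPerSec)) /
         (s.replacements - baseline.replacements);
}
```

With the values above, a call goes from about 2.98 ns (0 replacements) to about 305.5 ns (460 replacements), i.e. roughly 0.66 ns per replacement; the 10-replacement row gives roughly 0.68 ns. The per-replacement cost stays roughly constant, which fits the explanation that every exchange adds one more jump to every call.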

I think a solution would be to always call a function through its stub, so that there is a single location to update when the function is exchanged. This would mean that there is always exactly one level of indirection, which is worse for programs that don't exchange functions at runtime, but much better in our scenario.
I tried to add a flag to the JIT to implement that (always return the address of the stub and never update the global mapping), but I gave up, since too many classes rely on the global mapping being updated (including the JIT itself).
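The "always go through the stub" idea can be sketched in plain C++ as one patchable slot per function: every call site loads the slot and calls through it, and an exchange rewrites only that slot. This is an illustration under that assumption, not LLVM's JITResolver; `Stub`, `relink`, `versionA`, and `versionB` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

using IntFn = int (*)(int);

// One stub per function: a single shared location holding the current code.
struct Stub {
  std::atomic<IntFn> target;

  explicit Stub(IntFn initial) : target(initial) {}

  // What every call site does: load the current target, then call it.
  // This is the one extra indirection paid on every call.
  int call(int x) const {
    IntFn fn = target.load(std::memory_order_acquire);
    assert(fn != nullptr);
    return fn(x);
  }

  // What an exchange does: publish the new version in the single slot.
  // No jump chain forms, and the old version is no longer referenced
  // by new calls.
  void relink(IntFn newVersion) {
    target.store(newVersion, std::memory_order_release);
  }
};

// Two hypothetical "versions" of the same function.
int versionA(int x) { return x + 1; }
int versionB(int x) { return x * 2; }
```

The atomic slot only makes the pointer update itself well-defined; it does not make it safe to exchange a function that another thread is currently executing.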

An alternative approach that wouldn't require patching LLVM would be to manage an array of all function pointers in the "VM" we are implementing, and to replace each direct function call in the bitcode with a load from that array followed by a call to the loaded address. The VM could then just update the array after recompiling a function, and all call sites would use the new pointer.
The overhead should be comparable to the "always go through stub" method.
Some more logic would be required to handle indirect calls, but this could be handled by callbacks into the VM.
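A minimal sketch of the VM-side table, assuming each function gets a fixed index and that the bitcode rewrite turns a direct `call @f(...)` into "load table[index of f], call the loaded pointer". `FunctionTable`, `add`, `update`, and `call` are hypothetical names; indirect calls and the synchronization the VM already performs are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using IntFn = int (*)(int);

// Hypothetical VM-owned table of current function pointers.
class FunctionTable {
 public:
  // Register a function and return its stable index.
  std::size_t add(IntFn fn) {
    slots_.push_back(fn);
    return slots_.size() - 1;
  }

  // After recompiling, the VM updates only this slot; every rewritten call
  // site picks up the new pointer on its next load.
  void update(std::size_t index, IntFn fn) {
    assert(index < slots_.size());
    slots_[index] = fn;
  }

  // Equivalent of a rewritten direct call: load from the table, then call.
  int call(std::size_t index, int arg) const {
    assert(index < slots_.size());
    return slots_[index](arg);
  }

 private:
  std::vector<IntFn> slots_;
};
```

Like the stub approach, this costs one load and one indirect call per call site, but it lives entirely outside LLVM.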

But before implementing that, I wanted to ask whether anybody already has a working solution for the problem,
or whether the problem is important enough to address directly in LLVM.

Cheers,
Clemens

RepeatedMethodExchange.cpp (4.7 KB)

Actually, you just have to make sure that you always patch the initial
function. You don't have to force it to be a stub.

Joerg

Surely you need to patch *all* functions, not just the initial one?

The point is that with the current solution, no matter which version of the function another function is linked to, it will hit a sled of JMPs and eventually end up at the newest one.

If you only patched the first, that sled wouldn't work. So you'd have to patch all instances. That still shouldn't be too hard.

Cheers,

James

Depends on whether you always link to the original address or not.
If you link with the latest address, you have to patch all versions
to point to the latest; otherwise you can just patch the first.
Advantage of using the latest address: one saved jmp per call.
Advantage of using the initial address: easier G/C of intermediate
versions, fewer things to keep track of.
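The two policies can be compared with a toy model in plain C++: versions are numbered 0 (initial) to N (latest), and `target[i]` is where a jump written at the start of version i leads. This is an illustration of the trade-off described above, not LLVM code; `PatchModel` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// target[i] == -1 means version i is live (unpatched) code.
struct PatchModel {
  std::vector<int> target{-1};
  std::size_t patchesWritten = 0;

  int latest() const { return static_cast<int>(target.size()) - 1; }

  // Policy A: callers always link to the initial address, so only version 0
  // is patched. Intermediate versions are unreachable and could be freed.
  void recompilePatchInitial() {
    target.push_back(-1);
    target[0] = latest();
    ++patchesWritten;
  }

  // Policy B: callers link to the latest address, so every older version is
  // patched to jump straight to the new one. The work per exchange grows
  // with the number of versions kept around.
  void recompilePatchAll() {
    target.push_back(-1);
    for (int v = 0; v < latest(); ++v) {
      target[v] = latest();
      ++patchesWritten;
    }
  }

  // Jumps taken by a call linked against `version`. Under policy A this is
  // only meaningful for version 0, since no caller links to later versions.
  std::size_t jumpsFrom(int version) const {
    assert(version >= 0 && version <= latest());
    std::size_t jumps = 0;
    while (target[version] != -1) {
      version = target[version];
      ++jumps;
    }
    return jumps;
  }
};
```

After four exchanges, both policies give old callers exactly one jump, but policy A has written 4 patches while policy B has written 1 + 2 + 3 + 4 = 10; in exchange, callers linked after the last exchange under policy B take no jump at all.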

Joerg

> Advantage of using the latest address: one saved jmp per call.

Per newly JITted call ;-)

> Advantage of using the initial address: easier G/C of intermediate
> versions, fewer things to keep track of.

I still think both variants require larger changes. When using the latest address, you have to keep track of all JITted versions of each Function in order to update them. Their number grows linearly with the number of exchanges, so the time needed to exchange a function grows as well.

When using the initial address, you also have to patch all places in LLVM that rely on the global mapping being updated, and there are more of them than I initially thought. That's why I stopped working on that.

I don't think that a patch implementing either of those approaches would be accepted; that's why I am leaning towards implementing it outside of LLVM.

Cheers,
Clemens

> I don't think that a patch implementing either of those approaches would be
> accepted; that's why I am leaning towards implementing it outside of LLVM.

Why not? If they make LLVM better and aren't hacks, why wouldn't they be accepted?

Okay, that motivated me to work on the patch again. I think I found a compromise between the discussed approaches.
The original stub (which is held by the JITResolver anyway) is updated to point to the new version in any case.
Additionally, you can set a flag in the ExecutionEngine to always use the stub when calling a function. If this flag is set, recompileAndRelinkFunction does *not* patch the old function to jump to the new one, since all calls go through the stub anyway.

Since, as I wrote, several places in the JIT rely on the global mapping being updated to the start of the newly JITted function, I didn't change that. Instead, after JITting a function, the mapping is changed back to the stub if the KeepStubs flag is set.
The only drawback is that *directly* recursive calls still bypass the stub and jump directly to the function pointer. But since exchanging a function while another thread is executing it is unsafe anyway, this shouldn't matter. Even exchanging a function that is running in the same thread (e.g. from a callback into the VM) is unsafe in the current implementation, since you would overwrite the original function's code at the start of the method.
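The behaviour described above can be sketched as a toy model in plain C++. It assumes only what is stated in this message: the stub is always redirected to the newest version, the global mapping temporarily points at the new code while it is emitted, and with KeepStubs set the mapping is put back to the stub and no jump is written into the old code. `ToyJit`, `compile`, `kStub`, and the other names are hypothetical and are not taken from the attached patch or from the ExecutionEngine API.

```cpp
#include <cassert>
#include <vector>

// Addresses are either a version index (>= 0) or kStub for the stub.
struct ToyJit {
  static constexpr int kStub = -2;
  static constexpr int kNone = -1;

  bool keepStubs;
  int stubTarget = kNone;           // version the stub currently jumps to
  int globalMapping = kStub;        // address that newly linked callers use
  std::vector<int> patchedJump;     // per version: jump written at its start
  std::vector<int> selfCallTarget;  // per version: target of a direct self-call

  explicit ToyJit(bool keep) : keepStubs(keep) {}

  // Model of (re)compiling the function body.
  void compile() {
    const int previous = globalMapping;
    const int fresh = static_cast<int>(patchedJump.size());
    patchedJump.push_back(kNone);

    // During emission the mapping points at the new code (other parts of the
    // JIT rely on this), so a directly recursive call is resolved to the new
    // code itself and bypasses the stub.
    globalMapping = fresh;
    selfCallTarget.push_back(globalMapping);

    // The stub is redirected to the newest version in either mode.
    stubTarget = fresh;

    if (keepStubs) {
      // Point the mapping back at the stub; old code is left untouched.
      globalMapping = kStub;
    } else if (previous >= 0) {
      // Without the flag, the previous version gets a jump to the new one.
      patchedJump[previous] = fresh;
    }
  }

  // Number of jumps taken by a call linked against `address`.
  int jumpsFrom(int address) const {
    int jumps = 0;
    if (address == kStub) {
      assert(stubTarget != kNone && "function was never compiled");
      address = stubTarget;
      ++jumps;
    }
    assert(address >= 0 && address < static_cast<int>(patchedJump.size()));
    while (patchedJump[address] != kNone) {
      address = patchedJump[address];
      ++jumps;
    }
    return jumps;
  }
};
```

With the flag set, every caller linked after a compile goes through the stub and pays exactly one jump no matter how often the function was exchanged, while `selfCallTarget` shows the directly recursive calls that still point at their own version. Without the flag, a caller linked to the first version follows the growing chain.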

So I think this should be fine.

I attached a patch implementing this, and a test case for the new flag. Both apply to trunk.

Should I send them to the commits list, or will someone with commit rights pick them up here?
If so, that person can also apply the fix and test case for bug 12197, which I stumbled across and which is slightly related to this one.
http://llvm.org/bugs/show_bug.cgi?id=12197

Cheers,
Clemens

implement_KeepStubs.patch (6.01 KB)

testcase_KeepStubs.patch (2.27 KB)

Hi Clemens,

You should send to the commits list, as you suggest :-)

Cheers,

James