Work in progress patch to bug 2606

The patch I recently attached to bug 2606, reproduced here, is my first attempt to solve this issue
as represented by the attached test cases. To solve the foreign Module GlobalVariable problem,
I modified JIT::getOrEmitGlobalVariable(…) to directly attempt to map a found “external” GlobalVariable.
To solve the foreign Module Function problem, I modified both JIT.{h,cpp} and JITEmitter.cpp to emit
a stub for a found “external” Function instance, when that foreign Module defined Function has not
yet been compiled.

All areas of interest are marked with // GMV Mod:

As I’ve taken liberties with existing code, and thereby may have touched test cases that will fail, I would like
to have those members of the list that have time, review this proposal for comments/changes.

Major mods:

JIT.{h,cpp}:

  1. I included Module.h.

  2. JIT::getPointerToFunction(…) was modified to search for a non-mapped function in all “other” modules.
    If found a stub is emitted, which will lazily emit the real function at runtime.

One issue here is that functions that are external to all modules, have their mappings delayed by the
above loop. I also kept the hasAvailableExternallyLinkage() the same as before, as I’m not sure if this
should also go through the same module procedure. The stub emission will only take place for function
definitions that are marked with external linkage. Should other cases be allowed? Should visibility also
be considered?

  1. JIT::getOrEmitGlobalVariable(…) was modified to search for non-mapped global variables in all “other”
    modules. If found, the address of the foreign global variable is requested, and mapped. There is no
    delay here.

An attempt is first made to find the global outside of all modules, before attempting to find the function
in the known modules. This is the reverse logic used in #2. On searching, unlike for functions, a linkage
check was not made. Also the global variable declaration and definition must have the exact same type.
Is this the correct approach?

  1. The declaration of void* forceEmitFunctionStub(Function *F) was added to the class JIT. This is
    implemented in JITEmitter.cpp.

JITEmitter.cpp:

  1. JIT::forceEmitFunctionStub(…) was added. It turns around and calls JITResolver::forceEmitFunctionStub(…).

Is the JIT:getPointerToFunction(…) code path sensitive to such call overhead?

  1. JITResolver::forceEmitFunctionStub(Function *F) was added. It is a simplified version of JITResolver::getLazyFunctionStub(…),
    which does not add to pending. It uses the same JITResolver::JITCompilerFn(…) for runtime emission that
    the lazy compilation system uses.

Outside of the goal, I’m not sure what code is unnecessary here. Notice that the hasAvailableExternallyLinkage
case is dealt with here even though forceEmitFunctionStub(…) would not be entered for this case. I’m forced to
use JITResolver::JITCompilerFn(…) as there is a static relationship between JITResolver and say X86JITInfo. See
implementation of X86JITInfo::getLazyResolverFunction(…) (X86JITInfo.cpp). Should a new JITCompilerFn like
function be created for the purposes of resolving this issue? This would of course require modifications to the
subclasses of TargetJITInfo.

  1. JITResolver::JITCompilerFn(…) was modified to no longer exit on being used for non-lazy compilation scenarios.

This means of course that the error test path behavior for this function has been modified. See #2 above.

Beyond any issues with the patch, there is a question, in my mind, as to whether 2606 is really a bug. Sure its resolution makes
using the JIT simpler for cross module behavior, but current manual solutions may be more fined grained in their approach.
If so a fix to 2606 would circumvent such handling.

Thanks for any time spent on this

Garrison

PR2606.proposal.patch (9.61 KB)

Hi Garrison,

I am not a specialist of the code but here is my 2 cents:

  • I like the idea that in lazy-mode the call (in module or not) is treated by a stub (like every calls).

  • If the jit is in non-lazy mode, I’m not really fan of the “stub” solution. Is it not possible to use the same mechanism as it already exists : add the function to pending list and emit it after the current functions ? In fact, can you replace forEmitFunctionStub with getLazyFunctionStub ? With a non-lazy jit, I want the first execution of my function to be the same execution speed as the second one. A stub in non-lazy mode doesn’t fell natural to me.

  • I don’t see why functions external to every modules (let’s say “printf” ?) are delayed with your mechanism. In the last part of your modified getPointerToFunction, you call “getPointerToNamedFunction”. AFAICT a printf call will be catch by this code, no ?

  • I like the ideas to be flexible, to be able to say this global variable declaration should point here (with addGlobalMapping), even if there is another module which have a global variable. And in fact, as the workaround is relatively easy, I wonder if it worth the change ?

Anyway, thanks for asking opinion on this and for contributions you’ve made on code !

Olivier.

Hi Olivier,

Thanks for responding! I get to learn this way.

Hi Garrison,

I am not a specialist of the code but here is my 2 cents:

  • I like the idea that in lazy-mode the call (in module or not) is treated by a stub (like every calls).

If we go further with this, I’ll have to add test cases for lazy mode. I kind of ignored it thinking I’ll wait to see what
the group thinks about the approach.

  • If the jit is in non-lazy mode, I’m not really fan of the “stub” solution. Is it not possible to use the same mechanism as it already exists : add the function to pending list and emit it after the current functions ? In fact, can you replace forEmitFunctionStub with getLazyFunctionStub ? With a non-lazy jit, I want the first execution of my function to be the same execution speed as the second one. A stub in non-lazy mode doesn’t fell natural to me.

So as best as I can determine, the problem with leveraging pending, as used by getLazyFunctionStub(…) is that
I don’t know when to call JIT::updateFunctionStub(…). Ideally this behavior should be invoked in the outermost
getPointerToFunctionCall(), but one doesn’t know, as currently implemented, when this is.

To be clearer, in order to prevent recursive function compilation cases, one needs to delay the compilation of a
function being called until after the current calling function is compiled. In other words to deal with the scenario
where Fa (module A), calls Fb (module B), which in turn calls Fa, we delay the compilation of Fb until we have
finished the compilation of Fa, unless Fb has previously been compiled. Let’s say we used JITEmitter::getLazyFunctionStub(…).
If a user called getPointerToFunction(…) on Fa, everything would work fine. In compiling Fa, getPointerToFunction(…)
(via JIT::runJITOnFunctionUnlocked(…)) would encounter Fb in module B, run getLazyFunctionStub(…) on it, and
then finish with the evaluation of the pending list; again via runJITOnFunctionUnlocked(). The latter functionality
would result in the compilation of Fb. Say however we instead had Fa1 call Fb, and Fa2 in sequence (Fa1, Fa2
in module A, Fa2 not previously compiled). The former description would still hold up to the point Fa2 is encountered.
Since Fa2 is in Fa1’s module, runJITOnFunctionUnlocked(…) would compile it, check the pending list, and update the
contained functions. Since the stub for Fb is in that pending list, Fb would be compiled at this time. However the compilation
of Fa1 is still not finished, and we have what I was trying to prevent. Any errors in my logic, or my understanding of the
code? MutexGuards are just mutexes, right? Seems one could use reader/writer lock semantics to force the handing of
the pending list only in the outmost runJITOnFunctionUnlocked(…) call. I should also add a test case for the above Fa1,
Fa2, Fb, use case in the bug’s attachments.

Having said this I might be going down a rat hole. It would seem such recursive compilations dependencies would exist
for functions in the same module, and these are handled. So I should probably look at that code. We are of course
limited by the existing code, since this is a bug fix not a design change.

You also have another point. It is dissatisfactory to have the first call delayed just because it is in another module.
We could use another semantic where the pending list is handled outside of the outermost getPointerToFunction(…)
is called. This is of course a design change.

  • I don’t see why functions external to every modules (let’s say “printf” ?) are delayed with your mechanism. In the last part of your modified getPointerToFunction, you call “getPointerToNamedFunction”. AFAICT a printf call will be catch by this code, no ?

If I understand what your saying correctly: the printf will be caught here, but only after the module search is finished. In
essence I was saying that with the bug fix, as compared to the pre-existing implementation, the handling of such
external functions is delayed. If I could tell that the function belonged to this category before I did the module search
then there would obviously be no delay. Is this possible? I have to say I’m being rather theoretical here. Delay, no delay;
sounds like I’m trying to predict whether the tree fell in the forest. :wink: I just don’t yet have the requisite LLVM knowledge to
run empirical tests. I tend to hide behind worries of falling timber given the erudite nature of this list’s membership. :slight_smile:

  • I like the ideas to be flexible, to be able to say this global variable declaration should point here (with addGlobalMapping), even if there is another module which have a global variable. And in fact, as the workaround is relatively easy, I wonder if it worth the change ?

I agree, I wonder if there should be a separate mode where a cross module find is turned on, with the default set to off.
In the optional new mode, stubs would be gathered and processed (until done), as a separate phase–a linking phase
for JIT as exists in other LLVM JIT tools. I have not looked at those either.

  1. Should we close the bug by dismissing it as no longer appropriate?
  2. Apply the approach as given, with possible changes, and fix the bug?
  3. Create a new approach, still viewing the bug as valid?
  4. Keep the bug open because the implied features are desired but require a minor design change to resolve?
  5. Do other?

Thanks again for looking at this Olivier.

Garrison

Anyway, thanks for asking opinion on this and for contributions you’ve made on code !

Olivier.

The patch I recently attached to bug 2606, reproduced here, is my first attempt to solve this issue
as represented by the attached test cases. To solve the foreign Module GlobalVariable problem,
I modified JIT::getOrEmitGlobalVariable(…) to directly attempt to map a found “external” GlobalVariable.
To solve the foreign Module Function problem, I modified both JIT.{h,cpp} and JITEmitter.cpp to emit
a stub for a found “external” Function instance, when that foreign Module defined Function has not
yet been compiled.

All areas of interest are marked with // GMV Mod:

As I’ve taken liberties with existing code, and thereby may have touched test cases that will fail, I would like
to have those members of the list that have time, review this proposal for comments/changes.

[snip]

In thinking about this we could use a Mutex::tryacquire(…) (non-recursive), around JIT::runJITOnFunctionUnlocked(…)'s
while loop, and use your JITEmitter:: getLazyFunctionStub(…) suggestion in place of forceEmitFunctionStub(…). Is the lock
attempt too heavy, even if it is implemented with atomics? I’ll implement this when I have time.

Garrison

Well Olivier I created the use case where I thought use of the JITResolver::getLazyFunctionStub() in place
of my new JITResolver::forceEmitFunctionStub(…) would break. However I was going down a rat hole since
JITResolver::getLazyFunctionStub() is apparently used for all encountered functions. I say apparently here
because I determined this with gdb, having seen my test case NOT break the use of getLazyFunctionStub(…).
The scenario Fa in module A, Fb1, Fb2 in module B, where Fb1 calls Fa then Fb2, with Fa calling Fb1 compiles with no
flaws. Therefore there is no need for an extra control mechanism, such as using a try lock, and no need for an extra
performance hit on the first invocation of a function belonging to an external module. JITResolver::getLazyFunctionStub(…)
seems to work great here. I’ll attach another patch to 2606 using JITResolver::getLazyFunctionStub(…) in place of
JITResolver::forceEmitFunctionStub(…). I’ll also re-submit here for discussion given the other issues outlined.

It is interesting to note that considering how JITResolver::getLazyFunctionStub(…) was designed/implemented, the
llvm leads must have known that JITResolver::getLazyFunctionStub(…) could be used for resolving functions in external
modules. The fact that this was not done implies the reasons for not fixing 2606 are in line with the lack of fine grained
control of such cross module auto link behavior (for lack of better words), as you and I both touched on. Again I keep
wondering if there should be an additional and optional mode.

Thanks for the help

Garrison

Hi Olivier,
Thanks for responding! I get to learn this way.

Hi Garrison,

I am not a specialist of the code but here is my 2 cents:

- I like the idea that in lazy-mode the call (in module or not) is treated
by a stub (like every calls).

If we go further with this, I'll have to add test cases for lazy mode. I
kind of ignored it thinking I'll wait to see what
the group thinks about the approach.

- If the jit is in non-lazy mode, I'm not really fan of the "stub" solution.
Is it not possible to use the same mechanism as it already exists : add the
function to pending list and emit it after the current functions ? In fact,
can you replace forEmitFunctionStub with getLazyFunctionStub ? With a
non-lazy jit, I want the first execution of my function to be the same
execution speed as the second one. A stub in non-lazy mode doesn't fell
natural to me.

So as best as I can determine, the problem with leveraging pending, as used
by getLazyFunctionStub(...) is that
I don't know when to call JIT::updateFunctionStub(...). Ideally this
behavior should be invoked in the outermost
getPointerToFunctionCall(), but one doesn't know, as currently implemented,
when this is.
To be clearer, in order to prevent recursive function compilation cases, one
needs to delay the compilation of a
function being called until after the current calling function is compiled.
In other words to deal with the scenario
where Fa (module A), calls Fb (module B), which in turn calls Fa, we delay
the compilation of Fb until we have
finished the compilation of Fa, unless Fb has previously been compiled.
Let's say we used JITEmitter::getLazyFunctionStub(...).
If a user called getPointerToFunction(...) on Fa, everything would work
fine. In compiling Fa, getPointerToFunction(...)
(via JIT::runJITOnFunctionUnlocked(...)) would encounter Fb in module B,
run getLazyFunctionStub(...) on it, and
then finish with the evaluation of the pending list; again
via runJITOnFunctionUnlocked(). The latter functionality
would result in the compilation of Fb. Say however we instead had Fa1 call
Fb, and Fa2 in sequence (Fa1, Fa2
in module A, Fa2 not previously compiled). The former description would
still hold up to the point Fa2 is encountered.
Since Fa2 is in Fa1's module, runJITOnFunctionUnlocked(...) would compile
it, check the pending list, and update the
contained functions. Since the stub for Fb is in that pending list, Fb would
be compiled at this time. However the compilation
of Fa1 is still not finished, and we have what I was trying to prevent. Any
errors in my logic, or my understanding of the
code? MutexGuards are just mutexes, right? Seems one could use reader/writer
lock semantics to force the handing of
the pending list only in the outmost runJITOnFunctionUnlocked(...) call. I
should also add a test case for the above Fa1,
Fa2, Fb, use case in the bug's attachments.
Having said this I might be going down a rat hole. It would seem such
recursive compilations dependencies would exist
for functions in the same module, and these are handled. So I should
probably look at that code.

Sorry for being so slow at looking at this. The existing pending
function system ensures that only one function is being compiled at a
time, and enforces that with the "assert(!isAlreadyCodeGenerating)" at
the top of runJITOnFunctionUnlocked(). You should use that mechanism
instead of adding a new way to create stubs.