[Proposal] Adding callback mechanism to Execution Engines

Hello,

I would like to have your opinions on this.

Problem:

Currently, there is no way to perform hypercalls into LLVM, that is, calls that transfer execution from the program being executed to the LLVM infrastructure (for example, to perform compilation). The goal of this project is to cleanly integrate an API into the LLVM code/execution engine for creating custom callbacks. The “lli” executable should allow a function to be registered as a callback with the execution engines, and the program being compiled by LLVM should be able to call back into that function inside lli during its execution.

Design:

  1. The user programs link with a dummy library function called PerformCustomCallback(const char* CallbackName, void *Args).

  2. LLVM ExecutionEngines (JIT and MCJIT) will implement this function to perform the multiplexing of the various callbacks registered with them.

  3. Whenever the user program calls PerformCustomCallback with appropriate arguments, lli should receive a call to the matching registered callback.

Methodology:

In JIT, the external symbols are resolved during the call to void *JIT::getPointerToNamedFunction(const std::string &Name, bool AbortOnFailure). So, the hypercall resolution can be done here.

In MCJIT, during finalizeObject(), all the symbol relocations are finalized. In the Memory Managers, the external symbols are resolved. So, the hypercall resolution can be done in LinkingMemoryManager::getSymbolAddress().

In JIT::getPointerToNamedFunction(), check whether the function name is PerformCustomCallback and store the address of the function PerformCustomCallback in the global table. So the next time a call happens, JIT will get the address directly from the table.

In MCJIT, in LinkingMemoryManager::getSymbolAddress() do the same.
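The resolution step described above can be sketched without any LLVM dependency. This is only a minimal mock under stated assumptions: `SymbolResolver`, `HypercallEntryStub`, and `cacheHits` are hypothetical names invented for illustration, and the real hook would sit inside JIT::getPointerToNamedFunction() or LinkingMemoryManager::getSymbolAddress() rather than in a standalone class.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the engine-provided hypercall entry point.
extern "C" void HypercallEntryStub(const char *, void *) {}

// Hypothetical mock of a symbol resolver that special-cases the hypercall
// symbol and remembers its address in a table for later lookups.
class SymbolResolver {
public:
  // Maps a symbol name to an address, or returns 0 if the name is unknown.
  std::uint64_t getSymbolAddress(const std::string &Name) {
    auto Cached = Table.find(Name);
    if (Cached != Table.end()) {
      ++CacheHits;
      return Cached->second;
    }
    if (Name == "PerformCustomCallback") {
      std::uint64_t Addr = reinterpret_cast<std::uintptr_t>(&HypercallEntryStub);
      Table.emplace(Name, Addr);
      return Addr;
    }
    return 0; // A real resolver would fall back to its usual lookup here.
  }

  unsigned cacheHits() const { return CacheHits; }

private:
  std::unordered_map<std::string, std::uint64_t> Table;
  unsigned CacheHits = 0;
};
```

The first request for PerformCustomCallback fills the table; later requests are served from it, which is the "next time a call happens" behaviour described above.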

The execution engines (JIT and MCJIT) will provide the concrete implementation of the PerformCustomCallback function. PerformCustomCallback has to look up the callback registered under the name provided by the user program and call it with the arguments supplied by the user program. It also passes an extra argument to the callback, which may contain the LLVM context specified when the callback was registered.

I propose to add a new API to the LLVM ExecutionEngine, int registerCustomCallback(const char *CallbackName, void (*)(), void *LLVM_args), to be used by LLVM tools such as lli. It takes the name under which the function is registered as a callback, a pointer to the function, and a placeholder for any extra arguments that lli needs to pass to the callback. The ExecutionEngine will register the function names as callbacks and call them when it encounters a callback in the user program.
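To make the registration and dispatch idea concrete, here is a minimal, LLVM-free sketch. It rests on several assumptions that are not in the proposal: the callback type is a typed `void (*)(void *, void *)` rather than the proposed `void (*)()`, both functions return an `int` status, the table is a process-wide map rather than ExecutionEngine state, and `CallbackEntry`, `callbackTable`, and `AddArgs` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumed callback shape: user arguments plus the context pointer that was
// supplied at registration time.
using CustomCallback = void (*)(void *UserArgs, void *Context);

struct CallbackEntry {
  CustomCallback Fn;
  void *Context;
};

// Hypothetical registry; in the proposal this state would live in the engine.
static std::map<std::string, CallbackEntry> &callbackTable() {
  static std::map<std::string, CallbackEntry> Table;
  return Table;
}

// Returns 0 on success, -1 on invalid arguments or a duplicate name.
int registerCustomCallback(const char *CallbackName, CustomCallback Fn,
                           void *Context) {
  if (!CallbackName || !Fn)
    return -1;
  bool Inserted =
      callbackTable().emplace(CallbackName, CallbackEntry{Fn, Context}).second;
  return Inserted ? 0 : -1;
}

// Entry point the user program calls. Returns 0 if a callback ran, else -1.
extern "C" int PerformCustomCallback(const char *CallbackName, void *Args) {
  if (!CallbackName)
    return -1;
  auto It = callbackTable().find(CallbackName);
  if (It == callbackTable().end())
    return -1;
  It->second.Fn(Args, It->second.Context);
  return 0;
}

// Example callback: unpacks the caller's argument struct and counts calls
// through the context pointer.
struct AddArgs {
  int arg1;
  int arg2;
  int result;
};

void Callback1(void *UserArgs, void *Context) {
  assert(UserArgs && Context);
  auto *A = static_cast<AddArgs *>(UserArgs);
  A->result = A->arg1 + A->arg2;
  ++*static_cast<int *>(Context);
}
```

A tool would call registerCustomCallback("Callback1", &Callback1, &Counter) once at startup; the user program then reaches Callback1 through PerformCustomCallback("Callback1", &Args). Packing arguments into a struct behind a void pointer keeps the entry point's signature fixed, at the cost of type checking between caller and callback.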

Interface:

Whenever a new callback has to be registered, the following things are to be done:

  1. The new callback has to be defined in lli (a function definition).

  2. The callback has to be registered with the ExecutionEngine (JIT or MCJIT) using the API registerCustomCallback(). Depending upon the flags set during lli invocation, the call goes to JIT or MCJIT.

  3. In the user program, a call to PerformCustomCallback() with the name and arguments will execute the registered callback.

Example:

Suppose I want to register a function named Callback1 which takes two integers as its parameters.

In lli.cpp, define the function Callback1 and register it in main() as follows:

EE->registerCustomCallback("Callback1", (reinterpret_cast<void*(*)()>(&Callback1)));

The user programs have to include the header file of the dummy library to get the PerformCustomCallback symbol. To make a hypercall to “Callback1”, do the following:

struct arg {
  int arg1;
  int arg2;
} Arg;

Arg.arg1 = 10;
Arg.arg2 = 20;

PerformCustomCallback("Callback1", &Arg);

Compile the user program (test.cc) using clang as follows:

clang -c -emit-llvm `llvm-config --cppflags` -o test test.cc `llvm-config --libs` `llvm-config --ldflags` -lLLVMCustomCallback

Use cases:

A typical use case is to have LLVM recompile and relink a function on demand from the client. Also, if we want to find the address of a function in a large module, we can get it using a mechanism like this.

Please let me know if there is a better design to achieve this. I am attaching the diff of the implementation for your reference.

Thanks,

Sumeeth

diff.txt (14.1 KB)

*Problem:*

Currently, there are no ways to perform hypercalls into LLVM (they
transfer execution from the program being executed to the LLVM
infrastructure to perform compilation). The goal of this project is to
cleanly integrate an API into the LLVM code/execution engine to create
custom callbacks. The “lli” executable should allow having a function be
registered as callback with the execution engines and the program being
compiled by LLVM should be able to make a callback into that function
inside lli during its execution.

What are you using lli for that you are running into this problem?

-- Sean Silva

Hi Sumeeth,

You want to call machine code functions from a program running under some EE.
Can’t this be implemented by directly mapping llvm::Function into an address?

Function *F = Function::Create(YourFunctionType, Function::ExternalLinkage);
JIT->addGlobalMapping(F, Addr);

You can either add the CustomCallback function or better yet add exactly the functions you need with correct prototypes.

Yaron

Hi Sumeeth,

I’m not sure I understand what your new mechanism is supposed to add. You can already call functions defined in the host program from generated code. The Kaleidoscope tutorial does this. If the function you want to call is defined in a module that is dynamically loaded and you are using the default memory manager all you need to do is declare a prototype for the function in your generated module and the linking mechanism will handle it.

If the function is in a statically linked module, you need to do something to explicitly expose it. With the older JIT engine you can use addGlobalMapping as Yaron suggests, but I don’t think that will work with MCJIT. I believe that sys::DynamicLibrary::addSymbol will work with either engine. Alternatively, you could implement a custom memory manager to manage the linking directly.

-Andy

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu] On Behalf Of Kaylor, Andrew
Subject: Re: [LLVMdev] [Proposal] Adding callback mechanism to Execution Engines

If the function is in a statically linked module, you need to do something to explicitly expose it. With the older JIT engine you can use addGlobalMapping as Yaron suggests, but I don’t think that will work with MCJIT.

Seems to work fine for us with MCJIT.

- Chuck

Hey Everyone,

I understood this a little differently (well, I do have direct contact with Sumeeth given that we both work in the same lab). Allow me to try and explain his proposal.

We are trying to optimise instructions out of a program (JIT-compiled OS kernels or JIT-compiled web server code) at run time, and our hypothesis is that some of these decisions are best made by the programmer. For example, if statistics show that a particular subsystem of the running program hasn’t seen any activity in quite a while, the programmer is in a position to decide whether that code can be optimised out of the binary (if it is some highly sensitive failure-detection code, they probably wouldn’t want it removed no matter what).

For this, we wanted some mechanism to call back into the execution engine, either to get information such as stats or to tell the execution engine to recompile a particular module or function.

His proposal is to introduce a callback mechanism in the execution engine that allows the internal tools to expose functionality, such as reporting stats or triggering a recompile, to the user program.

Hope that clears some things up.

Cheers,
Amogh

Are you using the latest code from trunk? I didn’t think the latest code used the address mapping in the ExecutionEngine base class.

Of course, if people are depending on this it might be something that should be fixed if it isn’t working.

-Andy

Hi Andrew,

I used the latest code from trunk. GlobalSymbolTable is being used in MCJIT.

I guess it wasn’t clear from the proposal that the user program will be modified to indicate that the callback should happen at that point in the code. The objective is to call some of the functions which belong to lli or the ExecutionEngine.

Thanks,
Sumeeth

That’s definitely interesting, but I’m not sure it’s a general enough use case to put into the LLVM classes.

As described earlier, you can call an arbitrary function in your program from the generated code. If the execution engine did expose interesting information, you could call the EE interface from that function.

However, I’m not sure that the ExecutionEngine is likely to have the information you want. The ExecutionEngine is primarily responsible for generating the binary code. The current interfaces expose some helper functions to allow you to call the generated code directly. However, these are likely to be moved out of the EE interface, at least for MCJIT. The most the EE could reliably tell you is how often the address of a particular function was requested, and again I don’t know that there is a general case for tracking such statistics in the EE itself.

In any event, once the EE has returned the address of a function, it will have no information about how many times the function is called. Also, a function can be called from within the generated code without the EE knowing anything about it.

-Andy

Unless I’m missing something, indeed addGlobalMapping should not work with MCJIT.
MCJIT does not consult EEState.getGlobalAddressMap when resolving symbols.
Instead it uses RTDyldMemoryManager::getSymbolAddress which checks with DynamicLibrary::SearchForAddressOfSymbol, so Andy’s suggestion of DynamicLibrary::addSymbol is better as it should work with both JIT and MCJIT.

Another option is to use the LazyFunctionCreator, which is implemented in both JIT and MCJIT.
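As a rough illustration of the lazy function creator route, the sketch below shows only the resolver function itself, with no LLVM dependency. It assumes the creator has the shape `void *(const std::string &)`, which is what ExecutionEngine::InstallLazyFunctionCreator accepted in LLVM releases of this period as far as I recall; `HostGetStats` and `lazyFunctionCreator` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Hypothetical host-side function that generated code might want to call.
// It writes a placeholder value into the int the caller points at.
extern "C" void HostGetStats(void *Args) {
  assert(Args && "caller must pass an int to fill in");
  *static_cast<int *>(Args) = 42;
}

// Called for symbols the engine cannot resolve by itself. Returns the host
// address for known names and nullptr for everything else.
void *lazyFunctionCreator(const std::string &Name) {
  if (Name == "HostGetStats")
    return reinterpret_cast<void *>(&HostGetStats);
  return nullptr;
}
```

The function would then be handed to the engine with something like EE->InstallLazyFunctionCreator(lazyFunctionCreator). Note that on some platforms the name the engine asks for may carry a leading underscore, so a real creator may need to normalise names before comparing.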

Andy - MCJIT::getPointerToFunction does call EE::addGlobalMapping; however, EEState.getGlobalAddressMap is not used in MCJIT. Should this call be removed?

Yaron

Yes, MCJIT uses RTDyldMemoryManager::getSymbolAddress when resolving symbols. I am using LinkingMemoryManager::getSymbolAddress to resolve the symbol.

-Sumeeth

From: Kaylor, Andrew [mailto:andrew.kaylor@intel.com]
Subject: RE: [LLVMdev] [Proposal] Adding callback mechanism to Execution Engines

Are you using the latest code from trunk? I didn’t think the latest code used the address mapping in the ExecutionEngine base class.

No, I should have mentioned that we’re on 3.3. I don’t think we’ve run anything other than small tests on recent trunk versions so far.

Of course, if people are depending on this it might be something that should be fixed if it isn’t working.

It appears there are alternatives, so it’s probably not a serious issue.

- Chuck