Using bytecode version of std::sort for JIT generated data type

Hi LLVM devs,
The performance of C++ std::sort comes from being able to inline the comparator. For a JIT generated data type, using the comparator as a function call from std::sort may not be ideal. So, i was wondering how can we make a JIT-sort which is as good as statically compiled std::sort with comparator inlined.

What is the recommended way to pass a an existing function like std::sort into JIT so that it can optimize the whole program? For example, how do we generate the bytecode for existing function (say some external tools) and give to JIT runtime. Is there some code sample?

The alternative is to rollout my own Sort function inside JIT, but i don’t want to do that and want to take this as an opportunity to learn the general approach of passing existing function definitions into JIT to do whole program optimization like inlining.

I found this stackoverflow question which is related to what I am asking for, but I don’t see any final conclusion on this beyond incurring a function call for each comparison.

Thanks!

Rajesh S R

If you consider how std::sort works - it doesn’t exist in the machine (or even LLVM IR) level without a specific type - so if you were to support something like std::sort in your language (JITed or not), it means some form of specialization of your types/functions based on other types.

So, yes, something like “own Sort function inside JIT” - if your language doesn’t support a way to write this in the language itself (if it doesn’t have anything like C++ templates or an equivalent generic thing).

Otherwise you can do something like qsort (which uses a function pointer to parameterize the comparison) & perhaps force or encourage the optimizer to inline and collapse away this indirection.

Thanks David! I understand that std::sort doesn’t exist without types especially at bytecode layer.

What I was thinking was something like the following:

Compile std::sort with a thunk function Compare(void*, void*) {rerturn false} into bytecode with an option say noinline and always make the function call or even a simple unoptimized bytecode which guarantees that Compare exists as a function without any inlining.

At run time implement a new Compare function for your type and replace the Compare function with the new Compare implemented with JIT. Now the JIT runtime has whole program in bytecode which it can aggressively optimize.

A good way to realize this “thunking” into LLVM JIT can enable lots of optimized algorithms to be ported into JIT without having to re-invent them for each JIT runtime.

Thoughts?

That’s an option, for sure - having llvm::Functions you could clone into your module, replace the call to the Compare function to a call to the specific comparison you want & the let the optimizers do their work.

Thanks David!

I am not clear on how to achieve this though. Could you give more info on this?

I expect something like this:

sort.cc file:

bool Compare(void* a, void* b) {
return false;
}

void SortFunc(void* arr, int len) {
std::sort(arr, len, &Compare)
}

$MAGIC_COMMAND sort.cc -o a.llvm

a.llvm is a bytecode which can be loaded at runtime in my JIT module and write some code to replace “Compare” function with our own “Compare” function.

There isn’t any magic command for this - you’d have to write some C++ code/a custom LLVM optimization pass.

Though, that said - perhaps it should just be a runtime parameter where you rely on LLVM to inline/optimize things away? You could do some relatively smaller code generation - generate a function that calls into the generic function (that takes the functior by pointer) and have the generic function be always_inline, so it gets inlined into the outer, type-specific function, without you having to write code to create all that IR on every use (instead relying on the inliner to create it for you, essentially)

There isn’t any magic command for this - you’d have to write some C++ code/a custom LLVM optimization pass.

Though, that said - perhaps it should just be a runtime parameter where you rely on LLVM to inline/optimize things away? You could do some relatively smaller code generation - generate a function that calls into the generic function (that takes the functior by pointer) and have the generic function be always_inline, so it gets inlined into the outer, type-specific function, without you having to write code to create all that IR on every use (instead relying on the inliner to create it for you, essentially)

I don’t understand what you mean here. If sort is already generated into machine code based on Compare(void*, void*) { return false; } how can we inject our function here?

Ah, no, sort wouldn’t be compiled down to machine code - it’d be in LLVM IR. You’d import that into the module where the user code was (or copy this “standard library” LLVM IR module and then add the to-be-JITed code to that copy, then compile it all together).

Do you have some pointers on how to do it? Take a file like sort.cc above and generate a JIT module. Does llc help with this?
What library function can we use to load an existing IR module file into my JIT runtime module to compile them together?

clang is the tool you’d use to get LLVM IR from C source code
Looks like you can use CloneModule: https://llvm.org/doxygen/CloneModule_8cpp_source.html to clone a module. So you could have your “standard library” in a module, you load the module up - then every time you want to JIT something that might use it - you clone it, then JIT the new things into that module and compile it.

Thank you very much David for all the help!
I will try these and post if I have any questions.