OrcLazyJIT for Windows

Hi There,

I am currently exploring C++ JIT compilation for a project where it would be very useful. I started with the code from the lli tool, which uses OrcLazyJIT, and changed it so that the module is compiled from C++ source in memory and OrcLazyJIT is used exclusively.

Since I am on Windows, I found that my application crashes when it runs the main function of the JIT-compiled module (which was found by casting the symbol address to the main prototype). After some digging I found that the crash is caused by LocalJITCompileCallbackManager::reenter not receiving the correct CompileCallback and trampoline id references. This in turn is caused by OrcX86_64::writeResolverCode not respecting the Windows calling convention in the asm code that calls the reentry function.
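For readers unfamiliar with the two conventions, the mismatch comes down to which registers carry the first integer or pointer arguments. The following is a minimal sketch of that mapping; the `Abi` enum and the `intArgReg` helper are hypothetical names used only for illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of LLVM.
enum class Abi { SysV, Win64 };

// Returns the register that carries integer/pointer argument `index`
// (0-based). Only register-passed arguments are covered by this sketch.
std::string intArgReg(Abi abi, unsigned index) {
  static const char *const SysVRegs[] = {"rdi", "rsi", "rdx",
                                         "rcx", "r8",  "r9"};
  static const char *const Win64Regs[] = {"rcx", "rdx", "r8", "r9"};
  if (abi == Abi::SysV) {
    assert(index < 6 && "later arguments are passed on the stack");
    return SysVRegs[index];
  }
  assert(index < 4 && "later arguments are passed on the stack");
  return Win64Regs[index];
}
```

If the reentry function takes the callback manager and the trampoline id as its first two arguments, resolver code written for SysV loads them into rdi and rsi, while a function compiled for Windows reads them from rcx and rdx.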

After making changes to the asm code in OrcX86_64::writeResolverCode, the code runs without any problems. I thought I would share it here so that others who would like to use OrcLazyJIT on Windows can benefit. Please let me know if a different channel would be more appropriate.

Best,
David

In order to get OrcLazyJIT to work under Windows, replace the prebaked asm code in OrcX86_64::writeResolverCode in the file llvm/lib/ExecutionEngine/Orc/OrcArchitectureSupport.cpp with the following. Note that more work is needed to support both Linux and Windows, but I am not sure how this is best dealt with in LLVM.

// windows (arguments go to rcx and rdx and have reversed order)

const uint8_t ResolverCode[] = {
                                           // resolver_entry:
0x55,                                      // 0x00: pushq     %rbp
0x48, 0x89, 0xe5,                          // 0x01: movq      %rsp, %rbp
0x50,                                      // 0x04: pushq     %rax
0x53,                                      // 0x05: pushq     %rbx
0x51,                                      // 0x06: pushq     %rcx
0x52,                                      // 0x07: pushq     %rdx
0x56,                                      // 0x08: pushq     %rsi
0x57,                                      // 0x09: pushq     %rdi
0x41, 0x50,                                // 0x0a: pushq     %r8
0x41, 0x51,                                // 0x0c: pushq     %r9
0x41, 0x52,                                // 0x0e: pushq     %r10
0x41, 0x53,                                // 0x10: pushq     %r11
0x41, 0x54,                                // 0x12: pushq     %r12
0x41, 0x55,                                // 0x14: pushq     %r13
0x41, 0x56,                                // 0x16: pushq     %r14
0x41, 0x57,                                // 0x18: pushq     %r15
0x48, 0x81, 0xec, 0x08, 0x02, 0x00, 0x00,  // 0x1a: subq      $0x208, %rsp
0x48, 0x0f, 0xae, 0x04, 0x24,              // 0x21: fxsave64  (%rsp)

0x48, 0xb9,                                // 0x26: movabsq   <CBMgr>, %rcx
// 0x28: Callback manager addr.
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

0x48, 0x8B, 0x55, 0x08,                    // 0x30: movq      8(%rbp), %rdx
0x48, 0x83, 0xea, 0x06,                    // 0x34: subq      $6, %rdx

0x48, 0xb8,                                // 0x38: movabsq   <REntry>, %rax
// 0x3a: JIT re-entry fn addr:
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

0xff, 0xd0,                                // 0x42: callq     *%rax
0x48, 0x89, 0x45, 0x08,                    // 0x44: movq      %rax, 8(%rbp)
0x48, 0x0f, 0xae, 0x0c, 0x24,              // 0x48: fxrstor64 (%rsp)
0x48, 0x81, 0xc4, 0x08, 0x02, 0x00, 0x00,  // 0x4d: addq      $0x208, %rsp
0x41, 0x5f,                                // 0x54: popq      %r15
0x41, 0x5e,                                // 0x56: popq      %r14
0x41, 0x5d,                                // 0x58: popq      %r13
0x41, 0x5c,                                // 0x5a: popq      %r12
0x41, 0x5b,                                // 0x5c: popq      %r11
0x41, 0x5a,                                // 0x5e: popq      %r10
0x41, 0x59,                                // 0x60: popq      %r9
0x41, 0x58,                                // 0x62: popq      %r8
0x5f,                                      // 0x64: popq      %rdi
0x5e,                                      // 0x65: popq      %rsi
0x5a,                                      // 0x66: popq      %rdx
0x59,                                      // 0x67: popq      %rcx
0x5b,                                      // 0x68: popq      %rbx
0x58,                                      // 0x69: popq      %rax
0x5d,                                      // 0x6a: popq      %rbp
0xc3,                                      // 0x6b: retq
};
const unsigned ReentryFnAddrOffset = 0x3a;
const unsigned CallbackMgrAddrOffset = 0x28;
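The two offsets point at the zeroed 8-byte immediates of the two movabsq instructions in the listing. As a rough sketch of how such placeholders can be filled in, the code below copies a template and overwrites the immediates with real addresses. The `patchResolver` helper is hypothetical and returns a plain byte vector for illustration; the in-tree function writes into JIT memory and may differ in detail.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies a resolver template and writes two 64-bit
// addresses over the placeholder immediates at the given byte offsets.
std::vector<uint8_t> patchResolver(const uint8_t *Template, std::size_t Size,
                                   unsigned CallbackMgrAddrOffset,
                                   unsigned ReentryFnAddrOffset,
                                   uint64_t CallbackMgrAddr,
                                   uint64_t ReentryFnAddr) {
  // Both 8-byte immediates must lie fully inside the template.
  assert(CallbackMgrAddrOffset + sizeof(uint64_t) <= Size);
  assert(ReentryFnAddrOffset + sizeof(uint64_t) <= Size);

  std::vector<uint8_t> Code(Template, Template + Size);
  // memcpy avoids unaligned stores; x86-64 immediates are little-endian,
  // which matches the host byte order when this runs on x86-64.
  std::memcpy(Code.data() + CallbackMgrAddrOffset, &CallbackMgrAddr,
              sizeof(CallbackMgrAddr));
  std::memcpy(Code.data() + ReentryFnAddrOffset, &ReentryFnAddr,
              sizeof(ReentryFnAddr));
  return Code;
}
```

With the listing above, the call would pass offsets 0x28 and 0x3a, which leaves the opcode bytes at 0x26-0x27 and 0x38-0x39 untouched.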

+Lang, JIT Cowboy

Hi David,

This is really cool. I’d love to get this in-tree.

There are two ways we could go about this:

(1) Make the OrcArchitecture interface ABI-aware so that it can choose the right resolver code,
or
(2) Replace the OrcArchitecture classes with OrcABI classes. I.e. we’d just rename OrcX86_64 → Orc_X86_64_SysV (and rename I386 & AArch64 similarly), then add your code as Orc_X86_64_Windows.

I think the second is probably the way to go, with a little refactoring so that the various X86 ABIs could share the stub and resolver code.
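To make option (2) concrete, here is a rough sketch of ABI classes that share common code through a base class and differ only in the resolver. All class and function names below are hypothetical stand-ins, and the string-returning methods are placeholders for the real code-emitting ones.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the Orc support classes.
struct X86_64_Common {
  // Code shared by every x86-64 ABI would live here.
  static std::string stubKind() { return "x86-64 indirect stub"; }
};

struct X86_64_SysV : X86_64_Common {
  static std::string resolverArgRegs() { return "rdi, rsi"; }
};

struct X86_64_Windows : X86_64_Common {
  static std::string resolverArgRegs() { return "rcx, rdx"; }
};

// Client code is parameterized on the ABI class and only uses static members,
// so choosing an ABI is a compile-time decision with no virtual dispatch.
template <typename ABI> std::string describe() {
  std::string Regs = ABI::resolverArgRegs();
  assert(!Regs.empty() && "each ABI class must name its argument registers");
  return ABI::stubKind() + " / resolver args in " + Regs;
}
```

Under this shape, adding another ABI means adding one more derived class, and existing clients pick it up by changing a template argument.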

Any interest in submitting a patch?

- Lang.

Hi Lang,

I also agree that the second option should probably be the way to go. I will give it a shot and submit a patch.

Best,

David

Hi Lang,

I have attached the patch to this mail for your consideration.

Notes:

- I basically just added OrcX86_64_Win32 with the custom resolver code for Windows. All the other static methods simply relay to their OrcX86_64 versions (so there is no need for any further refactoring).

- The asm code in my initial post doesn't work in release mode, because I forgot to account for shadow space allocation on the stack. The code in the attached patch has that fixed.

- I decided to name the support class OrcX86_64_Win32 in order to be consistent with Triple::OSType::Win32.

- I didn't rename any of the other classes in OrcArchitectureSupport, as I didn't feel comfortable touching those. I guess renaming them is quick to do.

- In lli, I changed the stub and compile callback creation functions accordingly. There I test the OS type in the triple and return the Win32 support class if required. There might be other tools or places I am not aware of where this needs to be done.

- I didn't run any tests apart from my little sandbox example.
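The selection described in the lli note can be sketched roughly as follows. This is a simplified, self-contained stand-in: `ResolverAbi` and `pickResolverAbi` are hypothetical names, and the string matching only approximates what a query on the triple's OS type would do in the actual patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical names, not taken from the patch.
enum class ResolverAbi { SysV, Win32 };

// Picks the resolver flavor from a target triple string such as
// "x86_64-pc-windows-msvc" (arch-vendor-os[-environment]).
ResolverAbi pickResolverAbi(const std::string &Triple) {
  assert(Triple.compare(0, 6, "x86_64") == 0 && "sketch covers x86-64 only");
  // Match the OS component including its leading dash, so that "darwin"
  // (which contains "win") is not mistaken for a Windows target.
  bool IsWindows = Triple.find("-windows") != std::string::npos ||
                   Triple.find("-win32") != std::string::npos;
  return IsWindows ? ResolverAbi::Win32 : ResolverAbi::SysV;
}
```

Any tool that constructs stubs or compile callbacks for x86-64 would need a similar check at the point where it chooses the support class.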

Best,

David

orclazyjit_win32.diff (8.7 KB)

Awesome!

Patch committed with cleanup in r268845. Thanks very much David. :)

- Lang.