I read through the Julia code that Jameson has written and it seems
like the unwind info is hand generated, i.e. it is not coming from
LLVM directly, because the generated prolog/epilog is always the same? I'm
So then I read through this bug report:
Alexey Zasenko mentions that relocation support is incomplete. Then
Andy Ayers from Microsoft says that the PE file format, and therefore
the corresponding RUNTIME_FUNCTION data structure, only support code
loaded within a 4GB extent. He doesn't mention it explicitly, but it
would follow that Microsoft compilers do not generate images that are
> 4GB in size because Windows cannot load them. It would then not be a
stretch to assume that Microsoft's JIT compilers are also not capable
of producing code more than 4GB apart. Would it also be safe to say
that they can't generate more than 4GB of code?
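To make the 4GB limit concrete for myself, here is a minimal sketch of
the constraint as I understand it. The struct only mirrors the three
32-bit fields of the x64 RUNTIME_FUNCTION entry; the names
RuntimeFunctionEntry, toRva and makeEntry are my own and not part of
any Windows or LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Same shape as the x64 RUNTIME_FUNCTION entry: three 32-bit offsets from an
// image base. The name is hypothetical; this is not the Windows header type.
struct RuntimeFunctionEntry {
    std::uint32_t BeginAddress;      // offset of the start of the function
    std::uint32_t EndAddress;        // offset of the end of the function
    std::uint32_t UnwindInfoAddress; // offset of the UNWIND_INFO (XDATA)
};

// Converts an absolute address to an offset from imageBase.
// Returns false when the offset cannot be represented in 32 bits.
bool toRva(std::uint64_t imageBase, std::uint64_t address, std::uint32_t &rva) {
    if (address < imageBase)
        return false;
    const std::uint64_t delta = address - imageBase;
    if (delta > std::numeric_limits<std::uint32_t>::max())
        return false;
    rva = static_cast<std::uint32_t>(delta);
    return true;
}

// Fills an entry, failing if the code or the XDATA lies outside the 4GB window.
bool makeEntry(std::uint64_t imageBase, std::uint64_t begin, std::uint64_t end,
               std::uint64_t unwindInfo, RuntimeFunctionEntry &out) {
    assert(begin <= end && "function range must not be inverted");
    return toRva(imageBase, begin, out.BeginAddress) &&
           toRva(imageBase, end, out.EndAddress) &&
           toRva(imageBase, unwindInfo, out.UnwindInfoAddress);
}
```

If the XDATA ends up 4GB or more above whatever base address is
registered for the function table, makeEntry fails, which is the
situation the bug report seems to be about.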
In the bug report, Andy's next set of comments is about how to work
around this problem. It seems they (Microsoft's JIT compiler) don't
need to care because they never generate code of such volumes or that
But then Stefan Gränitz suggests a solution that somehow accommodates
this > 4GB situation. It would seem that this is accomplished by
emitting a relocation of type IMAGE_REL_AMD64_ADDR64.
What is curious is that according to
Microsoft compilers rarely generate this type of relocation but they
can do it.
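For reference, this is how I picture the difference between the two
relocation types when a loader applies them. It is only a sketch: the
numeric values are the COFF constants for IMAGE_REL_AMD64_ADDR64 and
IMAGE_REL_AMD64_ADDR32NB, but RelocKind and applyReloc are names I made
up, and a real loader such as RuntimeDyld also has to handle addends
and the other relocation types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Numeric values follow the COFF x64 relocation constants; the enum itself
// is hypothetical.
enum class RelocKind : std::uint16_t {
    Addr64 = 0x0001,   // IMAGE_REL_AMD64_ADDR64: 64-bit absolute address
    Addr32NB = 0x0003  // IMAGE_REL_AMD64_ADDR32NB: 32-bit offset from image base
};

// Writes the relocated value at `site`. Returns false if the value does not
// fit in the field that this relocation type provides.
bool applyReloc(RelocKind kind, unsigned char *site, std::uint64_t target,
                std::uint64_t imageBase) {
    assert(site != nullptr);
    switch (kind) {
    case RelocKind::Addr64:
        // Any 64-bit target is representable, so distance does not matter.
        std::memcpy(site, &target, sizeof(target));
        return true;
    case RelocKind::Addr32NB: {
        if (target < imageBase)
            return false;
        const std::uint64_t delta = target - imageBase;
        if (delta > std::numeric_limits<std::uint32_t>::max())
            return false; // target is 4GB or more above the image base
        const std::uint32_t rva = static_cast<std::uint32_t>(delta);
        std::memcpy(site, &rva, sizeof(rva));
        return true;
    }
    }
    return false;
}
```

The ADDR64 case can never fail on distance, while the ADDR32NB case
fails as soon as the target is too far from the base.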
After reading all of this, I still don't have a clear picture, so I'm
writing a summary here; maybe somebody can refute what I'm saying or
point me in the right direction.
Options to make progress for at least some JIT users:
(1) Use RuntimeDyldELF and borrow code from JuliaLang/Julia where
Jameson has seemingly figured out what the UNWIND_INFO is for some set
of prolog/epilog -- unclear if this can break or not.
(2) Use the small code model and have the application embedding the
JIT ask the OS for a committed address range. This hopefully keeps
everything within 32 bits of everything else.
(3) Figure out why IMAGE_REL_AMD64_ADDR32NB is ever emitted and why
Microsoft's JIT compiler doesn't seem to need it. Alexey Zasenko has a
patch; does that need to be upstreamed?
(4) Stefan Gränitz has a patch that can solve the 64-bit problem a
What I'm really asking is whether my reasoning for (5) is sound and
whether it will indeed work; I can't think of a reason why it won't:
(5) Generate code function-by-function (i.e. 1 function per LLVM
module) and each time ask the memory manager to return a memory chunk
so that XDATA is within 32 bits of the function.
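What I have in mind for (5) is roughly the following. This is a sketch
under assumptions, not an LLVM memory manager: FunctionSlab is a
hypothetical class, and the std::vector stands in for an executable
region that a real JIT would have to obtain from the OS. The point is
only that code and XDATA carved out of one region smaller than 4GB are
always within a 32-bit offset of the region base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// One slab per JIT-compiled function: code and XDATA are bump-allocated from
// the same region, so every offset from the slab base fits in 32 bits.
class FunctionSlab {
public:
    explicit FunctionSlab(std::size_t capacity) : storage_(capacity), used_(0) {
        assert(capacity <= std::numeric_limits<std::uint32_t>::max());
    }

    // Returns `size` bytes aligned to `align` (a power of two), or nullptr
    // if the slab is exhausted.
    unsigned char *allocate(std::size_t size, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(storage_.data());
        const std::uintptr_t aligned =
            (base + used_ + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset > storage_.size() || size > storage_.size() - offset)
            return nullptr;
        used_ = offset + size;
        return storage_.data() + offset;
    }

    // Offset of `p` from the slab base, usable as a 32-bit RVA.
    std::uint32_t rvaOf(const unsigned char *p) const {
        assert(p >= storage_.data() && p <= storage_.data() + storage_.size());
        return static_cast<std::uint32_t>(p - storage_.data());
    }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_;
};
```

With this, the memory manager would hand out the code section and the
XDATA section for a function from the same slab, and the slab base
would be the base address that the unwind offsets are relative to.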
I suppose what puzzles me most is why (5) is not enough. I mean,
assuming you're writing a JIT compiler with lazy compilation, how
would you ever generate code whose XDATA is more than 4GB away from
the code? And if you have direct references to full 64-bit addresses,
i.e. you're calling some previously JIT-compiled function or
referencing some data structure ... they don't need to be relocated at
all. Or is this where I'm wrong, and the direct references do in fact
need to be relocated? But that wouldn't make sense, because how would
anyone else know what to relocate them to?
Appreciate any help or direction!