Orc Windows C++

Hello folks,

I’m developing an application that uses Orc JIT for C++, which works swimmingly on Mac OS X. However, the Windows version has been a battle and a half, and it’s now at the point where I need some assistance to progress.

The problem I’m having is “Relocation overflow” (related: https://llvm.org/bugs/show_bug.cgi?id=23228#c8, see #8) … so I spoke to some clang developers who focussed on Windows at CppCon last week, and they gave me the following advice:

  • Use ELF
    • Using this results in another issue about comdat sections, see here: https://root.cern.ch/phpBB3/viewtopic.php?t=19808
  • Stick with COFF, but use the large code model
    • No observed difference, seems to be the case because JITDefault is being used in the same way as Large, which would make sense
    • According to the clang developers I spoke to, Lang and Andy might have an interest in fixing this (would seem likely, as they’re the two commenters on the first issue I linked), since it’s better to use COFF on Windows than keep trying to work around it

Any ideas?

Thanks in advance!

Moving to the LLVM Dev list & cc’ing Lang.

Maybe looking at their code might help:

https://github.com/dotnet/llilc/blob/dd12743f9cdb5418f1c39b2cd756da1e8396a922/lib/Jit/LLILCJit.cpp#L299

Thanks for the link!
There’s some code there that looks extremely relevant to say the least.

If LLVM is generating the x64 code and you have specified a large code model, you should not see any 32 bit relocations.

So it would be interesting to determine what kind of relocation you are seeing and where it came from.

It’s pretty intermittent at the moment…sometimes I get the relocation overflow issue, sometimes I get another issue about BSS sections having no contents.

The source code to reproduce either is simple:

#include <iostream>

int main (int argc, char** argv)
{
  
}

I’ve managed to reproduce the BSS section issue consistently with clang and llvm-rtdyld. Since that failure comes first in the compilation / JIT’ing process, I can’t get far enough to see the relocation issue with the stock tools rather than my own program.

clang.exe -c "Y:\Documents\Visual Studio 2013\Projects\NewProject\Source\main.cpp"
llvm-rtdyld.exe -execute main.o -dylib=C:\Windows\System32\msvcr120d.dll

It also occurs with -mcmodel=large specified.

The exact output of the second command, llvm-rtdyld, is as follows...

Assertion failed: (Sec->Characteristics & COFF::IMAGE_SCN_CNT_UNINITIALIZED_DATA) == 0 && "BSS sections don't have contents!", file C:\llvm\llvm\lib\Object\COFFObjectFile.cpp, line 951
0x00007FF65EAA574C (0x0000000000000016 0x00007FFC73140648 0x0000007900000008 0x00000079E68EDC40), HandleAbort() + 0xC bytes(s), c:\llvm\llvm\lib\support\windows\signals.inc, line 296
0x00007FFC807B396F (0x00007FF600000016 0x0000000000000000 0x0000007900000004 0x0000000000000101), raise() + 0x35F bytes(s)
0x00007FFC807C2060 (0x00000079E68EE3F0 0x0000000000000240 0x00007FFC80888430 0x00007FF65F7BFF80), abort() + 0x40 bytes(s)
0x00007FFC807ABF78 (0x00007FF65F7BFF80 0x00007FF65F7BFEF0 0xCCCCCCCC000003B7 0xCCCCCCCCCCCCCCCC), _wassert() + 0x108 bytes(s)
0x00007FF65E9E7F57 (0x00000079E6A4AC40 0x00000079E68EE998 0x00000079E6A317FC 0x00000079E68EE968), llvm::object::COFFObjectFile::getSectionContents() + 0x77 bytes(s), c:\llvm\llvm\lib\object\coffobjectfile.cpp, line 951 + 0x43 byte(s)
0x00007FF65E9E46E4 (0x00000079E6A4AC40 0x00000079E68EEE88 0x00000079E6A317FC 0x00000079E68EEC98), llvm::object::COFFObjectFile::getSectionContents() + 0x74 bytes(s), c:\llvm\llvm\lib\object\coffobjectfile.cpp, line 293
0x00007FF65E8B2BC5 (0x00000079E68EEC48 0x00000079E68EEE88 0x00000079E68EEC98 0x00000079E68EEC78), llvm::object::SectionRef::getContents() + 0x55 bytes(s), c:\llvm\llvm\include\llvm\object\objectfile.h, line 375 + 0x2D byte(s)
0x00007FF65EA1E516 (0x00000079E6A5DEA0 0x00000079E68EEFF0 0x00000079E6A4AC40 0xCCCCCCCCCCCCCCCC), llvm::RuntimeDyldImpl::loadObjectImpl() + 0x4D6 bytes(s), c:\llvm\llvm\lib\executionengine\runtimedyld\runtimedyld.cpp, line 186 + 0x25 byte(s)
0x00007FF65EA431AC (0x00000079E6A5DEA0 0x00000079E68EF708 0x00000079E6A4AC40 0x00000079E68EF0C8), llvm::RuntimeDyldCOFF::loadObject() + 0x3C bytes(s), c:\llvm\llvm\lib\executionengine\runtimedyld\runtimedyldcoff.cpp, line 57 + 0x14 byte(s)
0x00007FF65EA1B411 (0x00000079E68EF338 0x00000079E68EF708 0x00000079E6A4AC40 0xCCCCCCCCCCCCCCCC), llvm::RuntimeDyld::loadObject() + 0x221 bytes(s), c:\llvm\llvm\lib\executionengine\runtimedyld\runtimedyld.cpp, line 928 + 0x2F byte(s)
0x00007FF65E6781A9 (0x00007FF65FB9AB70 0x00000079E6A45150 0x00007FF65F177408 0xCCCCCCCCCCCCCCCC), executeInput() + 0x419 bytes(s), c:\llvm\llvm\tools\llvm-rtdyld\llvm-rtdyld.cpp, line 365 + 0x1D byte(s)
0x00007FF65E67A885 (0x00007FF600000004 0x00000079E6A45150 0x0000000000000000 0x0000000000000000), main() + 0xF5 bytes(s), c:\llvm\llvm\tools\llvm-rtdyld\llvm-rtdyld.cpp, line 687 + 0x5 byte(s)
0x00007FF65EE518CD (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), __tmainCRTStartup() + 0x19D bytes(s), f:\dd\vctools\crt\crtw32\dllstuff\crtexe.c, line 626 + 0x19 byte(s)
0x00007FF65EE519FE (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), mainCRTStartup() + 0xE bytes(s), f:\dd\vctools\crt\crtw32\dllstuff\crtexe.c, line 466
0x00007FFC9C4F2D92 (0x00007FFC9C4F2D70 0x0000000000000000 0x0000000000000000 0x0000000000000000), BaseThreadInitThunk() + 0x22 bytes(s)
0x00007FFC9EE19F64 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), RtlUserThreadStart() + 0x34 bytes(s)

…the stack trace of which looks semantically the same as when I have that assertion triggered in my own program.

Relevant information:
- llvm, clang and compiler-rt revision 249038 from trunk
- built with the command (where ../llvm is the llvm source root) cmake -G "Visual Studio 12 2013 Win64" -DLLVM_INCLUDE_EXAMPLES=OFF -DLLVM_INCLUDE_TESTS=OFF -DLLVM_INCLUDE_DOCS=OFF -DLLVM_USE_CRT_DEBUG=MDd -DLLVM_USE_CRT_RELEASE=MD ../llvm
- VS2013 version 12.0.40629.00 Update 5

Building and running the same code without llvm-rtdyld.exe (i.e. non-JIT) works without error.

Thanks very much for any response!

(Sorry for the slow reply, was trying to get something as minimal as possible for you to look at)

Additional info: when the relocation issue does occur the relocation type is IMAGE_REL_AMD64_REL32_5

Oops, sorry for the spam.

That last comment was incorrect. It’s IMAGE_REL_AMD64_REL32 not _5

Hi Joshua, Andy,

I’m afraid I’m not familiar with COFF. Andy - is IMAGE_REL_AMD64_REL32 unexpected if you’re compiling for 64-bit mode? It sounds like it from your description above.

I’ll look into the “BSS sections don’t have contents” error tomorrow. It looks like it’s happening in platform-agnostic RuntimeDyld code, so hopefully I can reproduce this on Darwin.

Cheers,
Lang.

That’s great news, thanks! If I can be of any help, let me know. :)
I’ll see if I can reduce the example for the relocation issue whilst you’re at it.

Regards,

Joshua

Yes, it’s unexpected – these 32 bit relocs are section relative and by using them the code generator is assuming that the two sections will be loaded within 4GB of one another.

So for large code model compiles, they shouldn’t be used.
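To make the overflow concrete: a REL32 fixup stores a signed 32-bit displacement measured from the byte following the 4-byte field, so it can only reach about 2 GiB in either direction. The check below is a minimal sketch of that arithmetic, not RuntimeDyld's actual code; `fitsInRel32` and its parameter names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of LLVM: returns true if a 32-bit
// PC-relative fixup located at fixupAddr can reach targetAddr.
// The displacement is measured from the byte after the 4-byte field.
bool fitsInRel32(std::uint64_t targetAddr, std::uint64_t fixupAddr) {
    // Unsigned subtraction wraps around; reinterpreting the result as
    // signed yields the (possibly negative) distance on two's-complement
    // targets.
    const std::int64_t delta =
        static_cast<std::int64_t>(targetAddr - (fixupAddr + 4));
    return delta >= std::numeric_limits<std::int32_t>::min() &&
           delta <= std::numeric_limits<std::int32_t>::max();
}
```

With JIT'd sections allocated low in the address space and a system DLL mapped near 0x00007FFC00000000, as in the trace earlier in the thread, a check like this would fail, which is one plausible way to end up with a "Relocation overflow".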

Hi Joshua,

That’s great news, thanks!

I’ll spend as much time as possible today trying to minimise the relocation issue.

First of all, thanks very much to Lang for fixing the BSS section bug; works like a charm!

I’ve been unable to reproduce the 32 bit relocation on 64 bit code (I’ll let you know if I do). However, I’m still having issues with resolving the 64 bit symbol relocations. In case it’s relevant, the specific symbol my program is tripping up on is IID_IOleObject, where TargetAddress is dereferenced inside of the COFF::IMAGE_REL_AMD64_ADDR64 case of resolveRelocation inside RuntimeDyldCOFFx86_64.

I took a look at the link provided earlier and noticed a few things…

  • LLILC provides a resolver that simply returns UINT64_MAX for any given symbol, which, as the comment explains, indicates that we want to skip relocation resolution and that the client code will handle it manually.
  • recordRelocations is supposed to be that “manual handling”

This raises the following questions:

  1. Forgive the noobishness of this, but what is meant by “external” relocations? Something in a DLL? A static library? Something akin to a function declaration in C++ where the definition is not provided (similar to the linker error “unresolved external symbol” you get if you declare without defining at the link stage in many toolchains)?
  2. recordRelocations is doing quite a lot! Given that having external symbols in code (assuming one of my definitions above is correct) is quite a normal thing, is there anything in LLVM that can help me implement this a bit more simply?
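For readers following along, the UINT64_MAX convention noted above can be modelled in a few lines. This is only a sketch of the idea (resolve what is known, record the rest for manual handling); it is not LLILC's or LLVM's code, and `DeferredResolver`, `PendingRelocation`, and `kSkipResolution` are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Sentinel meaning "do not resolve this symbol now"; mirrors the
// UINT64_MAX convention described in the thread.
constexpr std::uint64_t kSkipResolution = UINT64_MAX;

// Hypothetical record of a fixup that client code must patch later.
struct PendingRelocation {
    std::string symbol;
    std::uint64_t offset;  // where in the loaded section the fixup lives
};

// Hypothetical loader model: known symbols resolve immediately,
// everything else is recorded for manual handling.
class DeferredResolver {
public:
    void define(const std::string& name, std::uint64_t address) {
        known_[name] = address;
    }

    // Returns the address, or kSkipResolution if the symbol is unknown.
    std::uint64_t lookup(const std::string& name) const {
        auto it = known_.find(name);
        return it == known_.end() ? kSkipResolution : it->second;
    }

    // Resolves a fixup now if possible; otherwise records it.
    // Returns true when the fixup was resolved immediately.
    bool process(const std::string& name, std::uint64_t offset,
                 std::uint64_t& resolved) {
        const std::uint64_t addr = lookup(name);
        if (addr == kSkipResolution) {
            pending_.push_back({name, offset});
            return false;
        }
        resolved = addr;
        return true;
    }

    const std::vector<PendingRelocation>& pending() const { return pending_; }

private:
    std::map<std::string, std::uint64_t> known_;
    std::vector<PendingRelocation> pending_;
};
```

After loading, the client would walk `pending()` and patch each recorded fixup itself, which is roughly the role the thread attributes to recordRelocations.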

Thank you very much for any help in advance.

LLILC runs inside a very specialized runtime environment (CoreCLR) and that environment handles resolving references to things defined outside the current compilation scope (“externals”). So it may not be the best example of how to solve your particular problems.

For Windows, name resolution is done differently at link time vs load time:

  • During linking, there is a flat global namespace searched by the linker. The linker will resolve module externs to a (dll-name, entry-name) tuple, and build an import dependency list in the module.
  • During loading there is a tuple-based lookup that uses (dll-name, entry-name) to match imports and exports. Loading a module will also force loading of dependent modules.

When you dynamically load an object file, these rules collide, and there is no well-defined behavior. Here’s one plausible way to handle relocation processing:

  • You first need to know if the reference is to a symbol defined in the same object. Presumably this should take precedence.
    • If so, you can resolve it the way LLILC does for references to .rdata from .text.
  • If not, you must somehow ensure any dependent DLLs (or, I suppose, dependent objects) are loaded.
    • Note that loading dependencies automatically at load time requires considerable care because of circularities.
    • This is doubly hard in the “obj” case, since typically there’s no evidence in the obj as to which DLL might provide the export.
      • This is normally something the linker provides via the export libs you link with.
      • You might be able to guess at it, if there are linker pragmas embedded by the headers you include.
    • You might just know the set of dependent DLLs from your application context. If so, you could forcibly load these DLLs at startup.
  • Once you are confident all necessary dependencies are loaded, external name resolution then relies on some search heuristics to look through the loaded DLLs/(OBJs) for a match.
    • Note this might find the wrong export.
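The search order outlined above can be sketched as follows. This is a simplified model under stated assumptions, not an LLVM API: `LoadedModule`, `Resolution`, and `resolveSymbol` are hypothetical names, each preloaded DLL is represented by a plain export table, and the code requires C++17.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a loaded DLL: a name plus its export table.
struct LoadedModule {
    std::string name;
    std::map<std::string, std::uint64_t> exports;
};

struct Resolution {
    std::uint64_t address;
    std::string provider;  // "<object>" or the module that matched
};

// Resolve a symbol: object-local definitions win, then the preloaded
// modules are searched in order. The first match is returned, which may
// be the wrong export if two modules export the same name.
std::optional<Resolution> resolveSymbol(
    const std::string& symbol,
    const std::map<std::string, std::uint64_t>& objectSymbols,
    const std::vector<LoadedModule>& preloaded) {
    if (auto it = objectSymbols.find(symbol); it != objectSymbols.end())
        return Resolution{it->second, "<object>"};
    for (const LoadedModule& mod : preloaded) {
        if (auto it = mod.exports.find(symbol); it != mod.exports.end())
            return Resolution{it->second, mod.name};
    }
    return std::nullopt;  // unresolved: the caller decides how to report it
}
```

In a real Windows loader the export-table lookup would more likely be a call to GetProcAddress on module handles obtained from LoadLibrary at startup; the in-memory map here only keeps the sketch self-contained.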

Hope this helps. It would be cool to have a fairly robust “obj” loading path, but there are certainly challenges.

Apologies for the slow response, hectic times over here.
We know (or at least can determine) in advance which DLLs and static libraries are needed, so that’s good.
This gives me some idea on where to go / what to try next.

Thanks very much for the reply! I’ll be coming back to this next sprint, so I’ll let you know if it works out :)