Size limitations in MCJIT / ELF Dynamic Linker / ELF codegen?

I’m running a module generated from one C++ function in MCJIT. Every line of the source function uses C++ classes and may throw an exception. As long as there are fewer than about 1000 lines, everything works. With more lines, the compiled code crashes when run, with no sensible stack trace.

Is there any kind of hard-coded size limitation in MCJIT / ELF Dynamic Linker / ELF codegen / number of EH states in a function?

I did browse the code but could not find anything obvious.

Yaron

I’m not aware of such a limitation.

What architecture, code model and relocation model are you using? Are you using the SectionMemoryManager?

-Andy

The OS is 64-bit Windows 7; the compiler is 32-bit Visual C++ 2012.
The target is i686-pc-mingw32-elf, so that I can use the ELF dynamic loader.
The code model, relocation model and memory manager are the defaults for this target; I did not modify them.

The module comes from clang. The source is 1000 or more lines of repeated C++ code in one big function:

A+1;
A*B.t();

where A and B are matrices from Armadillo (http://arma.sourceforge.net/). This is a stress and performance test, because of the large number of EH states and temporary objects created.
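A source file of this shape is easier to produce with a small generator than by hand, which also makes it simple to bisect the line count at which the crash starts. A minimal sketch; the function name `stress` and its parameter list are hypothetical, and only the text is generated here (compiling it needs Armadillo and clang):

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: emits one big function that repeats the two
// Armadillo statements from the post `repeats` times.
std::string makeStressSource(int repeats) {
  assert(repeats >= 0);
  std::ostringstream out;
  out << "#include <armadillo>\n"
      << "void stress(arma::mat &A, arma::mat &B) {\n";
  for (int i = 0; i < repeats; ++i) {
    // Each statement may create temporaries that need EH cleanups.
    out << "  A+1;\n"
        << "  A*B.t();\n";
  }
  out << "}\n";
  return out.str();
}
```

Writing `makeStressSource(500)` and `makeStressSource(600)` to files, for example, would give two modules straddling the reported threshold.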

I am using EngineBuilder and MCJIT unmodified (except for the multi-module patches, which are not relevant as there is only one module), like this:

OwningPtr<llvm::ExecutionEngine> EE(llvm::EngineBuilder(M)
                                        .setErrorStr(&Error)
                                        .setUseMCJIT(true)
                                        .create());

To run the function, either

llvm::Function *F = M->getFunction(Name);
void *FN = EE->getPointerToFunction(F);

or

uint64_t FN = EE->getFunctionAddress(Name);

followed by

((void (*)())FN)();

or

EE->runFunction(F, std::vector<llvm::GenericValue>());

All of these work the same way with a module of about 1000 lines of the above code or fewer, and crash the same way with more code. The call stack is unhelpful: Visual C++ says “Frames below may be incorrect and/or missing”, which indicates a real problem with it. I have tried giving the compiled program less stack space (the default is 10M), without any change.
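The call paths above differ mainly in how the address comes back: getPointerToFunction() returns a void *, while getFunctionAddress() returns a uint64_t. A minimal sketch of turning such an integer back into a callable pointer; an ordinary function (hypothetical name `jittedStandIn`) stands in for the JIT-compiled code so the sketch runs without LLVM:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for JIT-compiled code: an ordinary function whose address is
// handled the way a getFunctionAddress() result would be.
static int callCount = 0;
static void jittedStandIn() { ++callCount; }

using VoidFn = void (*)();

// Converts an integer address back into a callable pointer. Narrowing
// through uintptr_t first makes the 64-to-32-bit truncation on a 32-bit
// host explicit instead of leaving it to a C-style cast.
VoidFn toVoidFn(std::uint64_t addr) {
  assert(addr != 0 && "symbol not found");
  return reinterpret_cast<VoidFn>(static_cast<std::uintptr_t>(addr));
}
```

Integer-to-function-pointer casts are conditionally supported in standard C++, but MSVC, GCC and clang all accept them on x86.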

Yaron

I would guess that it’s crashing somewhere in the generated code. On Windows we don’t have a way to get call stacks to the generated code (though if you want to try it on Linux, that should work). You can probably look at the address where the crash is occurring and verify that it is in the generated code.

There are a couple of things I would look for.

First, I’d take a look at the SectionMemoryManager allocation handling. The fact that the problem is code size dependent strongly points in this direction. It may be that SectionMemoryManager does something wrong when it hits a page boundary or something.
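One way to test the page-boundary idea is to count how many aligned boundaries each allocated section spans, at both the 4 KiB page size and the 64 KiB Windows allocation granularity, and compare working and crashing runs. A sketch; `boundariesCrossed` is a hypothetical diagnostic helper, not part of SectionMemoryManager:

```cpp
#include <cassert>
#include <cstdint>

// Number of `granularity`-aligned boundaries strictly inside the half-open
// range [start, start + size). A range that starts exactly on a boundary
// does not count that boundary as crossed.
std::uint64_t boundariesCrossed(std::uint64_t start, std::uint64_t size,
                                std::uint64_t granularity) {
  assert(granularity != 0 && "granularity must be non-zero");
  if (size == 0) return 0;
  std::uint64_t last = start + size - 1;  // last byte inside the range
  return last / granularity - start / granularity;
}
```

For example, a 253203-byte section loaded at 0x0A360000 spans three 64 KiB boundaries.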

Second, I’d look at the relocation processing. If it is generating any stubs, that would be a potential problem spot, but it shouldn’t be generating any stubs. So the obvious thing to look at is whether any of the relocations are writing to the spot where the crash occurs.

-Andy

Hi,

Thanks for your ideas.

Memory allocation already exceeds 2x64K in the “working” case, so the trigger is not simply allocating more than 64K. To be sure, I modified SectionMemoryManager::allocateSection to allocate four times the required memory, but that did not trigger more crashes. I debugged through the allocation code, including the Win32 code, and it seems to work well. I also tried disabling the MemGroup.FreeMem cache, which did not matter.

I added an assert that no stubs are created to the end of RuntimeDyldImpl::loadObject:

processRelocationRef(SectionID, *i, *obj, LocalSections, LocalSymbols,
                     Stubs);
assert(Stubs.empty());

It caught nothing, so no stubs are created.

Disabling (de)registerEH did not help.

Looking at the relocation and section printouts, the exception is:

Unhandled exception at 0x0A3600D1 :
0xC0000005: Access violation writing location 0x00BC7680.

which is right after the start of .text:

emitSection SectionID: 1 Name: .text obj addr: 0A3F1350 new addr: 0A360000 DataSize: 253203 StubBufSize: 0 Allocate: 253203

Resolving relocations Section #1 0A360000

So at least it is running code, but it tries to write to a wrong location.
Another run exhibits a similar crash, still in .text but somewhat later.
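Classifying the two addresses in the fault against the emitted sections is simple arithmetic. A sketch using the numbers from the log above (.text at 0x0A360000, DataSize 253203); the struct and function names are illustrative, not RuntimeDyld types:

```cpp
#include <cassert>
#include <cstdint>

// One emitted section as printed by the debug output ("new addr" and
// "DataSize"). Illustrative only.
struct SectionRange {
  std::uint32_t base;  // load address of the section
  std::uint32_t size;  // DataSize in bytes
};

// True if addr lies in [base, base + size). Subtracting first avoids
// overflow when base + size would wrap.
bool contains(const SectionRange &s, std::uint32_t addr) {
  return addr >= s.base && addr - s.base < s.size;
}

// Offset of addr from the section start; only meaningful if contains().
std::uint32_t offsetIn(const SectionRange &s, std::uint32_t addr) {
  assert(contains(s, addr));
  return addr - s.base;
}
```

With these values the faulting instruction is 0xD1 bytes into .text, while the written address 0x00BC7680 lies outside it.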

I have checked, and the function address I’m running is located towards the end of .text, as expected, since it’s the last function added to the module.

I also speculated that it crashes when .text crosses 128K, but no, it happens only when it’s larger.

I attached gdb to the process, hoping it would show more information, but it showed even less than the Visual C++ debugger.

Out of ideas…

Yaron

So it looks like 0x0A3600D1 is a good code address and there’s no problem executing the code there, but 0x00BC7680 is a bad data address. Is that correct?

If so, this is almost certainly a relocation problem. You just need to find a relocation that writes an entry (probably a relative offset) at 0x0A3600D1+the size of the instruction at that address.
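Given the relocation printout as (offset, width) pairs, finding the culprit amounts to an interval-overlap search around the faulting instruction. A sketch under the assumption that each record gives the section offset of the patched field and its width (4 bytes for R_386_32 and R_386_PC32); the record type and helper are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative relocation record: where in the section a fixup was written
// and how many bytes it patched.
struct RelocRecord {
  std::uint32_t offset;  // offset from the section start
  std::uint32_t width;   // bytes patched
};

// Indices of relocations whose patched bytes overlap the instruction
// [insnOffset, insnOffset + insnLen). Empty means no fixup touched it.
std::vector<std::size_t> relocsTouching(const std::vector<RelocRecord> &relocs,
                                        std::uint32_t insnOffset,
                                        std::uint32_t insnLen) {
  std::vector<std::size_t> hits;
  for (std::size_t i = 0; i < relocs.size(); ++i) {
    std::uint32_t lo = relocs[i].offset;
    std::uint32_t hi = lo + relocs[i].width;  // one past the last patched byte
    if (lo < insnOffset + insnLen && insnOffset < hi)
      hits.push_back(i);
  }
  return hits;
}
```

An empty result for the faulting instruction suggests that the bad address was computed at run time rather than patched in by the linker.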

BTW, what I said before about not being aware of any size limitations wasn’t quite correct. If you have enough code and data that we end up putting sections at addresses that are more than 2GB apart we’ll have problems, but you should see an assertion in that case. That can happen if we weren’t able to get the address we requested from allocateMappedMemory, but it doesn’t look like that’s what’s happening here.
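The 2GB limit comes from 32-bit PC-relative fields. A sketch of the reach check, assuming the field is the last four bytes of the instruction (true for call/jmp rel32, not for every instruction); this is mainly a concern for 64-bit code, where sections can be mapped far apart:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// A rel32 fixup at fixupAddr stores target - (fixupAddr + 4) in a signed
// 32-bit field; returns whether that displacement fits.
bool fitsRel32(std::uint64_t fixupAddr, std::uint64_t target) {
  std::int64_t disp = static_cast<std::int64_t>(target) -
                      static_cast<std::int64_t>(fixupAddr + 4);
  return disp >= std::numeric_limits<std::int32_t>::min() &&
         disp <= std::numeric_limits<std::int32_t>::max();
}
```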

-Andy

Yes, this is a correct code address accessing a bad data address.

However, there is no other relocation before .text or near it. I’ll send you the full debug printout, maybe you’ll note something.

The problem could be the result of something other than the linker entirely, such as some library initialization code that worked by chance with smaller code but fails now.

I need to debug and see what’s going on. The trouble is that there is no debug information. Maybe I can do without source-level information and debug the assembly, but without any symbols it’s really a challenge to understand anything. I did try to make MCJIT emit debug info, but for some reason the attached gdb did not understand it. Maybe this could be solved.

I assumed there might be some limitation around 31-32 bits, as there are various int32 members in the ELF structures, but that’s far, far away: problems start at a .text size of about 150K.

Yaron

Hi Yaron,

If you’re outputting ELF on Windows this sounds like an issue we ran into where __chkstk calls weren’t being output in the assembly due to an explicit check for COFF output. Once stack allocations in a given function exceeded some amount we’d get exactly this kind of crash in the function initialization.

If you take a look for isTargetCOFF() in lib/Target/X86/X86ISelLowering.cpp and lib/Target/X86/X86FrameLowering.cpp you should be able to remove that check to force __chkstk output to see if that helps.
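Why a missing probe produces this kind of fault: Windows commits thread stack lazily behind a guard page, so a function that moves the stack pointer down by more than a page in one step and then writes can skip past the guard page and hit an access violation. __chkstk prevents this by touching each page in order. A conceptual model of the offsets such a probe loop touches (not the real __chkstk code; `probeOffsets` is a hypothetical name):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Distances below the incoming stack pointer that a page-by-page probe
// loop touches for a frame of frameSize bytes. Conceptual model only.
std::vector<std::uint32_t> probeOffsets(std::uint32_t frameSize,
                                        std::uint32_t pageSize = 4096) {
  assert(pageSize != 0);
  std::vector<std::uint32_t> touched;
  // Frames smaller than one page cannot jump past the guard page.
  if (frameSize < pageSize) return touched;
  for (std::uint32_t off = pageSize; off <= frameSize; off += pageSize)
    touched.push_back(off);  // one touch per page, moving downward
  if (touched.back() != frameSize)
    touched.push_back(frameSize);  // finally touch the bottom of the frame
  return touched;
}
```

The 4096-byte threshold matches the NumBytes >= 4096 test in X86FrameLowering.cpp.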

Cheers,
Andrew

YES, this is the problem!

The program works OK; even a 5x larger version works well.

Clearly the _chkstk calls must be emitted for ELF targets on Windows as well; why not?

I’d like to make a patch and fix this right.

I experimented with both changes, and in practice only the lib/Target/X86/X86ISelLowering.cpp change fixes the problem. The other change, in lib/Target/X86/X86FrameLowering.cpp, was not required to fix the problem, so it is probably required for other reasons.

So, should I patch both tests?
Is the correct patch to remove the isTargetCOFF() test completely?
Or to enable it for both COFF and ELF targets?
I mean, is there any X86 target that does NOT require this stack checking?

Yaron

Oops, sorry, I switched the two locations, please read:

lib/Target/X86/X86FrameLowering.cpp (_chkstk probing) - was required.
lib/Target/X86/X86ISelLowering.cpp - did not change.

2013/10/23 Yaron Keren <yaron.keren@gmail.com>

Glad that helped! As I understand it, __chkstk is always required on Windows regardless of output type. I had meant to file a bug about this but apparently forgot to do so. I think the check should be that the target is Windows, ignoring the output type; Linux and OS X don’t use this.

Cheers,

Andrew

If it’s a Windows-only thing the correct tests would be:

if (NumBytes >= 4096 && STI.isOSWindows()) {

and

if (Subtarget->isTargetWindows())

where

bool isOSWindows() const { return TargetTriple.isOSWindows(); }

Yaron

This is the right fix if Cygwin wants calls to __chkstk. Otherwise you’ll want TargetTriple.isOSMSVCRT().
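The difference between the two predicates can be modelled in a few lines. A simplified sketch, assuming isOSWindows() covers Win32, MinGW and Cygwin while isOSMSVCRT() excludes Cygwin, which is the distinction this reply draws; the enum and functions are a hypothetical model, not llvm::Triple itself:

```cpp
#include <cassert>

// Simplified model of the OS kinds discussed in this thread.
enum class WinOS { Win32, MinGW32, Cygwin, Other };

bool isCygMing(WinOS os) { return os == WinOS::MinGW32 || os == WinOS::Cygwin; }
bool isOSWindows(WinOS os) { return os == WinOS::Win32 || isCygMing(os); }
bool isOSMSVCRT(WinOS os) { return os == WinOS::Win32 || os == WinOS::MinGW32; }

// The proposed frame-lowering test: probe large frames on the chosen OS set.
bool needsStackProbe(unsigned numBytes, WinOS os, bool requireMSVCRT) {
  bool osMatches = requireMSVCRT ? isOSMSVCRT(os) : isOSWindows(os);
  return numBytes >= 4096 && osMatches;
}
```

Under this model the two choices differ only for Cygwin, which is exactly the case the reply asks about.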

I don’t have much personal experience with Cygwin, but the code inside the above condition treats MinGW and Cygwin identically:

if (Is64Bit) {
  if (STI.isTargetCygMing())
    StackProbeSymbol = "___chkstk";
  else {
    StackProbeSymbol = "__chkstk";
    isSPUpdateNeeded = true;
  }
} else if (STI.isTargetCygMing())
  StackProbeSymbol = "_alloca";
else
  StackProbeSymbol = "_chkstk";

I’ll make a patch.

Yaron