VEX prefixes for JIT in LLVM 3.5

Hi guys,

I just upgraded our JIT system to use LLVM 3.5 and noticed one big
change in our generated code: we don't see any non-destructive
VEX-prefixed instructions (e.g. vmulsd xmm0, xmm1, ...) being emitted
any more.

It's long been on my list of things to investigate anyway, as I'd
noticed LLVM didn't emit VZEROUPPER instructions either, so I supposed
disabling VEX might not be a bad thing.

That being said, try as I might I can't force AVX on (via
builder.setMCPU("core-avx-i") and/or
builder.setMAttrs(std::vector<std::string>{"+avx"})). We're still using
the old JIT, but I just spiked out a move to MCJIT and I still don't
see the VEX instructions.

Was there a deliberate change on the LLVM side to discourage VEX
instructions unless they make a big enough difference? And is
VZEROUPPER now emitted?

If not, how might I go about digging further into this?

Many thanks in advance, Matt

Hi Matt,

I suspect you need to specify the target CPU when you create the JIT. It’s just a method on the builder (e.g., builder.setMCPU(MCPU)). If you want auto-detection based on the host CPU, sys::getHostCPUName() returns a value suitable to be passed directly into the builder.
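For what it's worth, a minimal sketch of what that looks like against the 3.5 EngineBuilder API (createHostJIT is just my name for the helper; error handling omitted):

```cpp
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h" // including this links MCJIT in
#include "llvm/IR/Module.h"
#include "llvm/Support/Host.h"

// Sketch: build the JIT targeting the host CPU so that AVX (and hence
// the VEX encodings) is available. Ownership of M passes to the builder.
llvm::ExecutionEngine *createHostJIT(llvm::Module *M) {
  llvm::EngineBuilder Builder(M);
  Builder.setMCPU(llvm::sys::getHostCPUName()); // e.g. "core-avx-i"
  Builder.setUseMCJIT(true);
  return Builder.create();
}
```

The same setMCPU call works for the old JIT too; only the setUseMCJIT line differs.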

-Jim

Hi Jim,

Thanks for a very quick reply! That indeed does the trick!

Presumably the default has changed in 3.5 to be a "generic" CPU
instead of the native one? If that's the case, I wonder why: especially
when JITting, it really only makes sense to target the actual CPU -
unless I'm missing something? :slight_smile:

Thanks again,

Matt

Hi Matt,

Yep, that’s exactly what happened. There are two things that motivated the change. First, MCJIT supports JITing for a non-host CPU(*), so assuming the host isn’t safe. Second, the auto-detection was being done in a non-JIT-specific code path, so tools like llc were also getting the auto-detection, which made writing good tests problematic. The plan is to re-introduce the auto-detection for MCJIT, but to do it in a more well-contained manner.

-Jim

(*) That was the first use-case for MCJIT, actually. LLDB expression evaluation for remote targets (debugging iOS from OS X).

Hi,

You need to call llvm::sys::getHostCPUName() and pass the result to the createTargetMachine() call whose result is handed to the JIT. This patch should be applied:

http://llvm.org/bugs/show_bug.cgi?id=17422

Anyhow, the old JIT has been removed from the current code and will not be in the next LLVM release.

Yaron

Great stuff; thanks both!

I'm also looking to turn my MCJIT conversion spike into our main use
case. The only thing I'm missing is the ability to get a post-linked
copy of the generated assembly.

In the old JIT I used JITEventListener's NotifyFunctionEmitted and an
MCDisassembler to disassemble the stream (with my own custom
annotators), redirecting the output to the relevant place for
auditing in our app.

With MCJIT I notice that NotifyFunctionEmitted is gone
(understandably), so I hook NotifyObjectEmitted instead. I then run
through all the function symbols and dump them as before. Yay. Except
that in MCJIT terms the linking hasn't happened yet, so all the globals
and external function addresses are zeros at this point. (If I hackily
observe the same code later on, I can see the linker has appropriately
populated these addresses.) This makes my nicely annotated code a
little unreadable, unfortunately.
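For reference, the hook I'm using looks roughly like this (a sketch against the 3.5 JITEventListener/ObjectImage API; DisasmListener and disassembleRange are my own names):

```cpp
#include "llvm/ExecutionEngine/JITEventListener.h"
#include "llvm/ExecutionEngine/ObjectImage.h"
#include "llvm/Object/ObjectFile.h"

// Sketch: walk the function symbols of the freshly emitted object and
// disassemble each one. Note that when NotifyObjectEmitted fires,
// relocations have NOT been applied yet - which is exactly the problem
// described above.
class DisasmListener : public llvm::JITEventListener {
public:
  void NotifyObjectEmitted(const llvm::ObjectImage &Obj) override {
    for (llvm::object::symbol_iterator I = Obj.begin_symbols(),
                                       E = Obj.end_symbols();
         I != E; ++I) {
      llvm::object::SymbolRef::Type Type;
      if (I->getType(Type) || Type != llvm::object::SymbolRef::ST_Function)
        continue;
      uint64_t Addr = 0, Size = 0;
      if (I->getAddress(Addr) || I->getSize(Size))
        continue;
      // disassembleRange(Addr, Size); // hypothetical helper wrapping
                                       // MCDisassembler + my annotators
    }
  }
};
```

The listener is registered via ExecutionEngine::RegisterJITEventListener(&Listener) before compiling.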

Does anyone have any suggestions as to how I might get to disassemble
the post-linked code?

Thanks once again!

-matt

My 2 cents: it seems like we need a different event type. Having access to the object before linking (and relocation?) seems useful, but I suspect most users (myself included) want the final object after everything is done.

Philip

Hi Matt, Philip,

You could get the data you want by recording the addresses returned by the allocateCodeSection and allocateDataSection methods on your RTDyldMemoryManager, then disassembling those sections after you’ve called resolveRelocations. That’s a little unsatisfying though. For one thing, unless you very carefully maintain the association with the original object via back-channels there will be no way of knowing which section belongs to which object file.
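In case it helps, that workaround can be sketched like this (against the 3.5 SectionMemoryManager API; RecordingMemoryManager is just my name for it):

```cpp
#include "llvm/ExecutionEngine/SectionMemoryManager.h"
#include <cstdint>
#include <vector>

// Sketch: remember every code section as it is allocated, so that the
// sections can be disassembled after relocations have been resolved
// (e.g. after ExecutionEngine::finalizeObject()).
class RecordingMemoryManager : public llvm::SectionMemoryManager {
public:
  struct CodeSection {
    uint8_t *Addr;
    uintptr_t Size;
  };
  std::vector<CodeSection> CodeSections;

  uint8_t *allocateCodeSection(uintptr_t Size, unsigned Alignment,
                               unsigned SectionID,
                               llvm::StringRef SectionName) override {
    uint8_t *Addr = llvm::SectionMemoryManager::allocateCodeSection(
        Size, Alignment, SectionID, SectionName);
    CodeSections.push_back(CodeSection{Addr, Size});
    return Addr;
  }
};
```

You'd hand an instance to EngineBuilder::setMCJITMemoryManager() and walk CodeSections only after finalizeObject(), once relocations are in place. As noted, this loses the section-to-object association unless you track it yourself.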

With a bit of cleanup though, we could do something more like this:

const RuntimeDyld::JITObject &JO = RTDyld.loadObject(Object);
// …
RTDyld.resolveRelocations();

DEBUG(
  for (const RuntimeDyld::JITObject::Section &S : JO.sections())
    if (S.isText())
      Disassemble(S.getAddr(), S.getSize());
);

How does that look?

Cheers,
Lang.

Hi Lang,

Thanks for the reply, I'm glad it sounds like you have a fairly simple
suggested improvement.

I'm not too familiar with the inner workings of LLVM, so apologies if
this is a dumb question: what are the "Disassemble" and DEBUG() parts
doing here? Are these in the user code? I don't interact with
RuntimeDyld directly anywhere in my code, so I'm not sure where this
code would go. If it's user code and I can somehow get hold of a
RuntimeDyld, then this seems like a good match, as it's pretty much
what I'm doing in NotifyObjectEmitted already.

Cheers, Matt :slight_smile:

I know - this is what I’m actually doing today. :slight_smile: I’m not using the loadObject interface, though. On the surface it looks ideal, but I don’t have any practical experience to confirm that. I’m currently using the interfaces on ExecutionEngine (i.e. generateCodeForModule, mapSectionAddress, finalizeObject, then getPointerToFunction).

Hi Philip,

Ahh. Sorry to hear. The good news is that I have a plan to make it better. The bad news is that I don’t have a timeline. I’m trying to squash a few critical bugs in the infrastructure at the moment and then I’ll start putting out proposals and asking for volunteers. :slight_smile:

  • Lang.

Hi Matt,

CCing the Dev list, as I accidentally dropped them off my last reply.

Regarding loadObject, that API is very much hypothetical at the moment (there is a RuntimeDyld::loadObject, but it returns something else). Some background may make it clearer where I’m thinking of going with this though:

MCJIT builds on RuntimeDyld. RuntimeDyld is responsible for making object files in memory (instances of llvm::object::ObjectFile) runnable. It does this by allocating memory for the sections, applying relocations, and providing symbol lookup. To a very rough approximation, MCJIT is just:

class MCJIT {
private:
  RuntimeDyld Dyld;
public:
  void *getSymbolAddress(StringRef Name) {
    return Dyld.getSymbolAddress(Name);
  }
  void addModule(Module *M) {
    Dyld.loadObject(CodeGenIRToObject(M));
  }
};

All the interesting parts of MCJIT have to do with module ownership, setting up the CodeGen pipeline, etc.

If you’re happy to handle codegen and module ownership yourself, you could actually talk to RuntimeDyld directly, rather than going through MCJIT. That’s what I’m doing in that example, just for the sake of simplicity. For clients who want to keep using MCJIT we just need to add an API to get at the underlying Dyld state so that, if they want to, clients can inspect RuntimeDyld::JITObjects (these don’t exist yet either, but are easy to add). Exactly what that API will look like I’m not sure yet. It may be as simple as ‘RuntimeDyld::JITObject getJITObject(Module *M)’ in the end.

So short version: The loadObject API in the example above doesn’t exist yet, and even if it did you probably wouldn’t want to use it. However, there’s a plan to get this kind of information to JIT clients in the future.

On a related note, the demand for this kind of API usually comes from a desire to debug either the JIT or the JIT’d code. If we had a proper debugger registration system (i.e. a way to get relocated DWARF to JIT clients and system debuggers), that could be used to find out most of the interesting things about relocated sections (including their size, address, etc.). This is also on the drawing board, but waiting for me to find time to do it.

Cheers,
Lang.

Thanks again Lang, that all makes sense. In my case debuggability
is not the primary desire (although it's definitely a nice bonus!).
I'm mainly after logging to disk what got compiled and executed, for
auditing purposes.

Much appreciated,

Matt :slight_smile: