Thanks! You were right!
Changing the code to:
    float (*theF)(float) = (float (*)(float)) EE->getPointerToFunction(f);
    float retVal = theF(arg1);
made the difference. Now it is dozens of times faster!
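The pattern behind this fix can be sketched in plain C++ without LLVM. Here a hypothetical `lookupAddress` stands in for `EE->getPointerToFunction(f)`: it hands back an untyped address, and the caller casts it once to the exact prototype and then calls it directly, so the compiler emits an ordinary native call each time. The real `getPointerToFunction` returns `void *`; this sketch uses a generic function-pointer type instead, since converting between function-pointer types is portable. Treat it as an illustration under those assumptions, not as LLVM code.

```cpp
#include <cassert>

// Hypothetical function standing in for the JIT-compiled @someFunc;
// in the real program this would be machine code emitted by the JIT.
static float someFunc(float x) { return x * 2.0f; }

// Generic function pointer type used to pass the address around untyped.
// Converting between function pointer types is fine as long as the pointer
// is converted back to its real type before the call.
using UntypedFn = void (*)();

// Hypothetical stand-in for EE->getPointerToFunction(f): it returns an
// address whose prototype the caller must already know.
static UntypedFn lookupAddress() {
    return reinterpret_cast<UntypedFn>(&someFunc);
}

float callManyTimes(int n, float arg) {
    // Cast once to the exact prototype...
    auto theF = reinterpret_cast<float (*)(float)>(lookupAddress());
    assert(theF != nullptr);
    float sum = 0.0f;
    // ...then each call is an ordinary native call through the pointer.
    for (int i = 0; i < n; ++i)
        sum += theF(arg);
    return sum;
}
```

Because the cast happens once, the per-call cost is just an indirect call, which is consistent with the speedup reported above.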
I don't really understand the cause, though.
Why does ExecutionEngine cope poorly with "define float
@someFunc(float %x)" and need this trick, when it copes well with
"define i32 @someFunc(i32 %x)"?
The function was generated using "getOrInsertFunction(name,
Type::getFloatTy(ctx), Type::getFloatTy(ctx), (Type*)NULL);".
And the original slow execution was:
    args.FloatVal = 8.0f;
    GenericValue retVal = EE->runFunction(f, args);
Probably because the integer version of the prototype is
special-cased. The problem is that the JIT has a C function pointer
of an arbitrary type that it only finds out about at runtime.
Normally, if you call a function pointer with a known type, your
compiler will generate the proper calling code and allocate the
arguments in registers or on the stack. However, doing that inside
the JIT would be very hard because you would need to iterate over the
argument vector and put each argument in the appropriate register.
Obviously, this is going to interact badly with the compiler's own
register allocation, and it probably requires writing tricky
platform-specific code.
The approach that LLVM takes is to special-case a few common
prototypes (for things like main, which is where lli always enters the
picture) and cast the function pointer before calling it, as I
described, which is efficient. For a function of any other
prototype, the JIT codegens a short snippet of code that
calls the underlying function.
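The dispatch strategy described above can be modeled in plain C++. As I understand the old JIT, the generated snippet is a parameterless stub that calls the target with the arguments baked in as constants; here a small struct stands in for that machine code. `Proto`, `GenericArg`, `FloatStub`, and `runFunctionSketch` are hypothetical names, loosely modeled on `GenericValue` and `runFunction`, and in this sketch only `int(int)` is special-cased.

```cpp
#include <cassert>
#include <vector>

// Hypothetical runtime description of a function's prototype.
enum class Proto { IntToInt, FloatToFloat };

// Hypothetical stand-in for llvm::GenericValue: one slot per type.
struct GenericArg {
    int IntVal = 0;
    float FloatVal = 0.0f;
};

using UntypedFn = void (*)();

// Models the stub generated for a prototype that is not special-cased:
// the argument is baked in, so the stub itself takes no parameters.
// A real JIT emits machine code here; a struct stands in for it.
struct FloatStub {
    float (*target)(float);
    float bakedArg;
    float run() const { return target(bakedArg); }
};

static int stubsBuilt = 0;  // makes the per-call cost of the slow path visible

GenericArg runFunctionSketch(UntypedFn fn, Proto proto,
                             const std::vector<GenericArg> &args) {
    assert(args.size() == 1);
    GenericArg result;
    switch (proto) {
    case Proto::IntToInt: {
        // Fast path: a special-cased prototype is cast and called directly.
        auto f = reinterpret_cast<int (*)(int)>(fn);
        result.IntVal = f(args[0].IntVal);
        break;
    }
    case Proto::FloatToFloat: {
        // Slow path: build a fresh stub for this call, then run it.
        ++stubsBuilt;
        FloatStub stub{reinterpret_cast<float (*)(float)>(fn), args[0].FloatVal};
        result.FloatVal = stub.run();
        break;
    }
    }
    return result;
}

// Two sample targets for the sketch.
int addOne(int x) { return x + 1; }
float halve(float x) { return x / 2.0f; }
```

The fast path costs one indirect call; the slow path pays for building a stub on every call, which in a real JIT means running code generation each time.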
As far as I know, this stub is not cached and lives as long as
the JIT, so if you call runFunction repeatedly you will generate
many snippets of machine code and leak memory.
The leak could be fixed by freeing the stub after calling it, before runFunction returns the result.
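A hypothetical model of the leak and of the suggested fix might look like this. `StubArena` is an assumed stand-in for the JIT's code memory, not an LLVM class, and `Stub` models a generated stub as a target pointer plus a baked-in argument. The leaky path allocates one stub per call and never releases it; the fixed path releases the stub right after running it, before returning the result.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Models a generated stub: target function plus a baked-in argument.
struct Stub {
    float (*target)(float);
    float bakedArg;
};

// Hypothetical stand-in for the JIT's code memory; it tracks live stubs.
class StubArena {
public:
    Stub *allocate(float (*target)(float), float arg) {
        live_.push_back(std::make_unique<Stub>(Stub{target, arg}));
        return live_.back().get();
    }
    void release(Stub *s) {
        for (auto it = live_.begin(); it != live_.end(); ++it) {
            if (it->get() == s) {
                live_.erase(it);
                return;
            }
        }
    }
    std::size_t liveCount() const { return live_.size(); }

private:
    std::vector<std::unique_ptr<Stub>> live_;
};

// Leaky behavior: each stub stays allocated for the arena's lifetime.
float runLeaky(StubArena &arena, float (*fn)(float), float arg) {
    Stub *s = arena.allocate(fn, arg);
    return s->target(s->bakedArg);
}

// Suggested fix: free the stub after calling it, before returning.
float runFixed(StubArena &arena, float (*fn)(float), float arg) {
    Stub *s = arena.allocate(fn, arg);
    float result = s->target(s->bakedArg);
    arena.release(s);
    return result;
}

// Sample target for the sketch.
float triple(float x) { return x * 3.0f; }
```

Calling `runLeaky` in a loop grows the arena by one stub per call, while `runFixed` leaves it empty afterwards; casting the pointer yourself, as in the original fix, avoids stubs altogether.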
Hope that helps,