LLVM Execution engine: Native call vs LLVM IR function call

Hi,

I would like to understand why calling a native function from a function
in LLVM IR can be much faster than calling an equivalent function in LLVM
IR.

For instance, here are two equivalent programs. The first calls an LLVM IR
function while the second calls a native function. On my AMD machine, the
first takes 4.48s to run while the second takes 3.49s.

define i64 @bloop() {
entry:
  br label %bb1

bb1: ; preds = %cont, %entry
  %0 = phi i64 [ 0, %entry ], [ %3, %cont ]
  %1 = phi i64 [ 0, %entry ], [ %res, %cont ]
  %2 = icmp ugt i64 %0, 1000000001
  br i1 %2, label %exit, label %cont

exit: ; preds = %bb1
  ret i64 %1

cont: ; preds = %bb1
  %3 = add i64 %0, 1
  %res = call i64 @sqr(i64 %0)
  br label %bb1
}

define i64 @sqr(i64 %arg1) {
entry:
  %0 = mul i64 %arg1, %arg1
  ret i64 %0
}

nlamee@cs.mcgill.ca writes:

> I would like to understand why calling a native function from a function
> in LLVM IR can be much faster than calling an equivalent function in LLVM
> IR.

Do you optimize the LLVM IR? The IR version can inline the call, and that
alone would make it faster than the "native" version.

Please describe how you compile your LLVM IR.

Once we know that, if you optimize the IR and it is still slower, the real
answer should show up in the generated assembly.

Hi Óscar,

Thank you for your response. I did not explicitly optimize the IR.

I compile and run the two versions with

...
GenericValue gv = EE->runFunction(bsqr, args);
...
GenericValue gv2 = EE->runFunction(cppnat, args);

I am calling the runFunction method of ExecutionEngine, with the default
code generation optimization level.

Best regards,
Nurudeen.

nlamee@cs.mcgill.ca writes:

> Thank you for your response. I did not explicitly optimize the IR.
>
> I compile and run the two versions with
>
> ...
> GenericValue gv = EE->runFunction(bsqr, args);
> ...
> GenericValue gv2 = EE->runFunction(cppnat, args);
>
> I am calling method runFunction of ExecutionEngine. I am using the default
> code gen optimization level.

LLVM does not optimize your IR unless you explicitly ask for it. For
your example, once the IR is optimized (so that the call to @sqr can be
inlined), the all-LLVM code will run faster than the version that calls a
non-LLVM function.
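
To make the inlining point concrete, here is a rough C++ rendering of the @bloop/@sqr example from the first message. It is only a sketch: the names bloop_call, bloop_inlined and bloop_folded are made up for illustration, the loop bound is turned into a parameter (the IR hard-codes 1000000001), and the functions are marked constexpr only so the results can also be checked at compile time. It does not use the LLVM API; it just shows the three shapes the same computation can take: a call per iteration, the callee's body inlined into the loop, and the form an optimizer may be able to reduce it to, since only the value from the last iteration is returned.

```cpp
#include <cstdint>

// C++ rendering of @sqr: squares its argument.
constexpr std::uint64_t sqr(std::uint64_t x) { return x * x; }

// Shape of the unoptimized IR: one call to sqr per loop iteration.
// `last` is an assumed parameter standing in for the constant 1000000001.
constexpr std::uint64_t bloop_call(std::uint64_t last) {
    std::uint64_t res = 0;
    for (std::uint64_t i = 0; i <= last; ++i) {
        res = sqr(i);  // call overhead paid on every iteration
    }
    return res;
}

// Same loop after inlining: the call is replaced by the callee's body.
constexpr std::uint64_t bloop_inlined(std::uint64_t last) {
    std::uint64_t res = 0;
    for (std::uint64_t i = 0; i <= last; ++i) {
        res = i * i;  // no call left inside the loop
    }
    return res;
}

// What remains if the optimizer also notices that only the final
// iteration's value is returned (a possible further step, not a
// guaranteed one).
constexpr std::uint64_t bloop_folded(std::uint64_t last) {
    return last * last;
}
```

All three return the same value for a given bound. A call into a native function is opaque to the LLVM optimizer, so that version stays in the first shape no matter how the IR is optimized, while the all-IR version can move to the second or third.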