Hi all,
After my previous post (thanks @ezhulenev for the reply) I was able to do some benchmarking of a linalg.matmul
operation. I am using 1000x1000 matrices and this is the MLIR code I have:
func @main(%A : memref<1000x1000xf32>, %B : memref<1000x1000xf32>, %C : memref<1000x1000xf32>) {
linalg.matmul ins(%A, %B: memref<1000x1000xf32>, memref<1000x1000xf32>)
outs(%C: memref<1000x1000xf32>)
return
}
I compile with the following command:
mlir-opt test.mlir -convert-linalg-to-loops -convert-scf-to-std -convert-std-to-llvm > test.llvm.mlir
Then I wrote a benchmark program that reads the LLVM-dialect file, lowers it to LLVM IR, compiles it, and runs it:
mlir::OwningModuleRef module;
mlir::MLIRContext context;
context.getOrLoadDialect<mlir::LLVM::LLVMDialect>();
loadMLIR(..., module); // similar to the toy example
runJit(*module);
In runJit I basically create the ExecutionEngine, look up the entry point, and run the function:
auto maybeEngine = mlir::ExecutionEngine::create(*module);
auto engine = std::move(maybeEngine.get());
pack_args(...);
auto expectedFPtr = engine->lookup(entryPoint);
void (*fptr)(void **) = *expectedFPtr;
(*fptr)(args.data());
Timing the call to (*fptr), I get a result 100x slower than a naive three-loop C++ program!
I tried different options, different passes, etc., but nothing seemed to help. My main question is: am I doing something wrong? Is the (*fptr) call doing something more than the simple computation?
Any insight is more than welcome!
Thank you so much,
Giuseppe