Performance issues with memref.global and LLVM IR

Hello!

I was investigating compilation performance issues in my local project and found something strange.

In my project I use many memref.global constants of different sizes, from small 4-element arrays to large multi-dimensional constants for convolution weights.
In some cases, when the convolution weights were large, the compilation process took a very long time and consumed a lot of RAM. The bottleneck was in the MLIR IR to LLVM IR conversion and in the subsequent LLVM machine code generation. In the first part, most of the time was spent translating the global constants.
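
For reference, the kind of globals involved look roughly like this (the names and shapes here are made up for illustration):

memref.global "private" constant @small_bias : memref<4xf32> = dense<[0.0, 1.0, 2.0, 3.0]>
memref.global "private" constant @conv_weights : memref<256x128x5x5xf32> = dense<1.000000e+00>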

I added a simple pass that merges all constants into a single 1D i8 constant and replaces accesses to the old constants with memref.view operations into that merged constant (a sketch follows the numbers below). That significantly improved compilation time and reduced RAM usage. For example, on one network I got the following improvements:

  • 6 seconds vs 35 minutes compilation time
  • 800 MB vs 20 GB peak memory usage
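
A minimal sketch of what the pass produces for the two globals above (the offset and the zero-splat initializer are placeholders; the real pass emits the concatenated bytes and pads for each constant's alignment):

memref.global "private" constant @merged : memref<3276816xi8> = dense<0>

func.func @use() -> memref<256x128x5x5xf32> {
  // @conv_weights lives at byte offset 16, right after the 16 bytes of @small_bias.
  %base = memref.get_global @merged : memref<3276816xi8>
  %off = arith.constant 16 : index
  %w = memref.view %base[%off][] : memref<3276816xi8> to memref<256x128x5x5xf32>
  return %w : memref<256x128x5x5xf32>
}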

These results look quite strange to me. I expected some improvement, but not this large.

So, my question: why is the processing of large multi-dimensional global constants so inefficient in the MLIR->LLVM translation and the subsequent machine code generation?

Personally, I have one guess: LLVM, in contrast to MLIR, doesn’t support a single flat multi-dimensional constant, if I understood the MLIR->LLVM conversion correctly. So a single memref.global : memref<256x128x5x5> will be converted into 163840 LLVM constants of 5 elements each.
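
For example, after the MemRef->LLVM lowering the global above presumably carries a nested !llvm.array type, something like this (sketch with a splat initializer):

llvm.mlir.global private constant @conv_weights(dense<1.000000e+00> : tensor<256x128x5x5xf32>) {addr_space = 0 : i32} : !llvm.array<256 x array<128 x array<5 x array<5 x f32>>>>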

Interesting. I hit a similar problem when trying to build random dense globals in MLIR that needed to be an array of APFloat on similarly sized memrefs, and I got malloc errors trying to allocate a ridiculous amount of memory.

I could probably generate the random vector in memory, convert it to base64, and initialize it as a string, but in any case this might help you identify the source of the memory consumption (and the performance problems).

I’ve been told DenseResourceElementsAttr should make that easier, but I haven’t had time to investigate yet.
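
From what I’ve read, the idea is that the data lives in a resource blob instead of an inline dense literal, roughly like this (untested sketch: the key name is made up, the blob bytes are elided, and I’m assuming memref.global accepts a dense_resource initializer):

memref.global "private" constant @conv_weights : memref<256x128x5x5xf32> = dense_resource<conv_weights_blob>

{-#
  dialect_resources: {
    builtin: {
      conv_weights_blob: "0x08000000..."
    }
  }
#-}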

This is an MLIR solution to avoid hitting the MLIRContext with constants, but I don’t think that will help the translation to LLVM (assuming you still want the constant in the LLVM IR).

Ouch… Seems unfortunate.
I tried:

$ echo 'llvm.mlir.global external @gv2(dense<[[0.000000e+00, 1.000000e+00, 2.000000e+00], [3.000000e+00, 4.000000e+00, 5.000000e+00]]> : tensor<2x3xf32>) {addr_space = 0 : i32} : !llvm.array<2 x array<3 x f32>>' |  bin/mlir-translate --mlir-to-llvmir 
@gv2 = global [2 x [3 x float]] [[3 x float] [float 0.000000e+00, float 1.000000e+00, float 2.000000e+00], [3 x float] [float 3.000000e+00, float 4.000000e+00, float 5.000000e+00]]

So basically we pay some cost to form the multi-dimensional structure here?
Seems like we could emit a flat constant for the storage and cast it to the right shape maybe?
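
For comparison, a flat 1-D global with the same data already translates to a single flat constant, so the idea would be for the lowering (or the translator) to emit something along these lines and keep the reshaping on the access side (sketch):

llvm.mlir.global external @gv2_flat(dense<[0.000000e+00, 1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00]> : tensor<6xf32>) {addr_space = 0 : i32} : !llvm.array<6 x f32>

With opaque pointers the accesses go through GEPs on a plain pointer anyway, so (if I’m not mistaken) the value type of the global mostly affects how the initializer is materialized.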

Yes, I also have a feeling that it could be done more efficiently, either in the translator itself or even on the MLIR side in some MemRef->LLVM lowering pass. Should I create a ticket for that?

It can be worthwhile to track this in a ticket, yes (if you can include a reproducer, that would be nice).

I also encountered this issue, and realized that the translation seems to recursively create the LLVM constants one by one: llvm-project/ModuleTranslation.cpp at main · llvm/llvm-project · GitHub

Here is the function that translates global variables; it shows that a string variable is translated in one shot. It looks like if we pack the data as a string, we could potentially get a better result too.
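
Hypothetically, the packed form would look something like this on the LLVM-dialect side (the name is made up and the bytes shown are just two little-endian 1.0f values), which the translator can then turn into a single constant in one shot:

llvm.mlir.global internal constant @packed_weights("\00\00\80\3F\00\00\80\3F") : !llvm.array<8 x i8>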