On medium-sized DNN models such as resnet50, the execution of mlir-translate --mlir-to-llvmir and llc is really slow. The first takes 2 minutes, the second takes 1 minute 9 seconds on my student’s computer, and a lot more on mine (a Mac which is more powerful on paper).
Interestingly, the other passes run through mlir-opt are a lot faster.
Are there options that would allow me to accelerate code generation?
Here is my build command sequence. And no, it’s not Bazel-based (I only use Bazel for TF and IREE, because I don’t know how to build them otherwise). For “just” llvm-project, I’m happy to understand better what I am doing.
cd llvm-project
mkdir build
cd build
cmake -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_BUILD_EXAMPLES=ON -DLLVM_TARGETS_TO_BUILD=X86 -DCMAKE_INSTALL_PREFIX=$HOME/llvm -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_ASSERTIONS=ON -DCMAKE_C_COMPILER=clang-mp-9.0 -DCMAKE_CXX_COMPILER=clang++-mp-9.0 -DCMAKE_ASM_COMPILER=clang-mp-9.0 -DCMAKE_CXX_FLAGS_DEBUG="-fno-omit-frame-pointer -fsanitize=address" -DCMAKE_SHARED_LINKER_FLAGS="-fno-omit-frame-pointer -fsanitize=address" -DCMAKE_EXE_LINKER_FLAGS="-fno-omit-frame-pointer -fsanitize=address " ../llvm
make -j6
make install
OK, I think I understand the issue: to get speed, I have to choose Release rather than the Debug build type I am using now.
However, it’s odd that the other transformations are fast, even with a Debug build.
With -DCMAKE_BUILD_TYPE=Debug, LLVM is effectively built at -O0, which is slow (but debuggable). Unless you are doing C++-level debugging, you should build with optimizations. The easiest way to do so is to build with -DCMAKE_BUILD_TYPE=Release or -DCMAKE_BUILD_TYPE=RelWithDebInfo (the latter gives you optimizations plus debug info, which can help with stack traces and such).
Personally, I do a lot of my work with a release build as above but with assertions enabled (-DLLVM_ENABLE_ASSERTIONS=ON).
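For reference, a release-with-assertions configuration along those lines could look roughly like the sketch below. It reuses the project, target, and install prefix from the original command and drops the -fsanitize=address flags, on the assumption that the sanitizer is not needed for day-to-day use. The compiler overrides are omitted for brevity, and the guard around the cmake call is only there so the snippet does nothing harmful outside a checkout.

```shell
# Sketch: Release build with assertions enabled and no AddressSanitizer.
# Flags mirror the original command; add the compiler overrides back if needed.
CMAKE_ARGS="-DLLVM_ENABLE_PROJECTS=mlir \
-DLLVM_TARGETS_TO_BUILD=X86 \
-DCMAKE_INSTALL_PREFIX=$HOME/llvm \
-DCMAKE_BUILD_TYPE=Release \
-DLLVM_ENABLE_ASSERTIONS=ON"

# Intended to be run from a fresh build directory inside llvm-project,
# so that the existing Debug build is left untouched.
if [ -d ../llvm ]; then
  cmake $CMAKE_ARGS ../llvm && make -j6 && make install
else
  echo "would run: cmake $CMAKE_ARGS ../llvm"
fi
```

Swapping Release for RelWithDebInfo in the sketch keeps the optimizations while retaining debug info.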
Without digging further into your setup, the best I can say is that your experience is consistent with using a debug/instrumented build. I suspect that mlir-opt is also running a lot slower but is doing comparatively less work, given that it operates at a higher level. The translation to LLVM IR and the corresponding code generation involve a completely different scale of IR manipulation.
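One way to check this would be to time the two slow stages separately under each build. The following is only a sketch: the input file name resnet50.mlir and the output names are hypothetical, and the input is assumed to be already lowered to the LLVM dialect.

```shell
# Hypothetical input file; pass a different path as the first argument.
INPUT="${1:-resnet50.mlir}"

# Run one pipeline stage and report its wall-clock time in whole seconds.
time_stage() {
  label="$1"; shift
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "$label: $((end - start))s (exit $status)"
  return $status
}

# Only run when the tools and the input are actually available.
if command -v mlir-translate >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  time_stage "mlir-translate" mlir-translate --mlir-to-llvmir "$INPUT" -o model.ll
  time_stage "llc" llc model.ll -o model.s
fi
```

Running it once with the Debug binaries on PATH and once with the Release ones should show whether the build type accounts for the difference.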