In my ongoing quest to make MLIR faster at threaded compilation :-), I ran into a new problem. Earlier I was profiling on an Intel Mac, but I have since switched to an Apple M1 Max laptop, and it shows a very different profile.
In a release build of CIRCT with a release build of MLIR, I now see OperationName::OperationName at the very top of the profile:
Of course, this is completely dominated by the mutex operations in llvm::sys::SmartScopedReader:
A couple of questions:
- does anyone know why mutex ops are so much slower on an Apple M1 MBP laptop than they are on an Intel X86 MBP?
- has anyone thought about improving this, e.g. by having the OperationName lookups happen during dialect registration (which is typically single threaded) and cached in a read-only map attached to the dialect? If we did that, then OpBuilder::create could check that before going to the big map in the MLIRContext.
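To make the second idea concrete, here is a rough sketch of the shape I have in mind. It is not MLIR's actual code: `OpInfo`, `Context`, `Dialect`, `freeze`, and `resolve` are hypothetical stand-ins, and `std::shared_mutex` stands in for the reader/writer lock. Each dialect fills its own map during (single-threaded) registration and is then frozen, so later lookups read it without any locking and only fall back to the locked context-wide map on a miss.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the per-operation data an operation name refers to.
struct OpInfo {
  std::string name;
};

// Hypothetical stand-in for the context: one big map behind a reader/writer lock.
class Context {
public:
  const OpInfo *getOrInsert(const std::string &name) {
    ++lockedLookups; // Every call through here pays for at least one lock.
    {
      std::shared_lock<std::shared_mutex> readLock(mutex);
      auto it = ops.find(name);
      if (it != ops.end())
        return it->second.get();
    }
    std::unique_lock<std::shared_mutex> writeLock(mutex);
    auto &slot = ops[name]; // Another thread may have inserted it meanwhile.
    if (!slot)
      slot = std::make_unique<OpInfo>(OpInfo{name});
    return slot.get();
  }

  // Instrumentation for this sketch only: counts trips through the locked path.
  std::atomic<unsigned> lockedLookups{0};

private:
  std::shared_mutex mutex;
  std::unordered_map<std::string, std::unique_ptr<OpInfo>> ops;
};

// Hypothetical dialect holding a map that is read-only once frozen.
class Dialect {
public:
  explicit Dialect(Context &context) : ctx(context) {}

  // Called during registration, assumed to be single threaded.
  void registerOp(const std::string &name) {
    assert(!frozen && "ops must be registered before freeze()");
    cache.emplace(name, ctx.getOrInsert(name));
  }

  void freeze() { frozen = true; }

  // Lock-free: after freeze() nothing writes to `cache`, so concurrent
  // readers are safe.
  const OpInfo *lookup(const std::string &name) const {
    auto it = cache.find(name);
    return it == cache.end() ? nullptr : it->second;
  }

private:
  Context &ctx;
  bool frozen = false;
  std::unordered_map<std::string, const OpInfo *> cache;
};

// What a create-style entry point could do: try the dialect's map first,
// and only then go to the locked map in the context.
const OpInfo *resolve(const Dialect &dialect, Context &ctx,
                      const std::string &name) {
  if (const OpInfo *info = dialect.lookup(name))
    return info;
  return ctx.getOrInsert(name);
}
```

With this shape, any op registered before `freeze()` never touches the mutex on the hot path; only names the dialect does not know about still go through the locked map.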
This is easy to reproduce FWIW. I’m using the public CIRCT build with the chipyard…hi.fir test input, and this command: firtool chipyard.hi.fir -o chipyard.hi.fir.v -verilog -mlir-timing
It is a 94M input file and takes about 21s of wall time, which isn’t a huge input but it is enough to measure.
-Chris