Speeding up OperationName::OperationName?

In my ongoing quest to make MLIR faster at threaded compilation :-), I ran into a new problem. Earlier I was profiling on an Intel Mac, but I switched to an Apple M1 Max laptop, which shows a very different profile.
In a release build of CIRCT against a release build of MLIR, I now see `OperationName::OperationName` at the very top of the profile:

Of course, this is completely dominated by the mutex operations in `llvm::sys::SmartScopedReader`:

A couple of questions:

  1. Does anyone know why mutex ops are so much slower on an Apple M1 MBP laptop than they are on an Intel x86 MBP?
  2. Has anyone thought about improving this, e.g. by having the `OperationName` lookups happen during dialect registration (which is typically single threaded) and cached in a read-only map attached to the dialect? If we did that, then `OpBuilder::create` could check that before going to the big map in the `MLIRContext`.
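The idea in (2) can be sketched roughly as follows. This is a hypothetical illustration, not MLIR's actual API: `Context`, `Dialect`, and `OpInfo` here are simplified stand-ins for `MLIRContext`, dialect objects, and registered-operation info, and the cache-then-fallback lookup is the proposed scheme, assuming the cache is populated during single-threaded registration and immutable afterwards:

```cpp
#include <cassert>
#include <initializer_list>
#include <mutex>
#include <string>
#include <unordered_map>

// Stand-in for the registered operation info an OperationName resolves to.
struct OpInfo { std::string name; };

// Global registry guarded by a mutex -- analogous to the MLIRContext map
// whose reader lock dominates the profile above.
class Context {
  std::mutex mu;
  std::unordered_map<std::string, OpInfo> ops;
public:
  OpInfo *lookupLocked(const std::string &name) {
    std::lock_guard<std::mutex> lock(mu);
    auto it = ops.find(name);
    return it == ops.end() ? nullptr : &it->second;
  }
  void registerOp(const std::string &name) {
    std::lock_guard<std::mutex> lock(mu);
    ops.emplace(name, OpInfo{name});
  }
};

// Per-dialect read-only cache, filled once during (single-threaded) dialect
// registration. Lookups afterwards never touch the mutex for cached names.
// Note: unordered_map values are node-based, so the cached pointers stay valid.
class Dialect {
  std::unordered_map<std::string, OpInfo *> cache; // immutable after init
public:
  Dialect(Context &ctx, std::initializer_list<std::string> opNames) {
    for (const auto &n : opNames) {
      ctx.registerOp(n);
      cache.emplace(n, ctx.lookupLocked(n));
    }
  }
  // Lock-free fast path; falls back to the locked context map on a miss.
  OpInfo *lookup(Context &ctx, const std::string &name) const {
    auto it = cache.find(name);
    if (it != cache.end())
      return it->second;
    return ctx.lookupLocked(name);
  }
};
```

Since the per-dialect map is never mutated after registration, concurrent builder threads can read it without any synchronization; only lookups of ops outside the dialect pay for the lock.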

This is easy to reproduce FWIW, I’m using the public CIRCT build with the chipyard.hi.fir test input, and this command: `firtool chipyard.hi.fir -o chipyard.hi.fir.v -verilog -mlir-timing`.

It is a 94 MB input file and takes about 21 s of wall time: not a huge input, but enough to measure.



Yes. This is the logical next step that was intended after I reorganized the way OperationName/AbstractOperation works. I still have a few commits in that queue that I haven’t flushed out (because of holidays and a few other personal reorganizations). Thanks for posting an open benchmark to replicate; I’ll take a look at this sometime within the next week. Ideally my patches will already fix it; if not, I’ll just add this fix to the queue.

– River

Oh nice, that would be great. I’m happy to do some profiling if you have candidate patches to play with. Once we get this settled, I can play with the inline attribute stuff that Jeff was working on.

Thanks River!