In the last few weeks I’ve been realizing more and more that the upstream codebase almost entirely ignores the SymbolTable and SymbolTableCollection classes. This often leads to quadratic scaling in compilation time, as every symbol is resolved by a linear scan of its parent symbol table operation.
I’ve addressed this issue in the bufferization pipeline, but now I’m finding it again in the LLVM conversion pass. I’ve been trying to update the LLVM conversion interface to take caching into account, but such an approach feels more and more debatable, as users would always have to remember to keep the symbol tables up to date within their patterns. To be fair, I haven’t had the time to check how widespread this problem actually is, but I wouldn’t find it surprising if other passes exhibited the same behavior.
Ultimately, as a one-fix-all solution, I’m considering adding a SymbolTableCollection as a member of the RewriterBase class (or maybe of OpBuilder / Builder?), and having the symbol tables automatically updated during insertion or removal of operations. This way, users could just ask the rewriter directly for a symbol table, instead of keeping a separate instance up to date.
The downside is that the rewriter would have to explore the whole subtree of inserted / removed operations, as nested operations may in turn include symbols and / or symbol tables (which is what already happens when listeners are attached).
If this last approach is preferable (and imho it is), some small changes would also be needed in the SymbolTable class to allow for a temporarily invalid state (e.g., in the case of replaceWithNewOp, where duplicate symbols may appear), but I don’t see that as a major problem.
I’d like to hear your opinion on this last approach. In the meanwhile I’ll start implementing it on my own fork anyway, as it is absolutely critical for my use case.
Have you measured the performance degradation of such a change?
MLIR is supposed to be pay-for-what-you-use. What if I have a different symbol infrastructure? Wouldn’t this be forcing users to opt into something they don’t want?
I haven’t measured that yet, as I still have to implement the changes.
As for your second point, that is also my concern. One alternative approach may be to implement everything in a listener that can be optionally attached to the rewriter. But then how would patterns be able to access the symbol tables, which would now be owned by the listener and not the rewriter?
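As a rough illustration of the listener alternative (an uncompiled sketch against approximate MLIR APIs; `SymbolTableListener` is a made-up name and the exact hook signatures are assumptions), the listener would own the collection and patch it on every notification:

```cpp
// Uncompiled sketch: a listener owning the symbol tables and keeping
// them in sync with IR mutations. Hook names/signatures approximate
// those of RewriterBase::Listener.
struct SymbolTableListener : public RewriterBase::Listener {
  SymbolTableCollection symbolTables;

  void notifyOperationInserted(Operation *op) override {
    // Nested ops may define symbols too, so the whole inserted
    // subtree has to be walked.
    op->walk([&](Operation *nested) {
      if (isa<SymbolOpInterface>(nested))
        if (Operation *tableOp = nested->getParentOp()) {
          // A real version would only update the cached map here,
          // not move the op around.
          (void)symbolTables.getSymbolTable(tableOp);
        }
    });
  }

  void notifyOperationErased(Operation *op) override {
    // Symmetric removal; the same subtree walk applies.
  }
};
```

The open question stated above remains: patterns would somehow need to reach the listener-owned tables through the rewriter.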
Folks do build ops freely and then insert them later into the op / SymbolTable. For those that need symbol table insertion, I mostly see the tables built at the start of the pass and then potentially passed into patterns.
Another option is utilizing an Analysis. Mark it retained in passes that know how to update it; otherwise it invalidates as normal. It would allow less reuse if passes are at different nested symbol table levels, I think, but I haven’t tried it / I’m thinking out loud.
Folks do build ops freely and then insert them later into the op / SymbolTable. For those that need symbol table insertion, I mostly see the tables built at the start of the pass and then potentially passed into patterns.
For example, memref → LLVM patterns do insert malloc and free operations within the patterns. There is no insertion at the beginning of the pass, probably also because that would prevent reusing the patterns in other contexts.
Another option is utilizing an Analysis. Mark it retained in passes that know how to update it; otherwise it invalidates as normal. It would allow less reuse if passes are at different nested symbol table levels, I think, but I haven’t tried it / I’m thinking out loud.
This is not different from updating the patterns to receive a SymbolTableCollection. And it is possibly worse, imho, as patterns would be receiving an AnalysisManager instead of a SymbolTableCollection. Patterns would still have to update the content of the analysis whenever a symbol is inserted / removed or a symbol table erased.
In any case, what happens if I have, e.g., two sets of patterns, one which updates symbol tables and one that doesn’t, and I mix them? The two sets would be fine working separately in two different passes, but the state of the symbol tables would become inconsistent as soon as they are mixed together. This is why I was thinking of having the symbol tables updated by the common player, that is, the rewriter.
(This last part is more about the rationale of what I’m proposing, and not directly related to Jacques’ message, but still worth writing down.)
Yes, sorry, I wasn’t clear: this is to pass it between passes in a pipeline, not patterns, if one wanted to do that. For patterns, it’s just a matter of passing in a SymbolTableCollection (and mostly not too many patterns need it where I’ve been working, but that may not be true for your flow).
And that is still a problem with it on the builder, unless the builder does all the bookkeeping. One could also have an expensive-check variant which verifies whether the tables are up to date; that’s the other way I could think of.
Yes, sorry, I wasn’t clear: this is to pass it between passes in a pipeline, not patterns, if one wanted to do that. For patterns, it’s just a matter of passing in a SymbolTableCollection (and mostly not too many patterns need it where I’ve been working, but that may not be true for your flow).
I already pass the SymbolTableCollection to my patterns. Upstream ones, however, never do it. Sure, I can update all the patterns, but what are the chances that I or anyone else forget some? Wouldn’t it be better if the user didn’t have to worry about keeping symbol tables updated, and could rely on the infrastructure to do it?
And that is still a problem with it on the builder, unless the builder does all the bookkeeping.
I’m not sure I’m understanding. What I’m proposing is indeed to have the rewriter / builder update the cached symbol tables.
Just for historical accuracy: the LLVM conversion was initially added before we had proper symbol table classes. FWIW, it was added before most of the current MLIR, so it must not be used as a proxy for what the codebase looks like.
Could you elaborate on the proposed solution to this problem? I have at times explicitly avoided using symbol tables in conversion patterns precisely because the IR is in a state that violates the preconditions for a symbol table. This may be a less acute problem when most conversions use @matthias-springer’s [RFC] A New "One-Shot" Dialect Conversion Driver, as long as they erase the base ops immediately, but it is still a problem.
A quick solution would be to have the patterns that want to use the symbol table take a symbol table object in their constructor. Then whoever uses the patterns will give them a shared symbol table and it is going to be up to patterns to properly handle potential duplication, and the patterns have that knowledge locally available.
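The constructor-injection idea could look roughly like this (uncompiled sketch; `InsertMallocPattern` and the lowering details are placeholders, only the SymbolTableCollection plumbing is the point):

```cpp
// Uncompiled sketch of a pattern receiving a shared symbol table
// collection at construction time instead of rescanning the module
// on every match.
struct InsertMallocPattern : public OpRewritePattern<memref::AllocOp> {
  InsertMallocPattern(MLIRContext *ctx, SymbolTableCollection &symbolTables)
      : OpRewritePattern(ctx), symbolTables(symbolTables) {}

  LogicalResult matchAndRewrite(memref::AllocOp op,
                                PatternRewriter &rewriter) const override {
    auto module = op->getParentOfType<ModuleOp>();
    // Cached lookup instead of a linear scan of the module body.
    Operation *mallocFn = symbolTables.getSymbolTable(module).lookup("malloc");
    if (!mallocFn) {
      // Insert the declaration here; the pattern is then responsible
      // for registering it in symbolTables, keeping the shared state
      // valid -- the locally-available knowledge mentioned above.
    }
    return failure(); // actual lowering elided
  }

  SymbolTableCollection &symbolTables;
};
```

Whoever populates the pattern set would construct the collection once and hand the same reference to every pattern that needs it.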
Having and preserving an analysis across passes is also a good idea, but likely complementary to sharing across patterns. One could even implement an item from my wishlist: Connecting op verifiers to analyses
A quick solution would be to have the patterns that want to use the symbol table take a symbol table object in their constructor.
See my previous comment (I anticipated you by a few seconds :D).
Sure, it can be done; I can actually upload a PR very soon to update the “To LLVM” conversion patterns. However, I’m not convinced about the long-term maintainability, nor about people actually remembering to update symbol tables. Consequently, I would expect some “complaints” from downstream projects if the PR is merged (they would have to update all their LLVM conversion patterns). But I also don’t know how much of a problem this would be, compared to the performance gain that would also derive from the change.
Could you elaborate on the proposed solution to this problem?
What I had in mind is to have a SymbolTableCollection inside the rewriter / builder, which can be publicly accessed, e.g., by patterns.
IR modification methods should then be updated to account for symbol insertion / removal and / or symbol table removal. In practice, this would mean touching all the create / erase / inlineRegionBeforeBlock / etc. methods.
The small modifications I mentioned for the SymbolTable class are meant to cover cases like replaceWithNewOp, in which the new operation is created before the old one is erased. If the two have the same symbol name, and a symbol table has not yet been constructed, then the precondition of symbol uniqueness would fail when the symbol itself gets created. To address this issue, I was thinking of enriching the SymbolTable class with an ongoing-transaction boolean flag that signals whether the preconditions are expected to hold. I’m not fully convinced this is a good design, but at the moment I’m struggling to find alternatives. It may also very well be that replaceWithNewOp could be modified so that the inconsistent state is avoided from the beginning; I haven’t thought too much about this.
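For what it’s worth, the transaction flag I have in mind would be something along these lines (purely hypothetical; no such API exists upstream today, and `verifyUniqueness` is a made-up helper):

```cpp
// Hypothetical extension of SymbolTable: while a "transaction" is
// open, duplicate names are tolerated; uniqueness is re-verified
// only when the transaction closes.
class SymbolTable {
public:
  void beginTransaction() { inTransaction = true; }

  LogicalResult endTransaction() {
    inTransaction = false;
    // Re-check the symbol uniqueness precondition over the cached
    // table once the IR is expected to be consistent again.
    return verifyUniqueness();
  }

  // Insertion would skip the uniqueness assertion while a transaction
  // is open, covering the replaceWithNewOp window in which the old
  // and new symbol briefly coexist.

private:
  bool inTransaction = false;
};
```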
Yes, if the builder does all the work and it’s not just caching (which is what I referred to as doing all the bookkeeping). E.g., if the tables are just cached, users still have to use the cached one / update their patterns, which I see as no different from passing a SymbolTable into the pattern if and when needed; but if it is forced by the builder, that is avoided, though it also means a class of create calls is no longer possible.
Why all? Is that due to ConvertToLLVMPatternInterface? E.g., if one extended it to optionally take a SymbolTable in its populate method, then the patterns could use it (optionally), but at the cost that one can’t freeze the pattern set (and maybe this isn’t an OpBuilder issue, but rather one of being able to query on the Rewriter, with a variant that queries it before the driver runs).
How many of these modify the symbol table, OOC? (internally and on your side)
I expressed myself poorly, let me explain again. I meant that downstream users would at least have to check all their patterns converting to LLVM, and would be required to update those working on symbols / symbol tables.
Failing to do so would result in “symbol-table-caching patterns” (the upstream ones, after my eventual PR) having outdated information, and thus operating incorrectly when collected through the interface.
In other words, this imposes on “To LLVM” patterns - both upstream and downstream ones - the constraint of always keeping the symbol tables in a consistent state. Which is exactly what I’ve been doing for the bufferization pipeline, and of course I can replicate it here, but this recurring problem gives me the feeling that it should be handled by the infrastructure rather than by users.
Semi. The infrastructure should also not enforce a worse solution. E.g., if the automatic update and bookkeeping is much more expensive than what can be done by hand / by someone who knows the structure and where/when fixups are needed, then the infra shouldn’t lock someone into the worse solution. Now, derived classes (AutoFixUpSymbolTableBuilder) and some flagging mechanism can help of course, both in giving convenience and in showing good practices / flagging bad ones.
So I looked, and I can only find 4 of these patterns that use SymbolTable and use the ConvertToLLVMPatternInterface. One of them is FuncToLLVM - and for that one it is today more efficient to run the pass directly than to get the results via the interface. The other is in MemRef - where it is used to conditionally insert the alloc call, which is rather wasteful as it ends up doing this for every Alloc op; potentially much better would have been to check if needed & insert once and then just reuse (or just have a pass that always inserts, followed by SymbolDCE if one doesn’t want something more surgical).
SPIR-V precedes this / hasn’t switched, so it isn’t counted above. It is doing something I would have assumed is more common: it has 30 patterns that “use” SymbolTable. They don’t pass in SymbolTables; they get them by iterating up to the parent and inserting in the appropriate scope - I actually don’t know if this triggers the full module SymbolTable rebuild, which is the expensive part. The other thing they do is have a pass that converts the launch to the func structure and mutates the SymbolTable.
I’d have imagined / it’s the case in my flow that at the point where one is lowering to LLVM there are no big jumps, so the interface is used where the structure is already close (e.g., not going from TOSA to LLVM by way of the interface and patterns) and all the rewrites are rather local. If I were to do this today, I’d probably run passes until I get close to the end, and I’d change MemrefToLLVM to optionally create the alloc functions ahead of time and just nuke them locally in the pass if unused: that requires knowing if there were alloc ops, so one could also just postprocess - that’s still just a single walk, as expensive as recomputing the SymbolTable once. Or, if one wants to use the current structure, that’s one or two extra passes: 1) conditionally insert allocs, 2) delete unused ones, passing in the SymbolTable to be queried / the two queried alloc functions directly to the patterns.
The infrastructure should also not enforce a worse solution.
Of course.
I’d have imagined / it’s the case in my flow that at the point where one is lowering to LLVM there are no big jumps, so the interface is used where the structure is already close
Without going into too much unneeded detail, I already bring everything into a mix of the arith + scf + memref + func dialects, and perform the conversion to the LLVM dialect starting from there. Such IR is therefore already very close to the LLVM-only version that is obtained in the end.
If I were to do this today, I’d probably run passes until I get close to the end, and I’d change MemrefToLLVM to optionally create the alloc functions ahead of time and just nuke them locally in the pass if unused: that requires knowing if there were alloc ops, so one could also just postprocess - that’s still just a single walk, as expensive as recomputing the SymbolTable once. Or, if one wants to use the current structure, that’s one or two extra passes: 1) conditionally insert allocs, 2) delete unused ones, passing in the SymbolTable to be queried / the two queried alloc functions directly to the patterns.
So, in practice, the generic LLVM conversion pass has some asterisks here and there saying that, performance-wise, it is not the best solution, and some passes are better executed on their own if one is after performance. That doesn’t sound bad as a quick solution, but there is still a penalty from walking the IR once for every pass that is executed separately. E.g., memref → llvm and vector → llvm both need to insert function declarations, and those two alone would already imply one more IR traversal than having one unique pass building the symbol table.
I’m a bit more reluctant about the second option of having separate passes on which the generic LLVM conversion would become dependent. That would break the self-contained nature of the pass.
In any case, many thanks for your input. For now I will stick to pre-executing some of the passes, and I will upload some PRs speeding up the patterns (e.g., memref → llvm) when a symbol table is given.
The chances are non-zero, but the argument is a bit fallacious. People may equally forget to use the rewriter and instead update the symbol attribute directly, or not even know it is there. We have spent years finding direct usages of RAUW and Operation::erase in patterns, for example. Or they may not use a builder at all! For example, there are no builders in the C API, and therefore in any of the language bindings. So people will have to worry about using the right API regardless.
The way to avoid this mental load is to make the concept of a symbol a first-class entity in the core IR and have symbol tables be updated the same way use-def chains are. Those are completely transparent to any manipulation, including direct low-level API like setting operands. Note that I’d consider this opposite to the design rationale of MLIR, “little built-in”, though I am not dogmatic on anything.