I’d like to do a minor update to our developer policy about the use of recursion. Currently we state:
Do not use recursive algorithms if the recursion can’t be bounded statically: that is, avoid recursion if there is a possible IR input that can trigger a stack overflow (for example, traversing use-def chains recursively). At the moment, we tolerate it for the two following cases:
The nesting of the IR: we use recursion when traversing nested regions.
Type nesting: recursion may be used for the nesting of composite types.
The assumption behind the exception for these two cases is that these recursions are bounded in a reasonable way. However, this assumption now seems at odds with what we observe in actual uses of MLIR. In particular for the first one, the use of regions for nested control flow, combined with inlining, can easily lead to O(100) levels of nesting in practice (we actually see this in some TensorFlow models).
I would propose that we remove the first exception and make sure MLIR Core does not come with such recursion (we have a major one in the verifier right now).
I haven’t seen any problem with the second exception (nesting of composite types), I’m OK to go either way (keep it or eliminate it preemptively).
(In case there is any doubt: this policy only applies to the components provided by MLIR itself, and thus to contributions to the MLIR/LLVM GitHub repository; client projects built using MLIR are of course free to do as they wish.)
Thinking about other options, other than rewriting the recursive code:
The bound could be enforced in the code, since in these cases the recursion is bounded by a simple structural aspect of the IR. It would be possible to enforce a static bound by adding a check in the reader with a useful limit (say, 10000 nested regions?). This could even be a configurable value, perhaps?
Maybe an annotation to require tail-recursion optimization would be a way to avoid the stack explosion (which I assume is the main concern here?)
Nested control flow: I can’t share the specific TensorFlow model, but it is just a series of switch/case, if/else, and while constructs calling functions that themselves contain the same kind of control flow.
In terms of IR design: that seems inherent to how we use regions?
If we want to model a “functional” relationship through regions (like we do when we go from functional control flow to region control-flow), then I think we ought to think about region nesting in the same terms as a call-graph.
This argument also applies to many algorithms working on use-def chains, but there we would rather have “ugly” and robust code.
I can try to quantify it, though, if you’d like: are there many places that are recursive on regions already? I can work on a patch as well to show the actual impact.
I am not aware of any, are you? I think tail recursion with the clang::musttail attribute may be OK though?
I spent quite some time years ago removing many recursions (in particular on the call graph) in Clang and LLVM when we were trying to ship it on “some embedded OS” where the stack size (maybe in threads?) was smaller than on desktop and we were more prone to hit stack overflows. I kept from this experience that “recursion is bad when not statically bounded”: it’s a ticking bomb in the compiler.
You would make it invalid IR to have overly deeply nested regions? I’m concerned that this makes the system fragile: it isn’t the IR reader I’m concerned with, but that each transformation affecting the nesting in any way should check that it won’t break the limit.
I was thinking more like a ‘published limit of the compiler implementation’ than ‘invalid IR’. I guess you’re right that it would have to be checked at multiple points, so maybe not as feasible as I thought. It would be better not to have a limit, though…
Sure, there are lots of these, which is why I assume you’re looking to change this. One algorithm that comes to mind is lib/IR/Verifier.cpp: the verifyBlock/verifyOperation methods are recursive over the region tree, as is verifyDominanceOfContainedRegions below them. Converting the latter to being worklist-based is probably straightforward; converting the former is a bit uglier, I expect.
There are lots and lots of other algorithms, though. My general concern is that IR with hundreds of nested regions (e.g. I can imagine this happening due to forced exponential inlining/flattening of a call graph that uses SCF for control flow) is going to have lots of problems being worked with. The recursion issue here is the tip of the iceberg.
The stack depth limitations are also only an issue for threaded compilation. In terms of workarounds, could you have the allocated threads get deeper stacks, or do work on these models in the main process? Such an approach would put the complexity onto the weird client doing this, instead of pushing complexity into all the MLIR transformations that are naturally recursive.
Yes this is true, but it is also contextual - what is wrong for SSA graphs can be ok for smaller bounded things, which is why we have the current approach in MLIR.
All this being said, I agree with you that changing this wouldn’t be the end of the world. It is just a small bit of ugliness that makes the codebase less pretty and less idiomatic C++.
Did you mean to say this applies to all the official MLIR repo dialect passes, transforms, and analysis utilities? For example, a utility to check whether two memrefs alias is naturally recursive (cf. collectUnderlyingAddressValues in lib/Analysis/AliasAnalysis/LocalAliasAnalysis.cpp): you’ll recurse on view-like ops, as one example. It would be pretty painful to disallow recursion here and in numerous other similar situations; in fact, this scenario isn’t even covered by the two bullets listed above in the policy but is completely acceptable (it’s recursive on the backward chain use ← def ← op’s use ← def ← …) and well-bounded. (The policy needs an update, it appears.)
Have you instead considered:
Getting rid of recursion as needed only in the real core (e.g. lib/IR, lib/Pass, lib/Rewrite, etc.);
Getting rid of recursion in other places on demand and on a case-by-case basis, as and when use cases present themselves?
I’m a -1 on restricting recursion just to nested types with a blanket ban everywhere else; it doesn’t seem to make sense to me. Recursion should often be allowed on backward use/def chains, for example; this has nothing to do with nested regions. You could have a recursive algorithm on a single block’s operations, and one could run out of stack space in theory if the block has a sufficiently large number of operations and the SSA data flow is “long chain-like” as opposed to being sparse and spread out. If such adverse situations really arise, one could consider switching to an iterative, heap-based approach even for such algorithms, as needed.
Do we have to blanket change the rule? Or can we just fix the verifier or other parts on an as-needed basis. The MLIR project has lots of different code in it, and what makes sense for one part need not make sense for the other – I fear that just removing a bullet point in the developer policy won’t move the needle.
I’m aware of multiple algorithms inside TensorFlow itself that already use region recursion, so the requirement does not seem widespread enough, even in the TensorFlow world, to impose globally.
How is it bounded?
I’m fine updating the policy to allow statically bounded recursion on def-use chains, of course. I’m just against “ticking bombs” in the compiler.
I’m confused: you’re saying you think recursion should be allowed on def/use chains (contrary to the current policy?), but then you’re explaining how it can stack overflow? The risk of stack overflow in large blocks is the reason this is disallowed right now. I feel I am missing something in your argument here…
I really don’t think this is a good idea: this means just having “ticking bombs” in the compiler that will at some point crash and become some P0 for some user.
Having been in this place where you have to go back after everyone else to remove recursion, I’d rather not introduce what we know from the start is inherently fragile (a change in compiler, compiler flags, platform, or C++ standard library can change the stack frame size and lead to an overflow on the same IR that would otherwise work fine on your own x86 Linux system).
I would really want MLIR to be just robust out of the box for what it is meant to support (and I believe region-based control-flow is really a pattern we’re developing heavily around).
This is because region-based control flow hasn’t been widely used in TensorFlow: it is only recently that we’ve deployed MLIR by default in Grappler, and last month we tried to switch to region-based control flow by default there but couldn’t because of the verifier crashing, which is why I’m revisiting the policy.
I don’t disagree, just saying that we should probably lead with code that fixes the most prominent issues, like the verifier (I don’t think anyone will disagree with improvements when a downstream user has a really critical use case; plus, one can imagine uses of MLIR where very deeply nested regions would be very common, such as when regions don’t represent “code” per se but are more like data, and the verifier should handle that).
Then we can look into the tradeoff of requiring all of upstream to follow such a rule separately. For example, it seems somewhat strange to require all of upstream MLIR to follow a new rule when the downstream user that critically requires the rule to be followed hasn’t even fully migrated to follow it.
To clarify something here: it seems that by “the downstream user that critically requires the rule to be followed hasn’t even fully migrated to follow it” you refer to TensorFlow. However, it should be kept in mind that there are two different MLIR paths in TensorFlow: TFG/Grappler is an entirely different dialect and codebase from the historical TensorFlow dialect and its associated transformations. They will remain separate in the future, and my work here mainly pertains to TFG/Grappler, in isolation from TFLite and the other flows where the TensorFlow dialect is used.
I think that it is important to ground on a functional goal here. I agree with Mehdi that unbounded recursion almost always becomes a problem at some point: programs tend to grow over time, MLIR is used in many different domains with many different constraints.
We explicitly endorse threaded compilation, and threads must have limited stacks.
The question is how to bound this. I agree with your point:
It isn’t about recursion over use/def chains, it is about having a bound. LLVM IR has various things in instcombine and other value analyses that are bounded by the magic number “6”, for example: these are naturally recursive, and that bound makes them OK.
The question here is whether we consider “region depth” to be a natural bound. So far we have, because recursive algorithms can be nice, and deeply nested regions aren’t typical. The example Mehdi is describing sounds like an abuse to me ;-), but I think his general point is sound: MLIR gets used in lots of different ways, and saying “thou shalt not have deeply nested regions” is an arbitrary and unnecessary limitation we are imposing that we could remove.
I thought this was pretty clear from my post. By bounded, I didn’t mean “statically bounded”. The bound could be something very specific to the algorithm, relying on a specific metric of the IR. For example, it could be bounded by “the number of memrefs defined in the block”, the number of surrounding affine.for/if ops of an operation, the number of symbol table ops surrounding an operation, the depth of an affine expression tree, etc. (the latter isn’t covered by your two bullets, for example). Of course, one could construct IRs where all of these structures are large and deep enough to cause stack exhaustion, but calling all of these ticking time bombs is a dubious argument. I could make the same argument to show that the parser would crash and that we shouldn’t be using recursive descent: for example, the parser for affine expressions is mutually recursive, and I could make it and most of the utilities visiting affine expressions crash by generating a million-way nested affine expression. But I wouldn’t change the recursive nature unless there is a real use case.
What’s dubious about this? The policy exists precisely because we considered it absolutely not OK in MLIR to write recursion this way: this is, in my opinion, not acceptable for any production code.
Techniques like Chris mentioned where the algorithm would bail out after a specific fixed recursion depth is an OK alternative of course.
I think you could apply a similar reasoning to upstream here. For example, things in the IR core (or more core-ish) would have more restrained use of recursion, while the many leaf/leaf-ish dialects can make a case-by-case decision. I don’t see why the SPIR-V dialect should be asked to immediately migrate to a new convention if what they have works for their use cases now.
Overall I think the trend in MLIR upstream is to organize the projects with multiple layers of “core-ness” vs “experimental-ness”. What makes sense for one part of the project on this topic (in terms of migration cost, etc.) might not be the best tradeoff for another part, as you observe for the TF codebases.
What I don’t want is for us to put such a policy in writing but not make a committed effort to actually bring the codebase in line with it, or to enforce the policy equitably across the code base. Overall, I would bias towards leading with code: let’s first have the interested parties fix up the most important cases that are blocking them, and see where things are after that.