Motivation
Our optimizer does a lot of work trying to eliminate branches by condition line x < y basing on some dominating facts from branches/assumes/guards. Currently, a lot of work like this is done through SCEV implication engine. However, it has its downsides, at least:
It is known to be embarassingly slow. Each branch makes a query that traverses up the dominator tree, and each query may trigger a non-linearly-growing chain of queries required to prove it. It’s hard to evaluate overall complexity, but it’s huge.
Note that to prove <, it is enough that one of the known condition is < and all others can be <=. For proving x <= val_n, all of them can be <=.
We also want it to work for any values (not just induction variables) and have a reasonable cost in terms of compile time.
Proposal
We can have a pass that does the following:
Stores a partially ordered set (POS) of values known by the basic block entry*. This set is a graph, where nodes are values and edges between x and y exists if it’s known that the block entry is guarded by fact x < y or x <= y (edges are marked by exact predicate).
In the entry block, the set is empty
If a block has one predecessor, and it has terminator like x < y, then POS of current block is POS of its predecessor united with fact x < y from its terminator
For any other block, POS is a POS of its immediate dominator, or intersection of POSes of its predecessors (which is a bit harder to implement).
We traverse blocks in RPOT and analyze their conditional branch terminators.
We try to prove it can be eliminated. Imagine the terminator is x <= y. If our graph contains a path from x to y, it can be replaced with if (true). Alternatively, try to prove x > y by finding a path from y to x that contains at least one < (other can be <=), and if so, replace with if (false).
If we coudn’t prove either, we continue. The condition from the branch will be added to POS’es of its successors.
facts may not only become known on entry, but also somewhere in the middle because of llvm.experimental.guard and llvm.assume, but let’s omit this for simplicity of explanation
Complexity
Let N be number of basic blocks. RPOT traversal is O(N), on each predicate we need to find a path in the graph. The graph has at most N edges (no more than one from each preceding block), so overall complexity is O(N * N).
Ideologically, it looks similar to GVN but for comparison graphs.
I’ve already mentioned this on the previous thread: Have you taken a look at the ConstraintElimination pass? I believe it does a more sophisticated version of what you’re proposing here (it handles more general linear inequality constraints, not just a straightforward implication chain). From what I’ve seen, the compile-time impact is low enough to consider default enablement.
This is an interesting data point, thanks for sharing! I am not entirely sure how to best interpret the data.
It looks like the worst regression for compile-time in the primary benchmarks is ripgrep-13.0.0 incr-patched: println with +0.91% and a size decrease of -0.54%.
This seems somewhat reasonable to me, but would it be possible to provide more details on how to interpret the results in the context of Rust? Is it bad/good/neutral?
Given that this introduces a genuinely new class of optimizations, I think the size of the regressions is acceptable.
Something I was surprised about is that we don’t seem to see much in terms of benefits from the pass – usually we would see some minor improvements on check builds (due to elimination of bounds checks in the bootstrapped rustc compiler). Possibly this is less useful for rust code than one might naively think (e.g. due to prevalence of iterators rather than manual indexing).
I confirm that existing constraint-elimination pass is able to deal with cases I’m interested in. I also made some studies and confrim that the constraint elimination pass triggers very rarely on our code too. It’s like 40 methods affected out of 2500 on a huge Java workload, and in each case it made 1-8 transforms and that’s it.
No conditions created for it to be wide-spreadly useful?
It’s very hard to say without specific cases where enough information is available to remove a check, but the pass doesn’t remove it. Such cases would go a long way to help. There are several reported issues that can be handled by the pass (linked in the patch)
Across MultiSource,SPEC2006 & SPEC2017 it triggers 2k times with -O3, which isn’t massive, but also not nothing.
One limitation at the moment is the pass only reasons about inbounds GEPs, if we can prove that the index is positive, to ensure the GEP doesn’t wrap in the unsigned sense.