[RFC] Alternative approach of dealing with implications from comparisons through POS analysis

Motivation
Our optimizer does a lot of work trying to eliminate branches by condition line x < y basing on some dominating facts from branches/assumes/guards. Currently, a lot of work like this is done through SCEV implication engine. However, it has its downsides, at least:

Overall, the motivating example is following: we want to prove that x < val_n from dominating conditions like:

x <= val_1
val1 < val_2
val2 <= val_3
...
val_(n-1) <= val_n

Note that to prove <, it is enough that one of the known condition is < and all others can be <=. For proving x <= val_n, all of them can be <=.

We also want it to work for any values (not just induction variables) and have a reasonable cost in terms of compile time.

Proposal
We can have a pass that does the following:

  • Stores a partially ordered set (POS) of values known by the basic block entry*. This set is a graph, where nodes are values and edges between x and y exists if it’s known that the block entry is guarded by fact x < y or x <= y (edges are marked by exact predicate).
  • In the entry block, the set is empty
  • If a block has one predecessor, and it has terminator like x < y, then POS of current block is POS of its predecessor united with fact x < y from its terminator
  • For any other block, POS is a POS of its immediate dominator, or intersection of POSes of its predecessors (which is a bit harder to implement).
  • We traverse blocks in RPOT and analyze their conditional branch terminators.
    We try to prove it can be eliminated. Imagine the terminator is x <= y. If our graph contains a path from x to y, it can be replaced with if (true). Alternatively, try to prove x > y by finding a path from y to x that contains at least one < (other can be <=), and if so, replace with if (false).
  • If we coudn’t prove either, we continue. The condition from the branch will be added to POS’es of its successors.
  • facts may not only become known on entry, but also somewhere in the middle because of llvm.experimental.guard and llvm.assume, but let’s omit this for simplicity of explanation

Complexity
Let N be number of basic blocks. RPOT traversal is O(N), on each predicate we need to find a path in the graph. The graph has at most N edges (no more than one from each preceding block), so overall complexity is O(N * N).

Ideologically, it looks similar to GVN but for comparison graphs.

Thoughts?

I’ve already mentioned this on the previous thread: Have you taken a look at the ConstraintElimination pass? I believe it does a more sophisticated version of what you’re proposing here (it handles more general linear inequality constraints, not just a straightforward implication chain). From what I’ve seen, the compile-time impact is low enough to consider default enablement.

Yeah, it seems you are right. Let me check if it covers all cases I’m interested in.

Yep, in terms of results, this sounds very similar to https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Scalar/ConstraintElimination.cpp, which also tracks what dominating conditions hold at entry of blocks.

Being able simplify cases like [SCEV] Missing implication opportunity: cannot infer inequality from 2 guarding conditions · Issue #58074 · llvm/llvm-project · GitHub was the reason for adding the pass and I think it should handle both cases.

Enable constraint-elimination by default · Issue #49344 · llvm/llvm-project · GitHub collects known issues that can be solved by ConstraintElimination. Those are only the ones I noticed on the issue tracker, so there may be other known issues where this would help.

Recent compile-time numbers are:

NewPM-O3: +0.24%
NewPM-ReleaseThinLTO: +0.33%
NewPM-ReleaseLTO-g: +0.33%

https://llvm-compile-time-tracker.com/compare.php?from=215c9fa4deac9ec6b4e504843830551f03b60620&to=9f427e7148970d50399d49ade587de47e58ffdbc&stat=instructions

I think given the motivating cases the pass can handle, enabling it by default seems like a reasonable compile-time trade-off to me.

That would be great!

There are 2 areas for improvements on my radar for now:

  • Extending support for removing *with.overflow intrinsics.
  • Extending pointer-based reasoning.

Compile-time results for enabling the LLVM 15 ConstraintElimination pass in rust: rustc performance data

This is an interesting data point, thanks for sharing! I am not entirely sure how to best interpret the data.

It looks like the worst regression for compile-time in the primary benchmarks is ripgrep-13.0.0 incr-patched: println with +0.91% and a size decrease of -0.54%.

This seems somewhat reasonable to me, but would it be possible to provide more details on how to interpret the results in the context of Rust? Is it bad/good/neutral?

Given that this introduces a genuinely new class of optimizations, I think the size of the regressions is acceptable.

Something I was surprised about is that we don’t seem to see much in terms of benefits from the pass – usually we would see some minor improvements on check builds (due to elimination of bounds checks in the bootstrapped rustc compiler). Possibly this is less useful for rust code than one might naively think (e.g. due to prevalence of iterators rather than manual indexing).

OK sounds good.

So far the pass has been focused on code generated from Clang and all investigations were for C/C++ code.

Some of the issues linked in Enable constraint-elimination by default · Issue #49344 · llvm/llvm-project · GitHub are C/C++ versions inspired by code that Rust generates, but to make sure we get the best possible results for Rust code we would need cases where runtime checks aren’t eliminated when they should.

Other interesting cases would be uses of *with.overflow intrinsics that should be removed but aren’t.

I confirm that existing constraint-elimination pass is able to deal with cases I’m interested in. I also made some studies and confrim that the constraint elimination pass triggers very rarely on our code too. It’s like 40 methods affected out of 2500 on a huge Java workload, and in each case it made 1-8 transforms and that’s it.

No conditions created for it to be wide-spreadly useful?

Also here [RFC] Inject invariant conditions to loops to enable unswitching (and constraint elimination?) is a related discussing about how we can trigger constraint-elimination more frequently.

I went ahead and put up a patch with more details up for review that enables the pass: ⚙ D135915 [ConstraintElim] Enable pass by default.

It’s very hard to say without specific cases where enough information is available to remove a check, but the pass doesn’t remove it. Such cases would go a long way to help. There are several reported issues that can be handled by the pass (linked in the patch)

Across MultiSource,SPEC2006 & SPEC2017 it triggers 2k times with -O3, which isn’t massive, but also not nothing.

One limitation at the moment is the pass only reasons about inbounds GEPs, if we can prove that the index is positive, to ensure the GEP doesn’t wrap in the unsigned sense.

This means it won’t perform simplifications on GEPs with size_t indices, unless there’s additional info that the index is positive. I posted Retain fact that index/offsets for GEP are non-negative for `size_t/uint64_t` offsets? to discuss options to improve the situation.

An example from the issue tracker is LLVM doesn't eliminate bounds check inside loop · Issue #51358 · llvm/llvm-project · GitHub