I tried searching for small bitwise expressions using AND, OR, XOR, and
NOT that "opt -O3" fails to optimize to a simpler form. For example:
(A^B)|~A --> ~(A&B)
A|B|(A^B) --> A|B
((A|B)^C)&A --> A&~C (actually I don't understand why this one is OK
even if B might be poison, but Alive2 says it is OK)
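Identities like these (poison aside) can be checked exhaustively over
1-bit inputs; e.g. for the first and third examples:

```python
from itertools import product

def bnot(x):
    return ~x & 1  # 1-bit NOT

for a, b, c in product((0, 1), repeat=3):
    assert (a ^ b) | bnot(a) == bnot(a & b)  # (A^B)|~A == ~(A&B)
    assert ((a | b) ^ c) & a == a & bnot(c)  # ((A|B)^C)&A == A&~C
print("both identities hold for all 1-bit inputs")
```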
I can file bugs for specific examples but I wondered if there was any
interest in a more systematic approach to finding these missed
optimizations? My approach was to write some very hacky Python
(bitwise_missed_opts · GitHub,
please don't read it) to exhaustively generate programs; then take all
the programs with a given truth table, run them all through "opt -O3",
and check that they all got optimized to the same (or at least equally
good) form.
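A minimal sketch of that truth-table bucketing (not the actual script;
the expression set here is tiny and purely illustrative):

```python
from itertools import product

OPS = {
    "&": lambda x, y: x & y,
    "|": lambda x, y: x | y,
    "^": lambda x, y: x ^ y,
}

def truth_table(fn):
    # Evaluate a 2-input function on all 1-bit inputs, packed into 4 bits.
    return sum(fn(a, b) << i for i, (a, b) in enumerate(product((0, 1), repeat=2)))

# Leaves: the variables and their negations.
atoms = {
    "A": lambda a, b: a,
    "B": lambda a, b: b,
    "~A": lambda a, b: ~a & 1,
    "~B": lambda a, b: ~b & 1,
}

# Bucket every "atom op atom" expression by its truth table; every
# member of a bucket computes the same Boolean function, so "opt -O3"
# ought to canonicalize them all to the same form.
buckets = {}
for (nx, fx), (ny, fy) in product(atoms.items(), repeat=2):
    for op, f in OPS.items():
        expr = f"({nx} {op} {ny})"
        fn = (lambda fx, fy, f: lambda a, b: f(fx(a, b), fy(a, b)))(fx, fy, f)
        buckets.setdefault(truth_table(fn), []).append(expr)

print(len(buckets), "distinct truth tables")  # -> 16 distinct truth tables
```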
Nice test program! I don't know Python well enough to understand that yet, but there's no reason to be ashamed of hacks.
If we’re missing 2 variable logic reductions, those are common/easy enough that we should have those in instcombine. So yes, it would be great if you file bugs for those, and they could be marked with the ‘beginner’ keyword too as a potential easy patch for newcomers to LLVM.
There’s also a set of recent patch proposals for 3 variable logic reductions – for example, https://reviews.llvm.org/D112276 – these were inspired by a logic lookup table function as discussed in the comments.
The extra-use and commuted variations make these harder. IMO, this is where there should be a dedicated pass/solver for logic folds if we want those optimizations to be complete. Otherwise, there’s an explosion of possible pattern match combinations.
If we're missing 2 variable logic reductions, those are common/easy enough that we should have those in instcombine.
I think I agree with this. There are 16 canonical forms for the
bitwise functions on two variables, including degenerate cases like
"false" as well as the interesting ones like A&~B, ~A^B, etc. It seems
reasonable that we should be able to simplify AND, OR, and XOR on any
pair of canonical forms, and produce another canonical form as the
result.
For three variables there would be 256 canonical forms, which seems
far less tractable.
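To illustrate the two-variable case (a hypothetical table encoding,
not LLVM's actual canonical forms): index the 16 functions by their
4-bit truth tables, and combining two canonical forms with AND/OR/XOR
is just the same bitwise operation on the tables.

```python
# One canonical expression per 4-bit truth table; bit i is the output
# for the i-th input pair (0,0), (0,1), (1,0), (1,1).
CANON = {
    0b0000: "false",  0b1111: "true",
    0b1100: "A",      0b0011: "~A",
    0b1010: "B",      0b0101: "~B",
    0b1000: "A & B",  0b0111: "~(A & B)",
    0b1110: "A | B",  0b0001: "~(A | B)",
    0b0110: "A ^ B",  0b1001: "~(A ^ B)",
    0b0100: "A & ~B", 0b1011: "~A | B",
    0b0010: "~A & B", 0b1101: "A | ~B",
}

def combine(t1, t2, op):
    # ANDing/ORing/XORing two functions is the same op on their tables.
    return {"&": t1 & t2, "|": t1 | t2, "^": t1 ^ t2}[op] & 0b1111

# e.g. (A & ~B) | (~A & B) should fold to A ^ B:
print(CANON[combine(0b0100, 0b0010, "|")])  # -> A ^ B
```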
For any number of inputs, you can build a look-up table (LUT).
Some processor architectures have LUT instructions. For those, your built LUT can be used directly.
Otherwise, for each of the 256 possible LUTs you can precompute the “optimal” representation for the target. This may depend on what the target offers - e.g. some processors have an “and not” instruction that may allow a more compact sequence. For any of the 256 LUTs (and counterparts arising from the permutation of inputs) you can cache the resulting target-mapped tree.
The only issue I see is undef/poison in an operand. Can this break an optimization, esp. if there is internal reconvergence in the logic tree?
For expressions of more than 2-3 distinct variables you can use technology mapping to construct an arbitrarily deep tree of LUTs, and then technology-map to the target architecture. It's likely that the few applications that would benefit already do some of this optimization internally.
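For concreteness, a 3-input function fits in an 8-bit table (this is
essentially how the immediate of x86's vpternlog instructions encodes
a truth table); a bitwise-parallel evaluation sketch:

```python
def lut3(table, a, b, c, width=8):
    """Apply an 8-bit, 3-input LUT bitwise-parallel across width-bit values.

    Bit k of the result is bit ((a_k << 2) | (b_k << 1) | c_k) of table.
    """
    out = 0
    for k in range(width):
        idx = (((a >> k) & 1) << 2) | (((b >> k) & 1) << 1) | ((c >> k) & 1)
        out |= ((table >> idx) & 1) << k
    return out

# Truth table for majority(a, b, c): bits 3, 5, 6, 7 set.
MAJ = 0b11101000
print(bin(lut3(MAJ, 0b1010, 0b0110, 0b0011, width=4)))  # -> 0b10
```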
A logically correct transformation "(~a & b & c) | ~(b | c) → ~((a & b) | (b ^ c))" has issues with undef inputs. That is, our logic isn't really binary; our truth tables would also need undef columns. This makes the prospects for a logical solver even more questionable. And I am not sure whether we need to account for poison as well.
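That failure can be confirmed by brute force: model each use of an
undef variable as resolving independently, and require that every
value the rewritten expression can produce was already producible by
the original (a toy checker of my own, not Alive2, and a simplified
undef model):

```python
from itertools import product

def bnot(x):
    return ~x & 1  # 1-bit NOT

def original(a, b1, b2, c1, c2):
    # (~a & b & c) | ~(b | c), with each use of b and c numbered
    return (bnot(a) & b1 & c1) | bnot(b2 | c2)

def rewritten(a, b1, b2, c):
    # ~((a & b) | (b ^ c)), with each use of b numbered
    return bnot((a & b1) | (b2 ^ c))

# Take b undef (each use resolves independently), a and c concrete.
violations = []
for a, c in product((0, 1), repeat=2):
    can_orig = {original(a, b1, b2, c, c) for b1, b2 in product((0, 1), repeat=2)}
    can_new = {rewritten(a, b1, b2, c) for b1, b2 in product((0, 1), repeat=2)}
    if not can_new <= can_orig:
        violations.append((a, c))
print(violations)  # non-empty: the rewrite is not a refinement under undef
```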
Speaking of the LUT with 256 expressions, it seems possible to cover them with a much smaller set of transforms than 256. I have implemented a few, and I see that the rest of the 256 cases are starting to converge.
Now that that one is fixed (thanks!) my script can't find any more
cases that we fail to optimize where the input has two variables and
at most three instructions.
Moving on to the two-variable four-instruction cases, there are
various missing simplifications like:
A | B | ~(A ^ B) --> -1
(A ^ B) & (~A | B) --> ~A & B
(((A | B) ^ B) & A) ^ (A | B) --> B
(((A & B) ^ A) & B) | (A & B) --> A & B
Taking the last of these as an example: Compiler Explorer
I'm surprised that we don't simplify it in stages:
1. (A & B) ^ A --> A & ~B // perhaps this is not considered profitable?
2. (A & ~B) & B --> 0 // perhaps we don't manage to reassociate this
to see that it contains B & ~B ?
3. 0 | (A & B) --> A & B
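Each of those stages is at least easy to sanity-check exhaustively
over concrete values, e.g. all 8-bit pairs:

```python
M = 0xFF  # work in 8-bit arithmetic so ~ stays in range
for a in range(256):
    for b in range(256):
        assert ((a & b) ^ a) == (a & (~b & M))  # stage 1
        assert ((a & (~b & M)) & b) == 0        # stage 2
        assert (0 | (a & b)) == (a & b)         # stage 3
print("all three stages verified for all 8-bit values")
```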
(A & B) ^ A → A & ~B // perhaps this is not considered profitable?
If the ‘and’ has one use, we do prefer the form with ‘not’ because it removes a use of ‘A’, it’s better for analysis, and it’s likely better for codegen since multiple targets have an ‘andn’ instruction.