I came across this issue in the context of
Thanks, Sanjay, for starting this discussion.)
or keep additional ones
out of instcombine,
open questions for me would be
1. Since -reassociate isn't a fixed point pass,
This is fixable, fwiw, without fixpointing it.
Depends on specifically which part you would like to know about
Maybe I misunderstood what you meant by "This is fixable". Did you mean
that we won't somehow need to fixpoint between instcombine and reassociate,
or that the specific motivating examples from the above differentials
are foldable without fixpointing?
If by fixpointing you mean "fixpointing reassociate and instcombine",
then yes, that is fixable without fixpointing reassociate and instcombine,
but would require rewriting instcombine
If the latter, that may be the case. The concern was that we may
encounter examples that may need many more iterations, if not fixpointing.
As long as it's feasible to fixpoint between instcombine and
reassociate, it seems to work, but I guess that would probably need
some pass management change.
we might need to repeat "-instcombine -reassociate" multiple times
down to what we want (relating to my comment here
<https://reviews.llvm.org/D46336#1087082>). I assumed this isn't
not what we want to do
? My impression is we don't do a fixed-point with passes?
Well, i mean there is no practical difference between passes that we
fixpoint externally and fixpoint internally.
I had the following in mind: Does the pass manager support fixpointing
externally? Is there any performance difference? Are people okay with that
But if there is no practical difference, I don't see any problem with
Since -reassociate needs to come up with one operand order (at
least currently as the only reassociate pass), would there exist a
single, unique operand order that would enable all
reassociative/commutative foldings that we want?
In what way?
Are you asking whether there is a single reassociation order that
makes all foldings occur in the same operation or something?
I don't feel like i understand what you are asking.
Does this rephrase help: with the motivating examples (like
and-of-shifts or bit check patterns) from the above differentials in mind,
can we come up with a single reassociation order that solves all
those and all the others that may come up in the future? Would we need
different reassociation orders to fold different patterns?
It doesn't quite help.
When stated that generally, there can be no such ordering at all,
that's easy to prove. It is a statically undecidable problem.
There is however, a different question and answer to a few related
problems that maybe you are really asking?
1. Is there a way to determine and apply the a maximal or
nearly-maximal set of folds/graph transforms that could be applied to a
given set of code in a sane and principled way -> yes
(see, e.g., http://www.cs.cornell.edu/~ross/publications/eqsat/)
2. Is there a way to determine all expressions in the program as it
exists that are equivalent or equivalent under constant time constant
folding/reassociation, in a reasonable time bound -> yes
(not a single easy link, happy to talk about it)
Your original question is basically equivalent to
Is there a way to determine all expressions in the program as it exists
that are equivalent or could be made equivalent through any type of folding
that one can think up?
The answer to that is "no", it's provable that this is not statically
decidable, so the time bound doesn't matter
You have to limit the possible folding/evaluation you apply in various
ways to make this decidable, and then further limit it to make the time
This all quickly devolves into herbrand equivalence and it's variations.
Let me try one more time May we need multiple reassociate passes to
fold different reassociative patterns?
A longer version: If Sanjay wants a particular reassociative pattern to
be folded (D45842), Omer wants another particular reassociative pattern
to be folded (D41574), and I want yet another particular reassociative
pattern to be folded (D46336), would we potentially need three
different reassociate passes with each combined with instcombine, rather
than just one that may be able to somehow handle those cases in one
shot, (assuming we don't want to put those in instcombine)?
And it sounds like the answer is yes?
If you take the current instcombine as a base, then yes, that is correct.
If you are willing to rearchitect instcombine, the answer is no, it's
possible to do this all in a single pass in a relatively sane way.
I assume by rearchitect, you mean a major rewrite as per this comment: "Is
there a way to determine all expressions in the program as it exists that
are equivalent or equivalent under constant time constant
folding/reassociation, in a reasonable time bound -> yes". Any pointer or
time to chat?
I'm happy to do both.
I think that an approach like
has a merit: it would adds a bit of complexity, but would not require:
1. a major rewrite of instcombine,
2. writing multiple (potentially many) reassociate passes and figuring out
how to fixpoint them with instcombine, or
3. writing a self-contained folding pass for a specific pattern
If you look at the diffs in the existing .ll files in
, it helps fold some previously-unfolded reassociation patterns beyond
the bit check patterns that it originally targeted.
Sure, and it does so by adding another O(N) cost to evaluation in each
case. Instcombine doesn't even do lazy reevaluation through tracking
dependencies, so it'll do so a lot of times as well.
To me, that's not a good tradeoff, especially given how slow instcombine is
*already*. The code it produces is "good enough" to stop for a while and
do something else and not suffer horribly in performance.
Let me ask a different question:
At what point would anyone here be willing to stop adding things to
instcombine and start doing something else instead, instead of waiting for
someone else to do it?
As far as i can tell, the answer is: "never", which makes most of these
discussions just pointless rehashes as we slowly repeat the same disaster
that became gcc's instruction combiner
If the answer is "something", great, i'll set a mail filter and ignore
these threads until that something happens
Personally, in my experience people will never do more here unless pushed
somewhat, or the thing becomes such a complete disaster no one wants to
 Last year i computed the "improvement in performance on applications"
due to instcombine for a bunch of google apps and open source apps that had
easy to use benchmarks (IE I isolated about two years of instcombine
changes and made them to a current compiler piece by piece while measuring
I also computed the compile time increase in single instcombine passes over
the same time period.
On x86, but the numbers basically said we were basically gaining nearly
nothing for high cost. IE our drive for better looking output does not
appear to translate into any real gains that i can find. Either
improvements to other opts hid them, or they simply didn't matter on the
processors i tested on.
Certainly, apps/workloads/architectures may vary here, and my goal is not
to claim it's all worthless.
My actual goal in all of this was to get a sense of whether my perspective
on instcombine was still "reasonable", not to do a true scientific
I didn't have time/energy/etc to run it elsewhere, and again, my goal was
not to give certainty/try to give exact percentages.