Reassociate and Canonicalization of Expressions

Hi,

I encountered some bugs in Reassociate [1] where we are hitting some assertions:

    assert(!Duplicates.count(Factor) &&
               "Shouldn't have two constant factors, missed a canonicalize");
    assert(NumAddedValues > 1 && "Each occurrence should contribute a value”);

My understanding is that these assertions enforce that when processing an expression tree, we expect that the nodes have already been canonicalized by Reassociate.

I infer that there should be *one* canonicalization for a function and it should be deterministic, i.e. if I run Reassociate two times I expect that the second one does not make any change.

However right now we are far from that. I have multiple patches in flight to improve the situation, but the situation is still not perfect. Before going further I’d like some clarification on the expectation of Reassociate:

- Is there one expected canonicalization in all cases?
- Do we expect that running multiple times Reassociate in a row does not change the result after the first run?

If the answer is no, then I think the two assertions should be relaxed.

Bonus question: how does InstCombine behave wrt to the two questions above?

Thanks,

Ping.

Hi,

I encountered some bugs in Reassociate [1] where we are hitting some
assertions:

   assert(!Duplicates.count(Factor) &&
              "Shouldn't have two constant factors, missed a
canonicalize");
   assert(NumAddedValues > 1 && "Each occurrence should contribute a
value”);

My understanding is that these assertions enforce that when processing
an expression tree, we expect that the nodes have already been
canonicalized by Reassociate.

I infer that there should be *one* canonicalization for a function and
it should be deterministic, i.e. if I run Reassociate two times I expect
that the second one does not make any change.

I don't think deterministic is the right word (as the pass in
deterministic, AFAICT), but I get what you're saying. You infer we should
always arrive at one final solution once the pass has run.

However right now we are far from that. I have multiple patches in
flight to improve the situation, but the situation is still not perfect.
Before going further I’d like some clarification on the expectation of
Reassociate:

- Is there one expected canonicalization in all cases?
- Do we expect that running multiple times Reassociate in a row does not
change the result after the first run?

If the answer is no, then I think the two assertions should be relaxed.

IMHO, I think these asserts are a good idea and we should work towards
enforcing these assumptions. One alternative would be to add a slew of
test cases to ensure functionality doesn't regress and relax the asserts,
but I'd still prefer the assertions to be in place.

How you would propose relax the asserts? Are the asserts hitting code in
the wild or is it just your fuzz tests?

I do appreciate your work on cleaning up the pass and I certainly don't
want to block your efforts, but again I do think these assertions are good
in general.

Chad

Hi Chad,

Thanks for you answer.

Hi,

I encountered some bugs in Reassociate [1] where we are hitting some
assertions:

  assert(!Duplicates.count(Factor) &&
             "Shouldn't have two constant factors, missed a
canonicalize");
  assert(NumAddedValues > 1 && "Each occurrence should contribute a
value”);

My understanding is that these assertions enforce that when processing
an expression tree, we expect that the nodes have already been
canonicalized by Reassociate.

I infer that there should be *one* canonicalization for a function and
it should be deterministic, i.e. if I run Reassociate two times I expect
that the second one does not make any change.

I don't think deterministic is the right word (as the pass in
deterministic, AFAICT), but I get what you're saying. You infer we should
always arrive at one final solution once the pass has run.

Yes.

However right now we are far from that. I have multiple patches in
flight to improve the situation, but the situation is still not perfect.
Before going further I’d like some clarification on the expectation of
Reassociate:

- Is there one expected canonicalization in all cases?
- Do we expect that running multiple times Reassociate in a row does not
change the result after the first run?

If the answer is no, then I think the two assertions should be relaxed.

IMHO, I think these asserts are a good idea and we should work towards
enforcing these assumptions. One alternative would be to add a slew of
test cases to ensure functionality doesn't regress and relax the asserts,
but I'd still prefer the assertions to be in place.

I’m all for keeping the assertion, but do we agree that the answer is “yes” to my two questions above?
Enforcing a “final” result after one run can be quite involved, and I’m asking for clarification on the expectation here before spending too much time on that.

How you would propose relax the asserts? Are the asserts hitting code in
the wild or is it just your fuzz tests?

Only the fuzzer, so I don’t have a high priority on fixing this. But since I started, I rather finish the job.

By relaxing the assertions, I meant being tolerant to expression for which nodes have not been fully canonicalized.
Again, I’m not suggesting we actually relax these assertions. Just that we clarify the expectation either way.

Thanks,

Mehdi

FWIW, I believe the term you’re looking for is idempotent.

—Owen

Wikipedia confirms that’s the word, thanks :slight_smile:

Mehdi

Hi Chad,

Thanks for you answer.

Hi,

I encountered some bugs in Reassociate [1] where we are hitting some
assertions:

  assert(!Duplicates.count(Factor) &&
             "Shouldn't have two constant factors, missed a
canonicalize");
  assert(NumAddedValues > 1 && "Each occurrence should contribute a
value”);

My understanding is that these assertions enforce that when processing
an expression tree, we expect that the nodes have already been
canonicalized by Reassociate.

I infer that there should be *one* canonicalization for a function and
it should be deterministic, i.e. if I run Reassociate two times I
expect
that the second one does not make any change.

I don't think deterministic is the right word (as the pass in
deterministic, AFAICT), but I get what you're saying. You infer we
should
always arrive at one final solution once the pass has run.

Yes.

However right now we are far from that. I have multiple patches in
flight to improve the situation, but the situation is still not
perfect.
Before going further I’d like some clarification on the
expectation of
Reassociate:

- Is there one expected canonicalization in all cases?
- Do we expect that running multiple times Reassociate in a row does
not
change the result after the first run?

If the answer is no, then I think the two assertions should be
relaxed.

IMHO, I think these asserts are a good idea and we should work towards
enforcing these assumptions. One alternative would be to add a slew of
test cases to ensure functionality doesn't regress and relax the
asserts,
but I'd still prefer the assertions to be in place.

I’m all for keeping the assertion, but do we agree that the answer is
“yes” to my two questions above?
Enforcing a “final” result after one run can be quite involved, and
I’m asking for clarification on the expectation here before spending too
much time on that.

IMO, I think these are reasonable expectations and if you can enforce
these expectations for the common cases, great. However, if it doubles
your effort to get the remaining 1% exposed by fuzzing, then I don't think
it would be worth your time.

Personally, I would like to invest more time in adding support for vector
operations (I had a patch up on Phabricator at one point). I'm sure there
are also lots of other types of simplifications that could be added to
this pass.

Just my opinion though.. :slight_smile:

Hi Chad,

Thanks for you answer.

Hi,

I encountered some bugs in Reassociate [1] where we are hitting some
assertions:

  assert(!Duplicates.count(Factor) &&
             "Shouldn't have two constant factors, missed a
canonicalize");
  assert(NumAddedValues > 1 && "Each occurrence should contribute a
value”);

My understanding is that these assertions enforce that when
processing
an expression tree, we expect that the nodes have already been
canonicalized by Reassociate.

I infer that there should be *one* canonicalization for a function
and
it should be deterministic, i.e. if I run Reassociate two times I
expect
that the second one does not make any change.

I don't think deterministic is the right word (as the pass in
deterministic, AFAICT), but I get what you're saying. You infer we
should
always arrive at one final solution once the pass has run.

Yes.

However right now we are far from that. I have multiple patches in
flight to improve the situation, but the situation is still not
perfect.
Before going further I’d like some clarification on the
expectation of
Reassociate:

- Is there one expected canonicalization in all cases?
- Do we expect that running multiple times Reassociate in a row does
not
change the result after the first run?

If the answer is no, then I think the two assertions should be
relaxed.

IMHO, I think these asserts are a good idea and we should work towards
enforcing these assumptions. One alternative would be to add a slew of
test cases to ensure functionality doesn't regress and relax the
asserts,
but I'd still prefer the assertions to be in place.

I’m all for keeping the assertion, but do we agree that the answer is
“yes” to my two questions above?
Enforcing a “final” result after one run can be quite involved, and
I’m asking for clarification on the expectation here before spending
too
much time on that.

IMO, I think these are reasonable expectations and if you can enforce
these expectations for the common cases, great. However, if it doubles
your effort to get the remaining 1% exposed by fuzzing, then I don't think
it would be worth your time.

Personally, I would like to invest more time in adding support for vector
operations (I had a patch up on Phabricator at one point). I'm sure there
are also lots of other types of simplifications that could be added to
this pass.

Just my opinion though.. :slight_smile:

In other words, I don't think we have to implement the pass is such as way
as to guarantee idempotence; we have bigger fish to fry.

Hi Chad,

Thanks for you answer.

Hi,

I encountered some bugs in Reassociate [1] where we are hitting some
assertions:

assert(!Duplicates.count(Factor) &&
            "Shouldn't have two constant factors, missed a
canonicalize");
assert(NumAddedValues > 1 && "Each occurrence should contribute a
value”);

My understanding is that these assertions enforce that when
processing
an expression tree, we expect that the nodes have already been
canonicalized by Reassociate.

I infer that there should be *one* canonicalization for a function
and
it should be deterministic, i.e. if I run Reassociate two times I
expect
that the second one does not make any change.

I don't think deterministic is the right word (as the pass in
deterministic, AFAICT), but I get what you're saying. You infer we
should
always arrive at one final solution once the pass has run.

Yes.

However right now we are far from that. I have multiple patches in
flight to improve the situation, but the situation is still not
perfect.
Before going further I’d like some clarification on the
expectation of
Reassociate:

- Is there one expected canonicalization in all cases?
- Do we expect that running multiple times Reassociate in a row does
not
change the result after the first run?

If the answer is no, then I think the two assertions should be
relaxed.

IMHO, I think these asserts are a good idea and we should work towards
enforcing these assumptions. One alternative would be to add a slew of
test cases to ensure functionality doesn't regress and relax the
asserts,
but I'd still prefer the assertions to be in place.

I’m all for keeping the assertion, but do we agree that the answer is
“yes” to my two questions above?
Enforcing a “final” result after one run can be quite involved, and
I’m asking for clarification on the expectation here before spending
too
much time on that.

IMO, I think these are reasonable expectations and if you can enforce
these expectations for the common cases, great. However, if it doubles
your effort to get the remaining 1% exposed by fuzzing, then I don't think
it would be worth your time.

Personally, I would like to invest more time in adding support for vector
operations (I had a patch up on Phabricator at one point). I'm sure there
are also lots of other types of simplifications that could be added to
this pass.

Just my opinion though.. :slight_smile:

In other words, I don't think we have to implement the pass is such as way
as to guarantee idempotence

Yet the current assertions rely on it.

; we have bigger fish to fry.

I like the metaphor :slight_smile:

Mehdi