Simplifying selects + arm stuff

clattner · November 11, 2010, 7:39am

<cc'ing llvmdev>

Hi Chris, in commit 12800 you introduced the following transform:

Implement select.ll:test12*

This transforms code like this:

    %C = or %A, %B
    %D = select %cond, %C, %A
into:
    %C = select %cond, %B, 0
    %D = or %A, %C

Since B is often a constant, the select can often be eliminated. In any case,
this reduces the usage count of A, allowing subsequent optimizations to
happen.

Right, I still think this is goodness.

Now consider the following example:

%z = and i32 %x, %y
%s = select i1 %cond, i32 %y, i32 %z
%r = and i32 %x, %s

Here %r could be simplified to
%r = and i32 %x, %y
(i.e. the same as %z) so replaced with %z everywhere.

We even have logic to do this (ThreadBinOpOverSelect, which I just added) in
InstructionSimplify. The nice thing about InstructionSimplify is that it
doesn't ever create any new instructions, so is pretty much always a win.
The problem is that the logic never fires: your instcombine logic gets there
first, and transforms the example into

%z = select i1 %cond, i32 -1, i32 %x
%s = and i32 %z, %y
%r = and i32 %s, %x

which is not nearly as good.

Yeah, that's not nearly as good as "%r = and i32 %x, %y"

Note that:

%z = select i1 %cond, i32 -1, i32 %x
%s = and i32 %z, %y

Is a "conditional and". It would be interesting to know if the ARM backend gets this as a single predicated 'and' instruction (similarly for the 'or' and 'xor' version of these patterns). I bet not, which is bad if instcombine is canonicalizing this way.

Do you have any thoughts for how to get the best of both worlds? How about
only applying your transform when one of the operands (eg B) is a constant?

I don't think that's a great solution, because reducing # uses of a value is general goodness (increases odds that "hasOneUse()" xforms will kick in for example).

Alternatively, it could look at uses of the select and try to simplify those
first using InstructionSimplify (not sure if instcombine is allowed to
simplify a different instruction to the one it is visiting), or check to see
if some or all uses of the select would be simplified by InstructionSimplify,
and not perform the transform if so.

Instead of turning this into a phase ordering issue, I'd rather increase the power of the folding logic to catch the general form. In this case, instead of handling this in instsimplify, why not handle this in reassociate?

This is conceptually no different than (x & y & x) -> (x & y), it's just that in this case you have: (x & y & (x or -1)) which should fold to (x & y) just the same.

-Chris

Duncan_Sands · November 11, 2010, 3:43pm

Hi Chris,

We even have logic to do this (ThreadBinOpOverSelect, which I just added) in
InstructionSimplify. The nice thing about InstructionSimplify is that it
doesn't ever create any new instructions, so is pretty much always a win.
The problem is that the logic never fires: your instcombine logic gets there
first, and transforms the example into

  %z = select i1 %cond, i32 -1, i32 %x
  %s = and i32 %z, %y
  %r = and i32 %s, %x

which is not nearly as good.

...

Instead of turning this into a phase ordering issue, I'd rather increase the power of the folding logic to catch the general form. In this case, instead of handling this in instsimplify, why not handle this in reassociate?

This is conceptually no different than (x& y& x) -> (x& y), it's just that in this case you have: (x& y& (x or -1)) which should fold to (x& y) just the same.

I actually came to the same conclusion, only since I didn't know about
the reassociate pass I planned to do this in instcombine I will give
it a go.

Thanks for your help,

Duncan.

Duncan_Sands · November 11, 2010, 4:30pm

Hi Chris,

Instead of turning this into a phase ordering issue, I'd rather increase the power of the folding logic to catch the general form. In this case, instead of handling this in instsimplify, why not handle this in reassociate?

This is conceptually no different than (x& y& x) -> (x& y), it's just that in this case you have: (x& y& (x or -1)) which should fold to (x& y) just the same.

in fact this already works, at least for the given example:
-instcombine -reassociate -instcombine results in:

original:
   %z = and i32 %x, %y
   %s = select i1 %cond, i32 %y, i32 %z
   %r = and i32 %x, %s

after first instcombine:
   %z = select i1 %cond, i32 -1, i32 %x
   %s = and i32 %z, %y
   %r = and i32 %s, %x

after reassociate:
   %z = select i1 %cond, i32 -1, i32 %x
   %s = and i32 %y, %x
   %r = and i32 %s, %z

after second instcombine:
%s = and i32 %y, %x

This might be a coincidence, but I am happy for the moment

Ciao,

Duncan.

clattner · November 11, 2010, 5:09pm

Ok! If this "conditional and" idiom is common, it would still be worth teaching reassociate about it, so we don't run into phase order issues.

-Chris

Evan_Cheng1 · November 12, 2010, 7:04pm

Yeah, that’s not nearly as good as “%r = and i32 %x, %y”

Note that:

%z = select i1 %cond, i32 -1, i32 %x
%s = and i32 %z, %y

Is a “conditional and”. It would be interesting to know if the ARM backend gets this as a single predicated ‘and’ instruction (similarly for the ‘or’ and ‘xor’ version of these patterns). I bet not, which is bad if instcombine is canonicalizing this way.

On ARM, an instruction predicated on false predicate is still executed so it’s frequently undesirable. Because llvm canonicalize to select instruction, it already generates significantly more predicated instructions than gcc. We have seen some regressions due to overly aggressive select formation.

By definition select requires both source operands to be evaluated. Given how good branch predicators are these days, I’m not surprised it often turns branching code performs better. ICC also almost never generates conditional moves.

Evan

clattner · November 12, 2010, 7:09pm

That may be, but I can't imagine that:

$ cat t.ll
define i32 @test(i32 %a, i32 %b, i32 %x, i32 %y) nounwind {
%cond = icmp slt i32 %a, %b
%z = select i1 %cond, i32 -1, i32 %x
%s = and i32 %z, %y
ret i32 %s
}
$ llc t.ll -o - -march=arm
_test: @ @test
@ BB#0:
  cmp r0, r1
  mvn r12, #0
  movlt r2, r12
  and r0, r2, r3
  bx lr

is better than a cmp + conditional and + bx.

-Chris

Evan_Cheng1 · November 12, 2010, 7:46pm

%z = select i1 %cond, i32 -1, i32 %x
%s = and i32 %z, %y

Is a "conditional and". It would be interesting to know if the ARM backend gets this as a single predicated 'and' instruction (similarly for the 'or' and 'xor' version of these patterns). I bet not, which is bad if instcombine is canonicalizing this way.

On ARM, an instruction predicated on false predicate is still executed so it's frequently undesirable. Because llvm canonicalize to select instruction, it already generates significantly more predicated instructions than gcc. We have seen some regressions due to overly aggressive select formation.

By definition select requires both source operands to be evaluated. Given how good branch predicators are these days, I'm not surprised it often turns branching code performs better. ICC also almost never generates conditional moves.

That may be, but I can't imagine that:

$ cat t.ll
define i32 @test(i32 %a, i32 %b, i32 %x, i32 %y) nounwind {
%cond = icmp slt i32 %a, %b
%z = select i1 %cond, i32 -1, i32 %x
%s = and i32 %z, %y
ret i32 %s
}
$ llc t.ll -o - -march=arm
_test: @ @test
@ BB#0:
  cmp r0, r1
  mvn r12, #0
  movlt r2, r12
  and r0, r2, r3
  bx lr

is better than a cmp + conditional and + bx.

This should be

        cmp r0, r1
        movlt.w r2, #-1 @ or mvnlt r2, #0
        and.w r0, r2, r3
        bx lr

which we gets right in Thumb2 mode (I need to check why it's not matching in ARM mode). How can we use a conditional and here? The result is either (y & -1) or (y & x), the "and" is not conditional.

Evan

clattner · November 12, 2010, 7:53pm

y&-1 == y. There is no need to materialize -1 as a constant.

-Chris

Evan_Cheng1 · November 12, 2010, 8:02pm

This should be

      cmp r0, r1
      movlt.w r2, #-1 @ or mvnlt r2, #0
      and.w r0, r2, r3
      bx lr

which we gets right in Thumb2 mode (I need to check why it's not matching in ARM mode). How can we use a conditional and here? The result is either (y & -1) or (y & x), the "and" is not conditional.

y&-1 == y. There is no need to materialize -1 as a constant.

Yes, *face palm*. That will require target specific dag combine.

Evan

Topic		Replies	Views
[RFC][InstSimplify] Optimize compares with multiple selects as operands IR & Optimizations llvm	0	201	May 18, 2023
llvm 'select' instruction LLVM Dev List Archives	10	83	April 20, 2013
Conditonal to constant promotion? LLVM Dev List Archives	1	45	August 22, 2008
Disabling select instructions LLVM Dev List Archives	4	104	February 1, 2020
[InstCombine] Simplify `Select i1 (and/or/xor %x %y), i32 (and/or/xor %x %y), i32 (and/or/xor %x, %y)` IR & Optimizations	4	224	November 24, 2023

Simplifying selects + arm stuff

Related Topics