visitShiftByConstant of DAGCombiner

Hi,

I recently worked on doing constant canonicalisation in DAGCombine level to make as more folding as possible.
And I found the behavior of “visitShiftByConstant” is same with what I have done, just only it only be enabled under
a shift context. But I think maybe we could expand it to as more case as possible.

The original issuse of it:

For code as below:

unsigned array[4];

unsigned foo(unsigned long x) {

return array[(x>>2)&3ul];

sequence before this canonicalisation (ARM):

foo:

.fnstart

@ BB#0: @ %entry

lsrs r0, r0, #2

movs r1, #3

ands r1, r0

lsls r0, r1, #2

ldr r1, .LCPI0_0

ldr r0, [r1, r0]

bx lr

.p2align 2

sequence after this canonicalisation:

foo:

.fnstart

@ BB#0: @ %entry

movs r1, #12

ands r1, r0

ldr r0, .LCPI0_0

ldr r0, [r0, r1]

bx lr

.p2align 2

This canonicalisation makes shift folding possible. But I wonder if only shift context could benifit from this.When I worked on
CoreMark optimization recently, I found a case similar to below:

define i32 @test(i32 %v) {
entry:
%a = and i32 %v, 65534
%b = lshr i32 %a, 1
%c = and i32 %v, 65535
%d = lshr i32 %c, 1
%e = add i32 %b, %d
ret i32 %e
}

It would be profitable as well if we could enable the canonicalisation on it.
sequence before this canonicalisation (ARM):

test:
.fnstart
@ BB#0: @ %entry
movw r1, #65534
and r1, r0, r1
ubfx r0, r0, #1, #15
add r0, r0, r1, lsr #1
bx lr

sequence after this canonicalisation:

test:
.fnstart
@ BB#0: @ %entry
ubfx r0, r0, #1, #15
add r0, r0, r0
bx lr

But when I tried to expand the condition to this case, there are lots of various regression fails.
The expanding may prevent some other possible folding or something. Maybe I missed something or
the expanding is not reasonable at all.but I didn’t figure out yet.
Any suggestions? Any inspiration is highly appreciated.

And there maybe some other cases could be included also. If you have any findings, please let
me know as well.

Thank you very much! :slight_smile:

Hi Jojo,

I think this canonicalisation is correct, at least in this case, so
it's quite possible that the regressions are just bad tests, or you
actually made the code better in other places that weren't expecting
it.

I'm adding some folks that have worked on the DAG combiner recently to
have a look, but it would be good to know what regressions were there
to see if they're really regressions (ie. worse code) or just
different/better code.

You may also have missed a check on your code (which would wrongly
combine constants). Can you create a review on Phab of your proposed
change?

Also, if you could list the regressions and see if they have a common
pattern (ie. "tests expecting an additional mask failed", or "the
wrong mask was applied"), it would become clearer if they are real
regressions or not.

cheers,
--renato

Thanks very much, Renato!

I looked into the regressions, there’re many cases have similar context as the case I mentioned but N just has only one use.
In this case, the canonicalisation won’t make it more profitable, will only prevent the possible folding or
make the sequence is not expected.
The regressions be eliminated, after I limit the expanding to not “N->hasOneUse()”.

And now only one regression left:

FAIL: LLVM :: CodeGen/Thumb2/machine-licm.ll (18865 of 29222)
******************** TEST ‘LLVM :: CodeGen/Thumb2/machine-licm.ll’ FAILED ********************
Script:

Hi Jojo,

This is just a bad check line. :slight_smile:

Instead of:

; CHECK: movw {{(r[0-9])|(lr)}}, #32768

it should have been:

; CHECK: movw {{(r[0-9]+)|(lr)}}, #32768

(ie. add the extra '+' at the end, so you allow more than one number).

If that's the last remaining problem, I suggest you fix the CHECK
line, create the tests necessary to test your pass, and make sure that
the LICM code makes sense (it doesn't look worse to me, but the
instructions are different, just need to make sure they have the same
semantics).

cheers,
--renato

Hi Renato,

There will be two more check fail one after another(below), after fix this one.
But we could remove the check lines in my opinion, as the check fails are caused by
different instructions sequence after canonicalisation. And the canonicalisation won’t
affect the LICM.