removing unnecessary ZEXT

Hi,

Within a basic block I can remove unnecessary register copies and zero extensions of values loaded as unsigned 8-bit, by implementing isZExtFree() for ISD::LOAD nodes.
But this does not work between basic blocks.

The first block does a CopyFromReg of the unsigned-8bit-loaded vreg1 into a new vreg2.
The second block then does an unnecessary zext of vreg2.
What I want is the 2nd block to use the original vreg1!
What I am getting is one extra register clobber and two extra instructions.

I have looked at other targets to see what they do but can’t see what I am missing.

Help please!
Thank you

Robert

Hi,

A bit more information.
I believe my problem lies with the fact that the load is left as ‘anyext from i8’.
On the XCore target we know this will become an 8bit zext load - as there is no 8bit sign extended load!
If BB#1 were to force the load to a “zext from i8” would this information be available in BB#2?

BB#1:
0x268c1b0: i32 = Register %vreg1 [ID=3]
0x2689d80: i32,ch = load 0x265d380, 0x2689f80, 0x268b9b0<LD1[%s2], anyext from i8> [ORD=4] [ID=5]
0x2689e80: ch = CopyToReg 0x265d380, 0x268c1b0, 0x2689d80 [ORD=4] [ID=6]

BB#2:
0x268a480: i32 = Register %vreg1 [ID=1]
0x268bbb0: i32,ch = CopyFromReg 0x265d380, 0x268a480 [ORD=6] [ID=7]
0x268a080: i32 = Constant<255> [ID=6]
0x268bdb0: i32 = and 0x268bbb0, 0x268a080 [ORD=6] [ID=8]
0x2689e80: i32 = Constant<0> [ID=5]
0x268c1b0: ch = seteq [ID=2]
0x268a880: i32 = setcc 0x268bdb0, 0x2689e80, 0x268c1b0 [ORD=6] [ID=9]
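
To make the redundancy concrete, here is a small standalone C++ sketch. It is a toy model, not LLVM code, and all helper names are made up for illustration. It models a 32-bit register filled by an 8-bit load and shows that the 'and ..., 255' in BB#2 changes nothing when the load zero-extends, but does matter when the top bits may be set, which 'anyext' allows.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models an 8-bit zero-extending load into a 32-bit register.
std::uint32_t zextLoad8(std::uint8_t byte) {
  return byte;  // top 24 bits are always zero
}

// Hypothetical helper: models an 8-bit sign-extending load, for contrast.
std::uint32_t sextLoad8(std::uint8_t byte) {
  // Replicate bit 7 into the top 24 bits.
  return (byte & 0x80u) ? (0xFFFFFF00u | byte) : byte;
}

// Models the 'and x, 255' node seen in BB#2.
std::uint32_t maskLow8(std::uint32_t reg) {
  return reg & 0xFFu;
}

// True when the mask leaves the register unchanged, i.e. the 'and' is redundant.
bool maskIsRedundant(std::uint32_t reg) {
  return maskLow8(reg) == reg;
}

// Usage example: the mask is a no-op after a zero-extending load only.
void demoMask() {
  assert(maskIsRedundant(zextLoad8(0x80)));   // zext: top bits already zero
  assert(!maskIsRedundant(sextLoad8(0x80)));  // sext: top bits set, mask needed
}
```

The point of the sketch is only that the mask can be dropped once BB#2 knows the top 24 bits of the vreg are zero; how that knowledge reaches BB#2 is the open question.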

Robert

The instruction selector only operates within a block. An IR CodeGenPrepare pass runs first and attempts to hoist the zext into the load’s block if it sees a legal zextload pattern (isLoadExtLegal). I’m not sure why the zero_extend isn’t hoisted in your case.
-Andy

Hi Andrew,

Thank you for the suggestion.
I’ve looked at CodeGenPrepare.cpp and MoveExtToFormExtLoad() is never run.

I also notice that the ARM target produces the same additional register usage (a copy) and zero extension (of the copy).
(See the usage of r3 & r5 and also r12 & r4 in the attached file arm-strcspn.s; my understanding is that ‘ldrb’ is zero extending.)

Here is a simplified example:
void test(const char *c) {
  do {
    if (!*c) break;
    ++c;
  } while (*c);
}

And in IR form:
define void @test(i8* nocapture %c) {
entry:
  %.pre = load i8* %c, align 1
  br label %do.body
do.body:
  %0 = phi i8 [ %.pre, %entry ], [ %1, %if.end ]
  %c.addr.0 = phi i8* [ %c, %entry ], [ %incdec.ptr, %if.end ]
  %tobool = icmp eq i8 %0, 0
  br i1 %tobool, label %do.end, label %if.end
if.end:
  %incdec.ptr = getelementptr inbounds i8* %c.addr.0, i64 1
  %1 = load i8* %incdec.ptr, align 1
  %tobool1 = icmp eq i8 %1, 0
  br i1 %tobool1, label %do.end, label %do.body
do.end:
  ret void
}

The problem seems to be that an icmp ends up in a different basic block from the instructions that define the vreg it uses, viz:
entry:
%.pre = load i8* %c, align 1
do.body:
%0 = phi i8 [ %.pre, %entry ], [ %1, %if.end ]
%tobool = icmp eq i8 %0, 0
if.end:
%1 = load i8* %incdec.ptr, align 1

When a vreg is promoted during legalization, I assume it is known that the top bits are zero
(assuming a ZEXTLOAD will be used, which is the only option for the XCore target).
But when a vreg is subsequently truncated, can the top bits still be known? viz:
BB#1 ‘test:do.body’
0x2c8d190: i32 = Register %vreg3
0x2c8d590: i32,ch = CopyFromReg 0x2c5fcc0, 0x2c8d190 [ORD=4]
0x2c8ce90: i8 = truncate 0x2c8d590 [ORD=4]
0x2c8d990: i8 = Constant<0>
0x2c8d890: ch = seteq
0x2c8d390: i1 = setcc 0x2c8ce90, 0x2c8d990, 0x2c8d890 [ORD=4]

Is this what is happening?
Can the Type-legalizer discover this information when lowering the truncate?
Is the problem that this knowledge is not known in the DAG, only in the IR?

Robert

arm-strcspn.s (1.67 KB)

Hi

I’ve looked a bit more at SelectionDAGBuilder.cpp and the use of AssertZext.

isZExtFree() is called by RegsForValue::getCopyToRegs().
This correctly marks the vreg as ZEXT when it comes from an unsigned 8-bit load.

The function that adds the ISD::AssertZext is RegsForValue::getCopyFromRegs().
It does this by checking LOI->KnownZero.countLeadingOnes() for integer vregs.
Hence, the extension type needs to be known (and does it need to be in GetLiveOutRegInfo?)

If the vreg is defined by a PHI node, getCopyFromRegs() does not look at the possible originators.
Is that because some originator vregs are in later basic blocks, so getCopyToRegs() has not yet been called for them and they have not yet been ZEXTed?
Is there a mechanism for sharing this information across blocks?
Is this what GetLiveOutRegInfo is for?
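
To make the question concrete, here is a toy model in plain C++ of the kind of sharing I have in mind. It is not LLVM code: LiveOutModel, mergePhi and assertZextWidth are hypothetical names, and the sketch assumes both incoming loads are zero-extending. The idea is that a bit can only be treated as known zero at a PHI if it is known zero on every incoming value, and that nothing can be claimed while any incoming value has not been computed yet.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-vreg live-out record discussed above.
struct LiveOutModel {
  std::uint32_t knownZero;  // bit set => that bit of the vreg is known to be 0
  bool valid;               // false => not computed yet (e.g. a later block)
};

// Counts consecutive 1 bits starting from bit 31.
unsigned countLeadingOnes(std::uint32_t v) {
  unsigned n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (v & bit) != 0; bit >>= 1)
    ++n;
  return n;
}

// A bit is known zero at a PHI only if it is known zero on every incoming edge.
LiveOutModel mergePhi(LiveOutModel a, LiveOutModel b) {
  if (!a.valid || !b.valid)
    return LiveOutModel{0u, false};
  return LiveOutModel{a.knownZero & b.knownZero, true};
}

// Narrowest width a zext assertion could claim; 32 means nothing is known.
unsigned assertZextWidth(LiveOutModel info) {
  if (!info.valid)
    return 32;
  return 32 - countLeadingOnes(info.knownZero);
}

// Usage example mirroring %0 = phi [ %.pre, %entry ], [ %1, %if.end ].
void demoPhi() {
  LiveOutModel pre{0xFFFFFF00u, true};   // %.pre: assumed zero-extending load
  LiveOutModel next{0xFFFFFF00u, true};  // %1: assumed zero-extending load
  assert(assertZextWidth(mergePhi(pre, next)) == 8);
  LiveOutModel pending{0u, false};       // producer block not visited yet
  assert(assertZextWidth(mergePhi(pre, pending)) == 32);
}
```

In this model the second case is exactly the situation in the debug output below: %1 is produced in a block that is visited after the PHI is read, so the PHI cannot be assumed zero-extended at that point.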

Attached is a patch to output the following debug:
$ llc -march=xcore test.ll -o -
getCopyToRegs ZERO_EXTEND: %.pre = load i8* %c, align 1
getCopyFromRegs: %0 = phi i8 [ %.pre, %entry ], [ %1, %if.end ]
%1 = load i8* %lsr.iv, align 1
%.pre = load i8* %c, align 1
did/will getCopyToRegs ZERO_EXTEND? Can we assume?
getCopyToRegs ZERO_EXTEND: %1 = load i8* %lsr.iv, align 1

I think I need to understand the intent of the code/data better before I can continue.
Any explanations most welcome.

Robert

p.s. can someone explain the following:

/// GetLiveOutRegInfo - Gets LiveOutInfo for a register, returning NULL if the
/// register is a PHI destination and the PHI's LiveOutInfo is not valid. If
/// the register's LiveOutInfo is for a smaller bit width, it is extended to
/// the larger bit width by zero extension. The bit width must be no smaller
/// than the LiveOutInfo's existing bit width.

Patch_ZERO_EXTEND (3.24 KB)