A mid-end optimization opportunity

Hi,

Consider the following loop, shown in LLVM IR:

for.body.lr.ph:                                   ; preds = %if.end19
  %conv23 = zext i8 %conv2 to i32
  %3 = zext i8 %ucInputNum.addr.0 to i64
  %4 = shl i64 %3, 2
  br label %for.body

for.body:                                         ; preds = %for.body, %for.body.lr.ph
  %lsr.iv = phi i64 [ %lsr.iv.next, %for.body ], [ 0, %for.body.lr.ph ]
  %ulTemp.0101 = phi i32 [ 0, %for.body.lr.ph ], [ %phitmp, %for.body ]
  ……
  %not.cmp39 = xor i1 %cmp39, true
  %cond48 = zext i1 %not.cmp39 to i32
  %add49 = add nuw nsw i32 %ulTemp.0101, %cond48
  %phitmp = and i32 %add49, 65535                 ; <------ A
  %lsr.iv.next = add nuw nsw i64 %lsr.iv, 4
  %tmp = trunc i64 %lsr.iv.next to i32
  %tmp106 = trunc i64 %4 to i32
  %exitcond = icmp eq i32 %tmp106, %tmp
  br i1 %exitcond, label %for.end, label %for.body

for.end:
  ……

Instruction A, %phitmp = and i32 %add49, 65535, is redundant and can be removed. The loop bound %4 is an i8 value zero-extended and shifted left by 2, so it fits in 10 bits and the trip count is at most 10 bits wide. %add49 starts at 0 and is incremented by at most 1 per iteration, so it also fits in at most 10 bits, and the 16-bit mask applied by the and operation never clears a set bit.
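To make the bound concrete, here is a minimal Python sketch (a model of the loop, not LLVM code) that runs the worst case, where %cond48 is 1 on every iteration, with and without the mask. The names run_loop and mask_is_redundant are hypothetical, and the sketch assumes the count %3 is nonzero on entry to the loop; the excerpt does not show the guard that would ensure this.

```python
def run_loop(n, apply_mask):
    """Model the loop for an i8 count n, adding 1 on every iteration (worst case)."""
    bound = n << 2                    # %4 = shl i64 %3, 2
    iv = 0                            # %lsr.iv
    acc = 0                           # %ulTemp.0101
    while True:
        acc += 1                      # %add49 with %cond48 == 1
        if apply_mask:
            acc &= 0xFFFF             # instruction A
        iv += 4                       # %lsr.iv.next
        # %exitcond compares the values truncated to i32
        if (iv & 0xFFFFFFFF) == (bound & 0xFFFFFFFF):
            return acc


def mask_is_redundant():
    """Check every nonzero i8 count (assumption: zero never reaches the loop)."""
    return all(run_loop(n, True) == run_loop(n, False) for n in range(1, 256))


# Largest value the accumulator reaches without the mask.
max_acc = max(run_loop(n, False) for n in range(1, 256))
```

Under these assumptions the masked and unmasked runs agree for every count, and the accumulator never needs more than 10 bits, so a pass would only have to establish the same range statically.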

Where and how should this optimization be implemented?

Thanks.

Best,

Ning
