better code for IV

Hi Andrew,
The issue below refers to LSR, so I’ll appreciate your feedback. It also refers to instruction combining and might impact backends other than X86, so if you know of others that might be interested you are more than welcome to add them.
Thanks, Anat

Hi Andrew,
The issue below refers to LSR, so I’ll appreciate your feedback. It also refers to instruction combining and might impact backends other than X86, so if you know of others that might be interested you are more than welcome to add them.
Thanks, Anat

I’m not sure how you ended up with that IR. You have a 64-bit index that is 32-bit sign-extended within the loop. That’s not something I’ve seen. Normally you have a pure 64-bit index or a 32-bit sign extended index. In this case, the sign extend would be hoisted out of the loop.

That said, LSR has a lot of trouble with sext/zext/trunc.

In this case, SCEV is getting it right:

%idxprom = ashr exact i64 %sext, 32
→ (sext i32 {0,+,1}<%L_entry> to i64) Exits: (sext i32 (trunc i64 (-1 + %iNumElements) to i32) to i64)

The presense of the shl does lose the nuw/nsw flags, but I don’t think that’s your problem.

I had to debug this a bit to understand the issue, hence my delay in answering. Some possibilities:

(1) Generate IR that is normally generated for C code, or change the source to match conventional patterns.

(2) -mllvm -disable-lsr. You’ll still have two increments, but no crazy shifting. You could rewrite the code with a single counter/index and get perfect codegen.

(3) Teach IV-Users to see through the ashr instruction. Currently it thinks the ashr is not interesting so tells LSR to generate an expression for its operands. That’s why you get the weird shifted IV. This is probably the right fix, but I can’t say for sure without trying it.

(4) Don’t let instcombine bitwise expand the trunc/sext when it feeds a gep. I’m not a big fan of the bitwise expansion but realize it was done for a reason. I don’t know all of the optimizations that it exposes without doing some investigation. Before you could do this you would need to send a specific proposal titled “InstCombine should not expand trunc/sext into shl/ashr” and get the ok from other people on the list. You would also need to do performance analysis of llvm test-suite and get someone to do the same for arm.

-Andy

Hi Andy,

Thanks a lot for looking into this.

I work on a backend for OpenCL.

(1) This pattern of 64bit loop limit and 32bit IV appears in many OpenCL kernels, so I have to deal with what I get and changing the source code is not an option.

(2) I guess that you wrote LSR for a reason and that it has value so I didn’t consider this option.

I have a solution that implements your option (4). However I think that this was done in order to do all operations in the same size, so this is not my first choice. It seems to me more reasonable not to take apart the ashr(shl()) in the LSR pass.

Can you please advise how to do your proposed option (3)? I looked in the pass code and I couldn’t figure this out.

Thanks, Anat

Hi Andy,

Thanks a lot for looking into this.

I work on a backend for OpenCL.
(1) This pattern of 64bit loop limit and 32bit IV appears in many OpenCL kernels, so I have to deal with what I get and changing the source code is not an option.
(2) I guess that you wrote LSR for a reason and that it has value so I didn’t consider this option.

I have a solution that implements your option (4). However I think that this was done in order to do all operations in the same size, so this is not my first choice. It seems to me more reasonable not to take apart the ashr(shl()) in the LSR pass.

Can you please advise how to do your proposed option (3)? I looked in the pass code and I couldn’t figure this out.

I can think of one potential quick hack. In IVUsers.cpp, isInteresting() you could return false for a shl that looks like the first half of a sign-extend. Maybe you can see if its only user is ashr. There are other ways, but this seems quick and easy.

Please you file a PR with your test case and move the discussion there.

Thanks.

-Andy