better code for IV

Shemer_Anat · February 19, 2014, 7:32am

Hi Andrew,
The issue below refers to LSR, so I’ll appreciate your feedback. It also refers to instruction combining and might impact backends other than X86, so if you know of others that might be interested you are more than welcome to add them.
Thanks, Anat

atrick · February 26, 2014, 4:47am

Hi Andrew,
The issue below refers to LSR, so I’ll appreciate your feedback. It also refers to instruction combining and might impact backends other than X86, so if you know of others that might be interested you are more than welcome to add them.
Thanks, Anat

I’m not sure how you ended up with that IR. You have a 64-bit index that is 32-bit sign-extended within the loop. That’s not something I’ve seen. Normally you have a pure 64-bit index or a 32-bit sign extended index. In this case, the sign extend would be hoisted out of the loop.

That said, LSR has a lot of trouble with sext/zext/trunc.

In this case, SCEV is getting it right:

%idxprom = ashr exact i64 %sext, 32
→ (sext i32 {0,+,1}<%L_entry> to i64) Exits: (sext i32 (trunc i64 (-1 + %iNumElements) to i32) to i64)

The presense of the shl does lose the nuw/nsw flags, but I don’t think that’s your problem.

I had to debug this a bit to understand the issue, hence my delay in answering. Some possibilities:

(1) Generate IR that is normally generated for C code, or change the source to match conventional patterns.

(2) -mllvm -disable-lsr. You’ll still have two increments, but no crazy shifting. You could rewrite the code with a single counter/index and get perfect codegen.

(3) Teach IV-Users to see through the ashr instruction. Currently it thinks the ashr is not interesting so tells LSR to generate an expression for its operands. That’s why you get the weird shifted IV. This is probably the right fix, but I can’t say for sure without trying it.

(4) Don’t let instcombine bitwise expand the trunc/sext when it feeds a gep. I’m not a big fan of the bitwise expansion but realize it was done for a reason. I don’t know all of the optimizations that it exposes without doing some investigation. Before you could do this you would need to send a specific proposal titled “InstCombine should not expand trunc/sext into shl/ashr” and get the ok from other people on the list. You would also need to do performance analysis of llvm test-suite and get someone to do the same for arm.

-Andy

Shemer_Anat · February 26, 2014, 1:23pm

Hi Andy,

Thanks a lot for looking into this.

I work on a backend for OpenCL.

(1) This pattern of 64bit loop limit and 32bit IV appears in many OpenCL kernels, so I have to deal with what I get and changing the source code is not an option.

(2) I guess that you wrote LSR for a reason and that it has value so I didn’t consider this option.

I have a solution that implements your option (4). However I think that this was done in order to do all operations in the same size, so this is not my first choice. It seems to me more reasonable not to take apart the ashr(shl()) in the LSR pass.

Can you please advise how to do your proposed option (3)? I looked in the pass code and I couldn’t figure this out.

Thanks, Anat

atrick · February 28, 2014, 12:43am

Hi Andy,

Thanks a lot for looking into this.

I work on a backend for OpenCL.
(1) This pattern of 64bit loop limit and 32bit IV appears in many OpenCL kernels, so I have to deal with what I get and changing the source code is not an option.
(2) I guess that you wrote LSR for a reason and that it has value so I didn’t consider this option.

I have a solution that implements your option (4). However I think that this was done in order to do all operations in the same size, so this is not my first choice. It seems to me more reasonable not to take apart the ashr(shl()) in the LSR pass.

Can you please advise how to do your proposed option (3)? I looked in the pass code and I couldn’t figure this out.

I can think of one potential quick hack. In IVUsers.cpp, isInteresting() you could return false for a shl that looks like the first half of a sign-extend. Maybe you can see if its only user is ashr. There are other ways, but this seems quick and easy.

Please you file a PR with your test case and move the discussion there.

Thanks.

-Andy

Topic		Replies	Views
Strange codegen in LLVM for RISCV64 RISCV	2	290	January 6, 2023
loop strength reduction and zext/sext formulae LLVM Dev List Archives	1	72	April 15, 2015
Dealing with information loss for widened integer operations at ISel time LLVM Dev List Archives	4	87	January 3, 2019
Best IR for mixed-width shift? Beginners	1	201	October 24, 2022
Extending vector operations LLVM Dev List Archives	13	83	July 23, 2008

better code for IV

Related Topics