Hi,
Could you please take a look at an IndVarSimplify / SCEV problem I ran into with the latest LLVM?
Sometimes IndVarSimplify does not eliminate narrow IVs even when it is possible to do so. This can affect other LLVM passes and result in inefficient code. The reproducing test ‘indvar_test.cpp’ is attached.
The problem is with the second ‘for’ loop that accesses array elements with different indexes on each iteration.
The latest LLVM fails to reuse array element values from previous iterations and generates an unnecessary GEP. The generated IR is shown in the attached file ‘bad.ir’.
This happens because IndVarSimplify fails to eliminate ‘%idxprom7’ and ‘%idxprom10’.
The clang command line we use:
clang++ -mllvm -debug -S -emit-llvm -O3 --target=aarch64-linux-elf indvar_test.cpp -o bad.ir
I found that ‘WidenIV::widenIVUse’ (IndVarSimplify.cpp) may fail to widen narrow IV uses.
When the function gets a NarrowUse such as ‘{(-2 + %inc.lcssa),+,1}<%for.body3>’, it first tries to get a wide recurrence for it via the ‘getWideRecurrence’ call.
‘getWideRecurrence’ returns a recurrence like this: ‘{(sext i32 (-2 + %inc.lcssa) to i64),+,1}<%for.body3>’, which is fine by itself.
Then a wide use operation is generated by ‘cloneIVUser’. The generated wide use evaluates to ‘{(-2 + (sext i32 %inc.lcssa to i64)),+,1}<%for.body3>’, which differs from the ‘getWideRecurrence’ result (please note the position of -2). ‘widenIVUse’ compares the two expressions, sees the mismatch and returns nullptr.
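To illustrate why the two forms are not interchangeable in general, here is a minimal standalone sketch in plain C++ (no LLVM APIs; the function names are hypothetical and only model the two SCEV expressions). It shows that ‘sext(C + x)’ and ‘C + sext(x)’ agree only when the 32-bit addition does not wrap in the signed sense, which is presumably why the extension cannot be moved inside the add without a no-signed-wrap guarantee.

```cpp
#include <cstdint>

// Models `sext i32 (c + x) to i64`: add in 32 bits with wraparound,
// then sign-extend the 32-bit result.
// Note: the unsigned-to-signed cast is modular on all mainstream compilers
// (and guaranteed to be so since C++20).
std::int64_t extendAfterAdd(std::int32_t x, std::int32_t c) {
    std::uint32_t wrapped =
        static_cast<std::uint32_t>(x) + static_cast<std::uint32_t>(c);
    return static_cast<std::int64_t>(static_cast<std::int32_t>(wrapped));
}

// Models `c + (sext i32 x to i64)`: sign-extend both operands first,
// then add in 64 bits, where this sum cannot overflow.
std::int64_t addAfterExtend(std::int32_t x, std::int32_t c) {
    return static_cast<std::int64_t>(x) + static_cast<std::int64_t>(c);
}

// True when both forms produce the same 64-bit value, i.e. when the
// narrow add did not wrap in the signed sense.
bool formsAgree(std::int32_t x, std::int32_t c) {
    return extendAfterAdd(x, c) == addAfterExtend(x, c);
}
```

For example, ‘formsAgree(10, -2)’ holds, while ‘formsAgree(INT32_MIN, -2)’ does not, because the narrow add wraps around to a large positive value.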
I attached a test patch ‘indvar.patch’, which is not correct for all cases, but it fixes the specific ‘indvar_test.cpp’ scenario and demonstrates the efficient code that could have been generated (see ‘good.ir’).
It transforms expressions like ‘(sext i32 (-2 + %inc.lcssa) to i64)’ into ‘-2 + (sext i32 %inc.lcssa to i64)’, so that the expression comparison succeeds. The IVs are then eliminated, which can be seen in the ‘-mllvm -debug’ output.
The problem with the patch is that it uses the wrong extension logic for the ‘-2’ operand: the operand must be sign- or zero-extended depending on the context.
Could you check and confirm the problem, and suggest how it might be fixed properly, please?
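As a small illustration of why the choice of extension matters for the constant, the following plain C++ sketch (hypothetical helper names, not LLVM code) widens a 32-bit value both ways. The ‘isSigned’ flag is an assumption standing in for whatever context tells the pass whether the narrow IV is being sign- or zero-extended.

```cpp
#include <cstdint>

// Models `sext i32 v to i64`: the sign bit is replicated into the upper half.
std::int64_t signExtend(std::int32_t v) {
    return static_cast<std::int64_t>(v);
}

// Models `zext i32 v to i64`: the upper half is filled with zeros.
std::int64_t zeroExtend(std::int32_t v) {
    return static_cast<std::int64_t>(static_cast<std::uint32_t>(v));
}

// Hypothetical helper: widen a constant operand using the same kind of
// extension as the enclosing expression.
std::int64_t widenConstant(std::int32_t c, bool isSigned) {
    return isSigned ? signExtend(c) : zeroExtend(c);
}
```

With these definitions, ‘widenConstant(-2, true)’ yields -2, whereas ‘widenConstant(-2, false)’ yields 4294967294, so picking the wrong extension changes the widened expression.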
Thank you.
bad.ir (2.73 KB)
good.ir (1.88 KB)
indvar.patch (989 Bytes)
indvar_test.cpp (229 Bytes)