How to add optimizations to InstCombine correctly?

Haidl_Michael · September 13, 2017, 5:01pm

Hi,

I am working on PR34474 and try to add a new optimization to
InstCombine. Like in other parts of the visitMul function I add a Shl
through the IR builder and create a new BinaryOp which I return from
visitMul. If I understand correctly the new BinaryOp returned from
visitMul should replace the original Instruction in the Worklist.
However, I end up in an infinite loop and the Instruction I try to
replace gets scheduled again and again. What is wrong in my code?

// Replace X * (2^C+/-1) with (X << C) -/+ X
APInt Plus1 = *IVal + 1;
APInt Minus1 = *IVal - 1;
int isPow2 = Plus1.isPowerOf2() ? 1 : Minus1.isPowerOf2() ? -1 : 0;

if (isPow2) {
   APInt &Pow2 = isPow2 > 0 ? Plus1 : Minus1;
   Value *Shl = Builder.CreateShl(Op0, Pow2.logBase2());
   return BinaryOperator::Create(isPow2 > 0 ? BinaryOperator::Sub :
BinaryOperator::Add, Shl, Op0);
}

Thanks,
Michael

topperc · September 13, 2017, 5:11pm

Your code seems fine. InstCombine can infinite loop if some other transform is reversing your transform. Can you send the whole patch and a test case?

topperc · September 13, 2017, 5:18pm

There is in fact a transform out there somewhere that reverses yours.

define i64 @foo(i64 %a) {
%b = shl i64 %a, 5
%c = add i64 %b, %a
ret i64 %c
}

becomes

define i64 @foo(i64 %a) {

%c = mul i64 %a, 33

ret i64 %c

}

topperc · September 13, 2017, 5:21pm

And that is less instructions. So from InstCombine’s perspective the multiply is the correct answer. I think this transformation is better left to codegen where we know whether multiply or shift is truly better.

Haidl_Michael · September 14, 2017, 5:06am

Hi Craig,

thanks for digging into this. So InstCombine is the wrong place for
fixing PR34474. Can you give me a hint where such an optimization should
go into CodeGen? I am not really familiar with stuff that happens after
the MidLevel.

Cheers,
Michael

topperc · September 14, 2017, 5:23am

Probably in visitMUL in DAGCombiner.cpp to be target independent. Or in LowerMUL in X86ISelLowering.cpp to be X86 specific.

Adding Simon. Simon, which were you thinking?

Haidl_Michael · September 14, 2017, 7:12am

Hi Craig,

until Simon answers, I have implemented the opt in DAGCombiner. How do I
test this change to avoid regressions?

Cheers,
Michael

David_Menendez · September 15, 2017, 4:22am

The relevant transformation appears to be in SimplifyUsingDistributiveLaws. When trying to optimize an add, it treats left shifts as multiplies, so (X << C) + X becomes (X * (1<<C)) + (X * 1) which factorizes to X * ((1 << C) + 1).

In Alive, this would be:

Name: Factor

%y = shl %X, C
%r = add %y, %X
=>
%r = mul %X, (1<<C)+1

The two variations of the proposed optimization are,

Name: Plus
Pre: isPowerOf2(C+1)
%r = mul %X, C
=>
%y = shl %X, log2(C+1)
%r = sub %y, %X

Name: Minus
Pre: isPowerOf2(C-1)
%r = mul %X, C
=>
%y = shl %X, log2(C-1)
%r = add %y, %X

The Alive-Loops tool confirms that these can cause InstCombine to loop.

Specifically, it finds this transformation, which applies Factor and then Plus:

Name: (Factor;Plus)
Pre: isPowerOf2((((1 << C) + 1) - 1))
%y = shl %X, C
%r = add %y, %X
=>
%y1 = shl %X, log2((((1 << C) + 1) - 1))
%r = add %y1, %X

Note that its precondition is trivial and its source and target code are identical (aside from renaming). Barring interference from other transformations, this will cause InstCombine to apply this transformation endlessly.

There’s a more complete discussion of InstCombine non-termination in the paper “Termination checking for LLVM peephole optimizations” (ICSE 2016), by Santosh Nagarakatte and me.

https://www.cs.rutgers.edu/~sn349/papers/icse2016-alive-loops.pdf

The prototype can be downloaded from

https://github.com/rutgers-apl/alive-loops

Simon_Pilgrim · September 16, 2017, 1:46pm

This conversation has (partially) moved on to D37896 now, but if possible I was hoping that we could perform this in DAGCombiner and remove the various target specific combines that we still have.

At least ARM/AARCH64 and X86 have cases that can hopefully be generalised and removed, but there will probably be a few legality/perf issues that will occur.

Simon.

Haidl_Michael · September 19, 2017, 11:23am

I am currently improving the D37896 to include the suggestions from
Chad. However, running the lit checks for the x86 backend I observe some
changes in the generated MC, e.g.:

llvm/test/CodeGen/X86/lea-3.ll:13:10: error: expected string not found
in input
; CHECK: leal ([[A0]],[[A0]],2), %eax
          ^
<stdin>:10:2: note: scanning from here
  orq %rdi, %rax
  ^
<stdin>:10:2: note: with variable "A0" equal to "%rdi"
  orq %rdi, %rax
  ^
<stdin>:10:2: note: with variable "A0" equal to "%rdi"
  orq %rdi, %rax
  ^
<stdin>:23:2: note: possible intended match here
  leal (,%rdi,4), %eax
  ^
or:

llvm/test/CodeGen/X86/mul-constant-i16.ll:40:13: error: expected string
not found in input
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
             ^
<stdin>:35:2: note: scanning from here
  movzwl 4(%esp), %ecx
  ^
llvm/test/CodeGen/X86/mul-constant-i16.ll:272:13: error: expected string
not found in input
; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
             ^
<stdin>:212:2: note: scanning from here
  movzwl 4(%esp), %ecx
  ^

What is the right way to fix this? Is it ok to modify the tests to match
the new generated pattern?

Cheers,
Michael

rotateright · September 19, 2017, 1:58pm

For the tests that are changing, you should see if those changes are improvements, regressions, or neutral. This is unfortunately not always obvious for x86 asm, so feel free to just post those diffs in an updated version of the patch at D37896.

If the test files have auto-generated assertions (look for this string on the first line of the test file: “NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py”…
and both of these do as of: https://reviews.llvm.org/rL313631 ), then it’s easy to observe the diffs by re-running that script after your code patch is applied:
$ /path/to/update_llc_test_checks.py --llc=/path/to/local/and/new/llc lea-3.ll

$ svn diff lea-3.ll

Haidl_Michael · September 19, 2017, 6:35pm

Hi Sanjay,

thanks for enlighten me on terms of tests. I assume I have to run the test-suite benchmarks to check for regressions? Is there a guide to get the metrics from the benchmarks?

Cheers,

Michael

BTW the beginner tag for bugs was really a good idea to get started with contributing to llvm.

rotateright · September 19, 2017, 7:45pm

Your patch must pass all of the regression tests (everything under llvm/test), or you’ll get many fail mails from bots soon after the commit. The commit will then be reverted if it can’t be corrected immediately.

Running the test-suite is a good idea, but it’s not required. If there’s consensus that the patch is doing the theoretical right thing, then the patch should be ok to commit once reviewed and approved.

Particularly with low-level changes like this, it’s possible that doing the right thing may still cause some unexpected change that leads to a regression on some hardware somewhere in the world. There are public and private bots running the test-suite and many other benchmarks, so you’ll hear about it sooner or later if your patch makes something worse.

A couple of links to get started:
https://llvm.org/docs/TestingGuide.html
http://llvm.org/docs/lnt/quickstart.html

Haidl_Michael · September 20, 2017, 10:17am

Hi,

I am currently working on a more or less intrusive patch (D37896), which
pulls optimizations on multiplications from some back-ends, e.g., (mul
x, 2^N + 1) => (add (shl x, N), x) in AArch64, into the DAGCombiner to
have this optimization generic on all targets.
However, running the LLVM Tests leads to 67 unexpected results.

For the tests that are changing, you should see if those changes are
improvements, regressions, or neutral. This is unfortunately not always
obvious for x86 asm, so feel free to just post those diffs in an updated
version of the patch at D37896.

If the test files have auto-generated assertions (look for this string
on the first line of the test file: "NOTE: Assertions have been
autogenerated by utils/update_llc_test_checks.py"...
and both of these do as of: rGbd7958d7ca8f
<Login; ), then it's easy to observe the
diffs by re-running that script after your code patch is applied:
$ /path/to/update_llc_test_checks.py --llc=/path/to/local/and/new/llc
lea-3.ll
$ svn diff lea-3.ll

As described by Sanjay in the original thread, I updated the llc tests
but some of the tests were not updated.

For example: CodeGen/X86/lea-3.ll produces the following output:
llvm/utils/update_llc_test_checks.py --llc=/home/dev/local/bin/llc
CodeGen/X86/lea-3.ll
WARNING: Found conflicting asm under the same prefix: 'CHECK'!
WARNING: Found conflicting asm under the same prefix: 'CHECK'!
WARNING: Found conflicting asm under the same prefix: 'CHECK'!

leaving parts of the test unchanged and the tests still fail.

Other tests like CodeGen/AArch64/aarch64-gep-opt.ll results in:
WARNING: Skipping non-FileChecked RUN line: llc -O3
-aarch64-enable-gep-opt=true -mattr=-use-aa -print-after=codegenprepare
< %s >%t 2>&1 && FileCheck --check-prefix=CHECK-NoAA <%t %s
WARNING: Skipping non-FileChecked RUN line: llc -O3
-aarch64-enable-gep-opt=true -mattr=+use-aa -print-after=codegenprepare
< %s >%t 2>&1 && FileCheck --check-prefix=CHECK-UseAA <%t %s
WARNING: Skipping non-FileChecked RUN line: llc -O3
-aarch64-enable-gep-opt=true -print-after=codegenprepare -mcpu=cyclone <
%s >%t 2>&1 && FileCheck --check-prefix=CHECK-NoAA <%t %s
WARNING: Skipping non-FileChecked RUN line: llc -O3
-aarch64-enable-gep-opt=true -print-after=codegenprepare
-mcpu=cortex-a53 < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-UseAA
<%t %s

And the worst thing I witnessed was with the Hexagon back-end were my
changes in DAGCombiner trigger an Unreachable:

home/dev/llvm/build/./bin/llc -march=hexagon -mcpu=hexagonv5
-disable-hsdr <
/home/dev/llvm/llvm/test/CodeGen/Hexagon/vect/vect-cst-v4i32.ll |
/home/dev/llvm/build/./bin/FileCheck
/home/dev/llvm/llvm/test/CodeGen/Hexagon/vect/vect-cst-v4i32.ll

Haidl_Michael · September 20, 2017, 10:34am

Hi,

I am currently working on a more or less intrusive patch (D37896), which
pulls optimizations on multiplications from some back-ends, e.g., (mul
x, 2^N + 1) => (add (shl x, N), x) in AArch64, into the DAGCombiner to
have this optimization generic on all targets.
However, running the LLVM Tests leads to 67 unexpected results.

For the tests that are changing, you should see if those changes are
improvements, regressions, or neutral. This is unfortunately not always
obvious for x86 asm, so feel free to just post those diffs in an updated
version of the patch at D37896.

If the test files have auto-generated assertions (look for this string
on the first line of the test file: "NOTE: Assertions have been
autogenerated by utils/update_llc_test_checks.py"...
and both of these do as of: rGbd7958d7ca8f
<Login; ), then it's easy to observe the
diffs by re-running that script after your code patch is applied:
$ /path/to/update_llc_test_checks.py --llc=/path/to/local/and/new/llc
lea-3.ll
$ svn diff lea-3.ll

As described by Sanjay in the original thread, I updated the llc tests
but some of the tests were not updated.

For example: CodeGen/X86/lea-3.ll produces the following output:
llvm/utils/update_llc_test_checks.py --llc=/home/dev/local/bin/llc
CodeGen/X86/lea-3.ll
WARNING: Found conflicting asm under the same prefix: 'CHECK'!
WARNING: Found conflicting asm under the same prefix: 'CHECK'!
WARNING: Found conflicting asm under the same prefix: 'CHECK'!

leaving parts of the test unchanged and the tests still fail.

Other tests like CodeGen/AArch64/aarch64-gep-opt.ll results in:
WARNING: Skipping non-FileChecked RUN line: llc -O3
-aarch64-enable-gep-opt=true -mattr=-use-aa -print-after=codegenprepare
< %s >%t 2>&1 && FileCheck --check-prefix=CHECK-NoAA <%t %s
WARNING: Skipping non-FileChecked RUN line: llc -O3
-aarch64-enable-gep-opt=true -mattr=+use-aa -print-after=codegenprepare
< %s >%t 2>&1 && FileCheck --check-prefix=CHECK-UseAA <%t %s
WARNING: Skipping non-FileChecked RUN line: llc -O3
-aarch64-enable-gep-opt=true -print-after=codegenprepare -mcpu=cyclone <
%s >%t 2>&1 && FileCheck --check-prefix=CHECK-NoAA <%t %s
WARNING: Skipping non-FileChecked RUN line: llc -O3
-aarch64-enable-gep-opt=true -print-after=codegenprepare
-mcpu=cortex-a53 < %s >%t 2>&1 && FileCheck --check-prefix=CHECK-UseAA
<%t %s

And the worst thing I witnessed was with the Hexagon back-end were my
changes in DAGCombiner trigger an Unreachable:

home/dev/llvm/build/./bin/llc -march=hexagon -mcpu=hexagonv5
-disable-hsdr <
/home/dev/llvm/llvm/test/CodeGen/Hexagon/vect/vect-cst-v4i32.ll |
/home/dev/llvm/build/./bin/FileCheck
/home/dev/llvm/llvm/test/CodeGen/Hexagon/vect/vect-cst-v4i32.ll

rotateright · September 20, 2017, 2:09pm

There are multiple problems/questions here:

Make sure you’ve updated trunk to the latest rev before running update_llc_test_checks.py on lea-3.ll. Ie, I would only expect the output you’re seeing if you’re running the script on a version of that test file before r313631. After that commit, each RUN has its own check prefix, so there should be no conflict opportunity.
I didn’t realize the scope of the patch covered all targets and both scalars and vectors. That isn’t going to work as-is. We can’t assume that every target and data type will prefer to replace that multiply. Even the x86 diffs in lea-3.ll are regressions:

-; LNX1-NEXT: leal (%rdi,%rdi,2), %eax
+; LNX1-NEXT: leal (,%rdi,4), %eax
+; LNX1-NEXT: subl %edi, %eax

I suggest taking a smaller first step by limiting the patch to cases where the multiply is not a legal op for a target (TLI.isOperationLegal()).

Since the patch can cause a crash, that needs to be investigated first. I didn’t look into it, but my guess for the Hexagon crash (and probably other targets) is that you’re trying to create illegal

ops after the DAG has been legalized. So that’s another potential way to limit the scope of the patch - don’t try the transform unless we’re pre-legalization:
if (Level < AfterLegalizeDAG) { // do something }

Haidl_Michael · September 22, 2017, 5:06am

Hi Sanjay,

thanks for this information. I did get a little bit further with the
patch. However, Hexagon gives me headaches.

I tried to limit the scope of the patch to the BeforeLegalizeTypes phase
and Hexagon still reaches the unreachable. Hexagon tries to split or
widen a vector type for a node with custom lowering where the
unreachable arises from inside TargetLowering::ReplaceNodeResults which
is not overwritten in the Hexagon backend.

To avoid generating illegal code after all I queried TLI if all
generated operations are legal operations for the given vector type.

   if ( TLI.isOperationLegal(ISD::SUB, VT)
       && TLI.isOperationLegal(ISD::ADD, VT)
       && TLI.isOperationLegal(ISD::SHL, VT)
       && TLI.isOperationLegal(ISD::SRA, VT)) {

The Hexagon backend says happily: yes! Go ahead!
When it comes to HexagonDAGToDAGISel and the type should be legalized no
custom lowering is found and TargetLowering::ReplaceNodeResults is
called which falls back to the default implementation and the
unreachable is trigged.

Is this really working as intended or am I missing something here?

Cheers,
Michael

topperc · September 22, 2017, 5:19am

Is VT a legal type on Hexagon? It looks like Hexagon may be setting SHL as Custom for every defined vector type. Try adding TLI.isTypeLegal(VT) too.

Haidl_Michael · September 22, 2017, 5:36am

Sorry the part "type for a node with custom lowering" should be
"type for a node without custom lowering".

Michael

Haidl_Michael · September 22, 2017, 5:45am

Hi Craig,

protecting the transformation with:
   if (TLI.isTypeLegal(VT)
       && TLI.isOperationLegal(ISD::SUB, VT)
       && TLI.isOperationLegal(ISD::ADD, VT)
       && TLI.isOperationLegal(ISD::SHL, VT)
       && TLI.isOperationLegal(ISD::SRA, VT)) {

shows the same result.

Michael

Topic		Replies	Views
InstCombine wrongful (?) optimization on BinOp with SameOperands LLVM Dev List Archives	2	61	October 26, 2015
[RFC] Instcombine pass goes into a loop IR & Optimizations	6	273	August 18, 2022
Instcombine and instsimplify have similar optimizations , may I copy optmization from instcombine to instsimplify? Beginners	7	271	June 29, 2023
[GSOC] Potential Project Ideas LLVM Dev List Archives	1	68	March 25, 2016
Missing InstCombine optimization. LLVM Dev List Archives	3	88	June 4, 2013

How to add optimizations to InstCombine correctly?

Related Topics