Optimizations hindered by GVN widening

annat · June 30, 2016, 3:05pm

Currently, GVN widens loads when the alignment permits it. This can hinder later optimizations, as the example below shows:

Initial IR:

declare void @consume(i8) readonly

define i8 @bar(i8* align 2 %a) {
  %1 = load i8, i8* %a
  %idx = getelementptr i8, i8* %a, i64 1
  %2 = load i8, i8* %idx, align 1
  call void @consume(i8 %1)
  ret i8 %2
}

define i1 @foo(i8* %a) {
entry:
  %0 = call i8 @bar(i8* %a)
  %1 = icmp eq i8 %0, 0
  br i1 %1, label %cont.1, label %exit

cont.1:
  store i8 0, i8* %a
  %2 = call i8 @bar(i8* %a)
  %3 = icmp eq i8 %2, 0
  ret i1 %3

exit:
  ret i1 true
}
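The alignment condition can be made concrete with a deliberately simplified model in C. This is not GVN's actual code. `widened_load_size` is a hypothetical helper. It asks whether the byte range read by two adjacent loads can be covered by one power-of-two-sized load that stays naturally aligned, given the known alignment of the base pointer.

```c
#include <assert.h>

/* Hypothetical, simplified model; NOT LLVM's GVN implementation.
 * Two adjacent loads read bytes [lo, hi) relative to a base pointer whose
 * known alignment is `align` bytes (a power of two). Returns the size of a
 * single power-of-two load at offset `lo` that covers the range and is still
 * naturally aligned, or 0 if the alignment does not permit such a load. */
unsigned widened_load_size(unsigned align, unsigned lo, unsigned hi) {
    assert(lo < hi);
    unsigned needed = hi - lo;
    unsigned size = 1;
    while (size < needed)       /* round up to the next power of two */
        size *= 2;
    /* base + lo is only known to be `size`-aligned if the base alignment is
     * at least `size` and `lo` is a multiple of `size`. */
    if (size > align || lo % size != 0)
        return 0;
    return size;
}
```

With the 2-byte alignment used in @bar, `widened_load_size(2, 0, 2)` returns 2: one i16 load covers bytes 0 and 1. With an alignment of 1 it returns 0, so the two i8 loads stay separate.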

Since %a is 2-byte aligned, GVN widens the two i8 loads in bar into a single i16 load, and bar() is then inlined into foo. The resulting final IR at O3:

define i8 @bar(i8* align 2 %a) {
  %1 = bitcast i8* %a to i16*
  %2 = load i16, i16* %1, align 2
  %3 = trunc i16 %2 to i8
  %idx = getelementptr i8, i8* %a, i64 1
  %4 = lshr i16 %2, 8
  %5 = trunc i16 %4 to i8
  call void @consume(i8 %3)
  ret i8 %5
}

define i1 @foo(i8* %a) {
entry:
  %0 = bitcast i8* %a to i16*
  %1 = load i16, i16* %0, align 2 ; widened load from bar()
  %2 = trunc i16 %1 to i8
  call void @consume(i8 %2)
  %3 = icmp ult i16 %1, 256
  br i1 %3, label %cont.1, label %exit

cont.1: ; preds = %entry
  store i8 0, i8* %a, align 2
  %4 = load i16, i16* %0, align 2 ; widened load from bar()
  %5 = trunc i16 %4 to i8
  call void @consume(i8 %5)
  %6 = icmp ult i16 %4, 256
  ret i1 %6

exit: ; preds = %entry
  ret i1 true
}
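The trunc/lshr sequence above is how the widened IR recovers the two original bytes. The C sketch below mirrors it and assumes a little-endian target. Taking the byte at %a+1 from bits 8 to 15, as the IR does, only matches the original loads on little-endian layouts. The function names are illustrative only. The sketch also shows why `icmp eq i8 %0, 0` on bar()'s result became `icmp ult i16 %1, 256`: the second byte is zero exactly when the 16-bit value is below 256.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch; assumes a little-endian target, as the IR above does. */

/* One 16-bit load, like `load i16, i16* %1, align 2`. memcpy sidesteps C's
 * alignment and strict-aliasing rules. `p` must point to two readable bytes. */
uint16_t load_i16(const uint8_t *p) {
    assert(p != NULL);
    uint16_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* `trunc i16 %w to i8`: the byte at p[0] on little-endian. */
uint8_t first_byte(uint16_t w) { return (uint8_t)w; }

/* `lshr i16 %w, 8` followed by `trunc`: the byte at p[1] on little-endian. */
uint8_t second_byte(uint16_t w) { return (uint8_t)(w >> 8); }

/* `icmp ult i16 %w, 256`: true exactly when the second byte is zero, i.e.
 * the same condition as `icmp eq i8 %0, 0` on bar()'s return value. */
bool second_byte_is_zero(uint16_t w) { return w < 256; }
```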

Without GVN widening (which we can observe by making %a only 1-byte aligned in bar), bar is inlined into foo as-is, with both of its i8 loads intact.

Final IR at O3:

define i8 @bar(i8* align 1 %a) {
  %1 = load i8, i8* %a, align 1 ; align is 1, so GVN does not widen the load
  %idx = getelementptr i8, i8* %a, i64 1
  %2 = load i8, i8* %idx, align 1
  call void @consume(i8 %1)
  ret i8 %2
}

define i1 @foo(i8* %a) {
entry:
  %0 = load i8, i8* %a, align 1
  %idx.i = getelementptr i8, i8* %a, i64 1
  %1 = load i8, i8* %idx.i, align 1 ; both loads from bar() remain un-widened; GVN uses them to remove the loads in cont.1
  call void @consume(i8 %0)
  %2 = icmp eq i8 %1, 0
  br i1 %2, label %cont.1, label %exit

cont.1: ; preds = %entry
  store i8 0, i8* %a, align 1
  ; both loads are removed: the first is replaced by the value stored above, and the second is the same as %1 in the entry block
  call void @consume(i8 0)
  ret i1 true

exit: ; preds = %entry
  ret i1 true
}

Here GVN removes both loads in the cont.1 basic block, because they were not widened into a single load. GVN and later optimizations also forward the stored value of %a (0) to the call to consume. The compare folds to true, since the reloaded second byte is %1, which is known to be 0 on this path.
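The forwarding can be checked by hand. The sketch below models both versions of @foo in C. `consume` is a hypothetical stand-in that records its arguments. The real @consume is an opaque readonly declaration, and that attribute is what lets the entry-block load of %a+1 be reused after the call. `foo_optimized` mirrors the final non-widened IR: the reload of the first byte becomes the stored 0, the reload of the second byte reuses the entry load, and the compare becomes `true`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for @consume. Unlike a truly readonly function it
 * writes to its own log, but it never writes through the buffer passed to
 * foo, which is the property the optimization relies on. */
uint8_t seen[2];
int nseen;
void consume(uint8_t v) {
    assert(nseen < 2);
    seen[nseen++] = v;
}

/* @bar as written: two i8 loads; consume the first, return the second. */
uint8_t bar(const uint8_t *a) {
    uint8_t first = a[0];
    uint8_t second = a[1];
    consume(first);
    return second;
}

/* @foo as written. */
bool foo_original(uint8_t *a) {
    if (bar(a) != 0)
        return true;            /* exit */
    a[0] = 0;                   /* cont.1: store i8 0, i8* %a */
    return bar(a) == 0;
}

/* @foo as in the final non-widened IR. */
bool foo_optimized(uint8_t *a) {
    uint8_t v0 = a[0];          /* %0 */
    uint8_t v1 = a[1];          /* %1 */
    consume(v0);
    if (v1 != 0)
        return true;            /* exit */
    a[0] = 0;
    consume(0);                 /* reload of a[0] forwarded from the store */
    return true;                /* reload of a[1] is %1, known to be 0 here */
}

/* Compares return value, final memory, and consume() arguments for every
 * possible pair of initial byte values. */
bool equivalent_for_all_inputs(void) {
    for (int x = 0; x < 256; x++) {
        for (int y = 0; y < 256; y++) {
            uint8_t m1[2] = {(uint8_t)x, (uint8_t)y};
            uint8_t m2[2] = {(uint8_t)x, (uint8_t)y};
            nseen = 0;
            bool r1 = foo_original(m1);
            int n1 = nseen;
            uint8_t s1[2] = {seen[0], seen[1]};
            nseen = 0;
            bool r2 = foo_optimized(m2);
            if (r1 != r2 || n1 != nseen || memcmp(m1, m2, 2) != 0 ||
                memcmp(s1, seen, (size_t)n1) != 0)
                return false;
        }
    }
    return true;
}
```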

None of this happens when GVN widens the load. At the time GVN performed the widening, it was a good transform, since it replaced two loads with a single one. However, after inlining, it turned out that leaving the loads un-widened would have been more beneficial for later optimizations.
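In the widened version the compare is still provably true, but harder to prove. The sketch below models cont.1 of the widened @foo in C. It assumes a little-endian target, and the function names are hypothetical. `store i8 0` overwrites only the low byte, so the reloaded i16 is the entry value with that byte cleared. As a result, `icmp ult i16 %4, 256` holds whenever the entry compare did. Proving this means combining an 8-bit store with a 16-bit load that only partially overlaps it, rather than reusing an earlier value of the same width. The widened final IR above shows that this compare was not folded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Models `store i8 0, i8* %a` followed by the widened reload
 * `load i16, i16* %0` in cont.1, where `old` is the i16 that the entry
 * block loaded from the same address. Assumes little-endian. */
uint16_t reload_after_store(uint16_t old) {
    uint8_t mem[2];
    memcpy(mem, &old, sizeof old);  /* memory as the entry load saw it */
    mem[0] = 0;                     /* the i8 store overwrites only byte 0 */
    uint16_t w;
    memcpy(&w, mem, sizeof w);      /* the widened reload */
    return w;                       /* equals (old & 0xFF00) on little-endian */
}

/* Value of `icmp ult i16 %4, 256` in cont.1. That block is only reached when
 * the entry compare `icmp ult i16 %1, 256` was true, asserted here. */
bool cont1_compare(uint16_t old) {
    assert(old < 256);
    return reload_after_store(old) < 256;
}
```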

Any thoughts on how we could mitigate these problems introduced by GVN widening?

Thanks,
Anna
