NewGVN load coercion (for GPUs?)

jayfoad · February 3, 2022, 12:48pm

Hi,

I have noticed some cases where NewGVN misses load coercion opportunities that the old GVN catches, and I wondered whether anyone has plans to work on this or has advice on how to tackle it.

I have a feeling that this might be more useful for GPU workloads than for CPU workloads, because on GPUs vector loads are common and load latency is very high.

Here’s a motivating example:

define <{ <2 x i32>, i32, i32 }> @f(i8* %p) {
entry:
  %p1 = bitcast i8* %p to <2 x i32>*
  %v1 = load <2 x i32>, <2 x i32>* %p1
  %r1 = insertvalue <{ <2 x i32>, i32, i32 }> undef, <2 x i32> %v1, 0

  %p2 = bitcast i8* %p to i32*
  %v2 = load i32, i32* %p2
  %r2 = insertvalue <{ <2 x i32>, i32, i32 }> %r1, i32 %v2, 1

  %p3 = getelementptr i32, i32* %p2, i32 1
  %v3 = load i32, i32* %p3
  %r3 = insertvalue <{ <2 x i32>, i32, i32 }> %r2, i32 %v3, 2

  ret <{ <2 x i32>, i32, i32 }> %r3
}

opt -gvn -instcombine replaces the scalar loads with extractelement instructions:

define <{ <2 x i32>, i32, i32 }> @f(i8* %p) {
entry:
  %p1 = bitcast i8* %p to <2 x i32>*
  %v1 = load <2 x i32>, <2 x i32>* %p1, align 8
  %r1 = insertvalue <{ <2 x i32>, i32, i32 }> undef, <2 x i32> %v1, 0
  %0 = extractelement <2 x i32> %v1, i64 0
  %r2 = insertvalue <{ <2 x i32>, i32, i32 }> %r1, i32 %0, 1
  %1 = extractelement <2 x i32> %v1, i64 1
  %r3 = insertvalue <{ <2 x i32>, i32, i32 }> %r2, i32 %1, 2
  ret <{ <2 x i32>, i32, i32 }> %r3
}
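The extractelement indices above come from simple offset arithmetic: %v2 reads bytes 0..3 of what %v1 already loaded (lane 0), and %v3 reads bytes 4..7 (lane 1). As a rough illustration only, here is a toy Python model of that check. The Load class and extract_index function are hypothetical names, and it assumes no intervening store and whole-lane extracts only (the real GVN coercion can also handle partial overlaps with shifts and truncations, which this sketch does not model):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Load:
    """Toy load (hypothetical): a symbolic base pointer, a constant byte offset, a size in bytes."""
    base: str
    offset: int
    size: int


def extract_index(wide: Load, elem_size: int, narrow: Load) -> Optional[int]:
    """Return the lane index an extractelement would use to replace `narrow`
    with a lane of the earlier vector load `wide`, or None if this simplified
    model cannot forward the value. Assumes no store between the two loads."""
    if wide.base != narrow.base:
        return None  # unrelated base pointers: overlap cannot be reasoned about here
    if narrow.size != elem_size:
        return None  # only whole-lane extracts are modelled (no shifts/truncs)
    rel = narrow.offset - wide.offset
    if rel < 0 or rel + narrow.size > wide.size:
        return None  # the narrow load is not fully covered by the wide load
    if rel % elem_size != 0:
        return None  # the narrow load straddles a lane boundary
    return rel // elem_size


# The motivating example: %v1 loads <2 x i32> (8 bytes) from %p,
# %v2 loads i32 from %p, and %v3 loads i32 from %p + 4 bytes.
v1 = Load("p", 0, 8)
v2 = Load("p", 0, 4)
v3 = Load("p", 4, 4)
print(extract_index(v1, 4, v2), extract_index(v1, 4, v3))  # prints "0 1"
```

The two results match the `i64 0` and `i64 1` indices in the old-GVN output.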

But opt -newgvn -instcombine leaves the three loads intact:

define <{ <2 x i32>, i32, i32 }> @f(i8* %p) {
entry:
  %p1 = bitcast i8* %p to <2 x i32>*
  %v1 = load <2 x i32>, <2 x i32>* %p1, align 8
  %r1 = insertvalue <{ <2 x i32>, i32, i32 }> undef, <2 x i32> %v1, 0
  %p2 = bitcast i8* %p to i32*
  %v2 = load i32, i32* %p2, align 4
  %r2 = insertvalue <{ <2 x i32>, i32, i32 }> %r1, i32 %v2, 1
  %p3 = getelementptr i8, i8* %p, i64 4
  %0 = bitcast i8* %p3 to i32*
  %v3 = load i32, i32* %0, align 4
  %r3 = insertvalue <{ <2 x i32>, i32, i32 }> %r2, i32 %v3, 2
  ret <{ <2 x i32>, i32, i32 }> %r3
}

I see from D30929 ("NewGVN: Handle coercion of constant stores, loads, memory insts.") that NewGVN does support some load coercion, but only when the loaded value can be proved to be a constant, because in that case no new instructions need to be inserted. Also, I can't really see how this implementation could do load->load coercion, since it is based on calling MSSAWalker->getClobberingMemoryAccess, which (if I understand correctly) will always return a store, never a load?
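To illustrate why I think the clobber walk can never find an earlier load: in MemorySSA, loads are MemoryUses and stores are MemoryDefs, and a MemoryUse's defining access is always a MemoryDef, a MemoryPhi, or liveOnEntry. Below is a toy Python model of that walk, not the real MemorySSA walker. The Access class and clobbering_access function are hypothetical, it only covers straight-line code (no MemoryPhis), and aliasing is simplified to exact equality of pointer names:

```python
from dataclasses import dataclass
from typing import List

LIVE_ON_ENTRY = "liveOnEntry"


@dataclass(frozen=True)
class Access:
    """Toy MemorySSA access (hypothetical): kind 'def' models a MemoryDef
    such as a store, kind 'use' models a MemoryUse such as a load."""
    kind: str
    ptr: str
    name: str


def clobbering_access(accesses: List[Access], index: int) -> str:
    """Walk backwards from accesses[index] looking for a clobbering access.

    Only 'def' accesses are considered, mirroring the fact that loads are
    never on the def chain. Aliasing is simplified to pointer-name equality."""
    query = accesses[index]
    for acc in reversed(accesses[:index]):
        if acc.kind != "def":
            continue  # loads (MemoryUses) are skipped entirely
        if acc.ptr == query.ptr:
            return acc.name
    return LIVE_ON_ENTRY


# The motivating example contains only loads, so every load's clobber is
# liveOnEntry; the earlier <2 x i32> load %v1 is never returned.
example = [Access("use", "p", "v1"), Access("use", "p", "v2"), Access("use", "p+4", "v3")]
print([clobbering_access(example, i) for i in range(len(example))])
# prints "['liveOnEntry', 'liveOnEntry', 'liveOnEntry']"
```

So in this model, finding %v1 as a source for %v2 and %v3 would need a separate lookup over earlier loads, rather than the clobber query alone.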

I also found issue #33383 in llvm/llvm-project on GitHub ("NewGVN misses a load coercion opportunity"), which has some relevant discussion, including comments like:

the infrastructure needed to do this right is something like “allow additional non-inserted instructions to appear in blocks, etc”
It’s tricky

There are also a couple of relevant test cases that are currently XFAILed:

test/Transforms/NewGVN/pr14166-xfail.ll
test/Transforms/NewGVN/pr10820-xfail.ll

Thanks for any help or advice,
Jay.
