NewGVN load coercion (for GPUs?)

Hi,

I have noticed some cases where NewGVN misses load coercion opportunities that the old GVN catches, and wondered whether anyone has plans to work on this, or advice on how to tackle it.

I have a feeling that this might be more useful for GPU workloads (because vector loads are common and load latency is very high) than for CPUs.

Here’s a motivating example:

define <{ <2 x i32>, i32, i32 }> @f(i8* %p) {
entry:
  %p1 = bitcast i8* %p to <2 x i32>*
  %v1 = load <2 x i32>, <2 x i32>* %p1
  %r1 = insertvalue <{ <2 x i32>, i32, i32 }> undef, <2 x i32> %v1, 0

  %p2 = bitcast i8* %p to i32*
  %v2 = load i32, i32* %p2
  %r2 = insertvalue <{ <2 x i32>, i32, i32 }> %r1, i32 %v2, 1

  %p3 = getelementptr i32, i32* %p2, i32 1
  %v3 = load i32, i32* %p3
  %r3 = insertvalue <{ <2 x i32>, i32, i32 }> %r2, i32 %v3, 2

  ret <{ <2 x i32>, i32, i32 }> %r3
}

opt -gvn -instcombine replaces the scalar loads with extractelement instructions:

define <{ <2 x i32>, i32, i32 }> @f(i8* %p) {
entry:
  %p1 = bitcast i8* %p to <2 x i32>*
  %v1 = load <2 x i32>, <2 x i32>* %p1, align 8
  %r1 = insertvalue <{ <2 x i32>, i32, i32 }> undef, <2 x i32> %v1, 0
  %0 = extractelement <2 x i32> %v1, i64 0
  %r2 = insertvalue <{ <2 x i32>, i32, i32 }> %r1, i32 %0, 1
  %1 = extractelement <2 x i32> %v1, i64 1
  %r3 = insertvalue <{ <2 x i32>, i32, i32 }> %r2, i32 %1, 2
  ret <{ <2 x i32>, i32, i32 }> %r3
}
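
For reference, my understanding is that the extractelement form comes from instcombine cleaning up after GVN, and that GVN's own coercion reinterprets the wide loaded value as an integer and then shifts and truncates out the bytes each narrow load needs. A hand-written sketch of that intermediate form (not real opt output; the function name @f_coerced and the value names %v1.int, %hi, %v2.c and %v3.c are made up, and a little-endian layout is assumed):

```llvm
; Hand-written sketch, not real opt output. The function name and the
; %v1.int / %hi / %v2.c / %v3.c value names are made up for illustration.
target datalayout = "e"   ; assumption: little-endian

define <{ <2 x i32>, i32, i32 }> @f_coerced(i8* %p) {
entry:
  %p1 = bitcast i8* %p to <2 x i32>*
  %v1 = load <2 x i32>, <2 x i32>* %p1, align 8
  %r1 = insertvalue <{ <2 x i32>, i32, i32 }> undef, <2 x i32> %v1, 0
  ; Reinterpret the wide value as a single integer, then carve out each piece.
  %v1.int = bitcast <2 x i32> %v1 to i64
  %v2.c = trunc i64 %v1.int to i32      ; bytes 0..3, stands in for %v2
  %r2 = insertvalue <{ <2 x i32>, i32, i32 }> %r1, i32 %v2.c, 1
  %hi = lshr i64 %v1.int, 32            ; move bytes 4..7 down to the low half
  %v3.c = trunc i64 %hi to i32          ; stands in for %v3
  %r3 = insertvalue <{ <2 x i32>, i32, i32 }> %r2, i32 %v3.c, 2
  ret <{ <2 x i32>, i32, i32 }> %r3
}
```

The point is that the bitcast, lshr and trunc are all new instructions that have to be inserted, which the constant-only coercion in NewGVN never needs.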

But opt -newgvn -instcombine leaves the three loads intact:

define <{ <2 x i32>, i32, i32 }> @f(i8* %p) {
entry:
  %p1 = bitcast i8* %p to <2 x i32>*
  %v1 = load <2 x i32>, <2 x i32>* %p1, align 8
  %r1 = insertvalue <{ <2 x i32>, i32, i32 }> undef, <2 x i32> %v1, 0
  %p2 = bitcast i8* %p to i32*
  %v2 = load i32, i32* %p2, align 4
  %r2 = insertvalue <{ <2 x i32>, i32, i32 }> %r1, i32 %v2, 1
  %p3 = getelementptr i8, i8* %p, i64 4
  %0 = bitcast i8* %p3 to i32*
  %v3 = load i32, i32* %0, align 4
  %r3 = insertvalue <{ <2 x i32>, i32, i32 }> %r2, i32 %v3, 2
  ret <{ <2 x i32>, i32, i32 }> %r3
}

I see from D30929 (NewGVN: Handle coercion of constant stores, loads, memory insts.) that NewGVN does support some load coercion, but only in cases where the loaded value can be proved to be a constant, because then no new instructions need to be inserted. Also, I can’t really see how this implementation could do load->load coercion, since it is based on calling MSSAWalker->getClobberingMemoryAccess, which will always return a store, not a load?
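
To make the constant case concrete, I believe the kind of input that patch covers looks something like the following. This is a hand-written, untested sketch: the function name @g and the constant are made up, and a little-endian layout is assumed.

```llvm
; Hand-written, untested sketch of the constant-store case; @g and the
; constant are made up for illustration.
target datalayout = "e"   ; assumption: little-endian

define i16 @g(i32* %p) {
entry:
  store i32 287454020, i32* %p, align 4   ; 0x11223344
  %q = bitcast i32* %p to i16*
  ; Bytes 0..1 of the stored value are 0x3344 on little-endian, so this load
  ; could be replaced by a constant without inserting any new instruction.
  %v = load i16, i16* %q, align 2
  ret i16 %v
}
```

Here the clobbering access of the load is the store, so there is a defining instruction to coerce from. In my motivating example the only other accesses are loads.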

I also found https://github.com/llvm/llvm-project/issues/33383 which has some relevant discussion including comments like:

the infrastructure needed to do this right is something like “allow additional non-inserted instructions to appear in blocks, etc”
It’s tricky

There are also a couple of relevant test cases which are currently XFAILed:

test/Transforms/NewGVN/pr14166-xfail.ll
test/Transforms/NewGVN/pr10820-xfail.ll

Thanks for any help or advice,
Jay.
