Question about modifying getelementptr instructions

Hi, I'm trying to vectorize load and store instructions in LLVM IR generated for a new GPU backend.
However, the SLP vectorizer fails to vectorize these loads and stores because it cannot prove that the stores access consecutive memory.

Then I tried to manually modify the getelementptr instructions in my LLVM IR as follows,

From

  %linear_index3 = add nuw nsw i32 %linear_index_plus_base, 3
  %linear_index2 = add nuw nsw i32 %linear_index_plus_base, 2
  %linear_index1 = add nuw nsw i32 %linear_index_plus_base, 1
  getelementptr inbounds half, ptr %base_ptr, i32 %linear_index_plus_base
  getelementptr inbounds half, ptr %base_ptr, i32 %linear_index1
  getelementptr inbounds half, ptr %base_ptr, i32 %linear_index2
  getelementptr inbounds half, ptr %base_ptr, i32 %linear_index3

To

  %ld_base = getelementptr inbounds half, ptr %base_ptr, i32 %linear_index_plus_base
  getelementptr inbounds half, ptr %ld_base, i64 0
  getelementptr inbounds half, ptr %ld_base, i64 1
  getelementptr inbounds half, ptr %ld_base, i64 2
  getelementptr inbounds half, ptr %ld_base, i64 3

With this change, the SLP vectorizer successfully vectorizes these load and store instructions.

Are there any LLVM passes that could do the same thing? Any hints or tips would be greatly appreciated.

SeparateConstOffsetFromGEP and a few other passes are intended to set up addressing this way, i.e. a shared base pointer plus constant offsets.
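
One way to check whether the pass helps in your case is to run it ahead of the SLP vectorizer on a reduced test. The sketch below is only an illustration: the function name @copy4 and all value names are made up, and the module needs your backend's target triple and datalayout added. Whether the GEPs actually get split depends on the target's cost and addressing-mode hooks, so treat the result as something to verify rather than a guarantee.

```llvm
; Hypothetical reduced test case (names are illustrative, not from the original IR).
; Possible invocation, assuming the file is saved as reduced.ll:
;   opt -passes='separate-const-offset-from-gep,slp-vectorizer' -S reduced.ll -o -
define void @copy4(ptr %src, ptr %dst, i32 %i) {
entry:
  ; Indices computed as variable + constant, as in the original IR.
  %i1 = add nuw nsw i32 %i, 1
  %i2 = add nuw nsw i32 %i, 2
  %i3 = add nuw nsw i32 %i, 3

  ; Four adjacent half loads addressed through the added indices.
  %s0 = getelementptr inbounds half, ptr %src, i32 %i
  %s1 = getelementptr inbounds half, ptr %src, i32 %i1
  %s2 = getelementptr inbounds half, ptr %src, i32 %i2
  %s3 = getelementptr inbounds half, ptr %src, i32 %i3
  %v0 = load half, ptr %s0, align 2
  %v1 = load half, ptr %s1, align 2
  %v2 = load half, ptr %s2, align 2
  %v3 = load half, ptr %s3, align 2

  ; Four adjacent half stores addressed the same way.
  %d0 = getelementptr inbounds half, ptr %dst, i32 %i
  %d1 = getelementptr inbounds half, ptr %dst, i32 %i1
  %d2 = getelementptr inbounds half, ptr %dst, i32 %i2
  %d3 = getelementptr inbounds half, ptr %dst, i32 %i3
  store half %v0, ptr %d0, align 2
  store half %v1, ptr %d1, align 2
  store half %v2, ptr %d2, align 2
  store half %v3, ptr %d3, align 2
  ret void
}
```

If the output shows one GEP on the variable index followed by GEPs with constant offsets, that matches the hand-written form above, and you can then see whether the vectorizer picks up the loads and stores.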
