Hi, I'm trying to vectorize the load and store instructions in LLVM IR generated for a new GPU backend.
However, the SLP vectorizer fails to vectorize them because it cannot prove that the stores access consecutive memory locations.
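To make "consecutive" concrete, here is a simplified Python model of the check. It is my own toy approximation, not LLVM's actual implementation. Roughly, the vectorizer needs the byte distance between neighboring pointers to be a known constant equal to the element size. An address is modeled as `base + elem_size * (sym + const)`. If two indices are only visible as unrelated opaque values, their distance cannot be proven. `Address`, `byte_distance`, and `provably_consecutive` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


# Toy model (an assumption, not LLVM code): address = base + elem_size * (sym + const),
# where `sym` is an opaque symbolic term and `const` is a known integer offset.
@dataclass(frozen=True)
class Address:
    base: str       # e.g. "%base_ptr"
    sym: str        # opaque symbolic part of the index
    const: int      # constant part of the index, in elements
    elem_size: int  # bytes per element (2 for half)


def byte_distance(a: Address, b: Address):
    """Return (b - a) in bytes if it is a compile-time constant, else None."""
    if a.base != b.base or a.sym != b.sym or a.elem_size != b.elem_size:
        return None  # symbolic parts differ, so the distance is unknown
    return (b.const - a.const) * a.elem_size


def provably_consecutive(addrs):
    """True if every address is exactly one element after the previous one."""
    for a, b in zip(addrs, addrs[1:]):
        d = byte_distance(a, b)
        if d is None or d != a.elem_size:
            return False
    return True
```

For example, four indices of the form `%x + k` are provably consecutive, while four opaque indices `%idx0`..`%idx3` (hypothetical names) are not:

```python
seen_through = [Address("%base_ptr", "%x", k, 2) for k in range(4)]
opaque = [Address("%base_ptr", f"%idx{k}", 0, 2) for k in range(4)]
provably_consecutive(seen_through)  # True
provably_consecutive(opaque)        # False
```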
I then manually rewrote the getelementptr instructions in my IR as follows.
From:
%linear_index3 = add nuw nsw i32 %linear_index_plus_base, 3
%linear_index2 = add nuw nsw i32 %linear_index_plus_base, 2
%linear_index1 = add nuw nsw i32 %linear_index_plus_base, 1
getelementptr inbounds half, ptr %base_ptr, i32 %linear_index_plus_base
getelementptr inbounds half, ptr %base_ptr, i32 %linear_index1
getelementptr inbounds half, ptr %base_ptr, i32 %linear_index2
getelementptr inbounds half, ptr %base_ptr, i32 %linear_index3
To:
%ld_base = getelementptr inbounds half, ptr %base_ptr, i32 %linear_index_plus_base
getelementptr inbounds half, ptr %ld_base, i64 0
getelementptr inbounds half, ptr %ld_base, i64 1
getelementptr inbounds half, ptr %ld_base, i64 2
getelementptr inbounds half, ptr %ld_base, i64 3
With this change, the SLP vectorizer successfully vectorizes the loads and stores.
Is there an existing LLVM pass that performs this transformation automatically? Any hints or tips would be greatly appreciated.
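The manual rewrite above can be sketched as a toy string-level transformation in Python. When every index has the form `sym + const` with the same `sym`, it emits one GEP for the shared symbolic part and then constant-offset GEPs relative to that GEP. `split_const_offsets` and the `%gep0`..`%gep3` result names are hypothetical. This is not an LLVM API, and it does no legality checking, for example of `inbounds` or of index overflow.

```python
def split_const_offsets(base_ptr, indices, elem_ty="half"):
    """Toy rewrite (hypothetical helper, not an LLVM API).

    `indices` is a list of (sym, const) pairs meaning index = sym + const.
    Returns IR-like lines: one GEP for the shared symbolic part, then one GEP
    per access with a constant offset relative to it.
    """
    syms = {sym for sym, _ in indices}
    if len(syms) != 1:
        raise ValueError("indices do not share a single symbolic term")
    sym = syms.pop()
    # Shared part is computed once, like %ld_base in the rewritten IR.
    lines = [f"%ld_base = getelementptr inbounds {elem_ty}, ptr {base_ptr}, i32 {sym}"]
    for i, (_, const) in enumerate(indices):
        # %gep{i} names are made up for illustration.
        lines.append(
            f"%gep{i} = getelementptr inbounds {elem_ty}, ptr %ld_base, i64 {const}"
        )
    return lines
```

Usage with the indices from the original IR:

```python
for line in split_const_offsets(
    "%base_ptr", [("%linear_index_plus_base", k) for k in range(4)]
):
    print(line)
```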