Hi all,
For loads and stores, I want to extract information about the number of indices accessed. For instance:
struct S { int X; int *Y; };
__global__ void kernel(int *A, int **B, struct S s) {
int x = A[..]; // -> L: A[..]
int y = (*B)[2]; // -> L: B[0][2]
int z = s.Y[..]; // -> L: s.1[..] (the "1" being the member index of Y)
// etc..
}
I am performing some preprocessing on the IR to:
1. Rewrite constant inline GEPs (GEP constant expressions) into GEP instructions
2. For loads and stores whose pointer operand is not a GEP, insert a trivial GEP with index 0
So now the pointer operand of every load and store is a GEP instruction; a rough sketch of this preprocessing is below.
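For reference, this is roughly what that preprocessing looks like. It is a minimal sketch, not my exact pass: it assumes a reasonably recent LLVM, and canonicalizeMemOps is just a name I made up for illustration.

#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Ensure every load/store takes its pointer from a GetElementPtrInst.
static void canonicalizeMemOps(Function &F) {
  for (BasicBlock &BB : F) {
    for (Instruction &I : make_early_inc_range(BB)) {
      Value *Ptr = nullptr;
      Type *EltTy = nullptr;
      if (auto *LI = dyn_cast<LoadInst>(&I)) {
        Ptr = LI->getPointerOperand();
        EltTy = LI->getType();
      } else if (auto *SI = dyn_cast<StoreInst>(&I)) {
        Ptr = SI->getPointerOperand();
        EltTy = SI->getValueOperand()->getType();
      } else {
        continue;
      }

      auto *CE = dyn_cast<ConstantExpr>(Ptr);
      if (CE && CE->getOpcode() == Instruction::GetElementPtr) {
        // (1) Turn a constant inline GEP into a real GEP instruction.
        Instruction *GEP = CE->getAsInstruction();
        GEP->insertBefore(&I);
        I.replaceUsesOfWith(CE, GEP);
      } else if (!isa<GetElementPtrInst>(Ptr)) {
        // (2) Give GEP-less accesses a trivial GEP with index 0.
        IRBuilder<> B(&I);
        Value *GEP = B.CreateConstGEP1_64(EltTy, Ptr, 0);
        I.replaceUsesOfWith(Ptr, GEP);
      }
    }
  }
}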
For simple cases I am getting the right answer, but when the index expression becomes more complex, multiple GEPs are introduced. For instance:
*(A+2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3) = 5;
produces:
%6 = call i32 @llvm.nvvm.read.ptx.sreg.ntid.x()
%7 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x()
%8 = mul i32 %6, %7
%9 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
%10 = add i32 %8, %9
%11 = add i32 %10, 1
%12 = mul i32 2, %11
%13 = zext i32 %12 to i64
%14 = getelementptr inbounds i32, i32* %0, i64 %13
%15 = getelementptr inbounds i32, i32* %14, i64 2
%16 = getelementptr inbounds i32, i32* %15, i64 3
store i32 5, i32* %16, align 4
So I guess relying on the number of GEPs to figure out the number of indices is only a heuristic: here the last three GEPs (%14, %15, %16) together compute a single source-level index, 2*(blockDim.x*blockIdx.x+threadIdx.x+1)+2+3, so counting GEPs would report three indices instead of one. Is there a more robust way to go about this? Or some example I can look at?
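For context, my current heuristic just walks the chain of GEPs back from the pointer operand and collects their indices, roughly like this (a sketch; collectIndices is a name I made up):

#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Walk chained GEPs back from a load/store's pointer operand and
// collect every index, innermost GEP's indices first. For the IR
// above this yields {%13, 2, 3} even though the source expression
// has only a single index.
static SmallVector<Value *, 8> collectIndices(Value *Ptr) {
  SmallVector<Value *, 8> Indices;
  while (auto *GEP = dyn_cast<GetElementPtrInst>(Ptr)) {
    Indices.insert(Indices.begin(), GEP->idx_begin(), GEP->idx_end());
    Ptr = GEP->getPointerOperand();
  }
  return Indices;
}

Presumably a robust version would have to fold runs of single-index GEPs like %14..%16 into one logical index before counting, but I don't know whether there's an existing utility for that.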
PS: I'm only interested in CUDA kernels.
Ees