LLVM CreateStructGEP type assert error

Hi all,

Running LLVM 3.4 to create a custom pass for OpenCL transformations. I am attempting to GEP into a struct using IRBuilder's CreateStructGEP, but I keep getting this assert:

aoc: ../../../../../../compiler/llvm/include/llvm/Instructions.h:703: llvm::Type* llvm::checkGEPType(llvm::Type*): Assertion `Ty && "Invalid GetElementPtrInst indices for type!"' failed.

Which I've decoded as it doesn't recognize my struct as a built in type. This is confusing since it shouldn't care what type I'm indexing into, only that I give the correct indices. Otherwise what is the point of user defined structs?

//This part is atop my cl kernel file:

    #define BUFFER_LEN 0x100000
    typedef struct RB{
        unsigned int x;
        unsigned int y;
        int z[BUFFER_LEN];
        unsigned int xx[BUFFER_LEN];
        unsigned int yy[BUFFER_LEN];
        float zz[BUFFER_LEN];
    } RB_t;

//The kernel sig. with the struct as the last argument:

    __kernel void ht( int iteration, global RB_t *cB){ ... }

//My LLVM code:

       void MPASS::exPointers(Module& M, Function& F){
        BasicBlock *first = F.begin();
        IRBuilder<> builder(first->begin());
        Value *cB = --(F.arg_end());
        Value *x_idx = builder.CreateStructGEP(cB, 0);
        ...
        ... < more of the same>
        ...
        return
    }

Simple enough, but I can't get past the first CreateStructGEP without the assert. I thought it was a version issue, so I changed the call to:

    Value *x_idx = builder.CreateConstGEP2_32(cB, 0, 0);

However I got the same assert. It looks like this line in Instruction.h from LLVM is producing the assert further down the stack:

    815 Type *PtrTy = PointerType::get(checkGEPType(
    816 getIndexedType(Ptr->getType(), IdxList)),
    817 Ptr->getType()->getPointerAddressSpace());

Resulting in the debugger producing:

    (gdb) p PtrTy
    $19 = (llvm::Type *) 0x7fffffff63d0
    (gdb) p PtrTy->dump()
    <unrecognized-type> $20 = void

Is there something special which needs to be done in order to index into a user defined struct in LLVM for OpenCL?

Hi all,

Running LLVM 3.4 to create a custom pass for OpenCL transformations. I am attempting to GEP into a struct using IRBuilder's CreateStructGEP, but I keep getting this assert:

aoc: ../../../../../../compiler/llvm/include/llvm/Instructions.h:703: llvm::Type* llvm::checkGEPType(llvm::Type*): Assertion `Ty && "Invalid GetElementPtrInst indices for type!"' failed.

I've hit this assertion before, and it was because I was passing
the wrong number of operands to the GEP instruction.

Can you add

cB->getType()->getPointerElementType()->dump();
cB->dump();

to your pass right before you call CreateStructGEP and show us
what the output looks like.

-Tom

cB->getType()->getPointerElementType()->dump();

gives:

%struct.RB = type opaque2189 x_idx = builder.CreateStructGEP(cB, 0);

cB->dump()

gives:

%struct.RB addrspace(1)* %cB
$1 = void

//To undo confusion, the last cB is the function arg name. For this example I unfortunately chose the same name for my argument Value* as the argument name.
//Also one correction. I am running LLVM 3.2 not 3.4

cB->getType()->getPointerElementType()->dump();

gives:

%struct.RB = type opaque2189 x_idx = builder.CreateStructGEP(cB, 0);

Ok, sorry. I thought that would dump the full struct type. Can you dump the
struct as it is defined in LLVM IR, usually if you dump the whole module,
this will be at the top.

-Tom

If I do M.dump(), at the top of the output I have:

%struct.RB = type opaque

Further down I have:

@.str18 = internal addrspace(2) constant [13 x i8] c"RB_t*\00"

However nowhere does it dump the full struct type when I call "M.dump()". I have it explicitly defined above the kernel in the kernel file, but LLVM doesn't seem to pick it up.
Opaque is a placeholder until it can resolve the "actual" type, so at my pass level "optimization pass" it hasn't resolved it.

If I do M.dump(), at the top of the output I have:

Can you post the full .cl file you are compiling?

-Tom

Sure, it is the Rodinia 2.4 Hotspot benchmark OpenCL kernel (not my kernel), with the addition of my struct as the last argument in the kernel function.

//------- kernel file start -------------------------------

#define BLOCK_SIZE 16

//dlowell's type
#define BUFFER_LEN 0x100000
typedef struct RB{
  unsigned int x;
  unsigned int y;
  int z[BUFFER_LEN];
  unsigned int xx[BUFFER_LEN];
  unsigned int yy[BUFFER_LEN];
  float zz[BUFFER_LEN];
} RBr_t;

#define IN_RANGE(x, min, max) ((x)>=(min) && (x)<=(max))

__kernel void hotspot( int iteration, //number of iteration
                               global float *power, //power input
                               global float *temp_src, //temperature input/output
                               global float *temp_dst, //temperature input/output
                               int grid_cols, //Col of grid
                               int grid_rows, //Row of grid
                 int border_cols, // border offset
                 int border_rows, // border offset
                               float Cap, //Capacitance
                               float Rx,
                               float Ry,
                               float Rz,
                               float step,
//dlowell's argument
                               global RB_t *cB) {
  
  local float temp_on_cuda[BLOCK_SIZE][BLOCK_SIZE];
  local float power_on_cuda[BLOCK_SIZE][BLOCK_SIZE];
  local float temp_t[BLOCK_SIZE][BLOCK_SIZE]; // saving temporary temperature result

  float amb_temp = 80.0f;
  float step_div_Cap;
  float Rx_1,Ry_1,Rz_1;

  int bx = get_group_id(0);
  int by = get_group_id(1);

  int tx = get_local_id(0);
  int ty = get_local_id(1);

  step_div_Cap=step/Cap;

  Rx_1=1/Rx;
  Ry_1=1/Ry;
  Rz_1=1/Rz;

  // each block finally computes result for a small block
  // after N iterations.
  // it is the non-overlapping small blocks that cover
  // all the input data

  // calculate the small block size
  int small_block_rows = BLOCK_SIZE-iteration*2;//EXPAND_RATE
  int small_block_cols = BLOCK_SIZE-iteration*2;//EXPAND_RATE

  // calculate the boundary for the block according to
  // the boundary of its small block
  int blkY = small_block_rows*by-border_rows;
  int blkX = small_block_cols*bx-border_cols;
  int blkYmax = blkY+BLOCK_SIZE-1;
  int blkXmax = blkX+BLOCK_SIZE-1;

  // calculate the global thread coordination
  int yidx = blkY+ty;
  int xidx = blkX+tx;

  // load data if it is within the valid input range
  int loadYidx=yidx, loadXidx=xidx;
  int index = grid_cols*loadYidx+loadXidx;
       
  if(IN_RANGE(loadYidx, 0, grid_rows-1) && IN_RANGE(loadXidx, 0, grid_cols-1)){
            temp_on_cuda[ty][tx] = temp_src[index]; // Load the temperature data from global memory to shared memory
            power_on_cuda[ty][tx] = power[index];// Load the power data from global memory to shared memory
  }
  barrier(CLK_LOCAL_MEM_FENCE);

  // effective range within this block that falls within
  // the valid range of the input data
  // used to rule out computation outside the boundary.
  int validYmin = (blkY < 0) ? -blkY : 0;
  int validYmax = (blkYmax > grid_rows-1) ? BLOCK_SIZE-1-(blkYmax-grid_rows+1) : BLOCK_SIZE-1;
  int validXmin = (blkX < 0) ? -blkX : 0;
  int validXmax = (blkXmax > grid_cols-1) ? BLOCK_SIZE-1-(blkXmax-grid_cols+1) : BLOCK_SIZE-1;

  int N = ty-1;
  int S = ty+1;
  int W = tx-1;
  int E = tx+1;

  N = (N < validYmin) ? validYmin : N;
  S = (S > validYmax) ? validYmax : S;
  W = (W < validXmin) ? validXmin : W;
  E = (E > validXmax) ? validXmax : E;

  bool computed;
  for (int i=0; i<iteration ; i++){
    computed = false;

    if( IN_RANGE(tx, i+1, BLOCK_SIZE-i-2) && \
    IN_RANGE(ty, i+1, BLOCK_SIZE-i-2) && \
    IN_RANGE(tx, validXmin, validXmax) && \
    IN_RANGE(ty, validYmin, validYmax) ) {

      computed = true;
      temp_t[ty][tx] = temp_on_cuda[ty][tx] + step_div_Cap * (power_on_cuda[ty][tx] +
      (temp_on_cuda[S][tx] + temp_on_cuda[N][tx] - 2.0f * temp_on_cuda[ty][tx]) * Ry_1 +
      (temp_on_cuda[ty][E] + temp_on_cuda[ty][W] - 2.0f * temp_on_cuda[ty][tx]) * Rx_1 +
      (amb_temp - temp_on_cuda[ty][tx]) * Rz_1);

    }
    barrier(CLK_LOCAL_MEM_FENCE);
    
    if(i==iteration-1)
      break;
    if(computed) //Assign the computation range
      temp_on_cuda[ty][tx]= temp_t[ty][tx];
      
    barrier(CLK_LOCAL_MEM_FENCE);
  }

  if (computed){
    temp_dst[index]=temp_t[ty][tx];
  }
}

//-------------- end kernel file -----------------------------------

Hi,

When I compile this with clang, I get:

%struct.RB = type { i32, i32, [1048576 x i32], [1048576 x i32], [1048576 x i32], [1048576 x float] }

so I'm not sure why it is showing up as an opaque type for you.

-Tom

Ok, thanks for your help anyways. Perhaps AMDdev who put together this particular stack have an idea.
What compiler / version were you using?

Ok, thanks for your help anyways. Perhaps AMDdev who put together this particular stack have an idea.
What compiler / version were you using?

I tested it with clang 3.2 and also clang 3.6. Here is how I did it:

/clang -emit-llvm -include /usr/local/include/clc/clc.h -I/usr/local//include/ -Dcl_clang_storage_class_specifiers -c test.cl -S -o -

You can get the libclc headers from here:

-Tom