Offloading structure with dynamic arrays to target

Hi,

I am trying to move a structure variable from host to device for openmp offload region

I have structure something like this

typedef struct {
    int *xdim, *ydim, *zdim;
    int *strideX, *strideY, *strideZ;
    int *mgridStrideX, *mgridStrideY, *mgridStrideZ;
} argDims;

I have allocated memober of this structure and assigned some values to variable and then tried passing it offload pragma

argDims *argdim = (argDims*) calloc(1, sizeof(argDims));

    argdim[0].xdim = (int*) calloc(2, sizeof(int));
    argdim[0].strideX = (int*) calloc(2, sizeof(int));

    argdim[0].ydim = (int*) calloc(2, sizeof(int));
    argdim[0].strideY = (int*) calloc(2, sizeof(int));
// some assignments
#pragma omp target teams distribute parallel for collapse(2) map(to:argdim[0:1])
for (int n_x = start_indx[0]; n_x < end_indx[0]; n_x++)
        {
               //using argdim[0].xdim[0] here
        }

but this results in error

"PluginInterface" error: Faliure to copy data from device to host. Pointers: host = 0x00007ffcb711deb8, device = 0x00007fea9                                5600800, size = 8: Error in cuMemcpyDtoHAsync: an illegal memory access was encountered
Libomptarget error: Copying data from device failed.
Libomptarget error: Call to targetDataEnd failed, abort target.

so what is the correct way to pass structure

Pointers allocated on the host are generally not accessible on the device so they need to be mapped as you have done with the main struct. The problem is that you must also perform a deep copy to map over the other pointers contained within the struct. As you have written it, you are copying the values of the argdim[:1] struct over to the device. This means you can access it, but the elements inside are pointers to host memory. So, when you try to access it on the device you get the segmentation fault you’ve observed.

There’s a few solutions to this, the first would be to explicitly map all the sub values. E.g. something like this

#pragma omp target teams distribute parallel for collapse(2) \
  map(to:argdim[0:1]) map(to:argdim[0].xdim[:1]) map(to:argdim[0].strideX) ...

and so on. This is obviously really annoying, OpenMP has the mapper clause which can be used to automate this process somewhat Mapper Identifiers and mapper Modifiers. I’m not overly familiar with the syntax but there’s examples online.

Another solution is to simply allocate the memory for the device manually. This however means you’ll need to take charge of copying to and from the device. You can do that with the following.

  int *device_ptr =    
      omp_alloc(sizeof(int), llvm_omp_target_device_mem_alloc);        
#pragma omp target is_device_ptr(device_ptr)
   *device_ptr = 1;
  omp_free(device_ptr, llvm_omp_target_device_mem_alloc);

Alternatively, the easiest and slowest memory to use is so-called “managed” or “fine-grained” or “shared” memory that is accessible from both sides. This is slow, but saves you the burden of copying it.

  int *shared_ptr =                                
      omp_alloc(sizeof(int), llvm_omp_target_shared_mem_alloc); 
  *shared_ptr = 2;
#pragma omp target is_device_ptr(shared_ptr)
  *shared_ptr = 1;                      
  omp_free(shared_ptr, llvm_omp_target_shared_mem_alloc);

Let me know if this helps.

Hi Jhuber,

I tried defining mapper

it looks like following

typedef struct {
    size_t size;
    int *xdim, *ydim, *zdim;
    int *strideX, *strideY, *strideZ;
    int *mgridStrideX, *mgridStrideY, *mgridStrideZ;
} argDims_t;

#pragma omp declare mapper(4d_or_3d_mdim: argDims_t A) \
    map(A.strideX[0:A.size],A.strideY[0:A.size],A.strideZ[0:A.size], \
        A.xdim[0:A.size],A.ydim[0:A.size],A.zdim[0:A.size])

and then using it like

argDims_t argdim;

    argdim.size = 2;
    argdim.strideX = (int*) calloc(argdim.size, sizeof(int));
    argdim.strideY = (int*) calloc(argdim.size, sizeof(int));
// allocation and assignments

#pragma omp target teams distribute parallel for collapse(2) map(tofrom:A_p[0:arg0_size]) \
                                                                map(mapper(4d_or_3d_mdim), to: argdim)

but it result in compilation error

error: illegal OpenMP user-defined mapper identifier
#pragma omp declare mapper(4d_or_3d_mdim: argDims_t A) \

Try to use mapper_4d_or_3d_mdim or some other id, starting from letter, not a digit.

Hi Alexey

Able to compile now

but getting runtime error for following piece of code

#pragma omp target teams distribute parallel for reduction(max:p_a2_0) map(to:A_p[0:arg0_size]) map(tofrom:Anew_p[0:arg1_size]) \
                                                    map(mapper(mapper_2d_or_1d_mdim), to: argdim)
      for (int n_y = start_indx[1]; n_y < end_indx[1]; n_y++)
      {
        for (int n_x = start_indx[0]; n_x < end_indx[0]; n_x++)
        {

                const  ACC<double> A(argdim.xdim[0], A_p + (n_x * argdim.strideX[0]) + (n_y * argdim.xdim[0] * argdim.strideY[0]));

                 ACC<double> Anew(argdim.xdim[1], Anew_p + (n_x * argdim.strideX[1]) + (n_y * argdim.xdim[1] * argdim.strideY[1]));

            double error[1];
            error[0] = -INFINITY_double;

  Anew(0,0) = 0.25f * ( A(1,0) + A(-1,0)
      + A(0,-1) + A(0,1));
  *error = fmax( *error, fabs(Anew(0,0)-A(0,0)));

                        p_a2_0 = MAX(p_a2_0, error[0]);

        }

    }

the error is

Libomptarget message: explicit extension not allowed: host address specified is 0x00007ffc308e5978 (40 bytes), bu                                           t device allocation maps to host at 0x00007ffc308e5990 (40 bytes)
Libomptarget error: Call to getTargetPointer returned null pointer (device failure or illegal mapping).
Libomptarget error: Call to targetDataBegin via targetDataMapper for custom mapper failed.
Libomptarget error: Call to targetDataBegin failed, abort target.
Libomptarget error: Failed to process data before launching the kernel.
Libomptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
apply_stencil_kernel.cpp:121:5: Libomptarget fatal error 1: failure of target construct while offloading is manda                                           tory
Aborted

earlier this used to work without mapper

the below one is another loop which runs fine

#pragma omp target teams distribute parallel for collapse(2) map(tofrom:A_p[0:arg0_size]) \
                                                                map(mapper(mapper_2d_or_1d_mdim), to: argdim)
      for (int n_y = start_indx[1]; n_y < end_indx[1]; n_y++)
      {
        for (int n_x = start_indx[0]; n_x < end_indx[0]; n_x++)
        {
            int idx[] = {arg_idx[0] + n_x, arg_idx[1] + n_y};

                 ACC<double> A(argdim.xdim[0], A_p + (n_x * argdim.strideX[0]) + (n_y * argdim.xdim[0] * argdim.strideY[0]));

  A(0,0) = sin(pi * (idx[1]+1) / (jmax+1));

        }

    }

Check the size field and the allocated buffer, probably there is a mismatch

Hi Alexey,

The allocation looks ok to me

here is the allocation

argDims_t argdim;

    argdim.size = 3;
    argdim.strideX = (int*) calloc(argdim.size, sizeof(int));
    argdim.strideY = (int*) calloc(argdim.size, sizeof(int));
    argdim.xdim = (int*) calloc(argdim.size, sizeof(int));

and the resp mapper

#pragma omp declare mapper(mapper_2d_or_1d_mdim: argDims_t A) \
    map(A.strideX[0:A.size],A.strideY[0:A.size], \
        A.xdim[0:A.size])

the same mapper i can use for different object of argDims_t structure right??

typedef struct {
    size_t size;
    int *xdim, *ydim, *zdim;
    int *strideX, *strideY, *strideZ;
    int *mgridStrideX, *mgridStrideY, *mgridStrideZ;
} argDims_t;

I have different mapper defined for different condition and only allocate few arrays based on that condition and use the resp mapper.

This would not be a problem right

I am actually working on code-generation library which helps to generate auto code-generation of openmp offload code

so i am allocating the variables as per need only
for ex. for 1D dataset i need only strideX while for 2D i will need strideX and xdim
so i have two separate mapper for this

What is Anew_p[0:arg1_size]? How do you allocate memory for it?

Those are already allocated arrays

ths piece of code used to work earlier, when instead of mapper, I was using local variablesfor the dim and strides within the openmp offload loop for ex. xdim0, xdim1 instead of argdim.xdim[0] and argdim.xdim[1]

but for other applications, i was getting “Too many argument” error. So i thought of putting this dim and strides in structure and then pass it to avoid this error

right now it is is failing at new kernel, the earlier is running now

this is the kernel where it fails now

argDims_t argdim;

    argdim.size = 2;
    argdim.strideX = (int*) calloc(argdim.size, sizeof(int));
    argdim.strideY = (int*) calloc(argdim.size, sizeof(int));
    argdim.xdim = (int*) calloc(argdim.size, sizeof(int));

    argdim.strideX[0] = args[0].stencil->stride[0];
    argdim.strideY[0] = args[0].stencil->stride[1];
    argdim.xdim[0] = args[0].dat->size[0];
    argdim.strideX[1] = args[1].stencil->stride[0];
    argdim.strideY[1] = args[1].stencil->stride[1];
    argdim.xdim[1] = args[1].dat->size[0];

    #pragma omp target teams distribute parallel for collapse(2) map(tofrom:A_p[0:arg0_size]) map(to:Anew_p[0:arg1_size]) \
                                                                map(mapper(mapper_2d_or_1d_mdim), to: argdim)
      for (int n_y = start_indx[1]; n_y < end_indx[1]; n_y++)
      {
        for (int n_x = start_indx[0]; n_x < end_indx[0]; n_x++)
        {

                 ACC<double> A(argdim.xdim[0], A_p + (n_x * argdim.strideX[0]) + (n_y * argdim.xdim[0] * argdim.strideY[0]));

                const  ACC<double> Anew(argdim.xdim[1], Anew_p + (n_x * argdim.strideX[1]) + (n_y * argdim.xdim[1] * argdim.strideY[1]));

  A(0,0) = Anew(0,0);

        }

    }

You can follow the along what the runtime allocates and associates with LIBOMPTARGET_DEBUG=-1 and LIBOMPTARGET_INFO=-1, more on the options is described on our webpage openmp.llvm.org.
It’s hard to help you from these partial programs. If you have something you think should work but doesn’t can you make it a small standalone program we can look at and run?

Hi Guys,
Thanks for all suggestion.

I followed first suggestion → describing map(to: list) for each pointers inside structure instead of using Mapper

its working!!!