Unpacking variadic templates inside a target region is extremely slow with the clang compiler

Hello All,
I have observed that unpacking variadic templates inside an OpenMP target region is extremely slow compared to manually writing member functions for each case.

I have written a test-case that is attached to this email.

The test case contains two classes for multi-dimensional arrays:

  1. Class specialized for 2D arrays (Array2D)
  2. Class that accepts a parameter pack and can hence be used to create generic MD arrays of any reasonable number of dimensions (ArrayMD).

I observe that a 2D array created with the ArrayMD class is ~50x slower when its elements are accessed inside the target region than the same array created with the Array2D class.
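
To make the comparison concrete without opening the tarball, here is a minimal host-only sketch of what the two classes might look like. It is not the code from the attachment: the member names, the row-major layout, the way the pack is unpacked into a local index array, and the helper same_layout are all assumptions made for illustration. The target pragma and data mapping are left out on purpose, since they depend on the attached test case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hand-written 2D case: the index arithmetic is spelled out explicitly.
// (Hypothetical layout: row-major, non-owning pointer.)
template <typename T>
struct Array2D {
    std::size_t n1, n2;
    T* data;
    Array2D(T* p, std::size_t d1, std::size_t d2) : n1(d1), n2(d2), data(p) {}
    T& operator()(std::size_t i, std::size_t j) const { return data[i * n2 + j]; }
};

// Generic N-dimensional case: the indices arrive as a parameter pack.
// (Hypothetical: one possible way to unpack the pack, via a local array.)
template <typename T, std::size_t N>
struct ArrayMD {
    std::array<std::size_t, N> dims;
    T* data;
    ArrayMD(T* p, const std::array<std::size_t, N>& d) : dims(d), data(p) {}

    template <typename... Idx>
    T& operator()(Idx... idx) const {
        static_assert(sizeof...(Idx) == N, "one index per dimension");
        // Expand the pack into a small local array of indices.
        const std::size_t ids[N] = {static_cast<std::size_t>(idx)...};
        // Row-major offset: ((i0 * d1) + i1) * d2 + i2 ...
        std::size_t offset = 0;
        for (std::size_t d = 0; d < N; ++d) offset = offset * dims[d] + ids[d];
        return data[offset];
    }
};

// Host-side check that both classes address the same element for every (i, j).
bool same_layout(std::size_t n1, std::size_t n2) {
    assert(n1 > 0 && n2 > 0);
    std::vector<double> buf(n1 * n2, 0.0);
    Array2D<double> a(buf.data(), n1, n2);
    ArrayMD<double, 2> b(buf.data(), std::array<std::size_t, 2>{{n1, n2}});
    for (std::size_t i = 0; i < n1; ++i)
        for (std::size_t j = 0; j < n2; ++j)
            if (&a(i, j) != &b(i, j)) return false;
    return true;
}
```

In this sketch both classes compute the same offset, so any difference in speed would come from how the compiler lowers the pack expansion for the device, not from the amount of arithmetic. Whether the attached ArrayMD unpacks its indices this way is something only the test case can confirm.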

Clang-version = LLVM/10

There is an associated makefile for the clang compiler that accepts the following parameters:
make (sequential)
make OPENMP=y (OpenMP 3.0)
make OPENMP_TARGET=y (OpenMP 4.5)
make OPENMP_TARGET=y VD=y (OpenMP 4.5, using the ArrayMD class to create the 2D array)

Could someone take a look at this test case?

Regards,
Rahul.

testCase_variadicTemplates.tar (20 KB)