How to apply a custom pass to OpenMP offloading code

I’m trying to develop a custom pass and apply it to OpenMP target offloading code, but when I run opt I get the following error:
opt: <stdin>:29:1: error: expected top-level entity

source_filename = "test.cpp"

I receive this error message even if I do not use my pass and just try to use opt with -O1.

Sample code I tried to build:
$ cat test.cpp
int main()
{
#pragma omp target teams distribute parallel for
  for (int i = 0; i < 1000; i++); // empty loop body, offloaded to the device
  return 0;
}

Commands used to build and run the code:
$ clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_70 test.cpp
$ nvprof ./a.out
==85306== NVPROF is profiling process 85306, command: ./a.out
==85306== Profiling application: ./a.out
==85306== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   91.79%  30.752us         1  30.752us  30.752us  30.752us  __omp_offloading_2c_715c4b_main_l3
                    4.68%  1.5680us         1  1.5680us  1.5680us  1.5680us  [CUDA memcpy DtoH]
                    3.53%  1.1840us         1  1.1840us  1.1840us  1.1840us  [CUDA memcpy HtoD]
      API calls:   80.54%  362.46ms         1  362.46ms  362.46ms  362.46ms  cuDevicePrimaryCtxRetain
                   17.76%  79.946ms         1  79.946ms  79.946ms  79.946ms  cuDevicePrimaryCtxRelease
                    1.00%  4.4941ms         1  4.4941ms  4.4941ms  4.4941ms  cuModuleLoadDataEx
                    0.54%  2.4175ms         1  2.4175ms  2.4175ms  2.4175ms  cuModuleUnload
                    0.10%  446.26us        32  13.945us  2.0710us  174.72us  cuStreamCreate
                    0.02%  111.17us        32  3.4730us  2.9480us  13.457us  cuStreamDestroy
                    0.01%  64.949us         1  64.949us  64.949us  64.949us  cuStreamSynchronize
                    0.01%  44.065us         1  44.065us  44.065us  44.065us  cuMemcpyDtoH
                    0.01%  27.451us         1  27.451us  27.451us  27.451us  cuLaunchKernel
                    0.00%  12.617us         1  12.617us  12.617us  12.617us  cuDeviceGetPCIBusId
                    0.00%  9.4680us         1  9.4680us  9.4680us  9.4680us  cuMemcpyHtoD
                    0.00%  5.9570us         1  5.9570us  5.9570us  5.9570us  cuModuleGetFunction
                    0.00%  5.1610us         2  2.5800us  1.1090us  4.0520us  cuModuleGetGlobal
                    0.00%  5.1190us         6     853ns     277ns  1.9740us  cuCtxSetCurrent
                    0.00%  3.5270us         6     587ns     214ns  1.2310us  cuDeviceGetAttribute
                    0.00%  3.4210us         3  1.1400us     465ns  2.4020us  cuDeviceGetCount
                    0.00%  3.3210us         2  1.6600us  1.5970us  1.7240us  cuDeviceGet
                    0.00%  1.6230us         1  1.6230us  1.6230us  1.6230us  cuFuncGetAttribute
                    0.00%  1.1600us         1  1.1600us  1.1600us  1.1600us  cuDevicePrimaryCtxGetState
                    0.00%     685ns         1     685ns     685ns     685ns  cuDevicePrimaryCtxSetFlags
                    0.00%     346ns         1     346ns     346ns     346ns  cuCtxGetDevice

This shows that my Clang and OpenMP are built properly and target offloading works.

Next, to apply a pass to this code, I emit LLVM IR for it and then run opt with -O1 on the result:
$ clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_70 -emit-llvm -S test.cpp
$ opt -O1 < test.ll
opt: <stdin>:891:1: error: expected top-level entity
source_filename = "test.cpp"

If you open the IR file you generated, you'll see that it contains both host and device code,
so opt, which expects a single module, fails at the second source_filename line.
Short of manually performing every step the driver takes to build an offload binary, you cannot
run custom passes on it via opt. Load them via a plugin instead; that is the proper way to add custom passes.

~ Johannes