@jhuber6 Thanks, that worked. I am waiting for the GPU allocation and will test it soon. Will update you if that helps.
@jhuber6 llvm_omp_target_host_mem_alloc
does not solve the issue. Zero copy still does not work, nvvp shows the same behaviour nonetheless.
To copy the data, I am using
int host = omp_get_initial_device();
int device = omp_get_default_device();
void* device_ptr = omp_get_mapped_ptr(*to, device);
omp_target_memcpy(device_ptr, *from, size, 0, 0, device, host);
Is that the issue?
I’m confused. You want zero copy, right? So direct access from the device to the host memory? And you said you want the host memory to be pinned and stay on the host, right?
If that is so, do not copy memory to the device. If you call omp_target_memcpy
, we will copy memory, so no zero copy.
int *ptr = (int*)omp_alloc(4, llvm_omp_target_host_mem_alloc);
#pragma omp target is_device_ptr(ptr)
*ptr = 42
printf("val %i\n", *ptr);
I do need to copy it to *to
from *from
, in that case, do I just do a memcpy? or a target memcpy but both to and from are device?
*to
is a device pointer associated with a host ptr.
void* device_ptr = omp_target_alloc(bytes, device);
omp_target_associate_ptr(*to, device_ptr, bytes, 0, device);
I’m a little confused as well. The CUDA documentation describes that the pointers allocated here should be unified. There should not need to be any copies, see CUDA Driver API :: CUDA Toolkit Documentation .
All host memory allocated in all contexts using cuMemAllocHost() and cuMemHostAlloc() is always directly accessible from all contexts on all devices that support unified addressing. This is the case regardless of whether or not the flags CU_MEMHOSTALLOC_PORTABLE and CU_MEMHOSTALLOC_DEVICEMAP are specified.
You shouldn’t need to do any association here. I believe since OpenMP 5.1 we treat implicitly captured arguments as firstprivate if they are not already present in the mapping table. That means you should just be able to do the following and have it “just work”
#include <omp.h>
int main() {
int *ptr = (int*)omp_alloc(4, llvm_omp_target_host_mem_alloc);
#pragma omp target
{ *ptr = 1; }
int v = *ptr
omp_free(ptr, llvm_omp_target_host_mem_alloc);
return v;
}
*to and *from are both dynamically allocated arrays. *from is allocated with the llvm_omp_target_host_mem_alloc where as *to is a device array. The original datatypes of to and from is **void so it might be confusing when I wrote *to, *to is an array not a value.
complete mallochost. *ptr should be *from
void ops_device_mallochost(OPS_instance *instance, void** ptr, size_t bytes) {
#if defined(__clang__)
*ptr = omp_alloc(bytes, llvm_omp_target_host_mem_alloc);
#else
*ptr = ops_malloc(bytes);
#endif
}
*to should be *ptr:
void ops_device_malloc(OPS_instance *instance, void** ptr, size_t bytes) {
*ptr = ops_malloc(bytes);
int device = omp_get_default_device();
void* device_ptr = omp_target_alloc(bytes, device);
omp_target_associate_ptr(*ptr, device_ptr, bytes, 0, device);
}
and this is h2d:
void ops_device_memcpy_h2d(OPS_instance *instance, void** to, void **from, size_t size) {
int host = omp_get_initial_device();
int device = omp_get_default_device();
void* device_ptr = omp_get_mapped_ptr(*to, device);
omp_target_memcpy(device_ptr, *from, size, 0, 0, device, host);
}
The only option I can think of is
#pragma omp target is_device_ptr(*from)
memcpy(*to, *from, size);
will that work?
That would definitely “work”, but it would probably be very slow. Also you’d need to use __builtin_memcpy
unless you’re an early adopter of my libc for GPUs — The LLVM C Library project.
You should be able to copy this stuff with omp_target_memcpy
as you’ve done, but then it’s not zero copy anymore. What I thought you had in mind when you said “zero copy” was to use this memory directly instead of device memory. The overhead of launching a kernel to do the copying is almost certainly going to be on the order of the memcpy itself. If the amount of memory in the host pointer is smaller it’d probably be best to roll writing back the result into the main kernel, e.g.
void *device = omp_alloc(size, llvm_omp_target_device_mem_alloc)
void *host = omp_alloc(size, llvm_omp_target_host_mem_alloc)
#pragma omp target
{
work(device);
update(device, host);
}
With how the application is designed, it is not possible to change that. I’ll need to search and change a fair bit of code in order to that. What I want is to copy call the values from the pointer allocated with mallochost to a device pointer. Is there no way the target memcpy will work in that case?
In the cuda variant, these operations work just fine:
void ops_device_malloc(OPS_instance *instance, void** ptr, size_t bytes) {
cutilSafeCall(instance->ostream(), cudaMalloc(ptr, bytes));
}
void ops_device_mallochost(OPS_instance *instance, void** ptr, size_t bytes) {
cutilSafeCall(instance->ostream(), cudaMallocHost(ptr, bytes));
}
void ops_device_memcpy_h2d(OPS_instance *instance, void** to, void **from, size_t size) {
cutilSafeCall(instance->ostream(), cudaMemcpy(*to, *from, size, cudaMemcpyHostToDevice));
}
cudaMalloc → omp_target_alloc
cudaMallocHost → omp_alloc(size, llvm_omp_target_host_mem_alloc)
cudaMemcpy → omp_target_memcpy
EDIT:
I now see you want to return the host pointer from ops_device_mallochost
, ok. What did not work with that approach? Zero Copy openmp offload - #13 by addy419 ?
So far, so good.
Now you say you want 0 copy, but in fact you want explicit copies, correct?
You can get either, but not both, I mean 0 copies contradicts explicit copies.
Let’s assume for a second you want to emulate your ops_device_ functions:
void ops_device_malloc(OPS_instance *instance, void** ptr, size_t bytes) {
*ptr = omp_target_alloc(bytes, omp_get_default_device());
}
void ops_device_mallochost(OPS_instance *instance, void** ptr, size_t bytes) {
*ptr = omp_alloc(size, llvm_omp_target_host_mem_alloc);
}
void ops_device_memcpy_h2d(OPS_instance *instance, void** to, void **from, size_t size) {
omp_target_memcpy(*to, *from, size, 0, 0, /* Dst Device */ omp_get_default_device(), /* Src Device */ omp_get_initial_device());
}
The invocation for cudaMemcpy are far more in openmp offload than in cuda. For example in openmp offload, application makes 5561 calls to [CUDA memcpy HtoD]. For cuda, same application makes 192 calls. It takes a total of 229.15us for HtoD in cuda and 6.5745ms for ompoffload. I assumed this is because the memory is not pinned, but same thing happened after pinning. Due to this, the average runtime which is 27 sec for cuda changes to 92 sec for ompoffload when running more iterations (the profiling was done on 2 iterations, I need 2955 iterations in total for the whole application).
OK, now we are talking.
You can check why and what OpenMP is actually transferring. Our documentation has more info (LLVM/OpenMP Runtimes — LLVM/OpenMP 18.0.0git documentation), but basically try:
env LIBOMPTARGET_INFO=$((0x20)) ./your-application
If you don’t have line numbers and variable names in the output, add -gline-tables-only
to your compilation.
You can also use
env LIBOMPTARGET_PROFILE=prof.json ./your-application
to get a chrome profile file, which might shed light on some things, see also LLVM/OpenMP Runtimes — LLVM/OpenMP 18.0.0git documentation
My office hour was earlier today, but if you are stuck, let me know.
We also have a slack where you can get quick help:
https://join.slack.com/t/openmp-opt/shared_invite/zt-1my2xbwds-VqpEQc2SIOkQ5~U7vZzZXg
Hi @jdoerfert, thank you so much for the help. I figured out that the loop variables (start[0], end[0], start[1], end[1] etc) were being continuously moved to the target which was causing this issue. changing them to normal integers reduced the runtime from 81 sec to 53 sec. With just that one change, HtoD calls reduced from 5561 to 1767. I need to move other variables as well, but I’m thankful for the hint.
Kind Regards,
Aditya
@jdoerfert The variables for reductions are transferred every time as well. Is that a normal thing? Libomptarget device 0 info: Copying data from host to device, HstPtr=0x00007ffcab90e6c8, TgtPtr=0x000014b24d806000, Size=8, Name=p_a7_0
Libomptarget device 0 info: Copying data from host to device, HstPtr=0x00007ffcab90e6c0, TgtPtr=0x000014b24d806200, Size=8, Name=p_a8_0
Libomptarget device 0 info: Copying data from host to device, HstPtr=0x00007ffcab90e6b8, TgtPtr=0x000014b24d805a00, Size=8, Name=p_a9_0
Libomptarget device 0 info: Copying data from host to device, HstPtr=0x00007ffcab90e6b0, TgtPtr=0x000014b24d805c00, Size=8, Name=p_a10_0
Libomptarget device 0 info: Copying data from host to device, HstPtr=0x00007ffcab90e7b8, TgtPtr=0x000014b24d805e00, Size=8, Name=p_a11_0
Libomptarget device 0 info: Copying data from device to host, TgtPtr=0x000014b24d805e00, HstPtr=0x00007ffcab90e7b8, Size=8, Name=p_a11_0
Libomptarget device 0 info: Copying data from device to host, TgtPtr=0x000014b24d805c00, HstPtr=0x00007ffcab90e6b0, Size=8, Name=p_a10_0
Libomptarget device 0 info: Copying data from device to host, TgtPtr=0x000014b24d805a00, HstPtr=0x00007ffcab90e6b8, Size=8, Name=p_a9_0
Libomptarget device 0 info: Copying data from device to host, TgtPtr=0x000014b24d806200, HstPtr=0x00007ffcab90e6c0, Size=8, Name=p_a8_0
Libomptarget device 0 info: Copying data from device to host, TgtPtr=0x000014b24d806000, HstPtr=0x00007ffcab90e6c8, Size=8, Name=p_a7_0
It depends, if you have a
#pragma omp target teams distribute parallel for reduction(+:var) <clauses>
we need to copy the value at least from the device so you can use it on the host, right?
That said, we probably could avoid the h2d transfer.
If you want the data to stay on the device, before and after, you need to write it differently, e.g.
#pragma omp target teams
{
double var = ...;
#pragma omp distribute parallel for reduction(+:var)
...
}
That said, you probably want to enable remarks (-Rpass=openmp-opt -Rpass-analysis=openmp-opt -Rpass-missed=openmp-opt
, see OpenMP Optimization Remarks — LLVM/OpenMP 18.0.0git documentation) to ensure the code is properly optimized.
If any remark is a “bad” one, OMP112 or OMP130, you should investigate.
Also, LIBOMPTARGET_INFO=16 will tell you how the code is executed, which should hopefully be (Generic-)SPMD mode.
Hi @jdoerfert I am wasting quite a bit of time in this
67.98% 264.83ms 3 88.277ms 569ns 157.89ms cuDevicePrimaryCtxRetain
Can you tell me what this is?
Btw, these are the remarks, found one 112, but I don’t know where it is, since it is a library function.
remark: <unknown>:0:0: Found thread data sharing on the GPU. Expect degraded performance due to data globalization. [OMP112]
remark: ./OpenMP_offload/initialise_chunk_kernel_xx_ompoffload_kernel.cpp:101:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_xx_ompoffload_kernel.cpp:101:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_yy_ompoffload_kernel.cpp:101:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_yy_ompoffload_kernel.cpp:101:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_zz_ompoffload_kernel.cpp:101:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_zz_ompoffload_kernel.cpp:101:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_x_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_x_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_y_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_y_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_z_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_z_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_cellx_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_cellx_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_celly_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_celly_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_cellz_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_cellz_ompoffload_kernel.cpp:97:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_volume_ompoffload_kernel.cpp:122:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/initialise_chunk_kernel_volume_ompoffload_kernel.cpp:122:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/generate_chunk_kernel_ompoffload_kernel.cpp:147:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/generate_chunk_kernel_ompoffload_kernel.cpp:147:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/ideal_gas_kernel_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/ideal_gas_kernel_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_b2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_b2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_b1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_b1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_t2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_t2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_t1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_t1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_l2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_l2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_l1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_l1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_r2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_r2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_r1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_r1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_ba2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_ba2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_ba1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_ba1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_fr2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_fr2_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_fr1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel1_fr1_ompoffload_kernel.cpp:143:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_4_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_4_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_2_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_2_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_4_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_4_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_2_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_2_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_minus_4_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_minus_4_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_minus_2_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_minus_2_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_minus_4_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_minus_4_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_minus_2_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_minus_2_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_xvel_plus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_minus_4_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_minus_4_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_minus_2_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_minus_2_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_minus_4_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_minus_4_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_minus_2_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_minus_2_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_4_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_4_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_2_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_2_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_4_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_4_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_2_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_2_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_yvel_plus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_4_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_4_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_2_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_2_bot_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_4_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_4_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_2_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_2_top_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_4_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_4_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_2_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_2_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_4_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_4_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_2_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_plus_2_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_minus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_minus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_minus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_minus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_minus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_minus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_minus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel2_zvel_minus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_4_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_4_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_2_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_2_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_4_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_4_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_2_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_2_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_minus_4_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_minus_4_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_minus_2_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_minus_2_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_minus_4_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_minus_4_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_minus_2_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_minus_2_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel3_plus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_minus_4_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_minus_4_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_minus_2_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_minus_2_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_minus_4_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_minus_4_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_minus_2_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_minus_2_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_4_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_4_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_2_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_2_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_4_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_4_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_2_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_2_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel4_plus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_4_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_4_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_2_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_2_a_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_4_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_4_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_2_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_2_b_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_4_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_4_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_2_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_2_left_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_4_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_4_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_2_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_plus_2_right_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_minus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_minus_4_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_minus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_minus_2_back_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_minus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_minus_4_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_minus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/update_halo_kernel5_minus_2_front_ompoffload_kernel.cpp:112:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/field_summary_kernel_ompoffload_kernel.cpp:176:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/field_summary_kernel_ompoffload_kernel.cpp:176:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/viscosity_kernel_ompoffload_kernel.cpp:153:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/viscosity_kernel_ompoffload_kernel.cpp:153:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/calc_dt_kernel_ompoffload_kernel.cpp:166:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/calc_dt_kernel_ompoffload_kernel.cpp:166:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/calc_dt_kernel_min_ompoffload_kernel.cpp:102:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/calc_dt_kernel_min_ompoffload_kernel.cpp:102:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/calc_dt_kernel_get_ompoffload_kernel.cpp:133:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/calc_dt_kernel_get_ompoffload_kernel.cpp:133:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/calc_dt_kernel_print_ompoffload_kernel.cpp:166:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/calc_dt_kernel_print_ompoffload_kernel.cpp:166:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/PdV_kernel_predict_ompoffload_kernel.cpp:166:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/PdV_kernel_predict_ompoffload_kernel.cpp:166:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/PdV_kernel_nopredict_ompoffload_kernel.cpp:185:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/PdV_kernel_nopredict_ompoffload_kernel.cpp:185:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/revert_kernel_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/revert_kernel_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/accelerate_kernel_ompoffload_kernel.cpp:166:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/accelerate_kernel_ompoffload_kernel.cpp:166:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/flux_calc_kernelx_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/flux_calc_kernelx_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/flux_calc_kernely_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/flux_calc_kernely_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/flux_calc_kernelz_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/flux_calc_kernelz_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel1_xdir_ompoffload_kernel.cpp:116:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel1_xdir_ompoffload_kernel.cpp:116:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel2_xdir_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel2_xdir_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel3_xdir_ompoffload_kernel.cpp:128:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel3_xdir_ompoffload_kernel.cpp:128:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel4_xdir_ompoffload_kernel.cpp:147:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel4_xdir_ompoffload_kernel.cpp:147:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel1_ydir_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel1_ydir_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel2_ydir_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel2_ydir_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel3_ydir_ompoffload_kernel.cpp:128:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel3_ydir_ompoffload_kernel.cpp:128:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel4_ydir_ompoffload_kernel.cpp:147:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel4_ydir_ompoffload_kernel.cpp:147:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel1_zdir_ompoffload_kernel.cpp:116:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel1_zdir_ompoffload_kernel.cpp:116:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel2_zdir_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel2_zdir_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel3_zdir_ompoffload_kernel.cpp:128:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel3_zdir_ompoffload_kernel.cpp:128:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel4_zdir_ompoffload_kernel.cpp:147:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_cell_kernel4_zdir_ompoffload_kernel.cpp:147:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_x1_ompoffload_kernel.cpp:116:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_x1_ompoffload_kernel.cpp:116:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_z1_ompoffload_kernel.cpp:116:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_z1_ompoffload_kernel.cpp:116:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_x2_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_x2_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_y2_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_y2_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_x3_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_x3_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_z3_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_z3_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_mass_flux_x_ompoffload_kernel.cpp:91:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_mass_flux_x_ompoffload_kernel.cpp:91:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_post_pre_advec_x_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_post_pre_advec_x_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel1_x_nonvector_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel1_x_nonvector_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel2_x_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel2_x_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_mass_flux_y_ompoffload_kernel.cpp:91:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_mass_flux_y_ompoffload_kernel.cpp:91:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_post_pre_advec_y_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_post_pre_advec_y_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel1_y_nonvector_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel1_y_nonvector_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel2_y_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel2_y_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_mass_flux_z_ompoffload_kernel.cpp:91:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_mass_flux_z_ompoffload_kernel.cpp:91:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_post_pre_advec_z_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel_post_pre_advec_z_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel1_z_nonvector_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel1_z_nonvector_ompoffload_kernel.cpp:110:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel2_z_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/advec_mom_kernel2_z_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/reset_field_kernel1_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/reset_field_kernel1_ompoffload_kernel.cpp:103:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/reset_field_kernel2_ompoffload_kernel.cpp:116:3: Moving memory allocation from the heap to the stack.
remark: ./OpenMP_offload/reset_field_kernel2_ompoffload_kernel.cpp:116:3: Moving memory allocation from the heap to the stack.
remark: src/ompoffload/ops_ompoffload_common.cpp:125:3: Moving memory allocation from the heap to the stack.
remark: src/ompoffload/ops_ompoffload_common.cpp:125:3: Moving memory allocation from the heap to the stack.
remark: src/ompoffload/ops_ompoffload_rt_support_kernels.cpp:68:1: Moving memory allocation from the heap to the stack.
remark: src/ompoffload/ops_ompoffload_rt_support_kernels.cpp:68:1: Moving memory allocation from the heap to the stack.
remark: src/ompoffload/ops_ompoffload_rt_support_kernels.cpp:117:1: Moving memory allocation from the heap to the stack.
remark: src/ompoffload/ops_ompoffload_rt_support_kernels.cpp:117:1: Moving memory allocation from the heap to the stack.
Not much more than the NVIDIA docs can:
That said, are you really “wasting time in this” IT says 68% but it also says 265ms. This function should not be called often, I am surprised to see it called 3 times, potentially once per device? Are you sure this is a bottleneck in your application?
That might be just fine. You can try to compile with assertions on, that often helps;
-fopenmp-target-assume-no-thread-state
-fopenmp-target-assume-no-nested-parallelism
It was an issue with SPMD, looking more into it, I realised that clang does not support omp target loop yet (not perfectly at least). With NVHPC, a significant performance bump was observed.
So, omp target loop
is not openmp at all. omp target teams loop
can be used as a shorthand for the omp target teams distribute parallel for simd
form.
That said, did LIBOMPTARGET_INFO=16 tell you the kernels were executed in Generic mode? or (Generic-)SPMD mode?