Hello OpenMP,
Our language spec seems fairly light on what it means to call malloc from a target region.
I can think of a few interpretations:
- One heap per process. Malloc on target or host, free from either. Writable from either, or some other device. Might mean intercepting host libc. Convenient, slow.
- One heap per device + one for host. Each independent, pointers only valid on the thing that called malloc.
- One heap per target offload region, inaccessible from host.
- Some other granularity.
Generally gets faster as the restrictions increase.
Is anyone willing to state or guess what they or their users would expect? Bear in mind that C++ `new` is likely to call `malloc` and will gain the same properties.
Thanks,
Jon
Hi, Jon,
This is a great question.
With reverse-offload support, and unified memory, we can support a model where memory allocation triggers reverse offload to the memory allocator on the host. In this mode, everything works as expected. We can, of course, do some static analysis and move allocations that don’t escape to use some local allocation scheme, such as what we use without unified memory + reverse offload.
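As a rough illustration of the unified-memory half of this model, here is a minimal C sketch in which the host allocates and a target region writes through the same pointer. It assumes an implementation that honours `requires unified_shared_memory`; the names `fill_on_device` and `COUNT` are made up for the example. It does not show device-side `malloc` or reverse offload, only why a single host heap can be enough once pointers are valid on both sides.

```c
#include <stdlib.h>

/* Request unified shared memory; this must precede any target construct.
   Assumption: the implementation and hardware support it. */
#pragma omp requires unified_shared_memory

#define COUNT 4  /* hypothetical buffer length */

int fill_on_device(void) {
    /* Allocated from the ordinary host heap. */
    int *data = (int *)malloc(COUNT * sizeof(int));
    if (!data)
        return -1;

    /* Under unified shared memory the host pointer can be dereferenced
       on the device without mapping the pointee. */
    #pragma omp target
    {
        for (int i = 0; i < COUNT; ++i)
            data[i] = 10 * i;
    }

    int last = data[COUNT - 1];  /* host reads what the target region wrote */
    free(data);                  /* freed by the same host allocator */
    return last;
}
```

Built without OpenMP the pragmas are ignored and the function simply runs on the host.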
Without such support, I think that “One heap per device + one for host. Each independent, pointers only valid on the thing that called malloc.” makes the most sense. This also, as far as I know, matches what’s available in CUDA today.
“One heap per target offload region” doesn’t make sense to me. One might clearly want to allocate in one target region, store the pointers in some data structure, and then access them in some other target region on the same device.
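To make the cross-region case concrete, here is a minimal C sketch. It assumes the "one heap per device" model, i.e. that `malloc` is callable inside a target region and that the returned pointer stays valid in later target regions on the same device; the spec does not currently promise this. The names `device_buf`, `N`, and `sum_via_device_heap` are hypothetical.

```c
#include <stdlib.h>

#define N 8  /* hypothetical element count */

/* Device-resident global that holds a device-heap address between
   target regions. */
#pragma omp declare target
int *device_buf = NULL;
#pragma omp end declare target

int sum_via_device_heap(void) {
    int sum = 0;

    /* Region 1: allocate from the device heap and stash the pointer. */
    #pragma omp target
    {
        device_buf = (int *)malloc(N * sizeof(int));
        if (device_buf)
            for (int i = 0; i < N; ++i)
                device_buf[i] = i;
    }

    /* Region 2: same device, dereference the stored pointer. */
    #pragma omp target map(tofrom: sum)
    {
        if (device_buf)
            for (int i = 0; i < N; ++i)
                sum += device_buf[i];
    }

    /* Region 3: free on the device that did the allocation. */
    #pragma omp target
    {
        free(device_buf);
        device_buf = NULL;
    }

    return sum;
}
```

With a heap per target region, the pointer read in region 2 would already be dangling. On a host-only build (pragmas ignored) the function returns 28.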
Thanks again,
Hal
Does OpenMP 5 allow calling symbols that are not omp-declare-target?
In any case, I am concerned about malloc interception, having seen that break many times in the past. OpenMP isn’t the only thing that wants to intercept malloc, and nested interception may not be stable.
Why is omp_alloc not the best option here? That avoids essentially all of the issues I can think of.
Jeff
> Does OpenMP 5 allow calling symbols that are not omp-declare-target?
I'm not sure this is relevant because the implementation gets to provide such symbols (and we depend on it doing so in other cases, such as for math.h functions).
> In any case, I am concerned about malloc interception, having seen that break many times in the past. OpenMP isn't the only thing that wants to intercept malloc, and nested interception may not be stable.
Why are you bringing up interception? I did not think that anything in this message implied intercepting malloc.
-Hal
Is LLVM OpenMP going to assume one and only C RTL? Which one?
Supporting glibc, musl, as well as third party allocators like tbbmalloc, jemalloc, tcmalloc, etc. may be broken by such assumptions.
>> In any case, I am concerned about malloc interception, having seen that break many times in the past. OpenMP isn’t the only thing that wants to intercept malloc, and nested interception may not be stable.
> Why are you bringing up interception? I did not think that anything in this message implied intercepting malloc.
I was responding to the previous email where it said “Might mean intercepting host libc.”
Jeff
>>> Does OpenMP 5 allow calling symbols that are not omp-declare-target?
>> I'm not sure this is relevant because the implementation gets to provide such symbols (and we depend on it doing so in other cases, such as for math.h functions).
> Is LLVM OpenMP going to assume one and only C RTL? Which one?
> Supporting glibc, musl, as well as third party allocators like tbbmalloc, jemalloc, tcmalloc, etc. may be broken by such assumptions.
>>> In any case, I am concerned about malloc interception, having seen that break many times in the past. OpenMP isn't the only thing that wants to intercept malloc, and nested interception may not be stable.
>> Why are you bringing up interception? I did not think that anything in this message implied intercepting malloc.
> I was responding to the previous email where it said “Might mean intercepting host libc.”
Ah, thanks. I think we should support this only in cases where we can do so without interception. On Linux + HMM, for example. As you point out, it's hard to do this in a robust way otherwise.
-Hal
>>>>> I can think of a few interpretations:
>>>>> - One heap per process. Malloc on target or host, free from either. Writable from either, or some other device. Might mean intercepting host libc. Convenient, slow.
>>>>> - One heap per device + one for host. Each independent, pointers only valid on the thing that called malloc.
>>>>> - One heap per target offload region, inaccessible from host.
>>>>> - Some other granularity.
I think the only thing we can reasonably support and that makes sense in
the general picture is:
One heap per device + one for host
>>> Does OpenMP 5 allow calling symbols that are not omp-declare-target?
>>
>> I'm not sure this is relevant because the implementation gets to provide such symbols (and we depend on it doing so in other cases, such as for math.h functions).
>>
> Is LLVM OpenMP going to assume one and only C RTL? Which one?
Yeah, that is a good question, ...
I think your last thought was the right starting point: you need to
mark the symbol `declare target` to access it on a device. That said,
the compiler will automagically provide some symbols as declare target
because people expect it. `math.h` is the poster child.
My first thought is that malloc on the device is provided and defined as
either
`omp_alloc(size, omp_default_mem_alloc)`, or
`omp_alloc(size, omp_cgroup_mem_alloc)`.
free then maps to the respective `omp_free` call.
realloc and friends become available as soon as we have the counterparts in
OpenMP.
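A rough sketch of that mapping in C, for illustration only. The wrapper names `sketch_malloc`, `sketch_free`, and `sketch_roundtrip` are hypothetical and not part of any runtime; `omp_alloc`, `omp_free`, and `omp_default_mem_alloc` are the OpenMP 5.0 allocator API. When the compiler does not advertise OpenMP 5.0, the sketch falls back to plain `malloc`/`free` so that it still builds. A real implementation would provide these as declare-target symbols on the device rather than as host wrappers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Use the OpenMP allocator API only when OpenMP 5.0 is advertised. */
#if defined(_OPENMP) && _OPENMP >= 201811
#include <omp.h>
#define SKETCH_HAVE_OMP_ALLOC 1
#else
#define SKETCH_HAVE_OMP_ALLOC 0
#endif

/* Hypothetical stand-in for a device-side malloc. */
void *sketch_malloc(size_t size) {
#if SKETCH_HAVE_OMP_ALLOC
    return omp_alloc(size, omp_default_mem_alloc);
#else
    return malloc(size);  /* fallback when omp_alloc is unavailable */
#endif
}

/* Hypothetical stand-in for free; must name the allocator used above. */
void sketch_free(void *ptr) {
#if SKETCH_HAVE_OMP_ALLOC
    omp_free(ptr, omp_default_mem_alloc);
#else
    free(ptr);
#endif
}

/* Allocate n ints, fill them with 1..n, sum them, and release the buffer. */
long sketch_roundtrip(int n) {
    int *buf = (int *)sketch_malloc((size_t)n * sizeof(int));
    if (!buf)
        return -1;
    long sum = 0;
    for (int i = 0; i < n; ++i)
        buf[i] = i + 1;
    for (int i = 0; i < n; ++i)
        sum += buf[i];
    sketch_free(buf);
    return sum;
}
```

Swapping `omp_default_mem_alloc` for `omp_cgroup_mem_alloc` in both wrappers gives the second variant; the pairing of allocator and `omp_free` is the part that matters.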
Cheers,
Johannes
P.S. Thursday & Friday we have our workshop 