[LLVM][RFC] Representing the target device information in the LLVM IR

I don’t see how this inconsistency is a problem… at least, not on its own. The host code doesn’t call either of these functions directly; it calls the OpenMP runtime, which should invoke the offloaded function correctly. (If it doesn’t, that’s a bug in the OpenMP lowering, not the LLVM backend.) -Eli

Given a global variable @gg, the compiler has to generate code on the host to specify whether it is passed by value or passed by reference. In the following example, if the compiler generates the code for passing by value, the outlined function on the target i386-pc-linux-gnu cannot get the correct value since it assumes the variable @gg is passed by reference.

Here is the corresponding IR on the host side.

%0 = load double, double* @gg, align 8, !tbaa !3
%1 = bitcast double %0 to i64
%12 = getelementptr inbounds [4 x i8*], [4 x i8*]* %.offload_baseptrs, i32 0, i32 2
%13 = bitcast i8** %12 to i64*
store i64 %1, i64* %13, align 8
%20 = call i32 @__tgt_target(i64 -1, i8* @.omp_offload.region_id, i32 4, i8** %4, i8** %6, i64* getelementptr inbounds ([4 x i64], [4 x i64]* @.offload_sizes, i32 0, i32 0), i64* getelementptr inbounds ([4 x i64], [4 x i64]* @.offload_maptypes, i32 0, i32 0))
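For readers less used to IR, the load / bitcast / store sequence above can be approximated in C. This is only a rough sketch, not what the compiler actually emits: the helper names pack_double_by_value and unpack_double_by_value are made up for illustration, and the value 3.5 is arbitrary. It shows that the trick only works when a pointer-sized slot is at least as wide as a double.

```c
#include <stdbool.h>
#include <string.h>

double gg = 3.5; /* stand-in for the global @gg; the value is arbitrary */

/* Hypothetical helper: copy the bits of a double into one pointer-sized
   slot of the base-pointer array, as the bitcast + store above does.
   Returns false when the slot is too small (e.g. 4-byte pointers). */
static bool pack_double_by_value(void **slot, double value) {
    if (sizeof(void *) < sizeof(double))
        return false;
    memcpy(slot, &value, sizeof value);
    return true;
}

/* Hypothetical helper: what an outlined function expecting "by value"
   would do to recover the double from its slot. */
static double unpack_double_by_value(void *const *slot) {
    double value;
    memcpy(&value, slot, sizeof value);
    return value;
}
```

On a 64-bit host the round trip succeeds; where pointers are 4 bytes, the pack step reports failure, which is the case where the outlined function expects a pointer to the double instead.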

Thanks,

Jin

Could you describe the overall process of calling an offloaded function in a bit more detail? How do you describe the ABI of the called function to the OpenMP runtime? I suspect you shouldn’t be trying to store things which aren’t pointers into offload_baseptrs. -Eli

For the firstprivate clause, the compiler generates code to pass the variable to the outlined function either by value or by reference. Firstprivate scalars are generally passed by value for performance reasons.

For this particular case, the compiler cannot generate code to pass the double @gg by value under i386-pc-linux-gnu, since the value is 64 bits wide while the architecture is 32-bit.

For the host compilation, the compiler generates the code to pass the data as well as the outlined function name to the OMP runtime.

For the target compilation, the compiler generates the outlined function so that it can be called by the OMP runtime.

So the compiler is required to generate a single call on the host that supports all the targets. All the target versions must have the same interface, so the common interface of the outlined function should be used. For this particular example, the variable @gg should be passed by reference under x86_64-mic.
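The "common interface" rule can be sketched as a small C predicate. This is an illustration of the reasoning only, not code from any compiler; the function name can_pass_by_value and its parameters are hypothetical. A scalar may be passed by value only if it fits in a pointer-sized slot on the host and on every target; otherwise every version must take it by reference.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: true only if a scalar of value_size bytes fits
   in a pointer-sized argument slot on the host AND on every target, so
   that one host call site can serve all outlined versions. */
static bool can_pass_by_value(size_t value_size, size_t host_ptr_size,
                              const size_t *target_ptr_sizes,
                              size_t num_targets) {
    if (value_size > host_ptr_size)
        return false;
    for (size_t i = 0; i < num_targets; ++i) {
        if (value_size > target_ptr_sizes[i])
            return false; /* one narrow target forces by-reference for all */
    }
    return true;
}
```

With a single 64-bit target an 8-byte double qualifies for by-value passing; adding a target with 4-byte pointers makes the predicate false, so the double has to go by reference on every target, including the 64-bit one.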

Please let me know if you have more questions.

Jin

Hi, Jin,

Can you please back up a bit and talk about the programming environment in which this problem manifests?

If I have a host and a target with different ABIs, then it seems we have lots of problems. For one thing, the layouts of structures are different, the sizes of some integer types are different, the sizes of pointers are different, and so on. It seems like a solution in this space should address, somehow, this general translation problem. Fixing this particular problem with the dispatch function’s parameters feels like only the tip of the iceberg. What if I’m passing a pointer to some structure, or a pointer to other pointers, etc.?

I understand that OpenMP v5 is expected to have some custom “mappers” to handle deep copying and translation. Is this related to the design space here?

Thanks again,

Hal

To follow on here:

This is most assuredly just the tip of the iceberg if you need to co-mingle two targets as part of the module. Basically any solution is going to be better than trying to do that.

-eric

Hi Hal,

We are not trying to address issues where the mapped objects have different sizes on the host and on a target with a different ABI. The issue arises when the objects have the same size, such as double, which is 8 bytes on both 32-bit and 64-bit platforms. If a double is used in a firstprivate clause on a target construct, the 64-bit side will pass it by value, whereas the 32-bit side, since the value does not fit in the argument, will pass it as a pointer to a double. There will be a mismatch between the call site and the entry site for this value.

The main reason for this change is that when we do backend outlining for target pragmas, the target information needs to be communicated to the backend so that it generates the tables with the right names. Generating LLVM IR to pass this information is one mechanism; the other is passing a command-line option to the backend. For the latter, each pass that needs this information would have to change.

Thanks

Ravi

Why are you not trying to address that issue? -Hal

I think because it is prohibited by the standard. According to the OpenMP standard, the new copies of the variables must have the same type and the same size. If the type has a different size on the device, we are no longer compatible with the standard.

I agree with Eric Christopher. From my point of view, it just breaks the existing ABI. Instead of breaking the existing ABI, I think it would be better to introduce a new, portable ABI. The classic ABI could be used to get a little more performance, while the new one could be used for compatibility between targets with incompatible pointer sizes. In this ABI all variables must be passed by reference. I think it will solve all the problems.
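To make the "everything by reference" idea concrete, here is a minimal C sketch of what an outlined region could look like under such a convention. It is an assumption for illustration, not an existing ABI: the name outlined_region, the argument layout, and the arithmetic in the body are all invented. Every argument slot holds an address, so the slot width never depends on the size of the mapped scalar, and the firstprivate copy is made on entry.

```c
/* Hypothetical outlined region under an "all by reference" convention:
   args[0] points to a double, args[1] points to an int. */
static double outlined_region(void **args) {
    /* firstprivate semantics: take a private copy on entry */
    double gg_private = *(const double *)args[0];
    int n = *(const int *)args[1];

    gg_private += 1.0; /* modifies only the private copy */
    return gg_private * n;
}
```

The cost is one extra load per scalar on entry, which is the performance the classic by-value ABI is meant to save.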

With the proposed change there would be a difference between code compiled for (1) 64-bit host + 64-bit device and (2) 64-bit host + 64|32-bit device. A program linked from the host object of (2) and the device object of (1) would not work properly because of the different ABIs. I think it is better to state explicitly that we are going to use a special ABI rather than do something as tricky and dangerous as this proposal.

Hi Hal,
   We are not trying to address issues where the object mapped are of different sizes between host and target with different ABI.

Why are you not trying to address that issue?

Good question. It is a harder problem, and no vendor except Intel (offloading from Windows to Linux) required this, so we were not pushing for it.

It also brings up a whole lot of problems with shared memory: how do you use shared memory if the objects are different, are copies made, and when is the original item synced up?

Ravi
