Generating target dependent function calls

Hi,

Raghesh and I are working in Polly on automatically generating OpenMP calls. This works nicely on a 64-bit architecture;
however, the functions we need to generate are slightly different on different platforms.

The reason for the difference is that e.g. "long" in

> bool GOMP_loop_runtime_next(long *, long *)

has a different size on different architectures.

Currently we generate the prototypes and functions ourselves:
> declare i8 @GOMP_loop_runtime_next(i64*, i64*) nounwind

To support a 32-bit architecture we would need to generate:
> declare i8 @GOMP_loop_runtime_next(i32*, i32*) nounwind
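The two declarations differ only in the integer width behind the pointers, so the prototype can be parameterised on the width of "long". A minimal sketch, assuming a hypothetical helper declareLoopRuntimeNext() that just renders the textual declaration (it is not part of Polly or LLVM):

```cpp
#include <string>

// Hypothetical helper (not an LLVM or Polly API): render the textual IR
// declaration of GOMP_loop_runtime_next for a given width of C `long`.
std::string declareLoopRuntimeNext(unsigned LongBits) {
  // Both parameters are pointers to `long`, i.e. iN* in the IR syntax
  // used in this thread.
  const std::string LongPtr = "i" + std::to_string(LongBits) + "*";
  return "declare i8 @GOMP_loop_runtime_next(" + LongPtr + ", " + LongPtr +
         ") nounwind";
}
```

Calling it with 64 or 32 reproduces the two declarations shown above; the open question is where the width comes from.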

Does anybody have an idea for a conceptually clean way to generate OpenMP function calls for different architectures, and ideally even for different OpenMP implementations (e.g. mpc.sf.net)?

Would overloaded intrinsics be a possible approach? How could we model or derive the right signature for our target architecture? TargetData does not seem to be enough for this. Is there a better approach than passing this information using a command line switch?

Cheers and thanks for your help

Tobi

Hi Tobias,

I'm facing a similar problem generating CUDA runtime code for
different architectures. One solution I'm currently considering
is to use the TargetInfo class from Clang's Basic library which can
be used to obtain target specific parameters (such as sizeof(long))
given a target triple.

Thanks,

Interesting. Maybe I can get enough information from the target triple itself. Module::getTargetTriple() should be enough. I did not think about this, but that might work. Thanks for the pointer.
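A rough sketch of what deriving the width of "long" from the triple string could look like. The helper longBitsForTriple() is hypothetical (it is not an LLVM or Clang API), the list of architectures is deliberately incomplete, and it assumes the usual data models: ILP32 on 32-bit targets, LP64 on 64-bit Unix-like systems, and LLP64 on 64-bit Windows.

```cpp
#include <string>

// Hypothetical helper, not an LLVM API: guess the width of C `long` in bits
// from a target triple of the form arch-vendor-os[-environment].
// Returns 0 for architectures it does not know, so the caller can warn.
unsigned longBitsForTriple(const std::string &Triple) {
  // Everything before the first '-' is the architecture component.
  const std::string Arch = Triple.substr(0, Triple.find('-'));

  const bool Is64 = Arch == "x86_64" || Arch == "amd64" ||
                    Arch == "aarch64" || Arch == "powerpc64";
  const bool Is32 = Arch == "i386" || Arch == "i686" ||
                    Arch == "arm" || Arch == "powerpc";
  if (!Is64 && !Is32)
    return 0; // unknown architecture: let the caller decide what to do

  // 64-bit Windows uses LLP64: pointers are 64 bits, but long stays 32 bits.
  const bool IsWindows = Triple.find("windows") != std::string::npos ||
                         Triple.find("win32") != std::string::npos ||
                         Triple.find("mingw") != std::string::npos;
  if (Is32 || IsWindows)
    return 32;

  return 64; // LP64: Linux, OS X and other Unix-like 64-bit targets
}
```

In a pass, the string returned by Module::getTargetTriple() would be the input; a result of 0 is the point at which to emit a warning rather than guess.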

Tobi

You could also just use a small function to define the API function you
wish to use and then use TargetData to get some of the final parts:

For example, if you know you've got something like

char GOMP_foo(pointer, pointer)

you could write a helper function that queries the pointer size
from TargetData, like:

// Builds the function type i8 (iN*, iN*), where N is the target's pointer width.
// Assumes Context (an LLVMContext) and TD (a TargetData) are in scope.
FunctionType *geti8LongLong() {
  const PointerType *LongTy = Type::getIntNPtrTy(Context, TD.getPointerSizeInBits());
  std::vector<const Type*> Params = { LongTy, LongTy };
  const Type *ResultTy = Type::getInt8Ty(Context);
  return FunctionType::get(ResultTy, Params, /*isVarArg=*/false);
}

and then just use the return value of geti8LongLong() as the function type when you declare the function. Or you can factor out LongTy and use it in general.

-eric

Hey Eric,

thanks for this hint. This seems to be a reasonable approach.

However, is TD.getPointerSizeInBits() always equivalent to the number of bits a long has on a platform? It might be correct on Linux; however, I do not think this is true on Windows.

I believe together with Peter's ideas I can create something that will be correct on Linux and that will warn if used on any other platform.

Thanks for your help
Tobi

> thanks for this hint. This seems to be a reasonable approach.
>
> However, is TD.getPointerSizeInBits() always equivalent to the number of bits a long has on a platform? It might be correct on Linux, however I do not think this is true on Windows.

It's the size of a pointer, which is, I think, what they meant there. But yes, that is not necessarily the size of long.

> I believe together with Peter's ideas I can create something that will be correct on Linux and that will warn if used on any other platform.

It should be correct on OSX as well.

You could also add a configure check that verifies that sizeof(ptrdiff_t) == sizeof(long) and disable support otherwise.
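Something along these lines could serve as the body of such a check. This is only a sketch: the function names are made up, and it tests the platform it is compiled for, so when cross-compiling it would have to be built with the target compiler rather than the host one.

```cpp
#include <cstddef>

// True when `long` and `ptrdiff_t` have the same size on the platform this
// code is compiled for. On LP64 and ILP32 systems they match; on LLP64
// (64-bit Windows) they do not.
constexpr bool longMatchesPtrdiff() {
  return sizeof(long) == sizeof(std::ptrdiff_t);
}

// Exit status a configure-style probe program could return from main():
// 0 means the pointer-size approach is usable, 1 means support should be
// disabled.
int configureProbeExitCode() {
  return longMatchesPtrdiff() ? 0 : 1;
}
```

A probe program would simply return configureProbeExitCode() from main(), and the configure script would enable the OpenMP code generation only when the probe exits with 0.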

-eric