32-bit target device support

Hi,

I am currently adding support for an experimental 32-bit OpenMP target device using x86_64 as a host. There seems to be a bug in

  bool Sema::isOpenMPCapturedByRef(…) {
    if (!IsByRef &&
        (Ctx.getTypeSizeInChars(Ty) >
             Ctx.getTypeSizeInChars(Ctx.getUIntPtrType()) ||
         Ctx.getDeclAlign(D) > Ctx.getTypeAlignInChars(Ctx.getUIntPtrType()))) {
      IsByRef = true;
    }
    return IsByRef;
  }

The code above assumes that the target device’s UIntPtr has the same size as the host’s, which is not true in my case. So if you pass a 64-bit double to a 32-bit device, the host compilation pass passes the argument by value, because the value fits into a 64-bit pointer. However, when you build for the target (-fopenmp-is-device), the 64-bit double does not fit into a 32-bit pointer, so the device pass assumes that the value is passed by reference, and the device does not pick up the correct argument value.
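The disagreement between the two passes can be reproduced outside of Clang. The following is only a toy model of the size/alignment test quoted above, not Clang code: `TypeInfo`, `capturedByRef`, `passesDisagree` and the constants are hypothetical names, and the 4-byte device pointer is an assumption matching the 32-bit device described here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the size/alignment facts a compiler pass knows.
struct TypeInfo {
  std::size_t size;   // size in bytes
  std::size_t align;  // alignment in bytes
};

// Toy version of the check: a value is captured by reference when it does
// not fit into (or is more strictly aligned than) a pointer-sized slot.
bool capturedByRef(TypeInfo ty, TypeInfo uintptr) {
  assert(uintptr.size > 0 && uintptr.align > 0);
  return ty.size > uintptr.size || ty.align > uintptr.align;
}

constexpr TypeInfo kDouble{8, 8};
constexpr TypeInfo kHostUIntPtr{8, 8};    // x86_64 host
constexpr TypeInfo kDeviceUIntPtr{4, 4};  // assumed 32-bit device

// True when the host pass and the device pass reach different conclusions
// for the same captured type, which is the situation described above.
bool passesDisagree(TypeInfo ty) {
  return capturedByRef(ty, kHostUIntPtr) != capturedByRef(ty, kDeviceUIntPtr);
}
```

In this model `passesDisagree(kDouble)` is true: the host side decides on by-value and the device side on by-reference. A 4-byte type yields the same answer on both sides, which is why smaller scalars are unaffected.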

A possible fix would be to use something like this:

  if (!IsByRef &&
      (Ctx.getTypeSizeInChars(Ty) >
           DCtx.getTypeSizeInChars(DCtx.getUIntPtrType()) ||
       Ctx.getDeclAlign(D) > DCtx.getTypeAlignInChars(DCtx.getUIntPtrType()))) {
    IsByRef = true;
  }

where DCtx is the context of the target device. But that context does not seem to be available in the host’s Sema.

Sema.LangOpts.OMPTargetTriples seems to be the only available starting point to get to a device’s context, or am I missing something?

Thanks,

Andreas

In embedded environments it is not uncommon to pair a 64-bit ARM with 32-bit accelerator devices (as in my case).

I have just started this, so maybe I am just naïve here, but I can run a lot of examples on our device using this ugly hack:

  CharUnits TySz = Ctx.getTypeSizeInChars(Ty);
  CharUnits PtrSz = CharUnits::fromQuantity(4); // our device pointers are 32 bits wide
  if (!IsByRef &&
      (TySz > PtrSz ||
       Ctx.getDeclAlign(D) > Ctx.getTypeAlignInChars(Ctx.getUIntPtrType()))) {
    IsByRef = true;
  }

Is it really that hard to support this? (I could imagine that things could become messy with mixed host/device endianness.)

The much harder 32-bit <-> 64-bit pointer translation problem already seems to work flawlessly in my setup.

-Andreas

One issue in this kind of configuration is that, if the host and accelerator have different data layouts, then sharing any kind of aggregate, in general, will cause problems. Simple cases will work (e.g., you have an array of floats), but it will be fragile even if no pointers are involved. Do you work in an environment where you can force the accelerator's structure layout rules (etc.) to match the host's rules?
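To make the layout concern concrete, here is a minimal sketch that computes field offsets for `struct { int32_t a; double b; }` under two sets of alignment rules. The thread does not say which ABI the device uses, so the i386 System V rule (a `double` inside a struct is 4-byte aligned) is used purely as an illustrative 32-bit example; `Field`, `layout`, `hostFields` and `deviceFields` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one struct field under one target's rules.
struct Field {
  std::size_t size;   // size in bytes
  std::size_t align;  // alignment required inside a struct, in bytes
};

// Places each field at the next multiple of its alignment and rounds the
// total size up to the largest field alignment. Returns the field offsets
// followed by the total struct size as the last element.
std::vector<std::size_t> layout(const std::vector<Field>& fields) {
  std::size_t offset = 0;
  std::size_t maxAlign = 1;
  std::vector<std::size_t> result;
  for (const Field& f : fields) {
    assert(f.align > 0);
    offset = (offset + f.align - 1) / f.align * f.align;
    result.push_back(offset);
    offset += f.size;
    if (f.align > maxAlign) maxAlign = f.align;
  }
  result.push_back((offset + maxAlign - 1) / maxAlign * maxAlign);
  return result;
}

// struct { int32_t a; double b; } on an x86_64 System V host.
std::vector<Field> hostFields() { return {{4, 4}, {8, 8}}; }

// The same struct under i386 System V rules (illustrative 32-bit target).
std::vector<Field> deviceFields() { return {{4, 4}, {8, 4}}; }
```

With these inputs the host places `b` at offset 8 in a 16-byte struct, while the 32-bit rules place it at offset 4 in a 12-byte struct, so a raw copy of the aggregate would be misread on the other side even though no pointers are involved.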

-Hal

You can make structure layouts compatible by using <stdint.h> types, e.g., int32_t, in the common definitions and (worst case) adding some alignment attributes. But that would have to be done even if you are not using OpenMP.
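A minimal sketch of that approach, assuming a header shared by host and device code: `SharedRecord` and `writeRecord` are hypothetical names, and the explicit `alignas(8)` on the 8-byte members is one way to pin the layout on targets that would otherwise align them to 4 bytes. The `static_assert`s make either compilation fail if the layout drifts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record shared between host and device code. Fixed-width
// types pin the field sizes; alignas pins the alignment of 8-byte fields.
struct SharedRecord {
  std::int32_t id;
  std::int32_t count;
  alignas(8) double value;
  alignas(8) std::int64_t stamp;
};

// Compiled on both sides, these checks reject any layout mismatch early.
static_assert(sizeof(double) == 8, "sketch assumes an 8-byte double");
static_assert(offsetof(SharedRecord, count) == 4, "unexpected offset");
static_assert(offsetof(SharedRecord, value) == 8, "unexpected offset");
static_assert(offsetof(SharedRecord, stamp) == 16, "unexpected offset");
static_assert(sizeof(SharedRecord) == 24, "unexpected size");

// Copies a record into a raw transfer buffer, as a plugin might do before
// handing the bytes to the device.
void writeRecord(const SharedRecord& rec, unsigned char* buf,
                 std::size_t bufSize) {
  assert(bufSize >= sizeof(SharedRecord));
  std::memcpy(buf, &rec, sizeof(SharedRecord));
}
```

This only covers size and alignment; it does not address differing endianness or embedded pointers.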

Structures with embedded pointers will need special attention, but they will not work anyway unless you have a host/device unified shared memory model.

I have just started this internal experiment, but I was able to resolve all issues (except for that byval/byref capture issue in clang) in my omptarget plugin and I am able to run some simple examples that target our 32-bit device from an x86_64 host.

-Andreas

That may be true, but from a programming-model perspective, that approach is pretty fragile. The frontend knows if the layouts won't match, and maybe a good set of warnings (or similar) would make all of this work well. It's unclear to me, but I'm certainly interested in your experience. We should understand what it takes to make this kind of configuration work well.

-Hal
