>>> I don't think that using getTypeAllocSize() instead of
>>> getTypeStoreSize() to calculate clover::argument::size would be a
>>> satisfactory solution. By doing that you'd redefine the API argument
>>> size exposed to the host for *all* argument types to be the
>>> device-dependent aligned size, which is definitely not what you want.
>>> AFAIK 3-element vectors are an exception because they are the only types
>>> that are defined to have a different API size from their actual usable
>>> size, so they probably deserve special handling in invocation.cpp (as
>>> you did in your first patch). As the API size is target-independent I
>>> don't think that the fix belongs in Clang or LLVM, Clover is likely at
>> The way I understood the ch 6.1.5 is that both API and OpenCL C 3
>> element vectors are required to be 4*sizeof(component). So a
>> sizeof(float3) == sizeof(cl_float3) == 16, and should be both host and
>> target independent.
> Well, sizeof() is supposed to take into account the alignment, so this
> should be the case already.
AFAIK sizeof only includes padding. I explain below.
>> That's why clang (or more precisely libclc) looked like a correct
> Right. I guess the other possibility would be to redefine cl_float3 as
> cl_float4 in libclc.
Sorry, I meant to redefine "gentype3" as "gentype4" for every vector
element type "gentype".
> You mentioned that it had undesired consequences. Which exactly?
With this change:
-typedef __attribute__((ext_vector_type(3))) float float3;
+typedef __attribute__((ext_vector_type(4))) float float3;
errors about duplicate ops (since type3 and type4 are now the same type)
can be fixed. However, I don't know a clean way to fix the following:
error: too few elements in vector initialization (expected 4 elements,
Even if we can get rid of it, or reduce it to warning.
having the same type for type3 and type4 causes problem that we won't be
able to distinguish following situations:
float3 B = A.xyz; // This is OK, should not produce warning/error;
float4 C = A.xyz; // This should at least produce a warning. even if we
allow using fewer elements for vector initialization.
>> I understand that target device can have stricter alignment rules, but I
>> don't see how it can have different type sizes (my reading of the specs
>> is that these are binding for the target as well).
> In theory the sizes are binding for most built-in types, but R600
> doesn't support certain integer types so clover has code to take into
> account the differing size, alignment and endianness between host and
> device. I guess that in most cases it would be possible to use the
> "official" ABI for passing kernel arguments to the device (and that's a
> requirement for your solution using DataLayout::getTypeAllocSize() to
> work reliably, otherwise you'll be taking into account device-specific
> padding), but you'll have to fix the R600 back-end so it's able to deal
> with (i.e. lower) all built-in types specified by OpenCL. I think that
> this would be useful on its own, Tom should know better than I how
> difficult it would be.
I didn't know about the additional R600 type size magic, in this case I
agree that the original approach (detect 3elem vectors and change size)
is probably the best in this situation.
>> I can resend the original patch with debug output replaced by a comment.
> A slightly more general solution than what you did could be to align the
> store size of scalar and vector arguments to the next power of two to
> calculate the API size (in particular, that would make 3- and 4-element
> vectors have the same size). From 6.1.5: "A built-in data type that is
> not a power of two bytes in size must be aligned to the next larger
> power of two."
This part is what made me think that the _alignment_ only restricts
starting address (and not size). Otherwise, the addition of 3 element
vector special case would not be necessary. i.e I think that the
following mem layout is legal in OCL 1.0 but not in OCL1.1+
0x10: float3 here
0x1c: int here // this should belong to float3 in ocl 1.1+
while the following is illegal in both:
0x8: float3 here
So, I don't think that the quoted text requires all builtin types to
have 2^x size (although all but 3 element vectors do).