3 element vectors in opencl 1.1+

Hi,

I ran into a problem caused by this part of the OCL specs (6.1.5
Alignment of Types):
"For 3-component vector data types, the size of the data type is 4 *
sizeof(component)."

and the corresponding part of Khronos cl_platform.h (with all types, not
just float):
/* cl_float3 is identical in size, alignment and behavior to cl_float4.
See section 6.1.5. */
typedef cl_float4 cl_float3;

So when I try to run kernel that takes 3 element vectors as arguments I
get 'invalid arg size' error.

Not sure whether this is best solved in clang, libclc or clover. I tried
changing float3 to have 4 elements in libclc, it caused clang to
complain in thousand places. I don't think this can be handled cleanly
in clang, unless we add something like __attribute__((padding)).

I have attached a workaround that I use now.

any advice welcome,
Jan

workaround.diff (1.11 KB)

Hi,

I ran into a problem caused by this part of the OCL specs (6.1.5
Alignment of Types):
"For 3-component vector data types, the size of the data type is 4 *
sizeof(component)."

and the corresponding part of Khronos cl_platform.h (with all types, not
just float):
/* cl_float3 is identical in size, alignment and behavior to cl_float4.
See section 6.1.5. */
typedef cl_float4 cl_float3;

So when I try to run kernel that takes 3 element vectors as arguments I
get 'invalid arg size' error.

Not sure whether this is best solved in clang, libclc or clover. I tried
changing float3 to have 4 elements in libclc, it caused clang to
complain in thousand places. I don't think this can be handled cleanly
in clang, unless we add something like __attribute__((padding)).

I have attached a workaround that I use now.

You may want to ask this question on the pocl mailing list as they
have likely solved this issue already. Ideally, TD.getTypeStoreSize
would return the correct value. Also, maybe look at the DataLayout
description for R600 and see if there is a way to specify the
correct type size.

-Tom

Hi,

I ran into a problem caused by this part of the OCL specs (6.1.5
Alignment of Types):
"For 3-component vector data types, the size of the data type is 4 *
sizeof(component)."

and the corresponding part of Khronos cl_platform.h (with all types, not
just float):
/* cl_float3 is identical in size, alignment and behavior to cl_float4.
See section 6.1.5. */
typedef cl_float4 cl_float3;

So when I try to run kernel that takes 3 element vectors as arguments I
get 'invalid arg size' error.

Not sure whether this is best solved in clang, libclc or clover. I tried
changing float3 to have 4 elements in libclc, it caused clang to
complain in thousand places. I don't think this can be handled cleanly
in clang, unless we add something like __attribute__((padding)).

I have attached a workaround that I use now.

You may want to ask this question on the pocl mailing list as they
have likely solved this issue already. Ideally, TD.getTypeStoreSize
would return the correct value. Also, maybe look at the DataLayout
description for R600 and see if there is a way to specify the
correct type size.

-Tom

I think this is what v96:128 is for

>> Hi,
>>
>> I ran into a problem caused by this part of the OCL specs (6.1.5
>> Alignment of Types):
>> "For 3-component vector data types, the size of the data type is 4 *
>> sizeof(component)."
>>
>> and the corresponding part of Khronos cl_platform.h (with all types, not
>> just float):
>> /* cl_float3 is identical in size, alignment and behavior to cl_float4.
>> See section 6.1.5. */
>> typedef cl_float4 cl_float3;
>>
>> So when I try to run kernel that takes 3 element vectors as arguments I
>> get 'invalid arg size' error.
>>
>> Not sure whether this is best solved in clang, libclc or clover. I tried
>> changing float3 to have 4 elements in libclc, it caused clang to
>> complain in thousand places. I don't think this can be handled cleanly
>> in clang, unless we add something like __attribute__((padding)).
>>
>> I have attached a workaround that I use now.
>>
> You may want to ask this question on the pocl mailing list as they
> have likely solved this issue already. Ideally, TD.getTypeStoreSize
> would return the correct value. Also, maybe look at the DataLayout
> description for R600 and see if there is a way to specify the
> correct type size.
>
> -Tom
I think this is what v96:128 is for

according to [0], it specifies only alignment, not size. I could not
find an __attribute__ that would change size either.

It should be possible to have ADMGPUDataLayout: public DataLayout class
that would intercept the call and fix the reported value, but I think it
would only move the hack to different place.

I have added pocl-devel list as suggested.

regards,
Jan

[0]http://llvm.org/docs/LangRef.html#data-layout

Only the size in memory matters, which is what the required alignment specifies. DataLayout::getTypeAllocSize accounts for the alignment, but getTypeStoreSize does not. I actually thought this was half of what getTypeStoreSize was for, but it turns out it isn't.

<snip>

>> I think this is what v96:128 is for
> according to [0], it specifies only alignment, not size. I could not
> find an __attribute__ that would change size either.
>
> It should be possible to have ADMGPUDataLayout: public DataLayout class
> that would intercept the call and fix the reported value, but I think it
> would only move the hack to different place.
>
> I have added pocl-devel list as suggested.
>
> regards,
> Jan
>
> [0]http://llvm.org/docs/LangRef.html#data-layout
>

Only the size in memory matters, which is what the required alignment
specifies. DataLayout::getTypeAllocSize accounts for the alignment, but
getTypeStoreSize does not. I actually thought this was half of what
getTypeStoreSize was for, but it turns out it isn't.

hm, I always thought that alignment only puts restrictions on starting
address and using padding was just a tool to do the job.

anyway, thanks for the hint, using getTypeAllocSize works nicely.
since we are allocating space in the argument vector I think
getAllocSize is the right function to use.

I'll post a patch.

regards,
Jan

Jan Vesely <jan.vesely@rutgers.edu> writes:

<snip>

>> I think this is what v96:128 is for
> according to [0], it specifies only alignment, not size. I could not
> find an __attribute__ that would change size either.
>
> It should be possible to have ADMGPUDataLayout: public DataLayout class
> that would intercept the call and fix the reported value, but I think it
> would only move the hack to different place.
>
> I have added pocl-devel list as suggested.
>
> regards,
> Jan
>
> [0]http://llvm.org/docs/LangRef.html#data-layout
>

Only the size in memory matters, which is what the required alignment
specifies. DataLayout::getTypeAllocSize accounts for the alignment, but
getTypeStoreSize does not. I actually thought this was half of what
getTypeStoreSize was for, but it turns out it isn't.

hm, I always thought that alignment only puts restrictions on starting
address and using padding was just a tool to do the job.

anyway, thanks for the hint, using getTypeAllocSize works nicely.
since we are allocating space in the argument vector I think
getAllocSize is the right function to use.

I'll post a patch.

I don't think that using getTypeAllocSize() instead of
getTypeStoreSize() to calculate clover::argument::size would be a
satisfactory solution. By doing that you'd redefine the API argument
size exposed to the host for *all* argument types to be the
device-dependent aligned size, which is definitely not what you want.

AFAIK 3-element vectors are an exception because they are the only types
that are defined to have a different API size from their actual usable
size, so they probably deserve special handling in invocation.cpp (as
you did in your first patch). As the API size is target-independent I
don't think that the fix belongs in Clang or LLVM, Clover is likely at
fault.

Thanks.

Jan Vesely <jan.vesely@rutgers.edu> writes:

> <snip>
>
>> >> I think this is what v96:128 is for
>> > according to [0], it specifies only alignment, not size. I could not
>> > find an __attribute__ that would change size either.
>> >
>> > It should be possible to have ADMGPUDataLayout: public DataLayout class
>> > that would intercept the call and fix the reported value, but I think it
>> > would only move the hack to different place.
>> >
>> > I have added pocl-devel list as suggested.
>> >
>> > regards,
>> > Jan
>> >
>> > [0]http://llvm.org/docs/LangRef.html#data-layout
>> >
>>
>> Only the size in memory matters, which is what the required alignment
>> specifies. DataLayout::getTypeAllocSize accounts for the alignment, but
>> getTypeStoreSize does not. I actually thought this was half of what
>> getTypeStoreSize was for, but it turns out it isn't.
>
> hm, I always thought that alignment only puts restrictions on starting
> address and using padding was just a tool to do the job.
>
> anyway, thanks for the hint, using getTypeAllocSize works nicely.
> since we are allocating space in the argument vector I think
> getAllocSize is the right function to use.
>
> I'll post a patch.
>

I don't think that using getTypeAllocSize() instead of
getTypeStoreSize() to calculate clover::argument::size would be a
satisfactory solution. By doing that you'd redefine the API argument
size exposed to the host for *all* argument types to be the
device-dependent aligned size, which is definitely not what you want.

AFAIK 3-element vectors are an exception because they are the only types
that are defined to have a different API size from their actual usable
size, so they probably deserve special handling in invocation.cpp (as
you did in your first patch). As the API size is target-independent I
don't think that the fix belongs in Clang or LLVM, Clover is likely at
fault.

The way I understood the ch 6.1.5 is that both API and OpenCL C 3
element vectors are required to be 4*sizeof(component). So a
sizeof(float3) == sizeof(cl_float3) == 16, and should be both host and
target independent. That's why clang (or more precisely libclc) looked
like a correct place.

I understand that target device can have stricter alignment rules, but I
don't see how it can have different type sizes (my reading of the specs
is that these are binding for the target as well).

I can resend the original patch with debug output replaced by a comment.

regards,
Jan

OpenCL specifies the alignment of types; this is not up to the target. For the basic types, the alignment is their size.

-erik

Jan Vesely <jan.vesely@rutgers.edu> writes:

Jan Vesely <jan.vesely@rutgers.edu> writes:

> <snip>
>
>> >> I think this is what v96:128 is for
>> > according to [0], it specifies only alignment, not size. I could not
>> > find an __attribute__ that would change size either.
>> >
>> > It should be possible to have ADMGPUDataLayout: public DataLayout class
>> > that would intercept the call and fix the reported value, but I think it
>> > would only move the hack to different place.
>> >
>> > I have added pocl-devel list as suggested.
>> >
>> > regards,
>> > Jan
>> >
>> > [0]http://llvm.org/docs/LangRef.html#data-layout
>> >
>>
>> Only the size in memory matters, which is what the required alignment
>> specifies. DataLayout::getTypeAllocSize accounts for the alignment, but
>> getTypeStoreSize does not. I actually thought this was half of what
>> getTypeStoreSize was for, but it turns out it isn't.
>
> hm, I always thought that alignment only puts restrictions on starting
> address and using padding was just a tool to do the job.
>
> anyway, thanks for the hint, using getTypeAllocSize works nicely.
> since we are allocating space in the argument vector I think
> getAllocSize is the right function to use.
>
> I'll post a patch.
>

I don't think that using getTypeAllocSize() instead of
getTypeStoreSize() to calculate clover::argument::size would be a
satisfactory solution. By doing that you'd redefine the API argument
size exposed to the host for *all* argument types to be the
device-dependent aligned size, which is definitely not what you want.

AFAIK 3-element vectors are an exception because they are the only types
that are defined to have a different API size from their actual usable
size, so they probably deserve special handling in invocation.cpp (as
you did in your first patch). As the API size is target-independent I
don't think that the fix belongs in Clang or LLVM, Clover is likely at
fault.

The way I understood the ch 6.1.5 is that both API and OpenCL C 3
element vectors are required to be 4*sizeof(component). So a
sizeof(float3) == sizeof(cl_float3) == 16, and should be both host and
target independent.

Well, sizeof() is supposed to take into account the alignment, so this
should be the case already.

That's why clang (or more precisely libclc) looked like a correct
place.

Right. I guess the other possibility would be to redefine cl_float3 as
cl_float4 in libclc. You mentioned that it had undesired consequences.
Which exactly?

I understand that target device can have stricter alignment rules, but I
don't see how it can have different type sizes (my reading of the specs
is that these are binding for the target as well).

In theory the sizes are binding for most built-in types, but R600
doesn't support certain integer types so clover has code to take into
account the differing size, alignment and endianness between host and
device. I guess that in most cases it would be possible to use the
"official" ABI for passing kernel arguments to the device (and that's a
requirement for your solution using DataLayout::getTypeAllocSize() to
work reliably, otherwise you'll be taking into account device-specific
padding), but you'll have to fix the R600 back-end so it's able to deal
with (i.e. lower) all built-in types specified by OpenCL. I think that
this would be useful on its own, Tom should know better than I how
difficult it would be.

I can resend the original patch with debug output replaced by a comment.

A slightly more general solution than what you did could be to align the
store size of scalar and vector arguments to the next power of two to
calculate the API size (in particular, that would make 3- and 4-element
vectors have the same size). From 6.1.5: "A built-in data type that is
not a power of two bytes in size must be aligned to the next larger
power of two."

regards,
Jan

Thanks.

Francisco Jerez <currojerez@riseup.net> writes:

Jan Vesely <jan.vesely@rutgers.edu> writes:

Jan Vesely <jan.vesely@rutgers.edu> writes:

> <snip>
>
>> >> I think this is what v96:128 is for
>> > according to [0], it specifies only alignment, not size. I could not
>> > find an __attribute__ that would change size either.
>> >
>> > It should be possible to have ADMGPUDataLayout: public DataLayout class
>> > that would intercept the call and fix the reported value, but I think it
>> > would only move the hack to different place.
>> >
>> > I have added pocl-devel list as suggested.
>> >
>> > regards,
>> > Jan
>> >
>> > [0]http://llvm.org/docs/LangRef.html#data-layout
>> >
>>
>> Only the size in memory matters, which is what the required alignment
>> specifies. DataLayout::getTypeAllocSize accounts for the alignment, but
>> getTypeStoreSize does not. I actually thought this was half of what
>> getTypeStoreSize was for, but it turns out it isn't.
>
> hm, I always thought that alignment only puts restrictions on starting
> address and using padding was just a tool to do the job.
>
> anyway, thanks for the hint, using getTypeAllocSize works nicely.
> since we are allocating space in the argument vector I think
> getAllocSize is the right function to use.
>
> I'll post a patch.
>

I don't think that using getTypeAllocSize() instead of
getTypeStoreSize() to calculate clover::argument::size would be a
satisfactory solution. By doing that you'd redefine the API argument
size exposed to the host for *all* argument types to be the
device-dependent aligned size, which is definitely not what you want.

AFAIK 3-element vectors are an exception because they are the only types
that are defined to have a different API size from their actual usable
size, so they probably deserve special handling in invocation.cpp (as
you did in your first patch). As the API size is target-independent I
don't think that the fix belongs in Clang or LLVM, Clover is likely at
fault.

The way I understood the ch 6.1.5 is that both API and OpenCL C 3
element vectors are required to be 4*sizeof(component). So a
sizeof(float3) == sizeof(cl_float3) == 16, and should be both host and
target independent.

Well, sizeof() is supposed to take into account the alignment, so this
should be the case already.

That's why clang (or more precisely libclc) looked like a correct
place.

Right. I guess the other possibility would be to redefine cl_float3 as
cl_float4 in libclc.

Sorry, I meant to redefine "gentype3" as "gentype4" for every vector
element type "gentype".

<snip>

>>>
>>> I don't think that using getTypeAllocSize() instead of
>>> getTypeStoreSize() to calculate clover::argument::size would be a
>>> satisfactory solution. By doing that you'd redefine the API argument
>>> size exposed to the host for *all* argument types to be the
>>> device-dependent aligned size, which is definitely not what you want.
>>
>>> AFAIK 3-element vectors are an exception because they are the only types
>>> that are defined to have a different API size from their actual usable
>>> size, so they probably deserve special handling in invocation.cpp (as
>>> you did in your first patch). As the API size is target-independent I
>>> don't think that the fix belongs in Clang or LLVM, Clover is likely at
>>> fault.
>>
>> The way I understood the ch 6.1.5 is that both API and OpenCL C 3
>> element vectors are required to be 4*sizeof(component). So a
>> sizeof(float3) == sizeof(cl_float3) == 16, and should be both host and
>> target independent.
>
> Well, sizeof() is supposed to take into account the alignment, so this
> should be the case already.

AFAIK sizeof only includes padding. I explain below.

>
>> That's why clang (or more precisely libclc) looked like a correct
>> place.
>>
>
> Right. I guess the other possibility would be to redefine cl_float3 as
> cl_float4 in libclc.

Sorry, I meant to redefine "gentype3" as "gentype4" for every vector
element type "gentype".

> You mentioned that it had undesired consequences. Which exactly?

With this change:
-typedef __attribute__((ext_vector_type(3))) float float3;
+typedef __attribute__((ext_vector_type(4))) float float3;

errors about duplicate ops (since type3 and type4 are now the same type)
can be fixed. However, I don't know a clean way to fix the following:

error: too few elements in vector initialization (expected 4 elements,
have 3)

Even if we can get rid of it, or reduce it to warning.
having the same type for type3 and type4 causes problem that we won't be
able to distinguish following situations:

flaot4 A

float3 B = A.xyz; // This is OK, should not produce warning/error;

float4 C = A.xyz; // This should at least produce a warning. even if we
allow using fewer elements for vector initialization.

>
>> I understand that target device can have stricter alignment rules, but I
>> don't see how it can have different type sizes (my reading of the specs
>> is that these are binding for the target as well).
>>
>
> In theory the sizes are binding for most built-in types, but R600
> doesn't support certain integer types so clover has code to take into
> account the differing size, alignment and endianness between host and
> device. I guess that in most cases it would be possible to use the
> "official" ABI for passing kernel arguments to the device (and that's a
> requirement for your solution using DataLayout::getTypeAllocSize() to
> work reliably, otherwise you'll be taking into account device-specific
> padding), but you'll have to fix the R600 back-end so it's able to deal
> with (i.e. lower) all built-in types specified by OpenCL. I think that
> this would be useful on its own, Tom should know better than I how
> difficult it would be.

I didn't know about the additional R600 type size magic, in this case I
agree that the original approach (detect 3elem vectors and change size)
is probably the best in this situation.

>
>> I can resend the original patch with debug output replaced by a comment.
>>
> A slightly more general solution than what you did could be to align the
> store size of scalar and vector arguments to the next power of two to
> calculate the API size (in particular, that would make 3- and 4-element
> vectors have the same size). From 6.1.5: "A built-in data type that is
> not a power of two bytes in size must be aligned to the next larger
> power of two."

This part is what made me think that the _alignment_ only restricts
starting address (and not size). Otherwise, the addition of 3 element
vector special case would not be necessary. i.e I think that the
following mem layout is legal in OCL 1.0 but not in OCL1.1+

0x10: float3 here
0x1c: int here // this should belong to float3 in ocl 1.1+

while the following is illegal in both:

0x8: float3 here

So, I don't think that the quoted text requires all builtin types to
have 2^x size (although all but 3 element vectors do).

regards,
Jan

Jan Vesely <jan.vesely@rutgers.edu> writes:

<snip>

>>>
>>> I don't think that using getTypeAllocSize() instead of
>>> getTypeStoreSize() to calculate clover::argument::size would be a
>>> satisfactory solution. By doing that you'd redefine the API argument
>>> size exposed to the host for *all* argument types to be the
>>> device-dependent aligned size, which is definitely not what you want.
>>
>>> AFAIK 3-element vectors are an exception because they are the only types
>>> that are defined to have a different API size from their actual usable
>>> size, so they probably deserve special handling in invocation.cpp (as
>>> you did in your first patch). As the API size is target-independent I
>>> don't think that the fix belongs in Clang or LLVM, Clover is likely at
>>> fault.
>>
>> The way I understood the ch 6.1.5 is that both API and OpenCL C 3
>> element vectors are required to be 4*sizeof(component). So a
>> sizeof(float3) == sizeof(cl_float3) == 16, and should be both host and
>> target independent.
>
> Well, sizeof() is supposed to take into account the alignment, so this
> should be the case already.

AFAIK sizeof only includes padding. I explain below.

Yes, and 3-component vectors are being padded to 4 components already.

>
>> That's why clang (or more precisely libclc) looked like a correct
>> place.
>>
>
> Right. I guess the other possibility would be to redefine cl_float3 as
> cl_float4 in libclc.

Sorry, I meant to redefine "gentype3" as "gentype4" for every vector
element type "gentype".

> You mentioned that it had undesired consequences. Which exactly?

With this change:
-typedef __attribute__((ext_vector_type(3))) float float3;
+typedef __attribute__((ext_vector_type(4))) float float3;

errors about duplicate ops (since type3 and type4 are now the same type)
can be fixed. However, I don't know a clean way to fix the following:

error: too few elements in vector initialization (expected 4 elements,
have 3)

Even if we can get rid of it, or reduce it to warning.
having the same type for type3 and type4 causes problem that we won't be
able to distinguish following situations:

flaot4 A

float3 B = A.xyz; // This is OK, should not produce warning/error;

float4 C = A.xyz; // This should at least produce a warning. even if we
allow using fewer elements for vector initialization.

Right, I agree that this wouldn't be a good idea.

>
>> I understand that target device can have stricter alignment rules, but I
>> don't see how it can have different type sizes (my reading of the specs
>> is that these are binding for the target as well).
>>
>
> In theory the sizes are binding for most built-in types, but R600
> doesn't support certain integer types so clover has code to take into
> account the differing size, alignment and endianness between host and
> device. I guess that in most cases it would be possible to use the
> "official" ABI for passing kernel arguments to the device (and that's a
> requirement for your solution using DataLayout::getTypeAllocSize() to
> work reliably, otherwise you'll be taking into account device-specific
> padding), but you'll have to fix the R600 back-end so it's able to deal
> with (i.e. lower) all built-in types specified by OpenCL. I think that
> this would be useful on its own, Tom should know better than I how
> difficult it would be.

I didn't know about the additional R600 type size magic, in this case I
agree that the original approach (detect 3elem vectors and change size)
is probably the best in this situation.

>
>> I can resend the original patch with debug output replaced by a comment.
>>
> A slightly more general solution than what you did could be to align the
> store size of scalar and vector arguments to the next power of two to
> calculate the API size (in particular, that would make 3- and 4-element
> vectors have the same size). From 6.1.5: "A built-in data type that is
> not a power of two bytes in size must be aligned to the next larger
> power of two."

This part is what made me think that the _alignment_ only restricts
starting address (and not size).

It also restricts the size. From section 6.3 on the sizeof operator:
"The sizeof operator yields the size (in bytes) of its operand,
including any padding bytes (refer to section 6.1.5) needed for
alignment [...]. When applied to an operand that has structure or union
type, the result is the total number of bytes in such an object,
including internal and trailing padding."

Otherwise, the addition of 3 element vector special case would not be
necessary.

Maybe it wouldn't be strictly necessary, but it's still clarifying as an
illustrative corollary of the general rule.

i.e I think that the
following mem layout is legal in OCL 1.0 but not in OCL1.1+

0x10: float3 here
0x1c: int here // this should belong to float3 in ocl 1.1+

Does OpenCL 1.0 support 3-component vectors at all?

while the following is illegal in both:

0x8: float3 here

So, I don't think that the quoted text requires all builtin types to
have 2^x size (although all but 3 element vectors do).

Well, don't interpret it that way if you don't want to, but all OpenCL
built-in types do (or at least the ones you are allowed to use as kernel
argument types), so this solution seems more elegant to me than
special-casing 3-component vectors.

Thanks.

Jan Vesely <jan.vesely@rutgers.edu> writes:

> <snip>
>> >>>
>> >>> I don't think that using getTypeAllocSize() instead of
>> >>> getTypeStoreSize() to calculate clover::argument::size would be a
>> >>> satisfactory solution. By doing that you'd redefine the API argument
>> >>> size exposed to the host for *all* argument types to be the
>> >>> device-dependent aligned size, which is definitely not what you want.
>> >>
>> >>> AFAIK 3-element vectors are an exception because they are the only types
>> >>> that are defined to have a different API size from their actual usable
>> >>> size, so they probably deserve special handling in invocation.cpp (as
>> >>> you did in your first patch). As the API size is target-independent I
>> >>> don't think that the fix belongs in Clang or LLVM, Clover is likely at
>> >>> fault.
>> >>
>> >> The way I understood the ch 6.1.5 is that both API and OpenCL C 3
>> >> element vectors are required to be 4*sizeof(component). So a
>> >> sizeof(float3) == sizeof(cl_float3) == 16, and should be both host and
>> >> target independent.
>> >
>> > Well, sizeof() is supposed to take into account the alignment, so this
>> > should be the case already.
>
> AFAIK sizeof only includes padding. I explain below.
>

Yes, and 3-component vectors are being padded to 4 components already.

>> >
>> >> That's why clang (or more precisely libclc) looked like a correct
>> >> place.
>> >>
>> >
>> > Right. I guess the other possibility would be to redefine cl_float3 as
>> > cl_float4 in libclc.
>>
>> Sorry, I meant to redefine "gentype3" as "gentype4" for every vector
>> element type "gentype".
>>
>> > You mentioned that it had undesired consequences. Which exactly?
>
> With this change:
> -typedef __attribute__((ext_vector_type(3))) float float3;
> +typedef __attribute__((ext_vector_type(4))) float float3;
>
> errors about duplicate ops (since type3 and type4 are now the same type)
> can be fixed. However, I don't know a clean way to fix the following:
>
> error: too few elements in vector initialization (expected 4 elements,
> have 3)
>
> Even if we can get rid of it, or reduce it to warning.
> having the same type for type3 and type4 causes problem that we won't be
> able to distinguish following situations:
>
> flaot4 A
>
> float3 B = A.xyz; // This is OK, should not produce warning/error;
>
> float4 C = A.xyz; // This should at least produce a warning. even if we
> allow using fewer elements for vector initialization.
>

Right, I agree that this wouldn't be a good idea.

>
>> >
>> >> I understand that target device can have stricter alignment rules, but I
>> >> don't see how it can have different type sizes (my reading of the specs
>> >> is that these are binding for the target as well).
>> >>
>> >
>> > In theory the sizes are binding for most built-in types, but R600
>> > doesn't support certain integer types so clover has code to take into
>> > account the differing size, alignment and endianness between host and
>> > device. I guess that in most cases it would be possible to use the
>> > "official" ABI for passing kernel arguments to the device (and that's a
>> > requirement for your solution using DataLayout::getTypeAllocSize() to
>> > work reliably, otherwise you'll be taking into account device-specific
>> > padding), but you'll have to fix the R600 back-end so it's able to deal
>> > with (i.e. lower) all built-in types specified by OpenCL. I think that
>> > this would be useful on its own, Tom should know better than I how
>> > difficult it would be.
>
> I didn't know about the additional R600 type size magic, in this case I
> agree that the original approach (detect 3elem vectors and change size)
> is probably the best in this situation.
>
>> >
>> >> I can resend the original patch with debug output replaced by a comment.
>> >>
>> > A slightly more general solution than what you did could be to align the
>> > store size of scalar and vector arguments to the next power of two to
>> > calculate the API size (in particular, that would make 3- and 4-element
>> > vectors have the same size). From 6.1.5: "A built-in data type that is
>> > not a power of two bytes in size must be aligned to the next larger
>> > power of two."
>
> This part is what made me think that the _alignment_ only restricts
> starting address (and not size).

It also restricts the size. From section 6.3 on the sizeof operator:
"The sizeof operator yields the size (in bytes) of its operand,
including any padding bytes (refer to section 6.1.5) needed for
alignment [...]. When applied to an operand that has structure or union
type, the result is the total number of bytes in such an object,
including internal and trailing padding."

> Otherwise, the addition of 3 element vector special case would not be
> necessary.

Maybe it wouldn't be strictly necessary, but it's still clarifying as an
illustrative corollary of the general rule.

> i.e I think that the
> following mem layout is legal in OCL 1.0 but not in OCL1.1+
>
> 0x10: float3 here
> 0x1c: int here // this should belong to float3 in ocl 1.1+

Does OpenCL 1.0 support 3-component vectors at all?

ah, I missed that.

>
> while the following is illegal in both:
>
> 0x8: float3 here
>
> So, I don't think that the quoted text requires all builtin types to
> have 2^x size (although all but 3 element vectors do).
>

Well, don't interpret it that way if you don't want to, but all OpenCL
built-in types do (or at least the ones you are allowed to use as kernel
argument types), so this solution seems more elegant to me than
special-casing 3-component vectors.

with no 3 element vectors in ocl1.0 and the sizeof clarification. it all
makes sense to me. Thanks for the explanation.

I'll post a patch that checks the power-of-two rule.

thanks again,
Jan