Completing the OpenCL Vector Extensions

This email is just a request for ideas/thoughts (i.e. lower than a request for comment).

Something that has been on my list for a while is completing the Clang OpenCL Vector Extensions by creating target-independent vector conversions for the "usual" types (i.e. not 8-bit float vectors but rather float -> double, uint32x2 -> uint64x2, etc).

A few ideas have come up in discussions with various people.

1. Put the vector conversions as .ll files into compiler-rt in some manner. On its own, this option will suck in terms of performance though. We could use LTO to perform inlining, but compiler-rt is in many instances a dylib, so we would really need a separate compiler-rt for .ll code. With that we would get nice performance and would not need to extend the compiler itself.

2. Create a builtin like __builtin_shufflevector, but for conversions, wrapped in a header. This idea has two different implementations: 2a. the type approach and 2b. the flag approach.
2a. The type approach. We already know the types of the two vectors, so why not just assume that the number of lanes is compatible and that the conversion is valid for the underlying scalar types, and emit the conversion in IR? We could also potentially add a check that the current platform supports the given conversion, or just say that you are supposed to use the header and not call the builtin directly. The header declarations would look like this (I just made up the function names):

float64x4 ConvertFloatToDouble(float32x4 v) {
  float64x4 result;
  __builtin_vector_convert(v, &result);
  return result;
}
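
To make the intended semantics of 2a concrete, here is a minimal sketch in portable C of what such a builtin would presumably compute: each output lane is the ordinary scalar conversion of the matching input lane. The struct types and the function name are hypothetical stand-ins, not Clang's ext vector types or any existing builtin.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the vector types used in the thread; real code
   would use the compiler's ext vector types instead of structs. */
typedef struct { float  lane[4]; } float32x4;
typedef struct { double lane[4]; } float64x4;

/* Assumed semantics of the proposed builtin: the lane counts match, and each
   output lane is the plain scalar conversion of the same input lane. */
static float64x4 convert_float_to_double(float32x4 v) {
  float64x4 result;
  for (size_t i = 0; i < 4; ++i)
    result.lane[i] = (double)v.lane[i];
  return result;
}
```

A compiler builtin would lower the loop to a single vector conversion in IR; the loop here only spells out the per-lane meaning.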

2b. The flag approach. The flag approach follows along the lines of __builtin_shufflevector and encodes the various conversion operations in an integer, using the first x bits to encode the operation, the second x bits to encode the type of the input, and the third x bits to encode the type of the output.

float64x4 ConvertFloatToDouble(float32x4 v) {
  return __builtin_vector_convert(v, CONSTANT);
}
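
As a rough illustration of how the CONSTANT in 2b could be laid out, the sketch below packs an operation, an input type, and an output type into three fields of one integer. The field width of 8 bits and every enumerator name are assumptions made up for this example; the proposal leaves the width ("x bits") and the actual codes open.

```c
#include <stdint.h>

/* All names and the 8-bit field width are assumptions for illustration. */
enum { FIELD_BITS = 8, FIELD_MASK = (1 << FIELD_BITS) - 1 };
enum conv_op   { OP_FP_EXTEND = 1, OP_FP_TRUNC = 2, OP_ZERO_EXTEND = 3 };
enum elem_type { TY_FLOAT32 = 1, TY_FLOAT64 = 2, TY_UINT32 = 3, TY_UINT64 = 4 };

/* Pack: operation in the lowest field, input type next, output type above. */
static uint32_t encode_conversion(enum conv_op op, enum elem_type in,
                                  enum elem_type out) {
  return (uint32_t)op
       | ((uint32_t)in  << FIELD_BITS)
       | ((uint32_t)out << (2 * FIELD_BITS));
}

/* Unpack each field again, as the compiler would when expanding the builtin. */
static enum conv_op decode_op(uint32_t flag) {
  return (enum conv_op)(flag & FIELD_MASK);
}
static enum elem_type decode_input(uint32_t flag) {
  return (enum elem_type)((flag >> FIELD_BITS) & FIELD_MASK);
}
static enum elem_type decode_output(uint32_t flag) {
  return (enum elem_type)((flag >> (2 * FIELD_BITS)) & FIELD_MASK);
}
```

With this layout, the float -> double case would pass encode_conversion(OP_FP_EXTEND, TY_FLOAT32, TY_FLOAT64) as the constant. Note that the input and output fields duplicate information the compiler already has from the operand and result types, which is the redundancy the type approach avoids.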

Any feedback/comments/pitchforks are welcome = ).


Hi Michael,
Why not implement something like __builtin_astype, i.e. a builtin that takes a type parameter?

It could be something like:

double4 convert_double4(float4 v) {
  return __builtin_vector_convert(v, double4);
}

or maybe even use a preprocessor macro:

#define convert_double4(V) __builtin_vector_convert((V), double4)


Ah. That bridges the gap between the two options. I like it!

This weekend I was thinking (as I finished up the patch for this): maybe the *RIGHT* thing to do is to change the ext vector types to do the right thing with C-style casts (i.e. convert the element values rather than bitcast), but keep that behind a switch that is off by default for now (and would always be off in OpenCL mode). Then a few releases down the line we could swap the defaults (i.e. keep the switch on by default, except in the case of OpenCL). The main reason to do this is that OpenCL does not support implicit casts anymore, so there is no *real* need to keep the bitcast behavior of the cast, no?
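
For readers unsure what is at stake here, the difference between the two cast behaviors can be shown on a single lane. This is only a sketch: the function names are hypothetical, and the bit pattern mentioned below assumes float is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Bitcast: keep the 32 bits of the float and reinterpret them as an integer.
   This is what a cast between same-sized ext vectors does lane by lane today. */
static uint32_t bitcast_float_to_u32(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}

/* Value conversion: the usual C scalar cast, which is what a C-style vector
   cast would do per lane if the switch described above were turned on. */
static uint32_t convert_float_to_u32(float f) {
  return (uint32_t)f;
}
```

For an input of 1.0f the value conversion yields 1, while the bitcast yields 0x3F800000 on an IEEE 754 target, so silently swapping one behavior for the other would change the results of existing code. That is the argument for keeping the switch off at first.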

I still think we should provide the builtin no matter what is done (to give relief to people who work in the original OpenCL mode).



The builtin looks like a good idea. It keeps the door open for
a builtin-capable wg-vectorizer too.

What about the modifiers though? The conversion functions in OpenCL 1.2
support saturation and several rounding modes.

Pekka / pocl
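
To show what the modifiers Pekka mentions add on top of a plain conversion, here is a minimal scalar sketch of a saturating float -> 8-bit unsigned conversion with two rounding modes. The enum, the function name, and the choice to map NaN to 0 are assumptions of this sketch, not the OpenCL API or part of any patch discussed here.

```c
#include <stdint.h>

/* Hypothetical rounding-mode selector covering two of the modes. */
typedef enum { ROUND_TO_ZERO, ROUND_TO_NEAREST_EVEN } round_mode;

/* Saturating conversion of one lane: out-of-range inputs clamp to 0 or 255
   instead of invoking undefined behavior. */
static uint8_t float_to_u8_sat(float x, round_mode mode) {
  if (x != x) return 0;          /* NaN: mapped to 0 in this sketch */
  if (x <= 0.0f) return 0;       /* saturate at the low end */
  if (x >= 255.0f) return 255;   /* saturate at the high end */

  int whole = (int)x;            /* truncates toward zero; 0 < x < 255 here */
  if (mode == ROUND_TO_NEAREST_EVEN) {
    float frac = x - (float)whole;
    /* Round up past the midpoint, or at the midpoint when 'whole' is odd. */
    if (frac > 0.5f || (frac == 0.5f && (whole & 1)))
      ++whole;
  }
  return (uint8_t)whole;
}
```

A builtin that supported modifiers would need to carry the saturation flag and the rounding mode as extra arguments or as part of its name, which is the design question raised above.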

I was purposely avoiding the modifiers since there seemed to be some resistance to them on the list before. But now that I reread the messages, it looks like the resistance had to do with OpenCL-specific things. This builtin would not be turned on in OpenCL mode, so perhaps the original patch would work.

Any opinions and/or pitchforks?