OpenCL & SPIR specific types - proposal and patch

I’ve refactored the code to remove the repeated parts, and also fixed Anton Lokhmotov’s comments.

Please review.

Thanks

image001.png

opencl_types4.patch (30.8 KB)

Debug info looks better, thanks.

-eric

image001.png

Are there any additional comments about this patch?

Thanks

image001.png

opencl_types4.patch (30.8 KB)

From: Tanya Lattner [mailto:lattner@apple.com]
Sent: Tuesday, October 09, 2012 11:30 AM
To: Villmow, Micah
Cc: Benyei, Guy; cfe-dev@cs.uiuc.edu; Anton.Lokhmotov@arm.com
Subject: Re: [cfe-dev] OpenCL & SPIR specific types - proposal and patch

There needs to be a way to differentiate between an integer and a sampler by only looking at the type. The sampler itself is an opaque type in OpenCL. The only requirement is that it is initialized with a 32bit unsigned integer, not that the type itself is an integer.

I understand that the spec doesn't require it to be a 32 bit integer, but if you are only ever assigning it a 32 bit integer, it doesn't make any sense to have the extra inttoptr instruction.

Secondly, your idea of differentiating an integer and sampler by type alone is then based upon the name of the type that you are pointing to and not actually the type itself. Anything based upon names can be risky. There was a point in time where the linker was not properly merging structures that were identical and therefore renaming them. It makes much more sense to attach metadata to samplers and images to denote what is what.
[Villmow, Micah] Metadata isn’t possible because you can’t attach metadata to arguments.
Take these functions:
Kernel void foo(sampler_t a, int b)
{
}
Kernel void foo(int a, int b)
{
}
In LLVMIR with sampler as i32, they are both the same function, but they are fundamentally different and have to be handled differently(not only at the compiler level, but also at the runtime level). On a 32bit system, on the runtime side, the size of a sampler_t object is 4 bytes, on a 64bit system it is 8 bytes, but as an i32, its always 4 bytes.

We have a way to attach metadata to arguments right now for kernel arg info. So its not impossible. Our implementation attaches sampler metadata to kernel args and therefore the backends can do the right thing.

What does the compiler need to do in regards to differentiating each function at the Clang level? I'm not talking about targets, but just the frontend.

If the spec says its initialized with a 32 bit value, then why does it matter if its fixed to i32?

Another problem is this case:
Module 1:
int foo(sampler_t a, int b)
{
Return 0;
}
Module 2:
int foo(int a, int b)
{
Return 0;
}
Module 3:
Kernel void bar(global int* a, sampler_t b)
{
*a = foo(b, *a);
}

Linking Module 1 and 3 together is fine, linking module 2 and 3 together should produce a linker error. If a sampler is an i32, they both are accepted since they have the same signature.

You have the exact same problem with opaque types. The linker will assume they are the same unless it can prove otherwise.

I agree that the opaque pointer with a specific name isn’t the best solution, but unless we add these as fundamental LLVM types, I don’t know of a better way to handle this. I however think relying on metadata for information that is required for correctness is wrong(metadata should not affect correctness) and differentiating if an i32 is a sampler or not based on its use is not the right approach either.

Relying on names for correctness is exactly the same as relying on metadata for correctness. How can you guarantee it will never be changed?

I actually believe that all of these special types should have some metadata attached to them to denote whats what.

-Tanya

I'm hoping to have comments on this patch tomorrow, but since I have proposed several patches to Clang for the sampler type (and have another in revision).. can you explain why you want to change the type from an integer to a pointer?

-Tanya

From: Tanya Lattner [mailto:lattner@apple.com]
Sent: Thursday, October 11, 2012 3:25 PM
To: Villmow, Micah
Cc: Benyei, Guy; cfe-dev@cs.uiuc.edu; Anton.Lokhmotov@arm.com
Subject: Re: [cfe-dev] OpenCL & SPIR specific types - proposal and patch

From: Tanya Lattner [mailto:lattner@apple.com]
Sent: Tuesday, October 09, 2012 11:30 AM
To: Villmow, Micah
Cc: Benyei, Guy; cfe-dev@cs.uiuc.edu; Anton.Lokhmotov@arm.com
Subject: Re: [cfe-dev] OpenCL & SPIR specific types - proposal and patch

There needs to be a way to differentiate between an integer and a sampler by only looking at the type. The sampler itself is an opaque type in OpenCL. The only requirement is that it is initialized with a 32bit unsigned integer, not that the type itself is an integer.

I understand that the spec doesn't require it to be a 32 bit integer, but if you are only ever assigning it a 32 bit integer, it doesn't make any sense to have the extra inttoptr instruction.

Secondly, your idea of differentiating an integer and sampler by type alone is then based upon the name of the type that you are pointing to and not actually the type itself. Anything based upon names can be risky. There was a point in time where the linker was not properly merging structures that were identical and therefore renaming them. It makes much more sense to attach metadata to samplers and images to denote what is what.
[Villmow, Micah] Metadata isn’t possible because you can’t attach metadata to arguments.
Take these functions:
Kernel void foo(sampler_t a, int b)
{
}
Kernel void foo(int a, int b)
{
}
In LLVMIR with sampler as i32, they are both the same function, but they are fundamentally different and have to be handled differently(not only at the compiler level, but also at the runtime level). On a 32bit system, on the runtime side, the size of a sampler_t object is 4 bytes, on a 64bit system it is 8 bytes, but as an i32, its always 4 bytes.

We have a way to attach metadata to arguments right now for kernel arg info. So its not impossible. Our implementation attaches sampler metadata to kernel args and therefore the backends can do the right thing.

What does the compiler need to do in regards to differentiating each function at the Clang level? I'm not talking about targets, but just the frontend.

If the spec says its initialized with a 32 bit value, then why does it matter if its fixed to i32?

Another problem is this case:
Module 1:
int foo(sampler_t a, int b)
{
Return 0;
}
Module 2:
int foo(int a, int b)
{
Return 0;
}
Module 3:
Kernel void bar(global int* a, sampler_t b)
{
*a = foo(b, *a);
}

Linking Module 1 and 3 together is fine, linking module 2 and 3 together should produce a linker error. If a sampler is an i32, they both are accepted since they have the same signature.

You have the exact same problem with opaque types. The linker will assume they are the same unless it can prove otherwise.

I agree that the opaque pointer with a specific name isn’t the best solution, but unless we add these as fundamental LLVM types, I don’t know of a better way to handle this. I however think relying on metadata for information that is required for correctness is wrong(metadata should not affect correctness) and differentiating if an i32 is a sampler or not based on its use is not the right approach either.

Relying on names for correctness is exactly the same as relying on metadata for correctness. How can you guarantee it will never be changed?
[Villmow, Micah] The difference is metadata is allowed to be dropped, you can’t drop the types. Rename it sure, but not drop it.
I think a better solution would to somehow allow the IR to represent a static opaque type that cannot be changed.

How is that any different? EIther way you lose the information you are trying to keep.

-Tanya

I'm hoping to have comments on this patch tomorrow, but since I have proposed several patches to Clang for the sampler type (and have another in revision).. can you explain why you want to change the type from an integer to a pointer?

-Tanya

From: Tanya Lattner [mailto:lattner@apple.com]
Sent: Thursday, October 11, 2012 3:40 PM
To: Villmow, Micah
Cc: Benyei, Guy; cfe-dev@cs.uiuc.edu; Anton.Lokhmotov@arm.com
Subject: Re: [cfe-dev] OpenCL & SPIR specific types - proposal and patch

From: Tanya Lattner [mailto:lattner@apple.com]
Sent: Thursday, October 11, 2012 3:25 PM
To: Villmow, Micah
Cc: Benyei, Guy; cfe-dev@cs.uiuc.edu; Anton.Lokhmotov@arm.com
Subject: Re: [cfe-dev] OpenCL & SPIR specific types - proposal and patch

From: Tanya Lattner [mailto:lattner@apple.com]
Sent: Tuesday, October 09, 2012 11:30 AM
To: Villmow, Micah
Cc: Benyei, Guy; cfe-dev@cs.uiuc.edu; Anton.Lokhmotov@arm.com
Subject: Re: [cfe-dev] OpenCL & SPIR specific types - proposal and patch

There needs to be a way to differentiate between an integer and a sampler by only looking at the type. The sampler itself is an opaque type in OpenCL. The only requirement is that it is initialized with a 32bit unsigned integer, not that the type itself is an integer.

I understand that the spec doesn't require it to be a 32 bit integer, but if you are only ever assigning it a 32 bit integer, it doesn't make any sense to have the extra inttoptr instruction.

Secondly, your idea of differentiating an integer and sampler by type alone is then based upon the name of the type that you are pointing to and not actually the type itself. Anything based upon names can be risky. There was a point in time where the linker was not properly merging structures that were identical and therefore renaming them. It makes much more sense to attach metadata to samplers and images to denote what is what.
[Villmow, Micah] Metadata isn’t possible because you can’t attach metadata to arguments.
Take these functions:
Kernel void foo(sampler_t a, int b)
{
}
Kernel void foo(int a, int b)
{
}
In LLVMIR with sampler as i32, they are both the same function, but they are fundamentally different and have to be handled differently(not only at the compiler level, but also at the runtime level). On a 32bit system, on the runtime side, the size of a sampler_t object is 4 bytes, on a 64bit system it is 8 bytes, but as an i32, its always 4 bytes.

We have a way to attach metadata to arguments right now for kernel arg info. So its not impossible. Our implementation attaches sampler metadata to kernel args and therefore the backends can do the right thing.

What does the compiler need to do in regards to differentiating each function at the Clang level? I'm not talking about targets, but just the frontend.

If the spec says its initialized with a 32 bit value, then why does it matter if its fixed to i32?

Another problem is this case:
Module 1:
int foo(sampler_t a, int b)
{
Return 0;
}
Module 2:
int foo(int a, int b)
{
Return 0;
}
Module 3:
Kernel void bar(global int* a, sampler_t b)
{
*a = foo(b, *a);
}

Linking Module 1 and 3 together is fine, linking module 2 and 3 together should produce a linker error. If a sampler is an i32, they both are accepted since they have the same signature.

You have the exact same problem with opaque types. The linker will assume they are the same unless it can prove otherwise.

I agree that the opaque pointer with a specific name isn’t the best solution, but unless we add these as fundamental LLVM types, I don’t know of a better way to handle this. I however think relying on metadata for information that is required for correctness is wrong(metadata should not affect correctness) and differentiating if an i32 is a sampler or not based on its use is not the right approach either.

Relying on names for correctness is exactly the same as relying on metadata for correctness. How can you guarantee it will never be changed?
[Villmow, Micah] The difference is metadata is allowed to be dropped, you can’t drop the types. Rename it sure, but not drop it.
I think a better solution would to somehow allow the IR to represent a static opaque type that cannot be changed.

How is that any different? EIther way you lose the information you are trying to keep.
[Villmow, Micah] The type is not confused with a different first order type. Even though might we lose the information on what type it is exactly, we don’t confuse it with another valid type. This would allow proper detection of the cases where things are renamed and they can be fixed to behave properly in this situation. Neither is an ideal solution, but there are in my view more downsides to using i32 instead of an opaque type. That being said, our current OpenCL implementation uses i32, its just I would prefer to move away from doing so.

If the type gets renamed, how are you going to identify what is a sampler?

See my comment above about linking and opaque types as it doesn't solve that problem for you either.

Can you please share your list of pros and cons of pointer to opaque type versus int?

-Tanya

If the type gets renamed, how are you going to identify what is a sampler?
[Villmow, Micah] Usage.

See my comment above about linking and opaque types as it doesn't solve that problem for you either.

Can you please share your list of pros and cons of pointer to opaque type versus int?
[Villmow, Micah] Sure, here is what I view as the pros and cons.
Opaque type:
Pros:
Implementation details are target dependent
No confusion with valid i32 type
Utilizes the type system instead of metadata
Can be materialized to i32 if needed
No operations can occur on the type.
Easier to track through stack.
Cons:
Can be renamed and loose information
Requires initialization function
Requires global variables

I32:
Pros:
Can be initialized inline and constant propagated.
Already works(for AMD at least).
Metadata can make for easy identification if it exists.
Maps 1-1 to initializers

Cons:
Optimizers don't know the differences between sampler and other i32
Harder to track through stack
An integer i32 and a sampler_t i32 is only determined by usage and not type.
Operations can work on the type, which is illegal.
Metadata can be lost requiring extensive analysis which cannot always be determined.

-Tanya

Hi Guy,

+// OpenCL image types
+BUILTIN_TYPE(OCLImage1d, OCLImage1dTy)
+BUILTIN_TYPE(OCLImage1dArray, OCLImage1dArrayTy)
+BUILTIN_TYPE(OCLImage1dBuffer, OCLImage1dBufferTy)
+BUILTIN_TYPE(OCLImage2d, OCLImage2dTy)
+BUILTIN_TYPE(OCLImage2dArray, OCLImage2dArrayTy)
+BUILTIN_TYPE(OCLImage3d, OCLImage3dTy)

I agree with Micah and Guy that the sampler type should be an opaque type. The i32 solution has too many cons, and the pros are not clear-cut either. (For example, I don’t quite understand why constant propagation on samplers should even work.)

The problem with renaming of opaque types should better be solved by extending LLVM to disallow renaming given types.

Without seeing a follow-up patch, I cannot comment on the use of functions to initialize samplers. In our implementation, we use both opaque types and metadata, which works fine.

Anton.

Hi,

I agree with Micah and Guy that the sampler type should be an opaque type. The i32 solution has too many cons, and the pros are not clear-cut either. (For example, I don’t quite understand why constant propagation on samplers should even work.)

Sorry, can you please explain why constant propagation shouldn't work on program scope samplers?

The problem with renaming of opaque types should better be solved by extending LLVM to disallow renaming given types.

Without seeing a follow-up patch, I cannot comment on the use of functions to initialize samplers. In our implementation, we use both opaque types and metadata, which works fine.

Anton.

From: Villmow, Micah [mailto:Micah.Villmow@amd.com]
Sent: 12 October 2012 00:02
To: Tanya Lattner
Cc: Benyei, Guy; cfe-dev@cs.uiuc.edu; Anton Lokhmotov
Subject: RE: [cfe-dev] OpenCL & SPIR specific types - proposal and patch

If the type gets renamed, how are you going to identify what is a sampler?
[Villmow, Micah] Usage.

See my comment above about linking and opaque types as it doesn't solve that problem for you either.

Can you please share your list of pros and cons of pointer to opaque type versus int?
[Villmow, Micah] Sure, here is what I view as the pros and cons.
Opaque type:
Pros:
Implementation details are target dependent
No confusion with valid i32 type
Utilizes the type system instead of metadata
Can be materialized to i32 if needed

Sorry, I don't quite understand. If it can be materialized as a i32 type, wouldn't this transformation lead it to be confused as a valid i32 type?

No operations can occur on the type.
Easier to track through stack.
Cons:
Can be renamed and loose information
Requires initialization function

Requiring initialization functions to be generated for CL where CLK_* are integer enums is a bit strange for a C based language.
  const sampler_t samplerA = CLK_NORMALIZED_COORDS_TRUE | CLK_ADDRESS_REPEAT |
CLK_FILTER_NEAREST;

Requires global variables

I'm not sure what that means. For program scope samplers, wouldn't we use global variables?

I32:
Pros:
Can be initialized inline and constant propagated.
Already works(for AMD at least).
Metadata can make for easy identification if it exists.
Maps 1-1 to initializers

Cons:
Optimizers don’t know the differences between sampler and other i32

We know at the CL language level that this is a const value or a constant. I don't see why we would do any optimizations on this variable other than constant propagating to its use as a bad thing (in the case of program scope sampler).

Harder to track through stack

I'm not sure what you mean here of harder to track through the stack. Since this a const value, I would expect it to be used directly by its use (e.g., read_image).

An integer i32 and a sampler_t i32 is only determined by usage and not type.

If you use metadata, this is not an issue. Are we trying to make LLVM IR here into a high level IR with type safety?

Operations can work on the type, which is illegal.

What operations will be done on this type? It is true that this could happen but since at the CL level, the value is const so I don't expect any optimization to manipulate it other than to constant propagate the value to its use.

Metadata can be lost requiring extensive analysis which cannot always be determined.

Metadata doesn't get dropped on globals or functions.

-- Mon Ping

If a sampler_t is just an i32 constant, then why didn't OpenCL C just
define a sampler_t to be a 32-bit constant? There must have been a
reason.

I looked at the OpenCL spec and apparently some devices limit the
number of samplers that can be used in a kernel. See
CL_DEVICE_MAX_SAMPLERS in Table 4.3 (OpenCL device queries).

This probably implies that it's important on some architectures for
either functionality or performance reasons to preserve the sampler
*variables*. In other words, *avoid* constant folding and
propagation so that some samplers would potentially disappear.

If I understand things correctly, in SPIR the preservation of the
sampler *variables* is a side effect of the calls to
@__spir_sampler_initialize in @__spir_globals_initializer().

david

Hi,

If a sampler_t is just an i32 constant, then why didn't OpenCL C just
define a sampler_t to be a 32-bit constant? There must have been a
reason.

Sure, there are good reasons. I think it makes it clearer to define it as a type just like people find it clearer to typedef enums.

I looked at the OpenCL spec and apparently some devices limit the
number of samplers that can be used in a kernel. See
CL_DEVICE_MAX_SAMPLERS in Table 4.3 (OpenCL device queries).

This probably implies that it's important on some architectures for
either functionality or performance reasons to preserve the sampler
*variables*. In other words, *avoid* constant folding and
propagation so that some samplers would potentially disappear.

If I understand things correctly, in SPIR the preservation of the
sampler *variables* is a side effect of the calls to
@__spir_sampler_initialize in @__spir_globals_initializer().

Sure, hardware have a limit on the number of samplers in a kernel but do they really want to preserve the number of sampler variables or the number of unique samplers values? Is the preservation of the variable desirables when two samplers are identical and one is losing one of these valuable resources?

  -- Mon Ping

For example, I don't quite understand why constant propagation
on samplers should even work.

Sorry, can you please explain why constant propagation shouldn't
work on program scope samplers?

Feel free to enlighten me with an example. (Perhaps we do it in the
backend, but I'm unaware of the need to do it at the LLVM-IR level.)

Thanks,
Anton.

Given there is some discussion on sampler still, I suggest you remove it from your patch so the rest can get a final review and hopefully get into the tree.

-Tanya

Hi,

I would usually phrase it the other way around. If we can't apply a standard optimization, I would like to know the reason why that optimization should be illegal.

For this particular case, I don't understand why constant propagation shouldn't work. For example, constant propagation forwards the value to a read image. For my particular target, I look at the sampler value and image at read image and generate the appropriate code instead of going to the original definition and determine what the sampler is before code generation.

Another way of stating the issue is that by creating a pointer to an opaque object for sampler, a backend will be forced to support a special initialization function. If for my target that is unnecessary, I forced to put in support for an initialization function that I do not need. At the CL language level, program samplers are initialized using a 32b value so it seems more natural to me to match the CL language in this way instead of creating an opaque object. If for a target, one wants to create an opaque object, it can go through the IR and replace samplers with the opaque object appropriate for that target.

  -- Mon Ping

Hi All,

Thank you for a very good discussion. Clearly we are still not in agreement on the way OpenCL samplers are represented in LLVM IR.
I think we have a consensus on the rest of the patch, and I agree with Tanya & Mon Ping that there is no reason to postpone the rest of the patch and the patches that rely on it.
Guy will submit a new patch which includes everything that was included in the previous patch but preserves the i32 data type for samplers.
This patch will also demonstrate how easy it is to switch between the 2 versions (opaque vs. i32)

And now a little about the samplers discussion itself.
Samplers are not integers, they are just initialized by a 32 bit integer and hence, can't be represented as i32. What clang does today - is a device specific interpretation of what this type really is.
Our goal is to make clang a sample OpenCL backend that demonstrates how an OpenCL frontend should be implemented.

As per constant propagation, one can always run this optimization in the backend and achieve the same thing once this type is lowered to i32. I agree that, OpenCL backends will need to support initialization functions.
If we want to change this we should work on changing the OpenCL specification first.

Thanks for the patience,
Boaz

Mon Ping,
I think you are misunderstanding the point of SPIR. Backends themselves should not be handling SPIR. SPIR should be lowered to valid LLVMIR for your platform, and then optimizations can be applied to the resulting LLVMIR. It is in this loading that you can convert from spir.sampler_t into an i32 and then constant propagation will work just like normal. One reason why sampler_t is an opaque type is that different vendors have different implementations and some do not want their implementations to be a i32, but want to represent it closer to their hardware.

Micah

Given LLVMs current constraints and lack of fundamental type for
samplers, I think as well that sampler as opaque type makes for much
cleaner representation than as integers. If represented as i32, the
only way to deduce its a sampler is based on its use and thats risky -
consider situations like
    - Sampler is unused - but you still need to preserve the info
    - Compiled at O0. You have to go through stack accesses to know its type

I feel the need for optimizations is barely there for samplers and so
we need to consider the representation issue first.

thanks
Sumesh