StreamExecutor design questions

Hi,

I’ve thrown together a quick and dirty OpenCL backend for SE to play around with writing some application code that uses it, and as a result have one or two design questions. I know it’s early days so I apologise if I’m jumping the gun.

What’s the motivation for having the PlatformKernelHandle and PlatformStreamHandle classes, rather than just using opaque pointers? These classes result in a some extra subclassing when implementing an SE backend, just to wrap a platform specific pointer (e.g. cl_kernel or cl_command_queue), but maybe I’m missing the point here. It also seems to be slightly inconsistent with the device memory stuff, which uses raw opaque pointers at the PlatformDevice level.

It also seems that there is no way for a user to fully release the resources used by a SE device, since the end-user doesn’t own the Device pointer. Maybe I’m worrying about nothing here, but I wonder whether there may be some devices that become ‘unhappy' if you don’t release their contexts/default queues etc before the application exits.

Other minor points that I came across:

- If you *do* want the PlatformKernelHandle class, PlatformInterfaces.cpp was missing its destructor implementation.

- Should Device have a getName() that forwards onto PlatformDevice? Otherwise there seems to be no way for the end-user to get at the name.

- Missing LICENSE.txt at the top-level parallel-libs directory?

For reference, I’ve thrown this OpenCL backend up here (it’s not great, but it works):

Cheers,

James

Hi James,

I’m glad to hear you are experimenting with OpenCL in SE, and I really appreciate your insights and questions about the design. I definitely don’t think it’s too early for questions like these, and I hope you will continue to ask any questions you have in the future as well.

What’s the motivation for having the PlatformKernelHandle and PlatformStreamHandle classes, rather than just using opaque pointers? These classes result in a some extra subclassing when implementing an SE backend, just to wrap a platform specific pointer (e.g. cl_kernel or cl_command_queue), but maybe I’m missing the point here. It also seems to be slightly inconsistent with the device memory stuff, which uses raw opaque pointers at the PlatformDevice level.

My idea with PlatformKernelHandle and PlatformStreamHandle was that their destructors would free any resources they held. So, for example, in the case of your OCLStreamHandle class, I was thinking that you would also define a destructor that called clReleaseCommandQueue.

I agree this is inconsistent with the way device memory is currently handled using a void* handle, and I’m glad you pointed that out. The device memory object currently handles its own cleanup by making a call to PlatformDevice::freeDeviceMemory, so we could create consistency by following this model. PlatformKernelHandle and PlatformStreamHandle would each store a void* opaque handle to the platform-specific implementation and new functions would be added to the PlatformDevice interface to destroyKernel(PlatformKernelHandle*) and destroyStream(PlatformStreamHandle*). Then the destructors for PlatformKernelHandle and PlatformStreamHandle would call the corresponding Device destroy functions just like GlobalDeviceMemoryBase calls the freeDeviceMemory function in its destructor.

Now that I’ve laid it out this way, I think you’re right that that design would be better. It would be consistent and it would be easier to implement a platform. Does that sound like the right design to you, as well? If so, I’ll write up some patches to make those changes.

It also seems that there is no way for a user to fully release the resources used by a SE device, since the end-user doesn’t own the Device pointer. Maybe I’m worrying about nothing here, but I wonder whether there may be some devices that become ‘unhappy’ if you don’t release their contexts/default queues etc before the application exits.

I think you are right to say there is no way for a user to free an SE device. I don’t know if anyone will ever need to free those resources, but I think we will want to add a method to the platform class to free the resources for a device with a given ordinal. Of course that might leave dangling references to a device that has been freed, so we might want to have the platform pass out std::shared_ptr to Device instead of raw Device*. In any case, I think we have a few good options here that won’t be hard to change later. What do you think?

  • If you do want the PlatformKernelHandle class, PlatformInterfaces.cpp was missing its destructor implementation.

Good catch. As per the discussion above, I will probably be adding it in with a non-default body if we switch to using void* handles.

  • Should Device have a getName() that forwards onto PlatformDevice? Otherwise there seems to be no way for the end-user to get at the name.

I think that’s a great idea. I can add a patch for it, or you can feel free to submit a patch as well and I’ll be glad to review it and check it in as needed.

  • Missing LICENSE.txt at the top-level parallel-libs directory?

You’re probably right, but I want to check with chandlerc when he gets back from vacation next week. Since parallel-libs is supposed to be made up of different sub-projects, I think there’s some possibility that there might be different licenses. I’ll try to figure out if I should put a LICENSE.txt in the top-level, the streamexecutor subdirectory, or both.

Thanks for all your help with this project,

-Jason

we might want to have the platform pass out std::shared_ptr to Device instead of raw Device*

Maybe...but then we could easily get into a state where naively
written client code ends up causing us to eagerly destroy Device
objects... Like, most client code probably wants the Device to live
until exit.

Also I'm a little paranoid about doing anything that would force us to
do atomic ops, since we've seen performance problems related to that.

I believe there is a CUDA "shutdown the device" function that we might
want to call atexit. If nothing else, it flushes profiling
information, which is a good thing to do. It would probably be nice
if we called that function automagically...

Hi Jason,

I agree this is inconsistent with the way device memory is currently handled using a void* handle, and I’m glad you pointed that out. The device memory object currently handles its own cleanup by making a call to PlatformDevice::freeDeviceMemory, so we could create consistency by following this model. PlatformKernelHandle and PlatformStreamHandle would each store a void* opaque handle to the platform-specific implementation and new functions would be added to the PlatformDevice interface to destroyKernel(PlatformKernelHandle*) and destroyStream(PlatformStreamHandle*). Then the destructors for PlatformKernelHandle and PlatformStreamHandle would call the corresponding Device destroy functions just like GlobalDeviceMemoryBase calls the freeDeviceMemory function in its destructor.

Now that I’ve laid it out this way, I think you’re right that that design would be better. It would be consistent and it would be easier to implement a platform. Does that sound like the right design to you, as well? If so, I’ll write up some patches to make those changes.

That’s halfway to what I was thinking. If PlatformDevice is the thing that creates the objects, it makes sense for it to also be in charge of destroying them with the destroyKernel() and destroyStream() methods you propose, and this is indeed more consistent with the device memory handling.

However, this makes it even more unclear (to me) what the benefit is of having the Platform{Kernel,Stream}Handle wrapper classes at all. Why not just have createKernel() and createStream() return a raw void* which is then stored directly in the Kernel and Stream classes? The raw pointers could then be passed to PlatformDevice as needed, including to the new destroy*() methods, and the platform implementor no longer has to create a couple of subclasses to hold those raw pointers with an extra level of indirection.

If, however, the Platform*Handle classes are intended to also serve another purpose in the future, then it probably makes sense to keep them. I’m no C++ design guru, so I’m also happy to accept if the wrapper class approach is simply better practice or safer in some way than raw pointers.

It also seems that there is no way for a user to fully release the resources used by a SE device, since the end-user doesn’t own the Device pointer. Maybe I’m worrying about nothing here, but I wonder whether there may be some devices that become ‘unhappy’ if you don’t release their contexts/default queues etc before the application exits.

I think you are right to say there is no way for a user to free an SE device. I don’t know if anyone will ever need to free those resources, but I think we will want to add a method to the platform class to free the resources for a device with a given ordinal. Of course that might leave dangling references to a device that has been freed, so we might want to have the platform pass out std::shared_ptr to Device instead of raw Device*. In any case, I think we have a few good options here that won’t be hard to change later. What do you think?

A ‘shutdown everything in the platform’ function that is called automatically as Justin proposed feels like it would be a good solution.

James

However, this makes it even more unclear (to me) what the benefit is of having the Platform{Kernel,Stream}Handle wrapper classes at all. Why not just have createKernel() and createStream() return a raw void* which is then stored directly in the Kernel and Stream classes? The raw pointers could then be passed to PlatformDevice as needed, including to the new destroy*() methods, and the platform implementor no longer has to create a couple of subclasses to hold those raw pointers with an extra level of indirection.

You’re exactly right. I forgot that the void* handles could be stored directly in the Stream and Kernel objects. I had no other plans for the Platform*Handle classes in the future, so I’ll plan to get rid of them, just as you suggested. I’ll plan to do that early next week.

A ‘shutdown everything in the platform’ function that is called automatically as Justin proposed feels like it would be a good solution.

I’m fine with the idea of a “platform kill switch” that can leave dangling pointers to Device, etc. If we make one, I’ll just try to make big, scary warnings in the documentation that tell folks to make sure they don’t use any of the dangling platform handles after killing the platform.

I decided I couldn’t wait until next week to fix this unnecessary wrapper class stuff. A patch is now up at https://reviews.llvm.org/D24213.