[RFC] Add SYCL programming model support

bader · January 11, 2019, 6:02pm

TLDR

We (Intel) would like to add SYCL programming model support to the LLVM/Clang project to facilitate collaboration on C++ single-source heterogeneous programming for accelerators such as GPUs, FPGAs, DSPs, etc. from different hardware and software vendors. The SYCL programming model is described in detail in the specification available at the Khronos site: https://www.khronos.org/registry/SYCL/specs/sycl-1.2.1.pdf.

Getting started

I’m going to start sending patches to the clang project with the basic functionality (including a new command line option to enable the SYCL programming model), along with RFCs for features that require design review by the clang community (e.g. the interface or protocol between the device compiler and the runtime library).

I’m looking for suggestions on what is the best way to proceed with this proposal. I would appreciate any feedback.

Features

Here is a short list of features we would like to contribute first:

· SYCL compiler

o Adding device compiler diagnostics (this should almost 100% overlap with OpenCL C++ compiler diagnostics)

o Functionality to separate SYCL device code out from the single source

o Address-space handling (including address space inference/deduction)

o Functionality to translate SYCL device code (C++) to SPIR-V format

o Adding two attributes to mark SYCL kernel functions (can be invoked from the host) and SYCL device functions (available on the device)

o Functionality to emit the “integration header” by SYCL device compiler with the device specific information for SYCL runtime library, which is used to launch SYCL kernels on the OpenCL device.

o SYCL compiler driver

  · Implementation of two compilation modes: device-only and two-step compilation

  · Functionality to support device code compilation and linking from multiple translation units

  · Enhancing the driver with clang-offload-wrapper tool and corresponding job to support “fat objects” (the device code and the host code bundled together).

  · Adding SYCL toolchain support including llvm-spirv and offload-wrapper tools.

· Contributing SYCL runtime library under LLVM projects.

o SYCL C++ Template Library: the template library provides a set of C++ templates and classes which provide the programming model to the user. It enables the creation of runtime classes such as SYCL queues, buffers and images, as well as access to some underlying OpenCL runtime objects, such as contexts, platforms, devices and program objects.

o SYCL runtime: the SYCL runtime interfaces with the underlying OpenCL implementations and handles scheduling of commands in queues, moving of data between host and devices, and management of contexts, programs, kernel compilation and memory.

o The SYCL system assumes the existence of one or more OpenCL implementations available on the host machine. If no OpenCL implementation is available, then the SYCL implementation provides only the SYCL host device to run kernels on.

Almost all modifications are expected to land in the clang project and in the SYCL runtime library (located in the “/sycl” directory). The only change planned in the LLVM project so far is a new environment component in the triple.
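
To make the “integration header” item in the list above more concrete, here is a minimal sketch of the kind of information such a header could carry. All names here (KernelParamKind, KernelParamDesc, KernelInfo, kIntegrationTable, findKernel) are hypothetical and chosen for illustration only; this is not the format used by our implementation. The idea is just that the device compiler records, per kernel, a name plus a description of each argument, and the runtime looks that up when it sets kernel arguments through OpenCL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical description of one kernel argument, as a device compiler
// might emit it for the runtime. Not the real integration header format.
enum class KernelParamKind { Accessor, StdLayout, Sampler };

struct KernelParamDesc {
  KernelParamKind kind;
  std::size_t size;    // size of the argument in bytes
  std::size_t offset;  // offset inside the host-side kernel object
};

struct KernelInfo {
  const char *name;               // unique kernel name
  const KernelParamDesc *params;  // argument descriptors
  std::size_t numParams;
};

// What a generated header could contain for one kernel named "vector_add"
// that captures two accessors and one int (assumed layout).
static const KernelParamDesc kVectorAddParams[] = {
    {KernelParamKind::Accessor, sizeof(void *), 0},
    {KernelParamKind::Accessor, sizeof(void *), sizeof(void *)},
    {KernelParamKind::StdLayout, sizeof(int), 2 * sizeof(void *)},
};

static const KernelInfo kIntegrationTable[] = {
    {"vector_add", kVectorAddParams, 3},
};

// Runtime-side lookup: find the descriptor table for a kernel by name.
inline const KernelInfo *findKernel(const char *name) {
  for (const KernelInfo &k : kIntegrationTable)
    if (std::strcmp(k.name, name) == 0)
      return &k;
  return nullptr;
}
```

With a table like this, the runtime would not need to know anything about the kernel's C++ type; it only walks the descriptors and passes each argument to the OpenCL API.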

What is SYCL

(from https://www.khronos.org/sycl/)

SYCL (pronounced ‘sickle’) is a royalty-free, cross-platform abstraction layer that builds on the underlying concepts, portability and efficiency of OpenCL that enables code for heterogeneous processors to be written in a “single-source” style using completely standard C++ language. SYCL single-source programming enables the host and kernel code for an application to be contained in the same source file, in a type-safe way and with the simplicity of a cross-platform asynchronous task graph. SYCL includes templates and generic lambda functions to enable higher-level application software to be cleanly coded with optimized acceleration of kernel code across the extensive range of shipping OpenCL implementations.

High level overview of Intel’s SYCL implementation

Intel’s SYCL implementation consists of two major components: the SYCL compiler and the runtime library. Although SYCL is designed as an “extension-free” standard C++ API, some “compiler” extensions are needed to enable C++ code execution on accelerators (e.g. special attributes to mark “accelerated” functions).

The SYCL compiler is responsible for “extracting” the device part of the code and compiling it to SPIR-V format or a device native binary. SPIR-V (Standard Portable Intermediate Representation) is a standard form of the code for the OpenCL™ offload API. In addition, the compiler emits auxiliary information, which the SYCL runtime uses to run the device code on the accelerator via the OpenCL™ API.

The SYCL runtime library API is a C++ abstraction layer on top of the OpenCL™ API which enables execution of C++ SYCL code on accelerators like FPGAs or GPUs.

We are working on making Intel’s implementation sources available on GitHub (hopefully next week). Our implementation is not complete, but we would like to start collaborating with the community interested in heterogeneous programming as early as possible, to improve the quality of the implementation through the design and code review process.

Available SYCL resources

https://www.khronos.org/sycl/ - SYCL page at Khronos Group site.

http://sycl.tech/ - SYCL ecosystem site (supported by Codeplay). It lists projects implemented using the SYCL programming model (e.g. the TensorFlow SYCL back-end, machine learning and linear algebra libraries, etc.)

https://github.com/KhronosGroup/SyclParallelSTL - Parallel STL implementation based on SYCL

https://github.com/triSYCL/triSYCL - open source SYCL implementation driven by Xilinx

https://www.codeplay.com/products/computesuite/computecpp - closed source SYCL implementation from Codeplay

Ronan_KERYELL · January 11, 2019, 7:28pm

> TLDR We (Intel) would like to request to add SYCL
> programming model support to LLVM/Clang project to
> facilitate collaboration on C++ single-source heterogeneous
> programming for accelerators like GPU, FPGA, DSP, etc. from
> different hardware and software vendors.

Just... amazing!

> I'm looking for suggestions on what is the best way to
> proceed with this proposal. I would appreciate any feedback.

Count me in when you make it available online.

> We are working on making Intel's implementation sources
> available at GitHub (hopefully next week). Our
> implementation is not complete, but we would like to start
> collaboration with the community interested in heterogeneous
> programming as early as possible to improve the quality of
> the implementation through design and code review process.

That sounds great!

Thanks,

bader · January 25, 2019, 8:11pm

Hi,

A short update: we uploaded the SYCL compiler and runtime sources to GitHub: https://github.com/intel/llvm/tree/sycl.

Thanks to all who provided feedback and expressed interest in this project (mostly off the clang mailing list).

If there are no objections, we are going to start sending patches for review in a week or two.

Alexey

Dave_Airlie1 · January 29, 2019, 5:44am

Hi Alexey,

I've just started looking over this, and first of all great work! I'm
not directly the best person yet to review all of this, but I'm
starting to look over it and one thing stood out:

There seems to be some interaction with, or reliance on, the old address space behaviour.

I don't think (maybe I'm wrong) that upstream will want to accept:
[SYCL] Revert "[OpenCL] Enable address spaces for references in C++"
or at least the OpenCL C++ people need to be talked to.

Is there some more work that could be done in these to avoid the revert?
[SYCL] Implement SYCL address-space rules.
- do we have to use separate SYCL address spaces at all here, btw?
or are the OpenCL ones already defined not sufficient?
[SYCL] Add SYCL-specific address spaces fixer pass.

Dave.

bader · January 30, 2019, 10:06pm

Hi Dave,

Thanks for your comments.
We will definitely integrate the address space support required for SYCL into ToT. The code currently uploaded to GitHub is our first approach to implementing the SYCL address space inference rules, but we found it neither robust nor easy to maintain. We are working on an alternative implementation that emits "raw" pointers (i.e. without a specified address space) in the "generic" address space and later re-uses an existing LLVM pass to infer address spaces.
This approach should be aligned with the existing implementation of address spaces for OpenCL C++.
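
As a rough illustration of the "emit generic pointers, infer later" idea, here is a toy sketch in plain C++. It is not LLVM's inference pass and not our implementation; the AddrSpace enum, the Value struct and the inferAddressSpaces function are made up for this example. Each pointer value starts as Generic unless its address space is known at the source (e.g. a kernel argument), and a derived pointer takes a specific address space only when all of its operands agree on it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical address spaces; Generic means "not known yet / could be any".
enum class AddrSpace { Generic, Global, Local, Private };

// A pointer value in a toy IR. Operands are indices of the values it is
// derived from (e.g. pointer arithmetic or a select between two pointers).
struct Value {
  AddrSpace space;            // fixed for roots, inferred for derived values
  std::vector<int> operands;  // empty for roots
};

// Iterate to a fixed point: a derived pointer gets a specific address space
// only if all of its operands resolve to that same specific space.
inline void inferAddressSpaces(std::vector<Value> &values) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (Value &v : values) {
      if (v.operands.empty() || v.space != AddrSpace::Generic)
        continue;  // roots and already-resolved values are left alone
      AddrSpace common = values[v.operands[0]].space;
      for (int op : v.operands)
        if (values[op].space != common)
          common = AddrSpace::Generic;  // operands disagree: stay generic
      if (common != AddrSpace::Generic) {
        v.space = common;
        changed = true;
      }
    }
  }
}
```

A pointer that may refer to either global or local memory stays Generic in this sketch, which corresponds to leaving it in the generic address space.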

Thanks,
Alexey

Anastasia_Stulova · February 6, 2019, 11:49am

Hi Alexey,

Sorry for the delay. It took me some time to look at your prototype on GitHub. It seems quite a substantial amount of work!

I have provided my feedback on your review already https://reviews.llvm.org/D57768 but I think the topics mainly belong here.

There are a number of big architectural aspects of Clang that this work affects. My personal feeling is that some more senior developers in Clang architecture should provide feedback.

Particularly, the following aspects require more attention:

1. SYCL seems to require adding tight dependencies from the standard libraries into the compiler because many language features are hidden behind library classes. This is not common for Clang. We had a discussion about this issue during the implementation of OpenCL C++ and it was decided not to go this route for upstream Clang. Can you explain your current approach to implement this?

2. I am not sure how the change of direction for OpenCL C++ to just enabling C++ in OpenCL would affect your work now? Particularly we are establishing a lot of rules in the areas of interplay between OpenCL features and C++. Address space handling is one example here. As far as I am aware SYCL doesn’t detail many of these rules either. So I am wondering how it would work… would you just inherit the same rules? Also keep in mind they are not documented anywhere yet other than the source code. Additionally for the address spaces we are trying to generalize the rules as much as possible rather than just adding separate language checks all over. It would be nice if you adhere to this approach too!

3. What is your solution for integration with SPIR-V and how does it relate to our previous discussions in October: http://lists.llvm.org/pipermail/cfe-dev/2018-October/059974.html

4. Can you explain the purpose of https://github.com/intel/llvm/tree/sycl/clang/lib/CodeGen/OclCxxRewrite that you are adding to Clang?

Cheers,
Anastasia

bader · February 6, 2019, 1:18pm

Hi Anastasia,

Thanks for your feedback.

I agree that some SYCL features require in-depth review from compiler implementers, and I mentioned some of them in my original email.

> SYCL seems to require adding tight dependencies from the standard libraries into the compiler because many language features are hidden behind library classes. This is not common for Clang. We had a discussion about this issue during the implementation of OpenCL C++ and it was decided not to go this route for upstream Clang. Can you explain your current approach to implement this?

Let me check that I understand this question correctly. Are you asking about implementation of pointer classes representing pointers to different address spaces?

I need to better understand the “OpenCL C++ route” and how it’s aligned with the SYCL design philosophy, which tries to enable programming of accelerators via “extension-free” standard C++. The implementation of the SYCL pointer classes relies on a device compiler extension enabling new keywords like __global, __local, etc., but these are not exposed to the user.

The way we expose this functionality to the user is one aspect of this feature; another important part is integration with existing C++ code, which doesn’t use new extensions or pointer wrapper classes but still needs to execute on an accelerator.

> I am not sure how the change of direction for OpenCL C++ to just enabling C++ in OpenCL would affect your work now? Particularly we are establishing a lot of rules in the areas of interplay between OpenCL features and C++. Address space handling is one example here. As far as I am aware SYCL doesn’t detail many of these rules either. So I am wondering how it would work… would you just inherit the same rules? Also keep in mind they are not documented anywhere yet other than the source code. Additionally for the address spaces we are trying to generalize the rules as much as possible rather than just adding separate language checks all over. It would be nice if you adhere to this approach too!

I think it’s in our common interest to share as much as possible and not to diverge in implementations that enable similar functionality.

So I think it would be great if you could document the way you see address spaces integrated into C++.

> What is your solution for integration with SPIR-V and how does it relate to our previous discussions in October: http://lists.llvm.org/pipermail/cfe-dev/2018-October/059974.html

The current implementation relies on the existing “LLVM to SPIR-V” translator [1]. It’s integrated into the SYCL toolchain as an external tool. We would like LLVM to have native support for the SPIR-V format, so we rely on your work here.

> Can you explain the purpose of https://github.com/intel/llvm/tree/sycl/clang/lib/CodeGen/OclCxxRewrite that you are adding to Clang?

We re-used the OpenCL C++ compiler component here to emit LLVM IR for the “LLVM to SPIR-V” translator. For instance, this pass adjusts accelerator-specific data types to the format recognized by the translator [2]. I’m open to suggestions on how to improve the format so that we don’t need “adjusting passes”.

Anyway https://reviews.llvm.org/D57768 is not related to these topics and it’s aligned with existing CUDA/OpenMP functionality.

Thanks,

Alexey

[1] https://github.com/KhronosGroup/SPIRV-LLVM-Translator

[2] https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/master/docs/SPIRVRepresentationInLLVM.rst

Anastasia_Stulova · February 6, 2019, 3:13pm

Hi Alexey,

> Let me check that I understand this question correctly. Are you asking about implementation of pointer classes representing pointers to different address spaces?

As for address spaces I think you can shortcut by mapping to OpenCL address spaces indeed. Although I don’t know why address space qualifiers wouldn’t be used directly actually? At some point I would like to enable address spaces without “__” prefix btw to allow porting OpenCL C code to C++. Not sure if it can create issues for SYCL then. But it’s worth making this clear now.

However, I think for SYCL address spaces is just one example of much broader picture? What about all other language features that are wrapped into the libraries? For example the code in SemaOverload.cpp of this commit illustrates that you need a tight coupling between compiler and library.

> I need to better understand the “OpenCL C++ route” and how it’s aligned with SYCL design philosophy, which tries to enable programming of accelerators via “extension-free” standard C++.

As for OpenCL we are just enabling C++ functionality to work in OpenCL. That would mean all the library based language features from OpenCL C++ won’t be implemented. Btw, I feel there is a little contradiction here because if you can just use “extension-free” standard C++ then you wouldn’t need to modify Clang?

> Current implementation relies on existing “LLVM to SPIR-V” translator [1]. It’s integrated as an external tool into the toolchain for SYCL. We would like LLVM to have native support of SPIR-V format, so we rely on your work here.

Just to understand: do you plan to add a SPIR-V triple to Clang&LLVM and a special action into Clang that would invoke the translator after generation of IR? If yes I would quite like to see it done more generically in Clang and not just for SYCL. But I think there were some concerns from the other members of LLVM. I will let them comment if it’s still the case.

> We re-used the OpenCL C++ compiler component here to emit LLVM IR for the “LLVM to SPIR-V” translator. For instance, this pass adjusts accelerator specific data types to the format recognized by the translator [2]. I’m open to the suggestions how to improve the format, so we don’t need “adjusting passes”.

Just to be more specific I guess you mean the OpenCL C++ prototype compiler here (which is quite different from the implementation in mainline Clang)? Can you explain what kind of adjustments you are trying to make and why the approach from OpenCL C wouldn’t apply in your case?

> Anyway https://reviews.llvm.org/D57768 is not related to these topics and it’s aligned with existing CUDA/OpenMP functionality.

As I wrote, these comments are not to the review but they are conceptually important aspects that the community should align on. It might be good to have a concrete plan before starting to work on something?

Cheers,
Anastasia

zygoloid · February 6, 2019, 7:02pm

Hi Alexey, thank you for starting this discussion, and for offering to contribute this extension!

Our policy for accepting language extensions is documented here:

http://clang.llvm.org/get_involved.html

… and, on the assumption that you / Intel will be providing long-term support and maintenance for SYCL in Clang, I’m satisfied that all of those points are met. (+rjmccall in case he has concerns in this area.)

Since there seems to be a lot of overlap between SYCL and OpenCL, you should come to an agreement with Anastasia about code ownership and how the two features will harmoniously coexist, and I’m happy to see that that discussion has already begun.

Best wishes,
Richard

John_McCall · February 6, 2019, 11:47pm

Right. As long as Intel understands that this isn't a one-shot project
and will require ongoing maintenance even after being feature-complete,
and as long as you're willing to cooperate with contributors with similar
projects to try to build a good common infrastructure, I have no objection
to taking this into Clang.

If this is a single-source language, then you may also find it helpful to
coordinate with the contributors who've worked on OpenMP and CUDA.

John.

bader · February 7, 2019, 9:25am

Hi Richard, John,

Thank you for the feedback.
We have a long-term plan to support SYCL and we are willing to collaborate with the community on building unified infrastructure for SYCL.
I'll make sure that contributors working with CUDA, OpenMP, OpenCL and alternative SYCL implementations are involved in the review process, as you suggested.

Thanks,
Alexey

bader · February 7, 2019, 7:55pm

>> Let me check that I understand this question correctly. Are you asking about implementation of pointer classes representing pointers to different address spaces?

> As for address spaces I think you can shortcut by mapping to OpenCL address spaces indeed. Although I don’t know why address space qualifiers wouldn’t be used directly actually? At some point I would like to enable address spaces without “__” prefix btw to allow porting OpenCL C code to C++. Not sure if it can create issues for SYCL then. But it’s worth making this clear now.

One of the goals of the SYCL design is to allow developers to compile and run SYCL code even if the compiler toolchain doesn’t support acceleration through OpenCL. Unfortunately “OpenCL address spaces” are not a standard C++ feature (yet :)), so if we expose them to the user, a program written with these extensions will not be supported by other C++ compilers like GCC or MSVC. Using a standard API allows us to use all sorts of extensions in the API implementation and to emulate them with standard C++ where the extensions are not available.

It is plausible to assume that it should be easier for C++ developers to adopt new functionality through standard C++ concepts like class/function rather than through language extensions.

> However, I think for SYCL address spaces is just one example of much broader picture? What about all other language features that are wrapped into the libraries? For example the code in SemaOverload.cpp of this commit illustrates that you need a tight coupling between compiler and library.

https://github.com/intel/llvm/commit/03354a29868b79a30e6fb2c8311bb409a8cc2346#diff-811283eaa55fa65f65713fdd7ecaf4aa

We switched to using “function attributes” in a later commit: https://github.com/llvm/llvm-project/commit/120b4b509d758e27c17111eaa0398b4cecf7575a. Basically, the SYCL runtime marks functions that are supposed to be offloaded to the target device with a special function attribute (similar to the OpenCL kernel attribute), and the compiler doesn’t rely on particular library function names.
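
A minimal sketch of what marking the entry point by attribute rather than by name can look like. The macro SKETCH_SYCL_KERNEL and the function kernel_entry are hypothetical; the macro expands to nothing here so that the snippet builds with any C++11 compiler, and the attribute spelling mentioned in the comment is an assumption, not a confirmed interface. The point is only that the compiler can look for an attribute on the template that receives the user's lambda instead of matching a library function name.

```cpp
#include <cassert>

// In a SYCL-aware compiler this could expand to something like
// __attribute__((sycl_kernel)) (assumed spelling). On an ordinary host
// compiler it expands to nothing, so the code stays valid standard C++.
#ifndef SKETCH_SYCL_KERNEL
#define SKETCH_SYCL_KERNEL
#endif

// Hypothetical runtime-library entry point. A device compiler would treat
// every instantiation of this template as a kernel to outline and offload;
// on the host it simply calls the function object.
template <typename KernelName, typename KernelType>
SKETCH_SYCL_KERNEL void kernel_entry(KernelType kernelFunc) {
  kernelFunc();
}

// User code: a type used only to name the kernel, and a lambda as its body.
class FillAnswer;

inline int runOnHost() {
  int result = 0;
  kernel_entry<FillAnswer>([&result]() { result = 42; });
  return result;
}
```

Because the marker lives on the template, the library is free to rename or restructure its internal functions without the compiler having to follow.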

On the other hand, there are other places where similar dependencies exist. For instance, a typical SYCL kernel function captures “accessor” parameters, which provide a “view” on the data accessed by the device code. The accessor class contains a pointer to this data, and it’s initialized on the host. To pass a C++ class with a pointer to memory from the host to the accelerator we need either:

1. a system that supports some sort of virtual memory, so the target knows how to handle host pointers

2. some cooperation between the compiler and runtime on converting host pointers to target pointers

As an OpenCL implementation is not guaranteed to support option (1), we implemented option (2). The current implementation relies on SYCL class method names from the standard, but I guess this might not be the best option. I am going to send a separate email to discuss this topic in more detail.
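
To illustrate option (2), here is a small sketch in plain C++ of the kind of cooperation I mean. The names ToyAccessor, init, DoubleAll and generatedKernelEntry are invented for this example and do not match the actual implementation. The compiler-generated entry point takes only plain pointers and scalars, which an OpenCL runtime can pass as kernel arguments, and rebuilds the accessor inside the kernel object by calling a method it knows by name.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for a SYCL accessor: a view on data that holds a pointer.
// On the host this is a host pointer, which means nothing on the device.
struct ToyAccessor {
  int *data = nullptr;
  std::size_t size = 0;
  // Hypothetical method that compiler-generated code calls by name to
  // re-point the accessor at device memory.
  void init(int *devicePtr, std::size_t n) {
    data = devicePtr;
    size = n;
  }
  int &operator[](std::size_t i) const { return data[i]; }
};

// User kernel: a function object capturing an accessor, as a lambda would.
struct DoubleAll {
  ToyAccessor acc;
  void operator()() const {
    for (std::size_t i = 0; i < acc.size; ++i)
      acc[i] *= 2;
  }
};

// What a device compiler could generate: an entry point whose parameters
// are a raw pointer and a size, which rebuilds the kernel object and then
// runs the user's code.
inline void generatedKernelEntry(int *devicePtr, std::size_t n) {
  DoubleAll kernel;
  kernel.acc.init(devicePtr, n);
  kernel();
}
```

The dependency on the method name (init in this sketch) is exactly the kind of compiler/library coupling that needs discussion.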

>> I need to better understand the “OpenCL C++ route” and how it’s aligned with SYCL design philosophy, which tries to enable programming of accelerators via “extension-free” standard C++.

>As for OpenCL we are just enabling C++ functionality to work in OpenCL. That would mean all the library based language features from OpenCL C++ won’t be implemented.

By “library based language features” you mean OpenCL-specific data types, right? If so, I think SYCL can re-use the OpenCL C++ implementation by outlining “device code” from the single source and then treating it as an OpenCL C++ program.

>Btw, I feel there is a little contradiction here because if you can just use “extension-free” standard C++ then you wouldn’t need to modify Clang?

SYCL code is supposed to be valid C++ and should work with any C++11 compiler, but when we compile it “in SYCL mode” the device code can be offloaded to an OpenCL accelerator. This feature requires Clang modifications to enable offloading of the “device part” inside a single source, to enforce additional target restrictions for this “device part” (e.g. OpenCL devices typically do not support function pointers), and to lower device code to a format accepted by the OpenCL runtime (e.g. binary, SPIR-V).

>> Current implementation relies on existing “LLVM to SPIR-V” translator [1]. It’s integrated as an external tool into the toolchain for SYCL. We would like LLVM to have native support of SPIR-V format, so we rely on your work here.

> Just to understand: do you plan to add a SPIR-V triple to Clang&LLVM and a special action into Clang that would invoke the translator after generation of IR? If yes I would quite like to see it done more generically in Clang and not just for SYCL. But I think there were some concerns from the other members of LLVM. I will let them comment if it’s still the case.

I agree that SPIR-V support must be added not only for SYCL but for OpenCL C++ too, as it’s a necessary part of the OpenCL C++ compiler toolchain. I think other extensions/APIs might benefit from having native SPIR-V support in LLVM (e.g. OpenMP/Vulkan).

IIRC, the latest discussion ended with a request to build a community around the translator tool. IMHO, we have had a community for a long time, but it’s not vocal on the LLVM mailing lists and not visible to the LLVM community (I can blame myself too :)). I’m aware of multiple projects using this tool to offload computation to OpenCL accelerators and I’ll try to provide the evidence in a dedicated mailing thread.

>> We re-used the OpenCL C++ compiler component here to emit LLVM IR for the “LLVM to SPIR-V” translator. For instance, this pass adjusts accelerator specific data types to the format recognized by the translator [2]. I’m open to the suggestions how to improve the format, so we don’t need “adjusting passes”.

>Just to be more specific I guess you mean the OpenCL C++ prototype compiler here (which is quite different from the implementation in mainline Clang)? Can you explain what kind of adjustments you are trying to make and why the approach from OpenCL C wouldn’t apply in your case?

Sure. The OpenCL C approach for built-in functions is “Itanium C++ ABI mangled names in the global namespace”. This doesn’t work for OpenCL C++/SYCL, as these built-ins collide with user functions. We re-use the existing prototype to speed up SYCL development, but according to my understanding it might violate some LLVM guidelines for extending LLVM IR.
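
To show the collision concretely, here is a small sketch that demangles two symbol names with abi::__cxa_demangle from <cxxabi.h>. It assumes a toolchain that uses the Itanium C++ ABI (GCC or Clang on Linux, for example; it will not build with MSVC). The namespace "user" is hypothetical and only there for contrast. A built-in declared as get_global_id(unsigned int) in the global namespace and a user function with the same name and parameter list both mangle to the first symbol, while the namespaced variant gets a different one.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium C++ ABI symbol name using the ABI support library
// that ships with GCC and Clang. Returns an empty string on failure.
inline std::string demangle(const char *symbol) {
  int status = 0;
  char *text = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
  if (status != 0 || text == nullptr)
    return std::string();
  std::string result(text);
  std::free(text);  // the demangler allocates the buffer with malloc
  return result;
}

// Symbol for get_global_id(unsigned int) in the global namespace. A user
// function with the same name and parameters maps to this same symbol.
static const char *const kBuiltinSymbol = "_Z13get_global_idj";

// The same function inside a (hypothetical) namespace "user" gets a
// different symbol, so it cannot clash with the built-in.
static const char *const kNamespacedSymbol = "_ZN4user13get_global_idEj";
```

Since the return type is not part of the mangled name of an ordinary function, even a user function that returns something different would still clash with the built-in.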

>> Anyway https://reviews.llvm.org/D57768 is not related to these topics and it’s aligned with existing CUDA/OpenMP functionality.

>As I wrote, these comments are not to the review but they are conceptually important aspects that the community should align on. It might be good to have a concrete plan before starting to work on something?

I’ll write a design document to provide more details on how things are done.

Finkel_Hal_J · February 7, 2019, 8:34pm

>>> Current implementation relies on existing “LLVM to SPIR-V” translator [1]. It’s integrated as an external tool into the toolchain for SYCL. We would like LLVM to have native support of SPIR-V format, so we rely on your work here.

>> Just to understand: do you plan to add a SPIR-V triple to Clang&LLVM and a special action into Clang that would invoke the translator after generation of IR? If yes I would quite like to see it done more generically in Clang and not just for SYCL. But I think there were some concerns from the other members of LLVM. I will let them comment if it's still the case.

> I agree that SPIR-V support must be added not only for SYCL, but for OpenCL C++ too, as it's necessary part of OpenCL C++ compiler toolchain. I think other extensions/APIs might benefit from having native SPIR-V support in LLVM (e.g. OpenMP/Vulkan).

> IIRC, the latest discussion ended with a request to build a community around the translator tool. IMHO, we have the community for a long time, but it's not vocal in the LLVM mailing lists and not visible for LLVM community (I can blame myself too :)). I'm aware of multiple projects using this tool to offload computation to OpenCL accelerators and I'll try to provide the evidence in dedicated mailing thread.

First, let me say that I support this effort to add SYCL support to Clang/LLVM. Having a standard, single-source accelerator programming model will be important to the HPC ecosystem and beyond.

Are you thinking about supporting SYCL only via lowering to SPIR-V, or also via direct invocation of appropriate hardware backends? One thing that worries me is that SPIR-V does not support function pointers, and while SYCL doesn't either (or virtual functions, as noted on pg 16, ch 2 of the SYCL 1.2.1 spec), given our experience with other accelerator programming models, I'm not sure how many of our applications would find SYCL an appealing model without this support. Thus, while I think that SYCL is an interesting model, and I know a number of developers interested in learning more about it, being trapped into this restriction by a SPIR-V funnel seems highly undesirable. It seems like this could undesirably limit our ability to support extensions of this kind. Support for inline assembly is another important feature that seems like it might have trouble passing through a SPIR-V layer. There might be other SPIR-V restrictions that pose a similar problem.

I know that ComputeCpp from Codeplay supports some kind of direct-to-PTX path in their LLVM/Clang fork, for example. This is certainly a feature that is important to our current/planned evaluation work regarding SYCL.

Thanks again,

Hal

We re-used the OpenCL C++ compiler component here to emit LLVM IR for the “LLVM to SPIR-V” translator. For instance, this pass adjusts accelerator specific data types to the format recognized by the translator [2]. I’m open to the suggestions how to improve the format, so we don’t need “adjusting passes”.

Just to be more specific I guess you mean the OpenCL C++ prototype compiler here (which is quite different from the implementation in mainline Clang)? Can you explain what kind of adjustments you are trying to make and why the approach from OpenCL C wouldn't apply in your case?

Sure. The OpenCL C approach for built-in functions is "Itanium C++ ABI mangled names in the global namespace". This doesn't work for OpenCL C++/SYCL, as these built-ins collide with user functions. We re-use the existing prototype to speed up SYCL development, but according to my understanding it might violate some LLVM guidelines for extending LLVM IR.
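To make the collision concrete, here is a minimal sketch, assuming a GCC or Clang host toolchain that provides `<cxxabi.h>`. It demangles the symbol that an OpenCL C front end typically emits for the built-in `get_global_id(uint)` and compares it with a namespaced variant. The `demo` namespace, the `demangle` helper and the constant names are hypothetical and only serve the illustration; this is not how the prototype itself resolves the problem.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium C++ ABI symbol name.
// Returns an empty string if the input is not a valid mangled name.
std::string demangle(const char* symbol) {
    assert(symbol != nullptr);
    int status = 0;
    char* text = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    std::string result = (status == 0 && text != nullptr) ? text : "";
    std::free(text);  // __cxa_demangle allocates with malloc
    return result;
}

// Symbol for the OpenCL C built-in get_global_id(uint): a mangled name that
// lives in the global namespace. A C++ user function with the same name and
// parameter type in the global namespace would mangle to the same symbol.
const char* const kBuiltinSymbol = "_Z13get_global_idj";

// Hypothetical alternative: the same function placed in a namespace `demo`,
// which yields a distinct symbol and therefore cannot collide.
const char* const kNamespacedSymbol = "_ZN4demo13get_global_idEj";
```

Calling `demangle(kBuiltinSymbol)` gives `get_global_id(unsigned int)`, while `demangle(kNamespacedSymbol)` gives `demo::get_global_id(unsigned int)`, so only the first form can clash with a user-defined global function.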

Anyway, D57768 "[SYCL] Add clang front-end option to enable SYCL device compilation flow" (https://reviews.llvm.org/D57768) is not related to these topics and it’s aligned with existing CUDA/OpenMP functionality.

As I wrote, these comments are not to the review but they are conceptually important aspects that the community should align on. It might be good to have a concrete plan before starting to work on something?

I'll write a design document to provide more details on how things are done.

Finkel_Hal_J · February 7, 2019, 8:54pm

Let me check that I understand this question correctly. Are you asking about implementation of pointer classes representing pointers to different address spaces?

As for address spaces I think you can shortcut by mapping to OpenCL address spaces indeed. Although I don't know why address space qualifiers wouldn't be used directly actually? At some point I would like to enable address spaces without "__" prefix btw to allow porting OpenCL C code to C++. Not sure if it can create issues for SYCL then. But it's worth making this clear now.

One of the goals of SYCL design is to allow developers to compile and run SYCL code even if the compiler toolchain doesn’t support acceleration through OpenCL. Unfortunately "OpenCL address spaces" is not C++ standard feature (yet :)), so if we expose them to the user, the program written with these extensions will not be supported by other C++ compiler like GCC or MSVC. Using standard API allows us to utilize all sorts of extensions for API implementation and emulate them with standard C++ if extensions are not available.

It is plausible to assume that it should be easier for C++ developers to adopt new functionality through standard C++ concepts like class/function rather than through language extensions.

However, I think for SYCL address spaces is just one example of much broader picture? What about all other language features that are wrapped into the libraries? For example the code in SemaOverload.cpp of this commit illustrates that you need a tight coupling between compiler and library.

We switched to using "function attributes" in later commit https://github.com/llvm/llvm-project/commit/120b4b509d758e27c17111eaa0398b4cecf7575a. Basically SYCL runtime marks functions supposed to be offloaded to the target device with special function attribute (similar to OpenCL `kernel` attribute) and compiler doesn't rely on particular library function names.

On the other hand, there are other places where similar dependencies exist. For instance, a typical SYCL kernel function captures "accessor" parameters, which provide a "view" on the data accessed by the device code. This accessor class contains a pointer to this data and it's initialized on the host. To pass a C++ class with a pointer to memory from the host to the accelerator we need either:

1. the system to support some sort of virtual memory, so the target knows how to handle host pointers

2. some cooperation between the compiler and runtime on converting host pointers to target pointers

As OpenCL implementation is not guaranteed to support option (1), we implemented option (2) and current implementation relies on SYCL class method names from the standard, but I guess this might be not the best option. I am going to send a separate email to discuss this topic in more details.
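Option (2) can be pictured with a minimal host-only sketch in plain C++. Everything here is hypothetical (`MockAccessor`, `MockRuntime`, `double_all`, `run_doubling`); the "device" is just a second host allocation, and the real compiler/runtime protocol is not shown. The point is only that the pointer stored in the accessor is rewritten before the kernel runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a SYCL accessor: a view that stores a raw pointer.
struct MockAccessor {
    int* data;
    std::size_t size;
};

// Hypothetical runtime that owns "device" copies of registered host buffers.
class MockRuntime {
public:
    // Copy the host data into a separate allocation and remember the mapping.
    void register_buffer(int* host, std::size_t size) {
        device_copies_[host] = std::vector<int>(host, host + size);
    }

    // Option (2): replace the host pointer inside the accessor with the
    // matching "device" pointer before the kernel is launched.
    MockAccessor to_device(const MockAccessor& host_view) {
        auto it = device_copies_.find(host_view.data);
        assert(it != device_copies_.end() && "buffer was not registered");
        return MockAccessor{it->second.data(), it->second.size()};
    }

    // Copy the results back to the host allocation.
    void copy_back(int* host) {
        const std::vector<int>& device = device_copies_.at(host);
        std::copy(device.begin(), device.end(), host);
    }

private:
    std::unordered_map<int*, std::vector<int>> device_copies_;
};

// The "kernel": receives the accessor by value, as a lambda capture would.
void double_all(MockAccessor acc) {
    for (std::size_t i = 0; i < acc.size; ++i) acc.data[i] *= 2;
}

// Drive the whole flow: register, translate the pointer, run, copy back.
std::vector<int> run_doubling(std::vector<int> host) {
    MockRuntime runtime;
    runtime.register_buffer(host.data(), host.size());
    MockAccessor host_view{host.data(), host.size()};
    double_all(runtime.to_device(host_view));
    runtime.copy_back(host.data());
    return host;
}
```

In this sketch the kernel never dereferences the original host pointer; it only sees the pointer that the mock runtime substituted.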

I need to better understand the “OpenCL C++ route” and how it’s aligned with SYCL design philosophy, which tries to enable programming of accelerators via “extension-free” standard C++.

As for OpenCL we are just enabling C++ functionality to work in OpenCL. That would mean all the library based language features from OpenCL C++ won't be implemented.

By "library based language features" you mean OpenCL specific data types. Right? If so, I think SYCL can re-use OpenCL C++ implementation by outlining "device code" from the single source and then treating it as a OpenCL C++ program.

Btw, I feel there is a little contradiction here because if you can just use “extension-free” standard C++ then you wouldn't need to modify Clang?

SYCL code is supposed to be valid C++ and should work with any C++11 compiler, but when we compile it "in SYCL mode" the device code can be offloaded to OpenCL accelerator. This feature requires Clang modifications to enable offloading of the "device part" inside a single source, enforce additional target restrictions for this "device part" (e.g. OpenCL devices typically do not support function pointers), lowering device code to the format accepted by OpenCL runtime (e.g. binary, SPIR-V).

Current implementation relies on existing “LLVM to SPIR-V” translator [1]. It’s integrated as an external tool into the toolchain for SYCL. We would like LLVM to have native support of SPIR-V format, so we rely on your work here.

Just to understand: do you plan to add a SPIR-V triple to Clang&LLVM and a special action into Clang that would invoke the translator after generation of IR? If yes I would quite like to see it done more generically in Clang and not just for SYCL. But I think there were some concerns from the other members of LLVM. I will let them comment if it's still the case.

I agree that SPIR-V support must be added not only for SYCL, but for OpenCL C++ too, as it's a necessary part of the OpenCL C++ compiler toolchain. I think other extensions/APIs might benefit from having native SPIR-V support in LLVM (e.g. OpenMP/Vulkan).

IIRC, the latest discussion ended with a request to build a community around the translator tool. IMHO, we have had the community for a long time, but it's not vocal on the LLVM mailing lists and not visible to the LLVM community (I can blame myself too :)). I'm aware of multiple projects using this tool to offload computation to OpenCL accelerators and I'll try to provide the evidence in a dedicated mailing thread.

First, let me say that I support this effort to add SYCL support to Clang/LLVM. Having a standard, single-source accelerator programming model will be important to the HPC ecosystem and beyond.

Are you thinking about supporting SYCL only via lowering to SPIR-V, or also via direct invocation of appropriate hardware backends? One thing that worries me is that SPIR-V does not support function pointers, and while SYCL doesn't either (or virtual functions, as noted on pg 16, ch 2 of the SYCL 1.2.1 spec), given our experience with other accelerator programming models, I'm not sure how many of our applications would find SYCL an appealing model without this support. Thus, while I think that SYCL is an interesting model, and I know a number of developers interested in learning more about it, being trapped into this restriction by a SPIR-V funnel seems highly undesirable. It seems like this could undesirably limit our ability to support extensions of this kind. Support for inline assembly is another important feature that seems like it might have trouble passing through a SPIR-V layer. There might be other SPIR-V restrictions that pose a similar problem.

I know that ComputeCPP from CodePlay supports some kind of direct-to-PTX path in their LLVM/Clang fork, for example. This is certainly a feature that is important to our current/planned evaluation work regarding SYCL.

Also, let me add that my perspective here has been significantly shaped by listening to Michael Wong talk about this for many years. A good example, however, is the talk that Michael gave at the LLVM dev meeting (https://www.youtube.com/watch?v=7Y3-pV_b-1U) last year, where SYCL is presented as a path toward standardizing relevant functionality in C++ itself. Thus, I believe we'll want to support SYCL with extensions not just because it would be necessary for many applications should they wish to use SYCL in the near term, but also because I feel it will be important deployment experience in the context of later, potential C++ standardization.

-Hal

Thanks again,

Hal

We re-used the OpenCL C++ compiler component here to emit LLVM IR for the “LLVM to SPIR-V” translator. For instance, this pass adjusts accelerator specific data types to the format recognized by the translator [2]. I’m open to the suggestions how to improve the format, so we don’t need “adjusting passes”.

Just to be more specific I guess you mean the OpenCL C++ prototype compiler here (which is quite different from the implementation in mainline Clang)? Can you explain what kind of adjustments you are trying to make and why the approach from OpenCL C wouldn't apply in your case?

Sure. The OpenCL C approach for built-in functions is "Itanium C++ ABI mangled names in the global namespace". This doesn't work for OpenCL C++/SYCL, as these built-ins collide with user functions. We re-use the existing prototype to speed up SYCL development, but according to my understanding it might violate some LLVM guidelines for extending LLVM IR.

Anyway, D57768 "[SYCL] Add clang front-end option to enable SYCL device compilation flow" (https://reviews.llvm.org/D57768) is not related to these topics and it’s aligned with existing CUDA/OpenMP functionality.

As I wrote, these comments are not to the review but they are conceptually important aspects that the community should align on. It might be good to have a concrete plan before starting to work on something?

I'll write a design document to provide more details on how things are done.

Ronan_KERYELL · February 8, 2019, 12:02am

    > Are you thinking about supporting SYCL only via lowering to
    > SPIR-V, or also via direct invocation of appropriate hardware
    > backends? One thing that worries me is that SPIR-V does not
    > support function pointers, and while SYCL doesn't either (or
    > virtual functions, as noted on pg 16, ch 2 of the SYCL 1.2.1
    > spec), given our experience with other accelerator programming
    > models, I'm not sure how many of our applications would find
    > SYCL an appealing model without this support. Thus, while I
    > think that SYCL is an interesting model, and I know a number of
    > developers interested in learning more about it, being trapped
    > into this restriction by a SPIR-V funnel seems highly
    > undesirable. It seems like this could undesirably limit our
    > ability to support extensions of this kind. Support for inline
    > assembly is another important feature that seems like it might
    > have trouble passing through a SPIR-V layer. There might be
    > other SPIR-V restrictions that pose a similar problem.

Actually, at Xilinx we are also interested in SYCL without SPIR or
SPIR-V support, because we do not support them. Interestingly, though,
we can use LLVM IR with our processors, FPGA and CGRA.

As you say, since SYCL is pure C++, we can use any kind of extensions
(inline assembly, attributes, intrinsic functions, etc.) by just letting
them flow through Clang/LLVM, which is a top motivation for us.

While we have some experiments without SPIR-V with triSYCL
(https://github.com/triSYCL/triSYCL), it would be great to have a
production-quality implementation of SYCL up-streamed into Clang/LLVM...

    > I know that ComputeCPP from CodePlay supports some kind of
    > direct-to-PTX path in their LLVM/Clang fork, for example. This
    > is certainly a feature that is important to our current/planned
    > evaluation work regarding SYCL.

Yes they have several back-ends.

There is also another implementation targeting CUDA or hip:

Globally I agree that for all the programming models we should decouple
the front-end languages from the target architectures as much as
possible to develop the ecosystems.

Anastasia_Stulova · February 8, 2019, 12:52pm

>> Let me check that I understand this question correctly. Are you asking about implementation of pointer classes representing pointers to different address spaces?

> As for address spaces I think you can shortcut by mapping to OpenCL address spaces indeed. Although I don’t know why address space qualifiers wouldn’t be used directly actually? At some point I would like to enable address spaces without “__” prefix btw to allow porting OpenCL C code to C++. Not sure if it can create issues for SYCL then. But it’s worth making this clear now.

One of the goals of SYCL design is to allow developers to compile and run SYCL code even if the compiler toolchain doesn’t support acceleration through OpenCL.

This is somehow very unfortunate because the specification for SYCL is titled “SYCL integrates OpenCL devices with modern C++”. That implies that it targets OpenCL explicitly. If there is a shift of focus, some update is potentially needed to avoid confusion. However, I believe this is still a side goal of SYCL? There are plenty of other parallel languages that don’t target OpenCL. What is the benefit of using SYCL if there is no OpenCL available? Anyway, this is probably not the discussion that belongs here, but since we are touching on this topic, I find it somewhat unfortunate that we have to pay the price in the compiler implementation to work around something that doesn’t seem to be a primary use case.

Unfortunately “OpenCL address spaces” is not C++ standard feature (yet :)), so if we expose them to the user, the program written with these extensions will not be supported by other C++ compiler like GCC or MSVC. Using standard API allows us to utilize all sorts of extensions for API implementation and emulate them with standard C++ if extensions are not available.

So how do you plan to emulate this in GCC or MSVC, and why can’t we use the same pure C++ library based approach in Clang?

It is plausible to assume that it should be easier for C++ developers to adopt new functionality through standard C++ concepts like class/function rather than through language extensions.

My personal opinion is that learning library APIs or a set of new keywords is approximately the same, especially for those who have already mastered the complexity of C++. However, I have to say that some developers might find the extra “magic” behind what appear to be regular C++ classes somewhat counter-intuitive.

> However, I think for SYCL address spaces is just one example of much broader picture? What about all other language features that are wrapped into the libraries? For example the code in SemaOverload.cpp of this commit illustrates that you need a tight coupling between compiler and library.

https://github.com/intel/llvm/commit/03354a29868b79a30e6fb2c8311bb409a8cc2346#diff-811283eaa55fa65f65713fdd7ecaf4aa

We switched to using “function attributes” in later commit https://github.com/llvm/llvm-project/commit/120b4b509d758e27c17111eaa0398b4cecf7575a. Basically SYCL runtime marks functions supposed to be offloaded to the target device with special function attribute (similar to OpenCL kernel attribute) and compiler doesn’t rely on particular library function names.

On the other hand, there are other places where similar dependencies exist. For instance, a typical SYCL kernel function captures “accessor” parameters, which provide a “view” on the data accessed by the device code. This accessor class contains a pointer to this data and it’s initialized on the host. To pass a C++ class with a pointer to memory from the host to the accelerator we need either:

1. the system to support some sort of virtual memory, so the target knows how to handle host pointers

2. some cooperation between the compiler and runtime on converting host pointers to target pointers

As OpenCL implementation is not guaranteed to support option (1), we implemented option (2) and current implementation relies on SYCL class method names from the standard, but I guess this might be not the best option. I am going to send a separate email to discuss this topic in more details.

Yes, I think an RFC on that is a good idea! We can potentially brainstorm with the rest of the Clang developers and find a more scalable and elegant solution rather than trying to emulate library behavior in the compiler. Besides breaking the conventional library design approach, it incurs the overhead of costly string operations that in some places might have to be performed on every function call or declaration. I am also wondering if you have done any benchmarking of that. Even if it is gated away from the rest of the code, it will still have to be maintained by others if common functionality is required. It can potentially also impact the Clang test suite time.

I think it would be good to have a list of those with some information of how they impact the parser. Hopefully we can reuse attributes or Clang builtin function mechanism for most of those.

>> I need better understand the “OpenCL C++ route” and how it’s aligned with SYCL design philosophy, which tries to enable programing of accelerators via “extension-free” standard C++.

>As for OpenCL we are just enabling C++ functionality to work in OpenCL. That would mean all the library based language features from OpenCL C++ won’t be implemented.

By “library based language features” you mean OpenCL specific data types. Right? If so, I think SYCL can re-use OpenCL C++ implementation by outlining “device code” from the single source and then treating it as a OpenCL C++ program.

Yes, it’s address spaces and data types mainly. We do plan to port some of new useful C++ libraries such as for example an array container. I think you should be able to reuse OpenCL native types/contracts for SYCL. It would be good to have a list however to see how we can make best use of available functionality rather than duplicating similar features.

I agree that SPIR-V support must be added not only for SYCL, but for OpenCL C++ too, as it’s a necessary part of the OpenCL C++ compiler toolchain. I think other extensions/APIs might benefit from having native SPIR-V support in LLVM (e.g. OpenMP/Vulkan).

IIRC, the latest discussion ended with a request to build a community around the translator tool. IMHO, we have had the community for a long time, but it’s not vocal on the LLVM mailing lists and not visible to the LLVM community (I can blame myself too :)). I’m aware of multiple projects using this tool to offload computation to OpenCL accelerators and I’ll try to provide the evidence in a dedicated mailing thread.

Cool, perhaps it’s time to revisit this! I would suggest another RFC!

>> We re-used the OpenCL C++ compiler component here to emit LLVM IR for the “LLVM to SPIR-V” translator. For instance, this pass adjusts accelerator specific data types to the format recognized by the translator [2]. I’m open to the suggestions how to improve the format, so we don’t need “adjusting passes”.

>Just to be more specific I guess you mean the OpenCL C++ prototype compiler here (which is quite different from the implementation in mainline Clang)? Can you explain what kind of adjustments you are trying to make and why the approach from OpenCL C wouldn’t apply in your case?

Sure. The OpenCL C approach for built-in functions is “Itanium C++ ABI mangled names in the global namespace”. This doesn’t work for OpenCL C++/SYCL, as these built-ins collide with user functions. We re-use the existing prototype to speed up SYCL development, but according to my understanding it might violate some LLVM guidelines for extending LLVM IR.

Was that not the same for OpenCL C? All BIFs could be re-declared in the user code. The same applies to C/C++ standard libraries. Perhaps I am not yet clear on what problem you are trying to solve with this. I think it’s something SPIR-V related? Just as a side note, LLVM only has one intermediate format, which is its IR. Any design should take this into account. The implementation of any frontend feature should work such that generic IR is generated. It’s the responsibility of the consumer to lower this down to the required format.

>> Anyway https://reviews.llvm.org/D57768 is not related to these topics and it’s aligned with existing CUDA/OpenMP functionality.

>As I wrote, these comments are not to the review but they are conceptually important aspects that the community should align on. It might be good to have a concrete plan before starting to work on something?

I’ll write a design document to provide more details on how things are done.

Sure, I think this is a really great way of moving forward! I would specifically be interested in where and how the OpenCL implementation can be re-used in SYCL to make sure we can work together towards as much common infrastructure as possible. As suggested before, you might want to cover other related areas like CUDA/OpenMP (if any common functionality exists on the single-source concept side) that can be assessed by other communities.

Thanks for clarifications btw! They helped a lot!

Anastasia


bader · February 8, 2019, 4:08pm

> Are you thinking about supporting SYCL only via lowering to SPIR-V, or also via direct invocation of appropriate hardware backends? One thing that worries me is that SPIR-V does not support function pointers, and while SYCL doesn’t either (or virtual functions, as noted on pg 16, ch 2 of the SYCL 1.2.1 spec), given our experience with other accelerator programming models, I’m not sure how many of our applications would find SYCL an appealing model without this support. Thus, while I think that SYCL is an interesting model, and I know a number of developers interested in learning more about it, being trapped into this restriction by a SPIR-V funnel seems highly undesirable. It seems like this could undesirably limit our ability to support extensions of this kind. Support for inline assembly is another important feature that seems like it might have trouble passing through a SPIR-V layer. There might be other SPIR-V restrictions that pose a similar problem.

We are going to support “direct invocation of appropriate hardware backends”. There are multiple reasons to support this option, including performance benefits from bypassing JIT compilation and the fact that some OpenCL implementations don’t support SPIR-V (e.g. Intel OpenCL FPGA device can accept only pre-built programs).

I fully agree with you that some SYCL/OpenCL limitations can be relaxed for particular hardware targets (e.g. there should be no reason for additional restrictions on the x86 architecture).

I think we can make SYCL restrictions target dependent and/or enforced by compiler knob. This should allow developers to validate portability of SYCL programs across OpenCL capable accelerators and enable use cases where developers prefer performance over portability and use target-specific extensions.

> I know that ComputeCPP from CodePlay supports some kind of direct-to-PTX path in their LLVM/Clang fork, for example. This is certainly a feature that is important to our current/planned evaluation work regarding SYCL.

This is not something I’m particularly going to work on, but AFAIK there is a PTX back-end in LLVM, so implementing direct-to-PTX path should be straightforward once we have SYCL support in the clang – someone will just need to make sure that SYCL uses NVPTX conventions. BTW, there is significant overlap in NVPTX and SPIR features, if we could unify these it would simplify implementation of this path (e.g. both mark some LLVM functions as “kernels”, both support multiple address spaces - but different mapping, intrinsics can be generalized). I realize that there still might be some differences, but both representations enable programming of GPU architectures and there are similarities even between different vendors.

Finkel_Hal_J · February 8, 2019, 5:58pm

> Are you thinking about supporting SYCL only via lowering to SPIR-V, or also via direct invocation of appropriate hardware backends? One thing that worries me is that SPIR-V does not support function pointers, and while SYCL doesn't either (or virtual functions, as noted on pg 16, ch 2 of the SYCL 1.2.1 spec), given our experience with other accelerator programming models, I'm not sure how many of our applications would find SYCL an appealing model without this support. Thus, while I think that SYCL is an interesting model, and I know a number of developers interested in learning more about it, being trapped into this restriction by a SPIR-V funnel seems highly undesirable. It seems like this could undesirably limit our ability to support extensions of this kind. Support for inline assembly is another important feature that seems like it might have trouble passing through a SPIR-V layer. There might be other SPIR-V restrictions that pose a similar problem.

We are going to support “direct invocation of appropriate hardware backends”. There are multiple reasons to support this option, including performance benefits from bypassing JIT compilation and the fact that some OpenCL implementations don’t support SPIR-V (e.g. Intel OpenCL FPGA device can accept only pre-built programs).
I fully agree with you that some SYCL/OpenCL limitations can be relaxed for particular hardware targets (e.g. there should be no reason for additional restrictions on the x86 architecture).

Great.

I think we can make SYCL restrictions target dependent and/or enforced by compiler knob. This should allow developers to validate portability of SYCL programs across OpenCL capable accelerators and enable use cases where developers prefer performance over portability and use target-specific extensions.

I think giving users a pedantic knob but otherwise allowing useful extensions is an important capability here.

> I know that ComputeCPP from CodePlay supports some kind of direct-to-PTX path in their LLVM/Clang fork, for example. This is certainly a feature that is important to our current/planned evaluation work regarding SYCL.

This is not something I’m particularly going to work on, but AFAIK there is a PTX back-end in LLVM, so implementing direct-to-PTX path should be straightforward once we have SYCL support in the clang – someone will just need to make sure that SYCL uses NVPTX conventions (https://llvm.org/docs/NVPTXUsage.html#conventions). BTW, there is significant overlap in NVPTX and SPIR features, if we could unify these it would simplify implementation of this path (e.g. both mark some LLVM functions as “kernels”, both support multiple address spaces - but different mapping, intrinsics can be generalized). I realize that there still might be some differences, but both representations enable programming of GPU architectures and there are similarities even between different vendors.

I certainly agree we can likely abstract many of the differences and make this part of the porting process easier.

Thanks again,

Hal

Ronan_KERYELL · February 8, 2019, 6:55pm

> One of the goals of SYCL design is to allow developers to
> compile and run SYCL code even if the compiler toolchain
> doesn’t support acceleration through OpenCL.

    > This is somehow very unfortunate because the
    > specification for SYCL is titled "SYCL integrates OpenCL
    > devices with modern C++". That implies that it targets
    > OpenCL explicitly. If there is shift of focus potentially
    > some update is needed to avoid confusions.

Yes, this is the kind of things which are discussed inside the standard
committee.

> However, I believe this is still a side goal of SYCL?

No, the *fundamental goal* since the beginning in SYCL is to have a CPU
mode, for example if you do not have an accelerator available or take
advantage of your multicore SIMD CPU while some other SYCL kernels are
using the accelerators at the same time for example.

    > There are plenty of other parallel languages that don't
    > target OpenCL. What is the benefit of using SYCL if there
    > is no OpenCL available? Anyway, this is probably not the
    > discussion that belongs here, but since we are touching
    > this topic I feel somehow unfortunate that we have to pay
    > the price in the compiler implementation to working
    > around something that doesn't seem to be a primary use
    > case.

Perhaps there is some misunderstanding here on the goals and vision.
Please participate to the SYCL committee and ISO C++ committee if you
can.

Even in plain C++ std::thread works on a monocore non-SMT processor.
std::simd works on processor without SIMD instructions.
But if you have some fancy processor then you can take advantage
of it.

    > Unfortunately "OpenCL address spaces" is not C++ standard
    > feature (yet :)), so if we expose them to the user, the
    > program written with these extensions will not be supported
    > by other C++ compiler like GCC or MSVC. Using standard API
    > allows us to utilize all sorts of extensions for API
    > implementation and emulate them with standard C++ if
    > extensions are not available.

    > So how do you plan to emulate this it in GCC or MSVC and
    > why can't we use the same pure C++ library based approach
    > in Clang?

If the compiler does not support outlining of the SYCL kernels to the
accelerators, the sycl.hpp library is just plain C++ and will just run
your code on your CPU because it is just plain C++. This is important
for source code portability.
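A minimal sketch of this host fallback, in plain C++ and without any SYCL header: the `sketch::parallel_for` template and the `vector_add` helper are hypothetical names, not the sycl.hpp API, and the loop is sequential for simplicity. It only illustrates that a library-level "kernel launch" can degrade to an ordinary loop when no outlining compiler is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical host-only stand-in for a SYCL-style parallel_for.
// Without device outlining, the "kernel" is simply called once per index.
template <typename Kernel>
void parallel_for(std::size_t range, Kernel kernel) {
    for (std::size_t i = 0; i < range; ++i) kernel(i);
}

}  // namespace sketch

// User code written against the library interface only: no keywords or
// extensions, so any standard C++11 compiler accepts it.
std::vector<int> vector_add(const std::vector<int>& a,
                            const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> c(a.size());
    sketch::parallel_for(c.size(),
                         [&](std::size_t i) { c[i] = a[i] + b[i]; });
    return c;
}
```

A device-capable compiler would be free to outline the lambda body and run it on an accelerator, while the source of `vector_add` stays unchanged.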

    > It is plausible to assume that it should be easier for
    > C++ developers to adopt new functionality through
    > standard C++ concepts like class/function rather than
    > through language extensions.

Yes, the ISO C++ committee is very reluctant to add new keywords...

    > My personal opinion is that learning library APIs or a
    > set of new keywords is approximately the same especially
    > for those that already mastered the complexity of
    > C++. However, I have to say understanding the extra
    > "magic" behind what appears to be regular C++ classes
    > some developers might find somewhat counter-intuitive.

A fundamental problem with an OpenCL, CUDA, Cilk, C++AMP... program is
that if you insert it in a plain C++ program it just does not compile
because it is not... C++ since the compiler will choke on some strange
keywords. And that is a pain if you have to port a big application from
one standard to the other.

Most of the modern C++ features are provided through classes rather than
keywords. Just think about threads, futures and on-coming executors,
SIMD types, fixed point... A lot of the modern STL has some extra magic.
Or just in plain old C there are some magical functions: exit(),
setjmp()/longjmp()...

But then, how we implement this by splitting the implementation between
some C++ library, Clang and LLVM is what we have to discuss here.

For example, I cannot see why most of your great work on OpenCL address
spaces in C++ cannot be used as is by a SYCL implementation targeting
OpenCL, since the memory model is the same and in that case some SYCL
classes will be just some proxy/wrapper objects hiding some OpenCL
address space attributes.
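As a rough illustration of such a proxy/wrapper, here is a minimal sketch. The `SKETCH_DEVICE_COMPILE` macro, the `SKETCH_GLOBAL` macro, the `global_ptr_sketch` class and the `sum_first` helper are all hypothetical, and mapping the global address space to number 1 is an assumption that depends on the target. Only Clang's `address_space` attribute is an existing feature. On a host compiler the attribute disappears and the class wraps a plain pointer.

```cpp
#include <cassert>
#include <cstddef>

// SKETCH_DEVICE_COMPILE is a hypothetical macro that a device compilation
// could define. The address space number 1 is an assumption for illustration.
#if defined(SKETCH_DEVICE_COMPILE) && defined(__clang__)
#define SKETCH_GLOBAL __attribute__((address_space(1)))
#else
#define SKETCH_GLOBAL  // host build: plain C++, no extension needed
#endif

// Wrapper class that hides the address space qualifier from user code.
template <typename T>
class global_ptr_sketch {
public:
    using pointer = SKETCH_GLOBAL T*;

    explicit global_ptr_sketch(pointer p) : ptr_(p) {}

    // Element access returns a reference in the same address space.
    SKETCH_GLOBAL T& operator[](std::size_t i) const { return ptr_[i]; }

    pointer get() const { return ptr_; }

private:
    pointer ptr_;
};

// User-level code only mentions the wrapper type, never the attribute.
int sum_first(global_ptr_sketch<int> p, std::size_t n) {
    assert(p.get() != nullptr);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}
```

With this split, source code that uses `global_ptr_sketch` stays valid for GCC or MSVC, and the qualifier is visible only to a compiler that understands it.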

As I have already told you, triSYCL made a lot of progress just by
building on your upstreamed Clang OpenCL work. Keep going.

Thank you for this again,

Anastasia_Stulova · February 11, 2019, 12:20pm

> Yes, this is the kind of things which are discussed inside the standard committee.

OK, while this might be under discussion, I am referring to the existing published spec, which clearly indicates that SYCL is meant to run on OpenCL-accelerated devices.

> No, the fundamental goal since the beginning in SYCL is to have a CPU mode, for example if you do not have an accelerator available or take advantage of your multicore SIMD CPU while some other SYCL kernels are using the accelerators at the same time for example.

There are OpenCL implementations for CPUs too. I might be wrong, but I am not sure how something written for a massively parallel model like OpenCL can run in a performant way as a C++ library. I like the idea of one language that is good at everything; however, such things usually come at a “special” price.

> Most of the modern C++ features are provided through classes rather than keywords. Just think about threads, futures and on-coming executors, SIMD types, fixed point… A lot of the modern STL has some extra magic. Or just in plain old C there are some magical functions: exit(), setjmp()/longjmp()…

As for the compiler design, it is preferable for everything that does not require special compiler support to be implemented in libraries. However, features that do need compiler changes are likely to end up as explicit language constructs. This way they can be parsed and mapped to the AST conventionally.

As for C++, many Clang developers are active contributors to the ISO C++ spec, and many proposals are prototyped or even implemented upstream before they make it into the spec, to ensure that the compiler design is aligned with the language concepts.

> For example, I cannot see why most of your great work on OpenCL address spaces in C++ cannot be used as is by a SYCL implementation targeting OpenCL, since the memory model is the same and in that case some SYCL classes will be just some proxy/wrapper objects hiding some OpenCL address space attributes.

Right. I hope we can find such language constructs for all or most SYCL features, which would help simplify the frontend architecture.

Kind Regards,

Anastasia
