[PATCH] Add vstore_half helpers for ptx

> Index: ptx/lib/SOURCES_4.0
> ===================================================================
> --- ptx/lib/SOURCES_4.0 (nonexistent)
> +++ ptx/lib/SOURCES_4.0 (working copy)
> @@ -0,0 +1 @@
> +shared/vstore_half_helpers.ll
> Index: ptx/lib/SOURCES_5.0
> ===================================================================
> --- ptx/lib/SOURCES_5.0 (nonexistent)
> +++ ptx/lib/SOURCES_5.0 (working copy)

you probably need SOURCES_3.9 as well.
or add a comment why it's not needed.

> @@ -0,0 +1 @@
> +shared/vstore_half_helpers.ll
> Index: ptx/lib/shared/vstore_half_helpers.ll
> ===================================================================
> --- ptx/lib/shared/vstore_half_helpers.ll (nonexistent)
> +++ ptx/lib/shared/vstore_half_helpers.ll (working copy)

can you add datalayout, or would it prevent sharing the file between
ptx and ptx64?
I assume GENERIC == private and SHARED == local?

other than that
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>

Jan

Hi Jan,

> > Index: ptx/lib/SOURCES_4.0
> > ===================================================================
> > --- ptx/lib/SOURCES_4.0 (nonexistent)
> > +++ ptx/lib/SOURCES_4.0 (working copy)
> > @@ -0,0 +1 @@
> > +shared/vstore_half_helpers.ll
> > Index: ptx/lib/SOURCES_5.0
> > ===================================================================
> > --- ptx/lib/SOURCES_5.0 (nonexistent)
> > +++ ptx/lib/SOURCES_5.0 (working copy)
>
> you probably need SOURCES_3.9 as well.
> or add a comment why it's not needed.

Yes, if we’re supporting that. I don’t know what the current policy/status is?

> > @@ -0,0 +1 @@
> > +shared/vstore_half_helpers.ll
> > Index: ptx/lib/shared/vstore_half_helpers.ll
> > ===================================================================
> > --- ptx/lib/shared/vstore_half_helpers.ll (nonexistent)
> > +++ ptx/lib/shared/vstore_half_helpers.ll (working copy)
>
> can you add datalayout, or would it prevent sharing the file between
> ptx and ptx64?

That would prevent sharing. I wonder if we might come up with some kind of solution
that adds the data layout at configuration or compilation time?

> I assume GENERIC == private and SHARED == local?

Correct.
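[Editorial sketch: the shape such a helper can take in .ll form. The names and bodies below are hypothetical illustrations, not the contents of the actual patch; the address-space mapping (OpenCL __private == generic address space 0, __local == shared address space 3 on nvptx) is the one confirmed above.]

```llvm
; Hypothetical helpers, for illustration only (not the actual patch contents).
; Each one narrows a float to half precision and stores it; one definition
; per OpenCL address space.
define void @__clc_vstore_half_private(float %val, half* %ptr) {
entry:
  %h = fptrunc float %val to half          ; narrow float to half
  store half %h, half* %ptr                ; generic pointer == __private
  ret void
}

define void @__clc_vstore_half_local(float %val, half addrspace(3)* %ptr) {
entry:
  %h = fptrunc float %val to half
  store half %h, half addrspace(3)* %ptr   ; shared pointer == __local
  ret void
}
```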

Thanks for the review,

Jeroen

> Hi Jan,
>
> > > Index: ptx/lib/SOURCES_4.0
> > > ===================================================================
> > > --- ptx/lib/SOURCES_4.0 (nonexistent)
> > > +++ ptx/lib/SOURCES_4.0 (working copy)
> > > @@ -0,0 +1 @@
> > > +shared/vstore_half_helpers.ll
> > > Index: ptx/lib/SOURCES_5.0
> > > ===================================================================
> > > --- ptx/lib/SOURCES_5.0 (nonexistent)
> > > +++ ptx/lib/SOURCES_5.0 (working copy)
> >
> > you probably need SOURCES_3.9 as well.
> > or add a comment why it's not needed.
>
> Yes, if we’re supporting that. I don’t know what the current policy/status is?

llvm-3.9 support was restored last week. I don't have a testing setup
but the generated library is sane (there's even a travis CI to check
that).

> > > @@ -0,0 +1 @@
> > > +shared/vstore_half_helpers.ll
> > > Index: ptx/lib/shared/vstore_half_helpers.ll
> > > ===================================================================
> > > --- ptx/lib/shared/vstore_half_helpers.ll (nonexistent)
> > > +++ ptx/lib/shared/vstore_half_helpers.ll (working copy)
> >
> > can you add datalayout, or would it prevent sharing the file between
> > ptx and ptx64?
>
> That would prevent sharing. I wonder if we might come up with some kind of solution
> that adds the data layout at configuration or compilation time?

I was considering passing .ll files through c preprocessor (clang -E
with the same options as .cl compilation), so that you can use things
like #ifdef and #include.
It should not be a lot of work, and it'd allow us to share .ll files
without linker complaints.

> > I assume GENERIC == private and SHARED == local?
>
> Correct.

just out of curiosity. what's your use case for nvptx libclc?

Jan

Hi Jan,

> > > > Index: ptx/lib/SOURCES_4.0
> > > > ===================================================================
> > > > --- ptx/lib/SOURCES_4.0 (nonexistent)
> > > > +++ ptx/lib/SOURCES_4.0 (working copy)
> > > > @@ -0,0 +1 @@
> > > > +shared/vstore_half_helpers.ll
> > > > Index: ptx/lib/SOURCES_5.0
> > > > ===================================================================
> > > > --- ptx/lib/SOURCES_5.0 (nonexistent)
> > > > +++ ptx/lib/SOURCES_5.0 (working copy)
> > >
> > > you probably need SOURCES_3.9 as well.
> > > or add a comment why it's not needed.
> >
> > Yes, if we’re supporting that. I don’t know what the current policy/status is?
>
> llvm-3.9 support was restored last week. I don't have a testing setup
> but the generated library is sane (there's even a travis CI to check
> that).

Ok, I’ll add that. Was there any particular reason it was restored? In the past
libclc generally only supported the tip of the Clang/LLVM tree.

> > > > @@ -0,0 +1 @@
> > > > +shared/vstore_half_helpers.ll
> > > > Index: ptx/lib/shared/vstore_half_helpers.ll
> > > > ===================================================================
> > > > --- ptx/lib/shared/vstore_half_helpers.ll (nonexistent)
> > > > +++ ptx/lib/shared/vstore_half_helpers.ll (working copy)
> > >
> > > can you add datalayout, or would it prevent sharing the file between
> > > ptx and ptx64?
> >
> > That would prevent sharing. I wonder if we might come up with some kind of solution
> > that adds the data layout at configuration or compilation time?
>
> I was considering passing .ll files through c preprocessor (clang -E
> with the same options as .cl compilation), so that you can use things
> like #ifdef and #include.
> It should not be a lot of work, and it'd allow us to share .ll files
> without linker complaints.

I’m a bit hesitant about (ab)using the preprocessor. Although this is a simple
use case, I’ve seen more advanced uses lead to problems in the past, e.g.,
the ghc compiler at some point stopped working on MacOS, because
clang’s preprocessor diverged from gcc’s.

> > > I assume GENERIC == private and SHARED == local?
> >
> > Correct.
>
> just out of curiosity. what's your use case for nvptx libclc?

GPUVerify: http://multicore.doc.ic.ac.uk/tools/GPUVerify/

We use clang to compile CUDA and OpenCL down to bitcode for the NVPTX
target (linking with libclc in the latter case) before feeding it to the tool proper.

Jeroen

> >
> > > Hi Jan,
> > >
> > > >
> > > > > Index: ptx/lib/SOURCES_4.0
> > > > > ===================================================================
> > > > > --- ptx/lib/SOURCES_4.0 (nonexistent)
> > > > > +++ ptx/lib/SOURCES_4.0 (working copy)
> > > > > @@ -0,0 +1 @@
> > > > > +shared/vstore_half_helpers.ll
> > > > > Index: ptx/lib/SOURCES_5.0
> > > > > ===================================================================
> > > > > --- ptx/lib/SOURCES_5.0 (nonexistent)
> > > > > +++ ptx/lib/SOURCES_5.0 (working copy)
> > > >
> > > > you probably need SOURCES_3.9 as well.
> > > > or add a comment why it's not needed.
> > >
> > > Yes, if we’re supporting that. I don’t know what the current policy/status is?
> >
> > llvm-3.9 support was restored last week. I don't have a testing setup
> > but the generated library is sane (there's even a travis CI to check
> > that).
>
> Ok, I’ll add that. Was there any particular reason it was restored? In the past
> libclc generally only supported the tip of the Clang/LLVM tree.

couple of reasons:
I think it has always been mostly 'whatever is convenient' and adding
back one version was easy enough.
clover recently decided to support llvm-3.9+ (to match mesa's amdgpu
llvm requirements)
the last libclc version that worked with llvm-3.9 was rather
incomplete, especially now that we are within 30 functions of complete
1.1 (and afaik only printf for 1.2).

What llvm versions we want to support going forward is Tom's decision.
I think it'd be nice to stay aligned with clover.

> >
> > >
> > > >
> > > > > @@ -0,0 +1 @@
> > > > > +shared/vstore_half_helpers.ll
> > > > > Index: ptx/lib/shared/vstore_half_helpers.ll
> > > > > ===================================================================
> > > > > --- ptx/lib/shared/vstore_half_helpers.ll (nonexistent)
> > > > > +++ ptx/lib/shared/vstore_half_helpers.ll (working copy)
> > > >
> > > > can you add datalayout, or would it prevent sharing the file between
> > > > ptx and ptx64?
> > >
> > > That would prevent sharing. I wonder if we might come up with some kind of solution
> > > that adds the data layout at configuration or compilation time?
> >
> > I was considering passing .ll files through c preprocessor (clang -E
> > with the same options as .cl compilation), so that you can use things
> > like #ifdef and #include.
> > It should not be a lot of work, and it'd allow us to share .ll files
> > without linker complaints.
>
> I’m a bit hesitant about (ab)using the preprocessor. Although this is a simple
> use case, I’ve seen more advanced uses lead to problems in the past, e.g.,
> the ghc compiler at some point stopped working on MacOS, because
> clang’s preprocessor diverged from gcc’s.

we'd need some kind of preprocessing if we want to share the same file.
using something already available is the easiest way out.
the alternative is to have separate files per target.

I don't think it's worth spending much effort over, especially if .ll
files are mainly used to provide support for older llvm.
at any rate, this patch does not need to wait for any such solution.

> >
> > > >
> > > > I assume GENERIC == private and SHARED == local?
> > >
> > > Correct.
> >
> > just out of curiosity. what's your use case for nvptx libclc?
>
> GPUVerify: http://multicore.doc.ic.ac.uk/tools/GPUVerify/
>
> We use clang to compile CUDA and OpenCL down to bitcode for the NVPTX
> target (linking with libclc in the latter case) before feeding it to the tool proper.

Interesting, nice to see the nvptx part being useful.
thanks for sharing.

I skipped nvptx when cleaning up the barrier builtin,
ptx-nvidiacl/lib/synchronization/barrier.cl looks broken in a way that
would interfere with at least barrier divergence analysis.

regards,
Jan

Hi Jan,

> > >
> > > > Hi Jan,
> > > >
> > > > >
> > > > > > Index: ptx/lib/SOURCES_4.0
> > > > > > ===================================================================
> > > > > > --- ptx/lib/SOURCES_4.0 (nonexistent)
> > > > > > +++ ptx/lib/SOURCES_4.0 (working copy)
> > > > > > @@ -0,0 +1 @@
> > > > > > +shared/vstore_half_helpers.ll
> > > > > > Index: ptx/lib/SOURCES_5.0
> > > > > > ===================================================================
> > > > > > --- ptx/lib/SOURCES_5.0 (nonexistent)
> > > > > > +++ ptx/lib/SOURCES_5.0 (working copy)
> > > > >
> > > > > you probably need SOURCES_3.9 as well.
> > > > > or add a comment why it's not needed.
> > > >
> > > > Yes, if we’re supporting that. I don’t know what the current policy/status is?
> > >
> > > llvm-3.9 support was restored last week. I don't have a testing setup
> > > but the generated library is sane (there's even a travis CI to check
> > > that).
> >
> > Ok, I’ll add that. Was there any particular reason it was restored? In the past
> > libclc generally only supported the tip of the Clang/LLVM tree.
>
> couple of reasons:
> I think it has always been mostly 'whatever is convenient' and adding
> back one version was easy enough.
> clover recently decided to support llvm-3.9+ (to match mesa's amdgpu
> llvm requirements)
> the last libclc version that worked with llvm-3.9 was rather
> incomplete, especially now that we are within 30 functions of complete
> 1.1 (and afaik only printf for 1.2).
>
> What llvm versions we want to support going forward is Tom's decision.
> I think it'd be nice to stay aligned with clover.

Thanks for the clarification regarding the versions. I don’t follow clover
development, so I wasn’t aware of this.

> > > > > > @@ -0,0 +1 @@
> > > > > > +shared/vstore_half_helpers.ll
> > > > > > Index: ptx/lib/shared/vstore_half_helpers.ll
> > > > > > ===================================================================
> > > > > > --- ptx/lib/shared/vstore_half_helpers.ll (nonexistent)
> > > > > > +++ ptx/lib/shared/vstore_half_helpers.ll (working copy)
> > > > >
> > > > > can you add datalayout, or would it prevent sharing the file between
> > > > > ptx and ptx64?
> > > >
> > > > That would prevent sharing. I wonder if we might come up with some kind of solution
> > > > that adds the data layout at configuration or compilation time?
> > >
> > > I was considering passing .ll files through c preprocessor (clang -E
> > > with the same options as .cl compilation), so that you can use things
> > > like #ifdef and #include.
> > > It should not be a lot of work, and it'd allow us to share .ll files
> > > without linker complaints.
> >
> > I’m a bit hesitant about (ab)using the preprocessor. Although this is a simple
> > use case, I’ve seen more advanced uses lead to problems in the past, e.g.,
> > the ghc compiler at some point stopped working on MacOS, because
> > clang’s preprocessor diverged from gcc’s.
>
> we'd need some kind of preprocessing if we want to share the same file.
> using something already available is the easiest way out.

Agreed. So, yes, the preprocessor might be the best option.

> the alternative is to have separate files per target.

Which seems like unnecessary code duplication, given that the only difference
between the nvptx and nvptx64 data layouts is the pointer width.

> I don't think it's worth spending much effort over, especially if .ll
> files are mainly used to provide support for older llvm.

Agreed.

> at any rate, this patch does not need to wait for any such solution.

Committed.
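[Editorial sketch: a minimal form the preprocessor approach discussed above could take. The __LP64__ guard, the clang flags, and the datalayout strings are illustrative assumptions, not the committed solution.]

```llvm
; shared/vstore_half_helpers.ll, run through the C preprocessor (e.g.
; "clang -E -P -x c" with the same target options as .cl compilation)
; before being assembled; -P suppresses line markers, which would not be
; valid IR. Guard macro and datalayout strings are illustrative only.
#ifdef __LP64__
target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"         ; nvptx64
#else
target datalayout = "e-p:32:32-i64:64-v16:16-v32:32-n16:32:64" ; nvptx
#endif
```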

> > > > > I assume GENERIC == private and SHARED == local?
> > > >
> > > > Correct.
> > >
> > > just out of curiosity. what's your use case for nvptx libclc?
> >
> > GPUVerify: http://multicore.doc.ic.ac.uk/tools/GPUVerify/
> >
> > We use clang to compile CUDA and OpenCL down to bitcode for the NVPTX
> > target (linking with libclc in the latter case) before feeding it to the tool proper.
>
> Interesting, nice to see the nvptx part being useful.
> thanks for sharing.
>
> I skipped nvptx when cleaning up the barrier builtin,
> ptx-nvidiacl/lib/synchronization/barrier.cl looks broken in a way that
> would interfere with at least barrier divergence analysis.

I’m currently not using what’s in the ptx-nvidiacl directory. However, I might
have a look at this at some point.

Jeroen