[RFC] Re-implementing -fveclib with OpenMP

Hi all,

I am submitting the following RFC [1] to re-implement -fveclib via OpenMP constructs. The RFC was discussed during a round table at the last LLVM developer meeting, and presented during the BoF [2].

The proposal is published on Phabricator, for the purpose of keeping track of the comments, and it is now ready for review by a wider audience after being polished by Hal Finkel and Hideki Saito (thank you!).

Kind regards,

Francesco

[1] D54412 [RFC] Re-implementing -fveclib with OpenMP
[2] https://llvm.org/devmtg/2018-10/talk-abstracts.html#bof7

Hi all, I have been asked to include the RFC into the email message.

Here it goes.

Kind regards,

Francesco

Hi Francesco,

This is a huge RFC and I don't think we can discuss all of it at the
same time, at least not in a constructive manner.

What ends up happening is that people ignore the thread and developers
get upset.

So, I'll start with the summary, to make sure the overall assumptions
in the RFC match the ones I have about it, then we can delve into
details.

I also think we should not discuss user-include files now. Whatever we
define for the standard ones will work for user driven ones, but user
driven have additional complexities that will only get in the way of
the standard discussion.

Comments inline...

Summary

New `veclib` directives in clang

I know this is not new but, why "fveclib"? From the review, I take
this is the same as GCC's "mveclibabi", and if it is, why come up with
a new name for the same thing?

If it's not, what justifies implementing a different way of handling
the same concept (vector math libraries), which is surely going to
confuse a lot of users.

In some reviews, it was said that some proprietary compilers already
use "fveclib", but between being coherent with other OSS compilers and
closed source compilers, I think the answer is clear.

I'm not against the name, I'm just making sure we're not creating
problems for ourselves.

--------------------------------

1. `#pragma veclib declare simd [clause, ]`, same as
    `#pragma omp declare simd` from OpenMP 4.0+.

Why not just use "pragma omp simd"?

If I recall correctly, there's an option to allow OMP SIMD pragmas
without enabling full OMP, so that we can use it without needing all
the headers and libraries, just to control vectorisation.

Creating new pragmas should be seen with extreme prejudice, as these
things tend to simplify the life of the compiler developers but create
nightmares for application developers, especially if they want to use
multiple compilers.

2. `#pragma veclib declare variant`, same as `#pragma omp declare variant`
    restricted to the `simd` context selector, from OpenMP 5.0+.

Is this just for the user-driven stuff? If so, let's look at it later.

New `math.h` header file
------------------------

Shipped in `<clang>/lib/Headers/math.h`, contains all the declarations of
the functions available in the vector library `X`, `ifdef` guarded by
the macro `__CLANG_ENABLE_LIBRARY_X`.

So, the compiler will have the header files and the libraries will be
in charge of implementing them, to avoid linkage errors?

If this is a standard ABI that multiple libraries follow, I'm in
favour. If we'll end up with one (or more) header(s) per library or
worse, need to update the header every time the library changes
something, then I'm completely against.

The latter will generate the compatibility issue I mentioned in one of
the reviews, where the compiler has different header files but the
implementations are slightly off-base. Keeping multiple copies of
those libraries in the same file system (for different users in the
same clusters) is even worse.

That's the kind of thing that is better left for the libraries
themselves. If they have both headers and objects, keeping all
together into one directory is enough.

Option behavior, and interaction with OpenMP
--------------------------------------------

`-fveclib=X`

: The driver transforms this into
    `-fparse-veclib -D__CLANG_ENABLE_LIBRARY_X=1 -lX`. This is used only
    for users that want to vectorize `math.h` functions.

Why not just include the header when you use it, instead of include
and guard for all cases?

`-fopenmp[-simd]`

: No vectorization happens other than for those functions that are
    marked with OpenMP declare simd. The header `math.h` is loaded, but
    the `veclib` decorated declarations are invisible to the compiler
    instance because hidden behind the `__CLANG_ENABLE_LIBRARY_X`
    macros, which are not defined.

"No vectorisation" you mean, no "function" vectorisation. Other
vectorisation (from -O3 etc) will still happen.

`-fopenmp[-simd] -fveclib=X` or

: Same behavior as without the `-fopenmp[-simd]` option.

So, fveclib will enable OMP SIMD by default? I think that's what some
of the reviews (particularly on certification) were against. This is
not correct.

The only way this can work is without including OMP dependencies when
using vector libraries. If the omp-simd option does not add OMP deps
(as I hinted above, there may be a way), then this is fine. But if
veclib flags force OMP dependencies, then this cannot work.

Hi Renato,

Thank you for your review!

Hi Francesco,

This is a huge RFC and I don’t think we can discuss all of it at the
same time, at least not in a constructive manner.

That’s why I leaned towards keeping it in Phabricator: it would have been easier to track people’s comments (at least for me). :-)

What ends up happening is that people ignore the thread and developers
get upset.

So, I’ll start with the summary, to make sure the overall assumptions
in the RFC match the ones I have about it, then we can delve into
details.

I also think we should not discuss user-include files now. Whatever we
define for the standard ones will work for user driven ones, but user
driven have additional complexities that will only get in the way of
the standard discussion.

Comments inline…

Summary

New veclib directives in clang

I know this is not new but, why “fveclib”? From the review, I take
this is the same as GCC’s “mveclibabi”, and if it is, why come up with
a new name for the same thing?

Although I see your reasoning around compatibility with other compilers, I don’t think this is the place to discuss this. The -fveclib option was introduced prior to this RFC, and for now we have to live with it. Whether we want to keep it or change it to a GCC-compatible one is not something we have to discuss here. In particular, I suspect that there are users of -fveclib who would shout on the mailing list if we converted it to a new option, as it would break their build systems. Again, not for this RFC discussion.

If it’s not, what justifies implementing a different way of handling
the same concept (vector math libraries), which is surely going to
confuse a lot of users.

In some reviews, it was said that some proprietary compilers already
use “fveclib”, but between being coherent with other OSS compilers and
closed source compilers, I think the answer is clear.

I’m not against the name, I’m just making sure we’re not creating
problems for ourselves.


  1. #pragma veclib declare simd [clause, ], same as
    #pragma omp declare simd from OpenMP 4.0+.

Why not just use “pragma omp simd”?

If I recall correctly, there’s an option to allow OMP SIMD pragmas
without enabling full OMP, so that we can use it without needing all
the headers and libraries, just to control vectorisation.

Creating new pragmas should be seen with extreme prejudice, as these
things tend to simplify the life of the compiler developers but create
nightmares for application developers, especially if they want to use
multiple compilers.

Yes, the idea was to use OpenMP pragmas only. From the discussion it turned out that OpenMP vectorization and function vectorization are two orthogonal problems (in the sense that we want to be able to turn on math function vectorization without enabling vectorization of the functions that users may mark as declare simd, and vice versa), so we decided to introduce something new (the veclib pragma). It is 100% compatible with the OpenMP one, so it minimizes the work needed in the compiler to support it, and at the same time it is based on a public standard, so I think it is the best choice we could make.

The section on compatibility with OpenMP explains how -fopenmp[-simd] and -fveclib interact.

  2. #pragma veclib declare variant, same as #pragma omp declare variant
    restricted to the simd context selector, from OpenMP 5.0+.

Is this just for the user-driven stuff? If so, let’s look at it later.

No - this is needed to be able to attach non-standard names to the standard ones (see the example of the vector-variant attribute for SVML).

New math.h header file

Shipped in <clang>/lib/Headers/math.h, contains all the declarations of
the functions available in the vector library X, ifdef guarded by
the macro __CLANG_ENABLE_LIBRARY_X.

So, the compiler will have the header files and the libraries will be
in charge of implementing them, to avoid linkage errors?

If this is a standard ABI that multiple libraries follow, I’m in
favour. If we’ll end up with one (or more) header(s) per library or
worse, need to update the header every time the library changes
something, then I’m completely against.

The compiler doesn’t have control over the library - the behavior you are describing will always happen. The only advantage of storing it in a header file with standard descriptors (the OpenMP-based ones) is that it makes it easier to maintain and modify. The different sets are guarded by preprocessor macros; it could also be done with macros that are specific to a version of the libraries.

The alternative is to require that the libraries are shipped with a header file with the descriptors of the vector version (OpenMP would be the best choice, because it is standard). Unfortunately, I don’t think this is something that is going to happen (but I would be very happy to be proved wrong here!)

The latter will generate the compatibility issue I mentioned in one of
the reviews, where the compiler has different header files but the
implementations are slightly off-base. Keeping multiple copies of
those libraries in the same file system (for different users in the
same clusters) is even worse.

That’s the kind of thing that is better left for the libraries
themselves. If they have both headers and objects, keeping all
together into one directory is enough.

We have to store the list of the available vector functions somewhere. Currently it is kept in the backend of LLVM; this RFC proposes to move it to the frontend, in a convenient way that will enable more vectorization opportunities by being compatible with what OpenMP provides.

Option behavior, and interaction with OpenMP

-fveclib=X

: The driver transforms this into
-fparse-veclib -D__CLANG_ENABLE_LIBRARY_X=1 -lX. This is used only
for users that want to vectorize math.h functions.

Why not just include the header when you use it, instead of include
and guard for all cases?

Hum - I am not sure I understand what you are saying here. The idea is to keep user code as it is, with just #include <math.h>. If we come up with a set of library-specific header files shipped with the compiler, we would have to -include them on the command line, so that -fveclib=X would become -lX -include=path/to/X.h

-fopenmp[-simd]

: No vectorization happens other than for those functions that are
marked with OpenMP declare simd. The header math.h is loaded, but
the veclib decorated declarations are invisible to the compiler
instance because hidden behind the __CLANG_ENABLE_LIBRARY_X
macros, which are not defined.

“No vectorisation” you mean, no “function” vectorisation. Other
vectorisation (from -O3 etc) will still happen.

Yes, I will fix it.

-fopenmp[-simd] -fveclib=X or

: Same behavior as without the -fopenmp[-simd] option.

So, fveclib will enable OMP SIMD by default? I think that’s what some
of the reviews (particularly on certification) were against. This is
not correct.

No, I think you got this wrong. -fveclib itself doesn’t enable any OpenMP. OpenMP is enabled only when -fopenmp[-simd] is invoked.

The only way this can work is without including OMP dependencies when
using vector libraries. If the omp-simd option does not add OMP deps
(as I hinted above, there may be a way), then this is fine. But if
veclib flags force OMP dependencies, then this cannot work.

I think I haven’t been clear enough on describing this last combination. Would it be better if I replace it with the following?

-fopenmp[-simd] -fveclib=X or -fopenmp[-simd] -fveclib-include=path/to/user/provided/header/file.h

: Same behavior as without the -fopenmp[-simd] option. In particular, both the “veclib” functions in math.h (or those in the user-provided header file when -fveclib-include is used) are available for vectorization, together with those marked by the OpenMP pragmas.

Francesco

Although I see your reasoning around compatibility with other compilers, I don’t think this is the place to discuss this. The -fveclib option was introduced prior to this RFC, and for now we have to live with it. Whether we want to keep it or change it to a GCC-compatible one is not something we have to discuss here. In particular, I suspect that there are users of -fveclib who would shout on the mailing list if we converted it to a new option, as it would break their build systems. Again, not for this RFC discussion.

I'm trying to avoid the proliferation of something that may have
passed in unnoticed.

So, if there is no special reason to be called 'fveclib' I strongly
suggest we move it to 'mveclibabi' sooner rather than later.

While this is not relevant to this RFC in particular, it's a relevant
subject that needs to be raised. It's not uncommon that people choose
names that are familiar to them without considering the wider
ecosystem. I'm just making sure we do.

Yes, the idea was to use OpenMP pragmas only. From the discussion it turned out that OpenMP vectorization and function vectorization are two orthogonal problems (in the sense that we want to be able to turn on math function vectorization without enabling vectorization of the functions that users may mark as declare simd, and vice versa), so we decided to introduce something new (the veclib pragma). It is 100% compatible with the OpenMP one, so it minimizes the work needed in the compiler to support it, and at the same time it is based on a public standard, so I think it is the best choice we could make.

I see. While doing so would simplify a new implementation, it would
also add yet another set of pragmas that are rarely used, while there
are already existing pragmas that do a similar job.

More importantly, if we try to cater to every possible scenario, the
maintenance in the compiler will increase considerably, and I'm
worried that this is already the case in a multitude of issues around
this RFC.

I want to avoid confusing the users, which will happen if:
- We implement more than we need in an attempt to mimic openmp-simd
for a simpler issue
- We use different flags than other compilers for no strong reason
- We create more and more pragmas that will have different impact,
regardless of their semantics

The section on compatibility with OpenMP explains how -fopenmp[-simd] and -fveclib interact.

That section says the behaviour of -fveclib is the same with or
without -fopenmp-simd, which is confusing, because I would imagine the
additional flag would enable other simd optimisations that just
fveclib wouldn't.

No - this is needed to be able to attach non-standard names to the standard ones (see the example of the vector-variant attribute for SVML).

I'm not sure I understand what non-standard names are, then.

The compiler doesn’t have control over the library

Exactly my point

the behavior you are describing will always happen.

Which one?

The only advantage of storing it in a header file with standard descriptors (the OpenMP-based ones) is that it makes it easier to maintain and modify.

Easier for whom? We're talking about two completely separate
communities. Making it easier for one by making it harder for others
won't work.

The different sets are guarded by preprocessor macros; it could also be done with macros that are specific to a version of the libraries.

Now you're passing even more external library knowledge into the
compiler and that's not going to fly.

The alternative is to require that the libraries are shipped with a header file with the descriptors of the vector version (OpenMP would be the best choice, because it is standard). Unfortunately, I don't think this is something that is going to happen (but I would be very happy to be proved wrong here!)

If the headers describe the library implementation, especially if it
has different implementation for different versions, then it _must_ be
in the library. Keeping that in the compiler is just wrong.

We have to store the list of the available vector functions somewhere. Currently it is kept in the backend of LLVM; this RFC proposes to move it to the frontend, in a convenient way that will enable more vectorization opportunities by being compatible with what OpenMP provides.

To me, neither the back-end nor the front-end is a good alternative.
The back-end has them because they're standard (are these the libmvec
ones?), but if we want to emit external library calls, which can
change without the compiler knowledge, then this _has_ to be outside
of the compiler, in a header file that is controlled by the library.

How to find the alternatives would be a matter of having an ABI that
encodes that (like Intel's and Arm's) and making sure the libraries
provide those. Just like any old C library.

Further specialised (non-standard?) functions would need pragma
support, and AFAIK, OpenMP supports that, so re-implementing in
another way just because a few users would benefit is causing a big
cost to a lot of people for the benefit of a few.

In such cases, I usually recommend the few to swallow the cost by
changing their source code to force specialisation.

Hum - I am not sure I understand what you are saying here. The idea is to keep user code as it is, with just #include <math.h>. If we come up with a set of library-specific header files shipped with the compiler, we would have to -include them on the command line, so that -fveclib=X would become -lX -include=path/to/X.h

Yes, add the includes in the front-end command line so the user
doesn't have to, but don't call it <math.h>, or that'd create problems
in debuggability and "least surprise" as in "where did this math.h
come from anyway?".

If you do that, then there's no need for macros, as well as making it
easier for libraries to provide headers in default locations. So, the
compiler adds /default/include/folder/library_name_abi.h and the
library package is responsible for providing that header in that
place, so that both clang and gcc can use the same header (if they
want).

No, I think you got this wrong. -fveclib itself doesn't enable any OpenMP. OpenMP is enabled only when -fopenmp[-simd] is invoked.

Ok.

: Same behavior as without the -fopenmp[-simd] option. In particular, both the "veclib" functions in math.h (or those in the user-provided header file when -fveclib-include is used) are available for vectorization, together with those marked by the OpenMP pragmas.

That's why I thought this would enable OpenMP, which is why it's
confusing. If -fopenmp-simd doesn't enable OpenMP, then why create a
veclib pragma in the first place, instead of using omp simd with
-fopenmp-simd?