[RFC] Expose user provided vector function for auto-vectorization.

Dear all,

This RFC is a proposal to provide auto-vectorization functionality for user provided vector functions.

The proposal is a modification of an RFC that I sent out a couple of months ago, with the title `[RFC] Re-implementing -fveclib with OpenMP` (see http://lists.llvm.org/pipermail/llvm-dev/2018-December/128426.html). The previous RFC is to be considered abandoned.

The original RFC proposed to re-implement the `-fveclib` command line option. This proposal avoids that, and limits its scope to the mechanics of providing vector functions in user code that the compiler can pick up for auto-vectorization. This narrower scope limits the impact of the changes that are needed in both clang and LLVM.

Please let me know what you think.

Kind regards,

Francesco

I generally like the idea of having support in IR for vectorization of
custom functions. I have several use cases which would benefit from this.

I'd suggest a couple of reframings to the IR representation though.

First, this should probably be specified as metadata/attribute on a
function declaration. Allowing the callsite variant is fine, but it
should primarily be a property of the called function, not of the call
site. Being able to specify it once per declaration is much cleaner.

Second, I really don't like the mangling use here. We need a better way
to specify the properties of the function than its mangled name. One
thought to explore is to directly use the Value of the function
declaration (since this is metadata and we can do that), and then tie
the properties to the function declaration in some way? Sorry, I don't
really have a specific suggestion here.

Philip

There is no way to notify the backend how conformant the SIMD versions
are. While the initial spec said that floating point status registers
would not be supported, this is not difficult to do and the
implementations that I wrote support this[1]. Then if a status
register is set after a calculation, the calculation can be run with
the scalar versions to determine exactly which operation(s) causes it.

[1] Shawn Landden - [v2 1/2] PPC64: Add libmvec SIMD single-precision natural exponent funct

Hi Francesco,

Nice to finally see this RFC, thanks! :slight_smile:

Overall, I like the proposal. Clean, concise and complete.

I have a few comments inline, but from a high level this is looking good.

The directive `#pragma clang declare variant` follows the syntax of the
`#pragma omp declare variant` directive of OpenMP.

We define the new directive in the `clang` namespace instead of using
the `omp` one of OpenMP to allow the compiler to perform
auto-vectorization outside of an OpenMP SIMD context.

So, the only difference is that pragma "omp" includes and links OMP
stuff, while pragma "clang" doesn't, right?

What happens if I have code with "pragma clang declare variant" and
"pragma omp" elsewhere, would the clang pragma behave identically as
if it was omp?

The mechanism is based on OpenMP to provide a uniform user experience
across the two mechanisms, and to maximise the number of shared
components of the infrastructure needed in the compiler frontend to
enable the feature.

Changes in LLVM IR {#llvmIR}
------------------

The IR is enriched with metadata that details the availability of vector
versions of an associated scalar function. This metadata is attached to
the call site of the scalar function.

If the metadata gets dropped by some middle-end pass, the user will be
confused why the vector function is not being called.

Do we have that problem with OMP stuff already? If so, how do we fix this?

    // ...
    ... = call double @sin(double) #0
    // ...

    #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
                              _ZGVdN4v_sin(__svml_sin4),
                              ..."} }

I'm assuming in all ABIs, the arguments and return values are
guaranteed to be the same, but vector versions. Ie. there are no
special arguments / flags / status regs that are used / changed in the
vector version that the compiler will have to "just know". If the
whole point is that this is a "variant", having specialist knowledge
for random variant ABIs won't scale.

Is that a problem or can we class those as "user-error"?

The SVFS can add new function definitions, in the same module as the
`Call`, to provide vector functions that are not present within the
vector-variant metadata. For example, if a library provides a vector
version of a function with a vectorization factor of 2, but the
vectorizer is requesting a vectorization factor of 4, the SVFS is
allowed to create a definition that calls the 2-lane version twice. This
capability applies similarly for providing masked and unmasked versions
when the request does not match what is available in the library.

Nice! Those thunks will play nicely with inlining later, sort of like unrolling.
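
For concreteness, a hand-written C analogue of such a thunk might look like the sketch below. This is only an illustration of the idea, not the SVFS output: the GNU vector-extension types and the thunk name are made up here, and the 2-lane symbol name assumes the AArch64 Vector Function ABI spelling used elsewhere in this thread.

    typedef double f64x2 __attribute__((vector_size(16)));
    typedef double f64x4 __attribute__((vector_size(32)));

    /* 2-lane version assumed to be provided by the vector library. */
    f64x2 _ZGVnN2v_sin(f64x2 x);

    /* What an SVFS-synthesized VF=4 wrapper could conceptually do when
       only a VF=2 version is available: split, call twice, recombine. */
    static f64x4 sin_vf4_thunk(f64x4 x) {
      f64x2 lo = { x[0], x[1] };
      f64x2 hi = { x[2], x[3] };
      f64x2 rlo = _ZGVnN2v_sin(lo);
      f64x2 rhi = _ZGVnN2v_sin(hi);
      f64x4 r = { rlo[0], rlo[1], rhi[0], rhi[1] };
      return r;
    }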

The `construct` set in the directive, together with the `device` set, is
used to generate the vector mangled name to be used in the
`vector-variant` attribute, for example `_ZGVnN2v_sin`, when targeting
AArch64 Advanced SIMD code generation. The rules for mangling the name
of the scalar function into the vector name are defined in the Vector
Function ABI specification of the target.
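
To make this concrete, here is a hedged sketch of what a user-facing declaration could look like, assuming the proposed clang pragma keeps the OpenMP `declare variant` match-clause spelling; the variant name, the vector typedef and the exact selector values are illustrative assumptions, while `_ZGVnN2v_sin` is the name the AArch64 Vector Function ABI prescribes for an unmasked 2-lane `sin`.

    typedef double f64x2 __attribute__((vector_size(16)));

    /* User- or library-provided 2-lane variant of sin. */
    f64x2 my_simd_sin(f64x2 x);

    /* Hypothetical spelling, modelled on `#pragma omp declare variant`.
       The frontend would record this variant for calls to sin, e.g. as
       _ZGVnN2v_sin(my_simd_sin) in the vector-variant list shown above. */
    #pragma clang declare variant(my_simd_sin) \
        match(construct={simd(notinbranch, simdlen(2))}, device={isa("simd")})
    double sin(double x);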

And I assume the user is responsible for linking the libraries that
export those signatures or they will have a linking error. Given that
this is a user definition (in the pragma), and that we really can't know
if the library will be available at link time, it's the only thing we
can do.

cheers,
--renato

Hi Francesco,

Nice to finally see this RFC, thanks! :slight_smile:

Overall, I like the proposal. Clean, concise and complete.

I have a few comments inline, but from a high level this is looking good.

> The directive `#pragma clang declare variant` follows the syntax of the
> `#pragma omp declare variant` directive of OpenMP.
>
> We define the new directive in the `clang` namespace instead of using
> the `omp` one of OpenMP to allow the compiler to perform
> auto-vectorization outside of an OpenMP SIMD context.

So, the only difference is that pragma "omp" includes and links OMP
stuff, while pragma "clang" doesn't, right?

What happens if I have code with "pragma clang declare variant" and
"pragma omp" elsewhere, would the clang pragma behave identically as
if it was omp?

>
> The mechanism is based on OpenMP to provide a uniform user experience
> across the two mechanisms, and to maximise the number of shared
> components of the infrastructure needed in the compiler frontend to
> enable the feature.
>
> Changes in LLVM IR {#llvmIR}
> ------------------
>
> The IR is enriched with metadata that details the availability of vector
> versions of an associated scalar function. This metadata is attached to
> the call site of the scalar function.

If the metadata gets dropped by some middle-end pass, the user will be
confused why the vector function is not being called.

Do we have that problem with OMP stuff already? If so, how do we fix this?

> // ...
> ... = call double @sin(double) #0
> // ...
>
> #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
> _ZGVdN4v_sin(__svml_sin4),
> ..."} }

I'm assuming in all ABIs, the arguments and return values are
guaranteed to be the same, but vector versions. Ie. there are no
special arguments / flags / status regs that are used / changed in the
vector version that the compiler will have to "just know". If the
whole point is that this is a "variant", having specialist knowledge
for random variant ABIs won't scale.

Is that a problem or can we class those as "user-error"?

On architectures that have sticky floating point status flags, these
can be supported, and my versions of exp and expf for PPC support them.
I'd like to see the compiler know whether these work, because
supporting them in the compiler is extra work: you have to re-run the
calculation if you get a flag (and you care about which specific
operation caused that flag).
-Shawn
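
For reference, the check-and-rerun scheme described above might look roughly like the following sketch. The `<fenv.h>` calls are standard C; the 2-lane vector type and the assumption that the library exports an AArch64 VFABI-named `_ZGVnN2v_sin` are illustrative.

    #include <fenv.h>
    #include <math.h>

    typedef double f64x2 __attribute__((vector_size(16)));

    /* Assumed library-provided 2-lane vector version of sin. */
    f64x2 _ZGVnN2v_sin(f64x2 x);

    static f64x2 sin_v2_with_flag_check(f64x2 x) {
      feclearexcept(FE_ALL_EXCEPT);
      f64x2 r = _ZGVnN2v_sin(x);
      if (fetestexcept(FE_ALL_EXCEPT)) {
        /* Re-run each lane with the scalar version to find out which
           operation actually raised the flag. */
        r[0] = sin(x[0]);
        r[1] = sin(x[1]);
      }
      return r;
    }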

Having to teach the compiler about user-provided libraries with hidden
side-effects without any kind of meta-data really doesn't scale.

A workaround is, in specific hardware, to *always* check if the flags
are changed and if they are, reset and run again. Now, what to do if
it changes again? Infinite loop?

Anything more "intelligent" will need hacks in the compiler that are
specific to the combination of library-hardware and that won't work
because neither the library (and sometimes nor the hardware) guarantee
they won't change.

--renato

Hi Francesco,

Nice to finally see this RFC, thanks! :slight_smile:

Overall, I like the proposal. Clean, concise and complete.

I have a few comments inline, but from a high level this is looking good.

+1

The directive `#pragma clang declare variant` follows the syntax of the
`#pragma omp declare variant` directive of OpenMP.

We define the new directive in the `clang` namespace instead of using
the `omp` one of OpenMP to allow the compiler to perform
auto-vectorization outside of an OpenMP SIMD context.

So, the only difference is that pragma "omp" includes and links OMP
stuff, while pragma "clang" doesn't, right?

I'm assuming that the difference is simpler: We don't process OpenMP
directives by default, but we will process these Clang pragmas by default.

What happens if I have code with "pragma clang declare variant" and
"pragma omp" elsewhere, would the clang pragma behave identically as
if it was omp?

I think that this is an interesting question. My preference is that we draw
a distinction between 'system' directives (i.e., things provided by
system headers, and by headers from libraries treated like system
libraries) and user-provided directives. Then we process
directly-conflicting directives in the following order:

Lowest priority: #pragma clang declare system variant

Medium priority: #pragma omp declare variant

Highest priority: #pragma clang declare variant

My logic is this: We should have a way for users to override variants
provided by system headers. If users write a variant using OpenMP, it
should override a system-provided variant. The compiler-specific (Clang)
variant that a user provides should have the highest priority (because
it's a Clang pragma and we're Clang). As with the general OpenMP scheme,
more-specific variants should have priority over more-general variants
(regardless of whether they're OpenMP variants or Clang variants).
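
A hedged illustration of how the three levels could appear side by side; note that the `declare system variant` spelling and the match clauses below are hypothetical, sketched only to make the ordering above tangible.

    typedef double f64x2 __attribute__((vector_size(16)));
    f64x2 sys_vsin(f64x2);    /* variant shipped by a system header  */
    f64x2 omp_vsin(f64x2);    /* user variant, OpenMP spelling       */
    f64x2 clang_vsin(f64x2);  /* user variant, Clang spelling        */

    /* Lowest priority: typically found in a system header. */
    #pragma clang declare system variant(sys_vsin) match(construct={simd(simdlen(2))})
    /* Medium priority: user code written against OpenMP. */
    #pragma omp declare variant(omp_vsin) match(construct={simd(simdlen(2))})
    /* Highest priority: user code using the Clang-specific pragma. */
    #pragma clang declare variant(clang_vsin) match(construct={simd(simdlen(2))})
    double sin(double x);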

The mechanism is based on OpenMP to provide a uniform user experience
across the two mechanisms, and to maximise the number of shared
components of the infrastructure needed in the compiler frontend to
enable the feature.

Changes in LLVM IR {#llvmIR}
------------------

The IR is enriched with metadata that details the availability of vector
versions of an associated scalar function. This metadata is attached to
the call site of the scalar function.

If the metadata gets dropped by some middle-end pass, the user will be
confused why the vector function is not being called.

Why metadata and not a call-site/function attribute?

Do we have that problem with OMP stuff already? If so, how do we fix this?

I don't think that we currently use metadata like this for OpenMP in any
relevant sense, and so we don't currently have this problem.

     // ...
     ... = call double @sin(double) #0
     // ...

     #0 = { vector-variant = {"_ZGVcN2v_sin(__svml_sin2),
                               _ZGVdN4v_sin(__svml_sin4),
                               ..."} }

I'm assuming in all ABIs, the arguments and return values are
guaranteed to be the same, but vector versions. Ie. there are no
special arguments / flags / status regs that are used / changed in the
vector version that the compiler will have to "just know". If the
whole point is that this is a "variant", having specialist knowledge
for random variant ABIs won't scale.

Is that a problem or can we class those as "user-error"?

The SVFS can add new function definitions, in the same module as the
`Call`, to provide vector functions that are not present within the
vector-variant metadata. For example, if a library provides a vector
version of a function with a vectorization factor of 2, but the
vectorizer is requesting a vectorization factor of 4, the SVFS is
allowed to create a definition that calls the 2-lane version twice. This
capability applies similarly for providing masked and unmasked versions
when the request does not match what is available in the library.

Nice! Those thunks will play nicely with inlining later, sort of like unrolling.

The `construct` set in the directive, together with the `device` set, is
used to generate the vector mangled name to be used in the
`vector-variant` attribute, for example `_ZGVnN2v_sin`, when targeting
AArch64 Advanced SIMD code generation. The rules for mangling the name
of the scalar function into the vector name are defined in the Vector
Function ABI specification of the target.

And I assume the user is responsible for linking the libraries that
export those signatures or they will have a linking error.

Either the user, or the driver (based on whatever compiler flags enable
targeting the vector-math library in the first place). I'm hoping that
the driver can get this on its own in all common cases.

Thanks again,

Hal

I generally like the idea of having support in IR for vectorization of
custom functions. I have several use cases which would benefit from this.

I'd suggest a couple of reframings to the IR representation though.

First, this should probably be specified as metadata/attribute on a
function declaration. Allowing the callsite variant is fine, but it
should primarily be a property of the called function, not of the call
site. Being able to specify it once per declaration is much cleaner.

I agree. We should support this both on the function declaration and on
the call sites.

Second, I really don't like the mangling use here. We need a better way
to specify the properties of the function than its mangled name. One
thought to explore is to directly use the Value of the function
declaration (since this is metadata and we can do that), and then tie
the properties to the function declaration in some way? Sorry, I don't
really have a specific suggestion here.

Is the problem the mangling or the fact that the mangling is
ABI/target-specific? One option is to use LLVM's mangling scheme (the
one we use for intrinsics) and then provide some backend infrastructure
to translate later.

-Hal

I generally like the idea of having support in IR for vectorization of
custom functions. I have several use cases which would benefit from this.

I'd suggest a couple of reframings to the IR representation though.

First, this should probably be specified as metadata/attribute on a
function declaration. Allowing the callsite variant is fine, but it
should primarily be a property of the called function, not of the call
site. Being able to specify it once per declaration is much cleaner.

I agree. We should support this both on the function declaration and on
the call sites.

Second, I really don't like the mangling use here. We need a better way
to specify the properties of the function than its mangled name. One
thought to explore is to directly use the Value of the function
declaration (since this is metadata and we can do that), and then tie
the properties to the function declaration in some way? Sorry, I don't
really have a specific suggestion here.

Is the problem the mangling or the fact that the mangling is
ABI/target-specific? One option is to use LLVM's mangling scheme (the
one we use for intrinsics) and then provide some backend infrastructure
to translate later.

Well, both honestly. But mangling with a non-target specific scheme is
a lot better, so I might be okay with that. Good idea.

I generally like the idea of having support in IR for vectorization of
custom functions. I have several use cases which would benefit from this.

I'd suggest a couple of reframings to the IR representation though.

First, this should probably be specified as metadata/attribute on a
function declaration. Allowing the callsite variant is fine, but it
should primarily be a property of the called function, not of the call
site. Being able to specify it once per declaration is much cleaner.

I agree. We should support this both on the function declaration and on
the call sites.

Second, I really don't like the mangling use here. We need a better way
to specify the properties of the function than its mangled name. One
thought to explore is to directly use the Value of the function
declaration (since this is metadata and we can do that), and then tie
the properties to the function declaration in some way? Sorry, I don't
really have a specific suggestion here.

Is the problem the mangling or the fact that the mangling is
ABI/target-specific? One option is to use LLVM's mangling scheme (the
one we use for intrinsics) and then provide some backend infrastructure
to translate later.

Well, both honestly. But mangling with a non-target specific scheme is
a lot better, so I might be okay with that. Good idea.

I liked your idea of directly encoding the signature in the metadata,
but I think that we want to continue to use attributes, and not
metadata, and the options for attributes seem more limited - unless we
allow attributes to take metadata arguments - maybe that's an
enhancement worth considering.

-Hal

>>> I generally like the idea of having support in IR for vectorization of
>>> custom functions. I have several use cases which would benefit from this.
>>>
>>> I'd suggest a couple of reframings to the IR representation though.
>>>
>>> First, this should probably be specified as metadata/attribute on a
>>> function declaration. Allowing the callsite variant is fine, but it
>>> should primarily be a property of the called function, not of the call
>>> site. Being able to specify it once per declaration is much cleaner.
>> I agree. We should support this both on the function declaration and on
>> the call sites.
>>
>>
>>> Second, I really don't like the mangling use here. We need a better way
>>> to specify the properties of the function than its mangled name. One
>>> thought to explore is to directly use the Value of the function
>>> declaration (since this is metadata and we can do that), and then tie
>>> the properties to the function declaration in some way? Sorry, I don't
>>> really have a specific suggestion here.
>> Is the problem the mangling or the fact that the mangling is
>> ABI/target-specific? One option is to use LLVM's mangling scheme (the
>> one we use for intrinsics) and then provide some backend infrastructure
>> to translate later.
> Well, both honestly. But mangling with a non-target specific scheme is
> a lot better, so I might be okay with that. Good idea.

I liked your idea of directly encoding the signature in the metadata,
but I think that we want to continue to use attributes, and not
metadata, and the options for attributes seem more limited - unless we
allow attributes to take metadata arguments - maybe that's an
enhancement worth considering.

I recently talked to people in the OpenMP language committee meeting
about this and, thinking forward to the actual implementation/use of the
OpenMP 5.x declare variant feature, I'd say:

  - We will need a mangling scheme if we want to allow variants on
    declarations that are defined elsewhere.
  - We will need a (OpenMP) standardized mangling scheme if we want
    interoperability between compilers.

I assume we want both so I think we will need both.

That said, I think this should allow us to avoid attributes/metadata
which seems to me like a good thing right now.

Cheers,
  Johannes

I generally like the idea of having support in IR for vectorization of
custom functions. I have several use cases which would benefit from this.

I'd suggest a couple of reframings to the IR representation though.

First, this should probably be specified as metadata/attribute on a
function declaration. Allowing the callsite variant is fine, but it
should primarily be a property of the called function, not of the call
site. Being able to specify it once per declaration is much cleaner.

I agree. We should support this both on the function declaration and on
the call sites.

Second, I really don't like the mangling use here. We need a better way
to specify the properties of the function than its mangled name. One
thought to explore is to directly use the Value of the function
declaration (since this is metadata and we can do that), and then tie
the properties to the function declaration in some way? Sorry, I don't
really have a specific suggestion here.

Is the problem the mangling or the fact that the mangling is
ABI/target-specific? One option is to use LLVM's mangling scheme (the
one we use for intrinsics) and then provide some backend infrastructure
to translate later.

Well, both honestly. But mangling with a non-target specific scheme is
a lot better, so I might be okay with that. Good idea.

I liked your idea of directly encoding the signature in the metadata,
but I think that we want to continue to use attributes, and not
metadata, and the options for attributes seem more limited - unless we
allow attributes to take metadata arguments - maybe that's an
enhancement worth considering.

I recently talked to people in the OpenMP language committee meeting
about this and, thinking forward to the actual implementation/use of the
OpenMP 5.x declare variant feature, I'd say:

  - We will need a mangling scheme if we want to allow variants on
    declarations that are defined elsewhere.
  - We will need a (OpenMP) standardized mangling scheme if we want
    interoperability between compilers.

I assume we want both so I think we will need both.

If I'm reading this correctly, this describes a need for the frontend to
have a mangling scheme. Nothing in here would seem to prevent the
frontend from generating a declaration for a mangled external symbol and
then referencing that declaration. Am I missing something?

I think that a standardized naming scheme is needed and that it solves the problem motivating the RFC without the need for attributes or metadata.

If we want to use a vectorized version at a call site we know what the symbol is supposed to look like and we can check if it’s available.

Maybe I misunderstood the problem people want to solve here but the way I see it the above is all we need.

Hi All,

Thank you for the feedback so far.

I am replying to all your questions/concerns/suggestions in this single email. Please let me know if I have missed any.

I will update the RFC accordingly to what we end up deciding here.

Kind regards,

Francesco

# TOPIC 1: concerns about name mangling

I understand that there are concerns about using the mangling scheme I proposed, and that it would be preferred to have a mangling scheme that is based on (and standardized by) OpenMP. I hear the argument on having some common ground here. In fact, there is already common ground between the x86 and AArch64 backends, which have based their respective Vector Function ABI specifications on OpenMP.

In fact, the mangled name grammar can be summarized as follows:

_ZGV<isa><masking><VLEN><parameter type>_<scalar name>

Across vector extensions the only <token> that will differ is the <isa> token.
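
As a worked example of that grammar, the name `_ZGVnN2v_sin` used earlier in the thread decomposes as follows (token meanings per the AArch64 Vector Function ABI):

    _ZGVnN2v_sin
      _ZGV : prefix
      n    : <isa>             (AArch64 Advanced SIMD)
      N    : <masking>         (unmasked, i.e. notinbranch)
      2    : <VLEN>            (two lanes)
      v    : <parameter type>  (one plain vector parameter)
      _sin : '_' separator followed by the <scalar name>, sin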

This might lead people to think that we could drop the _ZGV<isa> prefix and consider the <masking><VLEN><parameter type>_<scalar name> part as a sort of unofficial OpenMP mangling scheme: in fact, the signature of an “unmasked 2-lane vector version of `sin`” will always be `<2 x double>(<2 x double>)`.

The problem with this choice is that the number of vector versions available for a target is not unique.

In particular, the following declaration generates multiple vector versions, depending on the target:

#pragma omp declare simd simdlen(2) notinbranch
double foo(double) {…};

On x86, this generates at least 4 symbols (one for SSE, one for AVX, one for AVX2, and one for AVX512; see Compiler Explorer).

On aarch64, the same declaration generates a unique symbol, as specified in the Vector Function ABI.
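
Spelled out for illustration (the exact set and spellings are owned by each target's Vector Function ABI, so treat this list as indicative rather than normative), the symbols for that declaration would be:

    x86, one symbol per vector extension:
      _ZGVbN2v_foo    (SSE)
      _ZGVcN2v_foo    (AVX)
      _ZGVdN2v_foo    (AVX2)
      _ZGVeN2v_foo    (AVX512)

    AArch64 Advanced SIMD, a single symbol:
      _ZGVnN2v_foo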

This means that the attribute (or metadata) that carries the information on the available vector versions also needs to deal with things that are not usually visible at the IR level, but that might still need to be provided in order to decide which particular instruction set / vector extension needs to be targeted.

I used an example based on `declare simd` instead of `declare variant` because the attribute/metadata needed for `declare variant` is a modification of the one needed for `declare simd`, which has already been agreed in a previous RFC proposed by Intel [1], and for which Intel has already provided an implementation [2]. The changes proposed in this RFC are fully compatible with the work that is being done for the VecClone pass in [2].

[1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
[2] VecClone pass: https://reviews.llvm.org/D22792

The good news is that as far as AArch64 and x86 are concerned, the only thing that will differ in the mangled name is the “<isa>” token. As far as I can tell, the mangling scheme of the rest of the vector name is the same, therefore a lot of infrastructure in terms of mangling and demangling can be reused. In fact, the `mangleVectorParameters` function in https://clang.llvm.org/doxygen/CGOpenMPRuntime_8cpp_source.html#l09918 could already be shared among x86 and aarch64.

# TOPIC 2: metadata vs attribute

From a functionality point of view, I don’t care whether we use metadata or attributes. The VecClone pass mentioned in TOPIC 1 uses the following:

attributes #0 = { nounwind uwtable "vector-variants"="_ZGVbM4vv_vec_sum,_ZGVbN4vv_vec_sum,_ZGVcM8vv_vec_sum,_ZGVcN8vv_vec_sum,_ZGVdM8vv_vec_sum,_ZGVdN8vv_vec_sum,_ZGVeM16vv_vec_sum,_ZGVeN16" }

This is an attribute (I thought it was metadata?); I am happy to reword the RFC using the right terminology (sorry for messing this up).

Also, @Renato expressed concern that metadata might be dropped by optimization passes - would using attributes prevent that?

TOPIC 3: "There is no way to notify the backend how conformant the SIMD versions are.”

@Shawn, I am afraid I don’t understand what you mean by “conformant” here. Can you elaborate with an example?

# TOPIC 3b: interaction of the `omp declare variant` with `clang declare variant`

I believe this is described in the `Option behavior, and interaction with OpenMP` section. The option `-fclang-declare-variant` is there to keep this mechanism orthogonal to the OpenMP-based one. Of course, we might decide to make -fclang-declare-variant on or off by default, and define a default behavior when interacting with -fopenmp-simd. For the sake of compatibility with other compilers, we might need to require -fno-clang-declare-variant when targeting -fopenmp[-simd].

TOPIC 3: "there are no special arguments / flags / status regs that are used / changed in the vector version that the compiler will have to "just know”

I believe that this concern is about the problem of handling FP exceptions? If that’s the case, the compiler is not allowed to make any assumptions about the vector function in that respect, and must treat it with the same knowledge as any other function, depending on the visibility it has in the compilation unit. @Renato, does this answer your question?

# TOPIC 4: attribute on the function declaration vs attribute on the call site

We discussed this in the previous version of the proposal. Having it at the call sites guarantees that incompatible vector versions are not used when merging modules compiled for different targets. I don’t have a use case for this; if I remember correctly this was asked by @Hideki Saito. Hideki, any comment on this?

# TOPIC 5: overriding system headers (the discussion on #pragma omp/clang/system variants initiated by @Hal Finkel)

I thought that the split between #pragma clang declare variant and #pragma omp declare variant was already providing the orthogonality between system headers and user headers, meaning that a user should always prefer the omp version (for portability to other compilers) over the #pragma clang one, which would be relegated to system headers and headers provided by the compiler. Am I missing something? If so, I am happy to add a “system” version of the directive, as it would be quite easy to do given that most of the parsing infrastructure will be shared.


One more point to consider - is there prior art? Does e.g. GCC already
do something like that?
The question in particular: will this work across the DSO boundary?

I.e. if a library A contains some function 'c' that has multiple
versions, but only the declaration of the function is exposed in the
header file (with some pragmas), and the definition is in a source file
(not a header file). When that function is used by some other program,
will the variants be picked up?

Roman.

I think we should split this discussion:
  TOPIC 1 & 2 & 4: How do implement all use cases and OpenMP 5.X
                   features, including compatibility with other
                   compilers and cross module support.
  TOPIC 3b & 5: Interoperability with clang declare (system vs. user
                 declares)
  TOPIC 3a & 3c: floating point issues?

I inlined comments for Topic 1 below.

I hope that we do not have to discuss topic 2 if we agree neither
attributes nor metadata is necessary, or better, will solve the actual
problem at hand. I don't have a strong feeling on topic 4 but I have the
feeling this will become less problematic once we figure out topic 1.

Thanks,
  Johannes

# TOPIC 1: concerns about name mangling

I understand that there are concerns in using the mangling scheme I
proposed, and that it would be preferred to have a mangling scheme
that is based on (and standardized by) OpenMP.

I still think it will be required to have a standardized one, not
only preferred.

I hear the argument on having some common ground here. In fact, there
is already common ground between the x86 and aarch64 backend, who have
based their respective Vector Function ABI specifications on OpenMP.

In fact, the mangled name grammar can be summarized as follows:

_ZGV<isa><masking><VLEN><parameter type>_<scalar name>

Across vector extensions the only <token> that will differ is the
<isa> token.

This might lead people to think that we could drop the _ZGV<isa>
prefix and consider the <masking><VLEN><parameter type>_<scalar name>
part as a sort of unofficial OpenMP mangling scheme: in fact, the
signature of an “unmasked 2-lane vector version of `sin`” will always
be `<2 x double>(<2 x double>)`.

The problem with this choice is that the number of vector versions
available for a target is not unique.

For me, this simply means this mangling scheme is not sufficient.

In particular, the following declaration generates multiple vector
versions, depending on the target:

#pragma omp declare simd simdlen(2) notinbranch
double foo(double) {…};

On x86, this generates at least 4 symbols (one for SSE, one for AVX,
one for AVX2, and one for AVX512: Compiler Explorer)

On aarch64, the same declaration generates a unique symbol, as
specified in the Vector Function ABI.

I fail to see the problem. We generate X symbols for X different
contexts. Once we get to the point where we vectorize, we determine
which context fits best and choose the corresponding symbol version.

Maybe my view is too naive here; please feel free to correct me.

This means that the attribute (or metadata) that carries the
information on the available vector version needs to deal also with
things that are not usually visible at IR level, but that might still
need to be provided to be able to decide which particular instruction
set/ vector extension needs to be targeted.

The symbol names should carry all the information we need. If they do
not, we need to improve the mangling scheme such that they do. There is
no attributes/metadata we could use at library boundaries.

I used an example based on `declare simd` instead of `declare variant`
because the attribute/metadata needed for `declare variant` is a
modification of the one needed for `declare simd`, which has already
been agreed in a previous RFC proposed by Intel [1], and for which
Intel has already provided an implementation [2]. The changes proposed
in this RFC are fully compatible with the work that is being done for
the VecClone pass in [2].

[1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
[2] VecClone pass: https://reviews.llvm.org/D22792

Having an agreed upon mangling for the older feature is not necessarily
important here. We will need more functionality for variants and keeping
the old scheme around with some metadata is not an extensible long-term
solution. So, I would not try to fit variants into the existing
simd-scheme but instead do it the other way around. We define what we
need for variants and implement simd in that scheme.

TOPIC 2: metadata vs attribute

Also, @Renato expressed concern that metadata might be dropped by optimization passes - would using attributes prevent that?

I think it would, thanks!

TOPIC 3: "there are no special arguments / flags / status regs that are used / changed in the vector version that the compiler will have to "just know”

I believe that this concern is about the problem of handling FP exceptions? If that’s the case, the compiler is not allowed to make any assumptions about the vector function in that respect, and must treat it with the same knowledge as any other function, depending on the visibility it has in the compilation unit. @Renato, does this answer your question?

So, if there are side-effects in the scalar version, there will also
be side-effects in the vector version? Unfortunately, this does not
work in practice by default (different units have different rules).

If we want to enforce this, it's up to the library implementation to
provide similar behaviour (either hide or create side-effects) and it
will be "library error" if they do not.

This seems a bit heavy handed, though...

--renato

TOPIC 2: metadata vs attribute

Also, @Renato expressed concern that metadata might be dropped by optimization passes - would using attributes prevent that?

I think it would, thanks!

TOPIC 3: "there are no special arguments / flags / status regs that are used / changed in the vector version that the compiler will have to "just know”

I believe that this concern is about the problem of handling FP exceptions? If that’s the case, the compiler is not allowed to make any assumptions about the vector function in that respect, and must treat it with the same knowledge as any other function, depending on the visibility it has in the compilation unit. @Renato, does this answer your question?

So, if there are side-effects in the scalar version, there will also
be side-effects in the vector version? Unfortunately, this does not
work in practice by default (different units have different rules).

The OpenMP use of `declare simd` and `declare variant` informs the compiler that the underlying scalar function is “safe to vectorize”. I believe that side-effects are excluded from the list of things that are “safe to vectorize”.

See https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 118, lines 23-24:

“The execution of the function or subroutine cannot have any side effects that would alter its execution for concurrent iterations of a SIMD chunk."

Right, but this is a clang-based directive, not an OMP one, so we
should extend the same guarantees explicitly. I think following OMP's
restrictions in our own document is the right way to go.

thanks,
--renato

Hi Johannes,

Thank you for your feedback! See my replies below.

Francesco

I think we should split this discussion:
TOPIC 1 & 2 & 4: How do implement all use cases and OpenMP 5.X
                  features, including compatibility with other
                  compilers and cross module support.

Yes, and we have to carefully make this as standard and compatible as possible.

TOPIC 3b & 5: Interoperability with clang declare (system vs. user
                declares)

I think that Alexey's explanation of how the directives are handled internally in the frontend makes us lean towards the attribute.

TOPIC 3a & 3c: floating point issues?

I believe there is no issue there. I have quoted the OpenMP standard in reply to Renato:

See https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf, page 118, lines 23-24:

“The execution of the function or subroutine cannot have any side effects that would alter its execution for concurrent iterations of a SIMD chunk."

I inlined comments for Topic 1 below.

I hope that we do not have to discuss topic 2 if we agree neither
attributes nor metadata is necessary, or better, will solve the actual
problem at hand. I don't have strong feeling on topic 4 but I have the
feeling this will become less problematic once we figure out topic 1.

Thanks,
Johannes

# TOPIC 1: concerns about name mangling

I understand that there are concerns in using the mangling scheme I
proposed, and that it would be preferred to have a mangling scheme
that is based on (and standardized by) OpenMP.

I still think it will be required to have a standardized one, not
only preferred.

I am all with you on standardizing. x86 and AArch64 have their own Vector Function ABIs, which, although “private”, are to be considered standard. Open-source and commercial compilers are using them, therefore we have to deal with this mangling scheme, whether or not OpenMP comes up with a standard mangling scheme.

I hear the argument on having some common ground here. In fact, there
is already common ground between the x86 and aarch64 backend, who have
based their respective Vector Function ABI specifications on OpenMP.

In fact, the mangled name grammar can be summarized as follows:

_ZGV<isa><masking><VLEN><parameter type>_<scalar name>

Across vector extensions the only <token> that will differ is the
<isa> token.

This might lead people to think that we could drop the _ZGV<isa>
prefix and consider the <masking><VLEN><parameter type>_<scalar name>
part as a sort of unofficial OpenMP mangling scheme: in fact, the
signature of an “unmasked 2-lane vector version of `sin`” will always
be `<2 x double>(<2 x double>)`.

The problem with this choice is that the number of vector versions
available for a target is not unique.

For me, this simply means this mangling scheme is not sufficient.

Can you explain more why you think the mangling scheme is not sufficient? The mangling scheme is shaped to provide all the information that the OpenMP directive describes.

The fact that x86 and aarch64 realize such information in different ways (multiple signatures / vector extensions) is something that cannot be avoided, because it is related to architectural aspects that are specific to the vector extension and transparent to the OpenMP standard.

In particular, the following declaration generates multiple vector
versions, depending on the target:

#pragma omp declare simd simdlen(2) notinbranch
double foo(double) {…};

On x86, this generates at least 4 symbols (one for SSE, one for AVX,
one for AVX2, and one for AVX512: Compiler Explorer)

On aarch64, the same declaration generates a unique symbol, as
specified in the Vector Function ABI.

I fail to see the problem. We generate X symbols for X different
contexts. Once we get to the point where we vectorize, we determine
which context fits best and choose the corresponding symbol version.

Yes, this is exactly what we need to do, under the constraint that the rules for generating "X symbols for X different contexts" are decided by the Vector Function ABI of the target.

Maybe my view is too naive here; please feel free to correct me.

This means that the attribute (or metadata) that carries the
information on the available vector version needs to deal also with
things that are not usually visible at IR level, but that might still
need to be provided to be able to decide which particular instruction
set/ vector extension needs to be targeted.

The symbol names should carry all the information we need. If they do
not, we need to improve the mangling scheme such that they do. There is
no attributes/metadata we could use at library boundaries.

Hum, I am not sure what you mean by "There is no attributes/metadata we could use at library boundaries."

In our downstream compiler (Arm Compiler for HPC, based on LLVM), we use `declare simd` to provide vector math functions via a custom header file. It works brilliantly, except for specific aspects that would be perfectly covered by `declare variant`, which might be one of the reasons why the OpenMP committee decided to introduce `declare variant`.
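
For readers who have not seen that approach, a minimal sketch of such a header-based mapping using the existing `declare simd` directive is shown below; the header name is made up, and this is not the actual Arm Compiler for HPC header.

    /* mathvec.h -- illustrative wrapper header shipped with a vector library. */
    #pragma omp declare simd simdlen(2) notinbranch
    double sin(double x);

    /* With this declaration in scope, the compiler may call the
       VFABI-mangled 2-lane symbol (e.g. _ZGVnN2v_sin on AArch64) when it
       vectorizes a loop containing calls to sin, provided the library
       that exports that symbol is linked. */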

If your concern is that adding an attribute that represents something available in an external library is not enough to guarantee that the symbol is actually available in that library… well, not even C code can guarantee that? If the linker is not pointing to the right library, there is nothing that can prevent the link from failing if the symbol is not present?

I used an example based on `declare simd` instead of `declare variant`
because the attribute/metadata needed for `declare variant` is a
modification of the one needed for `declare simd`, which has already
been agreed in a previous RFC proposed by Intel [1], and for which
Intel has already provided an implementation [2]. The changes proposed
in this RFC are fully compatible with the work that is being done for
the VecClone pass in [2].

[1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html
[2] VecClone pass: https://reviews.llvm.org/D22792

Having an agreed upon mangling for the older feature is not necessarily
important here. We will need more functionality for variants and keeping
the old scheme around with some metadata is not an extensible long-term
solution. So, I would not try to fit variants into the existing
simd-scheme but instead do it the other way around. We define what we
need for variants and implement simd in that scheme.

I kinda think that having agreed on something is important. It allows us to build other things on top of what has been agreed without breaking compatibility.

On the specifics, what are the new functionalities needed for the variants that would make the current metadata (attributes) for `declare simd` non-extensible?