Implementing OpenMP function variants

Background:

The prototype for the OpenMP 5.1 feature `begin/end declare variant` caused
a discussion that seems to be stuck. I'll briefly outline the semantics
of the OpenMP 5.0 `declare variant` and the `begin/end declare variant`
here before I present the two competing designs at a very high level. If
you need more information, a summary of what was said, or anything else,
please feel free to ask.

Richard, John, et al.,

Let me top-post here quickly to add that this question comes directly
from a disagreement about the application of Clang's design principle of
keeping the AST faithful to the source code (and I suspect that such a
question may be of wider interest to the community).

Johannes has summarized the OpenMP language semantics below (thanks,
Johannes!).

The relevant review thread is here: https://reviews.llvm.org/D71241

Alexey's position is that, because the source code contains what appears
to be a call to the base() function, the AST should always reflect that
fact by having getCallee() return a reference to base(), and we should
lower the call to the selected variant using logic in CodeGen. Some
other member function should be used by AST-level tools to retrieve the
actually-called variant. This has the benefit that the primary AST
representation is independent of the compilation target and other
relevant OpenMP context.

My position is that, like other cases where we perform overload
resolution and specialization selection (including host/device overloads
in CUDA), we should resolve the selected variant in Sema, and
getCallee() should return a reference to the function that will actually
be called (even if that function has a different name from the name used
syntactically to form the call expression). This will ensure that
static-analysis tools see the correct call-site <-> callee relationship.
We should, I think, also keep a reference to the original OpenMP base
function in the AST, but some other member function should be used to
retrieve it.
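A toy model may make the difference between the two positions concrete. The sketch below is hypothetical: `ToyFunction`, `ToyCallExpr`, `buildCallAlexey`, and `buildCallHal` are invented for illustration and are not Clang's `CallExpr` API. It assumes one base function and one already-selected variant, and only shows which function the primary `callee` accessor (the analogue of getCallee()) would report under each proposal.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; not Clang's AST classes.
struct ToyFunction {
  std::string name;
};

struct ToyCallExpr {
  const ToyFunction* callee;     // what the primary accessor (getCallee()) returns
  const ToyFunction* baseFn;     // the OpenMP base function named in the source
  const ToyFunction* variantFn;  // the function that will actually be called
};

// Alexey's position: the callee is the base function as written; the selected
// variant is kept on the side and used when lowering the call in CodeGen.
ToyCallExpr buildCallAlexey(const ToyFunction& base, const ToyFunction& selected) {
  return {&base, &base, &selected};
}

// Hal's position: Sema resolves the variant, so the callee is the function that
// is actually called; the base stays reachable through a separate member.
ToyCallExpr buildCallHal(const ToyFunction& base, const ToyFunction& selected) {
  return {&selected, &base, &selected};
}
```

In both variants of the model the call node carries the same information; the disagreement is only about which declaration the primary accessor reports, and therefore what AST-level tools see by default.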

We say that we keep Clang's AST faithful to the source code, but how to
best apply that philosophy in this case is now under debate.

Thanks again,

Hal

Another question that has come up in the review: are there potential
issues with introducing additional cases where FoundDecl != Decl?

-Hal

Is it always immediately decidable when parsing a reference to a function which variant should be used, or is it sometimes dynamic or at least delayed?

How is this expected to interact with C++ overloading? Can you independently declare variants of each overload?

How does this interact with nested scopes? If all variants are unacceptable for the use context, is it like the declaration just doesn’t exist, and so lookup continues to outer scopes? Or is this impossible because there always has to be a non-variant declaration in the current scope?

My immediate intuition is that, assuming the semantics are always static and that there’s always a non-variant function, this should be handled as a sort of second level of overload resolution. The variant declarations should be considered more-or-less independent functions; they are not redeclarations of the original. 5.1-type variants should be hidden from lookup, so that only the original function is found. When we resolve a use of a declaration with variants we then pick the appropriate variant and treat the declaration as if it was using that originally. The fact that we resolved it via a variant should be recorded in the FoundDecl, which would now have an additional possible state: we could have looked through a using declaration, and we could have resolved a variant. This shouldn’t be a problem for FoundDecl.
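A minimal sketch of that two-step resolution, under simplifying assumptions: the types and the `resolve` function are hypothetical (not Clang's actual classes), ordinary lookup is assumed to have already found only the base function, and the first applicable variant wins instead of OpenMP's scoring. The point is the extra FoundDecl state: the reference resolves to the variant while remembering that lookup found the base.

```cpp
#include <cassert>
#include <string>
#include <vector>

// How a reference was found; `Variant` is the proposed additional state.
enum class FoundVia { Direct, UsingDecl, Variant };

struct ToyDeclRef {
  std::string decl;       // what the reference resolves to (used by Sema/CodeGen)
  std::string foundDecl;  // what name lookup found (for source-faithful tools)
  FoundVia via;
};

struct ToyVariant {
  std::string name;
  bool applies;  // stands in for "the variant's context matches at this use"
};

// Step 1 (not shown): lookup sees only `base`; 5.1-style variants are hidden.
// Step 2: pick an applicable variant, if any, and record `base` as FoundDecl.
// Simplification: the first applicable variant wins; OpenMP scoring is omitted.
ToyDeclRef resolve(const std::string& base, const std::vector<ToyVariant>& variants) {
  for (const ToyVariant& v : variants)
    if (v.applies)
      return {v.name, base, FoundVia::Variant};
  return {base, base, FoundVia::Direct};
}
```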

John.

> Is it always immediately decidable when parsing a reference to a function
> which variant should be used, or is it sometimes dynamic or at least
> delayed?

For now, which means OpenMP 5.0 and TR8, it is conceptually immediately
decidable. It only depends on compilation parameters, e.g., the target
triple, and the lexical context.

> How is this expected to interact with C++ overloading? Can you
> independently declare variants of each overload?

While the standard is not specific on this, I don't see why not.
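For illustration, this is roughly how independent variants per overload might look with the OpenMP 5.0 syntax. The function names are made up, and the `arch(nvptx, nvptx64)` selector is just one example context. On a host compilation (with or without `-fopenmp`) the selector does not match, so calls resolve to the base overloads; in a device compilation for NVPTX the expectation is that each call goes to the `foo_gpu` overload whose parameter type matches.

```cpp
#include <cassert>

// Hypothetical variant functions (OpenMP 5.0 style: different name from the base).
int foo_gpu(int) { return 10; }
int foo_gpu(double) { return 20; }

// Each base overload names `foo_gpu` as its variant; the variant is looked up
// with arguments corresponding to that overload's parameter types.
#pragma omp declare variant(foo_gpu) match(device = {arch(nvptx, nvptx64)})
int foo(int) { return 1; }

#pragma omp declare variant(foo_gpu) match(device = {arch(nvptx, nvptx64)})
int foo(double) { return 2; }
```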

I tried to summarize what the standard says yesterday [0]:

OpenMP basically says, if you have a call to a (base)function* that has
variants with contexts that match at the call site, call the variant
with the highest score. The variants are specified by a variant-func-id,
which is a base language identifier or C++ template-id. For C++, the
variant declaration is identified by *performing the base language
lookup rules on the variant-func-id with arguments that correspond to
the base function argument types*.

* However you figured out that the base function is the one called in
  the first place.

[0] https://reviews.llvm.org/D71241#1788003
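The selection rule in that summary can be sketched as follows. This is a deliberately simplified model: contexts are plain sets of made-up trait strings, each variant carries a fixed `score` rather than the score OpenMP derives from the context selector, and `selectCallee` is a hypothetical helper. It assumes the base function has already been found by normal lookup and overload resolution, and it falls back to the base when no variant's context matches.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplified model: a context is a sorted set of (made-up) trait names,
// e.g. "arch:nvptx" or "construct:parallel".
using Context = std::set<std::string>;

struct Variant {
  std::string name;
  Context required;  // all of these traits must hold at the call site
  int score;         // fixed number; NOT the score formula from the OpenMP spec
};

// Called only after normal lookup/overload resolution found `base`.
// Returns the matching variant with the highest score, or `base` if none match.
std::string selectCallee(const std::string& base,
                         const std::vector<Variant>& variants,
                         const Context& callSite) {
  const Variant* best = nullptr;
  for (const Variant& v : variants) {
    // std::includes works here because std::set keeps both ranges sorted.
    bool matches = std::includes(callSite.begin(), callSite.end(),
                                 v.required.begin(), v.required.end());
    if (matches && (best == nullptr || v.score > best->score))
      best = &v;
  }
  return best != nullptr ? best->name : base;
}
```

On equal scores this sketch keeps the first declared matching variant; how real ties are broken is left to the specification.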

> How does this interact with nested scopes? If all variants are unacceptable
> for the use context, is it like the declaration just doesn’t exist, and so
> lookup continues to outer scopes? Or is this impossible because there
> always has to be a non-variant declaration in the current scope?

The latter. As mentioned above, you first find the "normal" call target
and then apply the variant logic from there.

> My immediate intuition is that, assuming the semantics are always static and
> that there’s always a non-variant function, this should be handled as a sort
> of second level of overload resolution. [...]

I think your assumptions are met.

Is there a good reason to make 5.1-type variants different from
multi-versions (as we have them)? They do not depend on the lexical call
context but only on the compilation parameters.

Are multi-versions yet another feature? Do they interact with this one?

John.

In 5.1, `begin/end declare variant` multi-versions a function in basically
the same way as `__attribute__((target(...)))` does. The condition can
be more than only a target, though. (I misspoke earlier: it can include
call site context information.) So we have multiple versions of a
function, let's say "sin", and depending on the compilation target,
e.g., whether we are compiling for nvptx, and on the call site context,
e.g., whether we are syntactically inside a parallel region, we pick one
of them. The prototype for this reuses almost all of the multi-version
code that enables the target attribute, as it seemed to be the natural fit.
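To illustrate that the choice is static, here is a hypothetical model of the "sin" example in which selection is a `constexpr` function of the compilation target and of whether the call is lexically inside a parallel region. The enums and `selectSin` are invented for this sketch, and the precedence (the more specific nvptx-plus-parallel version first) is an assumption. Because both inputs are known when the call is parsed, the result can be checked with `static_assert`.

```cpp
#include <cassert>

// Hypothetical model; these names do not come from OpenMP or Clang.
enum class Target { Host, NVPTX };
enum class SinVersion { Base, NVPTX, NVPTXParallel };

// Both inputs are compile-time facts: the target being compiled for and the
// lexical context of the call site.
constexpr SinVersion selectSin(Target target, bool inParallelRegion) {
  return target != Target::NVPTX ? SinVersion::Base
         : inParallelRegion      ? SinVersion::NVPTXParallel
                                 : SinVersion::NVPTX;
}

// Evaluated entirely at compile time; no run-time dispatch is involved.
static_assert(selectSin(Target::Host, true) == SinVersion::Base,
              "host compilation uses the base sin");
static_assert(selectSin(Target::NVPTX, true) == SinVersion::NVPTXParallel,
              "nvptx inside a parallel region");
static_assert(selectSin(Target::NVPTX, false) == SinVersion::NVPTX,
              "nvptx outside a parallel region");
```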

I see. And that’s still totally statically selected at use time, right?

John.

Yes, as of OpenMP TR8 (November this year). But (to me) it is fairly
certain we'll also get a dynamic dispatch version, maybe even this year
(for OpenMP 5.1).

A dynamic version of __attribute__((target)), or of variants, or both?

John.

A dynamic version of variants, potentially of both kinds:
the 5.0 declare variant (different names for base and variant
function) and the 5.1 begin/end declare variant (same name for base
and variant function).

I may have been a bit confusing earlier:
we don't actually have __attribute__((target)), but begin/end declare
variant currently behaves very similarly.

Don’t you have context-specific variants? How would you expect dynamic
dispatch to work?

Anyway, if dynamic dispatch is on the table, then this question gets
very interesting.

As a general matter, I think the AST should represent the formal semantics
of the program. Absent dynamic dispatch, the semantics are that a use is
resolved to a particular function. And Sema generally needs to know
what declaration(s) are actually being used — we may need to diagnose
something about the use (e.g. deprecation or unavailability), or we may
just need to mark other declarations as used transitively, instantiate
templates, and so on. So resolving the lookup down to the underlying
variant but recording the lookup/delegation structure via the FoundDecl
makes sense to me.

But if the semantics are potentially dynamic then that starts coming
apart a little because it’s very important whether a reference is
semantically to the whole variant set or specifically to the non-variant
base declaration. We have some limited ability to represent differences
like this with GlobalDecl; however, this is a much more substantial
difference than e.g. constructor variants because using the whole variant
set means using all of the declarations in it. If we decide that
GlobalDecl is the right abstraction for this, we may have to push it
through a bunch of different places. Alternatively, we might want to
introduce a new kind of declaration that represents a variant set.
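A hypothetical sketch of why referencing the whole variant set costs more than a statically resolved reference. `ToyDecl`, `ToyVariantSet`, and the two marking functions are invented for illustration; they stand in for whatever representation (an extended GlobalDecl or a new declaration kind) might actually be chosen.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; not Clang's Decl or GlobalDecl.
struct ToyDecl {
  std::string name;
  bool used;
};

struct ToyVariantSet {
  ToyDecl* base;
  std::vector<ToyDecl*> variants;
};

// Static resolution: the use names exactly one function, so only it is used.
void markUsedStatic(ToyDecl& resolved) { resolved.used = true; }

// Dynamic dispatch: any member may be called at run time, so a reference to
// the set has to mark the base and every variant as used.
void markUsedDynamic(ToyVariantSet& set) {
  set.base->used = true;
  for (ToyDecl* v : set.variants)
    v->used = true;
}
```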

John.

> Don’t you have context-specific variants? How would you expect dynamic
> dispatch to work?

I'm not sure I understand your question but I expect us to generate an
if-cascade in the most generic case.
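A sketch of what such an if-cascade might look like if it were spelled out at the use site. The variant functions and the boolean conditions are hypothetical placeholders for whatever run-time checks a dynamic context selector would require; the assumptions are that conditions are tested in order of decreasing score and that the base function is the final fallback.

```cpp
#include <cassert>
#include <string>

// Hypothetical variants of one base function.
std::string f_base() { return "base"; }
std::string f_variant_low() { return "variant_low"; }
std::string f_variant_high() { return "variant_high"; }

// Expanded use site: test the highest-scoring variant's context first,
// then lower-scoring ones, and fall back to the base function.
std::string call_f(bool highContextHolds, bool lowContextHolds) {
  if (highContextHolds)
    return f_variant_high();
  if (lowContextHolds)
    return f_variant_low();
  return f_base();
}
```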

> Anyway, if dynamic dispatch is on the table, then this question gets
> very interesting.
>
> As a general matter, I think the AST should represent the formal semantics
> of the program. [...] So resolving the lookup down to the underlying
> variant but recording the lookup/delegation structure via the FoundDecl
> makes sense to me.

Sounds good. Thanks.

> But if the semantics are potentially dynamic then that starts coming
> apart a little because it’s very important whether a reference is
> semantically to the whole variant set or specifically to the non-variant
> base declaration. [...] Alternatively, we might want to
> introduce a new kind of declaration that represents a variant set.

I would prefer we table this discussion until after the January OpenMP
standards meeting. I'll probably look into prototyping what we will
define as preliminary semantics there.

> I'm not sure I understand your question but I expect us to generate an
> if-cascade in the most generic case.

Oh, so spell it out at the use site? Yes, that makes sense; I don’t know
why I didn’t consider that.

> I would prefer we table this discussion until after the January OpenMP
> standards meeting. I'll probably look into prototyping what we will
> define as preliminary semantics there.

WFM.

John.