Sorry, this is going to be a long reply again with lots of individual comments – but to front-load the conclusion (which should also give some detail for all the detailed comments):
If we can make [[clang::annotate]] work in type positions, then I would definitely find that palatable and probably even preferable! I think it may be non-trivial to implement though.
I would like to propose a new attribute annotate_type
that would be analogous to the existing annotate
attribute but for use on types. As for annotate
, the typical use case would be to add annotations to be used by static analysis tools.
I have a draft patch implementing this attribute (https://reviews.llvm.org/D111548), but before it’s reviewed I wanted to solicit some feedback more broadly on whether people feel this makes sense.
Thanks for the proposal! One question I have is whether we need an
additional attribute for this instead of reworking [[clang::annotate]]
so that it applies to either a type or a declaration. Is a separate
attribute necessary because there may be unfortunate problems with
using attribute((annotate)) as the spelling?
Yes, that’s exactly the problem. Today, the GNU spelling can be applied both before and after the type name:
attribute((annotate(“foo”))) int i1;
int attribute((annotate(“foo”))) i2;
We would need to reinterpret the second variant to refer (or “appertain”) to the decl-specifier-seq, not to the declaration (which is, IIUC, what the C++ standard would prescribe if this was a C++ attribute). But as it’s possible that this variant is already being used “in the wild” with the intent that it should refer to the declaration, I think this isn’t something we can change.
Agreed, thanks for the confirmation that this was the reason why we
need a second attribute.
Hence, the practical solution seems to be to introduce a separate attribute for types, for which it is then unambiguous what it should apply to.
Another possible solution would be to not support
attribute((annotate)) as a type attribute (e.g., you can use
[[clang::annotate]] as a declaration or a type attribute, but we would
restrict attribute((annotate)) to only ever be a declaration
attribute.).
I guess that would be an option too, though I think this would require some changes to Clang to allow C++ attributes to be used in this position:
int [[clang::annotate]] foo;
Clang already supports type attributes in that position today. e.g.,
https://godbolt.org/z/rTxoWWfdb
Quoting the godbolt example to save having to follow the link:
int * [[clang::address_space(126)]] ip = nullptr;
This is different from my example above though because the attribute comes after the *
, which makes it part of the declarator or, more specifically, the ptr-operator:
http://eel.is/c++draft/dcl.dcl#nt:ptr-operator
In contrast, with the example int [[clang::annotate]] foo
that I was referring to, the attribute is part of the decl-specifier-seq (specifically, it’s an attribute on the defining-type-specifier, as you also discuss below).
The position Clang doesn’t currently support is on decl specifiers.
e.g., https://godbolt.org/z/rG1aP7Tz5
Agreed.
See also discussion below.
As a side note, IIUC, Clang currently more generally lumps GNU attributes that occur in these two positions together, right? I’m looking at Parser::ParseSimpleDeclaration(), specifically this line:
DS.takeAttributesFrom(Attrs);
For C++ attributes, I believe this would not be correct, but that’s not (currently) a concern because AFAICT Clang currently doesn’t allow C++ attributes to occur after a decl-specifier-seq at all (because, I presume, Clang doesn’t yet support any C++ attributes that can occur in this position).
We currently don’t support any attributes that appertain to the
declaration specifiers (in any spelling mode).
But one of the enumerators in TypeAttrLocation is this:
/// The attribute is in the decl-specifier-seq.
TAL_DeclSpec,
And it looks as if it is used?
Ah, yeah, this is confusing stuff. decl-specifier-seq can have
attributes (http://eel.is/c++draft/dcl.spec#nt:decl-specifier-seq) and
TAL_DeclSpec tracks that. However, we do not currently support
attributes on all kinds of decl specifiers. e.g., [[supported #1]]
static [[not supported #2]] int [[supported #3]] i [[supported #4]];
Just to make sure we’re on the same page: Not all of these attributes are part of the decl-specfier-seq – right?
#1 is the attribute-specifier-seq in a simple-declaration: http://eel.is/c++draft/dcl.dcl#nt:simple-declaration
#4 is the attribute-specifier-seq in a noptr-declarator: http://eel.is/c++draft/dcl.dcl#nt:noptr-declarator
We support the ones that are in the declaration position (#1), the
type position (#2),
I think you mean #3?
and the identifier position (#4) but do not
support ones on storage class specifiers, function specifiers, or
other kinds of decl specifiers other than a defining-type-specifier.
e.g., https://godbolt.org/z/8sq5Wo8sT
I think [] is treated differently from known attribute (and unknown attributes may be treated differently yet again?). Anyway, I think we get into this more deeply below anyway.
GNU attributes will
“slide” to the type or the declaration based on the name of the
attribute when doing semantic processing for them (see the
moveAttrFromListToList() calls in SemaType.cpp for some examples of
this).
Thanks, that’s an interesting pointer.
I wondering now though: What do attributes attach to when the “sliding” logic doesn’t apply to them (because it doesn’t know them)?
Heh, now you’re seeing why GNU attributes were replaced by what was
standardized in C and C++. The answer is: we don’t know what to do and
this can sometimes even impact our ability to parse properly (for
example, if it’s a type attribute in C, it may be the difference
between ANSI “implicit int” being allowed or not). However, in Clang,
we do not retain unknown attributes in the AST, so we parse as best we
can and then drop what we parsed onto the floor if the attribute is
unknown.
Hm. Got it.
This does seem to make a pretty strong argument for what you’re suggesting, namely that we shouldn’t support the GNU syntax of the annotate
attribute for annotations on types and instead should support only the C++ syntax (where, thankfully, the standard makes it pretty clear what the attribute appertains to).
Maybe I should clarify my use case (which will probably benefit the whole discusssion). What I’m looking to be able to do is to attach attributes to different levels of a potentially multi-level pointer. For example:
int attribute((annotate_type(“A”))) * attribute((annotate_type(“B”))) * attribute((annotate_type(“C”))) pp;
The “A” annotation is what I mean by being able to attach an attribute to the decl specifiers (maybe my terminology isn’t accurate here?).
One use case for this is being able to specify which of the levels a nullability annotation refers to and not just have it always implicitly refer to the outermost pointer, as I believe the existing nullability annotations do. (See the Clang fork here for an example that does something very similar: https://github.com/sampsyo/quala)
Staying with the nullability example, it’s important to be able to attach an annotation to the base type itself – imagine the int
in the example above was instead a unique_ptr
.
This does seem to be possible to some extent with Clang today because if I feed the example above to the code in my draft change, I get AttributedTypeLoc
s for the types int
, int *
, and int **
, with annotations “A”, “B”, and “C” respectively.
It is possible today, but I note that we might have a bug with the
[] implementation: https://godbolt.org/z/vr81KzG33 – I’d have to
dig around more to see what’s going on there.
Yes, I think this example makes the problem pretty clear, namely that C++ attributes can’t be applied to the defining-type-specifier today.
There’s an additional wrinkle here in that, IIUC, the address_space
attribute only makes sense to apply to a ptr-declarator, not a defining-type-specifier. I think the code that is causing the attribute to be rejected here isn’t specific to address_space
though – I think it’s just the general code we discuss below.
This also means that if we want to use the proposed annotate_type
attribute in this position, we have to use the GNU spelling.
I don’t think that’s correct. We certainly support type attributes
using a C++ spelling already.
I think this only works though if the attribute is applied to a declarator, but not to the decl specifiers? Here’s an excerpt from the test in my draft change (https://reviews.llvm.org/D111548):
int attribute((annotate_type(“bar”))) w;
int [[clang::annotate_type(“bar”)]] w2; // expected-error {{‘annotate_type’ attribute cannot be applied to types}}
int *attribute((annotate_type(“bar”))) x;
int *[[clang::annotate_type(“bar”)]] x2;
The error seems to be coming from the following chunk of code in Parser::ParseDeclarationSpecifiers(), which is specific to C++ attributes:
// Reject C++11 attributes that appertain to decl specifiers as
// we don’t support any C++11 attributes that appertain to decl
// specifiers. This also conforms to what g++ 4.8 is doing.
ProhibitCXX11Attributes(attrs, diag::err_attribute_not_type_attr);
Oh hey, this is the possible bug I just noted above.
I think we
should not be rejecting if the decl specifier is a defining type
specifier because that has utility,
Agree – I’d like to add support for this in Clang.
I suspect the reason this isn’t supported today is because none of the existing type attributes need to be applied to the declaring-type-specifier. As noted above, address_space
only really makes sense on ptr-declarators. And without having checked, I suspect the other attributes also refer to properties of pointers (which is where I think most of the practical reasons for attributing types comes from).
Having said that, it would be great if the nullability attributes could, for example, be applied to a unique_ptr, but it seems that this isn’t the case (https://godbolt.org/z/qEn68xMz9).
but I’d have to go stare at the
standard a bit more to be sure. Declarator parsing is not the easiest
thing to reason about.
Let me know what you conclude.
From what I’ve seen in the standard (see links that you and I posted above), I think it’s pretty clear where this is supposed to slot into the grammar. But I assume what you’re referring to is more the problem of how to make this parse unambiguously in Clang?
This is an unfortunate niggle, but teaching Clang to correctly process C++ attributes in this position seems like a non-trivial undertaking because it would require us to treat GNU spellings and C++ spellings differently in Parser::ParseSimpleDeclaration(), which seems like it would be a bit of a headache.
I don’t think we’ll have to go down that route, hopefully.
What kind of type semantic impacts does this attribute have on the
type identity for things like overloading, name mangling, type
conversion, etc?
Good point. The intent is that the attribute should have no impact. Anything else would be pretty problematic from a portability point of view because other compilers will simply ignore the attribute – so if it had an effect on the type system in Clang, that could cause other compilers to generate different code.
Vendor-specific attributes are inherently not portable or safe to
ignore.
But for C++ attributes at least, the standard says that compilers should do exactly this (i.e. ignore unknown attributes)?
Same for C. The standard says that vendor-specific attributes are
fully implementation-defined (so they’re permitted to have whatever
semantics the implementation wants, including necessary side effects).
The standard also says that if the implementation doesn’t know about
an attribute, the implementation should ignore the attribute. The
combination of these rules is what makes vendor-specific attributes
inherently nonportable or safe to ignore.
As an example: https://godbolt.org/z/vsv3Wo6En
Thanks, makes sense!
Generally speaking, if a type attribute has no type semantics,
it probably shouldn’t be a type attribute. However, because this
attribute doesn’t really have any semantics beyond “keep this
information associated with the type”, it’s a bit more complex.
Consider:
void func(int i);
void func(int [[clang::annotate(“hoo boy”)]] i);
I can see an argument that this is not a valid overload set because
both functions accept an integer.
I think nullability attributes are a good analogy here (and essentially “prior art”). Here’s an example:
void func(int* i) {}
void func(int* _Nonnull i) {}
(https://godbolt.org/z/sP6za4WEE)
Clang doesn’t consider these to be overloads and instead complains about a redefinition.
I think modeling on the nullability attributes may not be a bad idea,
but note that those are surfaced to the user as keywords and not
attributes at all. I believe that was done specifically because the
nullability attributes had type semantics, but I could be remembering
incorrectly.
Do you think you can dig out this discussion again?
I don’t think it’s really important though – for the [[clang::annotate]] case (assuming that’s what we’ll go with in preference to annotate_type
), we certainly want to specify that the annotation does not create a new type.
I can certainly see how for different use cases you would want an attribute to create a different type – but I would say those would then require a second, separate attribute.
However:
void func(int i) [[clang::annotate(“ABI=v1”)]];
void func(int i) [[clang::annotate(“ABI=v2”)]];
I can see an argument that this is a valid overload set because
something else (the linker, whatever) is using that annotated
information to provide additional semantics beyond what the compiler
cares about.
As a data point that my be useful, I looked at what Clang does when I move those attributes to a position where they are declaration attributes (so that I can run the current Clang on them), i.e.:
[[clang::annotate(“ABI=v1”)]] void func(int i) {}
[[clang::annotate(“ABI=v2”)]] void func(int i) {}
(https://godbolt.org/z/fo5vrso6d)
Again, Clang doesn’t consider these to be overloads and instead complains about a redefinition.
FWIW, that’s because declaration attributes do not impact the type of
the function. We do have a declaration attribute that says “this
declaration is allowed to be overloaded” in C, but that’s subtly a
different thing.
So it seems that the existing semantics of the annotate
attribute are similar to what I’m proposing for the annotate_type
attribute.
Also, one problem here is – [[clang::annotate]] isn’t just used for
the static analyzer, it’s also something that can be used by backend
tools (for example, when using attribute plugins).
It’s good that you bring this up because I’m actually also thinking about extending ParsedAttrInfo to handle type attributes (with a new handleTypeAttribute() function as an analog to handleDeclAttribute()). The proposed annotate_type attribute would then also be a natural candidate for an attribute that the new handleTypeAttribute() could add to types.
Oh, neat! Unifying type/stmt/decl attributes more is definitely a great goal.
Is this the concern that you had here or were you thinking about something different?
Not really, it was more along the lines of lowering to LLVM IR (below)
so that the backend can also take advantage of this.
Do we need to
consider what information should be lowered to LLVM IR when this new
attribute appears on a type?
Thanks!
My intent was that this should have no effect on code generation.
Lowering annotations on declarations to LLVM IR has some obvious use cases, but for annotations on types it’s less obvious how it would be useful to lower these to IR (and also less obvious to me how one would represent these). (Did you have anything specific in mind here?)
LLVM currently has the ability to describe some types (e.g.,
%struct.X = type { i32 }
), so I was envisioning attaching this
attribute to the type (either as an LLVM attribute or metadata, I have
no idea what the distinction really is between them). LLVM currently
supports plugins as does Clang, and Clang’s support for attributes in
plugins still strongly encourages users to use existing attributes to
surface functionality, so I can imagine users wanting to use the
annotate type attribute in the same way they already use the annotate
declaration attribute for this sort of collusion. However, I don’t
think the codegen needs to happen up front – mostly trying to
understand the shape of the design still.
Makes sense.
Sorry that this has become such a long email, but I hope I’ve been able to give you a clearer picture of what I’m trying to achieve.
No worries, I appreciate the detailed discussion of the design!
I think we should double-check that I’m correct about the bug with
Clang accepting:
static int [] i;
but rejecting:
int [] i;
I don’t think that’s the distinction:
https://godbolt.org/z/MWcojK8n4
(Noting that it accepts both of these.)
Rather, this seems to be a case of [] being treated specially (probably earlier in the parser) and therefore being accepted in this position, while named, known attributes are not accepted here.
If we’re agreed that this is a bug, then it seems like once we fix
that bug, you could support [[clang::annotate]] in type positions
(with no type semantics) as well as declarations, but only if you were
willing to define attribute((annotate)) as applying to
declarations when there’s an ambiguity between type vs decl. Would
that be a palatable approach for you?
Yes, I think that would definitely work for me, and there would be an attraction in not having to introduce an additional annotate_type
attribute that otherwise works mostly the same as annotate
.
I think making this work may not be as simple as lifting the restriction in Parser::ParseDeclarationSpecifiers() that we discussed above. I think there’s another issue that I touched on previously (https://lists.llvm.org/pipermail/cfe-dev/2021-October/069094.html), namely that Clang currently lumps attributes for the declaration and the decl-specifier-seq together.
This happens in Parser::ParseSimpleDeclaration(), specifically this line
<https://github.com/llvm/llvm-project/blob/main/clang/lib/Parse/ParseDecl.cpp#L1818>
:
DS.takeAttributesFrom(Attrs);
To quote the conclusion I reached at the time:
For C++ attributes, I believe this would not be correct, but that’s not
(currently) a concern because AFAICT Clang currently doesn’t allow C++
attributes to occur after a decl-specifier-seq at all (because, I presume,
Clang doesn’t yet support any C++ attributes that can occur in this
position). This also means that if we want to use the proposed
annotate_type
attribute in this position, we have to use the GNU
spelling. This is an unfortunate niggle, but teaching Clang to correctly
process C++ attributes in this position seems like a non-trivial
undertaking because it would require us to treat GNU spellings and C++
spellings differently in Parser::ParseSimpleDeclaration(), which seems like
it would be a bit of a headache.
And I believe this is still true?
Do you have any other ideas on what we could do about this?
Thanks,
Martin