[RFC] New attribute `annotate_type` (iteration 2)

Martin Brænne @martinboehme
Dmitri Gribenko @gribozavr

Summary

We propose adding a new attribute clang::annotate_type to Clang. This attribute is analogous to the existing clang::annotate declaration attribute but for use on types. As with annotate, the typical use case is to add annotations for static analysis tools that are not integrated into the core Clang compiler (e.g., Clang-Tidy checks or out-of-tree Clang-based tools).

The particular use case that motivates our proposal is annotating lifetime contracts. See this RFC for details on this use case.

We propose a general-purpose annotation attribute rather than an attribute specific to our use case for the following reasons:

  • The tooling for inferring and validating lifetime contracts will initially be experimental and will require extensive validation before we stabilize the annotation syntax. We would prefer not to add lifetime-specific attributes to Clang that might need to be modified repeatedly or even removed again entirely.

  • We expect a general-purpose type annotation attribute to be useful for other types of static analysis. For example, see the Java Checker Framework, which provides a wide array of static checks based on Java’s general-purpose type annotations.

This is an update of an earlier proposal. In particular, we have significantly extended the discussion of type system implications. Rather than continue discussion on the old thread, we have decided to open a new thread so that we can summarize the current state of the entire proposal at the top of the thread.

Proposal

We propose adding a type attribute called clang::annotate_type.

Attribute syntax: C++11 / C2x, not GNU

This type attribute will only support C++11 and C2x attribute syntax, not GNU attribute syntax. This is because the C and C++ standards define explicitly where a C++11 / C2x attribute may appear and what it appertains to when it appears in a given position, whereas the GNU syntax is more ad-hoc and fuzzy. This is particularly important for a general-purpose type attribute, which may appear in many different positions; see also the discussion in the “alternatives considered” section below.

No influence on program semantics

The attribute will not affect the semantics of C++ code, either in terms of type checking rules or in terms of runtime semantics. In particular:

  • No effect on the formal C++ type system:
    • std::is_same<T, T [[clang::annotate_type("foo")]]>::value is true for all types T.
    • It is not permissible for overloaded functions or template specializations to differ merely by an annotate_type attribute.
    • The presence of an annotate_type attribute will not affect name mangling.
  • The attribute will not cause any additional data to be output to LLVM IR, object files or debug information. It is only intended to be consumed using Clang APIs by source-level static analysis tools such as Clang-Tidy.
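As a rough sketch of what these rules mean in practice, the snippet below should compile on any C++11 compiler, whether or not it understands the attribute. It assumes the CLANG_ANNOTATE_TYPE portability macro described under “Portability” (repeated here so the snippet stands alone); the "foo" argument and the function twice are arbitrary placeholders.

```cpp
#include <type_traits>

// Portability macro: expands to the attribute where supported, else to nothing.
#define CLANG_ANNOTATE_TYPE(x)
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(clang::annotate_type)
#undef CLANG_ANNOTATE_TYPE
#define CLANG_ANNOTATE_TYPE(x) [[clang::annotate_type(x)]]
#endif
#endif

// The annotated type is the same type as the unannotated one.
static_assert(std::is_same<int, int CLANG_ANNOTATE_TYPE("foo")>::value,
              "annotate_type must not create a distinct type");
static_assert(std::is_same<int*, int* CLANG_ANNOTATE_TYPE("foo")>::value,
              "annotate_type on a pointer type must not create a distinct type");

// A function whose parameter type carries an annotation...
constexpr int twice(int CLANG_ANNOTATE_TYPE("foo") x) { return 2 * x; }

// ...is redeclared, not overloaded, by a declaration without the annotation.
constexpr int twice(int x);
```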

Note that a language extension that starts out as [[clang::annotate_type]] may, in the future, transition to a first-class Clang attribute, which may affect how undefined behavior manifests. For example, static analysis implemented using [[clang::annotate_type]] may be supplemented with a sanitizer-like dynamic analysis, as long as the static analysis only emits diagnostics in cases of UB.

Rationale

We believe there are compelling reasons why a custom type attribute should not affect the formal C++ type system:

  • Portability between standard C++ compilers and compilers that understand the attribute.

  • Allowing the attribute to follow type checking rules that go beyond the standard C++ type system, for example, flow-sensitive typing. Of course, such type checking rules can only be used to restrict programs to a subset of those allowed by standard C++.

Portability

For portability, it is customary to wrap attributes in macros that expand to nothing if the compiler does not support the attribute. For example:

#define CLANG_ANNOTATE_TYPE(x)
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(clang::annotate_type)
#undef CLANG_ANNOTATE_TYPE
#define CLANG_ANNOTATE_TYPE(x) [[clang::annotate_type(x)]]
#endif
#endif
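With the macro in place, annotations can be written in the positions the C++11 grammar allows. The snippet below is an illustrative sketch: the "foo" / "bar" arguments and the Box template are hypothetical placeholders, and on a compiler without the attribute every annotation simply disappears.

```cpp
// Portability macro from above (with balanced #if/#endif).
#define CLANG_ANNOTATE_TYPE(x)
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(clang::annotate_type)
#undef CLANG_ANNOTATE_TYPE
#define CLANG_ANNOTATE_TYPE(x) [[clang::annotate_type(x)]]
#endif
#endif

// A hypothetical class template, used only to show a template argument.
template <class T>
struct Box {
  T value;
};

// After the decl-specifier-seq: annotates the type `int`.
constexpr int CLANG_ANNOTATE_TYPE("foo") kAnswer = 42;

// After the `*`: annotates the pointer type `const int*`, not the pointee.
constexpr const int* CLANG_ANNOTATE_TYPE("bar") kAnswerPtr = &kAnswer;

// Inside a template argument.
constexpr Box<int CLANG_ANNOTATE_TYPE("foo")> kBox{7};
```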

If the attribute did affect type semantics, then a program containing the attribute could behave differently depending on the compiler it was compiled with. Moreover, whether a program compiles at all could depend on whether the compiler supports the attribute. To demonstrate this, let us assume we allowed template specializations to differ only by an annotate_type attribute:

// (Counterexample. We propose this program should be invalid.)
template <class T>
struct S {};

template<>
struct S<int*> {};

template<>
struct S<int* CLANG_ANNOTATE_TYPE("foo")> {};

On compilers that do not support the attribute, the macro expands to nothing, and the program would therefore be rejected because of a duplicate template specialization. Meanwhile, compilers that do support the attribute would compile the program. In other words, the attribute would extend the set of valid programs. A developer might not realize this until they try to port to a compiler that does not support the attribute. This does not seem desirable.
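Under the proposed semantics, the program instead declares a single specialization, and the annotated argument simply selects it on every compiler. A minimal sketch, with the CLANG_ANNOTATE_TYPE macro repeated so it stands alone; the is_int_ptr_specialization member is a hypothetical marker added only to make the selection observable:

```cpp
#include <type_traits>

#define CLANG_ANNOTATE_TYPE(x)
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(clang::annotate_type)
#undef CLANG_ANNOTATE_TYPE
#define CLANG_ANNOTATE_TYPE(x) [[clang::annotate_type(x)]]
#endif
#endif

template <class T>
struct S {
  static constexpr bool is_int_ptr_specialization = false;
};

// The only specialization for int*; no separate annotated variant is declared.
template <>
struct S<int*> {
  static constexpr bool is_int_ptr_specialization = true;
};

// The annotation does not create a distinct type, so the annotated argument
// names the same specialization whether or not the attribute is supported.
using AnnotatedS = S<int* CLANG_ANNOTATE_TYPE("foo")>;
static_assert(std::is_same<AnnotatedS, S<int*>>::value,
              "annotated and unannotated arguments name the same specialization");
```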

Subsetting C++ with a “shadow” type system

We expect that static analysis checks enabled by annotate_type will impose constraints that cannot be expressed within the C++ type system. Otherwise, one could simply create a pure C++ library solution by defining types that enforce the desired constraints through the C++ type system.

An example of constraints that cannot be expressed within the C++ type system is flow-sensitive typing, where the type of an expression depends on its position in the control-flow graph. This concept does not exist in the C++ type system.

Extending the C++ type system to accommodate flow-sensitive typing is an extremely difficult problem without a clear path to a solution. It is more practicable for this kind of analysis to impose a shadow type system that exists only within the context of the analysis but has no effect on the formal C++ type system or runtime semantics. An implication of this is that the constraints imposed by the shadow type system must strictly reduce the set of valid programs from what is allowed by standard C++. Within the formal C++ type system, on the other hand, all annotated variants of a type are permissively considered to be the same type; this also avoids portability issues, as discussed above.

Formally, the shadow type system introduces refinement types, which restrict the set of permissible values of an existing type. Refinement types express pre- and postconditions on functions that use them in their signature.

Language subsetting through a shadow type system has the following benefits:

  • Type annotations can be rolled out incrementally to an existing codebase.
  • Annotated programs can be compiled by a standard C++ compiler without affecting runtime semantics.
  • Type annotations can be checked by a sanitizer-like tool at runtime because undefined behavior can be defined by an implementation to be a reliable program termination.
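To make the last point concrete, here is a hypothetical sketch of the check a sanitizer-like tool could insert for a parameter whose shadow type is the refinement "int* that is never null". The names check_nonnull and read_value are invented for illustration; the point is only that a violated precondition becomes a reliable termination instead of undefined behavior on the dereference.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical runtime check for the refinement "pointer is non-null".
// A sanitizer-like tool would emit this where the refined type is required
// (here: on entry to a function whose parameter is annotated as non-null).
inline void check_nonnull(const void* p, const char* what) {
  if (p == nullptr) {
    std::fprintf(stderr, "refinement violated: %s is null\n", what);
    std::abort();  // reliable termination rather than UB on dereference
  }
}

// Hypothetical function whose parameter is annotated as non-null in the
// shadow type system; the annotation itself does not change the C++ type.
int read_value(const int* p) {
  check_nonnull(p, "p");  // instrumentation inserted by the tool
  return *p;
}
```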

As a concrete example of an analysis that requires flow-sensitive typing to express the desired semantics, consider the existing nullability qualifiers (_Nullable, _Nonnull). Like the proposed annotate_type attribute, these are used to annotate types with additional information needed for static analysis (see also the Clang RFC that proposed the introduction of these qualifiers) and do not affect the formal C++ type system or runtime semantics.

Consider this piece of code, which is fine from a nullability point of view:

#include <iostream>

void f(int* _Nonnull nonnull_p) {
  std::cout << *nonnull_p;
}
void g(int* _Nullable nullable_p) {
  if (nullable_p) {
    f(nullable_p);
  }
}

The function call f(nullable_p) is permissible only because of the preceding if (nullable_p) check. Without this check, the function call should not be permissible. We therefore need to consider nullable_p to have different types in different parts of the function: int* _Nullable outside the if (nullable_p) block, and int* _Nonnull inside the if block.

This is an example of flow-sensitive typing, but note that the existing checks in Clang for the nullability annotations are not in fact flow-sensitive and therefore produce false negatives. In the above example, Clang does not warn if the if (nullable_p) condition is removed. A sound check of the nullability annotations would need to be flow-sensitive.
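The idea of flow-sensitive typing can be illustrated with a deliberately tiny, hypothetical model (neither Clang's implementation nor a real CFG-based analysis). A single pointer variable starts out with its declared type int* _Nullable, an if (p) block refines it to int* _Nonnull until the block ends, and every call that requires _Nonnull is checked against the type at that program point. The Stmt and Nullability enums and count_violations are invented for this sketch.

```cpp
#include <vector>

// Hypothetical, heavily simplified statement kinds acting on one pointer p.
enum class Stmt {
  EnterIfNonNull,  // `if (p) {`: inside the block, p is known to be non-null
  ExitIf,          // `}`: leave the block and restore the outer type
  CallNonnull      // `f(p)` where f's parameter is `int* _Nonnull`
};

enum class Nullability { Nullable, Nonnull };

// Returns the number of calls where a possibly-null p flows into a _Nonnull
// parameter. Assumes every EnterIfNonNull is matched by an ExitIf.
int count_violations(const std::vector<Stmt>& body) {
  // Type of p at the current program point; the bottom entry is its declared type.
  std::vector<Nullability> scopes{Nullability::Nullable};
  int violations = 0;
  for (Stmt s : body) {
    switch (s) {
      case Stmt::EnterIfNonNull:
        scopes.push_back(Nullability::Nonnull);  // refine p inside the branch
        break;
      case Stmt::ExitIf:
        scopes.pop_back();  // back to the type in the enclosing scope
        break;
      case Stmt::CallNonnull:
        if (scopes.back() != Nullability::Nonnull) ++violations;
        break;
    }
  }
  return violations;
}
```

Modelling g() as written (the call guarded by if (nullable_p)) yields no violations, while the same body with the check removed yields one, which is exactly the diagnostic a flow-insensitive check misses.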

Let us return now to the proposed annotate_type attribute. Because this attribute does not affect the C++ type but is just type sugar, a static analysis tool that wants to use this attribute may need to reconstruct type sugar in places where Clang canonicalizes types. This appears to be a necessary tradeoff to achieve the desired type semantics, and the authors of the nullability qualifier RFC come to the same conclusion (see section “Type System Impact” in the RFC).

Alternative considered: Extend annotate to apply to types

Instead of introducing a new attribute, we considered extending the existing annotate attribute to apply not just to declarations but also to types.

We decided not to go this route because the annotate attribute allows GNU attribute syntax to be used, and, as we have noted above, the semantics of GNU attributes, in terms of what they appertain to, are different from those of C++11 attributes and potentially more surprising, particularly when they are used as type attributes.

For example, the GNU syntax for the annotate attribute can be applied in both of the following positions today:

__attribute__((annotate("foo"))) int i1;
int __attribute__((annotate("foo"))) i2;

In both positions, the attribute is interpreted as appertaining to the variable declaration. We must assume that there is existing code that uses the attribute in both of these positions.

If we want to extend annotate to serve as a type attribute, this is a problem. What syntax should we use to annotate the int type, rather than the declaration?

One possibility would be to retain the existing semantics for the GNU syntax (i.e. interpret the attribute as a declaration attribute in both cases) but use the semantics mandated by the C++ standard for the C++11 syntax (i.e., interpret the attribute as a declaration attribute when it is placed at the beginning of the entire declaration, and as a type attribute when it is placed after the type):

[[clang::annotate("foo")]] int i3;  // declaration attribute
int [[clang::annotate("foo")]] i4;  // type attribute

This approach would not raise any backwards compatibility concerns, as Clang currently rejects the C++11 syntax for annotate when it is placed after a type.

However, this would mean that the semantics of the attribute would change depending on which syntax is used:

int __attribute__((annotate("foo"))) i2;  // declaration attribute
int [[clang::annotate("foo")]] i4;  // type attribute

This is confusing and seems likely to trip programmers up. We therefore believe it is better to introduce a new attribute rather than extend annotate to be a type attribute.
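With a separate attribute, both kinds of annotation can be written without ambiguity. A sketch, assuming portability macros for both spellings; CLANG_ANNOTATE is a hypothetical name analogous to CLANG_ANNOTATE_TYPE, and "foo" is a placeholder:

```cpp
#include <type_traits>

// Hypothetical portability macros for the declaration and type attributes.
#define CLANG_ANNOTATE(x)
#define CLANG_ANNOTATE_TYPE(x)
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(clang::annotate)
#undef CLANG_ANNOTATE
#define CLANG_ANNOTATE(x) [[clang::annotate(x)]]
#endif
#if __has_cpp_attribute(clang::annotate_type)
#undef CLANG_ANNOTATE_TYPE
#define CLANG_ANNOTATE_TYPE(x) [[clang::annotate_type(x)]]
#endif
#endif

// Declaration attribute: appertains to the variable i3.
CLANG_ANNOTATE("foo") constexpr int i3 = 3;

// Type attribute: appertains to the type int in the declaration of i4.
constexpr int CLANG_ANNOTATE_TYPE("foo") i4 = 4;

// Neither annotation changes the declared C++ type.
static_assert(std::is_same<decltype(i3), const int>::value, "i3 is const int");
static_assert(std::is_same<decltype(i4), const int>::value, "i4 is const int");
```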

Implementation status

We have uploaded a draft patch to Phabricator that implements the proposed annotate_type attribute. We have used the attribute to implement a lifetime annotation scheme (details here) and have demonstrated the ability to annotate lifetimes on a diverse set of C++ language constructs, including inside template arguments.


@AaronBallman As we had an extensive discussion on an earlier version of this proposal, I wanted to make sure you saw this. Thank you for all of the feedback you gave at the time – it really helped us to become clear on how the details of this proposal should work.


Thank you for this second iteration! In general, I’m really happy with what you’ve proposed here. I think the syntax choice of using [[]] attributes is perfect, I think making a new spelling that’s distinct from annotate is perfect, and I think the semantics (or lack thereof) is exactly what we want (at least initially – I could see someone wanting to extend this to produce different LLVM IR so that analysis operating at that level can use the extra type information, but no need to go there initially).

You mention that the attribute won’t affect the type system, but there is one case where I think it kind of sorta should – type aliasing. I think it’s useful to be able to do: typedef int [[clang::annotate_type("foo")]] FooAnnotatedInt; and for uses of FooAnnotatedInt to behave as though they were the int marked with the attribute. (You may already have this case in mind and expect to support it, but it wasn’t in the RFC and I wanted to be sure it was called out explicitly.)

I did have a question about the shadow type system – are you also proposing to add those facilities to Clang (or perhaps the Clang Static Analyzer specifically) more formally, or are you expecting to re-use the existing type facilities we have? (I’m trying to gauge if we’re getting closer to a pluggable type system, which, if I understand properly, is basically the same thing you’re looking to do with this annotation.)

Another question I have is on the design of what you expect to be able to annotate with this. It’s not clear whether you expect to be able to do something like const [[clang::annotate_type("foo")]] int [[clang::annotate_type("bar")]] i = 12; where you are annotating the type qualifier as well as the type specifier. (Note, Clang currently has a bug where we don’t parse the attribute properly on the qualifier and we don’t have any facilities to attach attribute information to a qualifier. But at the same time, if this is a general attribute for types where you want to use flow-sensitive annotations, the qualifier may be something you can attach extra information to specifically.) And if we’re thinking about flow-sensitive information about qualifiers… do we also want to think about flow-sensitive information about storage class specifiers (static, extern, etc.) and function specifiers (inline, etc.)? (Those have similar burdens as type qualifiers.) Personally, I think it’d be nice to annotate qualifiers specifically, but not necessary for the initial patch. Other kinds of specifiers seem less compelling and equally unnecessary for the initial patch. But perhaps you have different use cases in mind that make one or both of these more interesting for the initial offering.

Regardless of the answers to the above, I think this RFC is a good idea and I support moving forward with it. FWIW, I intend to start reviewing the draft patch sometime in the next few days.

This sounds reasonable, but I think it would be up to the static analysis tools that interpret the attribute to perform this propagation. From the point of view of Clang, I’m not sure there’s a meaningful distinction to be made, beyond answering the question of how this attribute will show up in the AST. Let’s use a concrete example:

typedef int [[clang::annotate_type("foo")]] FooAnnotatedInt;
FooAnnotatedInt f();

Here’s how I think this should behave (and how it does currently behave in the draft patch):

  • The FooAnnotatedInt in the declaration of f() would be a TypedefType (pretty obviously).

  • Calling desugar() on this TypedefType would yield an AttributedType representing int [[clang::annotate_type("foo")]] (also pretty obviously, as that’s the underlying type of FooAnnotatedInt).

  • Calling getCanonicalType() on the TypedefType would yield a BuiltinType representing int. (Yet again, it pretty obviously has to do this, as we’ve said that the attribute doesn’t have any semantics from the C++ point of view, so it’s pretty much a given that the type has to canonicalize to int.)
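From the language's point of view, the example compiles as ordinary C++ with or without attribute support, and FooAnnotatedInt canonicalizes to int. A quick sketch of that, with the CLANG_ANNOTATE_TYPE macro repeated so it stands alone and a placeholder body for f (inspecting the TypedefType and AttributedType sugar themselves requires the Clang AST APIs):

```cpp
#include <type_traits>

#define CLANG_ANNOTATE_TYPE(x)
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(clang::annotate_type)
#undef CLANG_ANNOTATE_TYPE
#define CLANG_ANNOTATE_TYPE(x) [[clang::annotate_type(x)]]
#endif
#endif

typedef int CLANG_ANNOTATE_TYPE("foo") FooAnnotatedInt;

// Return type written as the typedef (TypedefType sugar in the AST).
constexpr FooAnnotatedInt f() { return 42; }

// Canonically, the typedef is plain int, as is the type of a call to f().
static_assert(std::is_same<FooAnnotatedInt, int>::value, "canonical type is int");
static_assert(std::is_same<decltype(f()), int>::value, "f() returns int");
```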

Does all of this mean that FooAnnotatedInt behaves as if it were an int marked with the attribute? As I say, from the point of view of Clang, I’m not sure there’s really a distinction to be made, as within Clang an int with the attribute behaves just like an int without the attribute.

A static analysis tool that interprets the AST, on the other hand, has the choice. If it wants FooAnnotatedInt to behave like an attributed int, the AST contains all the information the tool needs to do this. If, on the other hand, the tool wants FooAnnotatedInt to behave like a plain int (or some other choice that may make sense for a specific tool), it can do that too.

No, we’re not planning to add any “library” functionality to Clang or Clang Static Analyzer that would provide facilities for shadow type systems, at least for now. The shadow type system would be implemented entirely within the static analysis check that interprets the annotate_type annotations. If at some point it turns out that there is common functionality that should be shared among multiple such checks, then of course it could make sense to factor it out into a library.

Thanks, that’s an interesting paper. I’m not really familiar with the other literature around this, but what the paper describes does sound a lot like what we’re trying to achieve.

As I recall, this is something that we discussed on the earlier version of this proposal, but I have since realized that I must have misread the C++ grammar. Looking at the grammar again, I believe it is not permissible to put an attribute after each individual decl-specifier, but only after the entire decl-specifier-seq. The relevant part of the grammar is here.

What this means is that in your example above, an annotate_type can be placed after the int, but not after the const, and indeed both Clang and gcc produce an error if a C++11 attribute is placed after the const. So the question of whether we want to allow annotate_type on qualifiers is, for better or worse, moot.
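In code, the distinction looks like this (a sketch with the CLANG_ANNOTATE_TYPE macro repeated so it stands alone; "foo" and "bar" are placeholders):

```cpp
#define CLANG_ANNOTATE_TYPE(x)
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(clang::annotate_type)
#undef CLANG_ANNOTATE_TYPE
#define CLANG_ANNOTATE_TYPE(x) [[clang::annotate_type(x)]]
#endif
#endif

// Valid: the attribute follows the entire decl-specifier-seq `const int`.
const int CLANG_ANNOTATE_TYPE("foo") i = 12;

// Also valid with the qualifier written last: the attribute still follows
// the whole decl-specifier-seq `int const`.
int const CLANG_ANNOTATE_TYPE("bar") j = 13;

// Not permitted by the grammar: an attribute between two decl-specifiers.
//   const [[clang::annotate_type("foo")]] int k = 14;
```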

Thanks, that sounds great!

It definitely is – sorry for being unclear. When I said “affect the type system” there, I meant the semantic effects in the static analyzer, as opposed to things like overload resolution or template specializations. It sounds like we’re on the same page here.

That seems perfectly reasonable to me.

Yeah, that’s why I was like “wait, are we getting MORE goodies than I thought?” 🙂 It’s fine that we’re not, though.

In a simple-declaration ([dcl.dcl]), there’s a decl-specifier-seq ([dcl.dcl]) which takes a decl-specifier followed by an optional attribute-specifier-seq; the decl-specifier can be a defining-type-specifier ([dcl.dcl]) which can be a type-specifier ([dcl.dcl]), which includes qualifiers.

If you remove the attribute from the attribute specifier seq, you see GCC does accept it: Compiler Explorer.

Amusingly, Clang and ICC reject my example while GCC and MSVC accept it. It’s possible that we’re reading [dcl.dcl] differently – if the decl-specifier doesn’t specify a type (because it’s just a qualifier), what happens?

Based on that, I agree, let’s punt on this entirely. We can cross that bridge later if we need to.

Sounds good!

Agreed – but in terms of productions, all this allows us to do is to go from a decl-specifier-seq to a single qualifier followed by an attribute-specifier-seq – we can’t then add a type (and possibly another attribute-specifier-seq) after the first attribute-specifier-seq.

To make this more concrete, here is a sample sequence of productions:

decl-specifier-seq → decl-specifier attribute-specifier-seq
decl-specifier attribute-specifier-seq → defining-type-specifier attribute-specifier-seq
defining-type-specifier attribute-specifier-seq → type-specifier attribute-specifier-seq
type-specifier attribute-specifier-seq → cv-qualifier attribute-specifier-seq
cv-qualifier attribute-specifier-seq → const [[foo]]

But if we’ve got a cv-qualifier, then we also need some type that it qualifies – in other words, we need another decl-specifier. We can obtain additional decl-specifiers using the production decl-specifier-seq → decl-specifier decl-specifier-seq. For example, if we want two decl-specifiers (e.g. const int):

decl-specifier-seq → decl-specifier decl-specifier-seq
decl-specifier decl-specifier-seq → decl-specifier decl-specifier attribute-specifier-seq

and then, omitting intermediate steps:

decl-specifier decl-specifier attribute-specifier-seq → const int [[foo]]

But I can’t see any way to invoke these productions where we could end up with an attribute-specifier-seq interspersed between two decl-specifiers.

Does this make sense? Am I misinterpreting something here?

Oh, great catch, I was reading the decl-specifier-seq backwards! When we have two decl-specifiers, we can’t have the attribute list between them. Thank you for pushing back, it seems Clang + ICC are correct and GCC + MSVC have a parsing bug.

OK, great to hear we interpret this the same!

I originally misread the grammar too when we were discussing this in the context of the first proposal – I probably led you down the garden path with that.

If we’ve resolved all of the questions here, the implementation should be ready for review now. (I added documentation today, which I noticed was still missing, and also a unit test that shows that the arguments of the attribute are parsed and preserved correctly.)

“Contributing Extensions to Clang” criteria

A reviewer on https://reviews.llvm.org/D111548 requested input on the criteria set up in “Contributing Extensions to Clang”. It appears that it is no longer possible to edit the original post, so I will provide that material here.

A specific need to reside within the Clang tree

While it is possible for Clang-based tools to add custom declaration attributes using Clang plugins, the same is not true for type attributes.

We investigated the feasibility of extending the plugin mechanism to type attributes (see this abandoned patch). However, the conclusion from this discussion was that adding type attributes through a plugin is significantly more difficult than adding declaration attributes because one would want to be able to specify the effects that the pluggable type attribute has on the type system. A quote from @AaronBallman on the review: “I love the idea of plugin type attributes in theory, but I don’t think we’re architecturally in a place where we can do that very easily.”

Based on this, we decided to proceed with the alternative of adding a general-purpose type annotation attribute to Clang.

A specification

D111548 adds documentation that describes both the syntax and semantics of the annotate_type attribute.

Representation within the appropriate governing organization

At this time, we do not intend to propose the annotate_type attribute as an extension to the C or C++ standards. As with the existing annotate declaration attribute, we propose making annotate_type a vendor extension and, hence, placing the attribute in the clang namespace.

A long-term support plan

The change is small and self-contained, and we do not expect it to require significant support. To the extent that the feature does require support in the face of changes to Clang or the language itself, we can commit to providing this support as part of Google’s broader commitment to the Clang/LLVM project.

Note that a significant part of D111548 is code that is being moved around. The non-test, non-documentation code changes amount to fewer than 100 lines:

  • A small change in Parser::ParseDeclarationSpecifiers enables adding C++11 attributes to a decl-specifier-seq.
  • The rest of the changes consist of an attribute definition and attribute handlers. These are well-contained within switch cases for the new attribute kind. They do not make any broader architectural or behavioral changes to Clang.

A high-quality implementation

D111548 provides an implementation that follows LLVM’s coding conventions and meets Clang’s quality standards. The primary reviewer has stated “I’m going to add another reviewer just to make sure I’ve not missed something, but I think this is about ready to go”.

A test suite

D111548 adds a comprehensive test suite for the annotate_type attribute, covering semantic analysis, codegen (or rather lack of influence on codegen), and representation of the attribute in the AST.