Martin Brænne @martinboehme
Dmitri Gribenko @gribozavr
Summary
We propose adding a new attribute clang::annotate_type
to Clang. This attribute is analogous to the existing clang::annotate
declaration attribute but for use on types. As for annotate
, the typical use case is to add annotations for static analysis tools that are not integrated into the core Clang compiler (e.g., Clang-Tidy checks or out-of-tree Clang-based tools).
The particular use case that motivates our proposal is annotating lifetime contracts. See this RFC for details on this use case.
We propose a general-purpose annotation attribute rather than an attribute specific to our use case for the following reasons:
-
The tooling for inferring and validating lifetime contracts will initially be experimental and will require extensive validation before we stabilize the annotation syntax. We would prefer not to add lifetime-specific attributes to Clang that might need to be modified repeatedly or even removed again entirely.
-
We expect a general-purpose type annotation attribute to be useful for other types of static analysis. For example, see the Java Checker Framework, which provides a wide array of static checks based on Java’s general-purpose type annotations.
This is an update of an earlier proposal. In particular, we have significantly extended the discussion of type system implications. Rather than continue discussion on the old thread, we have decided to open a new thread so that we can summarize the current state of the entire proposal at the top of the thread.
Proposal
We propose adding a type attribute called clang::annotate_type
.
Attribute syntax: C++11 / C2x, not GNU
This type attribute will only support C++11 and C2x attribute syntax, not GNU attribute syntax. This is because the C and C++ standards define explicitly where a C++11 / C2x attribute may appear and what it appertains to when it appears in a given position, whereas the GNU syntax is more ad-hoc and fuzzy. This is particularly important for a general-purpose type attribute, which may appear in many different positions; see also the discussion in the “alternatives considered” section below.
No influence on program semantics
The attribute will not have any effect on the semantics of C++ code, neither type checking rules, nor runtime semantics. In particular:
- No effect on the formal C++ type system:
-
std::is_same<T, T [[clang::annotate_type("foo")]]
is true for all typesT
. - It is not permissible for overloaded functions or template specializations to differ merely by an
annotate_type
attribute. - The presence of an
annotate_type
attribute will not affect name mangling.
-
- The attribute will not cause any additional data to be output to LLVM IR, object files or debug information. It is only intended to be consumed using Clang APIs by source-level static analysis tools such as Clang-Tidy.
Note that a language extension that starts out as [[clang::annotate_type]]
may, in the future, transition to a first-class Clang attribute, which may affect how undefined behavior manifests. In other words, static analysis implemented using [[clang::annotate_type]]
may be supplemented with a sanitizer-like dynamic analysis if it only emits diagnostics in the case of UB.
Rationale
We believe there are compelling reasons why a custom type attribute should not affect the formal C++ type system:
-
Portability between standard C++ compilers and compilers that understand the attribute.
-
Allowing the attribute to follow type checking rules that go beyond the standard C++ type system, for example, flow-sensitive typing. Of course, such type checking rules can be only used to subset the set of programs allowed by standard C++.
Portability
For portability, it is customary to wrap attributes in macros that expand to nothing if the compiler does not support the attribute. For example:
#define CLANG_ANNOTATE_TYPE(x)
#ifdef __has_cpp_attribute
#if __has_cpp_attribute(clang::annotate_type)
#undef CLANG_ANNOTATE_TYPE
#define CLANG_ANNOTATE_TYPE(x) [[clang::annotate_type(x)]]
#endif
#endif
#endif
If the attribute did affect type semantics, then a program containing the attribute would have different behavior when compiled depending on the compiler it was compiled with. Moreover, whether a program even compiles or not could be affected by whether the compiler supports the attribute. To demonstrate this, let us assume we allowed template specializations to differ only by an annotate_type
attribute:
// (Counterexample. We propose this program should be invalid.)
template <class T>
struct S {};
template<>
struct S<int*> {};
template<>
struct S<int* CLANG_ANNOTATE_TYPE("foo")> {};
On compilers that do not support the attribute, the macro expands to nothing, and the program would therefore be rejected because of a duplicate template specialization. Meanwhile, compilers that do support the attribute would compile the program. In other words, the attribute would extend the set of valid programs. A developer might not realize this until they try to port to a compiler that does not support the attribute. This does not seem desirable.
Subsetting C++ with a “shadow” type system
We expect that static analysis checks enabled by annotate_type
will impose constraints that cannot be expressed within the C++ type system. Otherwise, one could simply create a pure C++ library solution by defining types that enforce the desired constraints through the C++ type system.
An example of constraints that cannot be expressed within the C++ type system is flow-sensitive typing, where the type of an expression depends on its position in the control-flow graph. This concept does not exist in the C++ type system.
Extending the C++ type system to accommodate flow-sensitive typing is an extremely difficult problem without a clear path to a solution. It is more practicable for this kind of analysis to impose a shadow type system that exists only within the context of the analysis but has no effect on the formal C++ type system or runtime semantics. An implication of this is that the constraints imposed by the shadow type system must strictly reduce the set of valid programs from what is allowed by standard C++. Within the formal C++ type system, on the other hand, all annotated variants of a type are permissively considered to be the same type; this also avoids portability issues, as discussed above.
Formally, the shadow type system introduces refinement types, which restrict the set of permissible values of an existing type. Refinement types express pre- and postconditions on functions that use them in their signature.
Language subsetting through a shadow type system has the following benefits:
- Type annotations can be rolled out incrementally to an existing codebase.
- Annotated programs can be compiled by a standard C++ compiler without affecting runtime semantics.
- Type annotations can be checked by a sanitizer-like tool at runtime because undefined behavior can be defined by an implementation to be a reliable program termination.
As a concrete example of an analysis that requires flow-sensitive typing to express the desired semantics, consider the existing nullability qualifiers (_Nullable
, _Nonnull
). Like the proposed annotate_type
attribute, these are used to annotate types with additional information needed for static analysis (see also the Clang RFC that proposed the introduction of these qualifiers) and do not affect the formal C++ type system or runtime semantics.
Consider this piece of code, which is fine from a nullability point of view:
void f(int* _Nonnull nonnull_p) {
std::cout << *nonnull_p;
}
void g(int* _Nullable nullable_p) {
if (nullable_p) {
f(nullable_p);
}
}
The function call f(nullable_p)
is permissible only because of the preceding if (nullable_p)
check. Without this check, the function call should not be permissible. We therefore need to consider nullable_p
to have different types in different parts of the function: int* _Nullable
outside the if (nullable_p)
block, and int* _Nonnull
inside the if block.
This is an example of flow-sensitive typing, but note that the existing checks in Clang for the nullability annotations are not in fact flow-sensitive and therefore produce false negatives. In the above example, Clang does not warn if the if (nullable_p)
condition is removed. A sound check of the nullability annotations would need to be flow-sensitive.
Let us return now to the proposed annotate_type
attribute. Because this attribute does not affect the C++ type but is just type sugar, a static analysis tool that wants to use this attribute may need to reconstruct type sugar in places where Clang canonicalizes types. This appears to be a necessary tradeoff to achieve the desired type semantics, and the authors of the nullability qualifier RFC come to the same conclusion (see section “Type System Impact” in the RFC).
Alternative considered: Extend annotate
to apply to types
Instead of introducing a new attribute, we considered extending the existing annotate
attribute to apply not just to declarations but also to types.
The reason we decided not to go this route is because the annotate
attribute allows GNU attribute syntax to be used, and, as we have noted above, the semantics of GNU attributes, in terms of what they appertain to, are different than those of C++11 attributes and potentially more surprising, particularly when they are used as type attributes.
For example, the GNU syntax for the annotate
attribute can be applied in both of the following positions today:
__attribute__((annotate(“foo”))) int i1;
int __attribute__((annotate(“foo”))) i2;
In both positions, the attribute is interpreted as appertaining to the variable declaration. We must assume that there is existing code that uses the attribute in both of these positions.
If we want to extend annotate
to serve as a type attribute, this is a problem. What syntax should we use to annotate the int
type, rather than the declaration?
One possibility would be to retain the existing semantics for the GNU syntax (i.e. interpret the attribute as a declaration attribute in both cases) but use the semantics mandated by the C++ standard for the C++11 syntax (i.e., interpret the attribute as a declaration attribute when it is placed at the beginning of the entire declaration, and as a type attribute when it is placed after the type):
[[clang::annotate(“foo”)]] int i3; // declaration attribute
int [[clang::annotate(“foo”)]] i4; // type attribute
This approach would not raise any backwards compatibility concerns, as Clang currently rejects the C++11 syntax for annotate
when it is placed after a type.
However, this would mean that the semantics of the attribute would change depending on which syntax is used:
int __attribute__((annotate(“foo”))) i2; // declaration attribute
int [[clang::annotate(“foo”)]] i4; // type attribute
This is confusing and seems likely to trip programmers up. We therefore believe it is better to introduce a new attribute rather than extend annotate
to be a type attribute.
Implementation status
We have uploaded a draft patch to Phabricator that implements the proposed annotate_type
attribute. We have used the attribute to implement a lifetime annotation scheme (details here) and have demonstrated the ability to annotate lifetimes on a diverse set of C++ language constructs, including inside template arguments.