[RFC] Lifetime annotations for C++

martinboehme · April 1, 2022, 12:30pm

Martin Brænne @martinboehme
Rosica Dejanovska @scentini
Gábor Horváth @Xazax-hun
Dmitri Gribenko @gribozavr
Luca Versari @veluca93

Summary

We are designing, implementing, and evaluating an attribute-based annotation scheme for C++ that describes object lifetime contracts. It allows relatively cheap, scalable, local static analysis to find many common cases of heap-use-after-free and stack-use-after-return bugs. It allows other static analysis algorithms to be less conservative in their modeling of the C++ object graph and potential mutations done to it. Lifetime annotations also enable better C++/Rust and C++/Swift interoperability.

This annotation scheme is inspired by Rust lifetimes, but it is adapted to C++ so that it can be incrementally rolled out to existing C++ codebases. Furthermore, the annotations can be automatically added to an existing codebase by a tool that infers the annotations based on the current behavior of each function’s implementation.

Clang has existing features for detecting lifetime bugs ([[clang::lifetimebound]] and -Wdangling-gsl). The lifetime annotations we propose are a strict superset of [[clang::lifetimebound]]. They support the majority of use cases of -Wdangling-gsl and many that it cannot express. A dedicated section below contains a detailed comparison with these existing approaches. We plan to enable our lifetime analysis to understand the existing annotations by translating them into our annotation syntax internally (where possible).

We are looking to contribute our current early implementation of lifetime annotations and supporting static analysis to Clang and Clang-Tidy. Developing it upstream would allow us to more easily collaborate on the design and implementation and get feedback from the community and early adopters that build the LLVM/Clang toolchain from git HEAD (for example, Chrome).

High-level implementation plan

We propose the following:

Add a general-purpose type annotation attribute annotate_type to Clang (see this separate RFC for details).
Add an experimental Clang-Tidy check that infers the lifetime contracts based on the current behavior of source code and suggests annotations to add to the source code that describe these contracts. Each user will use this check once to annotate their codebase.
Add an experimental Clang-Tidy check that validates that the code follows the lifetime contracts described by the annotations.
Develop lifetime annotations for libc and libc++, stored in API notes files. Upstream code from the apple/swift-clang fork of Clang for ingesting API notes in Sema.
Extend Clang’s API notes to not require Clang Modules.
Evaluate the annotations and Clang-Tidy checks with early adopters. Fine-tune the system based on the feedback.
Make a decision about stabilizing the Clang-Tidy checks and marking them non-experimental. If and when this happens, we could consider introducing attributes that are specific to our annotation scheme instead of using general-purpose annotation attributes.
Move libc++ annotations from API notes files into headers.

Implementation status

We have implemented a work-in-progress Clang-Tidy check that infers the lifetime contracts from un-annotated C++ code. It is already able to infer lifetimes in a wide range of non-trivial situations (see appendix B for examples). This gives us enough confidence in the annotation scheme to present it publicly and propose moving experimentation upstream.

We have not started working on the verification tool yet, but we believe it is a lot less risky than inference. Verification can reuse most of the complex static analysis algorithms required for inference; this also implies that inference will generate lifetimes that satisfy the verification tool. The two main additions that are required for verification are being able to distinguish different local lifetimes and producing good error messages when lifetime contracts are violated. Implementing this additional functionality will require some effort but not much innovation. We prioritize figuring out the type checking and inference rules and are therefore focusing on inference tooling first.

Rollback plan

Because the annotation scheme will be experimental for a while, we are not proposing to add any attributes to Clang that are specific to lifetimes. The only change to core Clang that we are proposing is adding a general-purpose type annotation attribute. Apart from this, the implementation will be contained in new Clang-Tidy checks, making it well isolated from the rest of the codebase.

In case our experimentation fails, the Clang-Tidy checks can be easily removed from Clang without breaking users’ builds. The general-purpose annotate_type attribute will remain in Clang as we expect it to be useful for other purposes.

Use cases enabled by lifetime annotations

Lifetime annotations describe the lifetime contracts of C++ APIs in a modular, machine-readable manner, with enough flexibility to cover many modern C++ architectural and local coding patterns. Having such descriptions available in C++ source code enables the following use cases:

Improved readability for humans. Users can easily find the lifetime contracts in the function signature, and trust this information to be correct. Typical current practice is to use prose in documentation comments to describe lifetime contracts, but code authors don’t do this consistently or reliably: The information is often missing, and when it is present, it is sometimes incorrect.
Improved static analysis capabilities: understanding of the object graph and mutations. Static analysis tooling today often suffers from an inability to precisely reason about mutations in a modular way. Scalable, local static analysis that needs soundness has to conservatively assume that all pointers passed to a function call will escape, and that subsequent function calls will mutate objects reachable from those pointers. Lifetime annotations allow static analysis tools to derive a more precise approximation of possible object graph state and mutations. See appendix C for an example.

Concretely, lifetime annotations improve modeling of function call side effects in the Clang dataflow analysis framework.
Better C++/Rust interoperability. Lifetime annotations open an avenue for more complete, more automatic, more ergonomic, and safer C++/Rust interoperability than is currently provided by state-of-the-art Rust crates such as cxx and autocxx. Existing interop solutions can only bridge C++ APIs that accept and return objects either by value or inside owning smart pointers. Lifetime annotations allow us to automatically bridge C++ functions with complex lifetime contracts. Lifetime contracts from C++ function signatures can be mapped to Rust lifetimes, enabling us to map C++ pointers and references to safe references in Rust. See appendix D for a concrete example.
Better C++/Swift interoperability. Lifetime annotations can help provide safer C++/Swift interoperability. Swift does not expose lifetime annotations in the language, but internally in the compiler the mechanisms and principles are rather similar to what Rust exposes at the language level. The Swift compiler starts tracking object lifetimes after converting Swift AST to the Swift intermediate language (SIL). When Swift code calls a C++ foreign function or uses an instance of a C++ struct/class, the Swift compiler can get the corresponding lifetime contract from the C++ header and validate that the input and output objects live long enough.

Limitations

The static analysis based on the proposed lifetime annotations cannot catch all memory safety problems in C++ code. Specifically, it cannot catch all temporal memory safety bugs (for example, ones caused by iterator invalidation), and of course lifetime annotations don’t help with spatial memory safety (for example, indexing C-style arrays out of bounds). See the comparison with Rust below for a detailed discussion.

Overview of lifetime annotations

Note
This is not a design doc; the design is still in flux. Design docs will be mailed as patches.

The lifetime annotation scheme we propose is inspired by and similar to lifetimes in Rust. Rust is an industrial-strength language with complete and consistent support for static lifetime checking. It embodies a wealth of experience on how to make lifetime checking work on large real-world codebases, and we think this is a good reason to borrow these tried-and-true concepts for C++. We will, however, present the annotation scheme in a way that should make it understandable to readers without any knowledge of Rust.

Defining the annotation scheme completely would take many pages, and we don’t feel it would be productive to go into this level of detail in this high-level RFC. Instead, we will present a few representative examples with explanations that provide enough detail to give a feel for how the annotation scheme works. We’re happy to provide more details if needed.

Example

Here is a simple example:

const std::string& [[clang::annotate_type("lifetime", "a")]] smaller(
    const std::string& [[clang::annotate_type("lifetime", "a")]] s1,
    const std::string& [[clang::annotate_type("lifetime", "a")]] s2) {
  if (s1 < s2) {
    return s1;
  } else {
    return s2;
  }
}

This function takes two references to strings and returns a reference to the lexicographically smaller of the two strings. Because the return value might refer to either of the two input strings, its lifetime is tied to the two inputs. This is expressed by the annotation [[clang::annotate_type("lifetime", "a")]].

The annotate_type attribute has no effect on the formal C++ type system or runtime semantics; the lifetime inference and verification tooling use it to establish a “shadow” type system. For more details, see the RFC for annotate_type.

The annotation in its “raw” form is verbose and obscures the rest of the function signature. In practice, it is preferable to define a macro that expands to the attribute. In the rest of this proposal, we will assume that a macro $a has been defined to expand to [[clang::annotate_type("lifetime", "a")]], and similarly $b, $c, and so on. (Most major compilers, including Clang, GCC, and MSVC, allow $ as an implementation-defined character in identifiers.) With this, the function signature looks as follows:

const std::string& $a smaller(const std::string& $a s1, const std::string& $a s2) {
   ...
}

We think this style of macros makes the lifetimes visually distinctive as well as brief, so we will use it throughout this proposal. However, the macros are not part of our proposal; every codebase can define its own macro shortcuts that work within the context of that codebase.
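
As an illustration only, one way a codebase might define such macros is sketched below. The helper name LIFETIME and the fallback to an empty expansion are assumptions made for this sketch, not part of the proposal; the fallback simply keeps the annotated code buildable with compilers that do not provide annotate_type.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper macro (the name LIFETIME is an assumption for this sketch).
// It expands to the annotate_type attribute where the compiler supports it and
// to nothing elsewhere, so annotated code still builds with other compilers.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::annotate_type)
#    define LIFETIME(name) [[clang::annotate_type("lifetime", name)]]
#  endif
#endif
#ifndef LIFETIME
#  define LIFETIME(name)
#endif

// Short per-lifetime macros in the style used throughout this proposal.
#define $a LIFETIME("a")
#define $b LIFETIME("b")

const std::string& $a smaller(const std::string& $a s1, const std::string& $a s2) {
  return (s1 < s2) ? s1 : s2;
}

// The annotation does not change behavior: the result aliases one of the inputs.
void check_smaller() {
  std::string x = "apple";
  std::string y = "pear";
  assert(&smaller(x, y) == &x);
}
```

Routing every short macro through a single helper keeps the attribute spelling in one place, which may make it easier to switch to a dedicated attribute later.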

The names of lifetimes have no connection to any other identifiers in the program. A lifetime may happen to have the same name as another entity in the program, but this does not affect its meaning. Lifetimes in function signatures are implicitly scoped to the function in which they appear; we will elaborate on scoping rules in detailed design docs.

Tooling can use the annotations to detect lifetime bugs, for example:

void f() {
  std::string foo = "foo";
  const std::string& first = smaller(foo, "bar");
  std::cout << first << "\n";
}

The second argument to smaller is a temporary std::string object, whose lifetime lasts only until the end of the statement. The lifetime annotations tell us that the reference first may be bound to this temporary, and that therefore accessing this reference in the following line is undefined behavior (UB).

Note: Both parameters of smaller are annotated with the same lifetime $a, but this does not mean that the objects passed in as arguments need to have exactly the same lifetime. Indeed, this is not the case in the example call smaller(foo, "bar") above.

Informally, the annotation means that the return value can have the lifetime of either of the two arguments.

Formally, we can think of the lifetime $a as being a generic parameter of the function smaller(). A concrete lifetime is substituted for this parameter at every callsite of smaller(). A reference with a given lifetime may be implicitly converted to a reference of shorter lifetime. For the example call smaller(foo, "bar") above, we therefore choose $a to be the shorter of the two argument lifetimes; this is the lifetime of the second argument, the implicitly constructed temporary. The string foo has a longer lifetime than this temporary, so it can be implicitly converted to a reference with lifetime $a. We therefore conclude that the lifetime of the reference returned by smaller() is equal to the lifetime of the temporary.
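
To make the substitution concrete, here is a small sketch of a call where the lifetime chosen for $a is long enough. The function name smaller_of_foo_and_bar is hypothetical, and $a is defined to expand to nothing so that the snippet builds with any compiler.

```cpp
#include <cassert>
#include <string>

// In this sketch the lifetime macro expands to nothing.
#define $a

const std::string& $a smaller(const std::string& $a s1, const std::string& $a s2) {
  return (s1 < s2) ? s1 : s2;
}

// Hypothetical caller that names both arguments instead of passing a temporary.
std::string smaller_of_foo_and_bar() {
  std::string foo = "foo";
  std::string bar = "bar";  // named object: lives until the end of this function
  // $a is the shorter of the lifetimes of `foo` and `bar`; both outlive `first`.
  const std::string& first = smaller(foo, bar);
  assert(&first == &bar);   // "bar" compares less than "foo"
  return first;             // copies the string while `foo` and `bar` are alive
}
```

Because both arguments are named locals, every lifetime that can be substituted for $a covers all uses of first, so a checker following the rules above would have nothing to report here.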

Lifetime of `this`

The annotation for the lifetime of a this pointer is placed at the end of the member function declaration, for example:

struct StringPair {
  std::string first, second;
  const std::string& $a smaller() const $a {
    if (first < second) {
      return first;
    } else {
      return second;
    }
  }
};

This expresses that the lifetime of the reference returned by StringPair::smaller() is tied to the lifetime of the StringPair object on which the member function is called. Note that the $a signifying the lifetime of the this pointer comes in a natural position directly after the const signifying the constness of the this pointer. The syntax remains consistent if we add a ref-qualifier, e.g., const std::string& $a smaller() const & $a.

Lifetimes in template arguments

Lifetimes may be added to template arguments, e.g.

int* $a get_first(const std::vector<int* $a>& $b v) {
  return v.at(0);
}

This expresses that the lifetime $a of the return value is tied to the lifetime of the pointers contained in the vector, and that this lifetime is independent of the lifetime $b of the vector itself.
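
The independence of $a and $b can be seen in a caller where the vector dies before the returned pointer is used. This is a sketch; the function name first_pointer is hypothetical and the lifetime macros expand to nothing here.

```cpp
#include <cassert>
#include <vector>

// In this sketch the lifetime macros expand to nothing.
#define $a
#define $b

int* $a get_first(const std::vector<int* $a>& $b v) {
  return v.at(0);
}

// Hypothetical caller: the vector is destroyed before the result is used.
int* first_pointer(int& target) {
  int* result = nullptr;
  {
    std::vector<int*> v = {&target};  // the vector itself has the short lifetime $b
    result = get_first(v);            // the returned pointer has lifetime $a
  }                                   // `v` is destroyed here
  assert(result == &target);
  return result;  // still valid: it points at `target`, not into `v`
}
```

If the return value were instead annotated with $b, a checker would have to reject this caller even though it is correct.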

Lifetime-parameterized types

Some types are reference-like in the sense that they refer to data whose lifetime is independent of their own lifetime. An example of this from the standard library is string_view: It refers to string data whose lifetime is independent of the lifetime of the string_view itself.

This is expressed by adding a lifetime parameter to the type that represents the lifetime of the data referred to by the type. Here is an excerpt of what this would look like for a string_view-like type:

class LIFETIME_PARAM(s) simple_string_view {
  char* $s data_ptr;
  size_t data_size;
public:
  const char* $s data() const $a {
    return data_ptr;
  }
// …
};

LIFETIME_PARAM(s) is a macro that expands to the attribute [[clang::annotate("lifetime_param", "s")]]. Again, the particular name of the macro is not part of this proposal.

The lifetime parameter $s is used in the definition of the member variable data_ptr to express that the lifetime of the string data is $s, a lifetime that is independent of the lifetime of the simple_string_view itself.

Similarly, $s is used in the data() member function to express that the lifetime of the return value is equal to the lifetime of the string data pointed to by data_ptr, not the lifetime $a of the simple_string_view itself.

When lifetime-parameterized types are used elsewhere in the code, they should be annotated with a lifetime in the same way that pointers and references are. For example, here is a simple_string_view version of the function smaller() that we showed earlier:

simple_string_view $a smaller(simple_string_view $a s1, simple_string_view $a s2) {
  if (s1 < s2) {
    return s1;
  } else {
    return s2;
  }
}
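
For readers who want to compile the excerpts, the following sketch fills in the missing pieces. The constructor, size(), and operator< are assumptions added for illustration (the excerpts above do not show them), and all lifetime macros expand to nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// In this sketch the annotation macros expand to nothing.
#define LIFETIME_PARAM(name)
#define $a
#define $s

class LIFETIME_PARAM(s) simple_string_view {
  const char* $s data_ptr;
  std::size_t data_size;
public:
  // Hypothetical constructor, added so that the sketch can be exercised.
  simple_string_view(const char* $s str)
      : data_ptr(str), data_size(std::strlen(str)) {}
  const char* $s data() const $a { return data_ptr; }
  std::size_t size() const { return data_size; }
};

// Hypothetical lexicographic comparison; the excerpt above does not show one.
bool operator<(simple_string_view lhs, simple_string_view rhs) {
  std::size_t n = lhs.size() < rhs.size() ? lhs.size() : rhs.size();
  int c = std::memcmp(lhs.data(), rhs.data(), n);
  return c < 0 || (c == 0 && lhs.size() < rhs.size());
}

simple_string_view $a smaller(simple_string_view $a s1, simple_string_view $a s2) {
  return (s1 < s2) ? s1 : s2;
}

void check_simple_string_view() {
  const char* foo = "foo";
  const char* bar = "bar";
  simple_string_view result = smaller(foo, bar);
  // The view refers to the original character data; nothing is copied.
  assert(result.data() == bar);
}
```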

Appendix A shows an annotated version of the most important parts of the actual standard string_view type.

Formally, lifetimes are generic type parameters, identified by their index, and type-erased at code generation time.

Lifetime elision

As in Rust, to avoid unnecessary annotation clutter, we allow lifetime annotations to be elided (omitted) from a function signature when they conform to certain regular patterns. Lifetime elision is merely a shorthand for these regular lifetime patterns. Elided lifetimes are treated exactly as if they had been spelled out explicitly; in particular, they are subject to lifetime verification, so they are just as safe as explicitly annotated lifetimes.

We propose to use the same rules as in Rust, as these transfer naturally to C++. We call lifetimes on parameters input lifetimes and lifetimes on return values output lifetimes. (Note that all lifetimes on parameters are called input lifetimes, even if those parameters are output parameters.) Here are the rules:

Each input lifetime that is elided (i.e., not stated explicitly) becomes a distinct lifetime.
If there is exactly one input lifetime (whether stated explicitly or elided), that lifetime is assigned to all elided output lifetimes.
If there are multiple input lifetimes but one of them applies to the implicit this parameter, that lifetime is assigned to all elided output lifetimes.

In practice, lifetime elision allows explicit annotations to be omitted in many cases. For example, the lifetimes of the StringPair::smaller() example we showed earlier are implied by the elision rules and could therefore be omitted; that is, const std::string& $a smaller() const $a can be written simply as const std::string& smaller() const.
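
The three elision rules can be illustrated with one declaration each. The functions and the Names struct below are hypothetical; the comments show how each signature would read with the elided lifetimes spelled out.

```cpp
#include <cassert>
#include <string>

// Rule 1: each elided input lifetime becomes a distinct lifetime. Spelled out:
//   void append(std::string& $a dst, const std::string& $b src);
void append(std::string& dst, const std::string& src) { dst += src; }

// Rule 2: there is exactly one input lifetime, so it is assigned to the output.
// Spelled out:
//   char& $a first_char(std::string& $a s);
char& first_char(std::string& s) { return s.front(); }

// Rule 3: the lifetime of `this` is assigned to the output even though there is
// another input lifetime. Spelled out:
//   const std::string& $a pick(const std::string& $b hint) const $a;
struct Names {
  std::string first, second;
  const std::string& pick(const std::string& hint) const {
    return hint == "second" ? second : first;
  }
};

void check_elision_examples() {
  Names n{"a", "b"};
  // The result refers to a member of `n`, never to the `hint` argument.
  assert(&n.pick("second") == &n.second);
}
```

Note that pick() could not return hint under the elided signature, because the output lifetime is tied to this rather than to the parameter.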

Introducing lifetimes to a codebase will have to happen incrementally. During this process, missing lifetimes need to be interpreted differently in different files:

In files on which we have already run the lifetime inference tooling, the elision rules should be applied to types that require lifetimes but do not have lifetime annotations (these are pointers, references, and lifetime-parameterized types).
In files on which we have not yet run the inference tooling, none of the functions have lifetime annotations, and the elision rules should not be applied because the lifetimes they imply are generally not correct.

We therefore propose using a pragma #pragma clang lifetime_elision to mark source files where lifetime elision should be applied. Note that support for this pragma can be implemented entirely within the Clang-Tidy check using the clang::PragmaHandler API; no changes to Clang itself are needed.

Alternative annotation syntax using only `[[clang::annotate]]`

If our proposal to add a general-purpose type annotation attribute annotate_type to Clang does not meet with approval, we can instead use the existing [[clang::annotate]] attribute, though at the cost of readability. For example:

class [[clang::annotate("lifetime_params", "s")]] simple_string_view {
  [[clang::annotate("member_lifetimes", "s")]]
  const char* data_ptr;
};

[[clang::annotate("function_lifetimes", "a, a -> a")]]
const std::string& smaller(const std::string& s1, const std::string& s2);

template<typename T, typename U>
[[clang::annotate("function_lifetimes", "(a, b) -> a")]]
int* get_first(const std::vector<int*>& v);
int* get_first(const std::vector<int*>& v);

Since [[clang::annotate]] is a declaration attribute, it can’t appear inline within a type, and must be attached to the declaration. This attribute placement detaches the lifetime information from the type, and we think that it is less readable. Certain cases of lifetime elision, where only some of the lifetimes in a function are elided, would also not be possible with this notation.

Current limitations of proposed lifetime annotations

No subtyping constraints between lifetimes

We do not have an equivalent of Rust’s where clauses, which establish “outlives” constraints between lifetimes. Consider this example:

void push_first(std::vector<int*>& a, std::vector<int*>& b) {
  a.push_back(b[0]);
}

We should be able to call push_first if the lifetime of the pointers in b is at least as long as the lifetime of the pointers in a, but there is no way to express this constraint with the current annotations.

This limitation could be solved by introducing a LIFETIME_CONSTRAINTS annotation:

LIFETIME_CONSTRAINTS(a <= b)
void push_first(std::vector<int* $a>& a, std::vector<int* $b>& b) {
  a.push_back(b[0]);
}
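
As a sketch of what such a constraint would permit, the caller below satisfies a <= b because the pointee outlives both vectors. The function name read_back is hypothetical, and LIFETIME_CONSTRAINTS and the lifetime macros expand to nothing here.

```cpp
#include <cassert>
#include <vector>

// In this sketch the annotation macros expand to nothing.
#define LIFETIME_CONSTRAINTS(...)
#define $a
#define $b

LIFETIME_CONSTRAINTS(a <= b)
void push_first(std::vector<int* $a>& a, std::vector<int* $b>& b) {
  a.push_back(b[0]);
}

// Hypothetical caller that satisfies the constraint.
int read_back() {
  int value = 42;                  // outlives both vectors
  std::vector<int*> b = {&value};
  std::vector<int*> a;
  push_first(a, b);                // pointers in `b` live at least as long as
                                   // the pointers that `a` is allowed to hold
  assert(a.back() == &value);
  return *a.back();
}
```

A call that should be rejected is one where a is expected to hold pointers that outlive the objects pointed to from b, since the pushed pointer could then dangle while a is still in use.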

No equality constraints between lifetime parameters

If a class has multiple lifetime parameters, those lifetimes are always assumed to be independent of each other; individual member functions cannot impose constraints on them. This creates a limitation in expressivity. For example, we cannot annotate Pair::Method() in the following example with lifetimes since it may only be called when $a == $b:

struct LIFETIME_PARAM(a, b) Pair {
  int* $a first;
  int* $b second;

  void Method() {
    TakeSpecialPair(this);
  }
};

void TakeSpecialPair(Pair $a $a * p);

Rust solves this issue by allowing users to write multiple impl blocks for a struct, where each carries its own generic signature for self.

Again, we could solve this issue in C++ by adding per-method equality constraints:

struct LIFETIME_PARAM(a, b) Pair {
  int* $a first;
  int* $b second;

  LIFETIME_CONSTRAINTS(a == b)
  void Method() {
    TakeSpecialPair(this);
  }
};

C++23’s explicit object parameter syntax (P0847R7, “Deducing this”) will allow this constraint to be expressed directly:

struct LIFETIME_PARAM(a, b) Pair {
  int* $a first;
  int* $b second;

  void Method(this Pair $c $c &self) {
    TakeSpecialPair(&self);
  }
};

Cannot define different constraints for function entry and exit

Our annotation scheme cannot express different lifetime constraints at function entry and exit, i.e., it cannot express separate pre- and post-conditions. Note that Rust has the same limitation.

The use cases for different lifetime constraints at function entry and exit are probably rare, but they do exist. As an example, consider string_view::swap. It exchanges the data pointers of the two string_view objects and hence also their lifetimes, but our annotation scheme cannot express this. Instead, we must more conservatively demand that the lifetimes of the two string_views are the same:

class LIFETIME_PARAM(s) string_view {
  size_t __size;
  const char* $s __data;

public:
  void swap(string_view $s & __other);
};

A similar limitation applies to std::swap.

However, this overly conservative annotation does not appear to be an issue for most practical applications. For example, when using string_view::swap to implement a sorting algorithm, the lifetimes of the string_views being sorted will anyhow be the same. The fact that Rust’s lifetime annotations have the same limitation is further evidence that it does not appear to be a problem in practice.

Lifting this limitation is possible, but it would require more complexity in the annotation scheme and likely also significant additional complexity in the lifetime inference and verification algorithms. We should only commit to this additional complexity if we discover an important use case that requires it.

Comparison with other work in this area

[[clang::lifetimebound]]

Clang implements an attribute [[clang::lifetimebound]] (see “Attributes in Clang” in the Clang documentation) that can express a strict subset of the lifetime annotations that we are proposing. Specifically, [[clang::lifetimebound]] only supports connecting the top-level lifetime of a function argument object to all lifetimes of the return value. It does not support, for example, expressing a relationship between two lifetimes of arguments, or talking about a lifetime that is not at the top level of the type (for example, nested in a template argument):

void push_back_if_not_null(std::vector<int* $a>& xs, int* $a x) {
  if (x != nullptr) {
    xs.push_back(x);
  }
}

The function push_back_if_not_null can be annotated with our proposed lifetime annotations as shown, but cannot be annotated with [[clang::lifetimebound]].

Our lifetime analysis will desugar [[clang::lifetimebound]] into the lifetime representation that it uses.
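
For comparison, here is a sketch of the earlier smaller() function written with the existing attribute. The macro name LIFETIMEBOUND and its empty fallback are assumptions made for this sketch; the attribute itself is placed after the parameter name, as described in the Clang documentation.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper macro: expands to the attribute where it is available
// and to nothing elsewhere.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::lifetimebound)
#    define LIFETIMEBOUND [[clang::lifetimebound]]
#  endif
#endif
#ifndef LIFETIMEBOUND
#  define LIFETIMEBOUND
#endif

// Both parameters are tied to the returned reference, which corresponds to
// using the same lifetime $a on the return type and on both parameters.
const std::string& smaller(const std::string& s1 LIFETIMEBOUND,
                           const std::string& s2 LIFETIMEBOUND) {
  return (s1 < s2) ? s1 : s2;
}

void check_lifetimebound_smaller() {
  std::string foo = "foo";
  std::string bar = "bar";
  assert(&smaller(foo, bar) == &bar);
}
```

With this attribute in place, Clang should be able to warn when the result of a call such as smaller(foo, "bar") is bound to a local reference, since the temporary is destroyed at the end of the full-expression.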

Lifetime safety: preventing common dangling (WG21 proposal P1179, -Wdangling-gsl)

P1179 describes an analysis that has preliminary implementations in MSVC and a fork of Clang. This analysis also inspired some statement-local warnings that are implemented in MSVC and the Clang trunk (-Wdangling-gsl, on by default); see tests here. The statement-local warnings have found many bugs in many real-world codebases.

The analysis described in P1179 is a flow-sensitive points-to analysis. It had the explicit goal to only warn when dangling pointers are actually dereferenced (not when they are created). It aims to prevent many kinds of errors, including:

Use after free
Use of a moved-from object
Dereferencing an invalid iterator
Null dereference

The analysis uses contract-style annotations to describe lifetime preconditions and postconditions. The separate pre- and postconditions help circumvent the limitations described above in the section “Cannot define different constraints for function entry and exit”.

Implementations

The Clang implementation (in the fork) lacks full support for field-sensitivity.
The Clang implementation will not attempt to find use-after-move errors.
The MSVC implementation does not support annotations.
None of the implementations support the SharedOwner concept that was introduced in the R1 version of the paper.
Both implementations do fixed-point iteration (as opposed to doing the acyclic CFG approach suggested by the paper).
According to the benchmarks, the Clang implementation imposes ~5% impact on full compilation including codegen (closer to 10% without codegen).
Currently, none of the implementations are actively developed, as contracts were not voted into the standard.

The readme of the Clang fork has direct links to the tests that can give a picture of the current state.

Comparison to Rust-style lifetimes

Here is a comparison between the properties of the Rust-style lifetime annotations proposed here and the P1179-style lifetime annotations:

Annotation syntax and semantics
- This proposal: Introduces lifetime parameters via type annotations. Users need to learn a new concept, but the annotations are concise, spelled within the relevant type, and syntactically close to the function parameter names.
- P1179: Describes points-to relationships via contracts, often in terms of abstract locations (e.g., the syntax o' refers to the memory owned by an owner o). Developers are familiar with points-to relationships, but the contracts-style annotations can be overly verbose and syntactically far from the parameters. Certain ambiguities require additional annotations, e.g., a non-const reference parameter can be either out or in-out, which has implications on its assumed “moved-from”-ness.
New concepts
- This proposal: Introduces a relatively low number of new concepts.
- P1179: Reuses concepts developers are already familiar with, such as “Owner” or “Pointer”.
Scope
- This proposal: Iterator invalidation, use-after-move, null dereference are not in scope.
- P1179: Can catch problems related to iterator invalidation but might need additional annotations to avoid certain false positives. Certain patterns (e.g. std::vector::reserve) cannot be supported in the model.
Limitations
- This proposal: Certain patterns (like conditional lifetimes) cannot be represented.
- P1179: Certain concepts (like conditional points-to relationships) cannot be represented. Moreover, the dataflow analysis cannot handle arbitrary code patterns and can be confused even when the underlying pattern is supported.
Treatment of dangling pointers
- This proposal: Warns when a dangling pointer is created.
- P1179: Warns when a dangling pointer is dereferenced.
Rules for default lifetimes
- This proposal: Simple, easy-to-understand default lifetimes and lifetime elision rules.
- P1179: More sophisticated, harder-to-understand rules to infer default annotations from signatures that cover the most common cases.
Mutations
- This proposal: Cannot represent certain mutations (e.g., std::swap(ptr1, ptr2) requires ptr1 and ptr2 to have the same lifetimes).
- P1179: Has no problems with mutations in general.
Support for user-defined classes
- This proposal: Supports arbitrary user-defined classes as long as they don’t do anything forbidden (e.g., conditional lifetimes).
- P1179: Certain user-defined constructs are not supported (e.g., a pointer-like type with multiple pointees at the same time).

Examples

Here are some code examples annotated in both styles.

Function that returns a pointer parameter

// This proposal
int* $a f(int* $a i);

// P1179
int* f(int* i)
  [[post: lifetime(Return, i)]];

Struct containing a pointer

// This proposal
struct LIFETIME_PARAM(s) S {
  int* $s m;
};

void f(int* $a i, S $a * out) {
  out->m = i;
}

// P1179
struct S { int* m; };

void f(int* i, S* out)
  [[post: lifetime(out->m, i)]]
{
  out->m = i;
}

Lifetimes of pointers in template arguments

// This proposal
void push_back_if_not_null(std::vector<int* $a>& xs, int* $a x) {
  if (x != nullptr) {
    xs.push_back(x);
  }
}

// Not actually supported by P1179, but the Clang implementation had experiments in
// this direction.
void push_back_if_not_null(std::vector<int*>& xs, int* x)
  [[pre: lifetime(deref(xs), x)]]
  [[post: lifetime(deref(xs), x)]]
{
  if (x != nullptr) {
    xs.push_back(x);
  }
}

The deref notation in the P1179 example above was originally developed for smart pointer types, hence the “dereference” nomenclature. It would require additional annotation (not shown above) of std::vector<int*> member functions that take or return an int*.

Template with multiple pointer arguments

// This proposal
void insert_if_not_null(map<int* $a, int* $b>& m, int* $a key, int* $b value) {
  if (key != nullptr && value != nullptr) {
    m[key] = value;
  }
}

// This is not supported in P1179, as confirmed with Herb Sutter, but he is willing
// to look into making this work (and include something officially for the case above).

Lifetimes and the borrow checker in Rust

Rust code that passes type checking and does not use unsafe is guaranteed to be memory safe. Our proposed lifetime annotations are heavily inspired by Rust, but they don’t catch all memory safety problems in C++ code. Specifically:

Lifetimes don’t help with statically proving spatial memory safety (that all reads/writes are in bounds). This is expected, since lifetime annotations and the borrow checker in Rust don’t help with spatial memory safety either. Instead Rust relies on runtime bounds checking and API design that makes accesses in-bounds by construction (for example, range-based for loops).
The proposed static analysis for C++ is not a borrow checker. It does not enforce Rust’s borrowing rule: “At any given time, you can have either one mutable reference or any number of immutable references.”

Enforcing the borrowing rule is a critical component of Rust’s memory safety guarantee. For example, memory safety bugs caused by iterator invalidation are not caught by lifetime annotations alone.

The following code, for instance, passes lifetime verification, but it contains a possible use-after-free (it might or might not happen at runtime depending on the implementation details of std::vector):

#include <iostream>
#include <vector>

int main() {
  std::vector<int> xs = { 10, 20, 30 };
  auto it = xs.cbegin();
  xs.push_back(40);
  std::cout << *it; // possible use-after-free: dereferencing an iterator that was invalidated
}

The Rust compiler would reject the equivalent Rust code because xs.push_back() needs to borrow xs mutably within the live region of the variable it, which borrows xs immutably.
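
Since lifetime annotations alone would not flag this pattern, such code still has to be fixed by hand. One possible rewrite, shown as a sketch with a hypothetical function name, obtains the iterator only after the mutation:

```cpp
#include <cassert>
#include <vector>

// Hypothetical rewrite of the example above that avoids the invalidated iterator.
int first_after_push() {
  std::vector<int> xs = {10, 20, 30};
  xs.push_back(40);       // may reallocate and invalidate existing iterators
  auto it = xs.cbegin();  // obtained after the mutation, so it is valid
  assert(it != xs.cend());
  return *it;
}
```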

Unfortunately, C++ iterators seem to be incompatible with Rust’s borrowing rule, since the vast majority of algorithms operate on pairs of non-const iterators borrowed from the same container.

To summarize, enforcing the borrowing rule in C++ is unfortunately not so simple because there is a lot of existing code that creates multiple non-const pointers or references to the same object, intentionally violating the borrowing rule. At this point we don’t have a plan of how we could incrementally roll out the borrowing rule to existing C++ code, but it is a very interesting direction for future work.

Appendix A: std::string_view annotated with lifetimes

As an example of real-world code with our proposed lifetime annotations, here is an annotated version of representative parts of `std::string_view`.

namespace std {

template<class _CharT, class _Traits = char_traits<_CharT> >
    class basic_string_view;

typedef basic_string_view<char>     string_view;

template<class _CharT, class _Traits>
class LIFETIME_PARAM(s) basic_string_view {
public:
    // types
    LIFETIME_PARAM(d)  typedef _CharT* $d                pointer;
    LIFETIME_PARAM(d)  typedef const _CharT* $d          const_pointer;
    LIFETIME_PARAM(d)  typedef _CharT& $d                reference;
    LIFETIME_PARAM(d)  typedef const _CharT& $d          const_reference;
    LIFETIME_PARAM(d)  typedef const_pointer $d          const_iterator;
    LIFETIME_PARAM(d)  typedef const_iterator $d         iterator;
    LIFETIME_PARAM(d)  typedef std::reverse_iterator<const_iterator $d>   const_reverse_iterator;

    typedef _Traits                                      traits_type;
    typedef _CharT                                       value_type;
    typedef size_t                                       size_type;
    typedef ptrdiff_t                                    difference_type;
    static _LIBCPP_CONSTEXPR const size_type npos = -1; // size_type(-1);

    basic_string_view();
    basic_string_view(const basic_string_view $s & __s);
    basic_string_view $s & operator=(const basic_string_view $s &);
    basic_string_view(const _CharT* $s __s, size_type __len);
    basic_string_view(const _CharT* $s __s);


    const_iterator $s begin() const;
    const_iterator $s end() const;
    const_pointer $s data() const;


    const_reference $s operator[](size_type __pos) const;
    basic_string_view $s substr(size_type __pos = 0, size_type __n = npos) const;

    void remove_prefix(size_type __n);
    void remove_suffix(size_type __n);

    void swap(basic_string_view $s &__other);

    // copy() and find() don't allow their arguments to escape, therefore their lifetimes
    // are independent of $s.
    // According to lifetime elision rules, neither needs an explicit annotation;
    // find() spells out an independent lifetime $t anyway.
    size_type copy(_CharT* __s, size_type __n, size_type __pos = 0) const;
    size_type find(const _CharT* $t __s, size_type __pos, size_type __n) const;

private:
    const   value_type* $s __data;
    size_type              __size;
};

} // namespace std

Appendix B: Examples of lifetimes inferred by the current experimental implementation

This appendix contains a selection of functions that illustrate the range of C++ language constructs on which our current experimental implementation can automatically infer lifetimes.

The input to the lifetime inference algorithm is the unannotated source code. All lifetime annotations below were automatically inferred from the function implementations.

A simple example to get started

int* $a get_lesser_of(int* $a a, int* $a b) {
  return *a < *b ? a : b;
}

Lifetime inference is flow-sensitive

int* $p target(int* $p p, int* a, int* $p b) {
  // Note: `int* a` is not annotated. The lifetime elision rules imply that it has a
  // unique lifetime different from `$p`.
  for (int i = 0; i < *a; i++) {
    p = a;
    p = b;
  }
  return p;
}

Lifetime inference for class template arguments

template <typename A>
struct S { A array; };

void target(S<int* $a *>* s, int* $a p, int* $a q) {
  s->array[0] = p;
  s->array[1] = q;
}

Lifetime inference for variadic class template arguments

template <int idx, typename... Args> struct S {};
template <int idx, typename T, typename... Args>
struct S<idx, T, Args...> {
  T t;
  S<idx+1, Args...> nested;
};

template <typename... Args>
struct tuple: public S<0, Args...> {};

int* $a target(tuple<int*, int* $a>& s) {
  return s.nested.t;
}

Lifetime inference for nested class templates

template <typename T>
struct R {
  R(T t) : t(t) {}
  T t;
};

bool some_condition();

template <typename T>
struct S {
  S(T a, T b) : r(some_condition() ? R(a) : R(b)) {}
  R<T> r;
};

int* $a target(int* $a a, int* $a b) {
  S<int*> s(a, b);
  return s.r.t;
}

// The algorithm infers the following lifetimes for class template instantiations
// (which cannot be annotated directly in the code):
// R<int* $a>::R(int* $a) $b
// S<int* $a>::S(int* $a, int* $a) $b

Appendix C: How lifetime annotations help static analysis better understand the object graph and potential mutations

Lifetime annotations can help static analysis tools in general better understand how a function call may mutate the object graph.

As an example, say we want to implement a static analysis that detects unchecked unwraps of std::optional. Here is an example program:

struct A {
  std::optional<int> opt_int;
};
struct B { … };

void MutateAB(A* a, B* b);
void MutateB(B* b);
void Use(int x);

void Target() {
  A a;
  B b;
  MutateAB(&a, &b);
  if (a.opt_int.has_value()) {
    MutateB(&b);
    Use(*a.opt_int); // Safe?
  }
}

Many programmers will say that accessing the value of the optional in Use(*a.opt_int) is safe because it is protected by the if (... has_value …) check, and the MutateB(&b) call does not change a.

However, MutateAB(&a, &b) could have stored a pointer to a inside b. Subsequently, MutateB(&b) could have cleared a.opt_int, invalidating the if (... has_value…) check.
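
To make this scenario concrete, here is a minimal sketch of implementations that behave this way. The `helpers` member of `B` and both function bodies are assumptions made for illustration; the original program leaves them unspecified.

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct A {
  std::optional<int> opt_int;
};

// Assumed layout: B keeps non-owning pointers to A objects.
struct B {
  std::vector<A*> helpers;
};

// Assumed implementation: fills the optional and stores a pointer to `a` in `b`.
void MutateAB(A* a, B* b) {
  assert(a != nullptr && b != nullptr);
  a->opt_int = 1;
  b->helpers.push_back(a);
}

// Assumed implementation: clears the optional of every A reachable from `b`.
void MutateB(B* b) {
  assert(b != nullptr);
  for (A* a : b->helpers) a->opt_int.reset();
}

// Returns true if the has_value() check passed but no longer holds after MutateB().
bool CheckInvalidatedByMutateB() {
  A a;
  B b;
  MutateAB(&a, &b);
  if (!a.opt_int.has_value()) return false;
  MutateB(&b);                    // never mentions `a`, yet reaches it through `b`
  return !a.opt_int.has_value();  // the earlier check is now stale
}
```

With these definitions, `CheckInvalidatedByMutateB()` returns true, so dereferencing the optional after the `MutateB(&b)` call would be an unchecked unwrap.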

A sound static analysis must therefore warn that Use(*a.opt_int) is not safe, but many users will flag this warning in their code as a false positive, because in practice modern C++ code rarely has this kind of action-at-a-distance.

Note that even (unsoundly) assuming absence of global variables does not help here, since no global variables are involved. To eliminate this false positive we need to assume that the object graphs reachable from a and b are disjoint. A scalable, local analysis can’t gather enough evidence from the program to make such assumptions on a solid basis.

Lifetime annotations allow the programmer to express the possible mutations to the object graph in a machine-readable way. If B can point to A and MutateAB() sets this pointer, the code can express it with lifetime annotations:

// Indicate that the lifetimes implied by elision rules are indeed correct.
#pragma clang lifetime_elision

struct A {
  std::optional<int> opt_int;
};

struct B [[clang::lifetime_param(a)]] {
  std::vector<A* $a> helpers;
};

// Lifetime annotations express that the object graph behind the pointer `b` may point to `a`:
void MutateAB(A* $a a, B $a * $b b);

// Or, equivalently, using lifetime elision shorthand syntax:
void MutateAB(A* $a a, B $a * b);

void MutateAB(A* a, B* b) {
  b->helpers.push_back(a);
}

Furthermore, the Clang-Tidy check that verifies that the implementation of MutateAB follows its lifetime contract would reject any other lifetime annotations. In other words, lifetime annotations are not just a promise equivalent to comments; they are checked and can be relied upon.

If, conversely, B can’t point to A – the common case that many engineers expect – the original code without explicit annotations already expresses the right semantics:

// Indicate that the lifetimes implied by elision rules are indeed correct.
#pragma clang lifetime_elision

struct A {
  std::optional<int> opt_int;
};

// Absence of lifetime parameters on `B` means that it can't point to other objects
// in the object graph that it does not own.
struct B { … };

// Lifetime annotations express that object graphs behind pointers `a` and `b` are unrelated:
void MutateAB(A* $a a, B* $b b);

// Or, equivalently, using lifetime elision shorthand syntax:
void MutateAB(A* a, B* b);

Appendix D: How lifetime annotations help C++/Rust interoperability

References and pointers in Rust

Rust provides two kinds of indirections, references and pointers, that have different semantics:

References

References are safe. Each reference has a lifetime associated with it. For example, a reference to a 32-bit integer with lifetime 'a is written &'a i32. The lifetimes allow the borrow checker to verify that references are used in a memory-safe way.
References are non-nullable. Nullability can be added explicitly where necessary by using the Option<T> type, for example Option<&'a i32>.
References are ergonomic. Rust’s syntax and libraries are optimized for using references most of the time.
References are idiomatic. Rust programmers prefer to use references in their code as much as possible.

Pointers

Pointers are unsafe. They don’t carry lifetime information. For example, a non-mutable pointer to a 32-bit integer is simply written *const i32. The borrow checker cannot verify that pointers are used in a memory-safe way.
Pointers are nullable. To express a non-null constraint one must use a wrapper type such as std::ptr::NonNull<T>.
Pointers lead to non-ergonomic code. For example, verbose casts are required to convert between references and pointers. To convert a pointer x to a reference one must write unsafe {&*x}.
Pointers are non-idiomatic. Rust programmers avoid using pointers.

C++/Rust interoperability without lifetime annotations in C++

Let’s say we want to call the following C++ function from Rust:

// C++:
const int& smaller(const int& x, const int& y);

This function signature does not explain the lifetime contract. A tool that generates C++/Rust bindings based on C++ headers (for example, bindgen) has no choice but to declare smaller() using unsafe pointers that don’t have a Rust lifetime:

// Rust bindings (automatically generated):
extern "C" {
  pub fn smaller(x: *const i32, y: *const i32) -> *const i32;
}

Rust code can now call smaller(), but callers must use unsafe pointers:

// Rust caller of C++ `smaller()` function that does not have lifetime annotations:
fn user() {
  let x = 10;
  let y = 5;
  let m = unsafe { smaller(&x, &y) };
  println!("smaller({x}, {y}) is {}", unsafe{*m});
}

C++/Rust interoperability with lifetime annotations in C++

Now let’s annotate smaller() with lifetimes on the C++ side:

// C++:
const int& $a smaller(const int& $a x, const int& $a y);

Equipped with this machine-readable lifetime information, a tool that generates C++/Rust bindings can define a safe Rust wrapper. This wrapper exposes safe Rust references and describes the lifetime contract of smaller() to the Rust borrow checker:

// Rust bindings (automatically generated):
pub fn smaller<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {
  // Glue code to call C++ function through foreign-function interface omitted.
}

Now smaller() can be ergonomically called like any other safe Rust function:

// Rust caller of C++ `smaller()` function that is annotated with lifetimes:
fn user() {
  let x = 10;
  let y = 5;
  let m = smaller(&x, &y);
  println!("smaller({x}, {y}) is {m}");
}

Appendix E: "Contributing Extensions to Clang" Q&A

Here we answer the usual set of questions about contributing extensions to Clang (https://clang.llvm.org/get_involved.html)

Evidence of a significant user community

Large parts of the C++ community are interested in finding memory safety bugs in C++ code. This is evidenced by the popularity of dynamic analysis tools such as AddressSanitizer and UndefinedBehaviorSanitizer, used in combination with manually written tests and fuzzing.
Finding memory safety bugs statically is also very interesting to users, since it allows bugs to be found before the tests are written and run. This interest is evidenced by Clang’s existing efforts in this area: -Wreturn-stack-address, -Wdangling, and -Wdangling-gsl. The latter two warnings are based on a partial implementation of the WG21 proposal P1179. All of these warnings have been on by default for a few years, have received little to no pushback from users, and have proven themselves valuable by finding quite a few bugs (based on our experience running them on our internal codebases).
Interest in source code annotations that help statically finding memory safety bugs is evidenced by P1179 itself, which has been partially implemented in Clang for a few years.
Interoperability between C++ and other languages is desired by some C++ users. For example, https://cxx.rs/ is a relatively popular crate for C++/Rust interop (750K+ downloads on crates.io (https://crates.io/crates/cxx) ), and C++/Swift interop has been worked on for a few years already. However, the fact that pointers and references in C++ APIs have unclear ownership and lifetime semantics presents a huge obstacle to automatic, ergonomic, safe bridging of C++ to other languages. Due to this issue, for example, cxx.rs does not support borrowed data as much as one could desire to bridge many idiomatic C++ APIs to Rust.

A specific need to reside within the Clang tree

Clang-Tidy is one of the industry standard static analysis tools, integrated into many workflows and IDEs (both free and commercial). Having our proposed analysis integrated into Clang-Tidy will allow interested engineers to run it much more easily than with an out-of-tree tool.
The only change to core Clang we are proposing is a general-purpose type annotation attribute that is not specific to the lifetime analysis. The lifetime analysis itself is kept separate in Clang-Tidy.
We believe that the biggest impact from the proposed static analysis could be realized if it was included into the core compiler as a warning. We are not ready to propose this yet because we are still experimenting with the semantics of the annotations and need to collect feedback from early adopters.

Specification

At this point, we are still experimenting with the semantics of the annotations. This document includes a high-level overview. We will be committing more detailed design docs and specifications together with the implementation, but they will be in flux for some time.

Representation within the appropriate governing organization

We believe it is too early to ask this question. In principle, the existence of P1179 shows that WG21 has some interest in this kind of annotations.

A long-term support plan

If the experimentation confirms that this type of annotations and static analysis is useful in practice, maintaining them is very similar to maintaining any other Clang-Tidy check: organizations and individuals that enable it for their codebases will do the maintenance work.

A high-quality implementation with a test suite

We will be contributing a high-quality implementation with extensive tests. This is in our best interest since the burden is on us to show that this style of lifetime annotations is worth the added complexity for engineers reading and writing C++.

jankorous · April 2, 2022, 12:49am

This looks very very cool!

One thing that I’d like to understand better are the limitations. For example it’s explicitly mentioned that iterator invalidation is not covered while an example of annotated string_view is given. I feel that string_view and iterators are in some ways similar.

The below is this example from [RFC] Lifetime annotations for C++ modified to use string and string_view.

Would this use-after-free be detected?

#include <iostream>
#include <string>
#include <string_view>

int main() {
  std::string s = "abcde";
  std::string_view v = s;
  s += 'f';
  std::cout << v << std::endl; // possible use-after-free?
  return 0;
}

alexr · April 2, 2022, 8:27pm

Two meta points:

• I’d prefer that new attributes be invented over using the plain annotate attribute in any features that land in the main line. I’ve used the plain annotate attribute extensively for plugins to generate language-specific bindings similar to libffi and I think it’s best to leave that one completely extensible.

• Is any of this proposed for the actual Clang Static Analyzer, or just clang-tidy? I’m far more interested in extending Clang according to its design principles instead of adding to external tools.

Aiden2207 · April 2, 2022, 9:02pm

Rust dev here, I think this proposal is off to a pretty good start, and will probably have significant safety benefits. However, there are still some things I think need to be addressed at some point.

First off, there is the 'static lifetime. In Rust, the 'static lifetime is a special lifetime that outlives all other lifetimes. Roughly speaking, any 'static object can be assumed to be valid for the entire length of the program. For example, values, references to global variables, and data leaked on the heap can all typically be considered to have a 'static lifetime.

Another thing to consider is what happens when the returned lifetime of a pointer is not bounded to the argument lifetimes. In the most simple case, something like

int* $a leak(){
    //create an int and leak it on the heap
}

In rust, this is handled by forcing the returned lifetime to be either 'static or unbounded. Unbounded lifetimes are chosen by the caller of the function and are equivalent to 'static in terms of inference most of the time.

Another consideration is what to do with function pointers. Getting these right is surprisingly difficult. Naively one might think the following code is valid:

void use_callback(void (*callback)(int* $a)) {
    int foo = 42;
    callback(&foo);
}

However, it isn’t: the callback requires a lifetime at least as long as the call site of use_callback, while the pointer passed points to a local variable and will be invalidated upon return. Rust solves this problem with Higher-Ranked Trait Bounds (HRTBs), which confusingly have far more to do with lifetimes than traits. Rust desugars the function pointer fn(&i32) into for<'a> fn(&'a i32) instead of fn(&'a i32). In other words, using HRTBs means the function is valid for all lifetimes 'a rather than some specific lifetime 'a. In practice, this means the lifetime of the function pointer and its arguments are dissociated, so

fn use_callback(callback: for<'a> fn(&'a i32)){
    let foo = 42;
    callback(&foo);
}

is valid, but

fn use_callback<'a> (callback: fn(&'a i32)){
    let foo = 42;
    callback(&foo); //error[E0597]: `foo` does not live long enough
}

is not.
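
Translated back to plain C++, the property that the higher-ranked form expresses is that the callback may use the pointer only for the duration of the call. Below is a minimal sketch of a callback that respects this. The lifetime annotations are left out because their syntax for function pointers is not specified in this thread, the callback returns a value (unlike the example above) so that the effect is observable, and the name `read_only` is hypothetical.

```cpp
#include <cassert>

// Reads through the pointer during the call and does not retain it,
// so it works for a pointee of any lifetime.
int read_only(const int* p) {
  assert(p != nullptr);
  return *p + 1;
}

int use_callback(int (*callback)(const int*)) {
  int foo = 42;
  return callback(&foo);  // &foo is valid only until use_callback returns
}
```

A callback that stored `p` in a global would break this contract, which is exactly what a "for all lifetimes" annotation on the parameter would need to rule out.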

Those were just the first few things that popped into my head, there are plenty of other edge cases to consider as well:

I want to use lifetime checks in most of my code, but some of the stuff I’m doing can’t be modeled with lifetimes. How can I opt-out of checking just those parts?*
I cast my class with a lifetime to a base class without one. Now what?**
For my code to be modeled correctly, I need to lengthen a lifetime. How can I do that?***
When working with a template parameter T can I assert that T must have a certain lifetime?****
My object Foo shouldn’t be used after the object Bar is destroyed, but Foo doesn’t have a reference to Bar. Is there a way to add a phantom lifetime to Foo?*****

There are most definitely other small issues to worry about beyond those, but that’s what I can think of at the moment. My biggest concern with all of this is that the lifetime model may struggle to integrate properly with the language, especially since there is no borrow checker, and might not be as useful as it seems because of it.

Anyways, while I don’t work with C++ very often, I think this could be quite useful, and I am interested to see where this goes.

*Rust handles this with a divide between references and safe code; and raw pointers and unsafe code. This will obviously not work for C++.

**Rust has a similar problem with trait objects and handles this by associating a lifetime with trait objects that are 'static by default.

***In Rust, the safest way to do this is mem::transmute, although casting a reference to a pointer and back can also do this.

****This is an important part of rust’s lifetime system because references aren’t allowed to outlive the data they point to.

*****Rust uses PhantomData for this purpose. However, it is mainly used to add lifetime information to a pointer that otherwise wouldn’t have any. Because C++ doesn’t have the same distinction between a pointer and reference that rust does, this would be a much more niche use case, because most of the time just adding a pointer to the parent object would be fine.

tschuett · April 2, 2022, 9:14pm

When you are saying clang-tidy and api notes, I am quite happy. You can experiment upstream without interfering with other people. At the same time, I like the idea and it seems useful. There are too many memory bugs in C++. Improving the interface between C++ and Swift/Rust sounds even better.

LifeIsStrange · April 3, 2022, 12:21am

Prior Art:
https://news.ycombinator.com/item?id=22137650

LLVM Prior Art:
https://reviews.llvm.org/D15032
https://reviews.llvm.org/D63954
https://github.com/mgehre/llvm-project/issues/98
I wonder whether this RFC reuses the code from those PRs? It would be unfortunate to ignore those previous works.
If this RFC is a followup of those previous works, then my bad.
Either way, inspiration, code reuse and ideally interop with the CPP guideline lifetime checker would be a desirable goal.

junon · April 3, 2022, 1:00am

Could this proposal perhaps be used as a catalyst for improving the overall Clang plugin story? One of the major blockers when I endeavored down that path was the sheer complexity of adding new attribute types for plugins to use when analyzing/transforming ASTs - it tied very deeply into tablegen, which means hardcoding new attribute types as opposed to registering or otherwise processing them via plugins.

It also required a lot of playing with Sema, and there was a loose conversation on the mailing lists that didn’t go anywhere but ultimately discussed the right direction for plugin designs to go to support such usecases. It appears there is some overlap here.

If such a thing could be the basis for these additions into mainline Clang at some point (instead of just keeping it in clang-tidy) that would be a massive improvement to the overall ecosystem, in my opinion. It would certainly open up Clang to more interesting usecases and tooling that wasn’t possible, or at least wasn’t as ergonomic, as before.

Very excited to see what comes of this.

duneroadrunner · April 3, 2022, 10:36am

Several weeks ago I made a similar post here, interestingly, also motivated by essentially the same kind of lifetime annotations:

(Btw, my post links to a tool that already implements enforcement of lifetime constraint attributes on function interfaces for those interested.)

The problem is that in any case there’s no way to avoid using macros rather than attributes directly because even if you can get clang to support the attributes, other compilers that don’t will throw an “unrecognized attribute” warning. The compiler vendors seem to consider attributes to be necessarily compiler-specific. It seems we would need the standards body to explicitly mandate a subset of the attribute namespace to be available for third party attributes, for example, as proposed by the paper linked in this reddit post:

https://old.reddit.com/r/cpp/comments/ttw0dl/ruststyle_lifetimes_proposed_in_clang/i35s5g6/

martinboehme · April 4, 2022, 5:31am

(code example omitted)

No. Like the iterator example that this is a modification of, this could be caught by enforcing the “borrowing rule” discussed there, but as the discussion also notes, a lot of existing C++ intentionally violates this rule. We’re definitely interested in exploring how we could catch these kinds of errors too, but that would be future work.
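
Until such a check exists, the pattern has to be avoided by hand. A minimal sketch, assuming the program can be reordered so that the view is taken only after the last mutation (the function name `append_then_view` is hypothetical):

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: appends `c` to `s`, then views the result.
std::string append_then_view(std::string s, char c) {
  s += c;                         // mutate first; this may reallocate the buffer
  std::string_view v = s;         // take the view only after the last mutation
  assert(v.data() == s.data());   // the view refers to the current buffer
  return std::string(v);          // copy out while `s` is still alive
}
```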

martinboehme · April 4, 2022, 5:38am

What do you mean by “main line” – Clang itself, or everything in the LLVM/Clang repository (e.g. also Clang-Tidy)?

I’m not sure what you mean here – can you elaborate? FWIW, we’re not proposing to limit the generality or extensibility of the annotate attribute in any way. We are proposing to introduce a new general-purpose attribute annotate_type that is analogous to annotate but for use on types. (The linked RFC discusses why we’re proposing a new attribute rather than extending annotate to types.)

Currently, we’re proposing adding the checks only to Clang-Tidy because our approach is still experimental and doing the work in Clang-Tidy has the least impact on other parts of the codebase. However, if our evaluations of the approach on large real-world codebases show that it works well, we would definitely be interested in integrating the check into Clang Static Analyzer or Clang itself.

martinboehme · April 4, 2022, 7:25am

Thanks for the detailed comments and pointers!

Some of these (e.g. static lifetimes, forcing returning lifetimes to be static or unbounded, HRTBs) are things that we’ve already considered. We haven’t described them in this RFC because a complete spec would take many pages (our internal spec for lifetime annotations currently runs to 25 pages), and we felt it wasn’t productive to go into this level of detail in this high-level RFC. We do however plan to provide more complete design docs as patches with the implementation, and of course we’re happy to answer specific questions here.

We’re planning to include the concept of an “unsafe” lifetime for things that cannot be modeled with lifetimes. Pointers or references with an unsafe lifetime would be analogous to unsafe pointers in Rust.

You mean the derived class has a lifetime parameter, and you’re casting it to a base class that does not?

As nothing in the base class can use the lifetime parameter, you would simply not have access to it. If you want to cast back to the derived class, you would have to do so using an unsafe lifetime_cast operation.

Are there any other issues you’re thinking of that would arise? Do you have an example?

We intend to provide an unsafe lifetime_cast operation that can be used to extend lifetimes or convert unsafe lifetimes to safe ones (when the programmer can prove this is safe). Is this what you meant?

I’m not sure how this would work, as the template argument for T might contain arbitrarily many lifetimes, or none at all. Do you have a motivating example?

You note below that:

We plan to enforce the same constraint, and a similar constraint for template arguments (i.e. an object may not outlive any lifetimes in its template arguments).

As in Rust, you could add a corresponding lifetime parameter to Foo. In Rust, you would use this lifetime parameter merely in a PhantomData field; in C++, you wouldn’t use the lifetime parameter anywhere in the definition of Foo.

In other words, unlike Rust, we would probably allow unused lifetime parameters. If we conclude this is undesirable, we might want to have a construct that is analogous to Rust’s PhantomData, but we’d have to give this some more thought.

The lack of borrow checking is certainly a limitation (though we also want to explore what can be done in this area at a future point). Anecdotally, since starting this work, we have come across several non-obvious lifetime bugs in our own code that would have been caught by our proposed checks, so we believe they bring significant value. However, this is something that would need to be validated by use of the tools themselves on large code bases.

martinboehme · April 4, 2022, 8:59am

This is “Lifetime safety: preventing common dangling (WG21 proposal P1179)” / -Wdangling-gsl, which we describe in detail and compare with our proposed approach in the section “Comparison with other work in this area”.

This patch was abandoned (I presume in favor of the -Wdangling-gsl check that is part of Clang?).

This is the first of a series of patches that implement -Wdangling-gsl.

As we are planning to implement our inference and verification tooling as Clang-Tidy checks, it would be hard to reuse parts of the -Wdangling-gsl implementation, which is part of Clang itself. However, if our approach proves successful, we would be interested in contributing it to the Clang core, and at that point we should of course make sure that we don’t duplicate any logic that already exists as part of -Wdangling-gsl.

Regarding interop: Our plan is that our tooling should be able to interpret the attributes used by -Wdangling-gsl and internally translate them to corresponding lifetimes (to the extent that this is possible), so that codebases that use the two annotation schemes can be used together.

martinboehme · April 4, 2022, 10:08am

As you’re probably aware, there is already a mechanism for adding “attribute plugins” to Clang:

https://clang.llvm.org/docs/ClangPlugins.html#defining-attributes

However, this only allows you to define declaration attributes, not type attributes, which is presumably what you are after.

A few months ago, I explored the possibility of extending this plugin mechanism to allow the definition of type attributes and submitted the following (ultimately abandoned) patch:

https://reviews.llvm.org/D114235

Unfortunately, the conclusion was that it’s much harder to make type attributes pluggable than declaration attributes. Type attributes interact with the type system, different attributes may want to do this in different ways, and it’s hard to make the logic for this pluggable.

Instead, we have decided to propose a general-purpose type annotation attribute, see this companion RFC. As @duneroadrunner points out above, attributes are typically hidden behind macros anyway for portability reasons, so a general-purpose type annotation would end up looking the same in the source code as a special-purpose attribute.
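
As a rough illustration of what such a macro could look like, here is a minimal sketch. The macro name `LIFETIME` and the encoding `annotate_type("lifetime", "a")` are assumptions made for this example, not a finalized spelling. The macro expands to nothing on compilers that do not report support for `clang::annotate_type`, so the same source builds everywhere.

```cpp
#include <cassert>

// Hypothetical portability macro: emits the type attribute only where available.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::annotate_type)
#    define LIFETIME(name) [[clang::annotate_type("lifetime", #name)]]
#  endif
#endif
#ifndef LIFETIME
#  define LIFETIME(name)  // unsupported compiler: the annotation disappears
#endif

// Same contract as get_lesser_of in Appendix B, spelled with the macro.
int* LIFETIME(a) get_lesser_of(int* LIFETIME(a) a, int* LIFETIME(a) b) {
  assert(a != nullptr && b != nullptr);
  return *a < *b ? a : b;
}
```

Whether the attribute behind the macro is general-purpose or special-purpose, the call sites and declarations look identical.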

vvassilev · April 4, 2022, 7:55pm

Hi! Thanks for working on this, it looks very exciting.

There is a tool that creates automatic Python/C++ bindings called cppyy. More details on the general use can be found here. In essence, we use clang’s incremental compilation facilities (via cling, and recently clang-repl) to make Python interoperate with C++ on the fly. If we knew more about memory management we would be able to avoid a class of problems where we don’t know who is responsible for object destruction: Python or C++.

I am happy to elaborate more if you find this useful or worth adding as an interop use case.

Aiden2207 · April 4, 2022, 11:01pm

That’s what I figured, I just wanted to make sure you have a plan for all of the weird edge cases that cropped up in rust’s lifetime system.

Yep, that is what I mean. As for issues that arise, I’m thinking use-after-frees:

#include <memory>
class Foo{
    public:
        virtual int make_int() = 0;
};

class LIFETIME_PARAM(a)  Bar: public Foo {
    private:
        int* $a ptr;
    public:
        Bar(int* $a baz):ptr(baz){}
        int make_int(){
            return *ptr;
        }
};
std::unique_ptr<Foo> make_foo(){
    int local = 42; //local variable
    auto bar = Bar(&local);
    auto owned = std::make_unique<Bar>(bar);
    return owned; //uh-oh, returning a reference to a local
}

I’ve just cast away lifetime information and then took advantage of that fact to return a pointer to an object that no longer exists, even though my code assumes that it does. In rust, the equivalent code:

trait Foo {
   fn make_int(&self) -> i32;
}
struct Bar<'a> {
   ptr: &'a i32,
}
impl Foo for Bar<'_> {
   fn make_int(&self) -> i32 {
       *self.ptr
   }
}
fn make_foo() -> Box<dyn Foo> {
   let local = 42;
   Box::new(Bar { ptr: &local }) // error[E0515]: cannot return value referencing local variable `local`
}

fails to compile.

That is what I meant.

While the argument T might have many lifetimes associated with it, there is only one that really matters: the shortest one. You wouldn’t need to individually handle every lifetime involved, just grab the shortest lifetime in the object and call that the lifetime of the object. The biggest reason would be to ensure something has a 'static lifetime. For example, this code, which creates a single-object, per-type cache

#include <optional>

template <typename T>
std::optional<T> swap_cache(std::optional<T> t){
    static std::optional<T> cache = std::nullopt;
    cache.swap(t);
    return t;
}

is only sound if T satisfies the 'static bound (and if it’s only called from a single-threaded context, but that’s beside the point).
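
To make the hazard concrete, here is a small sketch that instantiates the cache with T = int* and shows the address of a local variable ending up in static storage. The demonstration function `pointer_escapes_into_cache` is hypothetical, and it deliberately takes the pointer back out before the local goes out of scope so that the demonstration itself stays well-defined.

```cpp
#include <cassert>
#include <optional>

template <typename T>
std::optional<T> swap_cache(std::optional<T> t) {
  static std::optional<T> cache = std::nullopt;
  cache.swap(t);
  return t;
}

// Returns true if the static cache held a pointer to a local variable.
bool pointer_escapes_into_cache() {
  int local = 42;
  swap_cache<int*>(&local);  // the static cache now holds &local
  // Had we returned here, the cache would keep a dangling pointer.
  std::optional<int*> out = swap_cache<int*>(std::nullopt);  // take it back out
  assert(out.has_value());
  return *out == &local;
}
```

A `'static`-style bound on T would reject the `int*` instantiation for a pointer to a local, which is the constraint being asked for here.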

Rust satisfies this constraint by automatically assuming that &T means &'a T where T:'a in most circumstances, and it’s an error if that condition is violated. This means that sometimes you have to bound a generic variable to satisfy that lifetime. The situation that immediately comes to mind is Generic Associated Types (GATs), where you will often have something like

trait Foo{
    type Bar<'a> where Self: 'a;
}

although those are still unstable.

This is perfectly fine if you can sort out the variance appropriately. Rust disallows unused generic parameters because the compiler had to assume the struct was bivariant over those parameters, which was pretty much always the wrong thing.

I suppose, even if it only catches obvious bugs it will probably help avoid more than a few build-run-segfault-curse cycles. I agree though, the only way to truly know how useful it is is to test it on some real code and see what happens.

martinboehme · April 5, 2022, 7:47am

Thanks – these are interesting pointers!

One of the stated goals of our project is to support better interop between C++ and other languages. It would be great if we could add Python to this list. I’m not sure to what extent lifetimes will be able to help with your specific problem, though. It sounds as if you need information about ownership, specifically who is responsible for deleting an object, and lifetimes don’t really help with that. Maybe there are other purposes for which you can use the lifetime information, though?

martinboehme · April 5, 2022, 12:00pm

@Aiden2207 First of all, thank you for the many insightful issues you raise!

(quoting only the relevant part of the code)

Aiden2207:

std::unique_ptr<Foo> make_foo(){
    int local = 42; //local variable
    auto bar = Bar(&local);
    auto owned = std::make_unique<Bar>(bar);
    return owned; //uh-oh, returning a reference to a local
}

Thank you. This is an interesting example, and I now understand the issue you are getting at.

Initially, I thought that the verification tool should reject this code. Because an object may not outlive any of its lifetime parameters, we infer on the line auto bar = Bar(&local); that the lifetime parameter of Bar is the local lifetime of local, and I thought this would lead us to conclude that the line return owned returns a reference to a local.

This isn’t true, however. The issue is that converting owned to a unique_ptr<Foo> essentially “erases” the lifetime parameter on Bar. The lifetime of the unique_ptr<Foo> can now be extended at will, beyond that of the lifetime parameter on Bar.

I see two possible solutions to this:

1. Add some equivalent of Rust’s Box<dyn Foo + 'a> to our C++ annotation scheme – something like std::unique_ptr<Foo + $a> in spirit, though we would need to find something that is actually allowed by C++'s grammar. This would imply that all lifetime parameters of the dynamic type of the pointed-to object outlive $a. It would also imply that the lifetime of the unique_ptr may not be extended beyond $a.
2. Require all lifetime parameters to be declared on the base class; do not allow lifetime parameters to be added to derived classes.

For the time being, I would favor the second approach because of its simplicity. The main limitation is that it forces Foo to carry a lifetime parameter around even if not all of its subclasses use that lifetime parameter. If it turns out that this is too burdensome, we would need to implement the first approach.
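As a rough sketch of what the second approach could look like for the Foo/Bar example: the lifetime parameter is declared on Foo, and Bar only uses it. The annotations are written as comments so that the snippet compiles with any C++14 compiler. The spelling `std::unique_ptr<Foo $a>` and the helper `read_through_base` are assumptions made for illustration, not settled syntax or part of the proposal.

```cpp
#include <cassert>
#include <memory>

// Lifetime annotations appear only inside comments; they are illustrative.

class /* LIFETIME_PARAM(a) */ Foo {
public:
    virtual ~Foo() = default;
    virtual int make_int() = 0;
};

// Bar declares no lifetime parameter of its own; it reuses Foo's $a.
class Bar : public Foo {
    int* /* $a */ ptr;
public:
    explicit Bar(int* /* $a */ baz) : ptr(baz) {}
    int make_int() override { return *ptr; }
};

// The return type still mentions $a after the conversion to the base class,
// so a checker could require that the result does not outlive *source.
std::unique_ptr<Foo /* $a */> make_foo(int* /* $a */ source) {
    return std::make_unique<Bar>(source);
}

// Hypothetical caller: the unique_ptr is destroyed before `value` is.
int read_through_base() {
    int value = 42;
    std::unique_ptr<Foo> foo = make_foo(&value);
    return foo->make_int();
}
```

With the parameter on the base class, the original make_foo (which points the Bar at a local and returns it) would have to name a lifetime in its return type, which is where a checker would have a chance to reject it.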

This reflects a general principle that we’ve been trying to follow: Only add those things to the annotation scheme that we’re convinced we’re going to need. It could be argued that everything in Rust that’s related to lifetimes is there because it’s needed, and will therefore also be needed in C++, but we’re not sure this argument holds – Rust and C++ are different languages. We’re also trying hard to minimize complexity to make the annotation scheme as easy to learn as possible. Because of this, we’re being conservative with how many of Rust’s lifetime concepts we import. Real-world evaluations will be the final determination of which features we actually need.

Aiden2207:

While the argument T might have many lifetimes associated with it, there is only one that really matters: the shortest one. You wouldn’t need to individually handle every lifetime involved, just grab the shortest lifetime in the object and call that the lifetime of the object. The biggest reason would be to ensure something has a 'static lifetime. For example, this code, which creates a single object per-type cache
#include <optional>

template <typename T>
std::optional<T> swap_cache(std::optional<T> t){
    static std::optional<T> cache = std::nullopt;
    cache.swap(t);
    return t;
}
is only sound if T satisfies the 'static bound (and if it’s only called from a single-threaded context, but that’s beside the point).

Thanks, I now understand.

This example highlights an interesting difference between Rust and our approach to lifetimes in C++. Because C++ templates are not generics, we perform lifetime inference and checking on each concrete template instantiation. When doing so, all of the lifetimes in the template argument (in this case T) are “visible” and participate in the lifetime inference and checking, allowing us to flag any violations of lifetime correctness.

We have to do this because C++ templates are syntactic; the semantics can vary significantly depending on the template arguments.
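To make this concrete, here is a small sketch with two instantiations of swap_cache. The function template is the one quoted above; the two `demo_*` helpers and the comments about what a checker would see are assumptions added for illustration.

```cpp
#include <cassert>
#include <optional>

template <typename T>
std::optional<T> swap_cache(std::optional<T> t) {
    static std::optional<T> cache = std::nullopt;  // one cache per instantiation
    cache.swap(t);
    return t;  // the previous cache contents
}

// T = int: the template argument carries no lifetimes, so nothing can dangle.
int demo_value_instantiation() {
    swap_cache<int>(1);                                // cache now holds 1
    std::optional<int> previous = swap_cache<int>(2);  // cache now holds 2
    return previous.value();                           // the 1 stored before
}

// T = const int*: the static cache now stores a pointer, so the pointee has
// to live as long as the cache does. A static object satisfies that; the
// address of a local variable would not, and per-instantiation checking is
// where that difference would become visible.
bool demo_pointer_instantiation() {
    static const int long_lived = 7;
    swap_cache<const int*>(&long_lived);
    std::optional<const int*> previous = swap_cache<const int*>(std::nullopt);
    return previous.has_value() && *previous == &long_lived;
}
```

The template body is identical in both cases; only the second instantiation has a lifetime in T for the analysis to reason about.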

Aiden2207:

The situation that immediately comes to mind is Generic Associated Types (GATs), where you will often have something like
trait Foo{
    type Bar<'a> where Self: 'a;
}
although those are still unstable.

I have to admit this is an area of Rust that I’m not very familiar with. My tendency though would be not to add a corresponding construct to our lifetime annotation scheme until we see a clear need for it. (Do you have an example where this kind of thing would be needed in C++?)

For the time being, we’re taking the simple approach of making all lifetime-parameterized classes covariant with respect to their lifetime parameters. In other words, unlike Rust, we don’t infer variance from the way those lifetimes are actually used within the class (though we should emit an error if we see that our assumption of variance is in fact wrong). Again, we’re trying to limit complexity for the time being in hopes that we won’t need the additional complexity. If you have some important Rust use cases you can share where something other than covariance is needed, we would be very interested!

ojeda · April 5, 2022, 1:14pm

It is still needed to indicate that the function is safe to call, somehow (e.g. with a [[safe]] attribute in the C++ side). Lifetime annotations might be useful to widen the set of functions that may be annotated as safe, but they should not be automatically marked as such, in order to remain sound from Rust’s point of view.

AaronBallman · April 5, 2022, 6:06pm

Thank you for this RFC, I think it’s very important to try to bring light to the dark corners where bugs lurk, and lifetime issues are absolutely one of the darker corners.

I have questions about how you intend to experiment with the semantics:

Will the experimentation take place in tree or out of tree? (I understand the plan is to eventually land the annotations in Clang, but this is about the plan leading up to when we have the design finalized.)

If you plan to do the experimentation in tree, how do you expect to protect users against design changes? Especially when we get the semantics design right and we start thinking about how best to surface the feature (new keywords, an attribute specific to the purpose, god forbid: pragmas, etc).

If you plan to do the experimentation out of tree, will there be a feedback loop with the community as you refine the design, or are you planning to wait until the design looks pretty close to final and then start soliciting community feedback?

Aiden2207 · April 6, 2022, 6:26am

Glad I could help.

martinboehme:

This isn’t true, however. The issue is that converting owned to a unique_ptr<Foo> essentially “erases” the lifetime parameter on Bar . The lifetime of the unique_ptr<Foo> can now be extended at will, beyond that of the lifetime parameter on Bar .

I see two possible solutions to this:

Add some equivalent of Rust’s Box<dyn Foo + 'a> to our C++ annotation scheme – something like std::unique_ptr<Foo + $a> in spirit, though we would need to find something that is actually allowed by C++'s grammar. This would imply that all lifetime parameters of the dynamic type of the pointed-to object outlive $a . It would also imply that the lifetime of the unique_ptr may not be extended beyond $a .

Require all lifetime parameters to be declared on the base class; do not allow lifetime parameters to be added to derived classes.

For the time being, I would favor the second approach because of its simplicity. The main limitation is that it forces Foo to carry a lifetime parameter around even if not all of its subclasses use that lifetime parameter. If it turns out that this is too burdensome, we would need to implement the first approach.

This reflects a general principle that we’ve been trying to follow: Only add those things to the annotation scheme that we’re convinced we’re going to need. It could be argued that everything in Rust that’s related to lifetimes is there because it’s needed, and will therefore also be needed in C++, but we’re not sure this argument holds – Rust and C++ are different languages. We’re also trying hard to minimize complexity to make the annotation scheme as easy to learn as possible. Because of this, we’re being conservative with how many of Rust’s lifetime concepts we import. Real-world evaluations will be the final determination of which features we actually need.

I think personally the first approach will be the easiest to work with. The second one seems rather inflexible, as it forces people writing abstract base classes to plan for other people’s implementations of them. Not having a lifetime and needing one for the derived class is bad, and working with an unnecessary lifetime parameter is definitely awkward.

I think being conservative with how much of Rust’s lifetime system you bring over is a good goal though- a fair bit of what’s there is to deal with issues that C++ simply doesn’t have, on account of the differences between templates and generics, as well as rust’s much stricter rules about what references are permitted to point to and the borrow checker.

Right, templates are checked at instantiation rather than definition, I forgot about that (I am quite a bit more experienced with rust than C++, if you couldn’t already tell).

Sorry, I meant to show an example of where an explicit bound on a type parameter might be useful, but I chose what is quite possibly the worst one. Long story short, GATs are primarily meant to solve a lifetime issue C++ probably doesn’t have to deal with. Come to think of it, because lifetime checking would come after template instantiation, explicit lifetime bounds on the type parameters probably aren’t necessary in C++, but it might be worth integrating such lifetime constraints with concepts.

The Rustonomicon has a section devoted to variance. Long story short, references are covariant over their lifetime, any Object<T> that permits mutation without ownership is invariant over T (that’s things like &mut T and &Cell<T>), function pointers are weird, and anything else is covariant.
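A C++ sketch of why mutation through a pointer calls for invariance rather than covariance. The lifetime names in the comments ($a, $static, $short) and the functions `store` and `demo_invariance` are illustrative assumptions. The code restores the pointer before the short-lived object goes away, so the sketch itself never holds a dangling pointer.

```cpp
#include <cassert>

// Writes `value` into `*slot`. In lifetime terms both mention the same $a:
//   void store(const int* $a * slot, const int* $a value);
void store(const int** slot, const int* value) { *slot = value; }

int demo_invariance() {
    static const int long_lived = 1;
    const int* holder = &long_lived;  // think of this as: const int* $static
    int result = 0;
    {
        const int short_lived = 2;
        // If a pointer to `const int* $static` could be treated covariantly
        // as a pointer to `const int* $short`, this call would check with
        // $a = $short, and `holder` would end up pointing at a short-lived
        // object while its declared type still says $static.
        store(&holder, &short_lived);
        result = *holder;      // fine here: short_lived is still alive
        holder = &long_lived;  // restore before short_lived is destroyed
    }
    return result + *holder;   // 2 + 1
}
```

Without the restoring assignment, `holder` would dangle after the inner scope ends. That is the kind of case in which treating a mutable indirection as covariant would be unsound.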
