[RFC] Introduce [[clang::lifetime_capture_by(X)]]

TL;DR

We propose a new Clang annotation, [[clang::lifetime_capture_by(X)]], to handle cases where a function parameter’s reference is captured by entities like this or other function parameters. This annotation helps diagnose common issues like use-after-free and dangling references that are not detected by [[clang::lifetimebound]].

The latest version of this proposal is being maintained at:
Introduce [[clang::lifetime_capture_by(X)]] - Google Docs.

Background

The [[clang::lifetimebound]] attribute allows developers to indicate that certain function parameters or implicit object parameters (such as this) may have their reference captured by the return value. By specifying such a contract, Clang can detect several use-after-free (UaF) and use-after-return (UaR) errors.

However, [[clang::lifetimebound]] has its limitations. It only accounts for lifetimes tied to return values, leaving a critical gap: it does not cover scenarios where references or pointers are captured by entities like this object, class members, or through propagation in function calls.

Lifetime contracts

Below are a few examples of lifetime contracts and violations:
(Today, Clang can diagnose many of them using lifetime annotations and some limited statement-local lifetime analysis.
Violations which cannot be detected using statement-local analysis, needing complex control-flow and dataflow analysis, are explicitly out of scope of this document.)

1. Function’s return value references a parameter

When the return value captures a reference to a parameter, that parameter must outlive the return value. Annotating with [[clang::lifetimebound]] enables Clang to diagnose certain statement-local cases when the return value outlives the captured argument.

const std::string& getElement(const std::vector<std::string>& v [[clang::lifetimebound]],
                              std::size_t index) {
    return v[index];
}
const std::string& use() {
    const std::string& i1 = getElement({"ab", "cd"}, 0);
    // ^ captures dangling reference to a temporary. Clang detects it. Good.

    std::vector<std::string> v = {"ab", "cd"};
    if (foo()) return getElement(v, 0);
    // ^ returns dangling reference to a stack-variable. Clang detects it. Good.
    
    const std::string& i2 = getElement(v, 0);  // OK.
    return i2;
    // ^ returns reference to a stack-variable. Clang cannot detect this. Bad.
    // Needs dataflow analysis. (Out-of-scope of this proposal).
}

2. Member function’s return value references this object

In this case, the return value of a member function captures a reference to this object. Clang supports diagnosing such cases by annotating the member function with [[clang::lifetimebound]].


struct S {
    int& get() [[clang::lifetimebound]] { return x; };
    int x;
};

int& use() {
    const int& x = S{1}.get(); 
    // ^ captures reference to a temporary. Clang detects it. Good.
    S s{1};
    return s.get();
    // ^ returns reference to a stack variable. Clang detects it. Good.
}

3. this object references a parameter

In this case, a member function captures a reference to an argument in this object. To be valid, the argument should outlive the object.

Clang does not support this today. (Supporting this is the primary purpose of this document.)

struct S {
    void set(const std::string& x) { this->x = x; };
    std::string_view x;
};

std::string create() { return "42"; }

S use() {
    S s;
    s.set(create());
    // ^ 's' captures a reference to a temporary. Clang cannot detect this. BAD.

    std::string local = create();
    s.set(local);  // OK.
    
    return s;
    // ^ returns a reference to 'local'. Clang cannot detect this. Bad.
    // Needs dataflow analysis. (Out-of-scope of this proposal).
}

4. A parameter references another parameter

// 'set' captures a reference to 's'
void addToSet(std::string_view s, std::set<std::string_view>& set) {
    set.insert(s);
}

void use() {
    std::set<std::string_view> set;
    addToSet(create(), set); // Dangling. No support in clang.
}

5. Global entity references a parameter

std::set<std::string_view> set;

// 'set' captures a reference to 's'
void addToSet(std::string_view s) {
    set.insert(s);
}

void use() {
    addToSet(create()); // Dangling. No support in clang.
}

Proposal

The primary goal of this proposal is to extend Clang’s capabilities to cover cases like those in examples #3, #4, and #5. These cases involve capturing references to parameters by member functions, other parameters, or global entities, areas that are not currently handled by Clang’s [[clang::lifetimebound]] annotation.
We propose introducing a new Clang annotation, [[clang::lifetime_capture_by(X)]], to formally establish and enforce these lifetime contracts.

Annotation

A function parameter Y can be annotated with [[clang::lifetime_capture_by(X)]] to indicate that a reference to the argument to Y is captured by the entity X. This establishes the contract that Y should outlive X.
Here X can be:

  1. this (for member functions).
  2. Another named parameter of the same function.
  3. Empty / “Unknown”. This would be considered in global scope by the analysis.

// 3. 'this'.
struct S {
    void set(const std::string& x [[clang::lifetime_capture_by(this)]]) { 
        this->x = x;
    };
    std::string_view x;
};

// 4. Another parameter.
void addToSet(std::string_view s [[clang::lifetime_capture_by(set)]],
              std::set<std::string_view>& set) {
    set.insert(s);
}

// 5. Global scope.
std::set<std::string_view> set;
void addToSet(std::string_view s [[clang::lifetime_capture_by()]]) {
   set.insert(s);
}

Annotation in definition vs. declaration

  • If the function definition does not have a parameter annotated then the canonical declaration would be used for reading the annotation for that parameter.
  • If the definition has the annotated parameter, it should be consistent with the declaration and should establish the same lifetime contracts.
  • The capturing parameter name X used in the annotation can differ between the declaration and the definition if the parameter itself is named differently in each.

void addToSet(std::string_view s [[clang::lifetime_capture_by(set1)]],
              std::set<std::string_view>& set1);
void addToSet(std::string_view s [[clang::lifetime_capture_by(set2)]],
              std::set<std::string_view>& set2) {
    set2.insert(s);
}

Support in Clang

When a parameter Y is annotated with [[clang::lifetime_capture_by(X)]], clang would detect instances when the argument to Y does not outlive X.

The implementation of this analysis would reuse Clang’s existing statement-local lifetime analysis. Since the analysis is restricted to a single statement, Clang would only detect the case where a temporary is used as the argument to Y and X lives beyond the function call (so that X captures a dangling reference to the argument).

std::string create() { return "42"; }
void addToSet(std::string_view s [[clang::lifetime_capture_by(set)]],
              std::set<std::string_view>& set) {
    set.insert(s);
}
void use() {
    std::set<std::string_view> set;
    addToSet(create(), set);  // Clang would now detect this.
}
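
For codebases that must also build with compilers lacking the attribute, the annotation could be wrapped in a macro. The following is a minimal sketch under that assumption: the macro name LIFETIME_CAPTURE_BY and the helper countCaptured are hypothetical and not part of this proposal, and the attribute spelling follows this RFC, so it may differ in a released Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <string_view>

// Hypothetical portability macro (not part of the proposal): expands to the
// proposed attribute when the compiler reports support, and to nothing otherwise.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::lifetime_capture_by)
#    define LIFETIME_CAPTURE_BY(X) [[clang::lifetime_capture_by(X)]]
#  endif
#endif
#ifndef LIFETIME_CAPTURE_BY
#  define LIFETIME_CAPTURE_BY(X)
#endif

void addToSet(std::string_view s LIFETIME_CAPTURE_BY(set),
              std::set<std::string_view>& set) {
    set.insert(s);
}

// Hypothetical helper: both strings are declared before 'set', so they outlive it.
std::size_t countCaptured() {
    std::string a = "ab";
    std::string b = "cd";
    std::set<std::string_view> set;
    addToSet(a, set);  // OK: 'a' outlives 'set'.
    addToSet(b, set);  // OK: 'b' outlives 'set'.
    // addToSet(std::string("tmp"), set);  // The proposed analysis would flag this.
    assert(set.count("ab") == 1);
    return set.size();
}
```

Calling countCaptured() inserts two views whose backing strings are destroyed after the set, which is the pattern the annotation is meant to leave undiagnosed.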

Diagnosing wrong usage

Clang would diagnose when the X (in [[clang::lifetime_capture_by(X)]]) does not refer to a semantically valid entity.

  • X is considered valid if it is this or a named function parameter or is empty (unspecified).
  • A parameter X is considered valid only if it is not marked as const. A const X likely indicates a wrong API definition, because X cannot capture a reference to parameter Y if it is const.
  • A parameter X should be a pointer type or a reference type or a type annotated with [[gsl::Pointer]].

Constructors and overlap with [[clang::lifetimebound]]

For C++ constructors, [[clang::lifetimebound]] and the new [[clang::lifetime_capture_by(this)]] would overlap: they provide the same analysis and detect the same violations.

For example, either annotation below establishes the contract that a reference to the parameter x is captured by the object s being constructed.

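struct S {
    // Either spelling establishes the same contract; only one can be declared
    // at a time because both constructors have the same signature.
    S(const int &x [[clang::lifetimebound]]) : x(x) {}
    // S(const int &x [[clang::lifetime_capture_by(this)]]) : x(x) {}
    const int &x;
};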

void use() {
    S s(1);  // Reference to temporary '1' captured by 's'.
}

Support for Standard Containers

This annotation would prove useful especially for cases involving Container<view types> (like std::vector<std::string_view>).

// Storing a dangling ref in the container.
std::vector<std::string_view> t;
t.push_back(std::string()); 
// ^ 't' captures a dangling reference to a temporary.

We would hardcode such STL containers and their capturing member functions to implicitly annotate their parameters as [[clang::lifetime_capture_by(this)]]. This is how we handle standard GSLOwner types today.

Examples of such candidates would include (where T is a pointer/view type)

  • std::vector<T>::push_back()
  • std::set<T>::insert()
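
On the caller side, one way to avoid the flagged pattern is to give the viewed strings an owner that outlives the container of views. The following is a minimal sketch of that idea, with hypothetical names (collectViews, owners, views) that do not come from the proposal. std::deque is used for the owners because push_back on a deque does not invalidate references to existing elements.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <initializer_list>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper illustrating a pattern the analysis should accept.
std::size_t collectViews() {
    std::deque<std::string> owners;        // Declared first, so destroyed last.
    std::vector<std::string_view> views;   // Holds non-owning views only.

    for (const char* text : {"ab", "cd", "ef"}) {
        owners.push_back(text);            // The deque owns the string.
        views.push_back(owners.back());    // The view refers to an owned string.
    }
    // views.push_back(std::string("tmp"));  // Dangling: the temporary dies at the ';'.

    assert(views.front() == "ab");
    return views.size();
}
```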

Alternative [[clang::lifetime_capture]]

As discussed earlier, due to limitations of a statement-local analysis, Clang would only be able to detect temporaries being used for params marked with [[clang::lifetime_capture_by(X)]].

A potential alternative is introducing a simpler [[clang::lifetime_capture]] annotation, which would omit specifying the entity X that captures the reference.

Reasons for preferring lifetime_capture_by(X):

  • lifetime_capture_by(X) offers greater clarity by explicitly defining the contract between the parameter and the entity capturing its reference. By specifying X, developers gain a more informative and precise tool for documenting lifetime constraints.
  • X would be used by the analysis to distinguish between the capturing entities. For example, we can differentiate between an argument captured by this and one captured by another entity: S{}.captureToGlobal(std::string()) is invalid because a reference to a temporary is captured in global scope, while S{}.captureToThis(std::string()) is valid because the reference to the temporary is captured by a temporary this.
  • Furthermore, the [[clang::lifetime_capture_by(X)]] annotation sets a solid foundation for future improvements in lifetime analysis, such as expanding beyond statement-local analysis to detect more intricate use-after-free and dangling reference issues. See the next section for possible future direction.

Why not Rust-like lifetimes?

Rust-like lifetimes (proposal) are a powerful, general-purpose mechanism for managing memory safety, but there are a few reasons why this proposal does not pursue them.

  • Different approach and purpose: Rust introduces lifetimes as a formal abstraction layer to track reference validity. This abstraction is fundamentally different from the current C++ static analysis in Clang, which is more focused on identifying objects and their pointers and diagnosing local issues. This proposal aims to expand this existing local reasoning by linking more objects and pointers rather than introducing a new system like Rust lifetimes.
  • Incremental improvements and reusing existing analysis: Rust-style lifetimes would require a major rewrite of Clang’s static analysis system. This proposal, however, builds on Clang’s existing infrastructure. It enhances current tools to catch more memory safety issues without a complete overhaul. While Rust lifetimes would address more complex cases, they are beyond the immediate scope. The focus here is on improving the current system, not replacing it.

If and when someone implements Rust-like lifetimes, this work will become obsolete. However, we can catch at least some bugs with this approach until then.

Possible future direction

A promising direction for enhancing Clang’s lifetime analysis involves detecting more general cases where X outlives Y, even when Y is not a temporary. Extending Clang’s current single-statement-local analysis would be key to achieving this. Consider the following examples:

void use() {
    std::set<std::string_view> set;
    if (foo()) {
        std::string s = create();
        addToSet(s, set);
        // 's' goes out of scope. 
    }
    // 'set' now has a dangling reference to 's'.
}

void use() {
    std::set<std::string_view> set;
    if (foo()) {
        std::string s = create();
        addToSet(s, set);
        set.clear();
        // 's' goes out of scope.
    }
    // 'set' is valid here.
}

This would require developing an infrastructure to compare lifetimes of two entities X and Y.

We need to develop an analysis to detect control flows having “X captures a reference to Y” followed by “Y goes out of scope before X” with no “X releases the captured reference to Y” in between.

Reinitialization (like set.clear()) is one of the ways X can release the captured reference to Y. Clang already supports the [[clang::reinitializes]] annotation to mark member functions that reinitialize an object.
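
As a rough illustration of how the two annotations might combine, the following sketch marks a capturing member function and a releasing member function on a hypothetical ViewSet wrapper. The macro names, the wrapper, and the helper captureThenRelease are assumptions made for this example, and no analysis that pairs the two annotations exists today.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <string_view>

// Hypothetical portability macros: expand to the attributes only when supported.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::lifetime_capture_by)
#    define CAPTURED_BY_THIS [[clang::lifetime_capture_by(this)]]
#  endif
#  if __has_cpp_attribute(clang::reinitializes)
#    define REINITIALIZES [[clang::reinitializes]]
#  endif
#endif
#ifndef CAPTURED_BY_THIS
#  define CAPTURED_BY_THIS
#endif
#ifndef REINITIALIZES
#  define REINITIALIZES
#endif

// Hypothetical wrapper around a set of views.
class ViewSet {
public:
    // Capture: the object keeps a view of 's'.
    void add(const std::string& s CAPTURED_BY_THIS) {
        views_.insert(std::string_view(s));
    }
    // Release: the object drops every captured view.
    REINITIALIZES void clear() { views_.clear(); }
    std::size_t size() const { return views_.size(); }

private:
    std::set<std::string_view> views_;
};

// Hypothetical helper mirroring the second use() example above.
std::size_t captureThenRelease() {
    ViewSet set;
    {
        std::string s = "42";
        set.add(s);                // 'set' captures a reference to 's'.
        assert(set.size() == 1);
        set.clear();               // 'set' releases it before 's' is destroyed.
    }
    return set.size();             // No dangling views remain.
}
```

A future flow-sensitive analysis could treat add as the "capture" event and clear as the "release" event described above.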


Thanks for working on this! I absolutely love it.

There are many orthogonal ways to expand on lifetimebound. This proposal addresses one of them, but does not cover others. While I do believe that it is OK to only expand this annotation in one direction at a time, I’d love to see some preliminary work to make sure this design is compatible with other potential ways to increase the expressivity.

Here are some of the things that I’d love these annotations to be able to express in the future:

  1. Make a distinction between unannotated code and code that does not have any lifetime dependence:
// How to annotate this?
string_view foo() { return "static_string"; }
  2. Add a way to annotate non-top level entities:
// How to annotate the lifetime for result.first and result.second?
std::pair<iterator, iterator> algorithm(const Container1& c1, const Container2& c2);
// How to annotate that this returns p.first?
std::string_view first(std::pair<std::string_view, std::string_view> p);
// How to annotate the lifetime contracts of something behind a pointer or a reference?
std::string_view deref(const std::string_view* arg);
  3. More integrated role into the type system. Ability to annotate function pointers and disallow unsafe conversions. Check if overrides of functions are adhering to the proper subtyping rules.

I understand that we do not want to make lifetimebound annotations as expressive as Rust style lifetimes, but I wonder if the sweet spot might be a bit more expressivity than what is proposed here and I want to make sure that this extension does not prevent us from doing further improvements.

If and when someone implements Rust-like lifetimes, this work will become obsolete.

I strongly disagree. I think it is not impossible that for some functions these lifetimebound annotations could be converted to Rust-style annotations. Moreover, Rust-style lifetime analysis is stricter and often requires rewriting/refactoring code. A less restrictive analysis based on a different annotation language might be a good stepping stone for some projects to incrementally migrate towards memory safety. The sweet spot between safety and the ease of prototyping/changing architecture depends on a lot of factors. This makes me think that these annotations will remain useful even if C++ adopts a Rust-style borrow checker.

The existing analysis does not differentiate between the two. Unannotated code implies that there are no lifetime contracts. I think the same can be said about this code, and I would argue that there is no need to annotate code that does not have lifetime dependence.

These indeed touch upon a more intricate level of lifetime analysis, and I feel it is closer to the Rust-like system of explicitly tracking lifetimes of individual entities within data structures. Supporting this would likely require an additional abstraction of “lifetimes” that goes beyond simple pointer/reference relations expressed by existing and proposed annotations. It would further need deeper integration into the type system.

The current lifetimebound annotation (and its extensions) might not be expressive enough to establish lifetime contracts across function calls (even if they are part of the same statement). It would probably need a different annotation and significant changes to existing analysis.

For example, we could imagine attaching [[clang::lifetimebound]] to the template argument instead of the function parameter.

std::string_view 
first(const std::pair<std::string_view [[clang::lifetimebound]], std::string_view>& p);

We could rely on the annotation being propagated to the corresponding std::pair constructor instantiation (let’s assume this works).

std::pair(const std::string_view& [[clang::lifetimebound]], const std::string_view&)

But this is also not sufficient to catch simple cases like std::string_view s = first({std::string(), local}); because the std::pair is a temporary here, and analysis would consider it fine for a temporary to refer to another temporary. See godbolt.

We can try to (implicitly) annotate the const std::pair& parameter in first function as well. But it would introduce new false positives. See godbolt.

std::string_view 
first(std::pair<std::string_view [[clang::lifetimebound]], std::string_view> p [[clang::lifetimebound]]);
auto sv = first({local, local}); // false positive warning here.

In conclusion, I do not see a great way forward with the existing [[clang::lifetimebound]] for non-top level entities.

This is true, but annotations can also serve as documentation, and future analyses can benefit from this. E.g., one could create a simple check/policy that all string_view-returning functions need to be annotated. It can also be useful for interop with Rust, e.g., bridging a function as returning a slice with a static lifetime. That being said, I am not advocating for adding something that we do not have a use case for at the moment; I just want to make sure that the syntax can accommodate this if there is a need in the future.

I think this is true if we want to solve the general problem. But I had something slightly different in mind, something like using access paths:

std::string_view 
first(const std::pair<std::string_view, std::string_view>& p)
  [[clang::lifetimebound(return: p.first)]];

This is a bit more general than the annotations currently proposed, but does not go all the way to Rust-style lifetimes. One of the appeals of supporting this scenario is to support some common idioms in C++ like APIs working with pairs of iterators.

What do you think?

Right. I do not see a need for this at the moment. If needed, we can introduce new annotations for specific purposes.

[[clang::lifetime_static]] string_view foo() { return "static_string"; }

For user education, checks that rely on all returned pointers having some lifetimebound parameter can suggest adding this annotation if the pointer points to static storage.

I think lifetime_capture_by can also be extended in a similar fashion if need be:

void getFirst(
  const pair<string_view, string_view>& p [[clang::lifetime_capture_by(p.first, output)]], 
  string_view& output);

If there is a concern to not introduce a new spelling, I am fine with sticking with lifetimebound. So instead of introducing [[clang::lifetime_capture_by(X)]], we can extend [[clang::lifetimebound]] to [[clang::lifetimebound(X)]] with the proposed semantics.

When and if there is a need, the X here can be extended to express more complex constraints using a new DSL. The ergonomics are certainly debatable here: Function annotation vs parameter decl annotation, (return: p.first) vs (return, p.first). But we do not have to decide this in this proposal. My point would be that this looks extensible enough to express such constraints.

WDYT?

This proposal makes sense to me, I support it. It is a good extension to improve the expressivity of [[clang::lifetimebound]] and it seems to be flexible enough so we can extend this further in the future.

I have one small comment:

void addToSet(std::string_view s [[clang::lifetime_capture_by(set)]],
              std::set<std::string_view>& set);

It looks like the attribute can reference a parameter that is introduced after the attribute. It looks like this requires delayed parsing. Do you have any concerns about this? I think Clang already supports this and there are other attributes that follow this pattern (-fbounds-safety attributes come to my mind as an example).

I think if there are no objections in a week or so we can consider this RFC accepted.


Yeah. I think clang already supports this kind of attribute argument.
I see CallbackAttr which also references params. Attributes also support late parsing. So I feel confident that this should be doable.