[RFC] Intra-procedural Lifetime Analysis in Clang

Utkarsh Saxena @usx95
Dmytro Hrybenko @gribozavr
Yitzhak Mandelbaum @ymand
Jan Voung @jvoung
Kinuko Yasuda @kinu

Summary

Clang’s current lifetime analysis operates locally within a single statement and cannot track object lifetimes across basic blocks or control-flow constructs.

This RFC proposes a new intra-procedural, flow-sensitive lifetime analysis for Clang to detect a broader class of use-after-scope issues, such as use-after-free and use-after-return, particularly those involving stack-allocated variables. The specific details of the underlying dataflow algorithm are omitted here, as this RFC focuses on the goals, user-visible changes, and high-level approach, rather than serving as a detailed design document.

At its core, this analysis performs a form of points-to analysis based on OriginSets and Loans. An OriginSet is a symbolic identifier associated with a pointer-like object (pointer, reference, view), representing a set of possible Loans it could hold. A Loan represents an act of borrowing from a specific memory location. The underlying dataflow analysis and lifetime model are inspired by Rust’s Polonius borrow checker, adapted significantly for C++ semantics.

This approach tracks the set of Loans within a pointer’s OriginSet across control flow. The analysis respects existing annotations (such as clang::lifetimebound, gsl::Pointer, gsl::Owner). We would use approximations and gradual typing because C++ functions often lack necessary lifetime annotations (like clang::lifetimebound), or sometimes their lifetime contracts are too complex to be fully expressed using the existing annotation system. Consequently, it assigns an ‘Opaque’ (or Unknown) Loan to an OriginSet when a pointer’s source is unclear, particularly after calls to such functions.

The analysis offers different strictness levels (-Wdangling-safety and -Wdangling-safety-permissive). This configuration allows users to control the sensitivity of the warnings issued, managing the trade-off between finding more potential bugs and reducing false positive reports (as detailed in the Permissive vs. Strict Modes section).

This analysis is intended to eventually supersede Clang’s existing statement-local lifetime checker with strictly more powerful capabilities.

C++ Lifetime Model: An alias-based approach

Inspired by Polonius, this analysis uses a points-to technique based on OriginSets and Loans designed for intuitive understanding. Here’s how it works on a high level:

  • When a reference or pointer is created (a borrow occurs), it generates a Loan. Each Loan represents borrowing from a specific memory location and is identified by where it was created in the code (borrow site) and the path to the borrowed memory (e.g., var, obj.field, arr[0]).
  • For each pointer variable or expression, the analysis tracks its OriginSet. An OriginSet is a symbolic identifier representing the set of possible Loans the pointer could hold at a program point.
  • The analysis determines when the lifetime of the memory associated with each Loan expires (e.g., when a local variable goes out of scope).
  • It flags an error when a pointer is used at a point where its OriginSet could contain a Loan whose lifetime has already ended.

This focus on tracking the possible sources (Loans) contained within a pointer’s OriginSet and checking their validity upon use aims to make warnings easier to understand and debug than more abstract models (e.g., NLL (non-lexical lifetime) in Rust).
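
The four steps above can be sketched as a small toy model. This is only an illustration under simplifying assumptions: the names ToyAnalysis, Loan, borrow, end_lifetime and use_is_flagged are hypothetical and are not Clang's implementation, and control flow is replayed by hand instead of being derived from a CFG.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model for illustration; these names are hypothetical, not Clang's.
using LoanID = std::size_t;

struct Loan {
    LoanID id;          // stands in for the borrow site
    std::string path;   // path to the borrowed memory, e.g. "small"
    bool expired;       // set once the borrowed storage is gone
};

struct ToyAnalysis {
    std::vector<Loan> loans;

    // A borrow creates a fresh Loan for the given path.
    LoanID borrow(const std::string& path) {
        LoanID id = loans.size();
        loans.push_back(Loan{id, path, false});
        return id;
    }

    // The storage named by `path` goes out of scope: its Loans expire.
    void end_lifetime(const std::string& path) {
        for (Loan& loan : loans)
            if (loan.path == path) loan.expired = true;
    }

    // A use is flagged when the origin set may hold an expired Loan.
    bool use_is_flagged(const std::set<LoanID>& origin) const {
        for (LoanID id : origin)
            if (loans[id].expired) return true;
        return false;
    }
};

// Replays a pointer that outlives the string it borrows from.
bool replay_use_after_scope() {
    ToyAnalysis analysis;
    std::set<LoanID> ptr_origin;               // {}
    ptr_origin = {analysis.borrow("small")};   // {L}
    analysis.end_lifetime("small");            // L expires
    return analysis.use_is_flagged(ptr_origin);
}
```

In this sketch replay_use_after_scope() returns true, because the only Loan in the pointer's origin set has expired by the time of the use.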

The analysis tracks the set of loans in each pointer’s OriginSet through the control flow. In the examples below, the comments show the origin set of ptr as {…} at each step.

void simple() {
    std::string_view ptr; // ptr's origin set is {} (empty)
    {
        std::string small = "short lived";
        ptr = small; // Taking a reference to 'small' creates a loan 'L' with path 'small'.
                     // ptr's origin set contains Loan L.
    }  // lifetime of 'small' ends => Loan L expires.
    // ptr's origin set is {<expired L>}
    std::cout << ptr; // UaF: origin set contains expired loan L.
}

Origin sets merge at join points in the CFG.

void branch(bool condition) {
    std::string large = "long lived";
    std::string_view ptr = large; // Loan L_large is created; ptr's origin set is {L_large}

    if (condition) {
        std::string small = "short lived";
        ptr = small; // Loan L_small is created. ptr's origin set is {L_small}
    }  // L_small expires
    // Origin sets merge: {L_large, <expired L_small>}
    std::cout << ptr; // UaF: origin set potentially contains expired loan L_small.
}

Reassignments overwrite the origin set.

void reassignments(bool condition) {
    std::string large = "long lived";
    std::string_view ptr = large; // Loan L_large is created; ptr's origin set is {L_large}

    if (condition) {
        std::string small = "short lived";
        ptr = small; // Loan L_small is created; ptr's origin set is {L_small}
    }  // L_small expires.
    // Origin sets merge: {L_large, <expired L_small>}

    ptr = large; // New loan L_large2 is created with path 'large' at this borrow site.
                 // Reassignment: ptr's origin is now just {L_large2}.
                 // The potential link to '<expired L_small>' is removed.
    std::cout << ptr; // Ok.
}
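
The merge and overwrite rules from the last two examples can be sketched as plain set operations. This is a minimal sketch, assuming loans are identified by string labels; OriginSet, join and assign are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>

// Loans are named by label here; a hypothetical simplification.
using OriginSet = std::set<std::string>;

// Join point: the pointer may hold any loan it held on either incoming edge.
OriginSet join(const OriginSet& a, const OriginSet& b) {
    OriginSet merged = a;
    merged.insert(b.begin(), b.end());
    return merged;
}

// Reassignment: the previous contents are discarded and replaced by the
// origin set of the right-hand side.
OriginSet assign(const OriginSet& /*previous*/, const OriginSet& rhs) {
    return rhs;
}
```

Joining {L_small} with {L_large} gives {L_large, L_small}, and a later assign with {L_large2} leaves only {L_large2}, which matches the comments in reassignments() above.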

Pointer assignment propagates the origins.

void pointer_assignments() {
    std::string_view ptr1; // ptr1's origin is {}
    {
        std::string small = "short lived";
        std::string_view ptr2; // ptr2's origin is {}

        ptr2 = small; // L_small; ptr2's origin set is {L_small}
        ptr1 = ptr2;  // Assignment: ptr2 flows into ptr1.
                      // ptr1's and ptr2's origin is {L_small}
    }
    std::cout << ptr1; // UaF; origin contains expired loan L_small.
}

Output origin covers input origins resulting in a union.
When a function has [[clang::lifetimebound]] parameters, its return value’s Origin is constrained by the Origins of those parameters. For functions like below, this means the return Origin effectively contains the union of Loans from all lifetimebound input Origins.

std::string_view max(std::string_view a [[clang::lifetimebound]],
                     std::string_view b [[clang::lifetimebound]]);

void form_subsets() {
    std::string a = "a";
    std::string b = "b";

    std::string_view ptr1 = a; // Loan La; ptr1's origin is {La}
    std::string_view ptr2 = b; // Loan Lb; ptr2's origin is {Lb}

    std::string_view ptr3 = max(ptr1, ptr2);
                               // ptr1's origin is a subset of ptr3.                                          
                               // ptr2's origin is a subset of ptr3.
                               // => ptr3's origin is {La, Lb}
}
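
The union rule for lifetimebound parameters can be sketched as follows. This is an illustration only: Argument and return_origin are hypothetical names, and the sketch leaves out the Opaque loan that an unannotated call would receive.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

using OriginSet = std::set<std::string>;

// One call argument together with the annotation on its parameter.
struct Argument {
    OriginSet origin;    // loans the argument may hold
    bool lifetimebound;  // true if the parameter is [[clang::lifetimebound]]
};

// The return origin is the union of the origins of all lifetimebound
// arguments. Arguments without the annotation do not contribute.
OriginSet return_origin(const std::vector<Argument>& args) {
    OriginSet result;
    for (const Argument& arg : args)
        if (arg.lifetimebound)
            result.insert(arg.origin.begin(), arg.origin.end());
    return result;
}
```

For the max(ptr1, ptr2) call above, both arguments are lifetimebound, so the sketch yields {La, Lb}.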

Opportunistic Bug finding

Inner types (Structs, Containers): While the core model focuses on Origins associated with top-level variables and expressions (pointers, references, views), we also aim to provide opportunistic bug finding for common patterns involving pointers within aggregate types (struct members, std::pair) or containers (e.g., std::vector).

This approach relies on heuristics and specific knowledge of common types, similar to the existing statement-local analysis (e.g., container of pointers). It is less general than a system with full support for Rust-like lifetime parameters on type definitions but allows catching important classes of bugs today. As Clang potentially adopts more explicit lifetime annotations for types, the reliance on this special handling would diminish.

struct S {
    std::string_view a; // Member 'a' has Origin Oa
    std::string_view b; // Member 'b' has Origin Ob
};

S return_struct_with_local() {
    std::string local_str = "local";

    S s; // Instance 's' created. Origins Oa={}, Ob={} initialized.

    s.a = global_str; // Oa = {L_global}
    s.b = local_str;  // Ob = {L_local}
    return s; // Returning 's'.
              // L_local expires.
              // Oa contains {L_global} => Ok
              // Ob contains {L_local} => UaR.
}

std::vector<std::string_view> return_vector_with_local() {
    std::vector<std::string_view /*Inner origin Oi*/> v; // Oi = {}

    std::string local = "local";

    // vector::push_back(T) is [[clang::lifetime_capture_by(this)]];
    v.push_back(local);  // Loan L_local to 'local'.
                         // Oi = {L_local}.
    v.push_back(global); // Loan L_global to 'global'.
                         // Oi = {L_local, L_global}.

    return v; // Returning 'v' associated with Oi containing loan L_local.
} // End scope: L_local expires.

Permissive (-Wdangling-safety-permissive) vs. Strict (-Wdangling-safety) Modes

This lifetime analysis reports potential issues under two different warning flags, -Wdangling-safety-permissive (permissive mode) and -Wdangling-safety (strict mode), corresponding to the analysis’s confidence that a true bug exists.

The core analysis tracks the Origin set (representing the set of Loans it might hold). The difference between the permissive and strict modes then lies in their reporting criteria: the permissive mode typically reports only if the pointer must be dangling (a “must-analysis”), whereas the strict mode reports if the pointer may be dangling (a “may-analysis”).

Warning Trigger Conditions:
We issue a warning under the following conditions:

  1. Loan Expiry: A Loan L, representing a borrow, created at point P_borrow, expires at point P_expire (e.g., stack variable goes out of scope).
  2. Potential Dangling Pointer: At P_expire, any pointer Ptr whose Origin O_ptr contains the (now expired) Loan L is considered potentially dangling.
  3. Liveness and Use: A diagnostic is generated only if such a pointer Ptr is potentially used at a later point P_use (meaning Ptr is “live” at P_expire).

Determining the warning group:

  • Permissive: If all loans in Ptr’s origin O_ptr at P_expire refer to expired memory, a -Wdangling-safety-permissive warning is issued. The diagnostic is attached to the borrow location (P_borrow) with a note indicating the problematic use at P_use. This signifies a high-confidence bug according to the analysis. This mode produces fewer false positives but may miss bugs specific to certain paths.
  • Strict: If the Loan L is expired, but there are other loans in Ptr’s origin O_ptr at P_expire which are still valid, then a -Wdangling-safety warning is issued. The diagnostic is still attached to P_borrow with a note at the use P_use. This mode prioritizes safety: it catches dangerous code patterns where the analysis indicates a use-after-free could occur via some path. That may include paths that are dynamically unreachable due to program logic, so this mode will have more false positives.

std::string global_str = "STATIC";

std::string_view permissive() {
  std::string local = "local";
  std::string_view view = local;  // P_borrow: error: 'local' doesn't live long enough [-Wdangling-safety-permissive]
                                  // P_expire: 'local' expires.
  return view;                    // P_use: note: returned here.
}

std::string_view strict(bool condition) {
  std::string local = "local";
  std::string_view view = global_str;
  if (condition) {
    view = local;  // P_borrow: error: 'local' doesn't live long enough [-Wdangling-safety]
  } 
  // P_expire: 'local' expires after return.
  return view; // P_use: note: returned here.
}

This allows users to choose between broader detection (strict) and higher confidence with less noise (permissive).
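
The choice of warning group described above can be sketched as a small classification function. This is a simplified sketch, assuming the origin set at P_expire and the set of expired loans are already known; Diagnostic and classify are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical result type: which warning group, if any, would fire.
enum class Diagnostic { None, Permissive, Strict };

using LoanID = int;

// origin:  loans that Ptr's origin may hold at P_expire
// expired: loans whose memory is no longer alive
// live:    whether Ptr is potentially used at a later point P_use
Diagnostic classify(const std::set<LoanID>& origin,
                    const std::set<LoanID>& expired,
                    bool live) {
    if (!live || origin.empty()) return Diagnostic::None;

    std::size_t expired_count = 0;
    for (LoanID loan : origin)
        if (expired.count(loan) != 0) ++expired_count;

    if (expired_count == 0) return Diagnostic::None;
    // Every loan is expired: the pointer must dangle.
    if (expired_count == origin.size()) return Diagnostic::Permissive;
    // Some loans are still valid: the pointer may dangle.
    return Diagnostic::Strict;
}
```

In permissive() above the origin holds only the expired loan to local, so the sketch returns Permissive. In strict() the origin also holds the still-valid loan to global_str, so it returns Strict.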

Note: The warning group -Wdangling-safety implies and subsumes -Wdangling-safety-permissive.

Gradual typing: Opaque / Unknown Semantics

It’s common to call functions where the lifetime relationship between inputs and outputs isn’t explicitly declared (e.g., missing [[clang::lifetimebound]]) or is too complex to be expressed using existing clang annotations.
When the analysis encounters a pointer or reference initialized from such an “opaque” source:

  • It cannot determine the true Loan(s) that should be associated with the pointer’s Origin, nor their actual lifetime dependencies.
  • It assigns a special “Opaque Loan” to the pointer’s Origin.

This is a conservative approximation to avoid false positives. Since the analysis doesn’t know when the memory backing the opaque pointer actually becomes invalid, it optimistically assumes it remains valid for the entire duration of the current function regardless of the strictness modes mentioned above.

std::string_view opaque_view();

void foo() {
    std::string_view x; // Ox = {}
    x = opaque_view();  // Ox = {Opaque}
}

void store(std::string_view* output);

void foo() {
    std::string_view x; // Ox = {}
    store(&x);          // Ox = {Opaque}
}
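
One way to picture the Opaque loan is as a loan that never appears among the expired ones inside the current function. The sketch below assumes loans are named by the path they borrow from; kOpaque and may_dangle are hypothetical names, and treating a set that mixes Opaque with an expired loan as possibly dangling is an assumption of this sketch.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical label for the special loan; not a Clang identifier.
const std::string kOpaque = "<opaque>";

using OriginSet = std::set<std::string>;

// Returns true if the origin may hold a loan whose path has gone out of scope.
bool may_dangle(const OriginSet& origin,
                const std::set<std::string>& ended_paths) {
    for (const std::string& loan : origin) {
        // The opaque loan is optimistically valid for the whole function.
        if (loan == kOpaque) continue;
        if (ended_paths.count(loan) != 0) return true;
    }
    return false;
}
```

With Ox = {Opaque}, as in both foo() examples above, the sketch never reports a dangling use, whichever locals have gone out of scope.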

Future enhancements

  • Annotation Suggestions: Diagnostics suggesting the addition of [[clang::lifetimebound]] where the analysis detects lifetimes escaping functions, helping users improve function contracts.
  • Annotation Verification: Diagnostics verifying that function implementations adhere to their declared [[clang::lifetimebound]] contracts, catching mismatches between declaration and behavior.
  • Pointer/Iterator Invalidation: Extend the analysis beyond scope-based expiry to detect invalidation caused by operations modifying underlying objects or containers (e.g., std::string reassignment, std::vector::push_back). This could involve modeling limited forms of exclusivity (aliasing XOR mutability) for specific standard library types to catch common bugs like iterator invalidation.

Relation to Rust-like Lifetimes

This analysis draws inspiration from Rust’s Polonius borrow checker. While adapted for C++'s semantics (e.g., handling opaque calls, no enforced exclusivity, configurable strictness), its internal model still uses concepts like Loans and Origins, which are analogous to the formulation of lifetimes in Rust’s Polonius.

This notion of Origins and Loans serves a role closely related to the lifetime/origin tracking in Polonius, providing a conceptual bridge. This proposal develops the necessary CFG-based dataflow infrastructure that could also support more explicit, Rust-style lifetime systems if they were introduced to Clang.

If Clang evolves to include Rust-like lifetime annotations (e.g., annotating pointers and references with fine-grained lifetimes like T& [[clang::lifetime(a)]]), this analysis framework is positioned to directly consume them. User-provided annotations could then directly inform the calculation of which Loans belong to which Origins. For instance, explicit “outlives” constraints (like 'a: 'b in Rust) would translate directly to subset relations between the corresponding Origins, which this analysis could then enforce. This would replace current approximations (like ‘Opaque Loans’ for unknown function calls or heuristics for unannotated parameters) and significantly increase the precision of the analysis. Furthermore, explicit lifetime parameters on types could reduce the need for special handling of nested pointers/views (e.g., within containers or structs, as discussed previously in “Opportunistic Bug Finding”) by making their lifetime dependencies clear.

However, the analysis described here delivers value independently for today’s C++ and does not rely on the adoption of Rust-style lifetimes.

RFC

Apart from the overall direction of introducing a more powerful, function-local lifetime analysis in Clang, we also seek feedback on the following points:

Warning Flags and Naming

We propose to add this analysis to Clang under two warning flags:

  • -Wdangling-safety to report potentially unsafe patterns at the cost of false positives.
  • -Wdangling-safety-permissive to report only high-confidence warnings suitable for broad adoption with minimal false positives.

Other naming schemes considered:

  • Focusing on “strict”: -Wdangling-cfg (for permissive) and -Wdangling-safety-strict (for strict)
    • The question is what we want users to believe our suggested default is. The shorter name without additional qualifiers is more likely to be perceived as the recommended setting. -Wdangling-cfg-permissive sounds like a compatibility mode for legacy codebases, and people should strive to move towards the strict one (-Wdangling-cfg).
  • Using a different base name: e.g., -Wtemporal-safety.
    • We could keep the immediate flags specific to “dangling” issues (like the proposed -Wdangling-safety) and potentially introduce an umbrella flag like -Wtemporal-safety in the future. This umbrella could then enable a suite of distinct temporal safety warnings, including this one and others (e.g., a separate warning for iterator invalidation).

Experimental prefix:

  • This lifetime analysis will be developed incrementally, and a warning with the experimental prefix -Wexperimental-dangling-safety will be used until it is ready for general use, preventing user disappointment with an incomplete feature.

Default Enablement

  • Permissive mode would be enabled by default once we confirm that the false positive rate is sufficiently low and there is a reasonable compile time impact on typical codebases.
  • Strict mode would be off by default. (Strict warnings could also potentially be surfaced via ClangTidy).

Code structure

  • Plan is to add the analysis under clang/lib/Analysis/ with entry point in clang/lib/Sema/AnalysisBasedWarnings.cpp (like other cfg-based analysis, e.g., thread-safety analysis).
  • We would build the analysis on top of Clang’s CFG.

Performance

Performance impact is a key consideration and will be monitored closely during development. While the underlying dataflow analysis approach is expected to be manageable for typical C++ functions, for particularly complex cases (e.g.), we have the option to cap the analysis after a certain number of iterations per function. To maintain reasonable compile times, this bug-finding tool might miss some issues in extremely complex functions, which is an acceptable compromise.

Appendix: More examples

std::string_view
Lifetimebound(std::string_view str [[clang::lifetimebound]]);

std::string_view foo(bool cond) {
  std::string local;
  std::string_view view;
  view = Lifetimebound(local);
  return view; // error: returning reference to stack variable 'local'.
}

int* result;
if (std::unique_ptr<int> ptr = create(); ptr != nullptr) {
  result = ptr.get(); // error: 'result' points to 'ptr' which doesn't live long enough.
}
use(result); // note: later used here.

Lifetime_capture_by(X)

struct S {
  void set(std::string_view x [[clang::lifetime_capture_by(this)]]) { view = x; }
  std::string_view view;
};

void foo() {
  S s;
  if (condition) {
    std::string local;
    s.set(local); // error: 's' captures 'local' which doesn't live long enough.
  }
  use(s); // note: later used here.
}

Container of views: Vector

void foo() {
  std::vector<std::string_view> views;
  if (condition) {
    std::string local;
    views.push_back(local); // error: 'views' captures 'local' which doesn't live long enough.
  }
  use(views); // note: later used here.
}

Container of views: Maps

absl::flat_hash_map<std::string_view, int> views;
for (...) {
  std::string local;
  // UaF only if it's inserting the key but we may choose to always error.
  auto& v = views[local]; // error: captures 'local' which doesn't live long enough.
}
use(views); // note: later used here.

Member Pointers

struct S {
  std::string_view a;
  std::string_view b;
};

void foo() {
  std::string safe;
  S s;
  s.a = safe;
  if (condition) {
    std::string unsafe;
    s.b = unsafe; // error: 's.b' points to 'unsafe' which doesn't live long enough.
  }
  use(s); // note: later used here.
}

// Store a pointer to small scope local object in a member pointer.
class S {
  void foo() {
    std::string local;
    view_ = local; // error: 'view_' points to 'local' which doesn't live long enough.
  }
private: 
  std::string_view view_;
};

Lambdas and callbacks

Async functions accepting callbacks can be annotated with capture_by(this).

thread::Scheduler scheduler;
if (condition) {
    std::string local = "blah";
    scheduler.Add([&]() { return use(local); }); // error: 'scheduler' captures 'local' which doesn't live long enough.
}
scheduler.Join(); // note: later used here.

Suggest lifetime annotations

These annotations remain the only way to convey lifetimes across function boundaries and high-quality suggestions have previously helped us uncover several bugs.

std::string_view TrimPrefix(std::string_view in [[clang::lifetimebound]]);
std::string_view TrimSuffix(std::string_view in [[clang::lifetimebound]]);

std::string_view Trim(std::string_view in) { // error: missing lifetimebound on 'in'.
  return TrimPrefix(TrimSuffix(in));
}

Pointer/Iterator invalidation

std::vector<int> v = {1, 2, 3, 4};
auto it = std::find(v.begin(), v.end(), 1);
v.push_back(5); // error: 'it' is not valid anymore.
use(*it); // use-after-free

Adding folks who have been involved previously: @devincoughlin @Xazax-hun

Thanks for the RFC, I am a huge proponent of having such an analysis in Clang and happy to review the patches!

My understanding is that the Rust community has the desire to move towards Polonius at some point as it can accept more correct programs than the current non-lexical lifetimes approach. On the other hand, they had some challenges with performance. That makes me a bit uneasy about whether we can have it on by default. That being said, with the right cut heuristics, it might be fine and if we never try, we never figure it out.

I suspect that this might depend on the performance. If we need to skip the flow-sensitive analysis of some big functions, we could still fall back to the statement local warnings for such code.

In C++, we have many ways to express the same memory locations even without aliasing. E.g., obj->field is the same as (*obj).field. I believe the thread safety analysis has some abstractions in place to mitigate this problem. It would be nice to share some code there if it makes sense. @aaronpuchert could you correct me if I’m wrong?

I think there might be some other components that could be reused from the dataflow analysis framework even if we do not want the full SAT solver, memory model and co from there. Specifically, I think the worklist might be standalone enough to be easily reused if we need something similar here.


I also wanted to second this concern. Lifetimebound analysis in its current (limited) form is very cheap and can be enabled even for very large codebases without incurring prohibitive compilation overhead. It would be a shame to give this up. @usx95 is very aware of that constraint, though, as we do want to use the new analysis on Google’s internal codebase.

We should definitely try and collect the data on performance before making any conclusions.
It does seem that some form of heuristics is the right way forward, although they may require exposing some flags for certain use cases. For example, I expect folks to want a rigorous checker for strict mode even if it comes with a performance penalty, even though that is not the right default for other use cases.


That’s correct, but it’s a bit rudimentary. There is no actual canonicalization, instead unary & and * are simply ignored. In SExprBuilder::translateUnaryOperator (ThreadSafetyCommon.cpp):

  case UO_AddrOf:
    // [handling address-of member function ...]
    // otherwise, & is a no-op
    return translate(UO->getSubExpr(), Ctx);

  // We treat these as no-ops
  case UO_Deref:
    return translate(UO->getSubExpr(), Ctx);

When translating a MemberExpr (or CXXMemberCallExpr), we ignore ME->isArrow() and instead derive whether we have -> or . from the type of ME->getBase(). A better way is probably to canonicalize (i.e. replace -> by (*).).

:heart:. SGTM.

Yeah. I am aware of the performance challenges the Rust community encountered with Polonius and as mentioned in the RFC’s Performance section, achieving acceptable compile-time impact is a firm prerequisite for considering default enablement of even the permissive mode.

However, C++ and Clang have a key advantage here that Polonius doesn’t: we have the liberty to have false-negatives:

  • Capping Iterations: Unlike Rust, which aims for soundness and completeness in rejecting all invalid programs, we can cap the number of dataflow iterations we do before reaching fixed point and stop early.

  • Impact of Capping (False Negatives, Not False Positives): The analysis exhibits a monotonic property: the set of Loans within an OriginSet (On) at some program point after n iterations is always a subset of the set after n+1 iterations (On ⊆ On+1). This means OriginSets only expand or remain stable as iterations proceed.
    Consequently, if we cap iterations, we might fail to propagate some Loans into their respective OriginSets. If an expired Loan is among those missed for a live OriginSet, a use-after-free might go undetected. This trade-off results in potential false negatives (missing some actual bugs) but, importantly, does not introduce false positives (incorrectly flagging valid code). This characteristic seems vital for broad adoption and default enablement.
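
The effect of capping can be illustrated with a toy propagation loop. This is a sketch under simplifying assumptions, not the actual dataflow algorithm: origins are indexed by number, each edge (src, dst) means the loans of src flow into dst, and propagate and is_subset are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using OriginSet = std::set<int>;                   // loans as integers
using Edge = std::pair<std::size_t, std::size_t>;  // (src, dst)

// Runs at most `max_rounds` propagation rounds and stops early once a
// fixed point is reached. Each round reads the sets of the previous round.
std::vector<OriginSet> propagate(std::vector<OriginSet> origins,
                                 const std::vector<Edge>& edges,
                                 unsigned max_rounds) {
    for (unsigned round = 0; round < max_rounds; ++round) {
        const std::vector<OriginSet> previous = origins;
        bool changed = false;
        for (const Edge& edge : edges)
            for (int loan : previous[edge.first])
                if (origins[edge.second].insert(loan).second) changed = true;
        if (!changed) break;
    }
    return origins;
}

// True if every loan in `a` is also in `b`.
bool is_subset(const OriginSet& a, const OriginSet& b) {
    return std::includes(b.begin(), b.end(), a.begin(), a.end());
}
```

With three origins chained as O0 -> O1 -> O2 and a single loan in O0, one round leaves O2 empty while two rounds place the loan in O2. Loans are only ever inserted, so the result of a capped run is a subset of the result of a longer run. If that loan has expired, the capped run misses the use of O2 (a false negative) but never reports a loan that the full run would not.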

The new analysis, even when configured to perform zero flow-sensitive iterations, should be able to supersede the current statement-local analysis.

The performance of this ‘zero-iteration’ mode will certainly be a key metric and if it proves efficient, it could serve as the baseline replacing the current analysis. Making the number of iterations configurable, potentially defaulting to a very low number (or even zero for an initial rollout if necessary) could also be an option then.

SG.

I’m really excited about this proposal! I think it could help WebKit a lot.

A few things I mentioned in today’s meet-up:

  • I believe this analysis can also enforce the noescape function argument attribute with relatively little additional effort. The strawman proposal is to model a noescape function argument as having an OriginSet that contains a single Loan that originates at the start of the function and invalidates at every function return point.
  • If we enforce noescape, then we can also turn this intra-procedural analysis into an inter-procedural analysis, still based on local reasoning, again with relatively little additional effort. The strawman proposal is: Any value whose OriginSet includes a non-opaque Loan, if passed as a function argument in a parameter slot that has no declared lifetime label (i.e., the declared function parameter has the Opaque origin), is by definition a Potential Dangling Pointer. (Since the dangling pointer is only potential, it probably only emits a warning in strict mode.) The programmer can cure this warning by marking the parameter noescape, [[clang::lifetimebound]], or [[clang::lifetime_capture_by]]. (Of course, the compiler may still signal a different warning if it finds a contradiction with these attributes.)
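
The second strawman rule can be written down as a small predicate. This is only a sketch of the rule as stated in this post; ParamContract and is_potential_dangling_argument are hypothetical names, and whether the argument holds a non-opaque Loan is assumed to be known already.

```cpp
#include <cassert>

// Hypothetical summary of the lifetime contract declared on a parameter.
enum class ParamContract {
    Unannotated,       // no declared lifetime label (Opaque origin)
    NoEscape,          // noescape
    LifetimeBound,     // [[clang::lifetimebound]]
    LifetimeCaptureBy  // [[clang::lifetime_capture_by]]
};

// Strawman rule: a value that holds a known (non-opaque) Loan and is passed
// into an unannotated parameter slot is a Potential Dangling Pointer.
bool is_potential_dangling_argument(bool holds_non_opaque_loan,
                                    ParamContract contract) {
    return holds_non_opaque_loan && contract == ParamContract::Unannotated;
}
```

Any of the three annotations silences this particular warning in the sketch. Checking that the callee actually honors the annotation would be a separate diagnostic.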

Excited to see more lifetime analysis work, thanks for sharing!

Have you considered using the data-flow analysis framework? Curious why it isn’t a good fit for this.

Sorry for being late to the party!
This looks exciting and I totally agree with the sentiment that while there might be challenges ahead it shouldn’t block anyone’s attempt at solving them.

Out of sheer (and possibly premature) curiosity - have you given any thoughts to multidimensional pointers?
Things like int ** ptr_to_ptr; or int * array_of_ptrs[42];

Thanks for bringing this up. This was also briefly discussed in one of the PRs.

It doesn’t work right now in the early phases where the analysis assigns one origin per expression/decl.
The plan is to fix this by moving from a single origin per pointer type to a list of origins. The size of this would be governed by the type. For example,

  • int** a would get two origins [O1, O2]
  • struct S { int * a; int **b; } s; would get three origins [[O1], [O2, O3]] (one for int* and two for int**). We need a stronger integration with the Type here to resolve the expression derived from top level s, e.g. that s.a has origin O1 or *s.b has origin O3.

For int * array_of_ptrs[42];, I expect a single origin to work fine though without array sensitivity.
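
The idea that the number of origins is governed by the type can be sketched with a compile-time trait. This is an illustration under assumptions: OriginCount is a hypothetical name, it only handles unqualified pointers and arrays, and it does not cover the struct case, which would need integration with Clang's type representation.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical trait: one origin per pointer level of a type.
template <class T>
struct OriginCount : std::integral_constant<std::size_t, 0> {};

// Each pointer level adds one origin on top of the pointee's origins.
template <class T>
struct OriginCount<T*>
    : std::integral_constant<std::size_t, 1 + OriginCount<T>::value> {};

// Without array sensitivity, all elements of an array share their origins.
template <class T, std::size_t N>
struct OriginCount<T[N]> : OriginCount<T> {};

static_assert(OriginCount<int>::value == 0, "non-pointers carry no origin");
static_assert(OriginCount<int**>::value == 2, "int** would get [O1, O2]");
static_assert(OriginCount<int* [42]>::value == 1,
              "array elements share a single origin");
```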

Update: We are forming a breakout group for this effort (Lifetime Safety in Clang) under LLVM Memory Safety WG. Feel free to join. Details are posted here.
