Utkarsh Saxena @usx95
Dmytro Hrybenko @gribozavr
Yitzhak Mandelbaum @ymand
Jan Voung @jvoung
Kinuko Yasuda @kinu
Summary
Clang’s current lifetime analysis operates locally within a single statement and cannot track object lifetimes across basic blocks or control-flow constructs.
This RFC proposes a new intra-procedural, flow-sensitive lifetime analysis for Clang to detect a broader class of use-after-scope issues, such as use-after-free and use-after-return, particularly those involving stack-allocated variables. The specific details of the underlying dataflow algorithm are omitted here, as this RFC focuses on the goals, user-visible changes, and high-level approach, rather than serving as a detailed design document.
At its core, this analysis performs a form of points-to analysis based on OriginSets and Loans. An OriginSet is a symbolic identifier associated with a pointer-like object (pointer, reference, view), representing a set of possible Loans it could hold. A Loan represents an act of borrowing from a specific memory location. The underlying dataflow analysis and lifetime model are inspired by Rust’s Polonius borrow checker, adapted significantly for C++ semantics.
This approach tracks the set of Loans within a pointer’s OriginSet across control flow. The analysis respects existing annotations (such as clang::lifetimebound, gsl::Pointer, gsl::Owner). We would use approximations and gradual typing because C++ functions often lack necessary lifetime annotations (like clang::lifetimebound), or sometimes their lifetime contracts are too complex to be fully expressed using the existing annotation system. Consequently, the analysis assigns an ‘Opaque’ (or Unknown) Loan to an OriginSet when a pointer’s source is unclear, particularly after calls to such functions.
The analysis offers different strictness levels (-Wdangling-safety and -Wdangling-safety-permissive). This configuration allows users to control the sensitivity of the warnings issued, managing the trade-off between finding more potential bugs and reducing false positive reports (as detailed in the Permissive vs. Strict Modes section).
This analysis is intended to eventually supersede Clang’s existing statement-local lifetime checker with strictly more powerful capabilities.
C++ Lifetime Model: An alias-based approach
Inspired by Polonius, this analysis uses a points-to technique based on OriginSets and Loans designed for intuitive understanding. Here’s how it works on a high level:
- When a reference or pointer is created (a borrow occurs), it generates a Loan. Each Loan represents borrowing from a specific memory location and is identified by where it was created in the code (borrow site) and the path to the borrowed memory (e.g., var, obj.field, arr[0]).
- For each pointer variable or expression, the analysis tracks its OriginSet. An OriginSet is a symbolic identifier representing the set of possible Loans the pointer could hold at a program point.
- The analysis determines when the lifetime of the memory associated with each Loan expires (e.g., when a local variable goes out of scope).
- It flags an error when a pointer is used at a point where its OriginSet could contain a Loan whose lifetime has already ended.
This focus on tracking the possible sources (Loans) contained within a pointer’s OriginSet and checking their validity upon use aims to make warnings easier to understand and debug than more abstract models (e.g., NLL (non-lexical lifetime) in Rust).
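To make the four steps above concrete, here is a minimal toy model, written as plain C++ data structures. It is only a sketch under our own assumptions: the names Loan, LoanTable, borrow, end_lifetime and use_is_suspicious are hypothetical and are not part of Clang or of the proposed implementation.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names come from Clang.
struct Loan {
  std::string path;  // path to the borrowed memory, e.g. "small" or "obj.field"
  int borrow_site;   // where the borrow happened (a source line in this toy)
  bool expired;      // set once the borrowed storage has gone away
};

using LoanId = int;
using OriginSet = std::set<LoanId>;  // loans a pointer may currently hold

struct LoanTable {
  std::vector<Loan> loans;

  // Step 1: a borrow creates a fresh Loan and returns its id.
  LoanId borrow(const std::string& path, int site) {
    loans.push_back(Loan{path, site, false});
    return static_cast<LoanId>(loans.size()) - 1;
  }

  // Step 3: the storage named by `path` ends its lifetime, so every
  // Loan that borrowed from it expires.
  void end_lifetime(const std::string& path) {
    for (Loan& l : loans)
      if (l.path == path) l.expired = true;
  }

  // Step 4: a use is suspicious if the origin set may hold an expired Loan.
  bool use_is_suspicious(const OriginSet& origin) const {
    for (LoanId id : origin)
      if (loans[static_cast<std::size_t>(id)].expired) return true;
    return false;
  }
};
```

Step 2 (tracking the OriginSet of each pointer) corresponds to the caller keeping one OriginSet per pointer variable and updating it on assignments.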
The analysis tracks the set of Loans in each pointer’s OriginSet through the control flow. In the examples below, comments show the OriginSet of ptr as {…}.
void simple() {
std::string_view ptr; // ptr's origin set is {} (empty)
{
std::string small = "short lived";
ptr = small; // Taking a reference to 'small' creates a loan 'L' with path 'small'.
// ptr's origin set contains Loan L.
} // lifetime of 'small' ends => Loan L expires.
// ptr's origin set is {<expired L>}
std::cout << ptr; // UaF: origin set contains expired loan L.
}
Origin sets merge at join points in the CFG.
void branch(bool condition) {
std::string large = "long lived";
std::string_view ptr = large; // Loan L_large is created; ptr's origin set is {L_large}
if (condition) {
std::string small = "short lived";
ptr = small; // Loan L_small is created. ptr's origin set is {L_small}
} // L_small expires
// Origin sets merge: {L_large, <expired L_small>}
std::cout << ptr; // UaF: origin set potentially contains expired loan L_small.
}
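The merge at the end of the if statement above can be pictured as a per-variable set union. The following is a sketch only, assuming a state that maps pointer names to origin sets; State and join are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using LoanId = int;
using OriginSet = std::set<LoanId>;
// Hypothetical per-program-point state: pointer name -> origin set.
using State = std::map<std::string, OriginSet>;

// Join two predecessor states at a CFG merge point by taking,
// for every pointer, the union of the loans seen on either path.
State join(const State& a, const State& b) {
  State out = a;
  for (const auto& entry : b)
    out[entry.first].insert(entry.second.begin(), entry.second.end());
  return out;
}
```

With L_large as loan 1 on the fall-through path and L_small as loan 2 on the branch, joining the two states gives ptr the origin set {1, 2}, which is why the later use is flagged once L_small has expired.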
Reassignments overwrite the origin set.
void reassignments(bool condition) {
std::string large = "long lived";
std::string_view ptr = large; // Loan L_large is created; ptr's origin set is {L_large}
if (condition) {
std::string small = "short lived";
ptr = small; // Loan L_small is created; ptr's origin set is {L_small}
} // L_small expires.
// Origin sets merge: {L_large, <expired L_small>}
ptr = large; // New loan L_large2 is created with path 'large' at this borrow site.
// Reassignment: ptr's origin is now just {L_large2}.
// The potential link to '<expired L_small>' is removed.
std::cout << ptr; // Ok.
}
Pointer assignment propagates the origins.
void pointer_assignments() {
std::string_view ptr1; // ptr1's origin is {}
{
std::string small = "short lived";
std::string_view ptr2; // ptr2's origin is {}
ptr2 = small; // L_small; ptr2's origin set is {L_small}
ptr1 = ptr2; // Assignment: ptr2 flows into ptr1.
// ptr1's and ptr2's origin is {L_small}
}
std::cout << ptr1; // UaF; origin contains expired loan L_small.
}
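The two assignment rules shown in the last two examples (a fresh borrow overwrites, a pointer copy propagates) could be sketched as transfer functions over the same kind of state. This is an illustration under our own assumptions; assign_borrow and assign_copy are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using LoanId = int;
using OriginSet = std::set<LoanId>;
using State = std::map<std::string, OriginSet>;  // pointer name -> origin set

// `ptr = <borrow>`: the previous origin set is discarded and replaced
// by the single new loan.
void assign_borrow(State& s, const std::string& ptr, LoanId loan) {
  s[ptr] = OriginSet{loan};
}

// `dst = src`: dst's origin set becomes a copy of src's origin set.
void assign_copy(State& s, const std::string& dst, const std::string& src) {
  OriginSet copy = s[src];
  s[dst] = copy;
}
```

Because assign_borrow replaces rather than extends the set, a stale expired loan no longer reaches later uses, matching the reassignments example.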
Output origin covers input origins resulting in a union.
When a function has [[clang::lifetimebound]] parameters, its return value’s Origin is constrained by the Origins of those parameters. For functions like below, this means the return Origin effectively contains the union of Loans from all lifetimebound input Origins.
std::string_view max(std::string_view a [[clang::lifetimebound]],
std::string_view b [[clang::lifetimebound]]);
void form_subsets() {
std::string a = "a";
std::string b = "b";
std::string_view ptr1 = a; // Loan La; ptr1's origin is {La}
std::string_view ptr2 = b; // Loan Lb; ptr2's origin is {Lb}
std::string_view ptr3 = max(ptr1, ptr2);
// ptr1's origin is a subset of ptr3.
// ptr2's origin is a subset of ptr3.
// => ptr3's origin is {La, Lb}
}
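One way to picture the rule for calls is a helper that unions the origin sets of the arguments bound to lifetimebound parameters. This is a sketch with assumed names (call_result_origin and the is_lifetimebound flag vector are hypothetical), not the transfer function Clang would actually use.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using LoanId = int;
using OriginSet = std::set<LoanId>;

// Hypothetical call rule: the result's origin is the union of the origins
// of every argument whose parameter is marked lifetimebound.
OriginSet call_result_origin(const std::vector<OriginSet>& args,
                             const std::vector<bool>& is_lifetimebound) {
  OriginSet result;
  for (std::size_t i = 0; i < args.size(); ++i)
    if (i < is_lifetimebound.size() && is_lifetimebound[i])
      result.insert(args[i].begin(), args[i].end());
  return result;
}
```

For max(ptr1, ptr2) above, with La as loan 0 and Lb as loan 1 and both parameters annotated, the helper returns {0, 1}. If only the first parameter were annotated, the result would be {0}.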
Opportunistic Bug finding
Inner types (Structs, Containers): While the core model focuses on Origins associated with top-level variables and expressions (pointers, references, views), we also aim to provide opportunistic bug finding for common patterns involving pointers within aggregate types (struct members, std::pair) or containers (e.g., std::vector).
This approach relies on heuristics and specific knowledge of common types, similar to the existing statement-local analysis (e.g., container of pointers). It is less general than a system with full support for Rust-like lifetime parameters on type definitions but allows catching important classes of bugs today. As Clang potentially adopts more explicit lifetime annotations for types, the reliance on this special handling would diminish.
struct S {
std::string_view a; // Member 'a' has Origin Oa
std::string_view b; // Member 'b' has Origin Ob
};
S return_struct_with_local() {
std::string local_str = "local";
S s; // Instance 's' created. Origins Oa={}, Ob={} initialized.
s.a = global_str; // Oa = {L_global}
s.b = local_str; // Ob = {L_local}
return s; // Returning 's'.
// L_local expires.
// Oa contains {L_global} => Ok
// Ob contains {L_local} => UaR.
}
std::vector<std::string_view> return_vector_with_local() {
std::vector<std::string_view /*Inner origin Oi*/> v; // Oi = {}
std::string local = "local";
// vector::push_back(T) is [[clang::lifetime_capture_by(this)]];
v.push_back(local); // Loan L_local to 'local'.
// Oi = {L_local}.
v.push_back(global); // Loan L_global to 'global'.
// Oi = {L_local, L_global}.
return v; // Returning 'v' associated with Oi containing loan L_local.
} // End scope: L_local expires.
Permissive (-Wdangling-safety-permissive) vs. Strict (-Wdangling-safety) Modes
This lifetime analysis reports potential issues under two different warning flags, -Wdangling-safety-permissive (permissive mode) and -Wdangling-safety (strict mode), corresponding to the analysis’s confidence that a true bug exists.
The core analysis tracks each pointer’s Origin set (the set of Loans the pointer might hold). The difference between the permissive and strict modes then lies in their reporting criteria: the permissive mode typically reports only if the pointer must be dangling (a “must-analysis”), whereas the strict mode reports if the pointer may be dangling (a “may-analysis”).
Warning Trigger Conditions:
We issue a warning under the following conditions:
- Loan Expiry: A Loan L, representing a borrow, created at point P_borrow, expires at point P_expire (e.g., stack variable goes out of scope).
- Potential Dangling Pointer: At P_expire, any pointer Ptr whose Origin O_ptr contains the (now expired) Loan L is considered potentially dangling.
- Liveness and Use: A diagnostic is generated only if such a pointer Ptr is potentially used at a later point P_use (meaning Ptr is “live” at P_expire).
Determining the warning group:
- Permissive: If all loans in Ptr’s origin O_ptr at P_expire refer to expired memory, a -Wdangling-safety-permissive warning is issued. The diagnostic is attached to the borrow location (P_borrow) with a note indicating the problematic use at P_use. This signifies a high-confidence bug according to the analysis. This mode results in fewer false positives but may miss bugs specific to certain paths.
- Strict: If the Loan L is expired, but there are other loans in Ptr’s origin O_ptr at P_expire which are still valid, then a -Wdangling-safety warning is issued. The diagnostic is still attached to P_borrow with a note at the use P_use. This mode prioritizes safety: it catches dangerous code patterns where the analysis indicates a use-after-free could occur via some path. This may include paths that are dynamically unreachable due to program logic, so this mode will have more false positives.
std::string global_str = "STATIC";
std::string_view permissive() {
std::string local = "local";
std::string_view view = local; // P_borrow: error: 'local' doesn't live long enough [-Wdangling-safety-permissive]
// P_expire: 'local' expires.
return view; // P_use: note: returned here.
}
std::string_view strict(bool condition) {
std::string local = "local";
std::string_view view = global_str;
if (condition) {
view = local; // P_borrow: error: 'local' doesn't live long enough [-Wdangling-safety]
}
// P_expire: 'local' expires after return.
return view; // P_use: note: returned here.
}
This allows users to choose between broader detection (strict) and higher confidence with less noise (permissive).
Note: The warning group -Wdangling-safety implies and subsumes -Wdangling-safety-permissive.
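The choice of warning group described above amounts to comparing how many loans in a live pointer's origin have expired. The sketch below is illustrative only; Diagnostic and classify are hypothetical names, and the real analysis may organize this decision differently.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

using LoanId = int;
using OriginSet = std::set<LoanId>;

enum class Diagnostic { None, Permissive, Strict };

// `origin` is the origin set of a pointer that is live at P_expire;
// `expired` holds the loans that have expired by that point.
Diagnostic classify(const OriginSet& origin, const std::set<LoanId>& expired) {
  std::size_t dead = 0;
  for (LoanId id : origin)
    if (expired.count(id) != 0) ++dead;
  if (dead == 0) return Diagnostic::None;                       // nothing dangling
  if (dead == origin.size()) return Diagnostic::Permissive;     // must dangle
  return Diagnostic::Strict;                                    // may dangle
}
```

In the permissive() example the origin of view is {L_local} and L_local has expired, so every loan is dead and the result is Permissive. In strict() the origin is {L_global, L_local}, only one of which has expired, so the result is Strict.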
Gradual typing: Opaque / Unknown Semantics
It’s common to call functions where the lifetime relationship between inputs and outputs isn’t explicitly declared (e.g., missing [[clang::lifetimebound]]) or is too complex to be expressed using existing clang annotations.
When the analysis encounters a pointer or reference initialized from such an “opaque” source:
- It cannot determine the true Loan(s) that should be associated with the pointer’s Origin, nor their actual lifetime dependencies.
- It assigns a special “Opaque Loan” to the pointer’s Origin.
This is a conservative approximation to avoid false positives. Since the analysis doesn’t know when the memory backing the opaque pointer actually becomes invalid, it optimistically assumes it remains valid for the entire duration of the current function regardless of the strictness modes mentioned above.
std::string_view opaque_view();
void foo() {
std::string_view x; // Ox = {}
x = opaque_view(); // Ox = {Opaque}
}
void store(std::string_view* output);
void foo() {
std::string_view x; // Ox = {}
store(&x); // Ox = {Opaque}
}
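As a rough illustration of the Opaque Loan, one could model it as a sentinel loan that is never treated as expired. Everything here is an assumption made for the sketch: kOpaque, origin_after_opaque_call and may_dangle are hypothetical names, and flagging an origin that holds both the Opaque loan and an expired known loan is our reading of the may-analysis, not something the RFC states.

```cpp
#include <cassert>
#include <set>

using LoanId = int;
using OriginSet = std::set<LoanId>;

// Hypothetical sentinel for the Opaque loan. It is never considered expired
// within the current function.
constexpr LoanId kOpaque = -1;

// Origin assigned to a pointer produced by a call with an unknown contract.
OriginSet origin_after_opaque_call() { return OriginSet{kOpaque}; }

// True if the origin may hold an expired, known loan. The Opaque loan is
// skipped, mirroring the optimistic assumption that it stays valid.
bool may_dangle(const OriginSet& origin, const std::set<LoanId>& expired) {
  for (LoanId id : origin)
    if (id != kOpaque && expired.count(id) != 0) return true;
  return false;
}
```

A pointer whose origin is only {Opaque} is therefore never reported, whatever else expires in the function.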
Future enhancements
- Annotation Suggestions: Diagnostics suggesting the addition of [[clang::lifetimebound]] where the analysis detects lifetimes escaping functions, helping users improve function contracts.
- Annotation Verification: Diagnostics verifying that function implementations adhere to their declared [[clang::lifetimebound]] contracts, catching mismatches between declaration and behavior.
- Pointer/Iterator Invalidation: Extend the analysis beyond scope-based expiry to detect invalidation caused by operations modifying underlying objects or containers (e.g., std::string reassignment, std::vector::push_back). This could involve modeling limited forms of exclusivity (aliasing XOR mutability) for specific standard library types to catch common bugs like iterator invalidation.
Relation to Rust-like Lifetimes
This analysis draws inspiration from Rust’s Polonius borrow checker. While adapted for C++'s semantics (e.g., handling opaque calls, no enforced exclusivity, configurable strictness), its internal model still uses concepts like Loans and Origins, which are analogous to the formulation of lifetimes in Rust’s Polonius.
This notion of Origins and Loans serves a role closely related to the lifetime/origin tracking in Polonius, providing a conceptual bridge. This proposal develops the necessary CFG-based dataflow infrastructure that could also support more explicit, Rust-style lifetime systems if they were introduced to Clang.
If Clang evolves to include Rust-like lifetime annotations (e.g., annotating pointers and references with fine-grained lifetimes like T& [[clang::lifetime(a)]]), this analysis framework is positioned to directly consume them. User-provided annotations could then directly inform the calculation of which Loans belong to which Origins. For instance, explicit “outlives” constraints (like 'a: 'b in Rust) would translate directly to subset relations between the corresponding Origins, which this analysis could then enforce. This would replace current approximations (like ‘Opaque Loans’ for unknown function calls or heuristics for unannotated parameters) and significantly increase the precision of the analysis. Furthermore, explicit lifetime parameters on types could reduce the need for special handling of nested pointers/views (e.g., within containers or structs, as discussed previously in “Opportunistic Bug Finding”) by making their lifetime dependencies clear.
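To illustrate how subset relations between Origins might be enforced, the sketch below propagates loans along a list of constraints until nothing changes. It is a hypothetical model: SubsetConstraint and propagate are names invented here, and origins are keyed by plain strings for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using LoanId = int;
using OriginSet = std::set<LoanId>;
using Origins = std::map<std::string, OriginSet>;

// A pair (sub, super) demands: loans(sub) is a subset of loans(super).
using SubsetConstraint = std::pair<std::string, std::string>;

// Grow the origin sets until every constraint holds. This terminates because
// sets only grow and the number of loans is finite.
void propagate(Origins& origins, const std::vector<SubsetConstraint>& cs) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const SubsetConstraint& c : cs) {
      const OriginSet from = origins[c.first];  // copy before touching `to`
      OriginSet& to = origins[c.second];
      const std::size_t before = to.size();
      to.insert(from.begin(), from.end());
      if (to.size() != before) changed = true;
    }
  }
}
```

Chained constraints are handled by the outer loop: if a flows into b and b flows into c, loans of a end up in c regardless of the order in which the constraints are listed.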
However, the analysis described here delivers value independently for today’s C++ and does not rely on the adoption of Rust-style lifetimes.
RFC
Apart from the overall direction of introducing a more powerful, function-local lifetime analysis in Clang, we also seek feedback on the following points:
Warning Flags and Naming
We propose to add this analysis to Clang under two warning flags:
- -Wdangling-safety to report potentially unsafe patterns at the cost of false positives.
- -Wdangling-safety-permissive to report only high-confidence warnings suitable for broad adoption with minimal false positives.
Other naming schemes considered:
- Focusing on “strict”: -Wdangling-cfg (for permissive) and -Wdangling-safety-strict (for strict)
  - The question is, what do we want users to believe our suggested default is. The shorter name without additional qualifiers is more likely to be perceived as the recommended setting. -Wdangling-cfg-permissive sounds like a compatibility mode for legacy codebases, and people should strive to move towards the strict one (-Wdangling-cfg).
- Using a different base name: e.g., -Wtemporal-safety.
  - We could keep the immediate flags specific to “dangling” issues (like the proposed -Wdangling-safety) and potentially introduce an umbrella flag like -Wtemporal-safety in the future. This umbrella could then enable a suite of distinct temporal safety warnings, including this one and others (e.g., a separate warning for iterator invalidation).
Experimental prefix:
- This lifetime analysis will be developed incrementally, and a warning with the experimental prefix -Wexperimental-dangling-safety will be used until it is ready for general use, preventing user disappointment with an incomplete feature.
Default Enablement
- Permissive mode would be enabled by default once we confirm that the false positive rate is sufficiently low and there is a reasonable compile time impact on typical codebases.
- Strict mode would be off by default. (Strict warnings could also potentially be surfaced via ClangTidy).
Code structure
- The plan is to add the analysis under clang/lib/Analysis/ with its entry point in clang/lib/Sema/AnalysisBasedWarnings.cpp (like other CFG-based analyses, e.g., thread-safety analysis).
- We would build the analysis on top of Clang’s CFG.
Performance
Performance impact is a key consideration and will be monitored closely during development. While the underlying dataflow analysis approach is expected to be manageable for typical C++ functions, for particularly complex cases (e.g.), we have the option to cap the analysis after a certain number of iterations per function. To maintain reasonable compile times, this bug-finding tool might miss some issues in extremely complex functions, which is an acceptable compromise.
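A capped fixpoint iteration of the kind mentioned above could look roughly like the following worklist loop. This is a sketch under stated assumptions: Block, Result, solve and the max_steps parameter are hypothetical, the state is reduced to a single origin set per block, and a real implementation would carry a richer state and choose its own cap.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>
#include <vector>

using LoanId = int;
using OriginSet = std::set<LoanId>;

struct Block {
  OriginSet gen;                   // loans this block adds to the tracked origin
  std::vector<std::size_t> succs;  // indices of successor blocks
};

struct Result {
  std::vector<OriginSet> in;  // state at the entry of each block
  bool converged;             // false if the step cap was reached first
};

// Forward may-analysis over a toy CFG, giving up after `max_steps` block visits.
Result solve(const std::vector<Block>& cfg, std::size_t max_steps) {
  Result r{std::vector<OriginSet>(cfg.size()), true};
  std::deque<std::size_t> worklist;
  for (std::size_t i = 0; i < cfg.size(); ++i) worklist.push_back(i);

  std::size_t steps = 0;
  while (!worklist.empty()) {
    if (steps++ == max_steps) {  // cap hit: stop and report partial results
      r.converged = false;
      break;
    }
    const std::size_t b = worklist.front();
    worklist.pop_front();
    OriginSet out = r.in[b];
    out.insert(cfg[b].gen.begin(), cfg[b].gen.end());
    for (std::size_t s : cfg[b].succs) {
      const std::size_t before = r.in[s].size();
      r.in[s].insert(out.begin(), out.end());
      if (r.in[s].size() != before) worklist.push_back(s);  // state grew: revisit
    }
  }
  return r;
}
```

When converged is false, the partial states under-approximate the full result, so the analysis may miss issues in that function, which is the trade-off described above.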
Appendix: More examples
std::string_view
Lifetimebound(std::string_view str [[clang::lifetimebound]]);
std::string_view foo(bool cond) {
std::string local;
std::string_view view;
view = Lifetimebound(local);
return view; // error: returning reference to stack variable 'local'.
}
int* result;
if (std::unique_ptr<int> ptr = create(); ptr != nullptr) {
result = ptr.get(); // error: 'result' points to 'ptr' which doesn't live long enough.
}
use(result); // note: later used here.
Lifetime_capture_by(X)
struct S {
void set(std::string_view x [[clang::lifetime_capture_by(this)]]) { view = x; }
std::string_view view;
};
void foo() {
S s;
if (condition) {
std::string local;
s.set(local); // error: 's' captures 'local' which doesn't live long enough.
}
use(s); // note: later used here.
}
Container of views: Vector
void foo() {
std::vector<std::string_view> views;
if (condition) {
std::string local;
views.push_back(local); // error: 'views' captures 'local' which doesn't live long enough.
}
use(views); // note: later used here.
}
Container of views: Maps
absl::flat_hash_map<std::string_view, int> views;
for (...) {
std::string local;
// UaF only if it's inserting the key but we may choose to always error.
auto& v = views[local]; // error: captures 'local' which doesn't live long enough.
}
use(views); // note: later used here.
Member Pointers
struct S {
std::string_view a;
std::string_view b;
};
void foo() {
std::string safe;
S s;
s.a = safe;
if (condition) {
std::string unsafe;
s.b = unsafe; // error: 's.b' points to 'unsafe' which doesn't live long enough.
}
use(s); // note: later used here.
}
// Store a pointer to small scope local object in a member pointer.
class S {
void foo() {
std::string local;
view_ = local; // error: 'view_' points to 'local' which doesn't live long enough.
}
private:
std::string_view view_;
};
Lambdas and callbacks
Async functions accepting callbacks can be annotated with lifetime_capture_by(this).
thread::Scheduler scheduler;
if (condition) {
std::string local = "blah";
scheduler.Add([&]() { return use(local); }); // error: 'scheduler' captures 'local' which doesn't live long enough.
}
scheduler.Join(); // note: later used here.
Suggest lifetime annotations
These annotations remain the only way to convey lifetimes across function boundaries and high-quality suggestions have previously helped us uncover several bugs.
std::string_view TrimPrefix(std::string_view in [[clang::lifetimebound]]);
std::string_view TrimSuffix(std::string_view in [[clang::lifetimebound]]);
std::string_view Trim(std::string_view in) { // error: missing lifetimebound on 'in'.
return TrimPrefix(TrimSuffix(in));
}
Pointer/Iterator invalidation
std::vector<int> v = {1, 2, 3, 4};
auto it = std::find(v.begin(), v.end(), 1);
v.push_back(5); // error: 'it' is not valid anymore.
use(*it); // use-after-free