[RFC] Lifetime annotations for C++

Excellent, I was hoping that was the case.

Yup! Changing the diagnostic behavior isn’t an issue (unless you’re reversing the diagnostic logic entirely or something super invasive); changing the syntax so user code no longer compiles even when using the macros would be more of an issue. It sounds like we’re on the same page with our goals.

Thanks!

I had one above. The RFC claims safe wrappers can be automatically generated thanks to the annotations, as an example of the better interoperability with Rust. However, without extra information, this is not possible.

In fact, this issue appears even for functions that do not involve lifetimes. For instance, consider this C++ function signature:

void f();

From the signature alone, it is not possible to know whether the function can be used to introduce undefined behavior. Even having access to f’s implementation may not be enough to determine whether it is safe (sometimes it is enough, but not in general).

This is not to say lifetime annotations cannot be useful for better interoperability, but they cannot be used to automatically determine the safety of a function.
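To make the `void f();` point concrete, here is a minimal sketch. The names g_slot, set_slot and check_safe_use are made up for illustration and do not come from the RFC; the only point is that a signature with no pointers or references at all can still hide a dependency on some object’s lifetime.

```cpp
#include <cassert>

// Hypothetical global state, invented for this illustration.
static int* g_slot = nullptr;

// Remembers a pointer; f() now depends on the pointee staying alive.
void set_slot(int* p) { g_slot = p; }

// Nothing in this signature mentions a pointer, a reference, or a lifetime,
// yet calling it is undefined behavior if g_slot dangles.
void f() {
  if (g_slot != nullptr) {
    *g_slot += 1;
  }
}

// Well-defined use: the pointee outlives the call to f().
void check_safe_use() {
  int x = 0;
  set_slot(&x);
  f();
  set_slot(nullptr);  // Clear the slot before x goes out of scope.
  assert(x == 1);
}
```

If the call to set_slot(nullptr) were forgotten, a later call to f() would write through a dangling pointer, and no annotation on f’s signature could express that.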

Thanks – you raise a valid point: The lifetime annotations by themselves do not guarantee the safety of the C++ function being called.

As discussed in this section, the lifetime checker can catch some, but not all temporal memory safety errors, and it doesn’t help at all with spatial memory safety. So while this proposal improves memory safety in C++ and can help with Rust/C++ interop, it is not a complete solution.

Thanks for the reply!

Yes, that section looks fine, but to be clear, I am referring to Appendix D:

This is important because one of the main value propositions of Rust is that safe code is guaranteed to be memory-safe, which this would undermine.

For instance, the example given in the RFC is 100% safe Rust, yet could have UB because C++ could introduce it:

Thanks – that’s a good point. The safety of the code being called needs to be verified independently of what we do for the lifetimes on the interface.

First a pragmatic question: What do we call this proposal/endeavor/analyzer? “Lifetime annotations for C++”? Is that too presumptive at this stage?

Also, in the “Alternative annotation syntax using only [[clang::annotate]]” section, for function annotation you propose:

[[clang::annotate("function_lifetimes", "a, a -> a")]]
const std::string& smaller(const std::string& s1, const std::string& s2);

But [[clang::annotate]] can be appended (or prepended) to the parameter declaration directly. So instead you could do something like:

[[clang::annotate("function_lifetimes", "-> a")]]
const std::string& smaller(const std::string& s1 [[clang::annotate("lifetime", "a")]], const std::string& s2 [[clang::annotate("lifetime", "a")]]);

While more verbose, it specifies the parameter lifetime label at the location of the parameter itself, like your [[clang::annotate_type]] proposal (and Rust) does. Is there a reason not to prefer (or at least additionally support) this syntax?
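A compilable sketch of the per-parameter form might look like the following. The macro names FUNCTION_LIFETIMES and PARAM_LIFETIME are hypothetical (they are not from the RFC), the body of smaller is assumed for illustration, and the macros expand to nothing on compilers other than Clang so that user code keeps compiling.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper macros, not part of the RFC. On Clang they expand to
// [[clang::annotate]] (assuming a Clang version whose annotate attribute
// accepts extra arguments); elsewhere they expand to nothing.
#if defined(__clang__)
#define FUNCTION_LIFETIMES(spec) [[clang::annotate("function_lifetimes", spec)]]
#define PARAM_LIFETIME(name) [[clang::annotate("lifetime", name)]]
#else
#define FUNCTION_LIFETIMES(spec)
#define PARAM_LIFETIME(name)
#endif

// The return value borrows from s1 or s2, so all three share lifetime "a".
FUNCTION_LIFETIMES("-> a")
const std::string& smaller(const std::string& s1 PARAM_LIFETIME("a"),
                           const std::string& s2 PARAM_LIFETIME("a")) {
  return s2 < s1 ? s2 : s1;  // Assumed implementation: lexicographic minimum.
}

void check_smaller() {
  std::string a = "apple";
  std::string b = "pear";
  // The result is a reference to one of the arguments, not a copy.
  assert(&smaller(a, b) == &a);
  assert(&smaller(b, a) == &a);
}
```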

Btw, are you expecting to implement the “Alternative annotation syntax using only [[clang::annotate]]”, or just the “non-alternative” [[clang::annotate_type]] syntax?

And just to clarify, the example from the “Lifetimes in template arguments” section of the proposal:

int* $a get_first(const std::vector<int* $a>& $b v) {
  return v.at(0);
}

But if we consider the case where a type alias is used, would the “non-template” annotation syntax still be available?:

typedef std::vector<int*> my_vector_t;
int* $a get_first(const my_vector_t& $b $a v) {
  return v.at(0);
}

It seems to me it would need to be. Template arguments might also not be explicitly present in cases where the type is expressed as auto or decltype(...) or a deduced template parameter or whatever, right?

We use “lifetime annotations” in the title because we feel it describes the proposal well: It uses annotations, and those annotations contain lifetimes. It’s not a borrow checker, as we make clear in the RFC, and unlike Rust it does not protect against all types of memory bugs.

Thanks, that’s a good idea. It makes it easier to see which parameters the lifetimes refer to. I think we should use this syntax if we can’t use [[clang::annotate_type]] (but see below).

We do expect to use the [[clang::annotate_type]] syntax. The proposal to add the attribute to Clang has not met with any objections, and the implementation is currently in code review.

Type aliases for a type containing pointers or references will need to define a lifetime parameter, similar to classes that contain pointers or references. Your example would look like this:

LIFETIME_PARAM(p) typedef std::vector<int* $p> my_vector_t;
int* $a get_first(const my_vector_t $a & $b v) {
  return v.at(0);
}

Correct. In these cases, we will perform lifetime inference for any pointers or references contained in those types.

Sorry I wasn’t clear. I wasn’t questioning the appropriateness of the title of the RFC, I was questioning the adoption of the title as a moniker for the project in general. In particular I’m looking for a term that distinguishes this work from P1179/-Wlifetime/-Wdangling-gsl. If we adopt the term “Lifetime annotations for C++” for this work, would we be implying that none of the annotations used in P1179 qualify as “lifetime annotations for C++”? Or would we be implying that those annotations are not (or no longer) relevant enough to cause any ambiguity? And would we be implying that this design is so satisfactory that we are confident that no future annotation syntax candidate will emerge to compete for the title of “Lifetime annotations for C++”? While I wouldn’t argue against this last implication, arguably the -Wlifetime project made a similar presumption by staking that compiler flag.

In the early days, well before the P1179 paper, for lack of a better term, that work was sometimes referred to as “Herb’s lifetime thing”. Personally, I can live with “@martinboehme’s lifetime thing”, but I just thought I’d ask if there was a preferred alternative term. 🙂

Ok, so basically emulating Rust’s support for lifetime annotations in type aliases. That makes sense.

Hmm, I wonder if just “Rust-style lifetime annotations for C++” will end up being the de facto descriptive term for this project.

Oh I see. So the inferencing will be done by a separate tool that outputs modified source files? Or is it just an internal step in the static analysis? I suppose either would work.

In the case of the former, can I request (the option of) some sort of annotation to indicate places where the inference fails (as opposed to just determining that annotations aren’t needed)? I’m thinking this could allow for further processing (possibly including the addition of run-time checks or safety mechanisms).

Btw, is there some sort of public channel to keep up with the project, or is it too early for that?

Ah, thanks for the clarification – now I understand what you were asking.

Agree that we need a term that’s more specific than “lifetime annotations for C++”.

I think that’s not a bad candidate. I don’t have anything better to offer at the moment, but maybe we’ll find a more succinct term over time.

It depends – both can happen, depending on the circumstances.

For many parts of the code, inferencing happens in a separate step, and the tool that does the inferencing outputs the inferred lifetimes as annotations.

But in some cases, inferencing has to be done “on the fly” as one step of a larger static analysis because there is nowhere that the inferred lifetimes can be written to. Typically, this happens with templates. Consider this simple example:

template <class T>
T id(T t) {
  return t;
}

int* foo(int* p) {
  return id(p);
}

The lifetimes we infer for foo() are, pretty obviously, int* $a foo(int* $a p);, and we can write those into the source code as annotations.

As part of inferring the lifetimes for foo(), we also need to infer lifetimes for id<int*>(). We can write those as int* $a id<int*>(int* $a);, but there’s no place in the source code to actually spell those annotations out because id<int*> is only instantiated implicitly (and we don’t want to add explicit instantiations all over the place). So we have to infer the lifetimes for id<int*>() whenever we see a call to it. Fortunately, because of the way templates work, we know we’ll be able to see the definition of the template, so we’ll be able to do inference, but it does mean we will redo this inference in every translation unit that uses id<int*>(), similarly to how we have to compile it in every translation unit.
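For illustration, the sketch below shows the one declaration-like place where an annotation for id<int*> could in principle be written, namely an explicit instantiation. Requiring these everywhere is exactly what the approach described above avoids. The lifetimes appear only in comments because the $a syntax is not valid C++ without the proposed macros, and check_alias is a made-up helper.

```cpp
#include <cassert>

template <class T>
T id(T t) {
  return t;
}

// Explicit instantiation of id<int*>. Conceptually its inferred signature is
//   int* $a id<int*>(int* $a);
// but real code rarely spells instantiations out like this.
template int* id<int*>(int*);

// Inferred: int* $a foo(int* $a p);
int* foo(int* p) {
  // Without the explicit instantiation above, this call would instantiate
  // id<int*> implicitly, and inference would be redone in every translation
  // unit that makes such a call.
  return id(p);
}

// The returned pointer aliases the argument, which is what sharing $a expresses.
void check_alias() {
  int x = 42;
  assert(foo(&x) == &x);
}
```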

This is a good point, and something we haven’t covered in the RFC. When inference fails, we are planning to add a special “unsafe” lifetime to the signature of the function; this would be akin to unsafe pointers in Rust and would signal that the function does not make any lifetime guarantees to the caller. The user would be able to change those annotations manually if they can guarantee that the function does actually have a lifetime contract that can be expressed with the annotations.

Not yet – this forum post is all there is for the time being. I assume we will need a different communication medium at some point though.

A good place to put this code might be in clang/lib/Analysis, then have it called from clang::sema::AnalysisBasedWarnings::IssueWarnings in clang/lib/Sema/AnalysisBasedWarnings.cpp. We already have the CFG available there, and the existing warnings there work in a similar way.

Maybe I’m misunderstanding this. To me it seems like you’re putting the attribute on the reference type, but isn’t the lifetime a property of the pointee, i.e. shouldn’t it be const std::string $a &?

Sorry, only saw your response today.

Because our approach is still experimental, we’re proposing to add the checks only to Clang-Tidy for the time being, not Clang itself. See #10 for more discussion of this.

Yes, the lifetime in question is the lifetime of the pointee, but the property that we’re annotating is still a property of the reference / pointer, namely that it references a pointee of a given lifetime. To put it another way: It only makes sense to put the annotation on references or pointers (or types with user-declared lifetime parameters) because they can refer to different pointees, with potentially different lifetimes. An annotation int $a i would not make any sense because the lifetime of i is known anyway from its static scope.

Saw the discussion, but I’m not sure what being experimental has to do with this. I’m not aware that we’re using clang-tidy for experimentation; I’ve always seen it as a linter-like tool to diagnose (and fix) non-idiomatic code. It’s going to be in the same git repository anyway, and regular users should generally not notice anything if the new analysis is hidden behind a flag.

Sure, but isn’t it a type attribute instead of a declaration attribute? If you dereference a pointer of type int* $a, the attribute will disappear (as it sits on the pointer type) and you’ll get an int lvalue without lifetime annotation. But you might still want to know the lifetime because that lvalue could be passed by reference to another function, right? If it was a property of the pointee, dereferencing would yield int $a as type.

So my confusion here is that these are (as I understand it) type attributes, but treated like declaration attributes. It’s not just declarations that have a type, expressions do so as well, and a non-reference type lvalue does not have to have automatic storage duration.

There are also pointers to pointers, and their pointees might have different lifetimes. Seems a bit contrived I guess, but I think it shows that as a type attribute this belongs to the pointee.

If you think of this as a natural way to prevent annotations on local variables or non-pointers, I don’t think that’s a good motivation. Declarations with automatic storage duration might simply not allow an outer-level lifetime annotation, because we can derive it. (That is, you could diagnose that as ignored, we do something similar in -Wthread-safety-attributes.)

I think that’s where I don’t follow. We also write const int*, even though one might say it’s a property of the pointer that it references an immutable pointee. If we parse “it references a pointee of a given lifetime”, we get a verb “references” with an object “a pointee” and a prepositional phrase “of a given lifetime” attached to the “pointee”. That’s pretty much what I think the attributes should also be doing, i.e. instead of

AttributedType 'int * $a' sugar
`-PointerType 'int *'
  `-BuiltinType 'int'

we should have

PointerType 'int $a *'
`-AttributedType 'int $a' sugar
  `-BuiltinType 'int'

Note also that top-level const qualifiers on parameters and return types are sometimes ignored. (To be fair, that’s a minor source of confusion, but doesn’t warrant putting const anywhere else in my view.)

A good example for pointer attributes are nullability attributes. These are really properties of the pointer and have nothing to do with the pointee.

It was my impression that Clang has more stringent requirements for false positives and performance than Clang-Tidy. This is why we’re proposing making this a Clang-Tidy check initially, and potentially migrating it to Clang later on. I’d appreciate input from others on this, including @AaronBallman who was involved in previous discussion around this.

Yes, the idea of a Clang-Tidy check being “experimental” is new, and there’s quite a bit of discussion around this up-thread (see this comment and the replies to it). One of the things that emerged from that discussion was the idea of introducing a new experimental category of Clang-Tidy checks.

This is a question of how a static analysis would do lifetime checking. To do that, you need to do much more than propagate lifetime annotations through expressions. Consider for example that an expression might refer to objects of different lifetimes depending on control flow. Our prototype tooling does a flow-sensitive pointer analysis and uses the result to infer lifetimes; lifetime verification would use the same underlying machinery.

Agreed. Our pointer analysis associates a points-to set with every glvalue (as well as every prvalue of pointer type).
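A small example of why flow sensitivity matters: in the function below (made up for illustration, not taken from the RFC), the object that p refers to at the return statement depends on which branch ran. An analysis therefore has to track a points-to set such as {*a, *b} for p, rather than a single annotation carried along by the type of the expression.

```cpp
#include <cassert>

// Hypothetical example. With the proposed syntax, the inferred signature would
// presumably give a, b and the return value one common lifetime.
const int* pick(bool flag, const int* a, const int* b) {
  const int* p = a;  // points-to set of p: {*a}
  if (flag) {
    p = b;           // on this path: {*b}
  }
  return p;          // after the join: {*a, *b}
}

void check_pick() {
  int x = 1;
  int y = 2;
  assert(pick(false, &x, &y) == &x);
  assert(pick(true, &x, &y) == &y);
}
```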

In our proposed annotation, a double pointer with different lifetimes at different levels would be annotated as, for example, int * $a * $b p. The interpretation is that p can point at objects of lifetime $b, and *p can point at objects of lifetime $a.
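As a sketch of how the two levels behave, the example below uses a hypothetical LT(name) macro as a stand-in for $name; it expands to nothing so that the code compiles anywhere. The functions load and check_load are made up for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the proposed $name syntax; expands to nothing so
// the sketch compiles with any compiler.
#define LT(name)

// p points at a pointer object of lifetime b; that pointer in turn points at
// an int of lifetime a. The result borrows only from the inner level.
int* LT(a) load(int* LT(a)* LT(b) p) {
  return *p;
}

void check_load() {
  static int long_lived = 5;       // plays the role of lifetime a
  int* short_lived = &long_lived;  // the pointer object itself has lifetime b
  int* r = load(&short_lived);
  // r refers to long_lived, so it does not depend on short_lived staying alive.
  assert(r == &long_lived);
  assert(*r == 5);
}
```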

I’m not sure const is a good analogy for lifetimes. Unlike a lifetime annotation, const makes sense to apply to a non-pointer variable, e.g. const int i = 5;, and it changes what you can do with that variable (you can no longer assign to it). In contrast, it doesn’t make sense to annotate int i = 5; with a lifetime – we already know the lifetime of i from its scope.

I’ll need to elaborate a bit to show why it’s desirable to put lifetimes on pointers, not pointees.

Consideration 1: Reference-like types

As discussed in the RFC, some types are “reference-like” in the sense that they refer to data whose lifetime is independent of their own lifetime, for example string_view. To express this, we give the type a lifetime parameter:

class LIFETIME_PARAM(s) string_view {
...
};

The lifetime parameter s refers to the lifetime of the string data referenced by the string_view.

When lifetime-parameterized types are used elsewhere in the code, they are annotated with a lifetime in the same way that pointers and references are, for example:

string_view $a id(string_view $a s) { return s; }

This (simplistic) example expresses that the string_view returned by id references data with the same lifetime as the parameter s.

As discussed, string_view is a “reference-like” type, and we’re putting the lifetime annotation on this reference-like type. The option of putting it on the pointee type doesn’t even exist in this case because the pointee type isn’t visible anywhere in the code.

If we’re putting lifetimes on “reference-like” types, then for consistency, it makes sense to also put lifetimes on references and pointers (and not their pointee types).
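A runnable sketch of a function over a reference-like type, assuming std::string_view from C++17 stands in for the annotated string_view above. The helper first_word is hypothetical, and its lifetime annotation appears only in a comment because the standard type carries no LIFETIME_PARAM.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper. With the proposed syntax this would read
//   string_view $a first_word(string_view $a s);
std::string_view first_word(std::string_view s) {
  // find() returns npos when there is no space, so substr() keeps the whole view.
  return s.substr(0, s.find(' '));
}

void check_first_word() {
  std::string text = "hello world";
  std::string_view w = first_word(text);
  assert(w == "hello");
  assert(w.data() == text.data());  // The view borrows from text's buffer.
}
```

The result stays valid only as long as text does, which is the relationship the shared lifetime $a is meant to express.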

Consideration 2: Type aliases

It is common to define type aliases for pointer types. For example, a standard library implementation may define std::string::iterator to be a char *.

Because std::string::iterator is a pointer, we want to be able to annotate it with a lifetime. For example (again, contrived for simplicity):

std::string::iterator $a get_next(std::string::iterator $a iter) {
  return iter+1;
}

Again, note that we’re putting the lifetime annotation on the pointer type – we have to because the pointee type is not named anywhere in the code.

If we replace the type alias with its underlying type, it makes a lot of sense then for consistency to continue putting the lifetime annotation on the pointer type:

char* $a get_next(char* $a iter) {
  return iter+1;
}

Consideration 3: Consistency with Rust

(I’ve left this for last because it’s arguably the weakest argument here; we should primarily be doing what’s right within the context of C++, not simply following another language’s design choices “just because”.)

The proposed lifetime annotation scheme is inspired by Rust lifetimes, and Rust associates lifetimes with the reference, not the pointee. For users who are familiar with Rust, it would be confusing to use a C++ lifetime annotation scheme that is similar to Rust’s in many ways but diverges in this fundamental aspect.

This is correct. The usual approach in Clang (outside of Analysis) is that we only want on-by-default diagnostics because we have evidence that new off-by-default diagnostics aren’t enabled often enough to warrant their inclusion unless they’re for pedantic diagnostics (those do get enabled reasonably often). So the bar for including a diagnostic in Clang is pretty high (near-zero false positives, doesn’t degrade compile time, easy mitigations to silence false positives, that sort of thing).

Then there are the analysis-based warnings (like thread safety and fallthrough). Those are opt-in because they generally are more expensive to check (may impact compile times) and may have slightly higher false positive rates which may be a bit harder to silence.

Warnings that are likely to be chatty or are part of a coding style guide, etc., tend to go into clang-tidy (or sometimes the static analyzer).

Given the experimental nature of the proposed check, I think clang-tidy or the static analyzer are the appropriate place for doing that experimentation. If we find that practice validates the check and there’s a sufficiently low false positive rate for it, we can explore moving it into the compiler at that point.

Just an FYI, PLDI 2022 had a paper about how lifetime annotations can be used to aid static analysis. It reinforces some of the ideas that were already discussed in this proposal.

Paper: Modular Information Flow through Ownership (arXiv:2111.13662)
Talk: Modular Information Flow Through Ownership (YouTube)

FYI, it looks like RFC: Improving Diagnostics with Template Specialization Resugaring could be helpful here, as we could resugar lifetime annotations through access to template specializations.

But I leave questions of applicability to you as I haven’t had time to study this proposal yet 🙂

Also, it seems that some of the pre-requisite work for resugaring could be helpful:

Last year we merged D110216 ([clang] retain type sugar in auto / template argument deduction), which would allow the lifetime annotation to survive type deduction. As part of that we also implemented the ability for auto to retain type sugar, so that a variable or function return type deduced from a lifetime-annotated expression could benefit from this.

There are some other MRs still unmerged, D111283, D130308 and D111509, that could allow merging two types with different lifetime annotations into one result.

For example, you could teach D130308 how to merge lifetime attributes so that the following example would be useful:

auto *foo(bool flag, $a int *p1, $b int *p2) {
  return flag ? p1 : p2;
}

The return type of foo would then deduce to some merged lifetime of $a and $b. That works for multiple return statements as well, or for deducing a std::initializer_list from multiple elements with different lifetimes.