[RFC] Adding lifetime analysis to clang

Thank you for summarizing the discussion and raising all these points.

Based on your response I propose the following plan moving forward:

  • Start submitting patches regarding the non-controversial parts of the analysis, including:
  • Adding the annotations to Clang

  • Generalize existing statement-local warnings

  • Technical question here: should we expect the STL vendors to annotate the types, should we hard-code the annotations into the compiler, or should we actually implement type inference but restrict it to STL types?

  • Adding some extra statement-local warnings

  • Adding the flow sensitive analysis

Can we add you as a reviewer to those patches?

While we add the non-controversial part we can continue to discuss the rest on the mailing list and act according to the consensus we have later on.

Fortunately, not having the inference upstream will not stop us from leveraging the rest of the work.

To start the discussions regarding the type category inference, as far as I understand you are more worried about the false negatives that can give a bad impression about the analysis rather than the false positives. One idea we had is to have a refactoring that will automatically annotate the user defined types based on the inference we have. This could give the users a way to understand the root causes of such false negatives.

In case the community will not want automatic type category inference in the compiler we are still likely to implement it as a Tidy check with a refactoring to make it easier for users to adopt this analysis.

What do you think?

Cheers,

Gabor

Thank you for summarizing the discussion and raising all these points.
Based on your response I propose the following plan moving forward:

* Start submitting patches regarding the non-controversial parts of the analysis, including:
  - Adding the annotations to Clang
  - Generalize existing statement-local warnings
    - Technical question here: should we expect the STL vendors to annotate the types, should we hard-code the annotations into the compiler, or should we actually implement type inference but restrict it to STL types?
  - Adding some extra statement-local warnings
  - Adding the flow sensitive analysis

Can we add you as a reviewer to those patches?

Happy to help :slight_smile:

To start the discussions regarding the type category inference, as far as I understand you are more worried about the false negatives that can give a bad impression about the analysis rather than the false positives.

I would say my concerns are around the understandability of the
system, more so than the false negatives of inference.

One idea we had is to have a refactoring that will automatically annotate the user defined types based on the inference we have. This could give the users a way to understand the root causes of such false negatives.
In case the community will not want automatic type category inference in the compiler we are still likely to implement it as a Tidy check with a refactoring to make it easier for users to adopt this analysis.

That'd be helpful, however, relying purely on such analysis is not
much better than inference. Users should annotate types that conform
to type categories. We should also figure out a way to learn if it is
common for users to have types that are in spirit conforming to type
categories, but don't conform syntactically (e.g., have named methods
instead of overloaded operators). If those are important, we would
need to build corresponding annotations.

Dmitri

Dmitri Gribenko <gribozavr@gmail.com> ezt írta (időpont: 2019. ápr. 15., H 16:02):

Thank you for summarizing the discussion and raising all these points.
Based on your response I propose the following plan moving forward:

  • Start submitting patches regarding the non-controversial parts of the analysis, including:
  • Adding the annotations to Clang
  • Generalize existing statement-local warnings
  • Technical question here: should we expect the STL vendors to annotate the types, should we hard-code the annotations into the compiler, or should we actually implement type inference but restrict it to STL types?
  • Adding some extra statement-local warnings
  • Adding the flow sensitive analysis

Can we add you as a reviewer to those patches?

Happy to help :slight_smile:

To start the discussions regarding the type category inference, as far as I understand you are more worried about the false negatives that can give a bad impression about the analysis rather than the false positives.

I would say my concerns are around the understandability of the
system, more so than the false negatives of inference.

One idea we had is to have a refactoring that will automatically annotate the user defined types based on the inference we have. This could give the users a way to understand the root causes of such false negatives.
In case the community will not want automatic type category inference in the compiler we are still likely to implement it as a Tidy check with a refactoring to make it easier for users to adopt this analysis.

That’d be helpful, however, relying purely on such analysis is not
much better than inference. Users should annotate types that conform
to type categories. We should also figure out a way to learn if it is
common for users to have types that are in spirit conforming to type
categories, but don’t conform syntactically (e.g., have named methods
instead of overloaded operators). If those are important, we would
need to build corresponding annotations.

Great! So let’s move on with upstreaming the non-controversial parts and return to discussing the inference problem once we have more experience from users and other data.

Thanks,
Gábor

Hi!

I wanted to give you a small update on the process of upstreaming these changes. The type category annotations along with most of the statement local warnings are already upstreamed. Clang will not guess the type categories automatically, types need to be annotated explicitly. For some STL types, Clang will automatically append the annotations even if they are not present in the source code. This way the new warnings will trigger even if the STL implementation does not have the annotations in place yet.

Furthermore, it looks like these warnings really do catch bugs! :slight_smile: See the two examples we found in the LLVM repository by annotationg llvm::StringRef [1]. We also found bugs in other popular open source projects most of which was fixed very soon after reporting them. We did not see any false positives when running current Clang top of tree on other projects, but as these warnings are relatively new it is not impossible to have some. The Chromium and LLVM builds are clean though, so there should not be any obvious problem. If you notice any spurious warnings please let us know!

We plan to share a more detailed evaluation/report after the CppCon talk on the subject [2].

Cheers,
Gabor

[1]: https://reviews.llvm.org/D66443 , https://reviews.llvm.org/D66442 , https://reviews.llvm.org/D66440
[2]: https://sched.co/Sfrc

Thanks for the update!

I looked at this a bit now that it exists, and I have a question about the Pointer/Owner attributes: It looks like they go on a class to mark the class as an owner or a pointer – but the examples I’ve seen so far are all for classes that have a single member variable.

If I have a class with multiple member variables, some of them weak, others strong, can I use these attributes? Say I have something like

class Foo {
std::unique_ptr b;
C* c; // Owned by b, or something else entirely
D* d; // Also weak
E* e; // Happens to be owned by Foo, but not yet a unique_ptr
};

Since unique_ptr is Owner, b should be fine, but Foo is a Pointer to c and d and an owner of E. Can I express this with the current attributes? Or is this intentionally out of scope?

Thanks,
Nico

Hi Nico!

Currently, the statement-local warnings have a very limited scope. If you create a Pointer from an Owner using a conversion operator or a constructor it will assume that the Pointer points ‘into’ the owner. To make these warnings more useful we hardcoded the semantics of some methods in the STL. So in case you have another type Bar that can be created from Foo and Bar will point to something owned by Foo it make a lot of sense to annotate both classes. Otherwise, it has relatively little use right now.

Improving the situation, however, is not out of scope. We plan to introduce function annotations as well. With those function annotations you can annotate which methods will return a pointer to an owned entity. This way having multiple pointees will not be a problem.

One thing, however, that is out of the scope, is more complex ownership models. For example, when a function can return both owned and non-owned pointers depending on their arguments. We do not scan to support that scenario, and simply not annotating such functions should avoid the false positives.

So to summarize, currently the most value of these warnings are coming from using STL types and user defined conversions. But we do plan to extend these to be able to catch additional problems but that will require the users to annotate functions/methods. We also plan to suggest/infer those annotations automatically, but his feature will certainly be off by default and might be provided by a separate tool, like a clang-tidy check (or just a separate warning flag).

Cheers,
Gabor