[RFC] abseil-unchecked-statusor-use check

CC @ymand @jvoung

Abstract

absl::StatusOr<T> is a widely-used type that optionally contains a value (of type T). If it does not contain a value, it contains an absl::Status. It is similar to C++23 std::expected, but with a hardcoded unexpected type of absl::Status and a different API. It is also (to a lesser degree) similar to std::optional in that it optionally contains a value.

It is invalid to deference an absl::StatusOr that does not contain a value. The value of an absl::StatusOr<T> object called sor can be accessed by

  • *sor or sor.value() to get a T&
  • sor->foo() to call T::foo.

There are other APIs related to absl::StatusOr, e.g. CHECK_OK to crash the program if there is no value. In this RFC, we propose a check for code that accesses absl::StatusOr values without ensuring they contain a value. The check also aims to support related APIs like CHECK_OK.

Example code that will be flagged by this check

bool foo(absl::StatusOr<int> bar) {
  return *bar < 5; // NOT OK
}

Example code that will not be flagged this check:

bool foo(absl::StatusOr<int> bar) {
  if (!bar.ok()) return false;
  return *bar < 5; // OK
}

Proposal

Use the dataflow framework to add a new ClangTidy check, abseil-unchecked-statusor-use, that detects expressions in which the value is accessed, but where we cannot prove that the access is safe from the control flow. We include calls to absl::StatusOr::value which are defined to throw an exception, because we empirically have found that it is used interchangeably with the deference operators (which lead to undefined behavior if called when no value is present), not because the developer intended to crash the program.

We (Google) have deployed a downstream check to various projects, with good success. I would like to upstream this check for external users of Abseil to also benefit from it. It is similar to the existing bugprone-unchecked-optional-access check, and shares some of the code (mostly around return value caching). The API surface of absl::StatusOr is sufficiently different to prevent a common parametrized check.

By using the dataflow framework, we are able to reason about control flow

bool safe = false;
if (sor.ok() && SomeOtherCondition()) {
  safe = true;
}
// ... more code...
if (safe) {
  use(*sor); // OK
}

The check is strictly intraprocedural. This is both preferable from an API design perspective, and also makes implementation of the check much easier. For interprocedural reasoning, the type system should be used to communicate whether an absl::StatusOr contains a value or not. If a absl::StatusOr<T> always contains a value, a T& should be used instead.

That being said, the check models the return value of zero-argument const methods (as a proxy for getter methods) as being stable if no non-const method is called on the same object between them. This is to support common patterns like

class Foo {
public:
  absl::StatusOr<int> get_bar() const;
  void clear_bar();
};

Foo foo = get_foo();
if (foo.get_bar().ok())
  return *foo.get_bar(); // OK
if (foo.get_bar().ok()) {
  foo.clear_bar();
  return *foo.get_bar(); // NOT OK
}

While nothing requires this assumption to be true, we have found this a good tradeoff to make the check easier to deploy. We have not encountered code that violates this assumption.

Limitations

There are two major limitations in the current check, which I expect to be present in the initial upstreaming. The same limitations exist in the bugprone-unchecked-optional-access check, which is already part of clang-tidy. These limitations have not been a significant hurdle to our adoption of this check.

We do not model accesses to containers, so we incorrectly flag the following code

if (sors[0].ok()) {
  use(*sors[0]); // false positive
}

We do not accurately model captured variables, so we incorrectly flag the following code:

if (sor.ok()) {
  [&sor]() {
    use(*sor); // false positive
  }
}

The container code could be rewritten as follows, which also has the advantage of getting rid of the repeated container access, which is harder to optimize

if (auto& sor = sors[0]; sor.ok()) {
  use(*sor); // OK
}

The lambda capture code as follows, which makes the types inside of the lambda clearer

if (sor.ok()) {
  [&s = *sor]() {
    use(s); // OK
  }
}

Future Work

Follow up work from this would include

  • Improve modelling of containers
  • Improve modelling of lambdas
  • Implement a similar check for std::expected
2 Likes

Thank you for your proposal!
I think it’s always good to have new checks, especially that are already battle-tested.

Speaking on clang-tidy part, I don’t think that anything is holding us back, your PR is welcome.

There was a similar issue lately about implementing such check: [clang-tidy] `bugprone-unchecked-optional-access`: support detection of unsafe `std::expected` access · Issue #135045 · llvm/llvm-project · GitHub. It could be used to track progress of implementation std::expected check

I actually have some PoC implementation of std::expected analysis, but didn’t have time to finish it and create a PR.

@fmayer I will be interested to add std::expected support on top of your PR since you have already battle-tested implementation! If it’s okay to extend your current dataflow model.

But if you see the support of std::expected analysis to be more flexible as a separate model, then I will prioritize finishing my PR soon.

Let me know if you’re happy to collaborate on it!

I hope to send out the change in the next few weeks. I am not sure how much of it is applicable forstd::expected, because a lot of it is dealing with some API specifics of absl::StatusOr. I feel like it is best for me to add you to the change and then you can see if it makes sense to base your work on it.

Works for me, thank you!

I started sending out the initial PRs: