RFC Detecting uses of dangling references

We would like to detect dangling references.
Let’s look at three examples, one demonstraing dangling references and one for having lifetime extensions, and one last one about dangling references to primitive types. Example

#include <string>

void clang_analyzer_printState();

class Message {
public:
  explicit Message(const char *cstr) : text{cstr} {}
  const std::string &get() const {
    return text;
  }
private:
  std::string text;
};

auto dangling_ref() {
  auto const &x = Message{"world"}.get();
  // clang_analyzer_printState();
  size_t len = x.size();
  return len;
}

auto lifetime_extended() {
  auto const &y = Message{"world"};
  // clang_analyzer_printState();
  size_t len = y.get().size();
  return len;
}

template<class T> constexpr const T& min(const T& a, const T& b) {
    return (b < a) ? b : a;
}

int saturate_percentage(int x) {
    const int &val = min(x, 100); // Formed a reference to the constant '100' temporary if 'x' > 100.
    if (x <= 100) return 0; // Pretend we only care about the temporary case.
    // clang_analyzer_printState();
    return val;
}

In the first two cases, the Message object will be a CXXTempObjectRegion. In the last case, the temporary ‘int’ is also represented by CXXTempObjectRegion.
This is really good, matches with what one would expect.
However, we think that if we had a different region for the lifetime-extended temporaries, we could for example kill all CXXTempObjectRegions from the store by the end of the current statement.

This would enable us to detect dangling references without much additional complexity.
We might need to refine bug-report construction to account for this kind of Undefined value and adjust notes.

We haven’t though this completely through, but so far it looks promising.
If this discussion doesn’t uncover major obstacles or problems with this outlined approach, we will try to implement it.

Let me know what you think.
@Xazax-hun @NoQ @DonatNagyE

1 Like

Overall, I am a fan of this idea, I’d love to see the static analyzer detecting more instances of dangling, dangling due to temporaries is prevalent. I have some questions about the details.

What do you mean by “killing” the regions? Do you mean some additional state to mark regions that should not be used? Or do you mean actually purging them from the state? How would you change the pointers/references pointing to a region like that? Or would you generate a sink?

Do you envision detecting a warning when a dangling reference/pointer is created, or only when dereferenced?

Removing the bindings, aka. purging the temp regions, but those only.

I think we dont need to do anything. Loading from a pointer refering to the missing temp object should result in Undefined. And the use of such value would trigger a sink generation by the core checkers. And this is where a specific core checker or reporter might come into play finetuning the report message and notes.

So, yea, I would permit the construction of dangling references or pointers to such dead temp objects but not their uses.

I have a couple thoughts:

(1) In your examples the problem of lifetimes is purely syntactic. Say, you can subscribe to check::PreStmt<DeclStmt>() for variables x and y and at this point you immediately see from the AST whether the initializer is properly lifetime-extended. Combine that with path-senstive information about the value of the initializer (it’s pointer into a temporary that’s been allocated by a sub-expression of the initializer - CXXTempObjectRegion::getExpr() tells you that) and you solve the problem.

(2) In the general case, it becomes essential to capture the moment of time when references become invalid. My first thought was "Why don’t you simply subscribe to check::PostCall() of destructor" and then I noticed that technically references aren’t automatically dangling after destruction:

Lifetime - cppreference.com
If a new object is created at the address that was occupied by another object, then all pointers, references, and the name of the original object will automatically refer to the new object and, once the lifetime of the new object begins, can be used to manipulate the new object, but only if the original object is transparently replaceable by the new object.

So it sounds like we could use some sort of “check::Deallocation()” callback. It’d also be a more principled way of dealing with your examples.

(3) Scanning the Store to erase the dangling references seems counter-productive. Even if the object is deallocated, the loc::MemRegionVal inside those references continues to correctly represent the numeric value of the pointer, which the program can technically continue to rely upon.

Just like the existing StackAddrEscape checker doesn’t erase anything from the Store. It simply scans it to find things that shouldn’t be there, and warn about them.

Erasing things from the Store would have a nice side effect of being able to reuse check::DeadSymbols() as a vegan substitute for check::Deallocation() (given that deallocated regions will also immediately become dead), but I don’t see why should we conflate these two; I think it’s probably good to have the checker express its intent more precisely, because it’s possible that we’re missing some other nuance and the two aren’t actually related.

I’ll come back with more convincing examples where one cannot easily model lifeextended temporaries to underpin the outlined memor region approach that we proposed.
Expect it by the end of this week.

Hi,

In general detection of the dangling references to temporaries cannot be handled in a purely syntactic manner, as illustration please consider the following example:

template<typename T>
T const& select(bool cond, T const& t, T const& u) { return cond ? t : u; }

int const& le = Composite{}.x;
auto&& ref = select(cond, le, 10);

For the above ref is conditionally dangling, depending on the value of cond.
Such examples, for us are a major motivation to solve the problem by inspecting the regions and providing a mechanism to mark ones that become dangling.

The [analyzer] Differentiate lifetime extended temporaries patch is first step in such direction, and I would like to refer to it for additional motivation.

I think there was a small misunderstanding. @NoQ suggested that it is a purely syntactic problem to determine whether a value is lifetime extended, but he still suggested to rely on path sensitive information to figure out what the actual value is.

Thanks for the patch, I will take a look!