Invalidated Iterator Project

Hi,

I'm new to the Clang dev list and will be working on adding static analysis checking for C++. I've been doing a lot of reading of the existing code and am ready to start coding. It would be very useful to get some feedback and suggestions on implementation.

My first checker will be designed to detect bad iterator usage. In particular, it will be looking for uninitialized iterators, invalidated iterators and invalid iterator operations. The basic algorithm is:

1) Locate all STL container instance declarations. This is needed because we need to associate each iterator with a particular container instance. STL containers have well defined operations that invalidate bound iterators.
2) Locate all iterator declarations.
3) Locate all iterator definitions (assignments) and bind to the instance used to initialize.
4) Do a modified reaching definition analysis on the iterators where certain operations on an instance such as insert, clear, reserve, etc. can invalidate the iterator. Use the binding of the instance to the iterator to invalidate the iterator.
5) Flag with warnings uses of iterators that have been invalidated.
6) Flag with warnings binary operations on iterators bound to different instances.

Please feel free to offer any suggestions or comments. Thanks.

  - jim

What are the advantages over adding this code into a debugging version of the standard library, such as the ones contained in both VC++ and g++?

Chris

The problem is that some of the operations can be successful 99 times out of a hundred. The debugging version may never see the 1 time out of a hundred. For example, when you insert a value in a vector, if there is enough space reserved, the iterator will still be valid. Only if the memory is re-allocated, will the iterator be invalidated. So, while a debugging version might never experience the problem, the final executable might under the right conditions. One of the points of static analysis is to discover potential problems that may only occur under the precise conditions by analyzing all realizable paths and detecting possible problems.

  - jim

They are complementary approaches. Static analysis can potentially find the issues early, or find cases that are hard to trigger in practice. Runtime checking will find issues that static analysis doesn't find because of analysis imprecision, but will only find the issues when the code gets triggered (possibly in some corner case).

I realized I never commented on these points. Comments inline.

1) Locate all STL container instance declarations. This is needed
because we need to associate each iterator with a particular
container instance. STL containers have well defined operations that
invalidate bound iterators.

One thing to keep in mind is that STL implementations may vary in how they implement features such as std::string. In some STL implementations a container might be a typedef, while in others it might be a direct declared class. In my conversations with engineers working on commercial C++ static analysis tools, I vaguely recall this being an issue. I don't have any specific advice; just something to look out for.

2) Locate all iterator declarations.

I don't think this needs to be done up front. I think it can be done lazily when an iterator is used.

3) Locate all iterator definitions (assignments) and bind to the
instance used to initialize.

I think (2) can be lazily done as part of (3), and would be far more efficient.

4) Do a modified reaching definition analysis on the iterators where
certain operations on an instance such as insert, clear, reserve,
etc. can invalidate the iterator. Use the binding of the instance to
the iterator to invalidate the iterator.

This can probably be done as a typestate analysis, where a typestate is associated with the iterator. Examples would be "VALID" and "INVALID".

5) Flag with warnings uses of iterators that have been invalidated.

If a typestate analysis was in place, this could easily be done for monitoring accesses to the iterators and seeing if the typestate is INVALID.

6) Flag with warnings binary operations on iterators bound to
different instances.

I advice doing this after (5) is in place. This is a little more tricky, as this isn't just about different instances, but also that the iterators were created at a consistent point for the same container. e.g.:

iterator E = container.end();
...
/// modify the container
...
iterator I = container.begin();
...
if (I == E) // error

This example, which is probably really rare in practice (although it could conceptually happen, especially if 'begin' is something like 'find'), is still an error even though both iterators access the same container.