[Static Analyzer] temp_object chaos?

I’m analyzing the following source line:

auto it = v.begin(); // v is an std::vector

As far as I’m able to reconstruct what happens using the checker callbacks, it goes somehow like this:

  • checkPostCall on v.begin(): the call has been interpreted, the result is a lazyCompoundVal with a temp_object within it

  • checkPostStmt on v.begin(): ProgramState::getSVal() yields the same result as above (the same lazyCompoundVal with the same temp_object within it)

  • bind: some lazyCompoundVal with a temp_object within it to loc ‘&it’ – but this lazyCompoundVal and temp_object are different than above!!!

  • checkPostStmt on v.begin() again: ProgramState::getSVal() yields &it, getting the SVal inside the region gives a lazyCompoundVal with ‘it’ in it

My problem is that I’m unable to implement a checker that would need to track iterator values, as temp_objects just seem to pop and disappear without leaving a trace and without having any followable connection between them.

Is this working as intended, and if so, how should I approach this? I recall there being an open project for better modelling of C++ temporary objects, is that whose effect I’m seeing here?

Thanks!

I’m analyzing the following source line:

auto it = v.begin(); // v is an std::vector

As far as I’m able to reconstruct what happens using the checker callbacks, it goes somehow like this:

  • checkPostCall on v.begin(): the call has been interpreted, the result is a lazyCompoundVal with a temp_object within it

  • checkPostStmt on v.begin(): ProgramState::getSVal() yields the same result as above (the same lazyCompoundVal with the same temp_object within it)

  • bind: some lazyCompoundVal with a temp_object within it to loc ‘&it’ – but this lazyCompoundVal and temp_object are different than above!!!

  • checkPostStmt on v.begin() again: ProgramState::getSVal() yields &it, getting the SVal inside the region gives a lazyCompoundVal with ‘it’ in it

The last line is probably wrong… You have “checkPostStmt on v.begin()” twice. Also, can you figure out which AST node is responsible for the bind?

Basically, to understand what is going on better, it would be valuable to match the AST of the statements to the callbacks (you can get the AST with clang -cc1 -ast-dump):

-DeclStmt 0x1023e7f00 <line:6:3, col:22> -VarDecl 0x1023cd780 <col:3, col:21> it ‘class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > >’:‘class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > >’
-CXXConstructExpr 0x1023e7ec8 <col:13, col:21> 'class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator<int> > >':'class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator<int> > >' 'void (const class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator<int> > > &) throw()' elidable -MaterializeTemporaryExpr 0x1023e7da8 <col:13, col:21> ‘const class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > >’ lvalue
-ImplicitCastExpr 0x1023e7d90 <col:13, col:21> 'const class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator<int> > >' <NoOp> -CXXMemberCallExpr 0x1023cd8a0 <col:13, col:21> ‘iterator’:‘class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > >’
-MemberExpr 0x1023cd870 <col:13, col:15> '<bound member function type>' .begin 0x1023c66d0 -DeclRefExpr 0x1023cd7d8 col:13 ‘std::vector’:‘class std::vector<int, class std::allocator >’ lvalue Var 0x1023bebc0 ‘v’ ‘std::vector’:‘class std::vector<int, class std::allocator >’

We’ve looked at implementing these checks before and it is not a simple problem. The issue here is that the iterators are value objects, represented by LazyCompoundVals and SVals, and we do not have any way of persisting/tracking values right now.

Let’s start with mapping the results of callbacks to the AST nodes to see if the modeling that we have now makes sense.

Thanks,
Anna.

Hi Anna,

The last line is probably wrong… You have “checkPostStmt on v.begin()” twice.

I get it every time I re-run the checker, so I’m positive this is what happens. It did look weird for me too, that’s why I mentioned it. Note though that I’m using Clang 3.3, so it might be something that’s been fixed since then.

This is what I get for the bind:

Bind: val lazyCompoundVal{0x4cca6b0,temp_object{iterator,0x4bd1df0}} => loc &it

  • in stmt: v.begin()

  • stmt ast: CXXConstructExpr 0x4bd5d18 ‘std::vector::iterator’:‘class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > >’ ‘void (class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > > &&) noexcept’ elidable

`-MaterializeTemporaryExpr 0x4bd5bb8 ‘class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > >’ xvalue

`-CXXMemberCallExpr 0x4bd1df0 ‘iterator’:‘class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > >’

`-MemberExpr 0x4bd1dc0 ‘’ .begin 0x4bc7600

`-DeclRefExpr 0x4bd1d28 ‘std::vector’:‘class std::vector<int, class std::allocator >’ lvalue Var 0x4b96ee0 ‘v’ ‘std::vector’:‘class std::vector<int, class std::allocator >’

The issue here is that the iterators are value objects, represented by LazyCompoundVals and SVals, and we do not have any way of persisting/tracking values right now.

The only case I’m concerning myself with right now is when iterators are saved into local variables, like so:

std::vector v = { 1, 2, 3 };

auto it = v.begin();

v.push_back(10);

it++; // possibly invalid!

The variable ‘it’ will be represented by a memory region, and that it what I’m tracking, instead of the actual iterator value. I realize that this is limiting, but my understanding of the Clang Static Analyzer is rather limited and this is the best I could come up with. :slight_smile:

Basically the only problem seems to be that I don’t get consistent values representing the iterator: the result SVal from checkPostCall is different than the value that is actually bound to the variable. I guess I can try to work-around by implementing a kind of a state machine, and instead of starting to track iterator values in checkPostCall, I just record that a given Expr yields an iterator value, and look for that Expr in checkBind, but that sounds hackish and I don’t see any reason why what I’m trying now shouldn’t work.

Thanks!

Hi Anna,

The last line is probably wrong… You have “checkPostStmt on v.begin()” twice.

This is strange. Would be good to figure out what’s going on.

I get it every time I re-run the checker, so I’m positive this is what happens. It did look weird for me too, that’s why I mentioned it. Note though that I’m using Clang 3.3, so it might be something that’s been fixed since then.

This is what I get for the bind:

Bind: val lazyCompoundVal{0x4cca6b0,temp_object{iterator,0x4bd1df0}} => loc &it

  • in stmt: v.begin()

  • stmt ast: CXXConstructExpr 0x4bd5d18 ‘std::vector::iterator’:‘class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > >’ ‘void (class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > > &&) noexcept’ elidable

`-MaterializeTemporaryExpr 0x4bd5bb8 ‘class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > >’ xvalue

`-CXXMemberCallExpr 0x4bd1df0 ‘iterator’:‘class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class std::allocator > >’

`-MemberExpr 0x4bd1dc0 ‘’ .begin 0x4bc7600

`-DeclRefExpr 0x4bd1d28 ‘std::vector’:‘class std::vector<int, class std::allocator >’ lvalue Var 0x4b96ee0 ‘v’ ‘std::vector’:‘class std::vector<int, class std::allocator >’

We expect the MaterializeTemporaryExpr to create a new region - it is “materializing” a temporary value into memory by creating a region to host it.

The issue here is that the iterators are value objects, represented by LazyCompoundVals and SVals, and we do not have any way of persisting/tracking values right now.

The only case I’m concerning myself with right now is when iterators are saved into local variables, like so:

std::vector v = { 1, 2, 3 };

auto it = v.begin();

v.push_back(10);

it++; // possibly invalid!

The variable ‘it’ will be represented by a memory region, and that it what I’m tracking, instead of the actual iterator value. I realize that this is limiting, but my understanding of the Clang Static Analyzer is rather limited and this is the best I could come up with. :slight_smile:

Again, none of the checkers are trying to track value objects. They track regions (pointers). This checker is different.

There are two approaches one could take:

  1. Work on extending the analyzer core infrastructure to track value objects. We currently do not have time to work on this, but we could schedule an internal meeting to figure out a design for this if you are interested in pursuing this direction.

  2. Some hacky solution based on pattern matching the expression in the checker. For example, you’ll need to identify the region created by v.begin() and deal with the fact that it is a temporary by pattern matching on the MaterializeTemporaryExpr. As with any pattern matching solutions, this will be brittle and might not work for all code. Something like this could work (I did not think it through very well):

  • You register a callback to intercept PostCall on v.begin(). Then check if the parent expr is a MaterializeTemporaryExpr. If yes, you store (callExpr, region) pair into a map. (You probably have another map that maps this region to vector state.)
  • You register another callback for MaterializeTemporaryExpr. Here, you check if the child expression is one of the callExprs you have in the map, and if yes, you remove the pair from the map and associate the new region with the vector info.
    The trickiest part here is ensuring that you clear out the state after you don’t use the info anymore. For example, if you store info for every v.begin regardless weather it is wrapped in MaterializeTemporaryExpr or not, you might never clean out that info from the map.

Jordan, I think this is similar to what we’ve discussed, but I might be missing/added some details…
Anna.

Hi Anna,

Work on extending the analyzer core infrastructure to track value objects. We currently do not have time to work on this, but we could schedule an internal meeting to figure out a design for this if you are interested in pursuing this direction.

How difficult a job would this be? I’m absolutely open to this idea, but it’s doubtful that my boss would authorize this (we’re low on manpower), and I simply don’t have the energy to work on this outside my working hours (I’m also an undergraduate university student with a full timetable).

Some hacky solution based on pattern matching the expression in the checker. For example, you’ll need to identify the region created by v.begin() and deal with the fact that it is a temporary by pattern matching on the MaterializeTemporaryExpr. As with any pattern matching solutions, this will be brittle and might not work for all code. Something like this could work (I did not think it through very well):

This is similar to what I was considering attempting, but it just doesn’t feel robust enough - i.e. there probably would be many corner-cases when this simply would not work - to be worth the effort.

Thanks!

Hi Anna,

Work on extending the analyzer core infrastructure to track value objects. We currently do not have time to work on this, but we could schedule an internal meeting to figure out a design for this if you are interested in pursuing this direction.

How difficult a job would this be?

We don’t really know, but it is unlikely to be very easy.