On heap variables and Clang SA

Hello,

Clang analyzer's checker dev page [1] talks of Clang SA being able to
track state of symbolic expressions and symbolic memory regions. Do
these concepts map to stack variables/expressions and heap
variables/expressions respectively?

For a construed example shown below, SA did not warn about comparison
against garbage value on Line 16. I concluded that, it's because Clang
SA doesn't reason about the program heap but wanted to make sure I'm not
looking at it superficially or drawing the wrong conclusion.

1. class bar {
2. public:
3. bar() {}
4. int m_x;
5. };
6.
7. class foo {
8. public:
9. foo() { ptrToBarObj = new bar(); }
10. void method();
11. int m_y;
12. bar *ptrToBarObj;
13. };
14.
15. void foo::method() {
16. if((ptrToBarObj->m_x > 0))
17. m_y = 0;
18. }

[1]: http://clang-analyzer.llvm.org/checker_dev_manual.html

Regards,
Bhargava

Hello,

Clang analyzer’s checker dev page [1] talks of Clang SA being able to
track state of symbolic expressions and symbolic memory regions. Do
these concepts map to stack variables/expressions and heap
variables/expressions respectively?

AFAIU symbolic regions are regions that the analyzer considers to be constant and thus representable purely by the expression that generates them, while memory regions basically modeling memory - both stack and heap.

For a construed example shown below, SA did not warn about comparison
against garbage value on Line 16. I concluded that, it’s because Clang
SA doesn’t reason about the program heap but wanted to make sure I’m not
looking at it superficially or drawing the wrong conclusion.

  1. class bar {
  2. public:
  3. bar() {}
  4. int m_x;
  5. };
  6. class foo {
  7. public:
  8. foo() { ptrToBarObj = new bar(); }
  9. void method();
  10. int m_y;
  11. bar *ptrToBarObj;
  12. };
  13. void foo::method() {
  14. if((ptrToBarObj->m_x > 0))
  15. m_y = 0;
  16. }

How can you prove a comparison against garbage value from that code? Seems like somebody can set m_x to anything between the constructor and the call to method.
If you want to catch this, you’ll at least need:
void f() {
foo f;
f.method();
}
… and then the SA needs to “inline” both the call to the constructor and the method call to see the problem.

Hi,

How can you prove a comparison against garbage value from that code?
Seems like somebody can set m_x to anything between the constructor and
the call to method.
If you want to catch this, you'll at least need:
void f() {
  foo f;
  f.method();
}

Apologies for having left out the crucial function that instantiates a
foo object. Agree that this is the missing piece.

... and then the SA needs to "inline" both the call to the constructor
and the method call to see the problem.

My understanding is that, during symbolic execution, Clang SA ``visits"
function calls in the procedure under analysis. So, in the function void
f() above, Clang SA would metaphorically step into foo's constructor and
subsequently method() and prove garbage value in two steps i.e.,

Step 1. Call to f.method() from void f()
Step 2. Garbage value comparison in method()

Is inlining how Clang SA really does this? Afaik, Clang SA visits the
call graph for a translation unit in topological order. In the example,
this means, when void f() is being analyzed, both ctor declaration and
method declarations would be visited, no?

Regards,
Bhargava

Hi,

How can you prove a comparison against garbage value from that code?
Seems like somebody can set m_x to anything between the constructor and
the call to method.
If you want to catch this, you’ll at least need:
void f() {
foo f;
f.method();
}

Apologies for having left out the crucial function that instantiates a
foo object. Agree that this is the missing piece.

… and then the SA needs to “inline” both the call to the constructor
and the method call to see the problem.

My understanding is that, during symbolic execution, Clang SA ``visits"
function calls in the procedure under analysis. So, in the function void
f() above, Clang SA would metaphorically step into foo’s constructor and
subsequently method() and prove garbage value in two steps i.e.,

Yes, that’s what the SA calls “inlining”. I agree that it’s confusing :slight_smile:

Step 1. Call to f.method() from void f()
Step 2. Garbage value comparison in method()

Is inlining how Clang SA really does this? Afaik, Clang SA visits the
call graph for a translation unit in topological order. In the example,
this means, when void f() is being analyzed, both ctor declaration and
method declarations would be visited, no?

Well, it depends. Whether the SA drills into a function depends on many things.

On a tangent, does Google use Clang SA on large codebases esp. Chromium
that has a massive C++ LoC count? If not, what are the top reasons for
not doing so? Lack of C++ support seems to be the Google position on
this [1] but am wondering if that is the only reason.

[1]: chromium - An open-source project to help move the web forward. - Monorail

Regards,
Bhargava

On a tangent, does Google use Clang SA on large codebases esp. Chromium
that has a massive C++ LoC count? If not, what are the top reasons for
not doing so? Lack of C++ support seems to be the Google position on
this [1] but am wondering if that is the only reason.

Yes, this is the only reason; we’d love to be able to use it. I already shot considerable time and effort into trying to get this good enough, but I think I’d need to spend another couple of weeks, which I currently just don’t have.

+jordan who has more context, too (Bhargava has expressed interest in helping to fix the remaining problems)

(cc’ delesley mainly for FYI, as he was curious about some of the limitations of the C++ cfg)

The state of the temporary destructor problem is the following (Jordan will correct me if I’m wrong):
Currently, the SA doesn’t cannot inline (which in SA terms means “symbolically execute”) constructors and destructors of temporary values of class type due to a couple of problems.

  1. The CFG doesn’t handle lifetime-extended temporaries correctly.
    I had a patch out that addressed this (http://reviews.llvm.org/D4706); look at the changes to CFG.cpp for what the CFG changes necessary here. Basically, the core of the problem is that lifetime extended temporaries have a rather complex encoding in clang’s AST, and one has to actually run over the full statement to see what comes out as liftime extended; to construct the CFG, the static analyzer runs over the AST multiple times (iirc):
    a) it figures out the scopes needed (for this it needs to figure out which variables are lifetime extended)
    b) it runs again to fill in the scopes
    c) during b) it visits parts again to figure out the temporary destructors that need to run
    The fact that it visits multiple times makes the code contain a bit of duplication (see the patch) that I wanted to remove, but never got around to. I think the patch was pretty close to being able to be submitted, and I don’t fully remember what was missing (iirc some tests were breaking and I didn’t know how to deal with it).
    I think getting that patch submitted would be a great way to get started on this :slight_smile:

  2. The static analyzer has an incorrect model for the representation of temporaries; it encodes them as immutable symbolic values, which they are not (they have memory regions that the constructor / destructor or other methods on the temporary need to work on). This is a rather intricate change. During the last LLVM dev meeting, Jordan had hacked together a proof of concept for some of this, but so far I haven’t gotten back around to pull out all those patches, understand them, and clean them up (patch from Jordan from back then attached).

  3. An idea for an intermediate step towards (1) was to only inline temporary constructors/destructors if we know they are not bound to parameters (which is where the problem with the memory region hits). Patch http://reviews.llvm.org/D4740 was an attempt to go that way (this is similar as the patch in (1)).

Let me know if you have more questions :slight_smile:

Cheers,
/Manuel

params-by-value.patch (13.2 KB)