Memory regions as conservative alias analysis

Hi,

I'm evaluating the possibility of using the memory regions computed by
the static analyzer as a *conservative* alias analysis (i.e.: I want
to be able to guarantee that two given pointers are never going to
alias, in any path). I'm aware that the analyzer is mainly oriented
towards bug finding, and that false negatives are acceptable in that
particular application. On the contrary, I think that a conservative
approach is necessary in many cases for the application I'm thinking
of (coding rule validation). If the analysis is precise enough, I
think it can be also useful in this other way. I'd like to know if you
think this is not feasible for some reason.

In order to define my checks, I need to know what guarantees the
analyzer can give me, e.g:
- If two pointers point to regions contained in different memory
spaces, they are not aliased. (Memory spaces are always disjoint,
right?)
- If two pointers point to different stack frames, they are not aliased.
- etc.
Are those guesses correct? What other certainties can I have?

I've been trying to write some toy checkers, looking at the examples
that come with clang, but the number of concepts/classes involved is
quite intimidating. I'd really appreciate some document explaining the
general design of the analyzer and the different pieces that can be
replaced (StoreManager, ConstraintManager, etc.). I've already read
the paper "A Memory Model for Static Analysis of C Programs", by Xu,
Kremenek, and Zhang, and
http://clang.llvm.org/docs/AnalyzerRegions.html, but I find difficult
to jump from those documents to the API documentation.

As a side note: I wrote some example checkers to determine the memory
space of some variables, only to receive an "<Unknown Region>" for
almost every region in my code. The cause was not that the analyzer
was incapable of assigning meaningful memory spaces, but that most of
'MemSpaceRegion' subclasses don't re-implement the 'dumpToSream'
method. It'd be a trivial patch to implement those methods but that
would greatly improve debugging (I can contribute the patch, if you
want). On the other hand, I never receive regions in 'HeapSpaceRegion'
for neither 'new' nor 'malloc'. Is the implementation for that
missing?

Hi Guillem,

Comments inline!

Hi,

I'm evaluating the possibility of using the memory regions computed by
the static analyzer as a *conservative* alias analysis (i.e.: I want
to be able to guarantee that two given pointers are never going to
alias, in any path). I'm aware that the analyzer is mainly oriented
towards bug finding, and that false negatives are acceptable in that
particular application. On the contrary, I think that a conservative
approach is necessary in many cases for the application I'm thinking
of (coding rule validation). If the analysis is precise enough, I
think it can be also useful in this other way. I'd like to know if you
think this is not feasible for some reason.

In order to define my checks, I need to know what guarantees the
analyzer can give me, e.g:
- If two pointers point to regions contained in different memory
spaces, they are not aliased. (Memory spaces are always disjoint,
right?)
- If two pointers point to different stack frames, they are not aliased.
- etc.
Are those guesses correct? What other certainties can I have?

The second assumption is correct. The first one is not.

If the memory space is "UnknownSpaceRegion", it means that we do not know the actual memory space. For example, suppose you had code that looked like:

  void *p;
  void *q = foo(p);

If we do not know anything about the implementation of 'foo', then we return a memory region wrapping a conjured symbol. That memory region has an unknown memory space. The name pretty much implies the ramifications. It is entirely possible that 'p' and 'q' are aliases after these two lines of code execute.

I've been trying to write some toy checkers, looking at the examples
that come with clang, but the number of concepts/classes involved is
quite intimidating. I'd really appreciate some document explaining the
general design of the analyzer and the different pieces that can be
replaced (StoreManager, ConstraintManager, etc.). I've already read
the paper "A Memory Model for Static Analysis of C Programs", by Xu,
Kremenek, and Zhang, and
http://clang.llvm.org/docs/AnalyzerRegions.html, but I find difficult
to jump from those documents to the API documentation.

The details are unfortunately complex, and real documentation needs to be written. I've posted details numerous times in the past to this list, but I don't have good references off hand. The basic concepts are:

(1) Symbols, managed by SymbolManger. The represent how we name symbolic things, be it symbolic values, symbolic chunks of memory, etc.
(2) Memory regions, which represent a lazy abstraction of memory. Symbolic regions are used to represent abstract chunks of memory.
(3) Bindings of values to regions, which are represented by the symbolic store and are managed by StoreManager. It's basically up to the StoreManager to decide how aliasing relationships come into play when modeling loads/stores from/to symbolic memory.
(4) Constraints on symbols, managed by Constraint Manager. Currently this is limited to constraints between symbols and integer constants, but this will eventually also track symbol to symbol relationships as well.

As a side note: I wrote some example checkers to determine the memory
space of some variables, only to receive an "<Unknown Region>" for
almost every region in my code. The cause was not that the analyzer
was incapable of assigning meaningful memory spaces, but that most of
'MemSpaceRegion' subclasses don't re-implement the 'dumpToSream'
method. It'd be a trivial patch to implement those methods but that
would greatly improve debugging (I can contribute the patch, if you
want).

That would be great!

On the other hand, I never receive regions in 'HeapSpaceRegion'
for neither 'new' nor 'malloc'. Is the implementation for that
missing?

HeapSpaceRegion was added during prototyping, and was never used. We should probably remove it, and replace it with something more expressive, such as a symbolic memory space. That way we can distinguish between different parts of the heap, e.g., memory returned by distinct allocators. That may help with providing stronger aliasing guarantees.

Hi Ted, thank you for your reply.

[...]

In order to define my checks, I need to know what guarantees the
analyzer can give me, e.g:
- If two pointers point to regions contained in different memory
spaces, they are not aliased. (Memory spaces are always disjoint,
right?)
- If two pointers point to different stack frames, they are not aliased.
- etc.
Are those guesses correct? What other certainties can I have?

The second assumption is correct. The first one is not.

If the memory space is "UnknownSpaceRegion", it means that we do not know the actual memory space. For example, suppose you had code that looked like:

void *p;
void *q = foo(p);

If we do not know anything about the implementation of 'foo', then we return a memory region wrapping a conjured symbol. That memory region has an unknown memory space. The name pretty much implies the ramifications. It is entirely possible that 'p' and 'q' are aliases after these two lines of code execute.

Ok, to be more precise, if both regions are in a different memory
space, and these memory spaces are not Unknown, then can I be sure
that the regions do not overlap?

[...]

As a side note: I wrote some example checkers to determine the memory
space of some variables, only to receive an "<Unknown Region>" for
almost every region in my code. The cause was not that the analyzer
was incapable of assigning meaningful memory spaces, but that most of
'MemSpaceRegion' subclasses don't re-implement the 'dumpToSream'
method. It'd be a trivial patch to implement those methods but that
would greatly improve debugging (I can contribute the patch, if you
want).

That would be great!

I need to update my repo to generate the patch. I'll send it in a few days.

On the other hand, I never receive regions in 'HeapSpaceRegion'
for neither 'new' nor 'malloc'. Is the implementation for that
missing?

HeapSpaceRegion was added during prototyping, and was never used. We should probably remove it, and replace it with something more expressive, such as a symbolic memory space. That way we can distinguish between different parts of the heap, e.g., memory returned by distinct allocators. That may help with providing stronger aliasing guarantees.

That would be very helpful!

Regards,