GSoC 2012: Static Function Blacklisting

Hello,

For this Google Summer of Code I would like to propose adding a static
analysis tool to clang to verify that some set of functions is never
called from some entry point.
This static analysis would be capable of using some function
annotations to determine which functions can and cannot be called to
satisfy this requirement.

Example:

__attribute__((does_not_call_foo)) void func(void)
{
    foo(); // analyzer emits an error here
}

OR

void bar(void)
{
    foo();
}

__attribute__((does_not_call_foo)) void func(void)
{
    bar(); // analyzer emits an error within here
}
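
The second example implies a transitive check over the call graph. A minimal sketch of that check in C, assuming the call graph has already been extracted into an adjacency matrix. The `call_graph` struct, the function indices, and `reaches()` are hypothetical names for illustration, not clang API:

```c
#include <stdbool.h>
#include <string.h>

#define MAX_FUNCS 8

/* Hypothetical call graph: calls[a][b] is true when function a calls b. */
struct call_graph {
    int  nfuncs;
    bool calls[MAX_FUNCS][MAX_FUNCS];
};

/* Depth-first search; visited[] keeps recursive code from looping forever. */
static bool reaches_from(const struct call_graph *g, int from, int target,
                         bool *visited)
{
    if (from == target)
        return true;
    visited[from] = true;
    for (int callee = 0; callee < g->nfuncs; callee++) {
        if (g->calls[from][callee] && !visited[callee] &&
            reaches_from(g, callee, target, visited))
            return true;
    }
    return false;
}

/* True if `entry` can reach `forbidden` through any chain of calls. */
bool reaches(const struct call_graph *g, int entry, int forbidden)
{
    bool visited[MAX_FUNCS];
    memset(visited, 0, sizeof visited);
    return reaches_from(g, entry, forbidden, visited);
}

/* Indices for the second example, plus an unrelated function baz(). */
enum { FUNC, BAR, FOO, BAZ };

/* Builds the graph func() -> bar() -> foo(). */
struct call_graph example_graph(void)
{
    struct call_graph g = { .nfuncs = 4 };
    g.calls[FUNC][BAR] = true;
    g.calls[BAR][FOO]  = true;
    return g;
}
```

With this graph, `reaches(&g, FUNC, FOO)` holds even though func() never calls foo() directly, which is the situation the annotation is meant to catch.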

Why would this be useful?
1) Security - it can be used to audit which functions can possibly be
called from some entry point (e.g. verifying that no network functions
are accessed)
2) Reentrant Code Checks - it could verify that no global state is
accessed (from functions (or objects with an alteration to the
proposal))
3) Real-time Safety - it can check what system/library calls are made
to verify that latency goals can be met.

Real-time safety is the primary motivation for me, after seeing some
of the issues that exist within the linux audio community.
The primary issue is that it is very easy to mistakenly create
non-real-time safe code and it is possible for this mistake to go
unnoticed for a long period of time.
Some simple mistakes that can be found are calls to locking mutexes,
calls to memory allocation, IO, and other blocking functions.
With static analysis many of these issues can be found and taken care of.

This analysis would likely need to be supplemented with some
blacklist/whitelist for functions not defined in the analyzed source,
i.e. libraries.

Can the annotations/attributes in clang be extended in this manner
(ignoring example syntax)?
Can attributes be assigned to functions without altering the source
(the library case)?
Does this sound like a good/feasible summer of code project?

General comments are welcome.

--Mark McCurry

Hi Mark,

The general idea is something that's definitely useful. I wrote a hacky ad-hoc tool with libclang to perform this kind of check for the xlocale code in FreeBSD, ensuring that none of the reentrant functions touched any global state or called the non-reentrant versions. It would be great to see it done properly and generalised.

A few thoughts about the design though:

I am not convinced by the annotations. A does-not-call-X attribute doesn't seem to be sufficiently expressive for this kind of test. You'd need to whitelist dozens of functions at each call. I would rather have something that allowed you to tag functions with attributes like 'reentrant' or 'realtime' (the annotation attribute could be used here), and then another attribute like __attribute__((may_only_call(reentrant))) that, applied to a function, would only allow it to call functions whose declarations are marked as reentrant.
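
A rough sketch of what such tagging might look like in C, using clang's existing `annotate` attribute. The macro names, the `"may_only_call:reentrant"` string encoding, and the example functions are all assumptions for illustration; no existing checker is known to interpret these strings:

```c
#include <stddef.h>

/* clang accepts annotate("..."); other compilers get empty macros. */
#if defined(__clang__)
#  define REENTRANT \
       __attribute__((annotate("reentrant")))
#  define MAY_ONLY_CALL_REENTRANT \
       __attribute__((annotate("may_only_call:reentrant")))
#else
#  define REENTRANT
#  define MAY_ONLY_CALL_REENTRANT
#endif

/* Tagged on the declaration, so other compilation units can see the tag. */
REENTRANT size_t count_spaces(const char *s);

/* Untagged: it keeps state in a static counter, so it is not reentrant. */
int next_id(void);

/* A checker would be expected to accept this: the only callee is tagged. */
MAY_ONLY_CALL_REENTRANT size_t count_spaces_twice(const char *s)
{
    /* A call to next_id() here is what the checker should flag. */
    return 2 * count_spaces(s);
}

REENTRANT size_t count_spaces(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == ' ')
            n++;
    return n;
}

int next_id(void)
{
    static int id;
    return ++id;
}
```

Because the requirement is stated in terms of declarations, each compilation unit can be checked on its own as long as the tagged prototypes live in a shared header.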

Currently, the analyser does not perform any checking between compilation units. That means that with your example, if bar() and func() were in different compilation units then the analyser would be quiet. By only calling annotated functions the compilation unit containing func() would warn you if bar() did not have the required attribute or (if it did) the compilation unit containing bar() would warn you if foo() did not have the required attribute.

You might want to take a look at the thread safety annotations (holds lock and friends) for some inspiration.

David

Hi Mark,

We happen to have written a very similar tool that we use at Autodesk to help us enforce certain coding standards on a large code base. For example, we have functions annotated as reentrant, which carries two constraints: the function should not access any named, declared static data, and it should only call functions that are also annotated as reentrant. The tool is basically a recursive AST visitor that checks AST properties (call graph, variable declarations and access patterns). We don't do any analysis beyond that at the moment, but it is still quite useful for finding some bugs.

Some comments on your questions

Can the annotations/attributes in clang be extended in this manner
(ignoring example syntax)?

You can add any attributes you like, but getting them upstream would be hard: adding attributes means adding a language extension, and Clang definitely doesn't want to include arbitrary domain-specific attributes. If you take a look at the attribute definition file, the attributes there are either from the standard, or Clang built-ins, or already available in GCC (like thread safe annotations). One way of extending the attribute system is the Clang "annotate" attribute, in which you can embed arbitrary literals, so technically you can build your own annotation DSL inside the "annotate" attribute.

Can attributes be assigned to functions without altering the source
(the library case)?

It is possible and what we did in our tool is to redeclare the library / 3rd party functions with annotated attributes, and in Clang Sema, merge the attributes into function types.
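
One way this redeclaration approach might look in C, sketched with the `annotate` attribute rather than custom attributes. The `REALTIME_SAFE` macro and the choice of functions are hypothetical, and whether a given checker sees a tag added on a later redeclaration depends on how it walks the declaration chain:

```c
#include <stdlib.h>

/* clang accepts annotate("..."); other compilers get an empty macro. */
#if defined(__clang__)
#  define REALTIME_SAFE __attribute__((annotate("realtime")))
#else
#  define REALTIME_SAFE
#endif

/* Hypothetical "annotation header": the library's own header is left
 * untouched, and the functions are redeclared here with a tag attached. */
REALTIME_SAFE int  abs(int);
REALTIME_SAFE long labs(long);

/* malloc() is deliberately not redeclared: it stays untagged, so a
 * checker would treat calls to it from realtime code as suspect. */

/* Project code that only calls tagged library functions. */
REALTIME_SAFE int distance(int a, int b)
{
    return abs(a - b);
}
```

The redeclarations must match the library's prototypes exactly, which is one maintenance cost of keeping such headers in sync with third-party code.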

Does this sound like a good/feasible summer of code project?

It should be very useful, as there is plenty of low-hanging fruit that this could reach. A more ambitious goal would be to extend Clang with a framework for encoding the attributes and the "rules" (e.g. a white/black call list for a function) so that clang users could add their own domain-specific checks without reinventing the wheel.

Cheers
Michael

I agree with David that a blacklist approach is doomed to failure because of the opacity, in general, of function declarations. A whitelist… would just be a maintenance nightmare.

On the other hand, for attributes, there is the possibility of specifying that attributes should appear on the first declaration of a function, and that subsequent declarations (and definitions) cannot add to the list (they may specify less).

You could perhaps put in place a “property” system: __attribute__((__property__("reentrant", "realtime"))), which though an extension to Clang is perhaps sufficiently generic to warrant consideration (you would have to check with Douglas Gregor and Ted Kremenek at least). And specify that for this attribute, a subsequent declaration/definition cannot gain a property (it might be looser).

Then, since it would be recorded in the function type, you could statically check that it works.
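
The "cannot gain a property" rule amounts to a subset test between the first declaration and each later one. A minimal sketch in C, assuming properties are mapped to bits; the enum and `redeclaration_ok()` are hypothetical names, not part of any proposed Clang interface:

```c
#include <stdbool.h>

/* Hypothetical property bits; a real system would map the strings
 * "reentrant" and "realtime" to bits like these. */
enum property {
    PROP_REENTRANT = 1 << 0,
    PROP_REALTIME  = 1 << 1
};

/* A redeclaration is acceptable when it names no property that the
 * first declaration lacks; it may name fewer. */
bool redeclaration_ok(unsigned first_decl, unsigned redecl)
{
    return (redecl & ~first_decl) == 0;
}
```

For instance, a function first declared with both properties may be redeclared with only "reentrant", but a function first declared with only "reentrant" may not later claim "realtime".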

One thing to consider, however, is the difficulty to do so for template functions. This is, in essence, similar to noexcept, and the Standard specifies a form noexcept(boolean-expression) to allow expressing conditions on template functions/methods. This makes things a bit harder, no doubt.

Perhaps templated code should be kept out of the loop in your project, or at least dependent annotations.

– Matthieu

The general idea is something that's definitely useful.
I wrote a hacky ad-hoc tool with libclang to perform this kind of check for the xlocale code in FreeBSD and ensure that none of the reentrant functions was touching any global state or calling the non-reentrant versions and it would be great to see it done properly and generalised.

Good to see that the general idea would be useful.

A few thoughts about the design though:

I am not convinced by the annotations.
A does-not-call-X attribute doesn't seem to be sufficiently expressive for this kind of test.

I am not by any means really attached to the example syntax.

You'd need to whitelist dozens of functions at each call.
I would rather something that allowed you to tag functions with attributes like 'reentrant', or 'realtime' (the annotation attribute could be used here) and then another like __attribute__((may_only_call(reentrant))) to be applied to a function, that would only allow it to call functions whose declarations are marked as reentrant.

I personally think Matthieu has the most interesting syntax for
marking functions themselves ATM, but the may_only_call seems to be
approximately the same in terms of semantics.
On the whitelisting issue, it should be possible for clang to perform
some form of tagging of non-annotated functions for the purpose of the
static analysis.

Currently, the analyser does not perform any checking between compilation units.
That means that with your example, if bar() and func() were in different compilation units then the analyser would be quiet.
By only calling annotated functions the compilation unit containing func() would warn you if bar() did not have the required attribute or (if it did) the compilation unit containing bar() would warn you if foo() did not have the required attribute.

I think it should be possible to work around this, though I am not
very familiar with clang's internals.
Would it not be possible to:
- check bar() when working on its compilation unit and automatically
annotate it (internally)
- allow func() to check this internal annotation to find the violation

You might want to take a look at the thread safety annotations (holds lock and friends) for some inspiration.

I will try to dig into the source and take a look at its development state soon.

--Mark

Can the annotations/attributes in clang be extended in this manner
(ignoring example syntax)?

You can add any attributes you like but to get them up stream would be hard as adding attributes is adding language extension and Clang definitely doesn't want to include arbitrary domain specific attributes. If you take a look at the attribute definition file, the attributes there are either from standard, or Clang built in, or already available in GCC (like thread safe annotations).
One idea of extending attribute system is to take a look at the Clang "annotate" attribute which you can embedded arbitrary literals so technically you can get your own annotation DSL inside the "annotate" attribute.

So far my searching has only turned up the standard
set of __attribute__() declarations defined in clang,
i.e.
http://clang-analyzer.llvm.org/annotations.html
http://clang.llvm.org/docs/LanguageExtensions.html

Are these the annotations you are referring to or have I overlooked something?
i.e. is there an __attribute__((annotate())) ?

Can attributes be assigned to functions without altering the source
(the library case)?

It is possible and what we did in our tool is to redeclare the library / 3rd party functions with annotated attributes, and in Clang Sema, merge the attributes into function types.

Does this sound like a good/feasible summer of code project?

It should be very useful as there are many low-hanging fruits out there that could be reached by this.
A more ambitious goal would be to extend Clang by providing a framework to encode the attributes and the "rules" (e.g. a white/black call list of a function) so clang users could add their own domain-specific checks without reinventing the wheel.

I am not too sure how ambitious to be with the scope of this project,
but I certainly do intend to make it an easily extended system, such
that it will apply to a variety of use cases.

--Mark

I agree with David that a blacklist approach is doomed to failure because of
the opacity, in general, of function declarations. A whitelist... would just
be a maintenance nightmare.

I personally think that (for externally referenced libraries)
whitelisting/blacklisting various functions could be a viable option.
Outside of maintaining some list, custom headers would be needed, which
IMHO would be more of a maintenance issue.
Attributes would be a good choice for in project code.
I may be misinterpreting what scope you are talking about for the
whitelist/blacklists though.

On the other hand, for attributes, there is the possibility of specifying
that attributes should appear on the first declaration of a function, and
that subsequent declarations (and definitions) cannot *add* to the list
(they may specify less).

The one issue with this would be external code (i.e. libraries), which
would either need an external method of declaring the properties or
the ability to add one.
In what cases were you thinking that specifying fewer properties would
have utility to developers? (brevity/backwards compatibility)

You could perhaps put in place a "property" system:
__attribute__((__property__("reentrant", "realtime"))), which though an
extension to Clang is perhaps sufficiently generic to warrant consideration
(you would have to check with Douglas Gregor and Ted Kremenek at least). And
specify that for this attribute, a subsequent declaration/definition cannot
gain a property (it might be looser).

This is quite close to what I have intended, but I feel like it may
require some over-annotation (if all realtime functions must manually
get the assigned "realtime" property (syntax is great)).
I am fairly sure that it would be much less verbose if there was an
ability for the analyser to assume that a function has some property
by default until proved otherwise.

eg: if foo() has the realtime property and it calls bar(), then bar()
is assumed to be realtime until a contradiction is reached (ie bar()
calls a non-realtime function).

This could lead to the unmaintainable list of whitelists and
blacklists, but I would prefer to not force developers to annotate
their entire codebase for the static analysis to be effective.

One thing to consider, however, is the difficulty to do so for template
functions. This is, in essence, similar to `noexcept`, and the Standard
specifies a form `noexcept(boolean-expression)` to allow expressing
conditions on template functions/methods. This makes things a bit harder, no
doubt.

Perhaps templated code should be kept out of the loop in your project,
or at least dependent annotations.

Templates are certainly a nasty case and while it would be nice to
account for that case, it is reasonable to let them fall into an
undefined case.

--Mark

As was mentioned before, it's important to decide how you will deal with multiple translation units.

For example, you have: foo() that calls bar() that calls m(). Each is defined in its own translation unit. m() calls a non-reentrant function and only foo is annotated as "may_only_call(reentrant)".
The options here are:
- The user should annotate all of the functions as "may_only_call(reentrant)". Bad for the user.
- The analysis is done in LLVM, which has the ability to link in code from different translation units. That's not what the current static analyzer is doing, so would this be a separate tool?
- Come up with a framework for storing the intermediate results between the analyses of different translation units. Most likely, this is the direction the current clang static analyzer will take at some point; however, this is a challenging problem.
- Only perform the check for the callees that are either explicitly annotated or have definitions in the current translation unit. Unless you want to be ambitious, this is probably your best option.

Also, I think both "may_only_call" and "does_not_call" annotations would be useful. One can imagine the second being less maintenance for security checking (as in "does not perform IO/accesses network").

Cheers,
Anna.

This seems analogous to the throw(...) exception specification in C++ that causes lots of headaches, precisely because the complete closure of who calls whom and who can throw what isn't known across translation units, much less static or dynamic libraries. gcc and Visual Studio (and probably clang) treat the meaning of the decoration differently, leading to runtime behavior differences which have caused a number of bugs in code ported between the compilers.

Some commentary here: http://cboard.cprogramming.com/cplusplus-programming/95555-why-does-cplusplus-need-throw.html

Computing the closure of who can call what in order to decide if the blacklist/whitelist is ok is pretty much the same problem, I think. Even restricting this to just clang as the initial implementing platform, this quickly grows into needing to persist state across invocations of the compiler so that said closure can be generated. That information won't be available until the entire project is linked together; do you abort the link then, when it is discovered that foo calls bar calls m which is illegal according to foo's annotations? That might be an hour later in a build cycle, or days/months later when someone links in a new static library with the implementation of m().

Tricky!
Schwieb

I do want to avoid manual annotations and I would hope to have some
basic support for determining if code in another translation unit is
"safe".
For this to be done, I should only need the AST from multiple
translation units, which I had thought would not be too complex within
clang's static analysis.
Would this be simplified greatly by making this analysis a separate tool?

If not, I think the logical path is to work within each translation
unit to check for safety as that process should be extendable once
clang's static analysis has incorporated support for shifting
information between translation units.
Is any work currently being done in cross translation unit analysis or
it is just sitting on a todo list?

This analysis would definitely be more useful to users if it could do
those checks, but it should still be of use without them.
As per me working on some of the cross translation code, I might be
able to get enough working for a proof of concept, but I have a poor
gauge of how much time it would consume.

--Mark

This, by itself, would make a good GSoC project. A persistence framework for adding annotations to functions, so that the analyser could run in two passes, one collecting metadata and another applying it, would not be a massively complex project by itself, but would be hugely useful for a whole range of analyses.

David

Indeed, this would be a first step enabling many inspections. One could imagine a simple database (sqlite ?) to record all this information, and perhaps a simple python script on the side to query it once the analysis is complete.

Regarding the issue of tagging functions with properties, I agree that each property should have a default state. Whether this default is chosen to be the safest or the most convenient option is of course debatable. I’d like to personally err on the side of safety.

Regarding the issue of external libraries, once you have the concept of default values, then all you need is a set of “configuration files” (ideally one per library to ease maintenance) that declare the non-default properties of the functions that are used. The tool can then “assume” the default properties for all functions, and overwrite the defaults with the properties passed via those configuration files. An optimization could consist in either lazy loading the files (only loading up to the necessary functions) or having an initialization phase consisting in condensing the information in a custom file / DB à la pre-compiled header (which can then be lazily deserialized in the former case).

One good thing about those configuration files, is that they can actually be checked against the implementation. For example, if one file declares “myproject::bar(int, int)” as being “reentrant” and we encounter a declaration/definition of “myproject::bar(int, int)” that declares it is not reentrant, then it is an error.
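
The default-plus-override lookup and the consistency check can be sketched in a few lines of C. The `annotation` struct, the in-memory table standing in for parsed configuration files, and the "not reentrant" default are assumptions for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One entry from a hypothetical annotation file. */
struct annotation {
    const char *function;
    bool        reentrant;
};

/* Safe default for functions that no file mentions. */
static const bool DEFAULT_REENTRANT = false;

/* Returns the configured property, falling back to the default. */
bool is_reentrant(const struct annotation *table, size_t n,
                  const char *function)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].function, function) == 0)
            return table[i].reentrant;
    return DEFAULT_REENTRANT;
}

/* True when an annotation found in the source contradicts the file,
 * which the tool would report as an error. */
bool conflicts(const struct annotation *table, size_t n,
               const char *function, bool declared_reentrant)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].function, function) == 0)
            return table[i].reentrant != declared_reentrant;
    return false; /* nothing configured, so nothing to contradict */
}
```

A linear scan keeps the sketch short; a real tool would more likely index the entries by name.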

And this is where they can really shine, since you can actually create them on the fly (with a specific option). For example, if “myproject::bar(int, int)” is (unfortunately) not annotated, then you can write an “assumption file” (same format as a configuration file), recording the assumption you had to make about it. Then, this assumption file should get pulled into the set of “configuration files” for the other TUs so that the assumptions can be validated. It can also be completed when analyzing them, of course.

The command line I have in mind would look like:

./analyzer-tool [OPTIONS] --add-annotation-file=one.annot --add-annotation-file=two.annot --merged-annotation-file=merged.annot --assumption-file=assumption.annot src/*.cpp

(with bonus points for recursivity in --add-annotation-file)

Note: in the absence of an assumption file, refuse non-annotated functions (unless the default values fit the use cases, of course).

Note: it means that the first file to be analyzed could declare a non-annotated function that is later assumed to have default properties. Therefore, the analysis is incomplete if the assumption file was changed during the session. This can be solved either by completing the “annotation files” so that the “assumption file” is not needed any longer (or treating it as a configuration file…) or simply by running the analysis twice.

Thus the configuration files allow both a form of persistence across TU and a way to annotate 3rd party headers. A persistence framework would be great, of course, but this simple hack should get you on track should you prefer to continue in this avenue.

– Matthieu.

There is an attribute called "annotate" that enables you to embed string literals like:

__attribute__((annotate("read_only"))) struct foo;

You can find it along with all other attributes in llvm\tools\clang\include\clang\Basic\Attr.td

However, there is a bug report saying this attribute only works for variables but not functions, and it is not fixed yet:
http://llvm.org/bugs/show_bug.cgi?id=2490 so you may end up adding some "real" attributes, which is documented at http://clang.llvm.org/docs/InternalsManual.html#AddingAttributes

Michael

There is a better way.

"Decorating" the code to carry meta-information complicates the
compiler and the code unnecessarily. Information that is not input to
the compiler (or direct commentary) is better held elsewhere. Otherwise
the code itself is obscured.

I know, Doxygen and the rest have long held the opposite, and the
argument has merit in the absence of better tools. That argues for
better tools, precisely what Clang aims to facilitate. (Imagine, for
instance, a "compiler" that could match code with its documentation.)

Like documentation -- because, really, they *are* documentation --
assertions about what the code does or may do are fundamentally
unbounded. The assertions we might make about re-entrancy (just to
pick one example) certainly need more than one [key]word. It is not
reducible to a binary condition like "const". Is it re-entrant,
thread-safe, signal-safe? More: Does it rely on or modify global
state? Does it have side effects or I/O? Other linkage dependencies?
Has it been reviewed and approved for questionable but legal syntax,
such as redundant parentheses? Who made these assertions, and when?
Who approved them? Why?

(Please don't claim any of these are the responsibility of the
version-control system. A VCS does not parse. It cannot track
assertions at that level of granularity.)

This "assertion meta-language" is neither C++ nor specific to C++.
It would be much better to address it as such, and not attempt to
shoehorn it into the compiler at the expense of the code's most
perishable and evanescent quality: clarity.

--jkl

Looks like the support for function annotations has been added to clang. See cfe/trunk/test/CodeGen/annotations-loc.c in commit:

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20110905/046156.html

This annotation mechanism is very flexible (the user can specify the annotation names) and is probably a good match for the project. 

Anna.

I do want to avoid manual annotations and I would hope to have some
basic support for determining if code in another translation unit is
"safe".
For this to be done, I should only need the AST from multiple
translation units, which I had thought would not be too complex within
clang's static analysis.
Would this be simplified greatly by making this analysis a separate tool?

You don't necessarily need the full AST; just having a call graph with information about the functions' attributes is probably enough. After you have it, you just propagate the attributes, essentially solving a data flow problem.

Here is a rough algorithm for solving this, taking the "reentrant" annotation as an example.
1) Build a call graph (clang has rudimentary call graph support already).
2) For each node (representing a function) internally mark it with "reentrant", "non-reentrant", or "don't know".
3) Iterate through the nodes in the graph and propagate the annotations:
       If at least one of the callees is "non-reentrant", the caller becomes "non-reentrant".
       If all of the callees are "reentrant", the caller becomes "reentrant".
       Validate: If a function becomes "non-reentrant" and it has a "reentrant" user annotation, raise a warning.
4) Repeat Step #3 until no change. (To optimize performance, you'd iterate in topological order starting from the callees.)
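
A minimal sketch of this propagation in C, iterating to a fixed point over a small in-memory graph. The `func_node` layout and `propagate()` are hypothetical and do not use clang's CallGraph. The sketch also assumes that functions with no known callees keep their initial mark and that a "non-reentrant" mark is never undone:

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_CALLEES 8

enum state { DONT_KNOW, REENTRANT, NON_REENTRANT };

struct func_node {
    const char *name;
    enum state  state;           /* current internal marking */
    bool        user_reentrant;  /* user annotated it "reentrant" */
    int         ncallees;
    int         callees[MAX_CALLEES];  /* indices into the node array */
};

/* Propagates markings from callees to callers until nothing changes.
 * Returns the number of warnings raised. */
int propagate(struct func_node *nodes, int n)
{
    int  warnings = 0;
    bool changed  = true;

    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            struct func_node *f = &nodes[i];
            if (f->state == NON_REENTRANT || f->ncallees == 0)
                continue;

            bool any_non = false, all_reentrant = true;
            for (int c = 0; c < f->ncallees; c++) {
                enum state s = nodes[f->callees[c]].state;
                if (s == NON_REENTRANT)
                    any_non = true;
                if (s != REENTRANT)
                    all_reentrant = false;
            }

            enum state next = f->state;
            if (any_non)
                next = NON_REENTRANT;
            else if (all_reentrant)
                next = REENTRANT;

            if (next != f->state) {
                f->state = next;
                changed  = true;
                /* Validate against the user's annotation. */
                if (next == NON_REENTRANT && f->user_reentrant) {
                    fprintf(stderr, "warning: %s is annotated reentrant "
                            "but reaches a non-reentrant function\n",
                            f->name);
                    warnings++;
                }
            }
        }
    }
    return warnings;
}
```

Markings only ever move from "don't know" towards "reentrant" or "non-reentrant", so the loop terminates; mutually recursive functions with no other information simply stay at "don't know".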

This algorithm assumes that if we don't know about a function, it is doing the right thing. This approach is the easiest for users to adopt. In addition, it allows you to deal with imprecision in the analysis, like functions defined in other translation units, dynamically linked libraries, system functions the analyzer does not reason about, and calls via function pointers (though these could be dropped at call graph construction). You can always strengthen your analysis afterwards to make it more precise.

You could use the same algorithm even when handling multiple translation units. However, you'd need to build a global call graph by linking the information from different translation units. I am not sure how easy it is to resolve possible ambiguities here. In the worst case, you can always assume the function is marked as "don't know".

I'd start by implementing the analysis for a single TU. Then, come up with a serialization scheme for the generated data. Last, we can either feed the serialized output back into the tool or have another tool/script to do the processing of the global graph.

I think it is very important to embed the knowledge about existing standard/system functions into the analysis, so that someone could use it off-the-shelf.

If not, I think the logical path is to work within each translation
unit to check for safety as that process should be extendable once
clang's static analysis has incorporated support for shifting
information between translation units.
Is any work currently being done in cross translation unit analysis or
it is just sitting on a todo list?

It's on a todo list. The general analyzer support is more involved than what you'd need for this project and I am not sure when we'll have it.
Adding the support required for your project would be a first step in the right direction here and might be useful for other analyses.

> - Come up with a framework of storing the intermediate results between the analyses of different translation units.
> Most likely, this is the direction the current clang static analyzer will take at some point; however, this is a challenging problem.
This, by itself, would make a good GSoC project. A persistence framework for adding annotations to functions, so that the analyser could run in two passes, one collecting metadata and another applying it, would not be a massively complex project by itself, but would be hugely useful for a whole range of analyses.

After letting the comments digest for a while, I am fairly sure that
my original proposal's scope should be expanded some.
It sounds like it would be feasible to take some steps in the
direction of what David mentioned, though I am not exactly sure which
ones.
I am very hesitant to generalize persistence past function (and static
data?) annotation, as I do want to have a coherent result at the end
of GSoC.

After reading through some portions of the documentation on
/clang/lib/StaticAnalyzer, I am not too sure where a persistence
framework would fit relative to the clang code base.

From what I understand persistence work would need to be at least
within /clang/lib, but I am not entirely sure on that based upon my
current knowledge of how clang is structured.
Is it safe to assume that the callgraph/annotation analysis should
only be interfaced to clang through the libclang interface?

One could imagine a simple database (sqlite ?) to record all this information, and perhaps a simple python script on the side to query it once the analysis is complete.

Is there some standard way for recording this information or is sqlite
just what comes to mind?

As for the rest of the semantics proposed by Matthieu, I find them to
be excellent specs to work by.

There is an attribute called "annotate" that enables you to embed string literals like:

Well that eliminates the need for another attribute, assuming that
hijacking this does not create any issues.
Thanks Michael.

You don't necessarily need the full AST, just having a call graph...

I had assumed that the call graph information was not readily accessible.

(clang has rudimentary call graph support already)

Could you point me to where that support may be?
After reading through some of the internals, I started to find
warnings about the "gore of the internal analysis engine".

Here is a rough algorithm for solving this, taking the "reentrant" annotation as an example.
1) Build a call graph (clang has rudimentary call graph support already).
2) For each node (representing a function) internally mark it with "reentrant", "non-reentrant", or "don't know".
3) Iterate through the nodes in the graph and propagate the annotations:
       If at least one of the callees is "non-reentrant", the caller becomes "non-reentrant".
       If all of the callees are "reentrant", the caller becomes "reentrant".
       Validate: If a function becomes "non-reentrant" and it has a "reentrant" user annotation, raise a warning.
4) Repeat Step #3 until no change. (To optimize performance, you'd iterate in topological order starting from the callees.)

That is fairly close to what I have intended to do, but seeing it
formalized so concisely makes me think that the scope of the project
should be expanded some.

So with all of that said, is it reasonable to extend this project into
some restricted two pass persistence framework for clang's static
analysis that could have the previously described property checking as
the first use of the new functionality?
Hopefully this work can be built upon in future clang development.

--Mark

> - Come up with a framework of storing the intermediate results between the analyses of different translation units.
> Most likely, this is the direction the current clang static analyzer will take at some point; however, this is a challenging problem.

This, by itself, would make a good GSoC project. A persistence framework for adding annotations to functions, so that the analyser could run in two passes, one collecting metadata and another applying it, would not be a massively complex project by itself, but would be hugely useful for a whole range of analyses.

After letting the comments digest for a while, I am fairly sure that
my original proposal's scope should be expanded some.
It sounds like it would be feasible to take some steps in the
direction of what David mentioned, though I am not exactly sure which
ones.
I am very hesitant to generalize persistence past function (and static
data?) annotation, as I do want to have a coherent result at the end
of GSoC.

After reading through some portions of the documentation on
/clang/lib/StaticAnalyzer, I am not too sure where a persistence
framework would fit relative to the clang code base.
From what I understand persistence work would need to be at least
within /clang/lib, but I am not entirely sure on that based upon my
current knowledge of how clang is structured.
Is it safe to assume that the callgraph/annotation analysis should
only be interfaced to clang through the libclang interface?

For now, you could restrict it to the analyzer; it can be moved out if there are more users.

One could imagine a simple database (sqlite ?) to record all this information, and perhaps a simple python script on the side to query it once the analysis is complete.

Is there some standard way for recording this information or is sqlite
just what comes to mind?

I think a database might be overkill in this case (we probably don’t want to introduce the dependency on sqlite). You can just write out a formatted file.

There are several places where clang is doing serialization already. You might check if one of those mechanisms is something that suits your purposes. For example, libclang has the ability to create a .pch file - a serialized AST, and thus, contains subroutines to serialize a DenseMap, which might be enough. Clang and the analyzer serialize diagnostic information into different formats.
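
As a rough illustration of the "formatted file" option, the sketch below writes one per-function summary as a tab-separated line and parses it back. The `summary` struct, the line format, and both function names are assumptions for illustration; this is not one of clang's existing serialization mechanisms:

```c
#include <stdio.h>

/* Hypothetical per-function summary produced after analysing one TU. */
struct summary {
    char name[64];
    char property[32];  /* e.g. "reentrant", "non-reentrant", "unknown" */
};

/* Writes one "name<TAB>property" line into buf.
 * Returns the number of characters written, or -1 if it did not fit. */
int write_summary(char *buf, size_t size, const struct summary *s)
{
    int n = snprintf(buf, size, "%s\t%s\n", s->name, s->property);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Parses a line produced by write_summary(); returns 1 on success.
 * The name may contain spaces, since only a tab ends it. */
int read_summary(const char *line, struct summary *out)
{
    return sscanf(line, "%63[^\t]\t%31[^\n]",
                  out->name, out->property) == 2;
}
```

A second pass, or a separate script, could read such lines from every translation unit and rebuild the global picture without any database dependency.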

As for the rest of the semantics proposed by Matthieu, I find them to
be excellent specs to work by.

There is an attribute called “annotate” that enables you to embed string literals like:

Well that eliminates the need for another attribute, assuming that
hijacking this does not create any issues.
Thanks Michael.

You don’t necessarily need the full AST, just having a call graph…

I had assumed that the call graph information was not readily accessible.

(clang has rudimentary call graph support already)

Could you point me to where that support may be?
After reading through some of the internals, I started to find
warnings about the “gore of the internal analysis engine”.

See http://clang.llvm.org/doxygen/CallGraph_8h_source.html

Here is a rough algorithm for solving this, taking the “reentrant” annotation as an example.

  1. Build a call graph (clang has rudimentary call graph support already).
  2. For each node (representing a function) internally mark it with “reentrant”, “non-reentrant”, or “don’t know”.
  3. Iterate through the nodes in the graph and propagate the annotations:
       If at least one of the callees is “non-reentrant”, the caller becomes “non-reentrant”.
       If all of the callees are “reentrant”, the caller becomes “reentrant”.
       Validate: If a function becomes “non-reentrant” and it has a “reentrant” user annotation, raise a warning.
  4. Repeat Step #3 until no change. (To optimize performance, you’d iterate in topological order starting from the callees.)

That is fairly close to what I have intended to do, but seeing it
formalized so concisely makes me think that the scope of the project
should be expanded some.

So with all of that said, is it reasonable to extend this project into
some restricted two pass persistence framework for clang’s static
analysis that could have the previously described property checking as
the first use of the new functionality?

I think so.

The proposal is now on the GSoC page:
http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/fundamental/1

Thanks for all the suggestions, pointers, and all around help.
Hopefully I will be interacting with this list some more this summer.

--Mark