Preprocessor conditional macro's hack

Hi all,

I've been messing around with the LLVM/Clang internals, for fun and giggles.
However I quickly ran into problems with the preprocessor. In my opinion the
worst thing about C/C++ is the fact that the preprocessor is completely
decoupled from the rest of the language making it really hard to do proper code
analysis. If you also want to build a code analyzer you also want to properly
track conditional macro's and their dependencies. What I would like to do is
create a dependency graph for the preprocessor macro's in a single .c file.

Example:

#ifdef A
  #ifdef B
  #endif
#endif
#ifdef C
  #ifndef B
   #define D
  #endif
#endif

This should yield, with dependencies between parentheses: A(), B(C), C(), D(!B)

Using Clang's preprocessor it's impossible to go beyond the first level of
a conditional macro if it is undefined. The preprocessor will simply eat
everything within A and thus never deduce that B is conditional on A. I've
also looked at gcc (cpp -E -dU - < hello.c | egrep "(#undef|#define)")
but that also only tracks first level macro's.

I'm was thinking about a solution where I simply get the first level macro's
and then create all permutations of their conditions and rerun the preprocessor
to find additional levels of used conditional macro's. Rinse and repeat. I
think that should work relatively well although it might blow up quite quickly
with code bases which are really large. I haven't even thought about
conditional macro's yet. This all could probably be made to work with an
unpatched Clang instance.

For my preferred solution I was thinking about is to create a patch for the
Clang Preprocessor and turn all the relevant functions into virtual ones so I
can override the Preprocessor. Either that or I create a patch which gives me
the option to record all this information when doing the Preprocessing. I'm not
looking directly to get this into Clang, if at all, since I very well
understand it's quite a limited use-case, but if someone could maybe offer some
guidance here on how to approach this (or share some pitfalls I need to watch
out for) it would be greatly appreciated.

cheers,
Vincent

If you haven't already, take a look at clang's PPCallbacks class
<http://clang.llvm.org/doxygen/classclang_1_1PPCallbacks.html&gt;\.

You could maybe try the following:
At every #if* (PPCallbacks::If, PPCallbacks::Ifdef,
PPCallbacks::Ifndef), you basically "checkpoint" the Preprocessor.
Then, if a corresponding PPCallbacks::SourceRangeSkipped occurs, you
can arrange to go back and lex it as though it were there

Also, this could get really hairy. The dependency graph of macros that
you are thinking of is not necessarily going to be acyclic; it could
be quite bad (think of recursive includes; also check out Boost
Preprocessor for more nastiness
<http://www.boost.org/doc/libs/1_50_0/libs/preprocessor/doc/index.html&gt;\).
However, for "most code", I'm guessing that the graph will probably
not be too bad, but I might be wrong.

and then *create all permutations of their conditions* and rerun the preprocessor

(emphasis mine). All permutations is n!. If you can't come up with
some really clever way to prune this, it won't be tractable.

--Sean Silva

If you haven’t already, take a look at clang’s PPCallbacks class
<http://clang.llvm.org/doxygen/classclang_1_1PPCallbacks.html>.

You could maybe try the following:
At every #if* (PPCallbacks::If, PPCallbacks::Ifdef,
PPCallbacks::Ifndef), you basically “checkpoint” the Preprocessor.
Then, if a corresponding PPCallbacks::SourceRangeSkipped occurs, you
can arrange to go back and lex it as though it were there

Also, this could get really hairy. The dependency graph of macros that
you are thinking of is not necessarily going to be acyclic; it could
be quite bad (think of recursive includes; also check out Boost
Preprocessor for more nastiness
<http://www.boost.org/doc/libs/1_50_0/libs/preprocessor/doc/index.html>).
However, for “most code”, I’m guessing that the graph will probably
not be too bad, but I might be wrong.

and then create all permutations of their conditions and rerun the preprocessor
(emphasis mine). All permutations is n!. If you can’t come up with
some really clever way to prune this, it won’t be tractable.

I don’t understand how permutations play into this:
For each #ifdef that triggers a decision you can either go into that branch or not. If every decision is based on a single define, it seems like this is an exponential problem (2 states for each define).

It gets much worse if ifdefs start to have data dependencies like:

#if a < b and a * 42 < b * 5

Now the problem is not only the number of combinations, but also solving for the right values to use to trigger each switch …

Cheers,
/Manuel

Good point. 2^n seems accurate. Unfortunately, that's still
intractable without heavy pruning.

Also, even though preprocessor is not Turing-complete (maybe it is
with recursive includes?), it is still possible to create
non-terminating computations.

--Sean Silva