DataFlowAnalysis in Clang

Dear all,
I am currently evaluating tools to implement some high level-code-transformations directly in the source code. Clang seems to be a very nice platform for doing that, the only thing which is lacking is Data Dependency Analysis. I really need to have stuff like DefUse chains... and so on! As I cannot see anything like that in the Analysis module I wanted to start to implement them from scratch by myself.

Before start this adventure I want to be sure such structures aren't already implemented in clang. I guess that this kind of analysis is not inline with the philosophy of the clang project, I saw that most of the transformations (loop transformations/constants propagation) are meant to be applied at LLVM bytecode level. However for the transformation I am interested in I really need to work at source code level; lot of information is lost when it's converted (like openmp pragmas... and so on) and that's not good for us! :slight_smile: If you have any kind of pointer about the implementation of dataflow analysis in clang... you are really welcome!

many thanks for the awesome work you are doing with clang, Simone

Clang contains both flow-sensitive and path-sensitive dataflow analysis engines. They can be found in include/clang/Analysis. The flow-sensitive engine is used by LiveVariables.cpp and UninitializedValues.cpp respectively.

Both dataflow analysis engines use source-level CFGs that model both the explicit and implicit control-flow of C.

The flow-sensitive engine associates dataflow values with "block-level expressions" (which are expressions that appear as a "top-level" expression in the basic block of the source-level CFG). Using -cfg-dump and -cfg-view on a source file will give you a better idea of what these concepts mean. The dataflow solver uses a merge operate to merge values at confluence points and uses a worklist algorithm to compute dataflow values. The flow-sensitive solver is functional; it definitely works and has been tested, but it has been highly optimized for performance (there simply hasn't been a need yet).

The path-sensitive engine is used by the static analyzer, is fairly complicated. It uses the notion of transfer functions to update dataflow state, but the program analysis task is modeled as a graph reachability problem in the state space graph of a program rather than associating dataflow values with sections of the CFG.