Live interval analysis on LLVM IR (not on Machine instructions)

Hello.
     Could you please tell me how can I implement best a live interval analysis on LLVM IR (not on Machine instructions, which is already available in http://llvm.org/docs/doxygen/html/LiveIntervalAnalysis_8cpp_source.html)?
     I need to analyze the standard LLVM IR (list of Instruction *) and decide for each SSA variable what is it's live(ness) interval. My problem is that I don't even know of an existing general Dataflow analysis algorithm implementation for LLVM IR.

   Thank you,
     Alex

You can use:

Hello.
     Thank you very much for the research paper. I will try to make use of the algorithms
it presents.

     I just want to add that I found a 3rd party project doing dataflow analysis for LLVM
IR at https://github.com/rohitjha/cse231-proj2. As written at
http://cseweb.ucsd.edu/~r1jha/#five , the project's description is:
     "Dataflow Analysis Framework for LLVM
  This is an extensible dataflow analysis framework for analyzing and optimizing LLVM IR
code. The framework runs forward optimistic iterative dataflow analyses such as Constant
Propagation, Available Expressions, Range Analysis (variable and array) and
Intra-procedural Pointer Analysis. Using these, checking for array access bounds,
optimization such as Constant Folding, Branch Folding and Common Subexpression Elimination
are easily implemented."
     I've started using it - need to adapt it a bit to work with LLVM 3.9. The project seems promising.

    I'm a bit surprised LLVM does not contain a DFA module for LLVM IR. I noticed that
lib/Analysis/AliasAnalysisEvaluator.cpp and ConstantFolding.cpp seem to implement DFA, but
it does not say it; also, lib/Transforms/Scalar/EarlyCSE.cpp performs CSE but without DFA it seems; as already mentioned, there is also lib/CodeGen/LiveIntervalAnalysis.cpp but this pass is performed on Machine instructions.

   Best regards,
     Alex

Dear Alex,

By "data flow analysis," do you mean classic iterative data-flow analysis (Kam-Ullman style), or do you mean data-flow analysis in general?

LLVM does implement SSA-based data-flow analysis algorithms for its optimizations. They are typically not iterative because a) the SSA-based form doesn't require iteration and b) the optimizations only operate on SSA-based values. Constant propagation on SSA form, for example, only needs two iterations. It's one of the advantages that SSA form provides.

It would make sense to use classic iterative data-flow analysis algorithms on values not stored in SSA form (e.g., data stored in the heap). To date, I don't think there's been much demand for such analyses from the in-tree optimizations, but that could change in the future.

Finally, based on the URL you provided, it looks like the project below is a course project. I'd be careful of using course project code; some LLVM-based courses will have students implement the classic iterative data-flow analyses on LLVM IR as an instructional exercise even though a "real-world" implementation would use a more efficient SSA-based algorithm.

Regards,

John Criswell