Constructing use-def chains

Hi,

I would like to construct def-use/ use-def chains for variables and array references from the source code using Clang. I skimmed through the code of LiveVariables.cpp and supporting classes such as AnalysisData. I also looked into llvm code to see how it is done. I am looking for recommendations and have some questions. Please correct me if I am wrong.

I think in the LLVM constructs the def-use chains in the liveness analysis (LiveValues)? Correct?
In the LiveVariables.cpp this is not done. But it is possible to do it . Correct?

Any recommendations or ideas are appreciated.

Thanks,
Moataz

Hi Moataz,

The current live variables analysis implementation (in Clang, not LLVM) has not been optimized for anything but simplicity. There are probably areas where it can be significantly improved both in terms of functionality and performance.

All the the analyses in libAnalysis have been driven by immediate needs. So far there hasn't been a need for use-def chains in the static analyzer, so they have never been implemented.

If you are interested, I certainly encourage you to implement def-use chains yourself. Depending on what you find easiest, I wouldn't necessarily touch the current LiveVariables.cpp and instead start with a new analysis. It's up to you.

Cheers,
Ted

Hi Ted and Moataz,
Actually I have already implemented Def-Use chains in clang.

I didn't announce it in the mailing list because I thought DefUse chain analysis was not in the philosophy of Clang as all the analysis and optimizations are built in LLVM (from SSA code).
Anyway if the community is interested I can make the implementation available to you, my only concern is about the coding style. I saw you frequently use data containers (map, vector)
from the LLVM framework while I preferred to stick with the STL (because I am sure they are not buggy) :slight_smile:
My current implementations is based on the Kennedy's book algorithm (which is recursive). The analysis has been tested against several codes... I am sure it's still not perfect but it's working in most of the cases.

regards, Simone Pellegrini

Ted Kremenek wrote:

Hi Simone,

We'd be delighted if you wanted to make your DefUse implementation available for others to use. I should mention that for the code to be committed back to the Clang tree, it would still have to undergo code review like any other patch. If you don't want to go through this process, simply making the code available would be great!

To comment on your point about STL and LLVM data structures, we tend to use LLVM data structures (such as DenseMap) instead of those in the STL because of their vastly improved performance characteristics. Most of the LLVM data structures have direct analogues to STL data structures (e.g., DenseMap and HashMap), so swapping one for the other is usually trivial. We're not opposed to using STL data structures when they are clearly the best solution.

Regards,
Ted

Hi again,
Yes sure, I can make the code available. BTW I am also interested in this code review process, how it works?
Are there some coding rules to follow? Where can I find it?

Regards, Simone

Hi again,
Yes sure, I can make the code available. BTW I am also interested in this
code review process, how it works?

Sure, this is available online.

http://llvm.org/docs/DeveloperPolicy.html

Are there some coding rules to follow? Where can I find it?

There are.

http://llvm.org/docs/CodingStandards.html

Hope that helps,
Gordon

Thanks Ted and Simone for your replies I didn't have the chance to see your emails earlier.

Sure, your implementation will be useful

Best regards,
Moataz

Hello again,
I am trying to solve some issues with the copyright of the code. Is it possible to commit the code in the main CLANG branch and state (somewhere) the name of the university where the code has been developed?

regards, Simone

Ted Kremenek wrote:

We usually do this in the CREDITS.TXT file.

-Chris

Hello again,
I guess I solved the issue with my university, everything is fine if our contributions will be listed in the CREDITS.txt file.

So, once I modified the code to be compliant with the LLVM coding rules, where or to who should I send the code for the reviewing process?

cheers, Simone

Chris Lattner wrote:

Simone Pellegrini wrote:

Hello again,
I guess I solved the issue with my university, everything is fine if our contributions will be listed in the CREDITS.txt file.
  
Excellent.

So, once I modified the code to be compliant with the LLVM coding rules, where or to who should I send the code for the reviewing process?
  
To this list, preferably.

John.

We actually prefer patches to be sent to cfe-commits, but cfe-dev is fine too:

   http://clang.llvm.org/hacking.html#patches

Hi again,
Here it is my current implementation of DefUse chain analysis. I adjust the code to be compliant with some of the clang rules (I had some problem in switching from std::map to llvm::DenseMap and I will try to better investigate this issue in the future).

So as this is not a patch, but a complete new feature I don't know where it should be placed in the source tree (clang/Analysis seems fine), so I will let u decide. The analysis is composed by 2 files DefUse.h and DefUse.cpp which represent respectively the interface and the implementation of the DefUse analysis. The usage of DefUse is very similar to CFG. You build the all DefUse chains for a specific piece of code (usually a function body) and then u start querying the object.

#include "DefUse.h"
using namespace defuse;

clang::FunctionDecl* fd = ...
DefUse* DU = DefUse::BuildDefUseChains(fd->getBody(), astCtx);

now if we have a reference to a variable inside the function body (usually a DeclRefExpr) we can query the DefUse object to know if the VarRef is a "definition" or an "use":

DeclRefExpr* DR = ...
DU->isDef(DR);
DU->isUse(DR);

Now if our variable is a "definition" for example we can iterate among all the uses referred to the definition in the following way:
for(DefUse::uses_iterator I = DU->uses_begin(DR), E = DU->uses_end(); I != E, I++){
    DeclRefExpr* U = *I;
    ...
}

If the variable is a use, we can iterate among all the definitions:
for(DefUse::defs_iterator I = DU->defs_begin(DR), E = DU->defs_end(); I != E, I++){
   ...
}
here the problem is that a definition can be either a DeclRefExpr in situations like:
-> a = s; // this is a definition for var. a

or a VarDecl in situations like:
-> int a; // this is a definition for variable a

For these reasons I have introduced a wrapper class called DefUseNode (the name can be changed) that can contains VarDecls or DeclRefExprs.

And that's the basic usage of DefUse.

A more advanced use is also possible, I don't want to go into the details of the algorithm used to compute the defuse chains, but you can imagine it divided into 2 steps. In the first steps all the variable references are collected and marked as definitions or uses depending on the semantic of the instruction and then, in the second step, this informations are elaborated in order to build the graph.
We can discuss about it but It comes out that marking the use of a variable can depends from the context, and the semantics can be different. For example if we are calling a function and the variable is passed as a reference or pointer we have to assume that this is a definition for the variable, but if we know the semantics of the function and we are sure the variable is not modified in the function body we can "risk" a little and mark the variable as an use. For this reason different semantics can be defined by the user in order to cover specific cases.

This is possible by extending the DefUseBasicHelper. The basic helper already implements the default semantics (a very conservative one) but it exposes some virtual methods that can be extended by the user who wants to implement a different behavior. The helper should be passed to the BuildDefUseChain method in this way:

DefUseCustomHelper helper;
DefUse* DU = DefUse::BuildDefUseChains(fd->getBody(), astCtx, &helper);
...

At last, I added a class to test the algorithm DefUseChainTest. I produces a lot of output (badly formatted) but you can easily see how ( and if :slight_smile: ) the whole analysis works just by:
defuse::DefUseChainTest Consumer(llvm::outs());
...
ParseAST(PP, Consumer, *ContextOwner, false);

The test class is also meant to be an example of how to use the DefUse.

N.B.: As I already stated in my previous emails it is important (when and if the patch will be committed in the CLANG source tree) the institution where I am working, the "University of Innsbruck", and if possible my name (Simone Pellegrini) appear in the CREDITS.TXT file as contributors for this type of analysis.

have fun with DefUses, cheers Simone

P.S.: Unfortunately next week I will be on a business trip so not reachable. I will reply and address your comments/feedbacks as soon as I am back in the office.

Ted Kremenek wrote:

DefUse.h (7.71 KB)

DefUse.cpp (31.1 KB)

Hi again,
Here it is my current implementation of DefUse chain analysis. I adjust the code to be compliant with some of the clang rules (I had some problem in switching from std::map to llvm::DenseMap and I will try to better investigate this issue in the future).

So as this is not a patch, but a complete new feature I don't know where it should be placed in the source tree (clang/Analysis seems fine), so I will let u decide. The analysis is composed by 2 files DefUse.h and DefUse.cpp which represent respectively the interface and the implementation of the DefUse analysis. The usage of DefUse is very similar to CFG. You build the all DefUse chains for a specific piece of code (usually a function body) and then u start querying the object.

#include "DefUse.h"
using namespace defuse;

clang::FunctionDecl* fd = ...
DefUse* DU = DefUse::BuildDefUseChains(fd->getBody(), astCtx);

now if we have a reference to a variable inside the function body (usually a DeclRefExpr) we can query the DefUse object to know if the VarRef is a "definition" or an "use":

DeclRefExpr* DR = ...
DU->isDef(DR);
DU->isUse(DR);

Now if our variable is a "definition" for example we can iterate among all the uses referred to the definition in the following way:
for(DefUse::uses_iterator I = DU->uses_begin(DR), E = DU->uses_end(); I != E, I++){
  DeclRefExpr* U = *I;
  ...
}

If the variable is a use, we can iterate among all the definitions:
for(DefUse::defs_iterator I = DU->defs_begin(DR), E = DU->defs_end(); I != E, I++){
...
}
here the problem is that a definition can be either a DeclRefExpr in situations like:
-> a = s; // this is a definition for var. a

or a VarDecl in situations like:
-> int a; // this is a definition for variable a

For these reasons I have introduced a wrapper class called DefUseNode (the name can be changed) that can contains VarDecls or DeclRefExprs.

And that's the basic usage of DefUse.

A more advanced use is also possible, I don't want to go into the details of the algorithm used to compute the defuse chains, but you can imagine it divided into 2 steps. In the first steps all the variable references are collected and marked as definitions or uses depending on the semantic of the instruction and then, in the second step, this informations are elaborated in order to build the graph.
We can discuss about it but It comes out that marking the use of a variable can depends from the context, and the semantics can be different. For example if we are calling a function and the variable is passed as a reference or pointer we have to assume that this is a definition for the variable, but if we know the semantics of the function and we are sure the variable is not modified in the function body we can "risk" a little and mark the variable as an use. For this reason different semantics can be defined by the user in order to cover specific cases.

This is possible by extending the DefUseBasicHelper. The basic helper already implements the default semantics (a very conservative one) but it exposes some virtual methods that can be extended by the user who wants to implement a different behavior. The helper should be passed to the BuildDefUseChain method in this way:

DefUseCustomHelper helper;
DefUse* DU = DefUse::BuildDefUseChains(fd->getBody(), astCtx, &helper);
...

At last, I added a class to test the algorithm DefUseChainTest. I produces a lot of output (badly formatted) but you can easily see how ( and if :slight_smile: ) the whole analysis works just by:
defuse::DefUseChainTest Consumer(llvm::outs());
...
ParseAST(PP, Consumer, *ContextOwner, false);

The test class is also meant to be an example of how to use the DefUse.

N.B.: As I already stated in my previous emails it is important (when and if the patch will be committed in the CLANG source tree) the institution where I am working, the "University of Innsbruck", and if possible my name (Simone Pellegrini) appear in the CREDITS.TXT file as contributors for this type of analysis.

have fun with DefUses, cheers Simone

P.S.: Unfortunately next week I will be on a business trip so not reachable. I will reply and address your comments/feedbacks as soon as I am back in the office.

Hi Simone,

Sorry for the delay in replying! The overall interface for this class looks pretty good. I have a bunch of comments with regards to the implementation and the coding style that we should iterate on before committing back to the Clang tree. My comments are inline with the code below. Feel free to reply whenever you have time.

As for where this code should live, I suggest:

include/clang/Analysis/Analyses/DefUse.h
lib/Analysis/DefUse.cpp

My comments reflect a first pass over this patch. There is a lot going on here, so I have focused only on a subset of the details.

//===-- clang/Analysis/DefUse.h - DefUse analysis -------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the declaration of DefUse analysis headers
//
//===----------------------------------------------------------------------===//

#ifndef DEF_USE_HPP_
#define DEF_USE_HPP_

Please use "LLVM_CLANG_DEFUSE_H" to avoid any conflict with other defuse implementations.

#include "llvm/ADT/DenseMap.h"

#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ParentMap.h"
#include "clang/AST/StmtVisitor.h"

Looks fine.

#include <exception>
#include <stdexcept>

We don't use C++ exceptions at all in LLVM, largely for performance reasons.

namespace clang {

// forward definitions
class CFG;
class CFGBlock;

namespace defuse {

//===------------------------- Exceptions -------------------------===//

struct NotDefExcpetion {
   Stmt const* stmt;
   NotDefExcpetion(Stmt const* stmt_): stmt(stmt_){}
};
struct NotUseExcpetion {
   DeclRefExpr const* expr;
   NotUseExcpetion(DeclRefExpr const* expr_): expr(expr_){}
};

These will need to be replaced with an alternate mechanism.

//===------------------------- DefUseNode -------------------------===//
// Keeps the information for a particular 'definition' or 'use' of a variable
// The structure is needed because a defuse chain can contains variable declarations
// as well as variable reference. As a DeclStmt can contains several declarations
// to address a particular variable we need to store its VarDecl or DeclRefExp
struct DefUseNode{
  enum NodeKind{ VarDecl, VarRef };
  enum UseKind{ Use, Def, UseDef };

  DefUseNode(clang::VarDecl const* decl): var_decl(decl), kind(VarDecl), usage(Def){}
  DefUseNode(DeclRefExpr const* ref, UseKind u = Use): var_ref(ref), kind(VarRef), usage(u){}

  clang::VarDecl const* getDecl() const;

  NodeKind const& getKind()const { return kind; }
  UseKind const& getUse() const { return usage; }

  clang::VarDecl const* getVarDecl() const {
    assert(kind==VarDecl); return var_decl;
  }
  DeclRefExpr const* getVarRef() const {
    assert(kind==VarRef); return var_ref;
  }

  bool operator==(DefUseNode const& n) const;

private:
  // a def-use node can be either a VarDecl or a DeclRefExpr
  union{
    clang::VarDecl const* var_decl;
    DeclRefExpr const* var_ref;
  };
  NodeKind kind;
  UseKind usage;
};

Since this 'struct' has private data, please make it a class instead:

http://llvm.org/docs/CodingStandards.html#ci_class_struct

//===------------------------- Typedefs -------------------------===//
class DefUseBlock;
class VarDeclMap;
typedef std::vector<DefUseBlock> DefUseData;
typedef llvm::DenseMap<Stmt const*, unsigned> VarRefBlockMap;
typedef std::vector<DefUseNode> DefUseVect;
typedef std::vector<DefUseNode const*> VarRefsVect;

class DefUseBasicHelper;

//===------------------------- DefUse -------------------------===//

class DefUse{
  ASTContext const& ctx;
  DefUseData const* analysis_data;
  VarDeclMap const* decl_map;
  VarRefBlockMap const* block_map;
  unsigned const num_cfg_blocks;

  DefUse(ASTContext const& ctx_, DefUseData const* analysis_data_, VarDeclMap const* decl_map_,
          VarRefBlockMap const* block_map_, unsigned num_blocks):
    ctx(ctx_), analysis_data(analysis_data_), decl_map(decl_map_), block_map(block_map_),
    num_cfg_blocks(num_blocks){ }

  bool isDef(DefUseNode const& n) const;

  struct iterator_impl{

To make MSVC happy, this should be a 'class'.

    DefUse const* du;
    DefUseNode const* node;

    struct iter{
      int block_id;
      DefUseVect::const_iterator block_it;
      iter(int blk_id): block_id(blk_id){}
    };

    iterator_impl(): du(NULL){ }
    iterator_impl(DefUse const* du_): du(du_){ }
    iterator_impl& operator++() { return inc(false); }
    iterator_impl& operator++(int) { return inc(false); }
    virtual iterator_impl& inc(bool) = 0;
  };
public:
  class uses_iterator: public std::iterator<std::input_iterator_tag, DeclRefExpr, std::ptrdiff_t, DeclRefExpr const*>,
                        public iterator_impl{
    iter iter_ptr;
    bool inDefBlock;

    uses_iterator(): iterator_impl(), iter_ptr(-1), inDefBlock(true){}
    uses_iterator(DefUse const* du, DefUseNode const& n);
    uses_iterator& inc(bool first);
    friend class DefUse;
  public:
    virtual bool operator!=(uses_iterator const& iter);

Does this need to be virtual?

    virtual DeclRefExpr const* operator*();

Does this need to be virtual?

  };

  class defs_iterator: public std::iterator<std::input_iterator_tag, DefUseNode, std::ptrdiff_t, DefUseNode const*>,
                        public iterator_impl{
    struct iter_: public iter{
      VarRefsVect::const_iterator reaches_it;
      iter_(int blk_id): iter(blk_id){}
    } iter_ptr;
    bool blockDef;

    defs_iterator(): iterator_impl(), iter_ptr(-1), blockDef(false) {}
    defs_iterator(DefUse const* du, DeclRefExpr const& n);
    defs_iterator& inc(bool first);
    friend class DefUse;
  public:
    bool operator!=(defs_iterator const& iter);
    DefUseNode const* operator*();
  };

  // USES //
  defs_iterator defs_begin(DeclRefExpr const* var) const;
  defs_iterator defs_end() const;

Please add doxygen comments for these methods.

  // DEFS //
  uses_iterator uses_begin(DeclRefExpr const* var) const;
  uses_iterator uses_begin(VarDecl const* var) const;
  uses_iterator uses_end() const;

Please add doxygen comments for these methods.

  bool isUse(DeclRefExpr const* var) const;
  bool isDef(DeclRefExpr const* var) const { return isDef(DefUseNode(var, DefUseNode::Def)); }
  bool isDef(VarDecl const* var) const { return isDef(DefUseNode(var)); }

  ~DefUse();

  static DefUse* BuildDefUseChains(Stmt* body, ASTContext *ctx, DefUseBasicHelper* helper = 0,
                    CFG* cfg = 0, ParentMap* pm = 0, bool verbose = false,
                    llvm::raw_ostream& out = llvm::outs());
};

I think having a "defuse" namespace is fine (for all the special typedefs, etc.), but it should only contain the implementation details for DefUse. The DefUse class should be in the clang namespace.

//===------------------------- DefUseBasicHelper -------------------------===//

It seems like DefUseBasicHelper allows a bunch of hooks for subclasses via the 'HandleXXX' methods, but this level of hooks is not going to fair well from a performance standpoint. I believe that one base class can implement most of the HandleXXX logic without making those methods virtual (and thus dispense with the HandleXXX methods entirely and just have VisitXXX methods). I think most subclasses will need very few hooks, if any at all. I think it's worth identifying the one or two methods that need to be virtual and then leave the rest of them as non-virtual methods.

Also, if this truly is the base class, how about calling it 'DefUseHelper'? That seems more succinct and clear from an interface perspective.

class DefUseBasicHelper: public StmtVisitor<DefUseBasicHelper>{
  class DefUseBasicHelperImpl;
  DefUseBasicHelperImpl* pimpl;

  void InitializeValues(DefUseData* data, VarRefBlockMap* bm, VarDeclMap* dm);

  friend class DefUse;
public:
  DefUseNode::UseKind current_use;
  DefUseBasicHelper();

  virtual void HandleDeclRefExpr(DeclRefExpr *DR); // remember to call the super class implementation of the method
  void HandleDeclStmt(DeclStmt *DS);

  virtual void HandleBinaryOperator(BinaryOperator* B);
  virtual void HandleConditionalOperator(ConditionalOperator* C);
  virtual void HandleCallExpr(CallExpr* C);
  virtual void HandleUnaryOperator(UnaryOperator* U);
  virtual void HandleArraySubscriptExpr(ArraySubscriptExpr* AS);
  // ...

Is there a need for the '// ..' line?

  void VisitDeclRefExpr(DeclRefExpr *DR){ return HandleDeclRefExpr(DR); }
  void VisitDeclStmt(DeclStmt *DS){ return HandleDeclStmt(DS); }
  void VisitBinaryOperator(BinaryOperator* B){ return HandleBinaryOperator(B); }
  void VisitConditionalOperator(ConditionalOperator* C){ return HandleConditionalOperator(C); }
  void VisitCallExpr(CallExpr* C){ return HandleCallExpr(C); }
  void VisitUnaryOperator(UnaryOperator* U){ return HandleUnaryOperator(U); }
  void VisitArraySubscriptExpr(ArraySubscriptExpr* AS){ return HandleArraySubscriptExpr(AS); }
  // ...

Is there a need for the '// ..' line?

  void VisitCFGBlock(clang::CFGBlock const& blk, int root);

The 'root' argument is not necessary and should be removed (see comment below).

  void VisitStmt(Stmt* S);

  virtual ~DefUseBasicHelper();
};

//===------------------------- DefUseChainTest -------------------------===//

class DefUseChainTest: public StmtVisitor<DefUseChainTest>, public ASTConsumer {
  llvm::raw_ostream& out;
  ASTContext* ctx;
  DefUse const* du;

public:
  DefUseChainTest(llvm::raw_ostream& out_): out(out_), ctx(NULL), du(NULL) { }

  void Initialize(ASTContext& context) { ctx = &context; }
  void HandleTopLevelDecl(DeclGroupRef declGroupRef);
  // void HandleTranslationUnit(clang::ASTContext& context);

  void VisitDeclRefExpr(DeclRefExpr *DR);
  void VisitDeclStmt(DeclStmt *DS);
  void VisitStmt(Stmt* S);
};

void PrintVarDefs(DefUse const* DU, DeclRefExpr const* DR, ASTContext& ctx, llvm::raw_ostream& out);
void PrintVarUses(DefUse const* DU, DeclRefExpr const* DR, ASTContext& ctx, llvm::raw_ostream& out);
void PrintVarUses(DefUse const* DU, VarDecl const* VD, ASTContext& ctx, llvm::raw_ostream& out);

} // end namespace defuse

The DefUseChainTest can be made entirely private in DefUse.cpp, and have a static function create it (and return an ASTConsumer). There is no need to have it in the public API interface.

} // end namespace clang

#endif /* DEF_USE_HPP_ */

Some compilers complain about the extra tokens after #endif. Please remove this comment.

//===-- clang/Analysis/DefUse.cpp - DefUse analysis -------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the implementation of DefUse analysis
//
//===----------------------------------------------------------------------===//

#define __STDC_LIMIT_MACROS
#define __STDC_CONSTANT_MACROS

What are these two defines used for?

#include "DefUse.h"

#include "clang/Basic/SourceManager.h"
#include "clang/Analysis/CFG.h"

#include <llvm/Support/raw_ostream.h>

#include <set>
#include <map>

Eventually the code should probably use DenseSet and DenseMap for performance reasons, as std::map and std::set are very slow and make heavy use of malloc(). I understand that a significant amount of code may need to be tweaked, but the performance gain can easily be 2-3x.

using namespace clang;
using namespace defuse;

// ASSERT MACRO
#define ASSERT(COND, MSG) do{ \
  std::string str; \
  llvm::raw_string_ostream ss(str); \
  ss << MSG; \
  assert((COND) && ss.str().c_str()); }while(false)

I don't think this will do what you intend. If I did:

ASSERT(false, 10)

I'd see:

   "false && ss.str().c_str()"

as the failed assertion condition in the terminal output.

One option to get better error reporting is to use the facilities in 'clang/Basic/PrettyStackTrace.h', which are used in various places in Clang. These can be used to print the location within the analyzed source that the assertion fails. In general we don't like specialized error reporting machinery for just one file (as in the case of 'ASSERT'), but like adding to the general error reporting machinery that all of LLVM/Clang can use.

I also don't think this ASSERTION macro adds much value and causes some of the client code below to be structured in some slightly suboptimal ways.

// utility functions
unsigned Line(SourceLocation const& l, clang::SourceManager const& sm) {
  PresumedLoc pl = sm.getPresumedLoc(l);
  return pl.getLine();
}

unsigned Column(SourceLocation const& l, SourceManager const& sm) {
  PresumedLoc pl = sm.getPresumedLoc(l);
  return pl.getColumn();
}

Instead of adding these functions, we can add 'getPresumedLoc' to FullSourceLoc and then use the 'getColumn()' and 'getLine()' methods from that class.

std::string PrintClangStmt(Stmt const* stmt, ASTContext const& ctx){
  std::string str;
  llvm::raw_string_ostream pp(str);
  stmt->printPretty(pp, 0, PrintingPolicy(ctx.getLangOptions()), 0);
  return pp.str();
}

I don't think this function is needed. If you need it, please declare it static to keep it private to this file.

//===------------------------- DefUseNode -------------------------===//
inline bool defuse::DefUseNode::operator==(DefUseNode const& n) const{
  if(n.usage != usage || n.kind != kind)
    return false;

Please add a space between 'if' and the '('. This is the coding style used throughout LLVM.

Also, please use 2 space indentation like the rest of LLVM and Clang.

  if(kind == VarDecl)
    return var_decl == n.var_decl;
  else
    return var_ref == n.var_ref;
  return false;
}

The 'return false' is dead code. Please remove it.

This can be more succinctly written as:

   return kind == VarDecl ? var_decl == n.var_decl : var_Ref == n.var_ref;

Alternatively:

   if (kind == VarDecl)
     return var_decl == n.var_decl;

   return var_ref == n.var_ref;

inline clang::VarDecl const* DefUseNode::getDecl() const{
  if(kind == VarRef)
    return cast<clang::VarDecl>(var_ref->getDecl());
  return cast<clang::VarDecl>(var_decl);
}

Same as the previous comment.

   if (kind == VarRef)
     return cast<VarDecl>(var_ref->getDecl());

   return cast<VarDecl>(var_decl);

or:

   return kind == VarRef ? cast<VarDecl>(var_ref->getDecl()) : cast<VarDecl>(var_decl);

typedef std::set<DefUseNode const*> VarRefsSet;
typedef std::set<VarDecl const*> VarDeclsSet;
typedef std::map<VarDecl const*, DefUseVect> DefUseChain;

Please use llvm::DenseSet and llvm::DenseMap. You'll need to use pointers to DefUseVects instead (and then destroy them), but the speed improvement will be significant.

//===------------------------- DefUseBlock -------------------------===//

struct defuse::DefUseBlock: public DefUseChain{

This should be 'class' instead of 'struct'. You declared it earlier as a 'class'. While gcc will eat this, MSVC will not. Also, since this isn't POD, it should be a class anyway.

  VarDeclsSet uses;
  VarRefsVect defsout;
  VarDeclsSet killed;
  VarRefsVect reaches;

  class not_killed_set_iterator: public std::iterator<std::input_iterator_tag, DefUseNode, std::ptrdiff_t, DefUseNode const*>{
    DefUseBlock& block;
    VarRefsVect::const_iterator ptr_it;

    not_killed_set_iterator& inc(bool first){
      if(ptr_it != block.reaches.end() && !first)
        ptr_it++;

Please use '++ptr_it' instead of 'ptr_it++'. While it makes no difference in this case, general speaking it performs better and there is no reason to make any assumptions about the underlying data structure and how the iterators are implemented.

      while(ptr_it != block.reaches.end() &&
          (block.killed.find((*ptr_it)->getDecl()) != block.killed.end() &&
              std::find(block.defsout.begin(), block.defsout.end(), *ptr_it) == block.defsout.end()))
        ptr_it++;
      return *this;
    }
  public:
    not_killed_set_iterator(DefUseBlock& block_, VarRefsVect::const_iterator ptr_it_): block(block_), ptr_it(ptr_it_){ inc(true); }
    not_killed_set_iterator& operator++() { return inc(false); }
    not_killed_set_iterator& operator++(int) { return inc(false); }

Do we even need an operator++(int)? By disallowing it people won't accidentally use it. This also isn't the standard semantics of this operator, so by implementing it this way clients will silently be doing the wrong thing instead of getting a compilation error.

    bool operator!=(not_killed_set_iterator const& iter) { return !((&iter.block) == &(block) && iter.ptr_it == ptr_it); }

This method can (and should) be declared const.

    const DefUseNode* const& operator*() const { return *ptr_it; }
  };

  not_killed_set_iterator begin_not_killed_set(){ return DefUseBlock::not_killed_set_iterator(*this, reaches.begin()); }
  not_killed_set_iterator end_not_killed_set(){ return DefUseBlock::not_killed_set_iterator(*this, reaches.end()); }

Please add a space between () and {

};

//===------------------------- VarDeclMap -------------------------===//

struct defuse::VarDeclMap: public StmtVisitor<VarDeclMap>, public llvm::DenseMap<VarDecl const*, DeclStmt const*> {

LLVM has an 80 column coding convention:

http://llvm.org/docs/CodingStandards.html#scf_codewidth

Please wrap lines that exceed this length.

  VarDeclMap(Stmt const* body){ VisitStmt(const_cast<Stmt*>(body)); }

  void VisitDeclStmt(DeclStmt *DS){
    for(DeclStmt::decl_iterator it = DS->decl_begin(); it != DS->decl_end(); it++){
      if((*it)->getKind() != Decl::Var)
        continue;

Use dyn_cast instead of getKind.

      insert(std::make_pair(cast<VarDecl>(*it), DS));
    }
  }
  void VisitStmt(Stmt* S){
    for(Stmt::child_iterator I = S->child_begin(), E = S->child_end(); I != E; ++I)
      if(*I) Visit(*I);
  }
};

Please make this is a class, as the earlier declaration was also a class.

Please also add a space between 'for' and '('. This is the LLVM coding convention.

//===------------------------- DefUseBasicHelper -------------------------===//

struct defuse::DefUseBasicHelper::DefUseBasicHelperImpl{
  DefUseData* data;
  VarRefBlockMap* bm;
  VarDeclMap* dm;
  unsigned blockID;

  DefUseBasicHelperImpl(): data(NULL), bm(NULL), dm(NULL){}
};
defuse::DefUseBasicHelper::DefUseBasicHelper(): pimpl(new DefUseBasicHelperImpl), current_use(DefUseNode::Use){}
defuse::DefUseBasicHelper::~DefUseBasicHelper(){ delete pimpl; }

void defuse::DefUseBasicHelper::InitializeValues(DefUseData* data, VarRefBlockMap* bm, VarDeclMap* dm){
  pimpl->data = data; pimpl->bm = bm; pimpl->dm = dm;
}

void defuse::DefUseBasicHelper::HandleDeclRefExpr(DeclRefExpr *DR){
  VarRefBlockMap& bm = *pimpl->bm;
  unsigned blockID = pimpl->blockID;
  DefUseBlock& block_data = (*pimpl->data)[blockID];
  if(DR->getDecl()->getKind() == Decl::Var || DR->getDecl()->getKind() == Decl::ParmVar){

The condition is overly verbose. We also discourage the use of 'getKind' unless you are using it in a switch statement.

Suggestion:

   Decl *D = DR->getDecl();
   if (isa<VarDecl>(D)) {
    ...
   }

isa<> uses the value returned from VarDecl::classof() to determine if 'D' is a VarDecl.

    VarDecl* VD = cast<VarDecl>(DR->getDecl());

Actually, instead of using isa, here is a better style:

if (VarDecl *VD = dyn_cast<VarDecl>(DR->getDecl()) {
   ...
}

The dyn_cast combines 'isa' with 'cast', and returns a NULL pointer if isa would return false. This is the encouraged style.

    // map this variable to the current block
    bm[DR] = blockID;
    if(block_data.find(VD) == block_data.end()){
      block_data.insert(std::make_pair(VD, DefUseVect()));
      block_data.uses.insert(VD); // this variable is used in the current block and it has
                      // no prior definition within the block

If you end up using a DenseMap, this code can be made faster by avoiding the double lookup (the first for block_data.find() and the second for the insertion):

    DefUseBlock *&B = block_data[VD];
    if (!B) {
     B = new DefUseBlock();
     B->users.insert(VD);
   }

Using a DenseMap (which involves changing the data value to a pointer), the first line involves a quick hashtable lookup and creates a NULL pointer entry if the key could not be found. The code that follows conjures up the new DefUseBlock without having to do a second search of the map.

    }
    if(current_use == DefUseNode::UseDef){ // in case of compound operators
      block_data[VD].push_back(DefUseNode(DR, DefUseNode::Use));
      block_data[VD].push_back(DefUseNode(DR, DefUseNode::Def));

These lines cause two identical (and redundant) lookups to block_data. Consider:

   DefUseBlock &B = block_data[VD];
   B.push_back(...);

    }else
      block_data[VD].push_back(DefUseNode(DR, current_use));
  }
}

Please add spaces between 'if' and '(' and 'else' and '{'.

void defuse::DefUseBasicHelper::HandleDeclStmt(DeclStmt *DS){
  for(DeclStmt::decl_iterator I = DS->decl_begin(), E = DS->decl_end(); I != E; I++){
    if((*I)->getKind() != Decl::Var)
      continue;
    VarRefBlockMap& bm = *pimpl->bm;
    VarDeclMap& dm = *pimpl->dm;
    unsigned blockID = pimpl->blockID;
    DefUseBlock& block_data = (*pimpl->data)[blockID];

    VarDecl* VD = cast<VarDecl>(*I);
    if(block_data.find(VD) != block_data.end())
      ASSERT(true, "Variable '" << VD->getNameAsString() << "' redefined in the same block");

Why not do:

    assert(block_data.find(VD) != block_data.end() && "Variable redefined in the same block");

It is far more succinct, doesn't add more code in a Release build (as ASSERT does), and if we really care what VD is we can run this code in the debugger (which we would likely need to do anyway). It also reduces the number of lookups in the map in a Release build from two to one.

void defuse::DefUseBasicHelper::HandleBinaryOperator(BinaryOperator* B){
  DefUseNode::UseKind backup = current_use; // backup the usage
  if(B->isAssignmentOp()){
    current_use = DefUseNode::Use;
    Visit(B->getRHS());
    current_use = DefUseNode::Def;
    if(B->isCompoundAssignmentOp())
      current_use = DefUseNode::UseDef;
    Visit(B->getLHS());
  }else{
    Visit(B->getLHS());
    current_use = DefUseNode::Use;
    Visit(B->getRHS());
  }
  current_use = backup; // write back the usage to the current usage
}

void defuse::DefUseBasicHelper::HandleConditionalOperator(ConditionalOperator* C){ }

void defuse::DefUseBasicHelper::HandleCallExpr(CallExpr* C) {
  DefUseNode::UseKind backup = current_use; // backup the usage
  for(CallExpr::arg_iterator I = C->arg_begin(), E = C->arg_end(); I != E; ++I){
    if((*I)->getType()->isPointerType() || (*I)->getType()->isReferenceType())
      current_use = DefUseNode::Def;
    Visit(*I);
    current_use = backup;
  }
}

void defuse::DefUseBasicHelper::HandleUnaryOperator(UnaryOperator* U){
  DefUseNode::UseKind backup = current_use; // backup the usage
  switch(U->getOpcode()){
  case UnaryOperator::PostInc:
  case UnaryOperator::PostDec:
  case UnaryOperator::PreInc:
  case UnaryOperator::PreDec:
    current_use = DefUseNode::UseDef;
    break;
  case UnaryOperator::Plus:
  case UnaryOperator::Minus:
  case UnaryOperator::Not:
  case UnaryOperator::LNot:
    current_use = DefUseNode::Use;
    break;
  case UnaryOperator::AddrOf:
  case UnaryOperator::Deref:
    // use the current_use
    break;
  default:
    // DEBUG("Operator " << UnaryOperator::getOpcodeStr(U->getOpcode()) << " not supported in def-use analysis");
    break;
  }
  Visit(U->getSubExpr());
  current_use = backup; // write back the usage to the current usage
}

void defuse::DefUseBasicHelper::HandleArraySubscriptExpr(ArraySubscriptExpr* AS){
  DefUseNode::UseKind backup = current_use; // backup the usage
  Visit(AS->getBase());
  current_use = DefUseNode::Use;
  Visit(AS->getIdx());
  current_use = backup; // write back the usage to the current usage
}

void defuse::DefUseBasicHelper::VisitCFGBlock(CFGBlock const& blk, int root){
  pimpl->blockID = blk.getBlockID();
  for(CFGBlock::const_iterator I = blk.begin(), E = blk.end(); I != E; I++)
    Visit(*I);

  unsigned blockID = pimpl->blockID;
  DefUseBlock& block_data = (*pimpl->data)[blockID];

  // Post processing
  // build up the uses(b), defsout(b) and killed(b)

  // for each variable
  for(DefUseBlock::iterator I = block_data.begin(), E = block_data.end(); I != E; ++I){
      // in case of references to ParmVar what we do is insert the variable definition in the first block of the CFG
      DefUseBlock& root_data = (*pimpl->data)[root];
       if(I->first->getKind() == Decl::ParmVar &&
               std::find(root_data[I->first].begin(), root_data[I->first].end(), DefUseNode(I->first)) == root_data[I->first].end()){
           root_data[I->first].push_back(DefUseNode(I->first));
           root_data.defsout.push_back(&root_data[I->first].back());
       }
    // if this is a local variable, should't be in the defsout,killed sets
    // defsout is calculated by iterating backwards to find the last use of the variable
    for(DefUseVect::const_reverse_iterator UI = I->second.rbegin(), UE = I->second.rend(); UI != UE; UI++)
      if(UI->getUse() == DefUseNode::Def && (block_data.defsout.empty() >> block_data.defsout.back()->getDecl() != I->first)){
        block_data.defsout.push_back(&(*UI));
        block_data.killed.insert(UI->getDecl());
        break;
      }
  }
}

void defuse::DefUseBasicHelper::VisitStmt(Stmt* S){
  for(Stmt::child_iterator I = S->child_begin(), E = S->child_end(); I != E; ++I)
    if(*I) Visit(*I);
}

bool ContainsStmt(Stmt const* block, Stmt const* stmt){
  if(block == stmt)
    return true;
  for(Stmt::const_child_iterator I = block->child_begin(), E = block->child_end(); I != E; ++I)
    if(*I){
      if(ContainsStmt(*I, stmt)) return true;
    }

How about:

   if (*I && ContainsStmt(*I, stmt))
     return true;

  return false;
}

Stmt const* GetEnclosingBlock(ParentMap const& pm, Stmt const* stmt){
  Stmt const* E = stmt;
  while(E){
    Stmt const* cond = NULL;
    switch(E->getStmtClass()){
    case Stmt::IfStmtClass:
      cond = cast<IfStmt>(E)->getCond();
      break;
    case Stmt::WhileStmtClass:
      cond = cast<WhileStmt>(E)->getCond();
      break;
    case Stmt::DoStmtClass:
      cond = cast<DoStmt>(E)->getCond();
      break;
    case Stmt::ForStmtClass:
      cond = cast<ForStmt>(E)->getCond();
      break;
    case Stmt::SwitchStmtClass:
      cond = cast<SwitchStmt>(E)->getCond();
      break;
    case Stmt::CompoundStmtClass:
      return E;
    default:
      break;
    }
    if(cond && !ContainsStmt(cond, stmt))
      return E;
    E = pm.getParent(E);
  }
  return NULL;
}

void RemoveLocalDefs(CFG const& cfg, ParentMap const& pm, VarDeclMap& decl_vars, DefUseData& data){
  for(CFG::const_reverse_iterator it = cfg.rbegin(), cend = cfg.rend(); it != cend; it++) {
    // every time we find a node with more than 1 predecessor it means
    // we are at a junction point and the local variable defined in the previuos
    // blocks must be removed from the defsout set
    if(it->pred_size() <= 1)
      continue;
    // for each predecessor
    for(CFGBlock::const_pred_iterator pit = it->pred_begin(), end = it->pred_end(); pit != end && (*pit)->size(); pit++){
      DefUseBlock& block_data = data[(*pit)->getBlockID()];
      for(VarRefsVect::iterator dit = block_data.defsout.begin(); dit != block_data.defsout.end(); dit++){
        // we have to be sure the element we are using to see if the Decl is in the same block of block statements
        // is also present in the AST, so if the first statement of the CFG block is a VarDecl, we use the VarDeclMap
        // to retrieve the corresponding AST node
        Stmt const* testStmt = (*pit)->front();
        if(testStmt->getStmtClass() == Stmt::DeclStmtClass)
          testStmt = decl_vars[cast<VarDecl>(cast<DeclStmt>(testStmt)->getSingleDecl())];

Use 'dyn_cast' or 'isa' instead of 'getStmtClass()'

        // if the variable is local to one of the prec blocks and we are at a junction point
        // it means the variable must be removed from the defsout set
        if(GetEnclosingBlock(pm, decl_vars[(*dit)->getDecl()]) == GetEnclosingBlock(pm, testStmt)){
          // it can be the case the following block is in the same scope of the previous block (for example if stmt without else)
          Stmt const* testStmt = it->front();
          if(testStmt->getStmtClass() == Stmt::DeclStmtClass)
            testStmt = decl_vars[cast<VarDecl>(cast<DeclStmt>(testStmt)->getSingleDecl())];

Use 'dyn_cast' or 'isa' instead of 'getStmtClass()'

          if(GetEnclosingBlock(pm, decl_vars[(*dit)->getDecl()]) != GetEnclosingBlock(pm, testStmt)){
            block_data.defsout.erase(dit--);
          }
        }
      }
    }
  }
}

// implements reaching definitions according to Kennedy's book iterative algorithm
void ComputeReaches(CFG const& cfg, DefUseData& data){
  bool changed = true;
  while(changed){
    changed = false;
    VarRefsSet newreaches;
    for(CFG::const_reverse_iterator it = cfg.rbegin(); it != cfg.rend(); it++) {
      DefUseBlock& BlockData = data[it->getBlockID()];
      newreaches = VarRefsSet();
      for(CFGBlock::const_pred_iterator pit = it->pred_begin(), end = it->pred_end(); pit != end; pit++){
        unsigned p = (*pit)->getBlockID();
        // s(p) = reaches(p) - killed(p)
        VarRefsSet s, temp_set, temp_set_1;
        copy(data[p].begin_not_killed_set(), data[p].end_not_killed_set(), inserter(s, s.begin()));
        // tmp_set(p) = defsout(p) U s(p)
        set_union(s.begin(), s.end(), data[p].defsout.begin(), data[p].defsout.end(), inserter(temp_set, temp_set.begin()));
        // tmp_set_1 = newreaches U tmp_set(p)
        set_union(newreaches.begin(), newreaches.end(), temp_set.begin(), temp_set.end(), inserter(temp_set_1, temp_set_1.begin()));
        newreaches.swap(temp_set_1);
      }
      if(BlockData.reaches.size() != newreaches.size()){
        BlockData.reaches = VarRefsVect(newreaches.begin(), newreaches.end());
        changed = true;
      }
    }
  }
}

//===------------------------- Printing utilities -------------------------===//

struct printer: public std::iterator<std::input_iterator_tag, printer, std::ptrdiff_t, printer>{
  printer(llvm::raw_ostream& x, ASTContext const& ctx_, const char* s = "") : o(x), ctx(ctx_), delim(s) { }
  template<typename T>
  printer& operator=(const T& x);
  printer& operator*() { return *this; }
  printer& operator++() { return *this; }
  printer& operator++(int) { return *this; }

  mutable llvm::raw_ostream& o;
  ASTContext const& ctx;
  const char* delim;
};

template<>
printer& printer::operator=<VarDecl const*>(const VarDecl* const& x){
  o << x->getNameAsString() << delim;
  return *this;
}

template<>
printer& printer::operator=<DefUseNode const*>(const DefUseNode* const& x){
  SourceLocation loc;
  if(x->getKind() == DefUseNode::VarRef)
    loc = x->getVarRef()->getLocStart();
  else
    loc = x->getVarDecl()->getTypeSpecStartLoc();
  o << x->getDecl()->getNameAsString() << " " << "(" << Line(loc, ctx.getSourceManager()) << ","
       << Column(loc, ctx.getSourceManager()) << " "
       << ((x->getUse() == DefUseNode::Use)?"USE":"DEF") << ")" << delim;
  return *this;
}

template<>
printer& printer::operator=<DefUseNode>(DefUseNode const& x){ return this->operator=(&x); }

void printDefUseData(llvm::raw_ostream& out, ASTContext const& ctx, DefUseData const& data){
  out << "--------------- PRINTING ANALYSIS DATA ----------------------\n";
  unsigned blockID = data.size() - 1;
  for(DefUseData::const_reverse_iterator it = data.rbegin(); it != data.rend(); it++, blockID--){
    out << "* BlockID '" << blockID << "' uses vars:\n";
    for(DefUseBlock::const_iterator pit = it->begin(); pit != it->end(); pit++){
      out << "'" << pit->first->getNameAsString() << "':\n\t";
      std::copy( pit->second.begin(), pit->second.end(), printer( out, ctx, ", " ) );
      out << "\n";
    }
  }

  blockID = data.size() - 1;
  for(DefUseData::const_reverse_iterator it = data.rbegin(); it != data.rend(); it++, blockID--) {
    DefUseBlock const& BlockData = *it;
    out << "---------------- Block: " << blockID << " ----------------\n";
    out << "# uses(" << blockID << "): {";
    copy( BlockData.uses.begin(), BlockData.uses.end(), printer( out, ctx, ", " ) );
    out << "}\n";
    out << "# defsout(" << blockID << "): {";
    copy( BlockData.defsout.begin(), BlockData.defsout.end(), printer( out, ctx, ", " ) );
    out << "}\n";
    out << "# killed(" << blockID << "): {";
    copy( BlockData.killed.begin(), BlockData.killed.end(), printer( out, ctx, ", " ) );
    out << "}\n";
    out << "# reaches(" << blockID << "): {";
    copy( BlockData.reaches.begin(), BlockData.reaches.end(), printer( out, ctx, ", " ) );
    out << "}\n";
  }
}

//===------------------------- DefUse -------------------------===//

DefUse::~DefUse(){ delete analysis_data; delete decl_map; delete block_map; }

bool isDef(ASTContext const& ctx, DefUseNode const& n, int* blockID, DefUseVect::const_iterator* iter,
    DefUseData const* analysis_data, VarDeclMap const* decl_map, VarRefBlockMap const* block_map, Stmt const** stmt = NULL){
  // retrieve the blockID of the CFG where the statement N (which can be wither a VarDecl or a DeclRefExpr) is placed
  Stmt const* _stmt_;
  if(n.getKind() == DefUseNode::VarDecl){
    VarDecl const* vd = n.getVarDecl();
    VarDeclMap::const_iterator it = decl_map->find(vd);
    // we use the decl_map to translate the VarDecl to the corresponding DeclStmt
    // this is useful when DeclStmts from the CFG (which usually are different from the original ones) are used
    ASSERT(it != decl_map->end(), "Variable declaration for '" << vd->getNameAsString() << "' not mapped to any statement in the syntax tree");
    _stmt_ = it->second;
  }else
    _stmt_ = n.getVarRef();
  assert(_stmt_ != NULL);
  VarRefBlockMap::const_iterator it = block_map->find(_stmt_);
  ASSERT(it != block_map->end(), "Statement '" << PrintClangStmt(_stmt_, ctx) << "' not mapped to any block of the CFG");

  // We have to find the definition inside the block
  DefUseBlock::const_iterator data_it = (*analysis_data)[it->second].find(n.getDecl());
  // if we don't find the variable among the variable used or defined in the block it means something went wrong
  ASSERT(data_it != (*analysis_data)[it->second].end(), "Block ID: " << it->second << " doesn't contains usage or definitions of variable " << n.getDecl()->getNameAsString());

  // we find a Variable definition, now we have to iterate through the DefUseNodes in order to find the
  // the node corresponding to the n variable passed by the user
  DefUseVect::const_iterator _iter_ = std::find(data_it->second.begin(), data_it->second.end(), n);
  if(blockID) *blockID = it->second;
  if(iter) *iter = _iter_;
  if(stmt) *stmt = _stmt_;
  return _iter_ != data_it->second.end();
}

bool isUse(ASTContext const& ctx, clang::DeclRefExpr const* var, int* blockID, DefUseVect::const_iterator* iter,
    DefUseData const* analysis_data, VarRefBlockMap const* block_map){
  Stmt const* stmt = var;
  VarRefBlockMap::const_iterator it = block_map->find(stmt);
  ASSERT(it != block_map->end(), "Statement " << PrintClangStmt(stmt, ctx) << " not mapped to any block of the CFG!");

  DefUseBlock::const_iterator data_it = (*analysis_data)[it->second].find(cast<VarDecl>(var->getDecl()));
  ASSERT(data_it != (*analysis_data)[it->second].end(),
    "Block ID: " << it->second << " doesn't contains usage or definitions of variable " << var->getDecl()->getNameAsString());

  DefUseVect::const_iterator _iter_ = std::find(data_it->second.begin(), data_it->second.end(), DefUseNode(var, DefUseNode::Use));
  if(blockID) *blockID = it->second;
  if(iter) *iter = _iter_;
  return _iter_ != data_it->second.end();
}

//===------------------------- uses_iterator -------------------------===//

DefUse::uses_iterator::uses_iterator(DefUse const* du, DefUseNode const& n):
  DefUse::iterator_impl(du), iter_ptr(-1), inDefBlock(true)
{
  Stmt const* stmt = NULL;
  if(!::isDef(du->ctx, n, &iter_ptr.block_id, &iter_ptr.block_it, du->analysis_data, du->decl_map, du->block_map, &stmt))
    throw NotDefExcpetion(stmt); // there is no definition for the DeclRefExpr passed by the user
                    // this means he trying to list the uses of a declrefexpr which is a use!
                    // an excpetion is thrown.
  node = &(*(iter_ptr.block_it));
  inc(true);
}

bool DefUse::uses_iterator::operator!=(uses_iterator const& iter){
  return !(iter_ptr.block_id == iter.iter_ptr.block_id && iter.iter_ptr.block_it == iter.iter_ptr.block_it);
}

DeclRefExpr const* DefUse::uses_iterator::operator*() {
  assert(iter_ptr.block_it->getKind() == DefUseNode::VarRef);
  return iter_ptr.block_it->getVarRef();
}

DefUse::uses_iterator& DefUse::uses_iterator::inc(bool first){
  if(iter_ptr.block_id == -1) return *this;
  DefUseBlock const& m = (*du->analysis_data)[iter_ptr.block_id];
  DefUseBlock::const_iterator it = m.find(node->getDecl());
  assert(it != m.end());
  if(iter_ptr.block_it != it->second.end())
    iter_ptr.block_it++;

  if(iter_ptr.block_it != it->second.end() && iter_ptr.block_it->getDecl() == node->getDecl()){
    if(iter_ptr.block_it->getUse() == DefUseNode::Use)
      return *this; // this is a use of the variable
    else if(inDefBlock){
      // found a new definition, stop iterating
      iter_ptr.block_it = DefUseVect::const_iterator();
      iter_ptr.block_id = -1;
      return *this;
    }
  }
  // if variable in defsout we have to check successor blocks
  if(inDefBlock && std::find(m.defsout.begin(), m.defsout.end(), node) == m.defsout.end()){
    iter_ptr.block_it = DefUseVect::const_iterator();
    iter_ptr.block_id = -1; // it means there are no other uses of this variable
    return *this;
  }
  // We have to change block
  // the following code has the precondition the blocks are numbered in a decrescent order
  // we lock into the block_id and see if the decl is in the reaches() set or not
  int new_block = iter_ptr.block_id;
  while(--new_block >= 0){
    DefUseBlock const& m = (*du->analysis_data)[new_block];
    if(std::find(m.reaches.begin(), m.reaches.end(), node) != m.reaches.end()){
      inDefBlock = false;
      DefUseBlock::const_iterator data_it = (*du->analysis_data)[new_block].find(node->getDecl());
      if(data_it != (*du->analysis_data)[new_block].end())
        if(data_it->second.begin()->getUse() == DefUseNode::Use){
          iter_ptr.block_id = new_block;
          iter_ptr.block_it = data_it->second.begin();
          return *this;
        }
    }
  }
  iter_ptr.block_it = DefUseVect::const_iterator();
  iter_ptr.block_id = -1;
  return *this;
}

//===------------------------- defs_iterator -------------------------===//

DefUse::defs_iterator::defs_iterator(DefUse const* du, DeclRefExpr const& n):
  iterator_impl(du), iter_ptr(-1), blockDef(false)
{
  if(!::isUse(du->ctx, &n, &iter_ptr.block_id, &iter_ptr.block_it, du->analysis_data, du->block_map))
    throw NotUseExcpetion(&n); // the DeclRefExpr passed by the user is never used in this block but only defined,
                   // the user is trying to iterate through the definitions of a definition... which doesn't make sense
                   // an exception must be thrown
  iter_ptr.reaches_it = (*du->analysis_data)[iter_ptr.block_id].reaches.begin();
  node = &(*(iter_ptr.block_it));
  inc(true);
}
bool DefUse::defs_iterator::operator!=(defs_iterator const& iter){
  return !(iter_ptr.block_id == iter.iter_ptr.block_id &&
      iter.iter_ptr.block_it == iter.iter_ptr.block_it &&
      iter.iter_ptr.reaches_it == iter_ptr.reaches_it);
}

DefUseNode const* DefUse::defs_iterator::operator*() {
  DefUseBlock::const_iterator it = (*du->analysis_data)[iter_ptr.block_id].find(node->getDecl());
  assert(it != (*du->analysis_data)[iter_ptr.block_id].end());
  if(blockDef)
    return &(*iter_ptr.block_it);
  return *iter_ptr.reaches_it;
}

DefUse::defs_iterator::defs_iterator& DefUse::defs_iterator::inc(bool first){
  if(iter_ptr.block_id == -1) return *this;
  // look inside reaches vector
  DefUseBlock::const_iterator it = (*du->analysis_data)[iter_ptr.block_id].find(node->getDecl());
  assert(it != (*du->analysis_data)[iter_ptr.block_id].end());
  if(first){
    // we have to find a def in the block, if exist
    for(; iter_ptr.block_it >= it->second.begin(); iter_ptr.block_it--)
      if(iter_ptr.block_it->getUse() == DefUseNode::Def){
        blockDef = true;
        break;
      }
    // the def is in the block, so it means we don't have to look any further
    // next time the iterator is incremented we return an empty one
    if(blockDef)
      return *this;

  }else if(blockDef){
    iter_ptr.block_it = DefUseVect::const_iterator();
    iter_ptr.reaches_it = VarRefsVect::const_iterator();
    iter_ptr.block_id = -1;
    return *this;
  }
  // we didn't find a def in the block, so we have to look at the reaches set
  VarRefsVect const& reaches_v = (*du->analysis_data)[iter_ptr.block_id].reaches;
  if(!first && iter_ptr.reaches_it != reaches_v.end())
    iter_ptr.reaches_it++;
  for(; iter_ptr.reaches_it != reaches_v.end(); iter_ptr.reaches_it++)
    if((*iter_ptr.reaches_it)->getDecl() == node->getDecl())
      return *this;

  iter_ptr.block_it = DefUseVect::const_iterator();
  iter_ptr.reaches_it = VarRefsVect::const_iterator();
  iter_ptr.block_id = -1;
  return *this;
}

DefUse::uses_iterator defuse::DefUse::uses_begin(clang::DeclRefExpr const* var) const{
  return DefUse::uses_iterator(this, DefUseNode(var, DefUseNode::Def));
}
DefUse::uses_iterator defuse::DefUse::uses_begin(clang::VarDecl const* var) const{
  return DefUse::uses_iterator(this, DefUseNode(var));
}
DefUse::uses_iterator defuse::DefUse::uses_end() const{ return DefUse::uses_iterator(); }

DefUse::defs_iterator defuse::DefUse::defs_begin(clang::DeclRefExpr const* var) const{
  return DefUse::defs_iterator(this, *var);
}
DefUse::defs_iterator defuse::DefUse::defs_end() const{ return DefUse::defs_iterator(); }

bool defuse::DefUse::isUse(clang::DeclRefExpr const* var) const { return ::isUse(ctx, var, NULL, NULL, analysis_data, block_map); }

bool defuse::DefUse::isDef(DefUseNode const& n) const { return ::isDef(ctx, n, NULL, NULL, analysis_data, decl_map, block_map); }

//===------------------------- BuildDefUseChains -------------------------===//

DefUse* defuse::DefUse::BuildDefUseChains(Stmt* body, ASTContext *ctx, DefUseBasicHelper* helper,
       CFG* cfg, ParentMap* pm, bool verbose, llvm::raw_ostream& out)
{
  bool cleanCFG = false, cleanParentMap = false, cleanHelper = false;
  if(!cfg){
    cfg = CFG::buildCFG(body, ctx);
    cleanCFG = true;
  }
  if(!pm){
    pm = new ParentMap(body);
    cleanParentMap = true;
  }
  DefUseData* data = new DefUseData(cfg->getNumBlockIDs());

  VarDeclMap* dm = new VarDeclMap(body); // map VarDecl to AST nodes - this is needed as when the CFG is created by CLANG
                       // DeclStmt with several declarations are rewritten into separate DeclStmts

  VarRefBlockMap* bm = new VarRefBlockMap; // map each VarRef to the CFG block where it has been used

  if(verbose)
    cfg->dump(ctx->getLangOptions());

  if(!helper){
    helper = new DefUseBasicHelper;
    cleanHelper = true;
  }
  helper->InitializeValues(data, bm, dm);

  for(CFG::const_iterator it = cfg->begin(); it != cfg->end(); it++)
    helper->VisitCFGBlock(*it, cfg->getNumBlockIDs()-1);

You should not rely on the root being number 'getNumBlockIDs() - 1'. This actually is not an invariant you should depend on. If you want to identify the entry block, use 'CFG::getEntry()'.

  // remove from defsout local variables
  RemoveLocalDefs(*cfg, *pm, *dm, *data);
  // Calculates the reaches set for each block of the CFG
  ComputeReaches(*cfg, *data);

  if(verbose)
    printDefUseData(out, *ctx, *data);

  unsigned block_id = cfg->getNumBlockIDs();
  if(cfg && cleanCFG)
    delete cfg;
  if(pm && cleanParentMap)
    delete pm;
  if(helper && cleanHelper)
    delete helper;
  return new DefUse(*ctx, data, dm, bm, block_id);
}

Instead of BuildDefUseChains building its own CFG and ParentMap, have these be passed in from the client. This allows reuse of existing data that has likely been constructed for other uses. You can also make an alternate version of BuildDefUseChains that is passed an AnalysisManager and uses those to create the CFGs and ParentMap.

Also, consider using llvm::OwningPtr to auto-reclaim 'helper' and friends. For example:

  llvm::OwningPtr<DefUseBasicHelper> helperReclaim;
  if (!helper) {
    helper = new DefUseBasicHelper();
    helperReclaim.reset(helper);
  }

You can then forget about the later cleanup because it will automatically happen when the function returns. This idiom is really powerful when you have multiple return sites, as you can easily forget to do the proper cleanup. It also cleans up the code, as all the cleanup is just boilerplate that is easily handled by the OwningPtr.

//===------------------------- DefUseChainTest -------------------------===//

void DefUseChainTest::HandleTopLevelDecl(DeclGroupRef declRef) {
  for(DeclGroupRef::iterator it = declRef.begin(); it < declRef.end(); it++){
    if((*it)->getKind() != Decl::Function)
      continue;

Use 'dyn_cast' or 'isa' instead of 'getKind()'

    // Top-level Function declaration
    FunctionDecl* func_decl = cast<FunctionDecl> (*it);
    Stmt* func_body = func_decl->getBodyIfAvailable();
    if(!func_body)
      continue;

    out << "--- Building Def-Use Chains for function '" << func_decl->getNameAsString() << ":" << func_decl->getNumParams() << "' ---\n";
    du = DefUse::BuildDefUseChains(func_body, ctx, 0, 0, 0, true);
    out << "------ Testing Def-Use iterators ------\n";
    VisitStmt(func_body);
    out << "**************************************************\n";
    delete du;
  }
}

void defuse::PrintVarDefs(DefUse const* DU, DeclRefExpr const* DR, ASTContext& ctx, llvm::raw_ostream& out){
  if(DR->getDecl()->getKind() != Decl::Var && DR->getDecl()->getKind() != Decl::ParmVar)
    return;

Use 'dyn_cast' or 'isa' instead of 'getKind()'

  VarDecl const* VD = cast<VarDecl>(DR->getDecl());
  SourceLocation loc = DR->getLocStart();
  out << "Variable ref '" << VD->getNameAsCString() << "' in LOC: "
        << "(" << Line(loc, ctx.getSourceManager()) << "," << Column(loc, ctx.getSourceManager()) << ") -> defs_iterator:\n";
  try{
    for(DefUse::defs_iterator it = DU->defs_begin(DR); it != DU->defs_end(); it++){
      SourceLocation loc;
      if((*it)->getKind() == DefUseNode::VarRef)
        loc = (*it)->getVarRef()->getLocStart();
      else
        loc = (*it)->getVarDecl()->getTypeSpecStartLoc();
      out << "\t* Found DEF in loc: " << "(" << Line(loc, ctx.getSourceManager()) << "," << Column(loc, ctx.getSourceManager()) << ")\n";
    }
  }catch(NotUseExcpetion e){
    SourceLocation loc = e.expr->getLocStart();
    out << "ERROR: Variable " << PrintClangStmt(e.expr, ctx) << " in location : "
        << "(" << Line(loc, ctx.getSourceManager()) << "," << Column(loc, ctx.getSourceManager()) << ") is not a use for variable: " << VD->getNameAsCString() << "\n";

Again, LLVM doesn't use C++ exceptions. You'll need to come up with an alternate mechanism.

  }
}
void PrintVarUses(DefUse const* DU, DefUseNode const& n, ASTContext& ctx, llvm::raw_ostream& out){
  VarDecl const* VD = n.getDecl();
  SourceLocation loc = VD ->getTypeSpecStartLoc();
  if(n.getKind() == DefUseNode::VarRef)
    loc = n.getVarRef()->getLocStart();

  out << "Variable ref '" << VD->getNameAsCString() << "' in LOC: "
      << "(" << Line(loc, ctx.getSourceManager()) << "," << Column(loc, ctx.getSourceManager()) << ") -> uses_iterator:\n";
  try{
    DefUse::uses_iterator it = DU->uses_end();
    if(n.getKind() == DefUseNode::VarRef)
      it = DU->uses_begin(n.getVarRef());
    else
      it = DU->uses_begin(VD);
    for(; it != DU->uses_end(); it++){
      SourceLocation loc = (*it)->getLocStart();
      out << "\t* Found USE in loc: " << "(" << Line(loc, ctx.getSourceManager()) << "," << Column(loc, ctx.getSourceManager()) << ")\n";
    }
  }catch(NotDefExcpetion e){
    SourceLocation loc = e.stmt->getLocStart();
    out << "ERROR: Variable " << PrintClangStmt(e.stmt, ctx) << " in location : "
        << "(" << Line(loc, ctx.getSourceManager()) << "," << Column(loc, ctx.getSourceManager()) << ") is not a definition for variable: " << VD->getNameAsCString() << "\n";
  }

Again, LLVM doesn't use C++ exceptions. You'll need to come up with an alternate mechanism.

}

inline void defuse::PrintVarUses(DefUse const* DU, DeclRefExpr const* DR, ASTContext& ctx, llvm::raw_ostream& out){ ::PrintVarUses(DU, DefUseNode(DR), ctx, out); }
inline void defuse::PrintVarUses(DefUse const* DU, VarDecl const* VD, ASTContext& ctx, llvm::raw_ostream& out){ ::PrintVarUses(DU, DefUseNode(VD), ctx, out); }

void defuse::DefUseChainTest::VisitDeclRefExpr(DeclRefExpr *DR){
  if(DR->getDecl()->getKind() == Decl::Var || DR->getDecl()->getKind() == Decl::ParmVar){

Use 'dyn_cast' or 'isa' instead of 'getKind()'

    PrintVarUses(du, DR, *ctx, out);
    PrintVarDefs(du, DR, *ctx, out);
  }
}

void defuse::DefUseChainTest::VisitDeclStmt(DeclStmt *DS){
  for(DeclStmt::decl_iterator it = DS->decl_begin(); it != DS->decl_end(); it++){
    if((*it)->getKind() != Decl::Var)
      continue;

    VarDecl* VD = cast<VarDecl>(*it);
    PrintVarUses(du, VD, *ctx, out);

    if(VD->getInit())
      Visit(VD->getInit());

How about:

   if (Stmt *S = VD->getInit())
     Visit(S);

Hi Ted,
Thanks for going through all that messy code, I read the code reviews, I definitely agree with them, I will fix it before the end of the week and I will send you a better version hopefully next Monday, so we can go through another iteration.
Sorry for the 80 columns formatting and white spaces issues, I didn't have time to fix it last week and I wanted to send you this code before the end of the week... so I try to get lucky :slight_smile:

I have a couple of questions: Should I provide patch files? or we can continue with sending C++ files at this stage? Should I also provide some test cases?
cheers, Simone

Ted Kremenek wrote:

Hi Simone,

I'm fine with either a patch or files (as long as they are brand new files). In either case, can you please modify Thunderbird to not inline your attachments? Instructions are here:

   http://llvm.org/docs/DeveloperPolicy.html#patches

Test cases would be great. Thanks!

Cheers,
Ted

Hello, I am back :),
so I fix most of the comments, I still need to address a couple of issues, i.e. use of llvm::DenseMap in place of std::map and virtual methods in DefUseHelper.

About the usage of llvm::DenseMap, the problem is that every time I change from std:::map to llvm::DenseMap I get several errors (compile time), which is weird as the 2 classes expose the same interface, anyway I need some time to understand which is the source of problems.

About the virtual methods in DefUseHelper, you are right! :slight_smile: most of the time the user will use the default one and I see the performance problem with the virtual calls. Actually, I have introduced this virtual dispatch because in the project where I am using this implementation of DefUse, I need to implement specific semantics of MPI calls. So mostly I am overloading the semantics of method calls (CallExpr). By default when a variable is passed to a function as a pointer type (or reference), during the DefUse analysis we have to state that this is a definition for the variable. Anyway, if we know the semantics of the specific method, and we know that it will not change the variable (for example the buffer in MPI_Send) we can state that the method call will a use for the variable.

Now I don't know if this mechanism is useful for others... and if it should be here in the first place, but it makes sense for me :slight_smile: Anyway, I guess I can avoid the virtual calls by using templates. I will be working on it the next days.

For the rest I properly formatted the code, for the test cases I have to learn how to write test cases for LLVM :slight_smile:

Btw, I would like to thanks you for your time and the useful hints.

cheers, Simone

Ted Kremenek wrote:

DefUse.h (7.21 KB)

DefUse.cpp (31.9 KB)

Hello, I am back :),
so I fix most of the comments, I still need to address a couple of issues, i.e. use of llvm::DenseMap in place of std::map and virtual methods in DefUseHelper.

Hi Simone,

Sorry for reviewing this second set of changes so late. Comments inline.

About the usage of llvm::DenseMap, the problem is that every time I change from std:::map to llvm::DenseMap I get several errors (compile time), which is weird as the 2 classes expose the same interface, anyway I need some time to understand which is the source of problems.

One thing to understand is that the two data structures do not export the same interface.

std::map allocates memory (using malloc) for each key/value pair you store in the map. This allocation stays around around as long as you keep the key in the map. For example, suppose you had:

    std::map<void*,int> M;
    int &x = M[p];

The reference 'x' is valid for the entire time that the key 'p' is in the map. If you add or remove other keys, the reference stays valid because the key/value pair is actually an allocated blob of memory they you are allowed to reference.

DenseMap doesn't have this property. It assumes that the values you store in the DenseMap are lightweight, e.g. pointers to other data, and thus allocates memory in chunks that are used for buckets for the hash table. When an insertion into the DenseMap occurs, the hash table may resize, causing the values in the old buckets to get copied to new buckets, and the old buckets deallocated. This means that any reference to the old data, as in the example above, is invalid.

Where this is going to bite you is here:

   typedef std::map<VarDecl const*, DefUseVect> DefUseChain;

If you replace this with:

   typedef llvm::DenseMap<VarDecl const*, DefUseVect> DefUseChain;

then fundamentally things are going to break. This because the DefUseVect values, which is a typedef for std::vector<DefUseNode>, will get copied and deallocated when the table resizes. All references to the DefUseVect, or the objects they contain, will incorrect at this point.

The solution is to use DefUseVect* instead of DefUseVect as the value for DefUseChain.

About the virtual methods in DefUseHelper, you are right! :slight_smile: most of the time the user will use the default one and I see the performance problem with the virtual calls. Actually, I have introduced this virtual dispatch because in the project where I am using this implementation of DefUse, I need to implement specific semantics of MPI calls. So mostly I am overloading the semantics of method calls (CallExpr). By default when a variable is passed to a function as a pointer type (or reference), during the DefUse analysis we have to state that this is a definition for the variable. Anyway, if we know the semantics of the specific method, and we know that it will not change the variable (for example the buffer in MPI_Send) we can state that the method call will a use for the variable.

Now I don't know if this mechanism is useful for others... and if it should be here in the first place, but it makes sense for me :slight_smile: Anyway, I guess I can avoid the virtual calls by using templates. I will be working on it the next days.

I think the approach should be to focus on a clean API. Using templates should be avoided unless there is a huge performance need; the cost of virtual functions is low if the amount of work done in the virtual function call outweighs the dynamic dispatch.

For the rest I properly formatted the code, for the test cases I have to learn how to write test cases for LLVM :slight_smile:

Take a look at the Clang tests in 'test/Analysis' and friends. Basically we'll need to wire up a driver option to clang-cc, and you'll want to run clang-cc on some code that outputs some diagnostics relevant to def-use analysis.

More specific comments inline.

//===-- clang/Analysis/DefUse.h - DefUse analysis -------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the declaration of DefUse analysis headers
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_DEFUSE_H
#define LLVM_CLANG_DEFUSE_H

#include "llvm/ADT/DenseMap.h"

#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ParentMap.h"
#include "clang/AST/StmtVisitor.h"

namespace clang {

// forward definitions
class CFG;
class CFGBlock;

class DefUse;

namespace defuse {
//===------------------------- DefUseNode -------------------------===//
// Keeps the information for a particular 'definition' or 'use' of a variable
// The structure is needed because a defuse chain can contains variable declarations
// as well as variable reference. As a DeclStmt can contains several declarations
// to address a particular variable we need to store its VarDecl or DeclRefExp
class DefUseNode {
public:
enum NodeKind {
   VarDecl, VarRef
};
enum UseKind {
   Use, Def, UseDef
};

DefUseNode(clang::VarDecl const* decl) :
   var_decl(decl), kind(VarDecl), usage(Def) {
}
DefUseNode(DeclRefExpr const* ref, UseKind u = Use) :
   var_ref(ref), kind(VarRef), usage(u) {
}

clang::VarDecl const* getDecl() const;

NodeKind const& getKind() const {
   return kind;
}
UseKind const& getUse() const {
   return usage;
}

clang::VarDecl const* getVarDecl() const {
   assert(kind==VarDecl);
   return var_decl;
}
DeclRefExpr const* getVarRef() const {
   assert(kind==VarRef);
   return var_ref;
}

bool operator==(DefUseNode const& n) const;

private:
// a def-use node can be either a VarDecl or a DeclRefExpr
union {
   clang::VarDecl const* var_decl;
   DeclRefExpr const* var_ref;
};
NodeKind kind;
UseKind usage;

Please consider using the PointerUnion class for this. It will reduce the code you are using, and will allow you to merge the 'kind' field into the pointer as well with no extra work on your part.

Should 'usage' be const?

Also, most of LLVM does not follow the naming convention of lowercase variable names with underscores. To be consistent with the rest of the codebase, please take a look at the coding conventions used in a core header file, e.g. 'include/clang/AST/Expr.h', which shows that variable names are typically short, and use capitalization instead of underscores. This is really a nitty point, but consistency through the code base makes it more readable.

};

//===------------------------- Typedefs -------------------------===//
class DefUseBlock;
class VarDeclMap;
typedef std::vector<DefUseBlock> DefUseData;
typedef llvm::DenseMap<Stmt const*, unsigned> VarRefBlockMap;
typedef std::vector<DefUseNode> DefUseVect;
typedef std::vector<DefUseNode const*> VarRefsVect;

Something looks potentially wrong here. DefUseVect is a vector of DefUseNode objects, not DefUseNode*. If the vector ever needs to be resized, then the DefUseNode objects will be obliterated, and any references to them (which may be in VarRefVect) will suddenly become invalid. DefUseNode is a lightweight object that you can certainly keep in a vector, but if you need these objects to have stable pointer addresses you will need to manually allocate them and store DefUseNode* instead DefUseVect instead. If you are concerned with 'malloc()' being too slow, you can use the BumpPtrAllocator in llvm to quickly allocate objects.

class DefUseHelper;

//===------------------------- DefUseHelper -------------------------===//

class DefUseHelper : public StmtVisitor<DefUseHelper> {
class DefUseHelperImpl;
DefUseHelperImpl* pimpl;

void InitializeValues(DefUseData* data, VarRefBlockMap* bm, VarDeclMap* dm);

friend class clang::DefUse;
public:
DefUseNode::UseKind current_use;
DefUseHelper();

virtual void HandleDeclRefExpr(DeclRefExpr *DR); // remember to call the
                             // super class implementation of the method
void HandleDeclStmt(DeclStmt *DS);

virtual void HandleBinaryOperator(BinaryOperator* B);
virtual void HandleConditionalOperator(ConditionalOperator* C);
virtual void HandleCallExpr(CallExpr* C);
virtual void HandleUnaryOperator(UnaryOperator* U);
virtual void HandleArraySubscriptExpr(ArraySubscriptExpr* AS);

void VisitDeclRefExpr(DeclRefExpr *DR) {
   return HandleDeclRefExpr(DR);
}
void VisitDeclStmt(DeclStmt *DS) {
   return HandleDeclStmt(DS);
}
void VisitBinaryOperator(BinaryOperator* B) {
   return HandleBinaryOperator(B);
}
void VisitConditionalOperator(ConditionalOperator* C) {
   return HandleConditionalOperator(C);
}
void VisitCallExpr(CallExpr* C) {
   return HandleCallExpr(C);
}
void VisitUnaryOperator(UnaryOperator* U) {
   return HandleUnaryOperator(U);
}
void VisitArraySubscriptExpr(ArraySubscriptExpr* AS) {
   return HandleArraySubscriptExpr(AS);
}

void VisitCFGBlock(clang::CFGBlock const& blk, CFGBlock const& entry);
void VisitStmt(Stmt* S);

virtual ~DefUseHelper();
};

void PrintVarDefs(DefUse const* DU, DeclRefExpr const* DR, ASTContext& ctx,
   llvm::raw_ostream& out);
void PrintVarUses(DefUse const* DU, DeclRefExpr const* DR, ASTContext& ctx,
   llvm::raw_ostream& out);
void PrintVarUses(DefUse const* DU, VarDecl const* VD, ASTContext& ctx,
   llvm::raw_ostream& out);

ASTConsumer* CreateDefUseTestConsumer(llvm::raw_ostream& out);

I'm still trying to understand the implementation in its entirety, but it seems to me that you should try and decouple the DefUse implementation from the observer of its values. Here it seems like they are mixed together, which forces you to use subclassing of DefUseHelper to extend it's functionality. This isn't ideal, as it conflates the DefUse algorithm with what desires to consume that information. This muddles the separation of concerns between these two concepts, and it constrains future revisions of the DefUse implementation to stay within this visitor design. The consumer of the DefUse information may wish to have a visitor design, but I'm not certain if you want to expose a visitor pattern in the DefUse implementation itself.

As an alternative, consider the "observer" interface provided by Clang's LiveVariables analysis ('include/clang/Analysis/Analyses/LiveVariables.h'). Here the LiveVariables implementation is completely opaque, but allows clients to provide an "Observer" object so they can query liveness information at specific subexpressions. The dead stores checker uses this interface to find dead stores, and it can do so without any knowledge of how the LiveVariables analysis calculates its liveness information. It makes the API much simpler, and disentangles the implementation of LiveVariables from its clients.

More specifically, the HandleXXX methods look like they are part of the "observer" interface, while the VisitXXX methods look like they are part of the DefUse implementation. Your use of the pimpl idiom implies you want to keep some of this implementation hidden. Why not just separate the implementation of the DefUse algorithm completely from the HandleXXX methods and keep those in the DefUseHelper. The DefUse algorithm can then check to see if a DefUseHelper object is provided, and if so, call the appropriate HandleXXX methods.

} // end namespace defuse

//===------------------------- DefUse -------------------------===//

class DefUse {
ASTContext const& ctx;
defuse::DefUseData const* analysis_data;
defuse::VarDeclMap const* decl_map;
defuse::VarRefBlockMap const* block_map;
unsigned const num_cfg_blocks;

Again, just my comment on the naming convention for variables.

DefUse(ASTContext const& ctx_,
     defuse::DefUseData const* analysis_data_,
     defuse::VarDeclMap const* decl_map_,
     defuse::VarRefBlockMap const* block_map_,
     unsigned num_blocks) :

Please do not use trailing underscores in variable names.

   ctx(ctx_), analysis_data(analysis_data_), decl_map(decl_map_),
   block_map(block_map_), num_cfg_blocks(num_blocks) {
}

bool isDef(defuse::DefUseNode const& n) const;

class iterator_impl {
public:
   DefUse const* du;
   defuse::DefUseNode const* node;

   struct iter {
     int block_id;
     defuse::DefUseVect::const_iterator block_it;
     iter(int blk_id) :
       block_id(blk_id) {
     }
   };
   iterator_impl() : du(NULL) { }
   iterator_impl(DefUse const* du_) : du(du_) { }
   iterator_impl& operator++() {
     return inc(false);
   }
   virtual iterator_impl& inc(bool) = 0;
};

Do we need a virtual iterator? Does iterator_impl need to be subclassed? Not only is this a potential performance problem, but it could be the source of subtle bugs if you assigned an object of a subclass to that of a parent class. It seems to me that defs_iterator and uses_iterator can be merged (substantially reducing the code), and simply add a flag to the iterator indicating whether it is for uses or defs. If you want strong typing, you certainly can have subclassing of the iterators, but I think the entire implementation can get sucked into a common class. Then the "inc" just checks the flag and skips of uses when you are iterating over defs and vis versa. You then remove the vtable pointer, simplify the inheritance tree, and the implementation becomes a lot more readable. There is also more chances for inlining by the compiler (i.e., if the definition of 'inc' is provided in the class declaration).

public:
class uses_iterator : public std::iterator<std::input_iterator_tag,
   DeclRefExpr, std::ptrdiff_t, DeclRefExpr const*>,
   public iterator_impl {
   iter iter_ptr;
   bool inDefBlock;

   uses_iterator() :
     iterator_impl(), iter_ptr(-1), inDefBlock(true) { }
   uses_iterator(DefUse const* du, defuse::DefUseNode const& n);
   uses_iterator& inc(bool first);
   friend class DefUse;
public:
   bool operator!=(uses_iterator const& iter);
   DeclRefExpr const* operator*();
};

class defs_iterator : public std::iterator<std::input_iterator_tag,
defuse::DefUseNode, std::ptrdiff_t, defuse::DefUseNode const*>,
public iterator_impl {
   struct iter_ : public iter {
     defuse::VarRefsVect::const_iterator reaches_it;
     iter_(int blk_id) :
       iter(blk_id) { }
   } iter_ptr;
   bool blockDef;

   defs_iterator() :
     iterator_impl(), iter_ptr(-1), blockDef(false) { }
   defs_iterator(DefUse const* du, DeclRefExpr const& n);
   defs_iterator& inc(bool first);
   friend class DefUse;
public:
   bool operator!=(defs_iterator const& iter);
   defuse::DefUseNode const* operator*();
};

// USES //
defs_iterator defs_begin(DeclRefExpr const* var) const;
defs_iterator defs_end() const;

// DEFS //
uses_iterator uses_begin(DeclRefExpr const* var) const;
uses_iterator uses_begin(VarDecl const* var) const;
uses_iterator uses_end() const;

bool isUse(DeclRefExpr const* var) const;
bool isDef(DeclRefExpr const* var) const {
   return isDef(defuse::DefUseNode(var, defuse::DefUseNode::Def));
}
bool isDef(VarDecl const* var) const {
   return isDef(defuse::DefUseNode(var));
}

~DefUse();

static DefUse* BuildDefUseChains(Stmt* body, ASTContext *ctx,
     CFG* cfg, ParentMap* pm, defuse::DefUseHelper* helper = 0,
     bool verbose = false, llvm::raw_ostream& out = llvm::outs());
};

} // end namespace clang

#endif
//===-- clang/Analysis/DefUse.cpp - DefUse analysis -------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===---------------------------------------------------------------------===//
//
// This file contains the implementation of DefUse analysis
//
//===---------------------------------------------------------------------===//

#define __STDC_LIMIT_MACROS
#define __STDC_CONSTANT_MACROS

#include "DefUse.h"

#include "clang/Basic/SourceManager.h"
#include "clang/Analysis/CFG.h"
#include <llvm/Support/raw_ostream.h>

// todo: Use DenseMap and llvm sets
#include <set>
#include <map>

using namespace clang;
using namespace defuse;

// utility functions
static unsigned Line(SourceLocation const& l, clang::SourceManager const& sm) {
PresumedLoc pl = sm.getPresumedLoc(l);
return pl.getLine();
}

static unsigned Column(SourceLocation const& l, SourceManager const& sm) {
PresumedLoc pl = sm.getPresumedLoc(l);
return pl.getColumn();
}

static std::string PrintClangStmt(Stmt const* stmt, ASTContext const& ctx) {
std::string str;
llvm::raw_string_ostream pp(str);
stmt->printPretty(pp, 0, PrintingPolicy(ctx.getLangOptions()), 0);
return pp.str();
}

//===------------------------- DefUseNode -------------------------===//
inline bool defuse::DefUseNode::operator==(DefUseNode const& n) const {
if (n.usage != usage || n.kind != kind) return false;
return kind == VarDecl ? var_decl == n.var_decl : var_ref == n.var_ref;
}

inline clang::VarDecl const* DefUseNode::getDecl() const {
return kind == VarRef ? dyn_cast<clang::VarDecl> (var_ref->getDecl()) :
   dyn_cast<clang::VarDecl> (var_decl);
}

typedef std::set<DefUseNode const*> VarRefsSet;
typedef std::set<VarDecl const*> VarDeclsSet;
typedef std::map<VarDecl const*, DefUseVect> DefUseChain;

//===------------------------- DefUseBlock -------------------------===//

class defuse::DefUseBlock : public DefUseChain {

Please use encapsulation instead of subclassing. In this case you are subclassing std::map, and that is something there rarely is ever a good reason to do. Encapsulation ensures isolation between the two classes, and forces you to think in terms of interfaces instead of making assumptions about the implementation of the underlying data structure. Almost all of LLVM is structured this way, and it makes code far more resilient to implementation changes in the employed data structures.

Instead, either have DefUseChain contain an std::map (and then define the proper interfaces in DefUseChain) or have DefUseBlock contain a DefUseChain, not subclass it. The former forces you to define only the interfaces you need for DefUseChain, instead of exposing the entirety of std::map (which isn't the interface you're intending to vend, even if it is just to your client code within the DefUse implementation).

public:
VarDeclsSet uses;
VarRefsVect defsout;
VarDeclsSet killed;
VarRefsVect reaches;

class not_killed_set_iterator : public std::iterator<std::input_iterator_tag,
     DefUseNode, std::ptrdiff_t, DefUseNode const*> {
   DefUseBlock& block;
   VarRefsVect::const_iterator ptr_it;

   not_killed_set_iterator& inc(bool first) {
     if (ptr_it != block.reaches.end() && !first)
       ++ptr_it;
     while (ptr_it != block.reaches.end() &&
         (block.killed.find((*ptr_it)->getDecl()) != block.killed.end() &&
         std::find(block.defsout.begin(), block.defsout.end(), *ptr_it)
         == block.defsout.end()))
       ++ptr_it;
     return *this;
   }
public:
   not_killed_set_iterator(DefUseBlock& block_,
       VarRefsVect::const_iterator ptr_it_) :
     block(block_), ptr_it(ptr_it_) { inc(true); }
   not_killed_set_iterator& operator++() { return inc(false); }
   bool operator!=(not_killed_set_iterator const& iter) const {
     return !((&iter.block) == &(block) && iter.ptr_it == ptr_it);
   }
   const DefUseNode* const & operator*() const { return *ptr_it; }
};

not_killed_set_iterator begin_not_killed_set() {
   return DefUseBlock::not_killed_set_iterator(*this, reaches.begin());
}
not_killed_set_iterator end_not_killed_set() {
   return DefUseBlock::not_killed_set_iterator(*this, reaches.end());
}
};

//===------------------------- VarDeclMap -------------------------===//

class defuse::VarDeclMap : public StmtVisitor<VarDeclMap> ,
   public llvm::DenseMap<VarDecl const*, DeclStmt const*> {
public:
VarDeclMap(Stmt const* body) { VisitStmt(const_cast<Stmt*> (body)); }

void VisitDeclStmt(DeclStmt *DS) {
   for (DeclStmt::decl_iterator I = DS->decl_begin(), E = DS->decl_end();
       I != E; ++I)
   {
     if (VarDecl *VD = dyn_cast<VarDecl>(*I))
       insert(std::make_pair(VD, DS));
   }
}
void VisitStmt(Stmt* S) {
   for (Stmt::child_iterator I = S->child_begin(), E = S->child_end();
       I != E; ++I)
     if (*I) Visit(*I);
}
};

//===------------------------- DefUseHelper -------------------------===//
struct defuse::DefUseHelper::DefUseHelperImpl {
DefUseData* data;
VarRefBlockMap* bm;
VarDeclMap* dm;
unsigned blockID;

DefUseHelperImpl() :
   data(NULL), bm(NULL), dm(NULL) { }
};

Per my comments above, it seems to me this pimpl logic can be sucked into a common class with the main logic for the DefUse algorithm, and then make that whole thing private to DefUse.cpp. That will be a lot simpler, and should migrate many of the implementation details from the header file to the .cpp file. Then clients just pass the DefUseHelper object (which has the HandleXXX methods) to BuildDefUseChains(), and the implementation class stays completely hidden (thus separating the DefUse algorithm from the client interface for querying that information).

defuse::DefUseHelper::DefUseHelper() :
pimpl(new DefUseHelperImpl), current_use(DefUseNode::Use) { }

defuse::DefUseHelper::~DefUseHelper() { delete pimpl; }

void defuse::DefUseHelper::InitializeValues(DefUseData* data,
   VarRefBlockMap* bm, VarDeclMap* dm) {
pimpl->data = data;
pimpl->bm = bm;
pimpl->dm = dm;
}

void defuse::DefUseHelper::HandleDeclRefExpr(DeclRefExpr *DR) {
VarRefBlockMap& bm = *pimpl->bm;
unsigned blockID = pimpl->blockID;
DefUseBlock& block_data = (*pimpl->data)[blockID];
if (VarDecl *VD = dyn_cast<VarDecl>(DR->getDecl())) {
   // map this variable to the current block
   bm[DR] = blockID;
   if (block_data.find(VD) == block_data.end()) {
     block_data.insert(std::make_pair(VD, DefUseVect()));
     block_data.uses.insert(VD); // this variable is used in the current
     // block and it has no prior definition within the block
   }
   if (current_use == DefUseNode::UseDef) { // in case of compound operators
     DefUseVect &B = block_data[VD];
     B.push_back(DefUseNode(DR, DefUseNode::Use));
     B.push_back(DefUseNode(DR, DefUseNode::Def));
   } else block_data[VD].push_back(DefUseNode(DR, current_use));
}
}

void defuse::DefUseHelper::HandleDeclStmt(DeclStmt *DS) {
for (DeclStmt::decl_iterator I = DS->decl_begin(), E = DS->decl_end();
     I != E; I++)
{
   if (VarDecl* VD = dyn_cast<VarDecl> (*I)) {
     VarRefBlockMap& bm = *pimpl->bm;
     VarDeclMap& dm = *pimpl->dm;
     unsigned blockID = pimpl->blockID;
     DefUseBlock& block_data = (*pimpl->data)[blockID];

     assert((block_data.find(VD) == block_data.end()) &&
         "Variable redefined in the same block");
     block_data[VD].push_back(DefUseNode(VD));
     bm[dm[VD]] = blockID;
   }
}
}

void defuse::DefUseHelper::HandleBinaryOperator(BinaryOperator* B) {
DefUseNode::UseKind backup = current_use; // backup the usage
if (B->isAssignmentOp()) {
   current_use = DefUseNode::Use;
   Visit(B->getRHS());
   current_use = DefUseNode::Def;
   if (B->isCompoundAssignmentOp())
     current_use = DefUseNode::UseDef;
   Visit(B->getLHS());
} else {
   Visit(B->getLHS());
   current_use = DefUseNode::Use;
   Visit(B->getRHS());
}
current_use = backup; // write back the usage to the current usage
}

void defuse::DefUseHelper::HandleConditionalOperator(ConditionalOperator* C) {}

void defuse::DefUseHelper::HandleCallExpr(CallExpr* C) {
DefUseNode::UseKind backup = current_use; // backup the usage
for (CallExpr::arg_iterator I = C->arg_begin(), E = C->arg_end();
     I != E; ++I)
{
   if ((*I)->getType()->isPointerType() || (*I)->getType()->isReferenceType())
     current_use = DefUseNode::Def;
   Visit(*I);
   current_use = backup;
}
}

void defuse::DefUseHelper::HandleUnaryOperator(UnaryOperator* U) {
DefUseNode::UseKind backup = current_use; // backup the usage
switch (U->getOpcode()) {
case UnaryOperator::PostInc:
case UnaryOperator::PostDec:
case UnaryOperator::PreInc:
case UnaryOperator::PreDec:
   current_use = DefUseNode::UseDef;
   break;
case UnaryOperator::Plus:
case UnaryOperator::Minus:
case UnaryOperator::Not:
case UnaryOperator::LNot:
   current_use = DefUseNode::Use;
   break;
case UnaryOperator::AddrOf:
case UnaryOperator::Deref:
   // use the current_use
   break;
default:
   // DEBUG("Operator " << UnaryOperator::getOpcodeStr(U->getOpcode()) <<
   // " not supported in def-use analysis");
   break;
}
Visit(U->getSubExpr());
current_use = backup; // write back the usage to the current usage
}

void defuse::DefUseHelper::HandleArraySubscriptExpr(ArraySubscriptExpr* AS) {
DefUseNode::UseKind backup = current_use; // backup the usage
Visit(AS->getBase());
current_use = DefUseNode::Use;
Visit(AS->getIdx());
current_use = backup; // write back the usage to the current usage
}

void defuse::DefUseHelper::VisitCFGBlock(CFGBlock const& blk,
   CFGBlock const& entry) {
pimpl->blockID = blk.getBlockID();
for (CFGBlock::const_iterator I = blk.begin(), E = blk.end(); I != E; ++I)
   Visit(*I);

unsigned blockID = pimpl->blockID;
DefUseBlock& block_data = (*pimpl->data)[blockID];

// Post processing
// build up the uses(b), defsout(b) and killed(b)

// for each variable
for (DefUseBlock::iterator I = block_data.begin(), E = block_data.end(); I
     != E; ++I) {
   // in case of references to ParmVar what we do is insert the variable
   // definition in the first block of the CFG
   DefUseBlock& root_data = (*pimpl->data)[entry.getBlockID()];
   if (isa<ParmVarDecl>(I->first) &&
       std::find(root_data[I->first].begin(), root_data[I->first].end(),
       DefUseNode(I->first)) == root_data[I->first].end())
   {
     root_data[I->first].push_back(DefUseNode(I->first));
     root_data.defsout.push_back(&root_data[I->first].back());
   }
   // if this is a local variable, should't be in the defsout,killed sets
   // defsout is calculated by iterating backwards to find the last use of
   // the variable
   for (DefUseVect::const_reverse_iterator UI = I->second.rbegin(), UE =
       I->second.rend(); UI != UE; ++UI)
     if (UI->getUse() == DefUseNode::Def && (block_data.defsout.empty()
         >> block_data.defsout.back()->getDecl() != I->first)) {
       block_data.defsout.push_back(&(*UI));
       block_data.killed.insert(UI->getDecl());
       break;
     }
}
}

Are these handle methods just part of the main DefUse algorithm? If so, consider sucking them into the Visit methods (thus removing the virtual dispatch), and have them call the HandleXXX methods in a separate DefUseHelper object if one was provided. That way the algorithm stays separate from those who want to observe its values.

I know I keep saying this point, but a critical part of making this clean is making it clear where the def-use implementation is and where we would want to insert hooks to query information (including auditing the algorithm). Making those interfaces clear is key. Ideally, the header file should define just the functions we need to run the algorithm and query the def-use information, nothing more. I fully admit that I just might be confused with the details of the implementation.

There is a bunch of other code that I'm afraid I haven't waded through yet, as there looks like there are some really meaty pieces there. I've noticed some uses of std::set and performing set union's, etc. Note that there may be a really high cost to using such methods, as there will be implicit allocations from malloc, etc. There may possibly be alternative data structures within LLVM that you can use for your purposes that perform far better, but I haven't looked at your implementation close enough to make more specific comments yet. I've given you enough comments, however, in this email to await any feedback you want to provide on the next iteration.

Again, this is really great you're interested in contributing this back, and I look forward to hearing your reply.

Best,
Ted

Ted Kremenek wrote:

Hi Simone,

Sorry for reviewing this second set of changes so late. Comments inline.

Hello,
as you see I am also late :), sorry but currently I am working on other aspects of program analysis and I have very little time to work in DefUse.

Anyway I am doing the changes you proposed:

Where this is going to bite you is here:

  typedef std::map<VarDecl const*, DefUseVect> DefUseChain;

If you replace this with:

  typedef llvm::DenseMap<VarDecl const*, DefUseVect> DefUseChain;

then fundamentally things are going to break. This because the DefUseVect values, which is a typedef for std::vector<DefUseNode>, will get copied and deallocated when the table resizes. All references to the DefUseVect, or the objects they contain, will incorrect at this point.

The solution is to use DefUseVect* instead of DefUseVect as the value for DefUseChain.

I managed to get rid of the std::map in my implementation, I defined DefUseChain as:

typedef llvm::DenseMap<VarDecl const*, DefUseVect*> DefUseChain;

the only problem is that now in the destructor for the DefUse class I have to manually free the memory allocated in the heap, and that's ugly :slight_smile:

I am looking forward to do something like: typedef llvm::DenseMap<VarDecl const*, llvm::OwningPtr<DefUseVect> > DefUseChain;
but OwiningPrt has no copy constructor and I need to do some major changes in the code.

Is there in LLVM something like the boost::shared_ptr?

I think the approach should be to focus on a clean API. Using templates should be avoided unless there is a huge performance need; the cost of virtual functions is low if the amount of work done in the virtual function call outweighs the dynamic dispatch.

I still have to look into this issue!

For the rest I properly formatted the code, for the test cases I have to learn how to write test cases for LLVM :slight_smile:

Take a look at the Clang tests in 'test/Analysis' and friends. Basically we'll need to wire up a driver option to clang-cc, and you'll want to run clang-cc on some code that outputs some diagnostics relevant to def-use analysis.

I will

More specific comments inline.

//===------------------------- Typedefs -------------------------===//
class DefUseBlock;
class VarDeclMap;
typedef std::vector<DefUseBlock> DefUseData;
typedef llvm::DenseMap<Stmt const*, unsigned> VarRefBlockMap;
typedef std::vector<DefUseNode> DefUseVect;
typedef std::vector<DefUseNode const*> VarRefsVect;

Something looks potentially wrong here. DefUseVect is a vector of DefUseNode objects, not DefUseNode*. If the vector ever needs to be resized, then the DefUseNode objects will be obliterated, and any references to them (which may be in VarRefVect) will suddenly become invalid. DefUseNode is a lightweight object that you can certainly keep in a vector, but if you need these objects to have stable pointer addresses you will need to manually allocate them and store DefUseNode* instead DefUseVect instead. If you are concerned with 'malloc()' being too slow, you can use the BumpPtrAllocator in llvm to quickly allocate objects.

You are completely write, that's a potential huge problem.

I get rid of the DefUseVect so not I keep only vector to pointers. This also allow me to do some simplifications inside the defs_iterator and uses_iterator
of course there is always the issue that I need to clean the memory when the DefUse object is deleted.

Again, this is really great you're interested in contributing this back, and I look forward to hearing your reply.

Best,
Ted

I will continue this work this week and probably at the beginning of the next week I will send you the code for another iteration.

There are several things I want to improve simplifying the overall implementation.

Thanks again for you style lessons, and see you in few days!
bye

Dear Ted,
first of all I apologize for the long time, but I was a little bit
overloaded with work lately so I had no time to fix the style issues in
the DefUse implementation.

I did several improvements following your suggestions. I changed the
interface a little and mainly the implementation. Here it is a list of
changes I did:

* DefUseNode now is implemented using the PointerUnion class (I also
changed the name from DefUseNode to Node as it is already in the du
namespace and it was a little bit overwhelming).

* I have separated the DefUseHelper from the visitor semantics and moved
some of the definition from the the header to the cpp file.

* I have eliminated all the possible problem due to resizing of vectors
and I finally use llvm::DenseMap instead of std::map.

* I also changed most of the variable names to adhere the LLVM style

* I use the AnalysisContext object instead of asking the user for a CFG
and a ParentMap

* plus several small fixes and style improvement

I think there are still improvements to do, the code is way away from
being perfect. I promise I will provide some test cases in short time.

Anyway, thanks for helping me in improving my crap code :stuck_out_tongue: more comments
follows:

> Hello, I am back :),
> so I fix most of the comments, I still need to address a couple of
> issues, i.e. use of llvm::DenseMap in place of std::map and virtual
> methods in DefUseHelper.

Hi Simone,

Sorry for reviewing this second set of changes so late. Comments
inline.

> About the usage of llvm::DenseMap, the problem is that every time I
> change from std:::map to llvm::DenseMap I get several errors
> (compile time), which is weird as the 2 classes expose the same
> interface, anyway I need some time to understand which is the source
> of problems.

One thing to understand is that the two data structures do not export
the same interface.

std::map allocates memory (using malloc) for each key/value pair you
store in the map. This allocation stays around around as long as you
keep the key in the map. For example, suppose you had:

    std::map<void*,int> M;
    int &x = M[p];

The reference 'x' is valid for the entire time that the key 'p' is in
the map. If you add or remove other keys, the reference stays valid
because the key/value pair is actually an allocated blob of memory
they you are allowed to reference.

DenseMap doesn't have this property. It assumes that the values you
store in the DenseMap are lightweight, e.g. pointers to other data,
and thus allocates memory in chunks that are used for buckets for the
hash table. When an insertion into the DenseMap occurs, the hash
table may resize, causing the values in the old buckets to get copied
to new buckets, and the old buckets deallocated. This means that any
reference to the old data, as in the example above, is invalid.

Where this is going to bite you is here:

   typedef std::map<VarDecl const*, DefUseVect> DefUseChain;

If you replace this with:

   typedef llvm::DenseMap<VarDecl const*, DefUseVect> DefUseChain;

then fundamentally things are going to break. This because the
DefUseVect values, which is a typedef for std::vector<DefUseNode>,
will get copied and deallocated when the table resizes. All
references to the DefUseVect, or the objects they contain, will
incorrect at this point.

The solution is to use DefUseVect* instead of DefUseVect as the value
for DefUseChain.

Done.

>
> About the virtual methods in DefUseHelper, you are right! :slight_smile: most of
> the time the user will use the default one and I see the performance
> problem with the virtual calls. Actually, I have introduced this
> virtual dispatch because in the project where I am using this
> implementation of DefUse, I need to implement specific semantics of
> MPI calls. So mostly I am overloading the semantics of method calls
> (CallExpr). By default when a variable is passed to a function as a
> pointer type (or reference), during the DefUse analysis we have to
> state that this is a definition for the variable. Anyway, if we know
> the semantics of the specific method, and we know that it will not
> change the variable (for example the buffer in MPI_Send) we can
> state that the method call will a use for the variable.
>
> Now I don't know if this mechanism is useful for others... and if it
> should be here in the first place, but it makes sense for me :slight_smile:
> Anyway, I guess I can avoid the virtual calls by using templates. I
> will be working on it the next days.

I think the approach should be to focus on a clean API. Using
templates should be avoided unless there is a huge performance need;
the cost of virtual functions is low if the amount of work done in the
virtual function call outweighs the dynamic dispatch.

no templates... got it!

>
> For the rest I properly formatted the code, for the test cases I
> have to learn how to write test cases for LLVM :slight_smile:

Take a look at the Clang tests in 'test/Analysis' and friends.
Basically we'll need to wire up a driver option to clang-cc, and
you'll want to run clang-cc on some code that outputs some diagnostics
relevant to def-use analysis.

to be done. sorry

More specific comments inline.

> //===-- clang/Analysis/DefUse.h - DefUse analysis -------*- C++ -*-
> ===//
> //
> // The LLVM Compiler Infrastructure
> //
> // This file is distributed under the University of Illinois Open
> Source
> // License. See LICENSE.TXT for details.
> //
> //
> =
> =
> =
> ----------------------------------------------------------------------=
> ==//
> //
> // This file contains the declaration of DefUse analysis headers
> //
> //
> =
> =
> =
> ----------------------------------------------------------------------=
> ==//
>
> #ifndef LLVM_CLANG_DEFUSE_H
> #define LLVM_CLANG_DEFUSE_H
>
> #include "llvm/ADT/DenseMap.h"
>
> #include "clang/AST/ASTConsumer.h"
> #include "clang/AST/ParentMap.h"
> #include "clang/AST/StmtVisitor.h"
>
> namespace clang {
>
> // forward definitions
> class CFG;
> class CFGBlock;
>
> class DefUse;
>
> namespace defuse {
> //===------------------------- DefUseNode -------------------------
> ===//
> // Keeps the information for a particular 'definition' or 'use' of a
> variable
> // The structure is needed because a defuse chain can contains
> variable declarations
> // as well as variable reference. As a DeclStmt can contains several
> declarations
> // to address a particular variable we need to store its VarDecl or
> DeclRefExp
> class DefUseNode {
> public:
> enum NodeKind {
> VarDecl, VarRef
> };
> enum UseKind {
> Use, Def, UseDef
> };
>
> DefUseNode(clang::VarDecl const* decl) :
> var_decl(decl), kind(VarDecl), usage(Def) {
> }
> DefUseNode(DeclRefExpr const* ref, UseKind u = Use) :
> var_ref(ref), kind(VarRef), usage(u) {
> }
>
> clang::VarDecl const* getDecl() const;
>
> NodeKind const& getKind() const {
> return kind;
> }
> UseKind const& getUse() const {
> return usage;
> }
>
> clang::VarDecl const* getVarDecl() const {
> assert(kind==VarDecl);
> return var_decl;
> }
> DeclRefExpr const* getVarRef() const {
> assert(kind==VarRef);
> return var_ref;
> }
>
> bool operator==(DefUseNode const& n) const;
>
> private:
> // a def-use node can be either a VarDecl or a DeclRefExpr
> union {
> clang::VarDecl const* var_decl;
> DeclRefExpr const* var_ref;
> };
> NodeKind kind;
> UseKind usage;

Please consider using the PointerUnion class for this. It will reduce
the code you are using, and will allow you to merge the 'kind' field
into the pointer as well with no extra work on your part.

Done.

Should 'usage' be const?

yes, you are right!

Also, most of LLVM does not follow the naming convention of lowercase
variable names with underscores. To be consistent with the rest of
the codebase, please take a look at the coding conventions used in a
core header file, e.g. 'include/clang/AST/Expr.h', which shows that
variable names are typically short, and use capitalization instead of
underscores. This is really a nitty point, but consistency through
the code base makes it more readable.

> };
>
> //===------------------------- Typedefs -------------------------===//
> class DefUseBlock;
> class VarDeclMap;
> typedef std::vector<DefUseBlock> DefUseData;
> typedef llvm::DenseMap<Stmt const*, unsigned> VarRefBlockMap;
> typedef std::vector<DefUseNode> DefUseVect;
> typedef std::vector<DefUseNode const*> VarRefsVect;

Something looks potentially wrong here. DefUseVect is a vector of
DefUseNode objects, not DefUseNode*. If the vector ever needs to be
resized, then the DefUseNode objects will be obliterated, and any
references to them (which may be in VarRefVect) will suddenly become
invalid. DefUseNode is a lightweight object that you can certainly
keep in a vector, but if you need these objects to have stable pointer
addresses you will need to manually allocate them and store
DefUseNode* instead DefUseVect instead. If you are concerned with
'malloc()' being too slow, you can use the BumpPtrAllocator in llvm to
quickly allocate objects.

yes, right! Fixed it.

> class DefUseHelper;
>
> //===------------------------- DefUseHelper -------------------------
> ===//
>
> class DefUseHelper : public StmtVisitor<DefUseHelper> {
> class DefUseHelperImpl;
> DefUseHelperImpl* pimpl;
>
> void InitializeValues(DefUseData* data, VarRefBlockMap* bm,
> VarDeclMap* dm);
>
> friend class clang::DefUse;
> public:
> DefUseNode::UseKind current_use;
> DefUseHelper();
>
> virtual void HandleDeclRefExpr(DeclRefExpr *DR); // remember to
> call the
> // super class implementation of the
> method
> void HandleDeclStmt(DeclStmt *DS);
>
> virtual void HandleBinaryOperator(BinaryOperator* B);
> virtual void HandleConditionalOperator(ConditionalOperator* C);
> virtual void HandleCallExpr(CallExpr* C);
> virtual void HandleUnaryOperator(UnaryOperator* U);
> virtual void HandleArraySubscriptExpr(ArraySubscriptExpr* AS);
>
> void VisitDeclRefExpr(DeclRefExpr *DR) {
> return HandleDeclRefExpr(DR);
> }
> void VisitDeclStmt(DeclStmt *DS) {
> return HandleDeclStmt(DS);
> }
> void VisitBinaryOperator(BinaryOperator* B) {
> return HandleBinaryOperator(B);
> }
> void VisitConditionalOperator(ConditionalOperator* C) {
> return HandleConditionalOperator(C);
> }
> void VisitCallExpr(CallExpr* C) {
> return HandleCallExpr(C);
> }
> void VisitUnaryOperator(UnaryOperator* U) {
> return HandleUnaryOperator(U);
> }
> void VisitArraySubscriptExpr(ArraySubscriptExpr* AS) {
> return HandleArraySubscriptExpr(AS);
> }
>
> void VisitCFGBlock(clang::CFGBlock const& blk, CFGBlock const&
> entry);
> void VisitStmt(Stmt* S);
>
> virtual ~DefUseHelper();
> };
>
> void PrintVarDefs(DefUse const* DU, DeclRefExpr const* DR,
> ASTContext& ctx,
> llvm::raw_ostream& out);
> void PrintVarUses(DefUse const* DU, DeclRefExpr const* DR,
> ASTContext& ctx,
> llvm::raw_ostream& out);
> void PrintVarUses(DefUse const* DU, VarDecl const* VD, ASTContext&
> ctx,
> llvm::raw_ostream& out);
>
> ASTConsumer* CreateDefUseTestConsumer(llvm::raw_ostream& out);
>

I'm still trying to understand the implementation in its entirety, but
it seems to me that you should try and decouple the DefUse
implementation from the observer of its values. Here it seems like
they are mixed together, which forces you to use subclassing of
DefUseHelper to extend it's functionality. This isn't ideal, as it
conflates the DefUse algorithm with what desires to consume that
information. This muddles the separation of concerns between these
two concepts, and it constrains future revisions of the DefUse
implementation to stay within this visitor design. The consumer of
the DefUse information may wish to have a visitor design, but I'm not
certain if you want to expose a visitor pattern in the DefUse
implementation itself.

The helper is not used as observer, it's more something needed to build
the defuse chains. When the analysis data is collected I need to know if
a DeclVarRef is a definition or an use. And this is DefUse::Helper
business. Most of the time the user will not need to redefine the
behaviour, it could happen and I want to give the opportunity for that.

Anyway now I separated the Visitor from the HandleXXX method. It's a
little bit cleaner but the concept is still the same.

As an alternative, consider the "observer" interface provided by
Clang's LiveVariables analysis ('include/clang/Analysis/Analyses/
LiveVariables.h'). Here the LiveVariables implementation is
completely opaque, but allows clients to provide an "Observer" object
so they can query liveness information at specific subexpressions.
The dead stores checker uses this interface to find dead stores, and
it can do so without any knowledge of how the LiveVariables analysis
calculates its liveness information. It makes the API much simpler,
and disentangles the implementation of LiveVariables from its clients.

More specifically, the HandleXXX methods look like they are part of
the "observer" interface, while the VisitXXX methods look like they
are part of the DefUse implementation. Your use of the pimpl idiom
implies you want to keep some of this implementation hidden. Why not
just separate the implementation of the DefUse algorithm completely
from the HandleXXX methods and keep those in the DefUseHelper. The
DefUse algorithm can then check to see if a DefUseHelper object is
provided, and if so, call the appropriate HandleXXX methods.

> } // end namespace defuse
>
> //===------------------------- DefUse -------------------------===//
>
> class DefUse {
> ASTContext const& ctx;
> defuse::DefUseData const* analysis_data;
> defuse::VarDeclMap const* decl_map;
> defuse::VarRefBlockMap const* block_map;
> unsigned const num_cfg_blocks;

Again, just my comment on the naming convention for variables.

>
> DefUse(ASTContext const& ctx_,
> defuse::DefUseData const* analysis_data_,
> defuse::VarDeclMap const* decl_map_,
> defuse::VarRefBlockMap const* block_map_,
> unsigned num_blocks) :

Please do not use trailing underscores in variable names.

> ctx(ctx_), analysis_data(analysis_data_), decl_map(decl_map_),
> block_map(block_map_), num_cfg_blocks(num_blocks) {
> }
>
> bool isDef(defuse::DefUseNode const& n) const;
>
> class iterator_impl {
> public:
> DefUse const* du;
> defuse::DefUseNode const* node;
>
> struct iter {
> int block_id;
> defuse::DefUseVect::const_iterator block_it;
> iter(int blk_id) :
> block_id(blk_id) {
> }
> };
> iterator_impl() : du(NULL) { }
> iterator_impl(DefUse const* du_) : du(du_) { }
> iterator_impl& operator++() {
> return inc(false);
> }
> virtual iterator_impl& inc(bool) = 0;
> };

Do we need a virtual iterator? Does iterator_impl need to be
subclassed? Not only is this a potential performance problem, but it
could be the source of subtle bugs if you assigned an object of a
subclass to that of a parent class. It seems to me that defs_iterator
and uses_iterator can be merged (substantially reducing the code), and
simply add a flag to the iterator indicating whether it is for uses or
defs. If you want strong typing, you certainly can have subclassing
of the iterators, but I think the entire implementation can get sucked
into a common class. Then the "inc" just checks the flag and skips of
uses when you are iterating over defs and vis versa. You then remove
the vtable pointer, simplify the inheritance tree, and the
implementation becomes a lot more readable. There is also more
chances for inlining by the compiler (i.e., if the definition of 'inc'
is provided in the class declaration).

I also fixed this. The problem here is that the 2 iterators are a little
bit different. I cannot put everything together. For example the
uses_iterator iterates through DeclRefExpr* while the defs_iterator
return a pointer to a Node* (old DefUseNode) because an use can be both
a decl or a var ref.

> public:
> class uses_iterator : public std::iterator<std::input_iterator_tag,
> DeclRefExpr, std::ptrdiff_t, DeclRefExpr const*>,
> public iterator_impl {
> iter iter_ptr;
> bool inDefBlock;
>
> uses_iterator() :
> iterator_impl(), iter_ptr(-1), inDefBlock(true) { }
> uses_iterator(DefUse const* du, defuse::DefUseNode const& n);
> uses_iterator& inc(bool first);
> friend class DefUse;
> public:
> bool operator!=(uses_iterator const& iter);
> DeclRefExpr const* operator*();
> };
>
> class defs_iterator : public std::iterator<std::input_iterator_tag,
> defuse::DefUseNode, std::ptrdiff_t, defuse::DefUseNode const*>,
> public iterator_impl {
> struct iter_ : public iter {
> defuse::VarRefsVect::const_iterator reaches_it;
> iter_(int blk_id) :
> iter(blk_id) { }
> } iter_ptr;
> bool blockDef;
>
> defs_iterator() :
> iterator_impl(), iter_ptr(-1), blockDef(false) { }
> defs_iterator(DefUse const* du, DeclRefExpr const& n);
> defs_iterator& inc(bool first);
> friend class DefUse;
> public:
> bool operator!=(defs_iterator const& iter);
> defuse::DefUseNode const* operator*();
> };
>
> // USES //
> defs_iterator defs_begin(DeclRefExpr const* var) const;
> defs_iterator defs_end() const;
>
> // DEFS //
> uses_iterator uses_begin(DeclRefExpr const* var) const;
> uses_iterator uses_begin(VarDecl const* var) const;
> uses_iterator uses_end() const;
>
> bool isUse(DeclRefExpr const* var) const;
> bool isDef(DeclRefExpr const* var) const {
> return isDef(defuse::DefUseNode(var, defuse::DefUseNode::Def));
> }
> bool isDef(VarDecl const* var) const {
> return isDef(defuse::DefUseNode(var));
> }
>
> ~DefUse();
>
> static DefUse* BuildDefUseChains(Stmt* body, ASTContext *ctx,
> CFG* cfg, ParentMap* pm, defuse::DefUseHelper* helper = 0,
> bool verbose = false, llvm::raw_ostream& out = llvm::outs());
> };
>
> } // end namespace clang
>
> #endif
> //===-- clang/Analysis/DefUse.cpp - DefUse analysis -------*- C++ -*-
> ===//
> //
> // The LLVM Compiler Infrastructure
> //
> // This file is distributed under the University of Illinois Open
> Source
> // License. See LICENSE.TXT for details.
> //
> //
> =
> =
> =
> ---------------------------------------------------------------------
> ===//
> //
> // This file contains the implementation of DefUse analysis
> //
> //
> =
> =
> =
> ---------------------------------------------------------------------
> ===//
>
> #define __STDC_LIMIT_MACROS
> #define __STDC_CONSTANT_MACROS
>
> #include "DefUse.h"
>
> #include "clang/Basic/SourceManager.h"
> #include "clang/Analysis/CFG.h"
> #include <llvm/Support/raw_ostream.h>
>
> // todo: Use DenseMap and llvm sets
> #include <set>
> #include <map>
>
> using namespace clang;
> using namespace defuse;
>
> // utility functions
> static unsigned Line(SourceLocation const& l, clang::SourceManager
> const& sm) {
> PresumedLoc pl = sm.getPresumedLoc(l);
> return pl.getLine();
> }
>
> static unsigned Column(SourceLocation const& l, SourceManager const&
> sm) {
> PresumedLoc pl = sm.getPresumedLoc(l);
> return pl.getColumn();
> }
>
> static std::string PrintClangStmt(Stmt const* stmt, ASTContext
> const& ctx) {
> std::string str;
> llvm::raw_string_ostream pp(str);
> stmt->printPretty(pp, 0, PrintingPolicy(ctx.getLangOptions()), 0);
> return pp.str();
> }
>
> //===------------------------- DefUseNode -------------------------
> ===//
> inline bool defuse::DefUseNode::operator==(DefUseNode const& n)
> const {
> if (n.usage != usage || n.kind != kind) return false;
> return kind == VarDecl ? var_decl == n.var_decl : var_ref ==
> n.var_ref;
> }
>
> inline clang::VarDecl const* DefUseNode::getDecl() const {
> return kind == VarRef ? dyn_cast<clang::VarDecl> (var_ref->getDecl
> ()) :
> dyn_cast<clang::VarDecl> (var_decl);
> }
>
> typedef std::set<DefUseNode const*> VarRefsSet;
> typedef std::set<VarDecl const*> VarDeclsSet;
> typedef std::map<VarDecl const*, DefUseVect> DefUseChain;
>
> //===------------------------- DefUseBlock -------------------------
> ===//
>
> class defuse::DefUseBlock : public DefUseChain {

Please use encapsulation instead of subclassing. In this case you are
subclassing std::map, and that is something there rarely is ever a
good reason to do. Encapsulation ensures isolation between the two
classes, and forces you to think in terms of interfaces instead of
making assumptions about the implementation of the underlying data
structure. Almost all of LLVM is structured this way, and it makes
code far more resilient to implementation changes in the employed data
structures.

Instead, either have DefUseChain contain an std::map (and then define
the proper interfaces in DefUseChain) or have DefUseBlock contain a
DefUseChain, not subclass it. The former forces you to define only
the interfaces you need for DefUseChain, instead of exposing the
entirety of std::map (which isn't the interface you're intending to
vend, even if it is just to your client code within the DefUse
implementation).

I have to think about it... the main observation is that this stuff will
be only used in this translation unit.

> public:
> VarDeclsSet uses;
> VarRefsVect defsout;
> VarDeclsSet killed;
> VarRefsVect reaches;
>
> class not_killed_set_iterator : public
> std::iterator<std::input_iterator_tag,
> DefUseNode, std::ptrdiff_t, DefUseNode const*> {
> DefUseBlock& block;
> VarRefsVect::const_iterator ptr_it;
>
> not_killed_set_iterator& inc(bool first) {
> if (ptr_it != block.reaches.end() && !first)
> ++ptr_it;
> while (ptr_it != block.reaches.end() &&
> (block.killed.find((*ptr_it)->getDecl()) != block.killed.end
> () &&
> std::find(block.defsout.begin(), block.defsout.end(),
> *ptr_it)
> == block.defsout.end()))
> ++ptr_it;
> return *this;
> }
> public:
> not_killed_set_iterator(DefUseBlock& block_,
> VarRefsVect::const_iterator ptr_it_) :
> block(block_), ptr_it(ptr_it_) { inc(true); }
> not_killed_set_iterator& operator++() { return inc(false); }
> bool operator!=(not_killed_set_iterator const& iter) const {
> return !((&iter.block) == &(block) && iter.ptr_it == ptr_it);
> }
> const DefUseNode* const & operator*() const { return *ptr_it; }
> };
>
> not_killed_set_iterator begin_not_killed_set() {
> return DefUseBlock::not_killed_set_iterator(*this, reaches.begin
> ());
> }
> not_killed_set_iterator end_not_killed_set() {
> return DefUseBlock::not_killed_set_iterator(*this, reaches.end());
> }
> };
>
> //===------------------------- VarDeclMap -------------------------
> ===//
>
> class defuse::VarDeclMap : public StmtVisitor<VarDeclMap> ,
> public llvm::DenseMap<VarDecl const*, DeclStmt const*> {
> public:
> VarDeclMap(Stmt const* body) { VisitStmt(const_cast<Stmt*> (body)); }
>
> void VisitDeclStmt(DeclStmt *DS) {
> for (DeclStmt::decl_iterator I = DS->decl_begin(), E = DS-
> >decl_end();
> I != E; ++I)
> {
> if (VarDecl *VD = dyn_cast<VarDecl>(*I))
> insert(std::make_pair(VD, DS));
> }
> }
> void VisitStmt(Stmt* S) {
> for (Stmt::child_iterator I = S->child_begin(), E = S->child_end();
> I != E; ++I)
> if (*I) Visit(*I);
> }
> };
>
> //===------------------------- DefUseHelper -------------------------
> ===//
> struct defuse::DefUseHelper::DefUseHelperImpl {
> DefUseData* data;
> VarRefBlockMap* bm;
> VarDeclMap* dm;
> unsigned blockID;
>
> DefUseHelperImpl() :
> data(NULL), bm(NULL), dm(NULL) { }
> };

Per my comments above, it seems to me this pimpl logic can be sucked
into a common class with the main logic for the DefUse algorithm, and
then make that whole thing private to DefUse.cpp. That will be a lot
simpler, and should migrate many of the implementation details from
the header file to the .cpp file. Then clients just pass the
DefUseHelper object (which has the HandleXXX methods) to
BuildDefUseChains(), and the implementation class stays completely
hidden (thus separating the DefUse algorithm from the client interface
for querying that information).

I tried to deal with this.

>
> defuse::DefUseHelper::DefUseHelper() :
> pimpl(new DefUseHelperImpl), current_use(DefUseNode::Use) { }
>
> defuse::DefUseHelper::~DefUseHelper() { delete pimpl; }
>
> void defuse::DefUseHelper::InitializeValues(DefUseData* data,
> VarRefBlockMap* bm, VarDeclMap* dm) {
> pimpl->data = data;
> pimpl->bm = bm;
> pimpl->dm = dm;
> }
>
> void defuse::DefUseHelper::HandleDeclRefExpr(DeclRefExpr *DR) {
> VarRefBlockMap& bm = *pimpl->bm;
> unsigned blockID = pimpl->blockID;
> DefUseBlock& block_data = (*pimpl->data)[blockID];
> if (VarDecl *VD = dyn_cast<VarDecl>(DR->getDecl())) {
> // map this variable to the current block
> bm[DR] = blockID;
> if (block_data.find(VD) == block_data.end()) {
> block_data.insert(std::make_pair(VD, DefUseVect()));
> block_data.uses.insert(VD); // this variable is used in the
> current
> // block and it has no prior definition within the block
> }
> if (current_use == DefUseNode::UseDef) { // in case of compound
> operators
> DefUseVect &B = block_data[VD];
> B.push_back(DefUseNode(DR, DefUseNode::Use));
> B.push_back(DefUseNode(DR, DefUseNode::Def));
> } else block_data[VD].push_back(DefUseNode(DR, current_use));
> }
> }
>
> void defuse::DefUseHelper::HandleDeclStmt(DeclStmt *DS) {
> for (DeclStmt::decl_iterator I = DS->decl_begin(), E = DS->decl_end
> ();
> I != E; I++)
> {
> if (VarDecl* VD = dyn_cast<VarDecl> (*I)) {
> VarRefBlockMap& bm = *pimpl->bm;
> VarDeclMap& dm = *pimpl->dm;
> unsigned blockID = pimpl->blockID;
> DefUseBlock& block_data = (*pimpl->data)[blockID];
>
> assert((block_data.find(VD) == block_data.end()) &&
> "Variable redefined in the same block");
> block_data[VD].push_back(DefUseNode(VD));
> bm[dm[VD]] = blockID;
> }
> }
> }
>
> void defuse::DefUseHelper::HandleBinaryOperator(BinaryOperator* B) {
> DefUseNode::UseKind backup = current_use; // backup the usage
> if (B->isAssignmentOp()) {
> current_use = DefUseNode::Use;
> Visit(B->getRHS());
> current_use = DefUseNode::Def;
> if (B->isCompoundAssignmentOp())
> current_use = DefUseNode::UseDef;
> Visit(B->getLHS());
> } else {
> Visit(B->getLHS());
> current_use = DefUseNode::Use;
> Visit(B->getRHS());
> }
> current_use = backup; // write back the usage to the current usage
> }
>
> void defuse::DefUseHelper::HandleConditionalOperator
> (ConditionalOperator* C) {}
>
> void defuse::DefUseHelper::HandleCallExpr(CallExpr* C) {
> DefUseNode::UseKind backup = current_use; // backup the usage
> for (CallExpr::arg_iterator I = C->arg_begin(), E = C->arg_end();
> I != E; ++I)
> {
> if ((*I)->getType()->isPointerType() || (*I)->getType()-
> >isReferenceType())
> current_use = DefUseNode::Def;
> Visit(*I);
> current_use = backup;
> }
> }
>
> void defuse::DefUseHelper::HandleUnaryOperator(UnaryOperator* U) {
> DefUseNode::UseKind backup = current_use; // backup the usage
> switch (U->getOpcode()) {
> case UnaryOperator::PostInc:
> case UnaryOperator::PostDec:
> case UnaryOperator::PreInc:
> case UnaryOperator::PreDec:
> current_use = DefUseNode::UseDef;
> break;
> case UnaryOperator::Plus:
> case UnaryOperator::Minus:
> case UnaryOperator::Not:
> case UnaryOperator::LNot:
> current_use = DefUseNode::Use;
> break;
> case UnaryOperator::AddrOf:
> case UnaryOperator::Deref:
> // use the current_use
> break;
> default:
> // DEBUG("Operator " << UnaryOperator::getOpcodeStr(U->getOpcode
> ()) <<
> // " not supported in def-use analysis");
> break;
> }
> Visit(U->getSubExpr());
> current_use = backup; // write back the usage to the current usage
> }
>
> void defuse::DefUseHelper::HandleArraySubscriptExpr
> (ArraySubscriptExpr* AS) {
> DefUseNode::UseKind backup = current_use; // backup the usage
> Visit(AS->getBase());
> current_use = DefUseNode::Use;
> Visit(AS->getIdx());
> current_use = backup; // write back the usage to the current usage
> }
>
> void defuse::DefUseHelper::VisitCFGBlock(CFGBlock const& blk,
> CFGBlock const& entry) {
> pimpl->blockID = blk.getBlockID();
> for (CFGBlock::const_iterator I = blk.begin(), E = blk.end(); I !=
> E; ++I)
> Visit(*I);
>
> unsigned blockID = pimpl->blockID;
> DefUseBlock& block_data = (*pimpl->data)[blockID];
>
> // Post processing
> // build up the uses(b), defsout(b) and killed(b)
>
> // for each variable
> for (DefUseBlock::iterator I = block_data.begin(), E =
> block_data.end(); I
> != E; ++I) {
> // in case of references to ParmVar what we do is insert the
> variable
> // definition in the first block of the CFG
> DefUseBlock& root_data = (*pimpl->data)[entry.getBlockID()];
> if (isa<ParmVarDecl>(I->first) &&
> std::find(root_data[I->first].begin(), root_data[I->first].end
> (),
> DefUseNode(I->first)) == root_data[I->first].end())
> {
> root_data[I->first].push_back(DefUseNode(I->first));
> root_data.defsout.push_back(&root_data[I->first].back());
> }
> // if this is a local variable, should't be in the defsout,killed
> sets
> // defsout is calculated by iterating backwards to find the last
> use of
> // the variable
> for (DefUseVect::const_reverse_iterator UI = I->second.rbegin(),
> UE =
> I->second.rend(); UI != UE; ++UI)
> if (UI->getUse() == DefUseNode::Def && (block_data.defsout.empty
> ()
> >> block_data.defsout.back()->getDecl() != I->first)) {
> block_data.defsout.push_back(&(*UI));
> block_data.killed.insert(UI->getDecl());
> break;
> }
> }
> }

Are these handle methods just part of the main DefUse algorithm? If
so, consider sucking them into the Visit methods (thus removing the
virtual dispatch), and have them call the HandleXXX methods in a
separate DefUseHelper object if one was provided. That way the
algorithm stays separate from those who want to observe its values.

there is no observing, the handleXXX are used to build DefUse.

I know I keep saying this point, but a critical part of making this
clean is making it clear where the def-use implementation is and where
we would want to insert hooks to query information (including auditing
the algorithm). Making those interfaces clear is key. Ideally, the
header file should define just the functions we need to run the
algorithm and query the def-use information, nothing more. I fully
admit that I just might be confused with the details of the
implementation.

There is a bunch of other code that I'm afraid I haven't waded through
yet, as there looks like there are some really meaty pieces there.
I've noticed some uses of std::set and performing set union's, etc.
Note that there may be a really high cost to using such methods, as
there will be implicit allocations from malloc, etc. There may
possibly be alternative data structures within LLVM that you can use
for your purposes that perform far better, but I haven't looked at
your implementation close enough to make more specific comments yet.
I've given you enough comments, however, in this email to await any
feedback you want to provide on the next iteration.

Again, this is really great you're interested in contributing this
back, and I look forward to hearing your reply.

thanks to you Ted!

Best,
Ted

cheers, Simone

DefUse.cpp (34.2 KB)

DefUse.h (8.15 KB)