df_ext_iterator in LiveIntervalAnalysis

I would like to make a suggestion. In the LiveIntervalAnalysis class, instead of numbering the instructions in the order in which basic blocks are stored in the machine function, use the df_ext_iterator. It will order the instruction according to the dominance tree (or it seems to be doing so). There are many advantages in doing this. One of them is that, once you traverse the dominance tree to find the live intervals, they will be contiguous, you don't have to handle holes. Could someone tell me if there is any problem in doing this? In my version of LLVM, I just had to make two changes in LiveIntervalAnalysis:

//===---------------------------------------
First: when numbering the instructions:
// <-- new code
//===---------------------------------------
   unsigned miIndex = 0;
   MachineBasicBlock *entry = mf_->begin();
   std::set<MachineBasicBlock*> visited;
   for (df_ext_iterator<MachineBasicBlock*> dfi = df_ext_begin(entry, visited),
                       endi = df_ext_end(entry, visited); dfi != endi; ++dfi) {
       MachineBasicBlock *mbb = *dfi;
       for (MachineBasicBlock::iterator mi = mbb->begin(), miEnd = mbb->end();
                                                           mi != miEnd; ++mi) {
         bool inserted = mi2iMap_.insert(std::make_pair(mi, miIndex)).second;
         assert(inserted && "multiple MachineInstr -> index mappings");
         i2miMap_.push_back(mi);
         miIndex += InstrSlots::NUM;
       }
   }
   this->maxInstrIndex_ = miIndex;
//===---------------------------------------
// --> old code
//===---------------------------------------
   unsigned miIndex = 0;
   for (MachineFunction::iterator mbb = mf_->begin(), mbbEnd = mf_->end();
        mbb != mbbEnd; ++mbb) {
     for (MachineBasicBlock::iterator mi = mbb->begin(), miEnd = mbb->end();
          mi != miEnd; ++mi) {
       bool inserted = mi2iMap_.insert(std::make_pair(mi, miIndex)).second;
       assert(inserted && "multiple MachineInstr -> index mappings");
       i2miMap_.push_back(mi);
       miIndex += InstrSlots::NUM;
     }
   }
   this->maxInstrIndex_ = miIndex;

//===---------------------------------------
Second: when computing the intervals:
// <-- new code
//===---------------------------------------
   MachineBasicBlock *entry = mf_->begin();
   std::set<MachineBasicBlock*> visited;
   for (df_ext_iterator<MachineBasicBlock*> dfi = df_ext_begin(entry, visited),
                          end = df_ext_end(entry, visited); dfi != end; ++dfi) {
       MachineBasicBlock *mbb = *dfi;
//===---------------------------------------
// --> old code
//===---------------------------------------
// for (MachineFunction::iterator I = mf_->begin(), E = mf_->end();
// I != E; ++I) {
// MachineBasicBlock* mbb = I;

Fernando

I would like to make a suggestion. In the LiveIntervalAnalysis class,
instead of numbering the instructions in the order in which basic blocks
are stored in the machine function, use the df_ext_iterator. It will order
the instruction according to the dominance tree (or it seems to be doing
so). There are many advantages in doing this. One of them is that, once
you traverse the dominance tree to find the live intervals, they will be
contiguous, you don't have to handle holes. Could someone tell me if there
is any problem in doing this? In my version of LLVM, I just
had to make two changes in LiveIntervalAnalysis:

That should work fine. Please do it, then do some compile-time/code-quality comparisons to see if it's better :slight_smile:

good numbers + code = patch committed to cvs :slight_smile:

Thanks, nice idea btw!

-Chris

//===---------------------------------------
First: when numbering the instructions:
// <-- new code
//===---------------------------------------
  unsigned miIndex = 0;
  MachineBasicBlock *entry = mf_->begin();
  std::set<MachineBasicBlock*> visited;
  for (df_ext_iterator<MachineBasicBlock*> dfi = df_ext_begin(entry,
visited),
                      endi = df_ext_end(entry, visited); dfi != endi;
++dfi) {
      MachineBasicBlock *mbb = *dfi;
      for (MachineBasicBlock::iterator mi = mbb->begin(), miEnd =
mbb->end();
                                                          mi != miEnd;
++mi) {
        bool inserted = mi2iMap_.insert(std::make_pair(mi,
miIndex)).second;
        assert(inserted && "multiple MachineInstr -> index mappings");
        i2miMap_.push_back(mi);
        miIndex += InstrSlots::NUM;
      }
  }
  this->maxInstrIndex_ = miIndex;
//===---------------------------------------
// --> old code
//===---------------------------------------
  unsigned miIndex = 0;
  for (MachineFunction::iterator mbb = mf_->begin(), mbbEnd = mf_->end();
       mbb != mbbEnd; ++mbb) {
    for (MachineBasicBlock::iterator mi = mbb->begin(), miEnd =
mbb->end();
         mi != miEnd; ++mi) {
      bool inserted = mi2iMap_.insert(std::make_pair(mi, miIndex)).second;
      assert(inserted && "multiple MachineInstr -> index mappings");
      i2miMap_.push_back(mi);
      miIndex += InstrSlots::NUM;
    }
  }
  this->maxInstrIndex_ = miIndex;

//===---------------------------------------
Second: when computing the intervals:
// <-- new code
//===---------------------------------------
  MachineBasicBlock *entry = mf_->begin();
  std::set<MachineBasicBlock*> visited;
  for (df_ext_iterator<MachineBasicBlock*> dfi = df_ext_begin(entry,
visited),
                         end = df_ext_end(entry, visited); dfi != end;
++dfi) {
      MachineBasicBlock *mbb = *dfi;
//===---------------------------------------
// --> old code
//===---------------------------------------
// for (MachineFunction::iterator I = mf_->begin(), E = mf_->end();
// I != E; ++I) {
// MachineBasicBlock* mbb = I;

Fernando
_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-Chris

Hi,

Just my two cents:

If I recall correctly, in some papers on the linear scan register allocation people described that they tried different orderings for instruction numbering, e.g. including DFS or based on the loop nesting levels, etc. There was no clear winner though.

But let's see the numbers anyway. May be it really brings some improvements.

-Roman

Chris Lattner wrote:

Actually, changing the ordering, with the current implementation of linear scan that LLVM uses, will not bring improvements to the code. The register allocator must be aware that the intervals are been visited in the order of dominance between basic blocks. This allows to simplify the implementation a bit, because you don't have to handle holes in the live ranges: the new ordering has the property that, once a program point is visited (i.e the intervals starting at that program poin), all the program points that dominate it have already been visited.

Fernando

Actually, changing the ordering, with the current implementation of linear
scan that LLVM uses, will not bring improvements to the code. The register
allocator must be aware that the intervals are been visited in the order
of dominance between basic blocks. This allows to simplify the
implementation a bit, because you don't have to handle holes in the live
ranges: the new ordering has the property that, once a program point is
visited (i.e the intervals starting at that program poin), all the program
points that dominate it have already been visited.

You can still have holes due to phi elimination, copy coallescing, live ranges for physregs, etc.

-Chris

Just my two cents:

If I recall correctly, in some papers on the linear scan register
allocation people described that they tried different orderings for
instruction numbering, e.g. including DFS or based on the loop nesting
levels, etc. There was no clear winner though.

But let's see the numbers anyway. May be it really brings some improvements.

-Roman

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-Chris

That is very true. Holes appear mostly due to phi-elimination. That is why I said you would have to change the implementation a bit. To avoid these holes, one must assign registers before SSA-elimination. In general this tends to reduce spill, but increases the number of copies in the final code.

Nice idea. Please also try using SmallPtrSet (with a sufficiently large size) instead of std::set for traversal after everything is working. Using std::set can really hurt compile time in case of large basic block numbers.

Is there a way to dynamically adjust "SmallSize" based on number of basic blocks in the function?

Evan

Nice idea. Please also try using SmallPtrSet (with a sufficiently
large size) instead of std::set for traversal after everything is
working. Using std::set can really hurt compile time in case of large
basic block numbers.

Is there a way to dynamically adjust "SmallSize" based on number of
basic blocks in the function?

Nope. The small size for SmallSet really should be "small" anyway (say < 32). For sets below that threshold, queries do a linear scan of all the elements to determine if something is inside the set.

-Chris