An unexpected behavior in RegionInfo's block_iterator

Hi Fellows,

I notice an unexpected behavior in RegionInfo’s block_iterator. Consider the following situation: a user creates her own region containing just a single basic block TheBB, then tries to have the block_iterator to print a DFS traversal of the region. The expected behavior should be that only the single basic block TheBB will be printed, but the real behavior is that block_iterator prints out all the basic blocks of the CFG starting from TheBB to the end.

It looks like the issue originates from setting the end iterator to (BasicBlock *) 0. I understand that the “region” detected by RegionInfo should never contain a single basic block or even a sequence of basic blocks. So maybe the above degenerated case is considered “will never happen”?

… …

BasicBlock *BB = Func.getARandomBasicBlock;

Region *R = new Region(BB, BB, RI, DT);
for (Region::block_iterator i = R->block_begin(),

e = R->block_end(); i != e; ++i) {
errs() << (*i)->getName() << “\n”;
}

… …

Best Regards,
Paul

Hi Fellows,

    I notice an unexpected behavior in RegionInfo's block_iterator. Consider
the following situation: a user creates her own region containing just a
single basic block TheBB, then tries to have the block_iterator to print a
DFS traversal of the region. The expected behavior should be that only the
single basic block TheBB will be printed, but the real behavior is that
block_iterator prints out all the basic blocks of the CFG starting from
TheBB to the end.

     It looks like the issue originates from setting the end iterator to
(BasicBlock *) 0. I understand that the "region" detected by RegionInfo
should never contain a single basic block or even a sequence of basic
blocks. So maybe the above degenerated case is considered "will never
happen"?

     ... ...

     BasicBlock *BB = Func.getARandomBasicBlock;

     Region *R = new Region(BB, BB, RI, DT);
     for (Region::block_iterator i = R->block_begin(),
            e = R->block_end(); i != e; ++i) {
          errs() << (*i)->getName() << "\n";
       }

Hi Paul,

the exit basic block is the block after the region.

  BB0 < Entering Block
   >
   v
  BB1 < Entry and Exiting block
   >
   v
  BB2 < Exit block

So your code should probably be:

new Region(BB, *pred_begin(BB), RI, DT)

Let me know if it works for you. In case it does, we could add an assert to the constructor to catch these cases.

Cheers,
Tobias

Hi Tobias,

Thanks so much for the quick response. Your approach fixes the issue.
On a bigger context, would it make more sense to make the region exit part of the region? For example, a while loop gets lowered down to LLVM IR contains while.cond, while.body and while.end. If one tries to use RegionInfo as a substitute for structural analysis, she might think while.end should belong to the region/structure … is it necessary to exclude the exit from being part of the region or both ways are equally correct in terms of uniquely representing regions?

Best,
Paul

Hi Tobias,

      Thanks so much for the quick response. Your approach fixes the issue.
      On a bigger context, would it make more sense to make the region exit
part of the region? For example, a while loop gets lowered down to LLVM IR
contains while.cond, while.body and while.end. If one tries to use
RegionInfo as a substitute for structural analysis, she might think
while.end should belong to the region/structure ... is it necessary to
exclude the exit from being part of the region or both ways are equally
correct in terms of uniquely representing regions?

The region provides both the exiting blocks as well as the exit block. So you should have all you need, no?

An example, where it is beneficial to use the block after the region as to define a region is the following:

     IF-1
    / \
   > IF-2
   > / \
   > > >
    \ | /
     \ | /
      \|/
      END

Here, we would like to model two regions IF-1 => END and IF-2 => END. The first region contains IF-1 and IF-2, the second one just IF-2. Obviously, neither of these regions has a single exit edge, but for both it is possible to create a single exit edge by introducing two
new basic blocks on the outgoing edges of each region.

Cheers,
Tobias

Hi Tobias,

      Thanks so much for the quick response. Your approach fixes the
issue.
      On a bigger context, would it make more sense to make the region
exit
part of the region? For example, a while loop gets lowered down to LLVM IR
contains while.cond, while.body and while.end. If one tries to use
RegionInfo as a substitute for structural analysis, she might think
while.end should belong to the region/structure ... is it necessary to
exclude the exit from being part of the region or both ways are equally
correct in terms of uniquely representing regions?

The region provides both the exiting blocks as well as the exit block. So
you should have all you need, no?

Yes, the region interface suffices all my needs. It just does not contain
an "iterator" that can iterate all the basic blocks
in the region plus the exit at once. What I usually do, under this kind of
situation, is to define a std::vector of basicblocks,
push all the basic blocks in the region using block_iterator in a for loop.
And then, at the end, push_back(region->getExit()).

An example, where it is beneficial to use the block after the region as to
define a region is the following:

    IF-1
   / \
  > IF-2
  > / \
  > > >
   \ | /
    \ | /
     \|/
     END

Here, we would like to model two regions IF-1 => END and IF-2 => END. The
first region contains IF-1 and IF-2, the second one just IF-2. Obviously,
neither of these regions has a single exit edge, but for both it is
possible to create a single exit edge by introducing two
new basic blocks on the outgoing edges of each region.

Cheers,
Tobias

Got it! Thanks for the example.

Best,
Paul

Why would you need such an iterator? The exit is not part of the region, so I am slightly surprised you expect it to be enumerated.

Tobias

Had regioninfo been designed so that the region exit to be integral part of
a region, then there would be an iterator that enumerates up until exit ...
is what I was trying to say. You're totally right, under the current
design, there wouldn't be no need for such a iterator.