The implementation algorithm behind LLVM's RegionInfo class

Hi fellows,

I am writing to ask what is the algorithm implemented in LLVM’s RegionInfo.h/cpp. In the header file “RegionInfo.h”, it says “Calculates a program structure tree built out of single entry single exit regions (defined in a 1994 paper called “The Program Structure Tree”). … … The algorithm to calculate these data structures however is COMPLETELY DIFFERENT, as it takes advantage of existing information already available … …”. Does anyone know any papers talking about the exact implementation? Also, how is the RegionInfo class related to a general interval analysis? Thanks a lot.

Best Regards,
Paul

Hi Paul,

I should probably reply. I implemented this algorithm four years ago.

Yes, this algorithm is on purpose different. I implemented it after investigating previous approaches. There are two main reasons for the difference:

1) The Program Structure Tree does not access so-called refined regions

Refined regions are program regions that do not have a single entry and a single exit edge, but can be transformed into such a region very easily. My experience was, that most programs have a large number of refined regions which we want to detect.

2) Use of the dominator tree

When I first came to the mailing list and asked for advice people (including Chris) strongly suggested to rely on existing analysis as much as possible. The RegionInfo pass does this by using the DominatorTree as well as a couple of other analysis. My measurements
at that time have shown that building the RegionTree itself is in average twice as fast as building the dominator tree. The other necessary analysis have taken time comparable to a dominator tree construction. The use of DominanceFrontier in the RegionInfo tree has
been a point criticized because this analysis is normally not used in LLVM (and needs to be built just for RegionInfo). Also the dominance frontier is in exceptional cases expensive to construct. On the other side, the analysis has shown useful not only for Polly, but a couple of other projects (R600 backend?, some OpenCL vectorizers?). Still, if we find a way to remove the reliance on the DomFrontier while still being able to detect refined regions, this would be very valuable.

I never bothered to publish the Region Info paper, but I have an unfinished draft that I am happy to send you. Also, if the documentation
in the source code is too sparse, please let me know and I add the missing information.

Out of interest, why are you interested in the RegionInfo implementation?

Cheers,
Tobias

Hi Tobias,

Thanks a lot for the detailed reply. I am working on several new optimizations for OpenCL kernels for a DSP architecture. The project itself has an NDA associated with it, so I cannot go into more details, but the source will be open to public after completion. One of the first steps is to serialize the work-items in a work-group (e.g., insert nested loops around REGIONs in between barriers). At this point, I thought I should implement a full-featured structural analysis, but some one in my team suggested to first explore what was already in the LLVM. That’s what lead me to the RegionInfo class. If it’s not much trouble, could you send the draft to my forum email? Also, are you aware of any plan on having the interval/structural analysis supported in LLVM. Thanks again.

P.S. One thing I need to find out from reading the implementation is why a sequence of basic blocks do not get to form a region as a whole.

Best Regards,
Paul

Hi Tobias,

          Thanks a lot for the detailed reply. I am working on several new
optimizations for OpenCL kernels for a DSP architecture. The project itself
has an NDA associated with it, so I cannot go into more details, but the
source will be open to public after completion. One of the first steps is
to serialize the work-items in a work-group (e.g., insert nested loops
around REGIONs in between barriers). At this point, I thought I should
implement a full-featured structural analysis, but some one in my team
suggested to first explore what was already in the LLVM. That's what lead
me to the RegionInfo class. If it's not much trouble, could you send the
draft to my forum email? Also, are you aware of any plan on having the
interval/structural analysis supported in LLVM. Thanks again.

Maybe this is interesting to you:

lib/Transforms/Scalar/StructurizeCFG.cpp

P.S. One thing I need to find out from reading the implementation is why a
sequence of basic blocks do not get to form a region as a whole.

A sequence of basic blocks can form a region. The reason the RegionInfo does not form such reason is that there is no canonical way to form such regions. Hence, RegionInfo is only forming minimal regions, which can then be easily joined to larger reasons by connecting sequences of them (e.g. using the expandRegion()).

Btw, depending how complex your transformations will be, you might also be interested in polly.llvm.org or other automatic transformation in the area of automatic GPU/accelerator code generation. Those won't be finished solutions, but the frameworks/algorithms that exist there may allow to develop optimizations for your hardware a lot faster.

Tobi

Hi Tobi,

I have one additional question about the RegionInfo::isRegion function. In the second case (i.e. Entry dominates Exit), why is checking the following two conditions are equivalent to checking it’s a refined region:

For any BB in DF(exit), 1) BB should be in DF(entry)
2) BB reachable only from entry through a path passing exit.

I.e., why is checking these two conditions is equivalent to ensure single-entry, single-exit in the sense of Refined Region.

Best Regards,
Paul

I tried to formalize this in section 3.1 of the paper I sent you (but
did not look into this since almost four years), (no bug reports yet,
despite regular use at least in Polly).

Here the old text:

If Entry dominates Exit, than Exit is not element of
the dominance frontier of Entry. Every BB ∈ DF ( Exit ) will be element
of the domi- nance frontier of Entry. So there are two conditions to
check. First only basic blocks that are part of the dominance frontier
of Exit are allowed to be element of the dominance frontier of Entry.
And furthermore it has to be shown that these basic blocks can only be
reached from Entry through a path passing Exit. To show this the
derivedDomFrontier() function is used. derivedDomFrontier ( BB, Entry,
Exit ) checks if there exists a path from Entry to BB that does not pass
Exit. This is done by checkng for every predecessor of BB, that if it
is dominated by Entry it is also dominated by Exit. It has still to be
shown that there are no edges entering the region. As all basic blocks
are dominated by Entry the only case where edges enter the region is if
Exit is dominated by Entry and has back edges pointing into the region.
These back edges will point to basic blocks dominated by Entry but not
by Exit. So the dominance frontier of Exit is not allowed to contain any
basic blocks that are dominated by Entry.

The code in lib/Analysis/RegionInfo.cpp matches this description almost
one-to-one with the only change that derivedDomFrontier was renamed to
isCommonDomFrontier().

Did you read this text? Does it make sense to you? It is not a nicely
written mathematical proof but may give an intuition.

Out of interest, why are you interested in the formalization of the
RegionInfo pass? Do you happen to have a test case where it is
incorrect?

Thanks,
Tobias

Thanks very much for the quick response. I have read the text many times, but it was not very clear to me why checking the two conditions involving dominance frontiers is equivalent to proving the pair {entry, exit} defines a refined region. I was asking for an mathematical proof really. It sounds to me like there should be a theorem or two in the graph theory endorsing it. Or do you mean, the implementation is solely based on empirical observation instead of theory. The reason I am interested in the formalization is that I find the current RegionInfo implementation very helpful in defining regions in between barriers in the OpenCL implementation on CPU. I haven’t found a test case that breaks it. I got the right intuition, just could not figure out a way to formally prove it, neither did I find one in your regioninfo draft.

Thanks,
Paul

The code was written as a student project during my masters and I never got around to write a paper or a formal mathematical proof. I believe the description should give an intuition for the prove, but I am currently a little too loaded to do it myself. Obviously, I would be interested in any kind of further formalisation. Is there a specific part of the description that is unclear to you and that would allow you to possibly develop a proof?

Also, please keep me updated with your OpenCL barrier work. This sounds very interesting.

Cheers,
Tobias

The draft is pretty clearly written (with some typos that correct themselves).

I was just thinking if the implementation is only based on empirical studies of example CFGs, then under what condition it fails. My OpenCL barrier work right now is mainly using DT and PDT trees (instead of dominance frontier). But the algorithm looks similar to what was implemented in regioninfo’s findRegionsWithEntry function. I am trying to understand the connection between the two a bit better so that I can maximize the re-use of what is already existing in the LLVM. I will keep you updated with the progress. Thanks.

Best,
Paul

I was looking at this recently since I want to create a MachineRegionPass, which doesn’t currently exist. Can the existing RegionInfo be refactored to support MachineBasicBlocks? I went to try doing it it, but then the DominanceFrontier it depends on has a warning that it’s deprecated. Should I be looking somewhere else or do I need to create something new (which I really don’t want to do)?

-Matt

Hi Matt,

to give my point on this.

The use of DominanceFrontier has previously caused concerns raising the following two issues: 1) Worst case N^2 complexity of the DominanceFrontier computation 2) Not available by default, which means using just the dominance information is preferable as it is most of the time already available and costs O(0). The last is an argument against the use of any analysis pass, if dominance info is sufficient.

I personally have not seen any compile-time issues due to 1) in the last four years of using it in Polly, but this does obviously not change the theoretical complexity. It would be great to remove the need for the dominance frontier in the RegionInfo pass, but at the time of writing it I did not see a way to do so, without reducing its precision (not handling so-called refined regions any more).

Assuming the concept of control flow regions is what you want/need there
seem to be the following steps you could take forward:

1) I think porting the interface/analysis to MachineBasicBlocks might
    makes sense in general to evaluate/experiment it in your work and
    also to gather data on its speed and possible compile-time overhead.
    Also, I would run experiments to verify how much benefit the use
    of refined regions gives you by running your pass once only on
    simple regions and once on simple and refined regions

2) If simple regions have shown sufficient for you, you might check if
    a dominance-tree-only check is possible that is sufficient
    to find simple regions.

3) Obviously, work on the algorithmic side is always possible and in
    the best case this should not affect the interface.

Cheers,
Tobias