Refactoring the dominators patch for Clang

Hi, Ted

The preliminary refactoring of the dominators patch for clang based on the more efficient LLVM core implementation is done. Attached is the patch. I am not very satisfied with this version because it relies a ugly hack to deal with the subtle differences between LLVM Function and Clang CFG. Since this version requires some modifications to include/llvm/Analysis/Dominators.h, so there is also a patch for llvm.

While I believe there should be a cleaner way to do this, I do not know how to achieve that. Please let me know your comments. I shall continue to improve until it become satisfactory.
Regards.

dominators-clang.patch (15.5 KB)

dominators-llvm.patch (1.24 KB)

Hi Guoping,

It seems that the only reason why you need to copy the recalculate function is that the names for the entry block in the clang CFG and LLVM function are different. Is that correct? (The bool parameter does not seem to be used..)

A simple solution to that would be to make sure that we have the same name in both. I suggest using getEntryBlock().

On llvm's side, it would involve changing the recalculate function to use getEntryBlock instead of front(). Looks like they are the same thing.

On Clang's side, we could just rename CFG::getEntry() -> CFG::getEntryBlock().

Thanks!
Anna.

Both of these seem reasonable to me.

Hi, Anna

First, thank your for your time for commenting this code!
There is one more subtle difference. This is the major one that ended up the code this way. It is the difference between CFGBlock::iterator and Function::iterator, as shown in the code excerpt below.

Function::iterator. Each iterator I is used directly as BasicBlock * here.
// Initialize the roots list
for (typename FT::iterator I = F.begin(), E = F.end(); I != E; ++I) {
if (std::distance(GraphTraits<FT*>::child_begin(I),
GraphTraits<FT*>::child_end(I)) == 0)
addRoot(I);

CFG::iterator. Each iterator has the CFGBlock ** type. In this case we need a pointer indirection here.

// Initialize the roots list
for (typename FT::iterator I = F.begin(), E = F.end(); I != E; ++I) {
if (std::distance(GraphTraits<FT*>::child_begin(I),
GraphTraits<FT
>::child_end(*I)) == 0)
addRoot(*I);

One approach is to change CFG or Function implementation to make them have the same interface. But I am afraid this may cause two many changes to the code base. I would love to hear your comments. Is there a decent solution that does not require too many changes to the code?

Hi Guoping,

There are two possible solutions here.

First solution is to rewrite the recompute() routine to use the following GraphTraits instead of passing in the Function/clang::CFG object:
// From include/llvm/Support/CFG.h
template <> struct GraphTraits<Function*> : public GraphTraits<BasicBlock*>
GraphTraits already has most of the APIs used by the dominator computation. However, it does not have the size() method, which is used in the Calculate() function called from recalculate(). That would mean that we would possibly need to extend the traits… Then, we could change the clang::CFG GraphTraits to have the nodes_iterator type consistent with that of Function’s GraphTraits. This change is much more intrusive on LLVM side.

Another solution is to change the iterator type in clang::CFG to match the Function iterator.

As a side note, it’s a bit strange that the nodes_iterator types are inconsistent between clang::CFG and Function GraphTraits. They are defined as iterators of clang::CFG and Function respectively. Everything still works because the GraphWriter, which relies on the traits, has an overloaded writeNode() function which allows it to work with NodeType and NodeType*. But either solution would fix the inconsistency.

Ted, do you have any preferences?

Cheers,
Anna.

Hi, Anna, Ted

Here is the approach I finally take, which I believe shall be least intrusive to both clang and LLVM:
The EntryBlock issue is easy to resolve, as you pointed out, I just changed the Clang’s counterpart method into this name. Up to now, three classes involve dominators: the Function class, the Clang::CFG class, and the MachineFunction class. The MachineFunction class also does not have the getEntryBlock() method. It’s trivial to just add this method into the class.

To deal with the iterator issue, either changing Clang or LLVM to use the other’s iterator is too intrusive. The approach I am using is simply an interface method. On the Clang side, add this method to the CFG class:
CFGBlock * getBlock(iterator I) { return *I; }
While on the LLVM side, adding this method to both Function and MachineFunction classes:
BasicBlock * getBlock(iterator I) { return I; }
This fixes the issue.

Attached is the final patch to both LLVM and Clang. Hopefully this could conclude the addition of the dominator tree support for Clang. Please let me know your comments.

Thanks!

dominators-clang.patch (21.2 KB)

dominators-llvm.patch (2.97 KB)

Hi Guoping,

Even though adding another method (getBlock()) to resolve the inconsistency is the least intrusive solution, it’s not the cleanest one in my opinion. Ex: There is no interface that defines the API a graph class should implement to support the dominator computation. It’s not a problem with your extension, but with the current dominator’s implementation. The cleanest approach would be to change the dominators to only rely on GraphTraits. Though that would involve more work and I am not sure if you have the time to do it.

Cheers,
Anna.

Hi, Anna

I may not fully understand your point. Here are my thoughts:
First, In think changing the CFG::iterator to be in line with Function::iterator is too intrusive to clang, and not very necessary. I actually tried to do this. It involved so many changes that I finally gave up.

Second, on your comment on the approach I took. Even the current implementation of the recalculate function does not solely depend on GraphTraits. It also depends on Function::getEntryBlock or Function::front(), Function.size(), etc. Why do you think a clean implementation should only depend on GraphTraits? In addition, I actually have such a concern: is it reasonable enough to fill in GraphTraits more stuff only because dominators implementation needs this?

I believe it should be possible to change the recalculate function to only depend on GraphTraits. But I am not sure if this will be viewed as too intrusive to LLVM. In thinking of these I finally made that compromise.

Regarding the time, I am willing to do further refactoring until this code will finally become good enough, although I can only work on weekend on this. I just want to understand clearly the reasonable motivation behind each refactoring step.

Hi, Anna

I may not fully understand your point. Here are my thoughts:
First, In think changing the CFG::iterator to be in line with Function::iterator is too intrusive to clang, and not very necessary. I actually tried to do this. It involved so many changes that I finally gave up.

Just to be sure we are on the same page, I was suggesting to change the type, not the internal implementation of the CFG. Ex: you could create an iterator adapter. In any case, there will be less dependencies if we go with the GraphTraits solution.

Second, on your comment on the approach I took. Even the current implementation of the recalculate function does not solely depend on GraphTraits. It also depends on Function::getEntryBlock or Function::front(), Function.size(), etc. Why do you think a clean implementation should only depend on GraphTraits?

That is the general idea behind GraphTraits. Each class which represents a graph implements the traits interface. The graph algorithms (dominators is one of them) rely only on that interface. I am not sure why the current implementation of dominators relies directly on the Function class methods instead of using the traits. I suspect it was just the matter of someone not having the time to implement that…

In addition, I actually have such a concern: is it reasonable enough to fill in GraphTraits more stuff only because dominators implementation needs this?

We should ask on the LLVM developers mailing list. getEntryBlock() is already in the GraphTraits, so we would only need to add size() (or something similar to that, which would satisfy the dominators implementation).

I believe it should be possible to change the recalculate function to only depend on GraphTraits. But I am not sure if this will be viewed as too intrusive to LLVM. In thinking of these I finally made that compromise.

This would require a non-trivial patch to LLVM. I’d suggest running the idea by the llvm-dev list/sending the LLVM patch there for review.

Regarding the time, I am willing to do further refactoring until this code will finally become good enough, although I can only work on weekend on this. I just want to understand clearly the reasonable motivation behind each refactoring step.

Great! Thanks for all the work you are doing.

Anna.

I’m just catching up on this thread, but I agree 100% with Anna the we should have Dominators rely more on traits rather than specific methods in the graph object is the best way to do go. Essentially there is an abstraction violation here that doesn’t make Dominators generic enough. What if we added a third kind of graph that we wanted to run Dominators on? Would it need to have the same exact methods as well? Having Dominators depend fully on trait classes gives users all the flexibility they need to adopt that algorithm to any graph data structure.

While well intentioned, I’m not a fan of this approach. The problem is really that the dominators implementation is not generic enough. Further, should any graph structure we choose to run dominators on be required to provide a “getBlock()” method? What if we want to run dominators on a graph where there is no such notion of a “block”?

Dominators is a generic graph algorithm. Compilers are just one application of the dominator algorithm, but not the only one. Embedding the notion of basic blocks into the dominators algorithm violates the generality of the original algorithm itself.

Looking back at this comment, I think the Dominators algorithm should not rely on the existence of a ‘getEntryBlock()’ and really just being using GraphTraits that can get the entrance of the graph. We shouldn’t have to rename anything in the CFG just to get the dominators analysis to work. If we are missing a concept, that is one thing, but renaming something means that the dominators analysis is relying on particular methods being present. That just feels really wrong to me. As I mentioned in another email I just sent, there is also no reason for the dominators analysis to have any notion of “basic blocks” either since it is a generic graph algorithm.

Ted, this comment is from before Guoping pointed out that this function name is not the only inconsistency between Function and CFG (at that point it seemed fine to just make a simple change to get the patch through). With the new approach, where we change the dominators to use GraphTraits, it will handle this case as well. There is getEntryBlock() method in the GraphTraits.

Indeed. I figured this would be handled by using GraphTraits for everything, but I wanted to amend my previous comment for posterity. :slight_smile:

Hi, Ted, Anna

Thanks. Now I think I understand the design rationale behind GraphTraits. I will do another refactoring later this week.

Hi, Anna

I am doing the refactoring only using GraphTraits. Here is the problem: We still need to deal with the difference between the iterator implementation of CFG and Function. The fundamental reason for this difference is that CFG used BumpVector, while the Function class used iplist. There are two possible solutions to this problem:

  1. As you pointed out, let’s change the CFG design to use iplist instead of BumpVector. But for me, this may seem unnecessarily intrusive to Clang.
  2. In the GraphTraits class, let’s provide a convert routine to do the iterator to Block conversion. For example, in GraphTraits<CFG*>, let’s add this:
    static NodeType *I2B(node_iterator I) {
    return *I;
    }
    In this way, we can still conceal all implementation details in GraphTraits. But this may make the GraphTraits a little messy. Please let me know your comments. Which approach should be better?

Hi Guoping,

I propose to use a 3d solution.

What we need is to change the clang::CFG GraphTraits to have the nodes_iterator type consistent with that of Function’s GraphTraits. As I pointed out in one of the previous emails, they are currently not consistent even in terms of NodeType (which is a type internal to traits) - one iterates over NodeType and another over NodeType*. This can be implemented by adding a a custom iterator for CFG which returns CFGBlock* (instead of CFGBlock**). Then, you define nodes_iterator as your new iterator.

Cheers,
Anna.

Anna,

Sounds like a good idea to provide a custom iterator for CFG. But how to achieve that?
CFG iterator is defined as BumpVector<CFGBlock*>::iterator, do you mean makes a custom iterator as BumpVector::iterator? It’s easy to implement begin()/end(), but how to implement ++ operator? All CFGBlocks are stored as CFGBlock* in BumpVector, do you mean also maintaining another vector of CFGBlocks, ex: BumpVector? That will also be very messy.

I was thinking of something similar to PathDiagnostic.h:523 (http://clang.llvm.org/doxygen/PathDiagnostic_8h_source.html)

Anna.

Hi, Anna, Ted

Thanks for your link. Attached is another refactoring, using only GraphTraits to implement all this functionality. Please let me know your comments.

dominators-clang.patch (18 KB)

dominators-llvm.patch (3.5 KB)