passmanager, significant rework idea...

The patch below basically hammers out some ideas as to where I'd like
to take the passmanager in LLVM. I've tried thinking things through,
but I'm still a n00b, so some criticism would be more than welcome. =)

Starting from line 191 down. If you're wondering why I created a
patch, well that's because I found thinking in passmanagert.h the most
productive.

passmanager_idea.diff (4.36 KB)

Just a quick little tid bit, that's rather important. The
PassCollection class should inherit from PassUnit. THis allows one to
embedd PassCollections within one another to get the appropriate
runtime hierarchy.

Interesting approach. :slight_smile:

For those who aren't aware, Saem has been actively refactoring the pass manager to make it easier to understand and improve.

Comments below, with ***'s before the notes:

Index: lib/VMCore/PassManagerT.h

Interesting approach. :slight_smile:

Thanks.

Comments below, with ***'s before the notes:
+class LoopPass : public Pass {}; // Temporary.

*** I wouldn't worry about loop passes yet.

Sure.

+class PassUnit {
+ Pass *pass;
+
+ enum Traversal {
+ LINEAR, // Standard top down traversal.
+ CALLGRAPHSCC // Bottom Up Traversal
+ };
+
+ Traversal traversal;

*** 'Traversals' as you have them here don't really make sense. The
notion of whether to run a Function pass in module order or in bottom-up-
on-the-callgraph order is not a property of any pass. Further, this idea
is specific to function passes, so it shouldn't be captured in a place
generic to all passes. I'd much rather have this implemented as a new
kind of "batcher" pass manager:

You're the expert, so I gather you're right. I was thinking that a
generic traversal facility would be nice, since basicblocks might get
something in the future or what have you. But that can be fixed by
putting this in the FUnitType and having that handle it behind the
scenes. There is something about batchers that doesn't quite sit
right with me, I think it stems from the fact that you can only batch
one type. It doesn't seem like a huge problem, but it either paints
you into a corner or forces you to riddle the classes with extra
methods to handle items that don't neatly fit into the hierarchy. eg.
CGSCC, Immutable, loop and reverse dependencies (module -> function).

In the current pass manager, when you add a sequence of function passes to
a "Module PassManager", a batcher is created. This batcher happens to
traverse the module from beginning to end, running each function pass on
the functions in this order. However, there is no specific reason to do
this, it could run them from end to beginning, random order, or even
callgraphscc order.

Yup.

To handle this notion, I'd suggest two things: 1) having a batcher for
CallGraphSCCPass'es and 2) having a new batcher class, some sort of
"callgraphsccbatcher".

So callgraphsccpasses would be (for example) inline and mem2reg and
then those would be tossed into the batcher.

1) is important, because right now CallGraphSCCPass's are really just
    ModulePass'es to the pass manager. If we have two CallGraphSCCPass'es
    (e.g. -prune-eh and -inline) being run in sequence, added to a "Module
    PassManager", we currently run the first one on each function
    bottom-up, then run the second on each function bottom up.

    Instead of this, it would be better to have a CallGraphSCCPassBatcher
    thing, that is added to the ModulePass. Given this, for each function,
    bottom up, we can pipeline between then two passes. This gives us the
    nice ABABABAB ordering instead of AAAABBBB ordering which is nice for
    cache behavior of the compiler.

Yup, interleaving would be far more efficient.

2) Once 1) is implemented, if a Module PassManager currently has a
    "CallGraphSCCPassBatcher" active, it makes sense to use a new batcher
    for the function passes. Since we don't need to run them in any
    specific order, we might as well run the function passes in the order
    that the callgraphsccpasses are being run in. If there is no
    CallGraphSCCPassBatcher active, the passmanager would check to see if
    there is a FunctionPassBatcher active, and if not it would create one.

As stated in the inliner improvements notes on your site, there seems
to be a significant effect based on the order of application and
traversal method used, wouldn't this hold true for other passes? I
could be wrong, but this seems weird to me. This is all speaking in
general terms.

Finally, note that all of this behavior is specific to "Module
PassManagers", so ideally none of the logic would be used/touched by the
FunctionPassManager stuff etc. Given this, it shouldn't be in something
generic like the base PassUnit class, which is why I'm talking about it
here. :slight_smile:

Makes sense.

+ std::vector<Pass*> RequiredPasses;

Why have the list of required passes here? You can trivially get this
from P->getAnalysisUsage(), likewise with the pass name.

My idea was to add dependencies as the passes were being organised,
such as a functionpass depending upon callgraphscc as part of the
traversal bit. Now that I think about it, the traversal variable is
all that's needed. So chalk it upto brainfart!

+public:
+ PassUnit(Traversal traversal = LINEAR, Pass *Pass) :
+ traversal(traversal),
+
+ virtual const char *getPMName() const =0;
+
+ virtual const char *getPassName() const =0;
+
+ virtual bool runPass(PassClass *P, UnitType *M) =0;
+};
+
+class BBPassUnit : public PassUnit {
+ BasicBlockPass *BBPass;
+
+public:
+ BBPassUnit(Traversal traversal = LINEAR, BasicBlockPass *Pass) :
+ PassUnit::traversal(traversal),
+ PassUnit::Pass(static_cast<Pass*>(Pass))
+ BBPassUnit::BBPass(Pass) {}
+};
+
+class LPassUnit : public PassUnit {
+ LoopPass *LPass;
+
+public:
+ LPassUnit(Traversal traversal = LINEAR, LoopPass *Pass) :
+ PassUnit::traversal(traversal),
+ PassUnit::Pass(static_cast<Pass*>(Pass))
+ LPassUnit::LPass(Pass) {}
+};
+
+class FPassUnit : public PassUnit {
+ FunctionPass *FPass;
+
+public:
+ FPassUnit(Traversal traversal = LINEAR, FunctionPass *Pass) :
+ PassUnit::traversal(traversal),
+ PassUnit::Pass(static_cast<Pass*>(Pass))
+ FPassUnit::FPass(Pass) {}
+};
+
+// For CallGraphSCC passes, really they're a FunctionPass for instance told
to
+// change traversal methods, this adds to their requiredPasses CallGraphSCC
+// and when told to run, simply checks the traversal and uses CGSCC
bottom-up.
+
+class MPassUnit : public PassUnit {
+ ModulePass *MPass;
+
+public:
+ MPassUnit(Traversal traversal = LINEAR, ModulePass *Pass) :
+ PassUnit::traversal(traversal),
+ PassUnit::Pass(static_cast<Pass*>(Pass))
+ MPassUnit::MPass(Pass) {}
+};

*** I don't think I really understand what the idea is behind the PassUnit
instances here. It appears to capture the same information that passes
already capture themselves. Can you explain a bit more?

The idea is to provide a wrapper to hide any pass specific logic
that's required for pass management. They don't really capture a lot
of information right now but that's because they're more geared
towards being method heavy.

+// PassCollection and its implmentations will actually posses a significant
+// amount of the logic in terms of handling passes. The passmanager will
in
+// fact be a simple interface and entry point.

"Just a quick little tid bit, that's rather important. The
PassCollection class should inherit from PassUnit. THis allows one to
embedd PassCollections within one another to get the appropriate
runtime hierarchy."

+class PassCollection {
+public:
+ enum Ordering {
+ ORDERED,
+ UNORDERED
+ };
+
+private:
+ Ordering order;
+
+public:
+ PassCollection(Ordering order) : order(order) {}
+ virtual void add(PassUnit pass) = 0;
+ virtual bool remove(PassUnit pass) = 0; // false if pass is not present.
+ virtual void runPasses() = 0;
+
+ // TODO: Figure out a reasonable data access strategy, might just be
+ // iterators and potentionally returning vectors. The actual storage
+ // is thankfully flexible, so for instance in the case of LoopPasses, we
+ // should be able to use a PriorityQueue and not have that explode in our
+ // faces.
+};

To me, the most logical class hierarchy looks like this:

   Pass
     ModulePass
     CallGraphSCCPass
     FunctionPass
     BasicBlockPass
     ...

All of the high-level behaviors of passes are captured by which Pass class
they inherit from. The "Pass" class itself has basic pass support and
interfaces, e.g. getting the pass name, the getAnalysisUsage method, the
getAnalysis<> method, etc. Pass does *not* contain any run* methods, only
the derived classes (like ModulePass) do. User passes extend things like
FunctionPass as they currently do. Expanding this hierarchy, I'd propose
to have these built-in (but internal/private to VMCore):

   Pass
     ModulePass
       CallGraphSCCPassBatcher
       FunctionPassBatcher
     CallGraphSCCPass
       FunctionPassBatcherForCGSCCPass
     FunctionPass
       BasicBlockPassBatcher
     ModulePassBatcher
     ...

The batcher classes I wrote about above. Basically they are the adapters
that allow "small" passes to be embedded in "big" pass managers. For
example, CallGraphSCCPassBatcher would allow CallGraphSCCPass's to be
added to a ModulePassManagerT. As such, they have to conform to the
interface that the outer PassManagerT object expects. ModulePassBatcher
is a funny one: it contains ModulePasses, and exposes a single 'run'
interface.

So, batchers are pseudo passes, basically, if a batcher inherits from
x, then it runs at the same strata as the parent, thus the
inheritance. Mind you, that's basically the current state of affairs.

Finally, the passmanagers themselves are implemented with:

   ModulePassManager
   FunctionPassManager

There is no inheritance here, and these classes are implemented using the
pImpl idiom as they currently are. However, instead of using
PassManagerT<foo> to implement them, they are actually wrappers around
ModulePassBatcher and FunctionPassBatcher, respectively. Both of these
'batcher' passes provide a simple interface to be used by the *PassManager
classes.

I would think having a single PassManager which has an overloaded set
of methods to handle the various passes that could be added to it
might be better.

The only question left (to me at least :slight_smile: ), is where to factor out the
commonality between the Batcher classes. To me, the best solution appears
to be to have a completely independent class "PassBatcher" which the
*PassBatcher classes (which derive from Pass) contain an instance of to do
the heavy lifting.

Does any of this make sense?

I think the key points here is that we group passes, order those
groups of passes and then finally run those ordered groups of passes.
The batcher hierarchy you presented makes sense. I'm simply trying to
shake the gut feeling that I have, relating to how extensible this
will be if a new passtype is desired, for instance.

You're the expert, so I gather you're right. I was thinking that a
generic traversal facility would be nice, since basicblocks might get
something in the future or what have you. But that can be fixed by

I wouldn't worry too much about BasicBlockPass's. In practice, they operate at too fine of a granularity to be worth pipelining. In practice, I wouldn't mind very much if we just got rid of passmanager awareness of BasicBlockPasses, and just made them normal FunctionPasses that have some traversal logic in the BasicBlockPass class to simplify the implementations.

putting this in the FUnitType and having that handle it behind the
scenes. There is something about batchers that doesn't quite sit
right with me, I think it stems from the fact that you can only batch
one type.

Ok.

It doesn't seem like a huge problem, but it either paints
you into a corner or forces you to riddle the classes with extra
methods to handle items that don't neatly fit into the hierarchy. eg.
CGSCC, Immutable, loop and reverse dependencies (module -> function).

I don't see how this is related. Given the organization I emailed about yesterday, consider something like the ModulePassBatcher. Its implementation is actually *simplified* by the knowledge that all of the things that it "contains" are ModulePasses that just have a single run method.

Consider again, CallGraphSCCPassBatcher. It is simplified by knowing that all things it contains conform to the CallGraphSCCPass interface. Due to (careful forethought leading to) the fact that all FunctionPasses are defined to be CallGraphSCCPasses, we can have a simple adapter class (with the hypothetical name in the previous email of "FunctionPassBatcherForCGSCCPass") that serves this purpose and keeps everything simple.

The hierarchy and the interfaces have nothing to do with the pass dependency relationships, they relate to the structure of the pass managers and the order the passes get run. The dependencies can *change* the structure, but the structure is not directly based on the dependencies.

To handle this notion, I'd suggest two things: 1) having a batcher for
CallGraphSCCPass'es and 2) having a new batcher class, some sort of
"callgraphsccbatcher".

So callgraphsccpasses would be (for example) inline and mem2reg and
then those would be tossed into the batcher.

s/mem2reg/pruneeh. mem2reg is a FunctionPass.

1) is important, because right now CallGraphSCCPass's are really just
    ModulePass'es to the pass manager. If we have two CallGraphSCCPass'es
    (e.g. -prune-eh and -inline) being run in sequence, added to a "Module
    PassManager", we currently run the first one on each function
    bottom-up, then run the second on each function bottom up.

    Instead of this, it would be better to have a CallGraphSCCPassBatcher
    thing, that is added to the ModulePass. Given this, for each function,
    bottom up, we can pipeline between then two passes. This gives us the
    nice ABABABAB ordering instead of AAAABBBB ordering which is nice for
    cache behavior of the compiler.

Yup, interleaving would be far more efficient.

And intrinsically more powerful/effective (solving some phase ordering issues), which is the basic idea behind the inliner improvement notes.

2) Once 1) is implemented, if a Module PassManager currently has a
    "CallGraphSCCPassBatcher" active, it makes sense to use a new batcher
    for the function passes. Since we don't need to run them in any
    specific order, we might as well run the function passes in the order
    that the callgraphsccpasses are being run in. If there is no
    CallGraphSCCPassBatcher active, the passmanager would check to see if
    there is a FunctionPassBatcher active, and if not it would create one.

As stated in the inliner improvements notes on your site, there seems
to be a significant effect based on the order of application and
traversal method used, wouldn't this hold true for other passes? I
could be wrong, but this seems weird to me. This is all speaking in
general terms.

I don't understand your point here.

+ std::vector<Pass*> RequiredPasses;

Why have the list of required passes here? You can trivially get this
from P->getAnalysisUsage(), likewise with the pass name.

My idea was to add dependencies as the passes were being organised,
such as a functionpass depending upon callgraphscc as part of the
traversal bit. Now that I think about it, the traversal variable is
all that's needed. So chalk it upto brainfart!

No problem.

*** I don't think I really understand what the idea is behind the PassUnit
instances here. It appears to capture the same information that passes
already capture themselves. Can you explain a bit more?

The idea is to provide a wrapper to hide any pass specific logic
that's required for pass management. They don't really capture a lot
of information right now but that's because they're more geared
towards being method heavy.

That's fine, but it appears that they don't capture anything that cannot be extracted directly from the pass already. Why not just use the pass objects?

The batcher classes I wrote about above. Basically they are the adapters
that allow "small" passes to be embedded in "big" pass managers. For
example, CallGraphSCCPassBatcher would allow CallGraphSCCPass's to be
...

So, batchers are pseudo passes, basically, if a batcher inherits from
x, then it runs at the same strata as the parent, thus the
inheritance.

Yes, the batchers actually *are* passes, they just happen to be private implementation details of the pass manager.

Mind you, that's basically the current state of affairs.

True, and the concept works well. What doesn't work well is the nasty implementation using layers of templates and other goop :frowning:

Finally, the passmanagers themselves are implemented with:

   ModulePassManager
   FunctionPassManager

There is no inheritance here, and these classes are implemented using the
pImpl idiom as they currently are. However, instead of using
PassManagerT<foo> to implement them, they are actually wrappers around
ModulePassBatcher and FunctionPassBatcher, respectively. Both of these
'batcher' passes provide a simple interface to be used by the *PassManager
classes.

I would think having a single PassManager which has an overloaded set
of methods to handle the various passes that could be added to it
might be better.

I disagree for the high level "PassManager" implementation. PassManager and FunctionPassManager are used very differently. The first one want a whole module, the second wants a function at a time. Aside from the "addPass" method, the method for using them (run*) has a completely different interface. I don't see why they should be the same class.

The *implementation* OTOH should be shared, as I sketched out before.

The only question left (to me at least :slight_smile: ), is where to factor out the
commonality between the Batcher classes. To me, the best solution appears
to be to have a completely independent class "PassBatcher" which the
*PassBatcher classes (which derive from Pass) contain an instance of to do
the heavy lifting.

Does any of this make sense?

I think the key points here is that we group passes, order those
groups of passes and then finally run those ordered groups of passes.
The batcher hierarchy you presented makes sense. I'm simply trying to
shake the gut feeling that I have, relating to how extensible this
will be if a new passtype is desired, for instance.

Ok.

-Chris