RFC: CfgTraits/CfgInterface design discussion

Hi all,

I'm moving a discussion from https://reviews.llvm.org/D83088 here,
where it's arguably more appropriate.

For context, I proposed in early July a mechanism that allows writing
program analyses that are generic over the type of IR using dynamic
dispatch (virtual methods). The first mail of the llvm-dev thread is
here: http://lists.llvm.org/pipermail/llvm-dev/2020-July/143039.html

The change was more than three months in review, including some very
long stretches of radio silence. The change was accepted on Friday
and, after waiting a few more days, I pushed it yesterday. Now some
concerns are being raised on the review.

I don't necessarily want to discuss the dysfunctions of our review
process here. I also don't want to steamroll those concerns because I
don't want us to follow a misguided notion of "progress at all cost",
but I do think progress is important.

So I propose that we discuss perceived shortcomings of the design
here. If folks here can provide a sense of changes to the design
they'd actually be happy with, then I'm happy to spin up the required
patches to improve the design.[0] With that in mind, let me "import"
the last comment on the Phabricator review:

> > Ah, I see, the "append" functions are accessors, of a sort. Returning a container might be more clear than using an out parameter - alternatively, a functor parameter (ala std::for_each) that is called for each element, that can then be used to populate an existing container if desired, or to do immediate processing without the need for an intermediate container.
>
> The code is trying to strike a balance here in terms of performance. Since dynamic polymorphism is used, a functor-based traversal can't be inlined and so the number of indirect function calls increases quite a bit. There are a number of use cases where you really do want to just append successors or predecessors to a vectors, like during a graph traversal. An example graph traversal is here: https://github.com/nhaehnle/llvm-project/blob/controlflow-wip-v7/llvm/lib/Analysis/GenericConvergenceUtils.cpp#L329

One way to simplify the dynamic polymorphism overhead of iteration would be to invert/limit the API - such as having a "node.forEachEdge([](const Edge& E) { ... });" or the like.

I assume you're not talking about runtime overhead here? If we're
talking about that, I'd expect that variant to often be slower,
although I obviously don't have numbers.

If it's the code style you're concerned about, yeah, I'm happy to
prepare a patch that extends the interface with forEachSuccessor /
forEachPredecessor methods like `void forEachSuccessor(CfgBlockRef,
std::function<void (CfgBlockRef)>)`.

> [discussion on printBlockName and similar methods]
I'm generally worried about the genericity of these abstractions - whether or not a generic abstraction over printing is required/pulling its weight. Are there common abstractions over printing you have in mind using this abstraction?

It's mostly a matter of debuggability of the algorithms that are being
written. Pragmatism is the key.

Example: The current in-tree DivergenceAnalysis leaves debug messages like:

  LLVM_DEBUG(dbgs() << "propBranchDiv " << Term.getParent()->getName() << "\n");

In the new CfgTraits-based divergence analysis I'm working on, a
similar point in the algorithm has:

  LLVM_DEBUG(dbgs() << "DIVERGENT TERMINATOR: "
                    << printer().printableBlockName(divergentBlock) << '\n');

At this point, divergentBlock is a CfgBlockRef, which is simply an
opaque wrapper around a type-erased pointer. So the only other thing
we could do here is print out the pointer value, which makes for an
awful debugging experience.

I'll point out again that existing template-based generic algorithms
have leaky interfaces: some parts of the dominator tree machinery call
the node type's `printAsOperand`.

> [broader discussion]
I don't mean to treat CfgGraph to be thought of like the template parameter to, say, std::vector - I meant thinking of CfgGraph as something like std::vector itself. Rather than using traits to access containers in the C++ standard library, the general concept of a container is used to abstract over a list, a vector, etc.

eg, if you want to print the elements of any C++ container, the code looks like:

template<typename Container>
void print(const Container &C, std::ostream &out) {
out << '{';
bool first = true;
for (const auto &E : C) {
   if (!first)
     out << ", ";
   first = false;
   out << E;
}
out << '}';
}

Which, yes, is much more legible than what one could imagine a GraphTraits-esque API over
containers might be:

template<typename Container, typename Traits = ContainerTraits<Container>>
void print(const Container &C, std::ostream &out) {
  out << '{';
bool first = true;
for (const auto &E : Traits::children(C)) {
   if (!first)
     out << ", ";
   first = false;
   out << Traits::element(E);
}
out << '}';
}

Or something like that - and the features you'd gain from that would be the ability to sort of "decorate" your container without having to create an actual container decorator - instead implementing a custom trait type that, say, iterates container elements in reverse. But generally a thin decorator using the first non-traits API would be nicer (eg: llvm::reverse(container) which gives you a container decorator that reverses order).

I agree that the first form is nicer. I'd raise two points.

First, the first form works nicely if the concept of container is
sufficiently well understood and all relevant containers really are
sufficiently similar. Neither is the case here. We're trying to cover
different pre-existing IRs, and even though all of them are SSA IRs,
the differences are substantial. Currently, we only really care about
LLVM IR and MachineIR in SSA form; their main differences is that the
concept of "value" is a C++ object in LLVM IR, while in MIR-SSA it's a
plain unsigned int (now wrapped in a Register). MLIR would add yet
another way of thinking about values. The fact that we have to work
with those pre-existing IRs means we have to compromise.

Second, none of this applies to dynamic polymorphism, which is really
the main goal here at least for me.

Not sure where that leaves us. We need something like CfgTraits to
cover the substantial differences between the IRs, but perhaps we can
work over time to reduce those differences? Over time, instead of
relying on CfgTraits for _everything_ in CfgInterfaceImpl, we could
instead use unified functionalities directly?

Some of that could probably already be done today. For example, we
could probably just use llvm::successors directly in
CfgInterfaceImpl::appendSuccessors (and your proposed
forEachSuccessor) instead of going via CfgTraits.

If you had a runtime polymorphic API over containers in C++, then it might look something
like this:

void print(const ContainerInterface& C, std::ostream& out) {
out << '{';
bool first = true;
C.for_each([&](const auto &E) {
   if (!first)
     out << ", ";
   first = false;
   E.print(out);
});
out << '}';
}

Yes, though it wouldn't quite work like that with the IRs we have in
LLVM. Most of them go to great length to avoid embedding virtual
methods in the IR classes themselves. That's why you'd more likely see
something like the following:

void print(const CfgInterface& iface, CfgBlockRef block, raw_ostream& out) {
  out << "Successors of " << iface.printableName(block) << ':';
  iface.forEachSuccessor(block, [&](CfgBlockRef succ) {
    if (first)
      out << ' ';
    else
      out << ", ";
    first = false;
    iface.printName(succ, out);
  });
  out << '\n';
}

The CfgBlockRef / CfgValueRef / etc. appear in the design to avoid
having an abstract base class for the different IRs, and also to cover
the difference between LLVM IR `Value *` and MIR-SSA `Register`.

I'd really like to see examples like this ^ using the different abstractions under consideration here (classic GraphTraits, CfgTraits dynamic and static typed, perhaps what a static API would look like if it wasn't trying to be dynamic API compatible).

I don't really care that much about what the static side of things
looks like. CfgTraits is the way it is on purpose because I felt it
would be useful to have a homogenous view of what generic aspects of
IR will ultimately be exposed by the CfgInterface. In the absence of
compiler-enforced concepts, that feels helpful to me. If you put
yourself in the shoes of an author of an algorithm that is supposed to
be generic over IRs, you'll realize that it's great to have some easy
way of discovering which things really _are_ generic over IRs. The
comparison with containers is slightly lacking because the potential
surface area is so much larger. But perhaps we can iterate on the
design and find a better way.

This is kind of a restatement of what I wrote above: If we can get
general agreement in the community that there is a desire that the
different IRs in LLVM follow common concepts "directly", then we can
iterate towards that over time. I'm personally convinced that unifying
the various IRs is a worthy goal (the introduction of MLIR really was
an unfortunate step backwards as far as that is concerned), it just
feels like it'd be a thankless project because it would go through
_many_ reviews that feel like the one that triggered this thread. Even
without a dysfunctional review process it'd be a long refactoring, and
some sort of trait class is unavoidable at least for now. At a
minimum, something is needed to abstract over the large differences
between SSA value representations.

Cheers,
Nicolai

[0] What I'm _not_ going to do is write patches based on vague guesses
of what other people might want. I don't want even more changes
hanging in Phabricator limbo for months. Be explicit about what it is
that you want, and we can make progress.

Hey Nicolai,

Hi all,

I’m moving a discussion from https://reviews.llvm.org/D83088 here,
where it’s arguably more appropriate.

For context, I proposed in early July a mechanism that allows writing
program analyses that are generic over the type of IR using dynamic
dispatch (virtual methods). The first mail of the llvm-dev thread is
here: http://lists.llvm.org/pipermail/llvm-dev/2020-July/143039.html

The change was more than three months in review, including some very
long stretches of radio silence. The change was accepted on Friday
and, after waiting a few more days, I pushed it yesterday. Now some
concerns are being raised on the review.

I don’t necessarily want to discuss the dysfunctions of our review
process here. I also don’t want to steamroll those concerns because I
don’t want us to follow a misguided notion of “progress at all cost”,
but I do think progress is important.

Sure enough - and apologies for the delays.

So I propose that we discuss perceived shortcomings of the design
here. If folks here can provide a sense of changes to the design
they’d actually be happy with, then I’m happy to spin up the required
patches to improve the design.

I’m still fairly concerned this isn’t a matter of fixing it in-tree & not especially happy with the inversion that’s occurred here (now it’s “this goes ahead unless I can justify why it shouldn’t”, whereas previously it was “this doesn’t go ahead unless you can justify why it should” - that’s a pretty significant change) which is why I’d like to revert the patch first, and discuss second.

This is generally the bar for LLVM reviews - if there are outstanding concerns, patches don’t move forward. This doesn’t mean arbitrary hold-up, and there’s admittedly some problems with how to resolve irreconcilable differences, usually by escalating to other, more senior community members to make a determination. Ultimately, given the fairly large surface area/intended use of this abstraction, I think it might rise to the level of using the proposed “contentious decision” process discussed here: https://github.com/llvm/llvm-www/blob/master/proposals/LP0001-LLVMDecisionMaking.md

Other folks weighing in would be great & certainly there’s a line at which my opinion isn’t the gatekeeper here if other folks are in favor.

[0] With that in mind, let me “import”
the last comment on the Phabricator review:

Ah, I see, the “append” functions are accessors, of a sort. Returning a container might be more clear than using an out parameter - alternatively, a functor parameter (ala std::for_each) that is called for each element, that can then be used to populate an existing container if desired, or to do immediate processing without the need for an intermediate container.

The code is trying to strike a balance here in terms of performance. Since dynamic polymorphism is used, a functor-based traversal can’t be inlined and so the number of indirect function calls increases quite a bit. There are a number of use cases where you really do want to just append successors or predecessors to a vectors, like during a graph traversal. An example graph traversal is here: https://github.com/nhaehnle/llvm-project/blob/controlflow-wip-v7/llvm/lib/Analysis/GenericConvergenceUtils.cpp#L329

One way to simplify the dynamic polymorphism overhead of iteration would be to invert/limit the API - such as having a “node.forEachEdge([](const Edge& E) { … });” or the like.

I assume you’re not talking about runtime overhead here? If we’re
talking about that, I’d expect that variant to often be slower,
although I obviously don’t have numbers.

I think I was referring to runtime overhead - you mentioned the increase in indirect calls. A callback-like API can reduce the number of such calls. (rather than one virtual call per increment, inequality test, and dereference - there would be only one polmorphic call per iteration)

If it’s the code style you’re concerned about, yeah, I’m happy to
prepare a patch that extends the interface with forEachSuccessor /
forEachPredecessor methods like void forEachSuccessor(CfgBlockRef, std::function<void (CfgBlockRef)>).

[discussion on printBlockName and similar methods]
I’m generally worried about the genericity of these abstractions - whether or not a generic abstraction over printing is required/pulling its weight. Are there common abstractions over printing you have in mind using this abstraction?

It’s mostly a matter of debuggability of the algorithms that are being
written. Pragmatism is the key.

Example: The current in-tree DivergenceAnalysis leaves debug messages like:

LLVM_DEBUG(dbgs() << "propBranchDiv " << Term.getParent()->getName() << “\n”);

In the new CfgTraits-based divergence analysis I’m working on, a
similar point in the algorithm has:

LLVM_DEBUG(dbgs() << "DIVERGENT TERMINATOR: "
<< printer().printableBlockName(divergentBlock) << ‘\n’);

At this point, divergentBlock is a CfgBlockRef, which is simply an
opaque wrapper around a type-erased pointer. So the only other thing
we could do here is print out the pointer value, which makes for an
awful debugging experience.

I’ll point out again that existing template-based generic algorithms
have leaky interfaces: some parts of the dominator tree machinery call
the node type’s printAsOperand.

That isn’t a necessary feature of template-based generic algorithms, and I agree that the traits-based API probably isn’t the best one for something as complicated as a graph, especially as we’ve added more features to them. Something more like a container abstraction seems more suitable (especially since it would allow carrying more payload in a Graph object itself when writing a decorator - which currently can’t be done so such a payload ends up being carried around in heavyweight node pointers/iterators which is awkward)

[broader discussion]
I don’t mean to treat CfgGraph to be thought of like the template parameter to, say, std::vector - I meant thinking of CfgGraph as something like std::vector itself. Rather than using traits to access containers in the C++ standard library, the general concept of a container is used to abstract over a list, a vector, etc.

eg, if you want to print the elements of any C++ container, the code looks like:

template
void print(const Container &C, std::ostream &out) {
out << ‘{’;
bool first = true;
for (const auto &E : C) {
if (!first)
out << ", ";
first = false;
out << E;
}
out << ‘}’;
}

Which, yes, is much more legible than what one could imagine a GraphTraits-esque API over
containers might be:

template<typename Container, typename Traits = ContainerTraits>
void print(const Container &C, std::ostream &out) {
out << ‘{’;
bool first = true;
for (const auto &E : Traits::children(C)) {
if (!first)
out << ", ";
first = false;
out << Traits::element(E);
}
out << ‘}’;
}

Or something like that - and the features you’d gain from that would be the ability to sort of “decorate” your container without having to create an actual container decorator - instead implementing a custom trait type that, say, iterates container elements in reverse. But generally a thin decorator using the first non-traits API would be nicer (eg: llvm::reverse(container) which gives you a container decorator that reverses order).

I agree that the first form is nicer. I’d raise two points.

First, the first form works nicely if the concept of container is
sufficiently well understood and all relevant containers really are
sufficiently similar.

I’m not sure how a dynamic interface is likely to be significantly different/better in this regard.

Neither is the case here. We’re trying to cover
different pre-existing IRs, and even though all of them are SSA IRs,
the differences are substantial.

I guess that’s another thing that might benefit from further clarity: Currently we have generic graph traits, notionally that’s for any graph (collection of nodes and edges). The proposed abstraction is for CFG graphs, but due to the added layers of abstraction complexity, dynamic and static APIs, etc, it’s not been especially clear to me what makes this abstraction specific to/only for CFG graphs[1] (& then how that relates to IR I’m not sure either - control flow seems like it would be fairly independent of the values within that graph).

Currently, we only really care about
LLVM IR and MachineIR in SSA form; their main differences is that the
concept of “value” is a C++ object in LLVM IR, while in MIR-SSA it’s a
plain unsigned int (now wrapped in a Register). MLIR would add yet
another way of thinking about values. The fact that we have to work
with those pre-existing IRs means we have to compromise.

Not sure I follow what compromises you’re referring to - if we have a container abstraction, some containers are over ints, some over objects, etc - the container can specify via typedefs, for instance, what type its elements are.

Second, none of this applies to dynamic polymorphism, which is really
the main goal here at least for me.

And one I’m trying to push back on - my hope is that a common set of abstractions (much like the container concepts in the STL) would be suitable here. To potentially subsume the existing GraphTraits (could have a “GraphTraits adapter” that gives a Graph-concept-implementing-container given GraphTraits) and potentially has layers to add on more expressive/narrower abstractions (whatever extra features are relevant to CFG graphs that aren’t relevant to all graphs in general([1] above - not super sure what those are)) - much like generic C++ containers, to sequential containers, random access containers, etc - various refinements of the general concept of a container.

What I’m especially interested in is not having two distinct concepts of graphs and graph algorithms with no plan to merge/manage these together, since as you say, the algorithms are complicated enough already - having two distinct and potentially competing abstractions in LLVM seems harmful to the codebase.

Not sure where that leaves us. We need something like CfgTraits to
cover the substantial differences between the IRs, but perhaps we can
work over time to reduce those differences? Over time, instead of
relying on CfgTraits for everything in CfgInterfaceImpl, we could
instead use unified functionalities directly?

Some of that could probably already be done today. For example, we
could probably just use llvm::successors

By way of example, I /think/ the future we’d probably all want for llvm::successors would be to move towards entities having their own successors() function rather than the non-member/ADL-based lookup. (the same way that it’s far more common/readable/expected to use x.begin() rather than “using std::begin; begin(x);” construct, or wrapping that in some std::adl_begin(x) function)

directly in
CfgInterfaceImpl::appendSuccessors (and your proposed
forEachSuccessor) instead of going via CfgTraits.

If you had a runtime polymorphic API over containers in C++, then it might look something
like this:

void print(const ContainerInterface& C, std::ostream& out) {
out << ‘{’;
bool first = true;
C.for_each([&](const auto &E) {
if (!first)
out << ", ";
first = false;
E.print(out);
});
out << ‘}’;
}

Yes, though it wouldn’t quite work like that with the IRs we have in
LLVM. Most of them go to great length to avoid embedding virtual
methods in the IR classes themselves. That’s why you’d more likely see
something like the following:

void print(const CfgInterface& iface, CfgBlockRef block, raw_ostream& out) {
out << "Successors of " << iface.printableName(block) << ‘:’;
iface.forEachSuccessor(block, [&](CfgBlockRef succ) {
if (first)
out << ’ ';
else
out << ", ";
first = false;
iface.printName(succ, out);
});
out << ‘\n’;
}

The CfgBlockRef / CfgValueRef / etc. appear in the design to avoid
having an abstract base class for the different IRs, and also to cover
the difference between LLVM IR Value * and MIR-SSA Register.

If a dynamically polymorphic API is needed/ends up being chosen, I’d /probably/ be inclined to try to add an abstract base class for blocks too (unclear about values) - but not wedded to it. In general I’m more pushing back on the dynamically polymorphic API in general than what it might look like if it exists. In part because of the complexities of having opaque handle objects that are passed out and back into APIs to model things, rather than being able to model them more directly. The APIs become a bit unwieldy because of that, in my opinion. (again, looking at GraphTraits awkwardness of load-bearing node pointers - though having the graph as a parameter (‘this’ parameter or otherwise) to the edge/successor/etc queries (rather than having the queries only based on the node) would go a long way to addressing that particular shortcoming if such a token/ref-passing API were needed)

I’d really like to see examples like this ^ using the different abstractions under consideration here (classic GraphTraits, CfgTraits dynamic and static typed, perhaps what a static API would look like if it wasn’t trying to be dynamic API compatible).

I don’t really care that much about what the static side of things
looks like. CfgTraits is the way it is on purpose because I felt it
would be useful to have a homogenous view of what generic aspects of
IR will ultimately be exposed by the CfgInterface. In the absence of
compiler-enforced concepts, that feels helpful to me. If you put
yourself in the shoes of an author of an algorithm that is supposed to
be generic over IRs,

Perhaps that’s part of my confusion - generic over IRs sounds like a far richer/more complicated API than generic over CFGs.

you’ll realize that it’s great to have some easy
way of discovering which things really are generic over IRs.

Writing out the concept would be a good start, I think, same as the C++ standard does for containers. (not to the kind of C++ language-lawyerly level of detail, but the basic summary of the conceptual API - independent of any of the implementation choices so far considered - this is why I’d really like to see especially trivial/mock use cases, yeah, they won’t be representative of the full complexity of the API in use, but give a sense of what the semantics of the API are so we can consider different syntactic/implementation choices)

The
comparison with containers is slightly lacking because the potential
surface area is so much larger. But perhaps we can iterate on the
design and find a better way.

Larger surface area somewhat concerns me and why I’d like to see more sort of example usage to get a better sense of what this is expected to abstract over now and in the future.

This is kind of a restatement of what I wrote above: If we can get
general agreement in the community that there is a desire that the
different IRs in LLVM follow common concepts “directly”, then we can
iterate towards that over time. I’m personally convinced that unifying
the various IRs is a worthy goal (the introduction of MLIR really was
an unfortunate step backwards as far as that is concerned), it just
feels like it’d be a thankless project because it would go through
many reviews that feel like the one that triggered this thread.

Perhaps - though getting design review/directional buy-in first can go a long way to making incremental reviews much less contentious/easier to commit. And a design that can be implemented via incremental improvement can also mean smaller patches that are easier to review/commit.

Hi David,

So I propose that we discuss perceived shortcomings of the design
here. If folks here can provide a sense of changes to the design
they'd actually be happy with, then I'm happy to spin up the required
patches to improve the design.

I'm still fairly concerned this isn't a matter of fixing it in-tree & not especially happy with the inversion that's occurred here (now it's "this goes ahead unless I can justify why it shouldn't", whereas previously it was "this doesn't go ahead unless you can justify why it should" - that's a pretty significant change) which is why I'd like to revert the patch first, and discuss second.

This is generally the bar for LLVM reviews - if there are outstanding concerns, patches don't move forward. This doesn't mean arbitrary hold-up, and there's admittedly some problems with how to resolve irreconcilable differences, usually by escalating to other, more senior community members to make a determination. Ultimately, given the fairly large surface area/intended use of this abstraction, I think it might rise to the level of using the proposed "contentious decision" process discussed here: https://github.com/llvm/llvm-www/blob/master/proposals/LP0001-LLVMDecisionMaking.md

Yes, which is part of why I thought we should move this to llvm-dev.

One issue I'm having with this discussion if I'm being honest is that
despite all the back and forth I _still_ barely know what your opinion
is, or whether you even have one.

This puts me in a difficult situation. I have no indication of whether
there is _any_ version of this that you would accept, and if so, what
it would look like, so there is no way forward for me. In the
meantime, there is a substantial (and increasing) amount of work that
builds on what we're talking about here.

Other folks weighing in would be great & certainly there's a line at which my opinion isn't the gatekeeper here if other folks are in favor.

Agreed. Part of the dysfunction here is that it's difficult to get
people to pay attention in the first place. The few folks who replied
to my llvm-dev thread in July did have questions but seemed to be
generally okay with the direction.

[0] With that in mind, let me "import"
the last comment on the Phabricator review:

> > > Ah, I see, the "append" functions are accessors, of a sort. Returning a container might be more clear than using an out parameter - alternatively, a functor parameter (ala std::for_each) that is called for each element, that can then be used to populate an existing container if desired, or to do immediate processing without the need for an intermediate container.
> >
> > The code is trying to strike a balance here in terms of performance. Since dynamic polymorphism is used, a functor-based traversal can't be inlined and so the number of indirect function calls increases quite a bit. There are a number of use cases where you really do want to just append successors or predecessors to a vectors, like during a graph traversal. An example graph traversal is here: https://github.com/nhaehnle/llvm-project/blob/controlflow-wip-v7/llvm/lib/Analysis/GenericConvergenceUtils.cpp#L329
>
> One way to simplify the dynamic polymorphism overhead of iteration would be to invert/limit the API - such as having a "node.forEachEdge([](const Edge& E) { ... });" or the like.

I assume you're not talking about runtime overhead here? If we're
talking about that, I'd expect that variant to often be slower,
although I obviously don't have numbers.

I think I was referring to runtime overhead - you mentioned the increase in indirect calls. A callback-like API can reduce the number of such calls. (rather than one virtual call per increment, inequality test, and dereference - there would be only one polmorphic call per iteration)

CfgInterface::appendSuccessors requires only a single indirect call
for the entire loop, which is why I chose to do that over some
getSuccessorByIndex-type interface.

A CfgInterface::forEachSuccessor method is a trade-off: it's an
interface that is more in line with patterns you see elsewhere
(std::for_each), at the cost of more indirect calls.

[snip[

> I don't mean to treat CfgGraph to be thought of like the template parameter to, say, std::vector - I meant thinking of CfgGraph as something like std::vector itself. Rather than using traits to access containers in the C++ standard library, the general concept of a container is used to abstract over a list, a vector, etc.
>
> eg, if you want to print the elements of any C++ container, the code looks like:
>
> template<typename Container>
> void print(const Container &C, std::ostream &out) {
> out << '{';
> bool first = true;
> for (const auto &E : C) {
> if (!first)
> out << ", ";
> first = false;
> out << E;
> }
> out << '}';
> }
>
> Which, yes, is much more legible than what one could imagine a GraphTraits-esque API over
> containers might be:
>
> template<typename Container, typename Traits = ContainerTraits<Container>>
> void print(const Container &C, std::ostream &out) {
> out << '{';
> bool first = true;
> for (const auto &E : Traits::children(C)) {
> if (!first)
> out << ", ";
> first = false;
> out << Traits::element(E);
> }
> out << '}';
> }
>
> Or something like that - and the features you'd gain from that would be the ability to sort of "decorate" your container without having to create an actual container decorator - instead implementing a custom trait type that, say, iterates container elements in reverse. But generally a thin decorator using the first non-traits API would be nicer (eg: llvm::reverse(container) which gives you a container decorator that reverses order).

I agree that the first form is nicer. I'd raise two points.

First, the first form works nicely if the concept of container is
sufficiently well understood and all relevant containers really are
sufficiently similar.

I'm not sure how a dynamic interface is likely to be significantly different/better in this regard.

It's mostly orthogonal. Though dynamic polymorphism does end up having
better compiler support for _enforcing_ concepts. This is a common
theme in this discussion: the tooling support for dynamic polymorphism
is far better. That matters.

Neither is the case here. We're trying to cover
different pre-existing IRs, and even though all of them are SSA IRs,
the differences are substantial.

I guess that's another thing that might benefit from further clarity: Currently we have generic graph traits, notionally that's for any graph (collection of nodes and edges). The proposed abstraction is for CFG graphs, but due to the added layers of abstraction complexity, dynamic and static APIs, etc, it's not been especially clear to me what makes this abstraction specific to/only for CFG graphs[1] (& then how that relates to IR I'm not sure either - control flow seems like it would be fairly independent of the values within that graph).

Maybe it would help if CfgTraits was renamed to SsaTraits? Or
SsaCfgTraits? Being able to do some limited inspection of the SSA
values (and instructions, really) is key to what we're trying to do
here -- there's an example below.

If this was only about nodes and edges, I would have added dynamic
polymorphism on top of the existing GraphTraits.

Currently, we only really care about
LLVM IR and MachineIR in SSA form; their main differences is that the
concept of "value" is a C++ object in LLVM IR, while in MIR-SSA it's a
plain unsigned int (now wrapped in a Register). MLIR would add yet
another way of thinking about values. The fact that we have to work
with those pre-existing IRs means we have to compromise.

Not sure I follow what compromises you're referring to - if we have a container abstraction, some containers are over ints, some over objects, etc - the container can specify via typedefs, for instance, what type its elements are.

Simple example: Let's say we have an SSA value and need to find the
basic block in which it is defined.

In LLVM IR, that's `value->getParent()`, where value is an `llvm::Value*`.

In SSA-form MachineIR, that's
`m_machineRegInfo->getVRegDef(value)->getParent()`, where value is an
`llvm::Register`.

In MLIR, that's `value.getParentBlock()`, where value is an `mlir::Value`.

This kind of operation is needed for a complete divergence analysis,
and so it needs to be abstracted somehow.

Second, none of this applies to dynamic polymorphism, which is really
the main goal here at least for me.

And one I'm trying to push back on - my hope is that a common set of abstractions (much like the container concepts in the STL) would be suitable here. To potentially subsume the existing GraphTraits (could have a "GraphTraits adapter" that gives a Graph-concept-implementing-container given GraphTraits) and potentially has layers to add on more expressive/narrower abstractions (whatever extra features are relevant to CFG graphs that aren't relevant to all graphs in general([1] above - not super sure what those are)) - much like generic C++ containers, to sequential containers, random access containers, etc - various refinements of the general concept of a container.

What I'm especially interested in is not having two distinct concepts of graphs and graph algorithms with no plan to merge/manage these together, since as you say, the algorithms are complicated enough already - having two distinct and potentially competing abstractions in LLVM seems harmful to the codebase.

It seems to me that there are two orthogonal issues here.

One is whether dynamic polymorphism can be used or not. If you're in
fundamental opposition to that, then we have a serious problem (a bit
on that further down). Dynamic polymorphism doesn't contradict the
goal of a common set of abstractions.

The other is how _exactly_ the abstractions are factored. For example,
on the dynamic polymorphism side we could have:

- a `GraphInterface` class which provides methods for iterating
predecessors and successors of `CfgBlockRef` (presumably we'd rename
that to something like NodeRef?), and
- an `SsaCfgInterface` class which provides additional methods for
instructions and values.

The question is whether you'd be willing to support something like
that eventually, once we've hashed it out.

Not sure where that leaves us. We need something like CfgTraits to
cover the substantial differences between the IRs, but perhaps we can
work over time to reduce those differences? Over time, instead of
relying on CfgTraits for _everything_ in CfgInterfaceImpl, we could
instead use unified functionalities directly?

Some of that could probably already be done today. For example, we
could probably just use llvm::successors

By way of example, I /think/ the future we'd probably all want for llvm::successors would be to move towards entities having their own successors() function rather than the non-member/ADL-based lookup. (the same way that it's far more common/readable/expected to use x.begin() rather than "using std::begin; begin(x);" construct, or wrapping that in some std::adl_begin(x) function)

Makes sense, though it doesn't really respond to what I was writing
here. My point was about how the bridge between the world of dynamic
polymorphism and that of static polymorphism works.

directly in
CfgInterfaceImpl::appendSuccessors (and your proposed
forEachSuccessor) instead of going via CfgTraits.

> If you had a runtime polymorphic API over containers in C++, then it might look something
> like this:
>
> void print(const ContainerInterface& C, std::ostream& out) {
> out << '{';
> bool first = true;
> C.for_each([&](const auto &E) {
> if (!first)
> out << ", ";
> first = false;
> E.print(out);
> });
> out << '}';
> }

Yes, though it wouldn't quite work like that with the IRs we have in
LLVM. Most of them go to great length to avoid embedding virtual
methods in the IR classes themselves. That's why you'd more likely see
something like the following:

void print(const CfgInterface& iface, CfgBlockRef block, raw_ostream& out) {
  out << "Successors of " << iface.printableName(block) << ':';
  iface.forEachSuccessor(block, [&](CfgBlockRef succ) {
    if (first)
      out << ' ';
    else
      out << ", ";
    first = false;
    iface.printName(succ, out);
  });
  out << '\n';
}

The CfgBlockRef / CfgValueRef / etc. appear in the design to avoid
having an abstract base class for the different IRs, and also to cover
the difference between LLVM IR `Value *` and MIR-SSA `Register`.

If a dynamically polymorphic API is needed/ends up being chosen, I'd /probably/ be inclined to try to add an abstract base class for blocks too (unclear about values) - but not wedded to it. In general I'm more pushing back on the dynamically polymorphic API in general than what it might look like if it exists. In part because of the complexities of having opaque handle objects that are passed out and back into APIs to model things, rather than being able to model them more directly. The APIs become a bit unwieldy because of that, in my opinion. (again, looking at GraphTraits awkwardness of load-bearing node pointers - though having the graph as a parameter ('this' parameter or otherwise) to the edge/successor/etc queries (rather than having the queries only based on the node) would go a long way to addressing that particular shortcoming if such a token/ref-passing API were needed)

In the end, it boils down to a tooling issue, which in turn is a C++
language issue. When you're writing template code, you're basically
throwing the type system out of the window. Sure, it comes back in
once you _instantiate_ the template, but that doesn't help if all the
complexity is in the template code. The kinds of algorithms we're
trying to write have enough inherent complexity as it is. Asking us to
simultaneously fight the accidental complexity caused by how C++
templates work is unreasonable. The sentiment that C++ templates hurt
here has been shared in this discussion by people with experience
working on the dominator tree implementation (which is arguably a
simpler problem than what we're trying to solve).

Keep in mind that it's not just a question of current development but
also of maintainability going forward.

It would help to understand _which_ APIs you think are becoming
unwieldy, because you may be imagining something misleading. For
example, consider the updated generic dominator tree. It has a
`GenericDominatorTreeBase` class which is type-erased and provides all
the usual query functions in terms of opaque handles. Then there is a
derived `DominatorTreeBase<NodeT>` class. If you're writing a
non-generic algorithm, or a template-generic algorithm, you never
interact with GenericDominatorTreeBase directly. Instead, you interact
with `DominatorTreeBase<NodeT>`, which has all the usual methods where
you pass in `NodeT*` instead of opaque handles. You only interact with
GenericDominatorTreeBase directly if you're writing a
dynamically-generic algorithm, where you consciously chose to enter
the world of opaque handles.

There ends up being some boilerplate for wrapping/unwrapping opaque
handles in `DominatorTreeBase<NodeT>`, but it's pretty straightforward
and hidden from users of the class.

(FWIW, the updated dominator tree analysis itself is still implemented
as a template. I converted it to dynamic polymorphism as an
experiment, and there was one outlier test case with 1-2% compile time
performance loss. I haven't pursued this further, and I don't want to
convert this particular algorithm just for the sake of it. The update
to dominator trees is only intended to allow other, more complex
algorithms to be built on top of it using dynamic polymorphism.)

If you're looking for an example of an algorithm that is written using
dynamic polymorphism, you could take a look here:
https://github.com/nhaehnle/llvm-project/blob/controlflow-wip-v7/llvm/lib/Analysis/GenericUniformAnalysis.cpp

> I'd really like to see examples like this ^ using the different abstractions under consideration here (classic GraphTraits, CfgTraits dynamic and static typed, perhaps what a static API would look like if it wasn't trying to be dynamic API compatible).

I don't really care that much about what the static side of things
looks like. CfgTraits is the way it is on purpose because I felt it
would be useful to have a homogenous view of what generic aspects of
IR will ultimately be exposed by the CfgInterface. In the absence of
compiler-enforced concepts, that feels helpful to me. If you put
yourself in the shoes of an author of an algorithm that is supposed to
be generic over IRs,

Perhaps that's part of my confusion - generic over IRs sounds like a far richer/more complicated API than generic over CFGs.

Yes, indeed :slight_smile:

you'll realize that it's great to have some easy
way of discovering which things really _are_ generic over IRs.

Writing out the concept would be a good start, I think, same as the C++ standard does for containers. (not to the kind of C++ language-lawyerly level of detail, but the basic summary of the conceptual API - independent of any of the implementation choices so far considered - this is why I'd really like to see especially trivial/mock use cases, yeah, they won't be representative of the full complexity of the API in use, but give a sense of what the semantics of the API are so we can consider different syntactic/implementation choices)

The complete set of operations we seem to need is in CfgInterface and
CfgPrinter. I do have some further development on it, where they look
as follows:

class CfgInterface {
  virtual void anchor();

public:
  virtual ~CfgInterface() = default;

  /// Escape-hatch for obtaining a printer e.g. in debug code. Prefer to
  /// explicitly pass a CfgPrinter where possible.
  virtual std::unique_ptr<CfgPrinter> makePrinter() const = 0;

  virtual CfgParentRef getBlockParent(CfgBlockRef block) const = 0;

  virtual void appendBlocks(CfgParentRef parent,
                            SmallVectorImpl<CfgBlockRef> &list) const = 0;

  virtual bool comesBefore(CfgInstructionRef lhs,
                           CfgInstructionRef rhs) const = 0;

  virtual void appendPredecessors(CfgBlockRef block,
                                  SmallVectorImpl<CfgBlockRef> &list) const = 0;
  virtual void appendSuccessors(CfgBlockRef block,
                                SmallVectorImpl<CfgBlockRef> &list) const = 0;
  virtual ArrayRef<CfgBlockRef>
  getPredecessors(CfgBlockRef block,
                  SmallVectorImpl<CfgBlockRef> &store) const = 0;
  virtual ArrayRef<CfgBlockRef>
  getSuccessors(CfgBlockRef block,
                SmallVectorImpl<CfgBlockRef> &store) const = 0;

  virtual void appendBlockDefs(CfgBlockRef block,
                               SmallVectorImpl<CfgValueRef> &list) const = 0;
  virtual CfgBlockRef getValueDefBlock(CfgValueRef value) const = 0;
};

class CfgPrinter {
  virtual void anchor();

protected:
  const CfgInterface &m_iface;

  CfgPrinter(const CfgInterface &iface) : m_iface(iface) {}

public:
  virtual ~CfgPrinter() {}

  const CfgInterface &interface() const { return m_iface; }

  virtual void printBlockName(raw_ostream &out, CfgBlockRef block) const = 0;
  virtual void printValue(raw_ostream &out, CfgValueRef value) const = 0;
  virtual void printInstruction(raw_ostream &out,
                                CfgInstructionRef instruction) const = 0;

  Printable printableBlockName(CfgBlockRef block) const {
    return Printable(
        [this, block](raw_ostream &out) { printBlockName(out, block); });
  }
  Printable printableValue(CfgValueRef value) const {
    return Printable(
        [this, value](raw_ostream &out) { printValue(out, value); });
  }
  Printable printableInstruction(CfgInstructionRef instruction) const {
    return Printable([this, instruction](raw_ostream &out) {
      printInstruction(out, instruction);
    });
  }
};

Iterating over this design in-tree seems perfectly viable to me.

The
comparison with containers is slightly lacking because the potential
surface area is so much larger. But perhaps we can iterate on the
design and find a better way.

Larger surface area somewhat concerns me and why I'd like to see more sort of example usage to get a better sense of what this is expected to abstract over now and in the future.

See the example GenericUniformAnalysis I linked to above.

This is kind of a restatement of what I wrote above: If we can get
general agreement in the community that there is a desire that the
different IRs in LLVM follow common concepts "directly", then we can
iterate towards that over time. I'm personally convinced that unifying
the various IRs is a worthy goal (the introduction of MLIR really was
an unfortunate step backwards as far as that is concerned), it just
feels like it'd be a thankless project because it would go through
_many_ reviews that feel like the one that triggered this thread.

Perhaps - though getting design review/directional buy-in first can go a long way to making incremental reviews much less contentious/easier to commit. And a design that can be implemented via incremental improvement can also mean smaller patches that are easier to review/commit.

If people were paying attention, sure :slight_smile: Unfortunately, that's not how
things seem to work in practice...

I'm all for smaller patches as well, which is part of why I think
in-tree iteration is better.

Cheers,
Nicolai

Responding to another comment:

! In D83088#2351225, @mehdi_amini wrote:

! In D83088#2350287, @nhaehnle wrote:

Hi Mehdi, this is not an appropriate place for this discussion. Yes, we have a general rule that patches can be reverted if they're obviously broken (e.g. build bot problems) or clearly violate some other standard. This is a good rule, but it doesn't apply here. If you think it does, please state your case in the email thread that I've started on llvm-dev for this very purpose. Just one thing:

I strongly disagree: the bar for post-review commit is not lower than pre-commit review.

Again: please revert when someone has a concern, including a *design* concern with you patch.

What the developer policy _actually_ says (notwithstanding your
underhanded attempt to rush a change in
https://reviews.llvm.org/D89995) is that: "if substantial problems are
identified, it is expected that the patch is reverted and fixed
offline". This is also how things tend to play out in practice:
problems are fixed in-tree, without a revert, where possible --
because that's how decent human beings operate. So I'd appreciate it
if you stopped playing policy games and started to discuss the case on
its actual merits.

Has a substantial problem truly been identified?

It seems to me that what is really there in David's comments is a
mixture of two things. There are detail questions that are best
addressed in-tree (I discuss this some more below). And there is the
question of using dynamic polymorphism to write algorithms that are
generic over the type of IR. David is clearly at least uneasy about
this, and it is really only this question that would merit a revert.
So let's focus on that.

I think there are two options:

Maybe David wants to forbid this use of dynamic polymorphism outright.
I think this is unacceptable, so in that case, it seems we have no
choice but to start a formal process for resolving contentious
decisions. If that is really the path we need to go down, then sure,
that constitutes a substantial problem and I will accept it if people
still want me to revert the change while that process is ongoing. (I
would still ask people to _please_ be good citizens and allow us to
make upstream progress in AMDGPU as we usually do while this is
happening -- I explain my reasoning a bit more below -- but I'd accept
it based on the rules that are in effect today. Invoking the formal
process should give all participants the confidence that the question
doesn't just end up dropped on the floor, and that the in-tree status
of the code wouldn't implicitly favor either side of the discussion.)

Maybe David merely feels uncertain about whether dynamic polymorphism
is the best path here. In that case, I frankly don't see how reverting
helps the process in any way. It's fine for people to be uncertain
about something, but in that case we should err on the side of making
progress -- that's usually the best way to learn and get more
information!

(And of course maybe somebody else wants to weigh in on the technical
questions as well.)

P.S.: It's easy to miss on Phabricator, but there is already a long stack of patches which build on this

(this is part of my previous point).

It cuts both ways, actually. When making a design like this one, there
are a number of reasons why you really should have a use of the design
early on as a good software developer. The two main ones:

1. Without attempting to use the design, you have no way of knowing
whether it's any good -- having the use case is pretty much a
prerequisite for iterating on the design.
2. Without having a use case, reviewers will likely just tell you to
come back when you have one, because of 1.
3. We have _real_ problems that we're solving and need to make
progress on. I can't just sit on my hands waiting for some discussion
that people don't pay attention to, and having the dynamic genericity
greatly simplified making this progress so far. So going ahead with
work based on this patch was the right decision. (FWIW, I consider the
ability to use dynamic polymorphism here to be such a great benefit
that doing this will have been a net positive _even if_ I'm forced to
remove it eventually, because turning things into templates after the
fact isn't actually too difficult; it would just hurt
maintainability.)

For the LLVM community, the fact that this code exists already today
is a benefit. Having all of the code in-tree (both this particular
change and its uses) would allow us to iterate on the interfaces while
being able to see how real users of them are affected, as opposed to
mere toy examples. Of course, this is subject to those further changes
being accepted to the tree, and not all of them are at that level of
quality yet, but many of them are.

I understand that some people may be scared that once the code is in
there, it'll be in there forever. But this should only be a concern
for those who object to dynamic polymorphism on principle. As I wrote
above, turning things into templates after the fact is _much_ easier
than going in the other direction or writing the code as templates
from scratch, and I have to make progress on this work anyway. What
this means is that _if_ the LLVM community really were to decide that
this use of dynamic polymorphism is unacceptable (why?!?), I'm already
implicitly signed up to doing this templatification work. And that's
okay -- cost of business of working open-source upstream -- but what's
not okay is if LLVM makes upstream development systematically hard.

Consider:

1. We usually develop AMDGPU upstream as much as possible -- most of
our LLVM developers actually do work directly on upstream and not on a
vendor tree -- and we want to continue doing so.
2. A large amount of other work in AMDGPU implicitly or explicitly
depends on it. Most importantly, GlobalISel needs a divergence
analysis on MachineIR. This means that other developers, who usually
work upstream, depend on this work being available. If you insist on
reverting the change, you may be forcing us to build up new internal
processes for doing less work upstream. I would _really_ like us to
avoid that.

So I would ask you to allow us to make this progress in-tree with the
understanding that the changes that would be required to excise the
dynamic polymorphism again are fairly limited and I'm signed up to
doing them if needed. There really is a question here of how friendly
to upstream development you want LLVM to be, and the timescales that
are involved need to be taken into consideration.

Bottom line: please, let's just figure out where we stand on the
question of dynamic polymorphism for algorithms that are generic over
different types of IR, and then see how we go from there.

Cheers,
Nicolai

Responding to another comment:

! In D83088#2351225, @mehdi_amini wrote:

! In D83088#2350287, @nhaehnle wrote:
Hi Mehdi, this is not an appropriate place for this discussion. Yes, we have a general rule that patches can be reverted if they’re obviously broken (e.g. build bot problems) or clearly violate some other standard. This is a good rule, but it doesn’t apply here. If you think it does, please state your case in the email thread that I’ve started on llvm-dev for this very purpose. Just one thing:

I strongly disagree: the bar for post-review commit is not lower than pre-commit review.

Again: please revert when someone has a concern, including a design concern with you patch.

What the developer policy actually says (notwithstanding your
underhanded attempt to rush a change in
https://reviews.llvm.org/D89995)

Nicolai, I don’t appreciate the way you are mischaracterizing this. There is no rush for a topic that was discussed 4 years ago and that was about codifying the existing practices. If there was any rush I would have landed it immediately after getting approval from Renato and wouldn’t have brought it up to llvm-dev@ directly.

is that: “if substantial problems are
identified, it is expected that the patch is reverted and fixed
offline”. This is also how things tend to play out in practice:
problems are fixed in-tree,

Sorry I have a hard time reading this: you quote a sentence saying that the patch should be reverted and follow by saying that problems are fixed in-tree. I see a contradiction here.

without a revert, where possible –
because that’s how decent human beings operate.

Again, I don’t appreciate your implied personal attacks here: do we have to agree with you to be a decent human being? This is violent, please stop.

So I’d appreciate it
if you stopped playing policy games and started to discuss the case on
its actual merits.

Has a substantial problem truly been identified?

Someone asked you to revert because of design concerns, and you are refusing. At this point this is a policy problem, I don’t want to engage in your attempt to redirect what is an easy process towards the details of whether David is right or wrong or what is the best implementation path for this particular patch right now. From this point of view, everything below is irrelevant to the revert request: you can bring this up with David on LLVM-dev, discuss it, make it an RFC in the end, and in the ideal case (for you) recommit the patch as is.
In the meantime: please revert.

Thanks,

[snip]

is that: "if substantial problems are
identified, it is expected that the patch is reverted and fixed
offline". This is also how things tend to play out in practice:
problems are fixed in-tree,

Sorry I have a hard time reading this: you quote a sentence saying that the patch should be reverted and follow by saying that problems are fixed in-tree. I see a contradiction here.

That's not what the sentence is saying *unless* you just flat-out
assume that a substantial problem has been identified. That isn't
clear to me at all. Admittedly the adjective "substantial" is
subjective, which is why I was trying to get clarification here. I
don't know why you're trying to derail that.

And I'm not just being facetious or purposefully obnoxious. Even after
re-reading the relevant comments, it honestly isn't clear to me which
parts (if any) anybody considers substantial problems and which parts
are musings on potential alternatives or what.

[snip]

So I'd appreciate it
if you stopped playing policy games and started to discuss the case on
its actual merits.

Has a substantial problem truly been identified?

Someone asked you to revert because of design concerns, and you are refusing.

Let's disentangle those two things, please. The fact that I'm refusing
to revert (for now, pending further discussion and clarification!) is
not a substantial problem with the code change. So please stop
yourself for a moment, take a step back, and don't try to turn this
into some sort of Kafkaesque scenario along the lines of "well, now
that you refuse to revert, I'm going to use *that* as the argument for
why you must revert!" The LLVM community should be better than that.

Design concerns can be any number of things. They can be as simple as
"I prefer a different name for this class/method/variable" (just
rename it in-tree...) or as significant as "the entire approach is
unacceptable to me" (revert and back to the drawing board).

At this point this is a policy problem, I don't want to engage in your attempt to redirect what is an easy process towards the details of whether David is right or wrong or what is the best implementation path for this particular patch right now. From this point of view, everything below is irrelevant to the revert request: you can bring this up with David on LLVM-dev, discuss it, make it an RFC in the end, and in the ideal case (for you) recommit the patch as is.

I have brought it up on llvm-dev, that's what this thread was
originally trying to do :slight_smile:

Look, let's be clear on one thing here. The problem is that this is
happening after David was away without any sign of life for two
months. I appreciate that he apologized for that and I wouldn't blame
him in any case, I have problems with keeping up on reviews myself.
But it's a fact, and it makes what you're proposing here seem a bit
impractical. There seems to be a high chance that we are just going to
end up in the exact same place two months from now. What you claim to
be an easy process has already been proven to not be easy after all.

That's why I'm looking for reliable ways forward. Mind you, I still
accept the existing policies and would therefore revert if somebody
actually gave a clear reason. Yes, that would require some work of
distillation/synthesis being made by somebody who wants it to be
reverted. I find that a reasonable ask, because reverting would be
pointless anyway if the person asking for it is unwilling to engage in
review discussion.

In my last emails I have been trying to find ways of re-establishing
trust that progress will actually be made. For example, that's why I
was trying to get a feeling for whether people want a formal process
for contentious decisions, and if so, what decision we'd even be
talking about. Getting clarity on what the perceived problems with the
patch really are is an important part of that. It's counterproductive
of you to try to shut that down.

(FWIW, I haven't heard anything from David in a while so I'm also
trying to reach out in private, just in case that's preferred.)

Cheers,
Nicolai

Maybe David wants to forbid this use of dynamic polymorphism outright.
I think this is unacceptable, so in that case, it seems we have no
choice but to start a formal process for resolving contentious
decisions.

If this ends up being the case, then you must revert the patch until the matter can be solved.

But as I said before, I would have reverted it a long time ago.

(I would still ask people to please be good citizens and allow us to
make upstream progress in AMDGPU as we usually do while this is
happening – I explain my reasoning a bit more below – but I’d accept
it based on the rules that are in effect today. Invoking the formal
process should give all participants the confidence that the question
doesn’t just end up dropped on the floor, and that the in-tree status
of the code wouldn’t implicitly favor either side of the discussion.)

The AMD backend doesn’t trump a high-level CFG design decision that affects all back-ends, front-ends and middle-ends.

If you progress your AMD work on your current assumption and it turns out people decide against it you will have to revert all of it, which is a lot more substantial (and a potentially dangerous merge) than just this CFG change.

I’d also strongly advise people reviewing the remaining work from approving the patches until this matter can be resolved. This is not a trivial issue and can have further consequences than just a simple revert.

And please, do not assume what being a good citizens is. We can all be good citizens and still overwhelmingly disagree with each other, as long as we keep it civil.

I see no evidence of lack of civility from either David or Mehdi. On the contrary, they’re being extremely kind and patient.

cheers,
–renato

Hi Nicolai,

I’ve been watching this and the associated review that +Mehdi AMINI brought up as far as the process.

I think you should revert this patch and anything dependent upon it until the review is complete. Dave has many good points in his review and while you pinged there needs to be resolution before applying. In particular, he’s probably the most active reviewer in exactly this space right now and is obviously also the right reviewer for this patch.

Please revert immediately and thanks. I’m sorry the patch has gotten contentious, but this is a fairly major overhaul and it does happen sometimes.

Thanks.

-eric

Also, if this hasn’t happened already, I would recommend some 1-on-1 (at least over chat, or if possible video call) between Nicolai and Dave. In the past, I have found this to fairly quickly come to consensus about design direction (though of course please update any relevant threads with the takeaways of such private discussions!).

– Sean Silva

Hi David,

So I propose that we discuss perceived shortcomings of the design
here. If folks here can provide a sense of changes to the design
they’d actually be happy with, then I’m happy to spin up the required
patches to improve the design.

I’m still fairly concerned this isn’t a matter of fixing it in-tree & not especially happy with the inversion that’s occurred here (now it’s “this goes ahead unless I can justify why it shouldn’t”, whereas previously it was “this doesn’t go ahead unless you can justify why it should” - that’s a pretty significant change) which is why I’d like to revert the patch first, and discuss second.

This is generally the bar for LLVM reviews - if there are outstanding concerns, patches don’t move forward. This doesn’t mean arbitrary hold-up, and there’s admittedly some problems with how to resolve irreconcilable differences, usually by escalating to other, more senior community members to make a determination. Ultimately, given the fairly large surface area/intended use of this abstraction, I think it might rise to the level of using the proposed “contentious decision” process discussed here: https://github.com/llvm/llvm-www/blob/master/proposals/LP0001-LLVMDecisionMaking.md

Yes, which is part of why I thought we should move this to llvm-dev.

One issue I’m having with this discussion if I’m being honest is that
despite all the back and forth I still barely know what your opinion
is, or whether you even have one.

Ah, sorry - I’ll try to state a few parts of my views more directly:

Most importantly, this seems like a new fairly significant abstraction intended for use when writing LLVM analyses (but not transformations, I assume?) - more heavyweight (no matter the implementation choice) than GraphTraits, which already is itself showing its weight (I don’t think it would’ve been designed the way it is if the use cases/amount of use were known when it was implemented - and I think it’s probably worth a bit of a revisit at this point, as Alina and I have discussed a fair bit around the DomTree work). Given its intended use I think it merits more scrutiny than it has had - that’s why I’d like it reverted so we can do that.

I think the RFC and patch probably didn’t receive as much review/attention as they were due may be in part due to the thread title/naming - as we’ve noted in this thread, “CFG” doesn’t, I think, capture enough of what this abstraction is intended to cover (a CFG is a graph with certain semantics about what it represents - a CFG doesn’t necessarily have values/SSA/etc). I think if a thread were proposed “A common abstraction over LLVM IR and MLIR for analyses” it might garner more attention from relevant developers (not necessarily, but at least a better chance/easier to explain to people why they should care about the design of this API).

My initial thinking was that this was closer to GraphTraits - and some of the descriptions around what makes GraphTraits difficult to work with and why that motivates a solution involving dynamic polymorphism to me, sounded not especially C++ idiomatic - not that I think we should write everything with generic templates and layers of C++ metaprogramming - but I do want to make sure the API is fairly idiomatic to C++ rather than, by the sounds of part of the description, butting up against those design concepts and abstractions.

To the actual design: I think traits are probably not the best idea - I think we’ve seen that with GraphTraits being hard to use/unweildy/limiting (not entirely a generic problem with traits, partly a problem with the way GraphTraits are implemented (eg: edge queries only taking a node, instead of a graph and a node)). They’re useful when an API is immutable and you need to adapt things (see iterator traits), but LLVM is quite mutable and moving LLVM IR and MIR to have more common API syntax seems both good for generic algorithms and good for developers even when they’re not writing template-generic code.

With a syntactically common API (ie: I don’t mean shared functions, but API with the same names, etc - ie: with LLVM IR and MIR implementing the same C++ concept, insofar as is needed for analyses) - I think it’d be fairly sensible/easy enough to write analyses in terms of that concept (same as writing C++ container-generic algorithms such as std::sort, etc) - but I’d also be open to discussing a runtime polymorphic API on top, though I do want to push back on that a fair bit due to some of the inherent complexity needed to balance performance (eg: cost of manifesting edge lists/use lists, rather than being able to iterate them directly) and API complexity (eg: same thing, but from a usability perspective - not having the more common iterator api, instead a “populate this vector” api, but iterator solutions here also probably wouldn’t be great due to the runtime overhead of type erased iterators, could potentially use “narrower” iterators that don’t implement the C++ iterator concept, but instead just a single virtual “Optional Next()” function, for instance - reducing the number of virtual calls compared to type erased iterators).

I think I asked in one of my early rounds of review, I think it’d be really helpful to have simple (not real-world algorithms, but trivial code snippets) examples of how code walking the CFG’s features would look under different implementations, eg:

  1. raw writing against the two abstractions, LLVM IR and MIR - two completely separate functions that exercise all the common features that are of interest
  2. a potential common API concept and the trivial walking algorithm in the form of a template over both IR types
  3. CfgTraits
  4. CfgInterface

I think that’d help ground the discussion in the proposal(s) and make it easier to see the tradeoffs.

We may be better off just stopping this email thread at this point ^ and discussing that stuff, rather than going down to the point-by-point discussion later in this email (I’ve debated not replying to the rest of the email to try to focus the discussion - these really long emails get exponentially costlier for me from both a practical time perspective, but also on a deeper level in terms of how it feels to try to write another long-winded reply).

This puts me in a difficult situation. I have no indication of whether
there is any version of this that you would accept, and if so, what
it would look like, so there is no way forward for me. In the
meantime, there is a substantial (and increasing) amount of work that
builds on what we’re talking about here.

Depends what you mean by “version of this” - if by that you mean “some abstraction over the general SSA graph nature of LLVM IR and MIR” yes, I believe I understand the value in having such an abstraction & think there are options within that space that are acceptable to me. It’s possible that they don’t involve type traits or dynamic polymorphism, though.

Other folks weighing in would be great & certainly there’s a line at which my opinion isn’t the gatekeeper here if other folks are in favor.

Agreed. Part of the dysfunction here is that it’s difficult to get
people to pay attention in the first place. The few folks who replied
to my llvm-dev thread in July did have questions but seemed to be
generally okay with the direction.

Yeah, can be a bit like pulling teeth, for sure. I think some of the issue might’ve been the subject line - people probably filter pretty aggressively by that & might’ve missed some of the purpose of the proposal because of it (I think I did). But also we’re all just busy people with our own stuff to do too - so sometimes takes actively seeking out people who might have a vested interest in pieces of common infrastructure, etc.

[0] With that in mind, let me “import”
the last comment on the Phabricator review:

Ah, I see, the “append” functions are accessors, of a sort. Returning a container might be more clear than using an out parameter - alternatively, a functor parameter (ala std::for_each) that is called for each element, that can then be used to populate an existing container if desired, or to do immediate processing without the need for an intermediate container.

The code is trying to strike a balance here in terms of performance. Since dynamic polymorphism is used, a functor-based traversal can’t be inlined and so the number of indirect function calls increases quite a bit. There are a number of use cases where you really do want to just append successors or predecessors to a vectors, like during a graph traversal. An example graph traversal is here: https://github.com/nhaehnle/llvm-project/blob/controlflow-wip-v7/llvm/lib/Analysis/GenericConvergenceUtils.cpp#L329

One way to simplify the dynamic polymorphism overhead of iteration would be to invert/limit the API - such as having a “node.forEachEdge([](const Edge& E) { … });” or the like.

I assume you’re not talking about runtime overhead here? If we’re
talking about that, I’d expect that variant to often be slower,
although I obviously don’t have numbers.

I think I was referring to runtime overhead - you mentioned the increase in indirect calls. A callback-like API can reduce the number of such calls. (rather than one virtual call per increment, inequality test, and dereference - there would be only one polmorphic call per iteration)

CfgInterface::appendSuccessors requires only a single indirect call
for the entire loop, which is why I chose to do that over some
getSuccessorByIndex-type interface.

Oh, indeed - hadn’t mentioned that tradeoff, but that one then comes at a memory overhead cost of having to populate that memory/destroy it/etc.

A CfgInterface::forEachSuccessor method is a trade-off: it’s an
interface that is more in line with patterns you see elsewhere
(std::for_each), at the cost of more indirect calls.

Yep - and even then it’s limiting in that you can’t partially iterate, or save where you’re up to(without incurring more memory usage by giving the resulting container an even longer lifetime). (I’ve seen both of those use-cases with GraphTraits, for instance).

[snip[

I don’t mean to treat CfgGraph to be thought of like the template parameter to, say, std::vector - I meant thinking of CfgGraph as something like std::vector itself. Rather than using traits to access containers in the C++ standard library, the general concept of a container is used to abstract over a list, a vector, etc.

eg, if you want to print the elements of any C++ container, the code looks like:

template
void print(const Container &C, std::ostream &out) {
out << ‘{’;
bool first = true;
for (const auto &E : C) {
if (!first)
out << ", ";
first = false;
out << E;
}
out << ‘}’;
}

Which, yes, is much more legible than what one could imagine a GraphTraits-esque API over
containers might be:

template<typename Container, typename Traits = ContainerTraits>
void print(const Container &C, std::ostream &out) {
out << ‘{’;
bool first = true;
for (const auto &E : Traits::children(C)) {
if (!first)
out << ", ";
first = false;
out << Traits::element(E);
}
out << ‘}’;
}

Or something like that - and the features you’d gain from that would be the ability to sort of “decorate” your container without having to create an actual container decorator - instead implementing a custom trait type that, say, iterates container elements in reverse. But generally a thin decorator using the first non-traits API would be nicer (eg: llvm::reverse(container) which gives you a container decorator that reverses order).

I agree that the first form is nicer. I’d raise two points.

First, the first form works nicely if the concept of container is
sufficiently well understood and all relevant containers really are
sufficiently similar.

I’m not sure how a dynamic interface is likely to be significantly different/better in this regard.

It’s mostly orthogonal. Though dynamic polymorphism does end up having
better compiler support for enforcing concepts. This is a common
theme in this discussion: the tooling support for dynamic polymorphism
is far better. That matters.

Agreed, it does. I’m a bit concerned it’s being too heavily weighted though.

Neither is the case here. We’re trying to cover
different pre-existing IRs, and even though all of them are SSA IRs,
the differences are substantial.

I guess that’s another thing that might benefit from further clarity: Currently we have generic graph traits, notionally that’s for any graph (collection of nodes and edges). The proposed abstraction is for CFG graphs, but due to the added layers of abstraction complexity, dynamic and static APIs, etc, it’s not been especially clear to me what makes this abstraction specific to/only for CFG graphs[1] (& then how that relates to IR I’m not sure either - control flow seems like it would be fairly independent of the values within that graph).

Maybe it would help if CfgTraits was renamed to SsaTraits? Or
SsaCfgTraits? Being able to do some limited inspection of the SSA
values (and instructions, really) is key to what we’re trying to do
here – there’s an example below.

SsaTraits, maybe? Not sure. But that is probably closer than CfgTraits, I think/guess.

If this was only about nodes and edges, I would have added dynamic
polymorphism on top of the existing GraphTraits.

Currently, we only really care about
LLVM IR and MachineIR in SSA form; their main differences is that the
concept of “value” is a C++ object in LLVM IR, while in MIR-SSA it’s a
plain unsigned int (now wrapped in a Register). MLIR would add yet
another way of thinking about values. The fact that we have to work
with those pre-existing IRs means we have to compromise.

Not sure I follow what compromises you’re referring to - if we have a container abstraction, some containers are over ints, some over objects, etc - the container can specify via typedefs, for instance, what type its elements are.

Simple example: Let’s say we have an SSA value and need to find the
basic block in which it is defined.

In LLVM IR, that’s value->getParent(), where value is an llvm::Value*.

In SSA-form MachineIR, that’s
m_machineRegInfo->getVRegDef(value)->getParent(), where value is an
llvm::Register.

In MLIR, that’s value.getParentBlock(), where value is an mlir::Value.

This kind of operation is needed for a complete divergence analysis,
and so it needs to be abstracted somehow.

Yeah, I understand their APIs are different now - but I think there might be value in trying to make them more common. If it weren’t for LLVM IR handling Value*s all the time, I’d say moving to structs for the value handles and having everything look roughly like MLIR. As that API change is probably too invasive for LLVM IR, having some kind of “module.getParent(value)” on each of these 3 entities seems feasible.

For this specific example - perhaps we could have a Module::getParent(Value*) function and a MIRModule (whatever that is) getParent(value) too. I agree it’s not perfect/maybe the LLVM IR one at least owuld only end up being used by SSA-generic code (though the MIR one might actually be more convenient/readable than the current version) so maybe it’s not so far from traits (if it’s only used for generic code, then it being in a generic code utility like a traits class wouldn’t be too surprising - but if it can make the MIR and LLMIR more commonly usable then I think that’s an improvement for generic code and for generic developers moving between these abstractions with less mismatch between the APIs/having to remember which set of APIs they’re using).

Second, none of this applies to dynamic polymorphism, which is really
the main goal here at least for me.

And one I’m trying to push back on - my hope is that a common set of abstractions (much like the container concepts in the STL) would be suitable here. To potentially subsume the existing GraphTraits (could have a “GraphTraits adapter” that gives a Graph-concept-implementing-container given GraphTraits) and potentially has layers to add on more expressive/narrower abstractions (whatever extra features are relevant to CFG graphs that aren’t relevant to all graphs in general([1] above - not super sure what those are)) - much like generic C++ containers, to sequential containers, random access containers, etc - various refinements of the general concept of a container.

What I’m especially interested in is not having two distinct concepts of graphs and graph algorithms with no plan to merge/manage these together, since as you say, the algorithms are complicated enough already - having two distinct and potentially competing abstractions in LLVM seems harmful to the codebase.

It seems to me that there are two orthogonal issues here.

One is whether dynamic polymorphism can be used or not. If you’re in
fundamental opposition to that, then we have a serious problem (a bit
on that further down). Dynamic polymorphism doesn’t contradict the
goal of a common set of abstractions.

Yep, indeed - I agree they’re two issues. If the 3 IRs implemented the same C++ concept we could still talk about whether or not to add a dynamically polymorphic API on top of them. (& conversely, we could have traits without dynamic polymorphism - ala GraphTraits).

The other is how exactly the abstractions are factored. For example,
on the dynamic polymorphism side we could have:

  • a GraphInterface class which provides methods for iterating
    predecessors and successors of CfgBlockRef (presumably we’d rename
    that to something like NodeRef?), and
  • an SsaCfgInterface class which provides additional methods for
    instructions and values.

The question is whether you’d be willing to support something like
that eventually, once we’ve hashed it out.

I don’t think I’d say I’m fundamentally opposed to the introduction of a dynamically polymorphic abstraction over IR - but I have serious doubts/concerns and I think adding it requires buy-in from a lot more folks before I think it should be accepted in LLVM. (coming back to edit this: May not be many more folks to reach out for opinions, I’ve already poked a few folks - so might at least get some more voices, but not many more I’m guessing)

Not sure where that leaves us. We need something like CfgTraits to
cover the substantial differences between the IRs, but perhaps we can
work over time to reduce those differences? Over time, instead of
relying on CfgTraits for everything in CfgInterfaceImpl, we could
instead use unified functionalities directly?

Some of that could probably already be done today. For example, we
could probably just use llvm::successors

By way of example, I /think/ the future we’d probably all want for llvm::successors would be to move towards entities having their own successors() function rather than the non-member/ADL-based lookup. (the same way that it’s far more common/readable/expected to use x.begin() rather than “using std::begin; begin(x);” construct, or wrapping that in some std::adl_begin(x) function)

Makes sense, though it doesn’t really respond to what I was writing
here. My point was about how the bridge between the world of dynamic
polymorphism and that of static polymorphism works.

Given how long GraphTraits has lived (admittedly, without any particular assessment that the direction was to deprecate/remove it by having more API compatibility built-in - though, admittedly, it’s abstracting over more different kinds of graphs than the new thing is likely to (owing to having fewer semantics so broader applicability) so it’s a bit more viable in its situation, even though even then it’s still a bit of a pain to work with) I’d be hesitant to start by adding a new thing like that with the expectation that it would be eliminated over time - I think it’s feasible/reasonable to frontload the work of making the relevant CFG graphs API compatible.

directly in
CfgInterfaceImpl::appendSuccessors (and your proposed
forEachSuccessor) instead of going via CfgTraits.

If you had a runtime polymorphic API over containers in C++, then it might look something
like this:

void print(const ContainerInterface& C, std::ostream& out) {
out << ‘{’;
bool first = true;
C.for_each([&](const auto &E) {
if (!first)
out << ", ";
first = false;
E.print(out);
});
out << ‘}’;
}

Yes, though it wouldn’t quite work like that with the IRs we have in
LLVM. Most of them go to great length to avoid embedding virtual
methods in the IR classes themselves. That’s why you’d more likely see
something like the following:

void print(const CfgInterface& iface, CfgBlockRef block, raw_ostream& out) {
out << "Successors of " << iface.printableName(block) << ‘:’;
iface.forEachSuccessor(block, [&](CfgBlockRef succ) {
if (first)
out << ’ ';
else
out << ", ";
first = false;
iface.printName(succ, out);
});
out << ‘\n’;
}

The CfgBlockRef / CfgValueRef / etc. appear in the design to avoid
having an abstract base class for the different IRs, and also to cover
the difference between LLVM IR Value * and MIR-SSA Register.

If a dynamically polymorphic API is needed/ends up being chosen, I’d /probably/ be inclined to try to add an abstract base class for blocks too (unclear about values) - but not wedded to it. In general I’m more pushing back on the dynamically polymorphic API in general than what it might look like if it exists. In part because of the complexities of having opaque handle objects that are passed out and back into APIs to model things, rather than being able to model them more directly. The APIs become a bit unwieldy because of that, in my opinion. (again, looking at GraphTraits awkwardness of load-bearing node pointers - though having the graph as a parameter (‘this’ parameter or otherwise) to the edge/successor/etc queries (rather than having the queries only based on the node) would go a long way to addressing that particular shortcoming if such a token/ref-passing API were needed)

In the end, it boils down to a tooling issue, which in turn is a C++
language issue. When you’re writing template code, you’re basically
throwing the type system out of the window. Sure, it comes back in
once you instantiate the template, but that doesn’t help if all the
complexity is in the template code. The kinds of algorithms we’re
trying to write have enough inherent complexity as it is. Asking us to
simultaneously fight the accidental complexity caused by how C++
templates work is unreasonable. The sentiment that C++ templates hurt
here has been shared in this discussion by people with experience
working on the dominator tree implementation (which is arguably a
simpler problem than what we’re trying to solve).

Keep in mind that it’s not just a question of current development but
also of maintainability going forward.

It would help to understand which APIs you think are becoming
unwieldy, because you may be imagining something misleading. For
example, consider the updated generic dominator tree. It has a
GenericDominatorTreeBase class which is type-erased and provides all
the usual query functions in terms of opaque handles. Then there is a
derived DominatorTreeBase<NodeT> class. If you’re writing a
non-generic algorithm, or a template-generic algorithm, you never
interact with GenericDominatorTreeBase directly. Instead, you interact
with DominatorTreeBase<NodeT>, which has all the usual methods where
you pass in NodeT* instead of opaque handles. You only interact with
GenericDominatorTreeBase directly if you’re writing a
dynamically-generic algorithm, where you consciously chose to enter
the world of opaque handles.

There ends up being some boilerplate for wrapping/unwrapping opaque
handles in DominatorTreeBase<NodeT>, but it’s pretty straightforward
and hidden from users of the class.

(FWIW, the updated dominator tree analysis itself is still implemented
as a template. I converted it to dynamic polymorphism as an
experiment, and there was one outlier test case with 1-2% compile time
performance loss. I haven’t pursued this further, and I don’t want to
convert this particular algorithm just for the sake of it. The update
to dominator trees is only intended to allow other, more complex
algorithms to be built on top of it using dynamic polymorphism.)

If you’re looking for an example of an algorithm that is written using
dynamic polymorphism, you could take a look here:
https://github.com/nhaehnle/llvm-project/blob/controlflow-wip-v7/llvm/lib/Analysis/GenericUniformAnalysis.cpp

I’d really like to see examples like this ^ using the different abstractions under consideration here (classic GraphTraits, CfgTraits dynamic and static typed, perhaps what a static API would look like if it wasn’t trying to be dynamic API compatible).

I don’t really care that much about what the static side of things
looks like. CfgTraits is the way it is on purpose because I felt it
would be useful to have a homogenous view of what generic aspects of
IR will ultimately be exposed by the CfgInterface. In the absence of
compiler-enforced concepts, that feels helpful to me. If you put
yourself in the shoes of an author of an algorithm that is supposed to
be generic over IRs,

Perhaps that’s part of my confusion - generic over IRs sounds like a far richer/more complicated API than generic over CFGs.

Yes, indeed :slight_smile:

you’ll realize that it’s great to have some easy
way of discovering which things really are generic over IRs.

Writing out the concept would be a good start, I think, same as the C++ standard does for containers. (not to the kind of C++ language-lawyerly level of detail, but the basic summary of the conceptual API - independent of any of the implementation choices so far considered - this is why I’d really like to see especially trivial/mock use cases, yeah, they won’t be representative of the full complexity of the API in use, but give a sense of what the semantics of the API are so we can consider different syntactic/implementation choices)

The complete set of operations we seem to need is in CfgInterface and
CfgPrinter. I do have some further development on it, where they look
as follows:

class CfgInterface {
virtual void anchor();

public:
virtual ~CfgInterface() = default;

/// Escape-hatch for obtaining a printer e.g. in debug code. Prefer to
/// explicitly pass a CfgPrinter where possible.
virtual std::unique_ptr makePrinter() const = 0;

virtual CfgParentRef getBlockParent(CfgBlockRef block) const = 0;

virtual void appendBlocks(CfgParentRef parent,
SmallVectorImpl &list) const = 0;

virtual bool comesBefore(CfgInstructionRef lhs,
CfgInstructionRef rhs) const = 0;

virtual void appendPredecessors(CfgBlockRef block,
SmallVectorImpl &list) const = 0;
virtual void appendSuccessors(CfgBlockRef block,
SmallVectorImpl &list) const = 0;
virtual ArrayRef
getPredecessors(CfgBlockRef block,
SmallVectorImpl &store) const = 0;
virtual ArrayRef
getSuccessors(CfgBlockRef block,
SmallVectorImpl &store) const = 0;

virtual void appendBlockDefs(CfgBlockRef block,
SmallVectorImpl &list) const = 0;
virtual CfgBlockRef getValueDefBlock(CfgValueRef value) const = 0;
};

class CfgPrinter {
virtual void anchor();

protected:
const CfgInterface &m_iface;

CfgPrinter(const CfgInterface &iface) : m_iface(iface) {}

public:
virtual ~CfgPrinter() {}

const CfgInterface &interface() const { return m_iface; }

virtual void printBlockName(raw_ostream &out, CfgBlockRef block) const = 0;
virtual void printValue(raw_ostream &out, CfgValueRef value) const = 0;
virtual void printInstruction(raw_ostream &out,
CfgInstructionRef instruction) const = 0;

Printable printableBlockName(CfgBlockRef block) const {
return Printable(
[this, block](raw_ostream &out) { printBlockName(out, block); });
}
Printable printableValue(CfgValueRef value) const {
return Printable(
[this, value](raw_ostream &out) { printValue(out, value); });
}
Printable printableInstruction(CfgInstructionRef instruction) const {
return Printable([this, instruction](raw_ostream &out) {
printInstruction(out, instruction);
});
}
};

Iterating over this design in-tree seems perfectly viable to me.

The
comparison with containers is slightly lacking because the potential
surface area is so much larger. But perhaps we can iterate on the
design and find a better way.

Larger surface area somewhat concerns me and why I’d like to see more sort of example usage to get a better sense of what this is expected to abstract over now and in the future.

See the example GenericUniformAnalysis I linked to above.

I think some smaller examples of the different API options would be really helpful for me to understand the tradeoffs here.

This is kind of a restatement of what I wrote above: If we can get
general agreement in the community that there is a desire that the
different IRs in LLVM follow common concepts “directly”, then we can
iterate towards that over time. I’m personally convinced that unifying
the various IRs is a worthy goal (the introduction of MLIR really was
an unfortunate step backwards as far as that is concerned), it just
feels like it’d be a thankless project because it would go through
many reviews that feel like the one that triggered this thread.

Perhaps - though getting design review/directional buy-in first can go a long way to making incremental reviews much less contentious/easier to commit. And a design that can be implemented via incremental improvement can also mean smaller patches that are easier to review/commit.

If people were paying attention, sure :slight_smile: Unfortunately, that’s not how
things seem to work in practice…

I’m all for smaller patches as well, which is part of why I think
in-tree iteration is better.

I meant for a different direction - if we got buy-in for making LLVM IR and MIR implement the same C++ concept (& wrote down what that concept was - which APIs, how they look textually) then I think it would be relatively easy to get the small/mechanical renaming patches reviewed. Incremental from day 1, not big new abstraction, then incremental.

  • Dave

Hi David,

Thanks for taking the time to clear up some things more directly. I've
reverted the change(s) for now.

One issue I'm having with this discussion if I'm being honest is that
despite all the back and forth I _still_ barely know what your opinion
is, or whether you even have one.

Ah, sorry - I'll try to state a few parts of my views more directly:

Most importantly, this seems like a new fairly significant abstraction intended for use when writing LLVM analyses (but not transformations, I assume?)

Yes, that's correct. My thinking is that (1) trying to use this for
transformations would increase the surface area by far too large an
amount, and (2) the different IRs correspond roughly to different
parts of the compiler pipeline, and transformations tend to have a
specific point in the pipeline where they "want" to live. (There are
arguable exceptions like InstCombine, but that's far too big a rock to
address here.)

- more heavyweight (no matter the implementation choice) than GraphTraits, which already is itself showing its weight (I don't think it would've been designed the way it is if the use cases/amount of use were known when it was implemented - and I think it's probably worth a bit of a revisit at this point, as Alina and I have discussed a fair bit around the DomTree work). Given its intended use I think it merits more scrutiny than it has had - that's why I'd like it reverted so we can do that.

I think the RFC and patch probably didn't receive as much review/attention as they were due may be in part due to the thread title/naming - as we've noted in this thread, "CFG" doesn't, I think, capture enough of what this abstraction is intended to cover (a CFG is a graph with certain semantics about what it represents - a CFG doesn't necessarily have values/SSA/etc). I think if a thread were proposed "A common abstraction over LLVM IR and MLIR for analyses" it might garner more attention from relevant developers (not necessarily, but at least a better chance/easier to explain to people why they should care about the design of this API).

My initial thinking was that this was closer to GraphTraits - and some of the descriptions around what makes GraphTraits difficult to work with and why that motivates a solution involving dynamic polymorphism to me, sounded not especially C++ idiomatic - not that I think we should write everything with generic templates and layers of C++ metaprogramming - but I do want to make sure the API is fairly idiomatic to C++ rather than, by the sounds of part of the description, butting up against those design concepts and abstractions.

To the actual design: I think traits are probably not the best idea - I think we've seen that with GraphTraits being hard to use/unweildy/limiting (not entirely a generic problem with traits, partly a problem with the way GraphTraits are implemented (eg: edge queries only taking a node, instead of a graph and a node)). They're useful when an API is immutable and you need to adapt things (see iterator traits), but LLVM is quite mutable and moving LLVM IR and MIR to have more common API syntax seems both good for generic algorithms and good for developers even when they're not writing template-generic code.

With a syntactically common API (ie: I don't mean shared functions, but API with the same names, etc - ie: with LLVM IR and MIR implementing the same C++ concept, insofar as is needed for analyses) - I think it'd be fairly sensible/easy enough to write analyses in terms of that concept (same as writing C++ container-generic algorithms such as std::sort, etc) - but I'd also be open to discussing a runtime polymorphic API on top, though I do want to push back on that a fair bit due to some of the inherent complexity needed to balance performance (eg: cost of manifesting edge lists/use lists, rather than being able to iterate them directly) and API complexity (eg: same thing, but from a usability perspective - not having the more common iterator api, instead a "populate this vector" api, but iterator solutions here also probably wouldn't be great due to the runtime overhead of type erased iterators, could potentially use "narrower" iterators that don't implement the C++ iterator concept, but instead just a single virtual "Optional<Value> Next()" function, for instance - reducing the number of virtual calls compared to type erased iterators).

I think I asked in one of my early rounds of review, I think it'd be really helpful to have simple (not real-world algorithms, but trivial code snippets) examples of how code walking the CFG's features would look under different implementations, eg:
1) raw writing against the two abstractions, LLVM IR and MIR - two completely separate functions that exercise all the common features that are of interest
2) a potential common API concept and the trivial walking algorithm in the form of a template over both IR types
3) CfgTraits
4) CfgInterface

I think that'd help ground the discussion in the proposal(s) and make it easier to see the tradeoffs.

I'm going to try to take this block as a whole rather than cut it to pieces.

First, the motivation for doing dynamic polymorphism has always been
about tooling, programmer convenience, and maintainability. Here's an
example:

  struct DynamicExample {
    SmallVector<CfgValueRef, 4> vec;

    void foo(CfgValueRef v, unsigned w) {
      vec.push_back(w);
    }
  };

  template <typename SomeTraits> struct StaticExample {
    using ValueRef = typename SomeTraits::ValueRef;

    SmallVector<ValueRef, 4> vec;

    void foo(ValueRef v, unsigned w) {
      vec.push_back(w);
    }
  };

DynamicExample and StaticExample are trying to do the same thing and
contain the same error. A decent IDE will flag the error in the
DynamicExample but not in the StaticExample, where the earliest point
where a tool will inform you about the error is when the compiler
tries to instantiate the template. I say the earliest point, because
StaticExample can be successfully instantiated if SomeTraits refers to
MachineIR, in which case ValueRef is a Register and so an implicit
conversion from unsigned is possible. The error message you get from
the compiler in the StaticExample case is also worse (admittedly not
by much in this simple example).

Obviously one can have different opinions about how bad these
downsides of the static approach are. I personally found it a huge
relief to be able to focus on the actual algorithms.

One thing I'd like to emphasize is that the example doesn't contain
any _real_ surface area of the traits. That is on purpose: in the code
I'm writing using this machinery, the fraction of code that directly
invokes the API of the IR is very small. It's grating when a large
body of code is forced to pay the price of worse tooling because of a
comparatively small part of it.

I definitely appreciate the problem of iteration. At one point in the
development of the various algorithms, I wanted to do a DFS of the
block graph. llvm::depth_first obviously can't be used on CfgBlockRef
as-is. This could be addressed if the GraphTraits API was changed to
be aware of a graph object, as you alluded to. I didn't go down that
path because I didn't want to change too much at once. As of right
now, I don't even need (unmodified) DFS any more, but it's worth
considering.

That said, I do agree with you that just directly being able to use
llvm::successor-style iteration (or block->successors()) would be
best. My problem with it is that this is only compatible with dynamic
polymorphism / type erasure if all relevant basic block types have a
common base class. You hedged your language a bit when you talked
about this further down, but if you think this is an option we can
seriously pursue, then I'd actually be very happy to do so because
it's clearly a better solution. This would also significantly reduce
the surface area where dynamic method dispatch is needed in the first
place, so perhaps it would make dynamic polymorphism more palatable to
you.

Specifically, I think it's plausible that we could introduce common
base classes for basic block and instruction concepts for all of {LLVM
IR, MLIR, MachineIR}. This would be far more invasive than the change
I proposed, but it _would_ be better. I don't think having a common
base class for values is feasible in a reasonable timeframe, because
the IRs are just way too different in that area. The surface area
could be reasonably limited though, to basically:

- Value -> Instruction that defines it
- Instruction -> operand Values

Speaking of reasonable timeframes, do you have any thoughts on that?
We have a lot of important changes riding on this ability, including
superficially unrelated ones like GlobalISel, as I mentioned before. I
don't want to rush things artificially, but at the same time I'd
appreciate it if we didn't let the perfect be the enemy of the good.
For example, further down you mention massaging LLVM to use something
more like the mlir::Value structs. I'd be totally on board with that,
having had the same idea myself already; but we can't wait on such a
large change.

We may be better off just stopping this email thread at this point ^ and discussing that stuff, rather than going down to the point-by-point discussion later in this email

Yeah, that's reasonable.

Cheers,
Nicolai

(I've debated not replying to the rest of the email to try to focus
the discussion - these really long emails get exponentially costlier
for me from both a practical time perspective, but also on a deeper
level in terms of how it feels to try to write another long-winded
reply).

I've talked about this a bit offline a bit with Alina and David, and
am going to follow up with a new from-scratch RFC on llvm-dev.

Cheers,
Nicolai

I agree with Eric here. There is clearly controversy w.r.t. this patch and regardless of what the formally written policy is, it seems best to revert this and resolve the outstanding issues. Keeping it in tree will cause more things to be built on top of it, and over constrains the discussion about getting to the best possible design.

Sean’s point is also excellent: often it is best to pop out of email and into a higher bandwidth communication form (voice, video chat, etc) when hitting controversy like this.

-Chris

I agree with Eric here. There is clearly controversy w.r.t. this patch and regardless of what the formally written policy is, it seems best to revert this and resolve the outstanding issues. Keeping it in tree will cause more things to be built on top of it, and over constrains the discussion about getting to the best possible design.

Sean’s point is also excellent: often it is best to pop out of email and into a higher bandwidth communication form (voice, video chat, etc) when hitting controversy like this.

Yep, Nicolai’s reverted the patch (see: http://lists.llvm.org/pipermail/llvm-dev/2020-October/146107.html ) - the conversation’s got a bit fragmented between the review thread, new RFC thread, and rename of that thread) and Alina, Nicolai, and myself chatted offline on Thursday - think we came to some pieces of common understanding. Nicolai’s going to work up a new RFC ( https://reviews.llvm.org/D83088#2365320 , http://lists.llvm.org/pipermail/llvm-dev/2020-October/146258.html ) we can hopefully get a few eyes on/clarify some things to make progress.

  • Dave

Fantastic, thank you everyone!

-Chris