[RFC] Polly Status and Integration

Michael,
[Sorry that I don't have you in To:. I don't have an address for you that I can use since I'm replying through the digest.]

I'm also sorry that I'm not commenting on the main part of your RFC in this reply. I just want to focus on
one thing here.

                      Proposed Loop Optimization Framework

Thank you for bringing in results from previous discussions. The only
resource I was aware of was the EuroLLVM talk where I got the
impression this was done for performance reasons. Do you recall the
arguments why it was considered a bad idea?

I understand that modifying the IR speculatively might not be the best
thing to do. However, I also think that another meta-layer
representing instructions increases complexity and duplicates
algorithms when we could just reuse the data structures and algorithms
that already exist and have matured.

Michael

Do you recall the arguments why it was considered a bad idea?

It felt like a long time ago, but it turns out that was actually just a bit over a year ago.
Here's the thread.
  http://lists.llvm.org/pipermail/llvm-dev/2016-August/104079.html
Only a few explicitly responded, but I took that to mean the silent majority was satisfied with the responses.
Prior to that thread, I also pinged HPC oriented LLVM developers, and "don't modify IR until deciding
to vectorize" resonated well there, too.

However, I also think that another meta-layer representing instructions increases complexity and duplicates algorithms when we could just reuse the data structures and algorithms that already exist and have matured.

I agree. So, I dream about a lighter weight version of the Value class hierarchy where many of the standard Value class hierarchy algorithms can run, without hooking the lighter weight objects into the actual IR state.
We intentionally tried to make the VPValue/VPUser/VPInstruction interfaces a "subset of" the Value/User/Instruction interfaces so that code sharing (not copying/pasting) is possible. This is practical enough
so far, but I'm still wondering, in the long run, whether we can do something better. Thus, constructive ideas are welcome.

Thanks,
Hideki
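[Editorial sketch] The "subset of the Value/User/Instruction interfaces" idea lends itself to a small standalone C++ illustration. None of the class names below are the real LLVM or VPlan types; they are invented stand-ins showing how a template algorithm written against the shared interface subset runs on both hierarchies without copy/paste:

```cpp
#include <string>
#include <vector>

// Hypothetical heavyweight "IR" value: imagine use lists, metadata, etc.
struct FakeValue {
  std::string Name;
  std::vector<FakeValue *> Users;
  const std::string &getName() const { return Name; }
  unsigned getNumUsers() const { return (unsigned)Users.size(); }
};

// Hypothetical lightweight analysis-only value exposing the *same subset*
// of the interface, but with no ties to any global IR state.
struct FakeVPValue {
  std::string Name;
  std::vector<FakeVPValue *> Users;
  const std::string &getName() const { return Name; }
  unsigned getNumUsers() const { return (unsigned)Users.size(); }
};

// Because the interfaces match, one template works on both hierarchies:
// code sharing instead of copying/pasting the algorithm.
template <typename ValueT>
unsigned countDeadValues(const std::vector<ValueT *> &Vals) {
  unsigned Dead = 0;
  for (const ValueT *V : Vals)
    if (V->getNumUsers() == 0)
      ++Dead;
  return Dead;
}
```

Any algorithm that touches only the shared subset (here `getNumUsers()`) is written once; the lightweight hierarchy never needs to hook into the IR.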

FWIW: We hit a subset of this issue with MemorySSA (which subclasses value for the MemoryAccess’s, etc), and it was discussed as well during PredicateInfo.

NewGVN has a variant of the same issue as well, where it actually creates unattached (i.e., not in a basic block) new instructions just so it can analyze them.

IMHO, some nice way to make virtual forms over the instructions that doesn't require reimplementing the tons of existing functionality that works with Value would be much appreciated.

But maybe the right answer at some point is to just sit down and build out such an infrastructure. It certainly sounds like there are enough clients.

2017-10-14 1:29 GMT+02:00 Saito, Hideki via llvm-dev <llvm-dev@lists.llvm.org>:
> I'm also sorry that I'm not commenting on the main part of your RFC in this reply. I just want to focus on
> one thing here.
>
> Proposed Loop Optimization Framework
> ------------------------------------
> Only the architecture is outlined here. The reasons for them can be
> found in the "rationales" section below.
>
> A) Preprocessing
> Make a copy of the entire function to allow transformations only for
> the sake of analysis.
>
> Before we started our journey into the VPlan-based vectorization approach, we explicitly asked about modifying the IR for
> the sake of vectorization analysis ----- the general consensus in LLVM-DEV was "that's a bad idea". We thought so, too.
> That's the reason VPlan has its own local data structure that it can play with.
>
> Making a copy of the entire function is far worse than copying a loop (nest). Unless the community mindset has dramatically
> changed since ~2yrs ago, it's unlikely that you'll get overwhelming support on "copy the entire function" before the decision
> to transform is taken. So, if this part is optional, you may want to state that.
>
> Having said that, in https://reviews.llvm.org/D38676, we are introducing concepts like VPValue, VPUser, and VPInstruction,
> in order to manipulate and interact with things that didn't come from the IR of the original loop (nest). As such, I can see
> why you think "a playground copy of IR" is beneficial. Even before reading your RFC, I was actually thinking that someone
> else may also benefit from a VPValue/VPUser/VPInstruction-like concept for the analysis part of some heavy-weight transformations.
> Ideally, what we wanted to do was to use LLVM's Value class hierarchy if we could generate/manipulate "abstract Instructions"
> w/o making them part of the IR state, instead of creating our own light-weight VPValue/VPUser/VPInstruction classes.
> If anyone has better ideas in this area, we'd like to listen. Please reply either to this thread or through the code review
> mentioned above.
Thank you for bringing in results from previous discussions. The only
resource I was aware of was the EuroLLVM talk where I got the
impression this was done for performance reasons. Do you recall the
arguments why it was considered a bad idea?

I understand that modifying the IR speculatively might not be the best
thing to do. However, I also think that another meta-layer
representing instructions increases complexity and duplicates
algorithms when we could just reuse the data structures and algorithms
that already exist and have matured.

I have a feeling that the LLVM LoopInfo/Loop datastructure, which only
focuses on the CFG, is not sufficient.
And we build extra layers upon the LLVM LoopInfo/Loop, e.g. Scop in Polly
and the Hierarchical CFG in VPlan (right?).

Maybe we can have layers between the existing LoopInfo/Loop and VPlan/SCoP:

1. LoopInfo/Loop
2. LoopInfo/Loop + Hierarchical CFG
3. LoopInfo/Loop + Hierarchical CFG + Memory Accesses description
4. VPlan/Scop

I think layers 2 and 3 would be very useful even outside of Polly and VPlan.

Hongbin

What would be different in such an infrastructure? IMHO the
llvm::Value hierarchy is already relatively thin, e.g.
removing/inserting an instruction just requires updating a linked
list.

Michael
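[Editorial sketch] Michael's point that removing or inserting an instruction is just linked-list surgery can be illustrated with toy types (not the actual llvm::Instruction list machinery): when instructions live in a node-based list, moving one between blocks is constant-time pointer relinking, loosely analogous to `llvm::Instruction::moveBefore()`:

```cpp
#include <iterator>
#include <list>
#include <string>

// Toy model: a "basic block" is a linked list of instructions, as in LLVM.
struct ToyInst {
  std::string Text;
};

using ToyBlock = std::list<ToyInst>;

// Move the instruction at `It` from `From` to the end of `To` in O(1).
// splice() relinks the existing node; no instruction is copied, and
// iterators/pointers to the moved instruction stay valid.
ToyBlock::iterator moveToEnd(ToyBlock &From, ToyBlock::iterator It,
                             ToyBlock &To) {
  To.splice(To.end(), From, It);
  return std::prev(To.end());
}
```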

The idea is also that the new hierarchy can be modified iteratively
and IR only generated at the end when all transform decisions have
been made, and analyses/layers be shared between transformations, e.g.
a polyhedral dependency analysis, when available, can be used by the
vectorizer without additional programming because the dependency data
structure is the same. PolyhedralInfo currently is very different from
LoopAccessAnalysis.

Michael


I think most people would disagree with "Relatively thin".
Given there have also been multiple threads in the past year with most
people not wanting non-actual-IR to derive from Value, I think you may be
in the minority here.

I think the way that we generally look at this is: Our IR data types (Value, User, Instruction, Constant, and so on) are the most efficient representation we have of the IR. There are a lot of capabilities that come along with that hierarchy (use lists, value handles, metadata, names). Value has a non-inline destructor to deal with all of these capabilities, as does Instruction. Each individual instance of these types allocates memory.

The question is then: For any particular use case, do you need those capabilities/properties? If not, is there a more succinct representation that can be used efficiently?

-Hal

There's also the case where we want *more* information than what the
IR constructs can provide us (like temporary state for the
vectoriser), in which case we can easily wrap the IR values in a small
class with a pointer to the Instruction/Value and the extra bits.
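[Editorial sketch] The "small class" pattern Renato describes can be shown in a few lines of standalone C++. The names and the stand-in instruction type below are invented for illustration: the IR stays untouched, and transformation-local state (a candidate vectorization factor, a legality bit) rides alongside a plain pointer:

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::Instruction; opaque for this sketch.
struct Instr {
  int Opcode;
};

// Wrapper carrying "extra bits" next to a pointer to the instruction.
// The wrapper owns nothing and never modifies the IR.
struct VecCandidate {
  Instr *I = nullptr; // borrowed pointer into the IR
  uint8_t VF = 1;     // temporary state for the vectoriser
  bool Legal = false; // extra bit that doesn't belong in the IR
};

// An analysis can decorate instructions without touching them.
VecCandidate analyze(Instr &I) {
  VecCandidate C;
  C.I = &I;
  C.Legal = (I.Opcode != 0); // toy legality rule, illustration only
  C.VF = C.Legal ? 4 : 1;
  return C;
}
```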

However, the current proposal for the VPlan goes well beyond that:

https://reviews.llvm.org/D38676

and makes VPValue and VPInstruction full-fledged implementations that
have some parts in common with what we already have (like User).

I think we need to get that story right for both cases up front.

cheers,
--renato

> I have a feeling that the LLVM LoopInfo/Loop datastructure, which only focuses
> on the CFG, is not sufficient.
> And we build extra layers upon the LLVM LoopInfo/Loop, e.g. Scop in Polly and
> the Hierarchical CFG in VPlan (right?).
>
> Maybe we can have layers between the existing LoopInfo/Loop and VPlan/SCoP:
>
> 1. LoopInfo/Loop
> 2. LoopInfo/Loop + Hierarchical CFG
> 3. LoopInfo/Loop + Hierarchical CFG + Memory Accesses description
> 4. VPlan/Scop
>
> I think layers 2 and 3 would be very useful even outside of Polly and VPlan.

The idea is also that the new hierarchy can be modified iteratively
and IR only generated at the end

In general 1-4 are different representations of the same program, or a set
of equivalent programs (at different levels of abstraction), and eventually
we need to properly connect the different representations together such that we
can materialize the changes from one representation to another.
Maybe we could have a common infrastructure built into the Value class
hierarchy to allow us to easily connect LLVM IR to different
representations (e.g. SCoP or VPlan) and propagate the changes back?
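[Editorial sketch] One way to picture "edit a side representation, then propagate the changes back", with names invented for illustration: each shadow node keeps a link to its IR origin, all edits happen on the shadows, and a single write-back step runs only once all transform decisions have been made:

```cpp
#include <string>
#include <vector>

// Stand-in for an IR entity; the real IR stays untouched until the end.
struct IRNode {
  std::string Text;
};

// A node in the side representation, linked back to its origin.
struct ShadowNode {
  IRNode *Origin;      // connection back to the IR
  std::string NewText; // speculative edit, applied only on write-back
};

// Build the side representation over an existing (stable) IR container.
std::vector<ShadowNode> buildShadow(std::vector<IRNode> &IR) {
  std::vector<ShadowNode> S;
  for (IRNode &N : IR)
    S.push_back({&N, N.Text});
  return S;
}

// Called exactly once, after all transform decisions have been made:
// propagate the shadow edits back into the IR.
void materialize(const std::vector<ShadowNode> &S) {
  for (const ShadowNode &N : S)
    N.Origin->Text = N.NewText;
}
```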

when all transform decisions have
been made, and analysis/layers be shared between transformations, eg..
a polyhedral dependency analysis, when available, can be used by the
vectorizer without additional programming because the dependency data
structure is the same

One way to achieve this is to design an interface like Alias Analysis, to
hide different implementations behind a "common" interface.
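[Editorial sketch] That AliasAnalysis-style layering might look like the following, with every name invented for illustration (this is not an actual LLVM interface): clients such as a vectoriser program against an abstract dependence interface, and a conservative or a polyhedral-style implementation can be swapped in behind it:

```cpp
#include <string>

// A query about whether the destination may depend on the source.
struct DepQuery {
  int SrcId, DstId;
};

// The "common" interface that hides the implementation, in the spirit
// of how AliasAnalysis hides its providers.
class DependenceInfoBase {
public:
  virtual ~DependenceInfoBase() = default;
  virtual bool mayDepend(const DepQuery &Q) const = 0;
  virtual std::string name() const = 0;
};

// Fallback implementation: conservatively assume everything depends.
class ConservativeDeps : public DependenceInfoBase {
public:
  bool mayDepend(const DepQuery &) const override { return true; }
  std::string name() const override { return "conservative"; }
};

// Stand-in for a more precise (e.g. polyhedral) analysis.
class ToyPolyhedralDeps : public DependenceInfoBase {
public:
  bool mayDepend(const DepQuery &Q) const override {
    return Q.SrcId == Q.DstId; // toy rule, illustration only
  }
  std::string name() const override { return "polyhedral"; }
};

// A client (e.g. the vectoriser) uses only the interface, so a better
// analysis improves it with no additional programming.
bool canReorder(const DependenceInfoBase &DI, int A, int B) {
  return !DI.mayDepend({A, B});
}
```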

I think we need to get that story right for both cases up front.

Renato, I kicked off this secondary discussion, borrowing the opportunity from Michael's RFC,
but to the point of reviewing https://reviews.llvm.org/D38676, I'd like the review to proceed
separately from this bigger (and most likely longer) discussion. We intentionally made the interfaces
similar such that whatever the outcome of this discussion would be, the changes we have to make later,
if any, are small and mechanical. We just need to agree that
  VPValue/VPUser/VPInstruction
is not a precedent, i.e., it is still subject to ongoing discussion and is expected to abide by the eventual
outcome of this discussion.

To the best of my understanding, if we do not want to modify the IR (i.e., CFG, Instructions, and Uses/Defs hooked up
in the Function) before we decide to let vectorizer transform (i.e., cost modeling phase of LV), we really don't have anything
that we can use today in the LLVM infrastructure. If the collective wisdom concludes in favor of investing in that, we are more than
happy to contribute our share of effort, but that longer term work (one year is probably too optimistic) should not block shorter
term development of vectorizer.

More inputs from other optimizers would greatly help build up the context of this discussion. Please speak up
if you felt similarly to us in the past. I'll be at the LLVM Conference if anyone is interested in in-person discussions.

Thanks,
Hideki

(I have no beef with using Value, FWIW)
I believe one of the objections is, for example, that lib/IR should not have to depend on, say, lib/Analysis, which it does if you extend Value ;)
I'm more surfacing the objections I've heard from others than ones I have myself.
I'll forward this thread to them and let them speak for themselves if they want.

Disclaimer: I haven’t been following this discussion closely, nor do I know what is going on in the modern AA/MemorySSA/GVN infra, but:

I’d prefer to avoid abusing the Value* class hierarchy if reasonable. Adding new subclasses that only occur in special cases makes it more difficult to reason about what can occur and when. A historical example that always grated against my sensibilities was CodeGen/PseudoSourceValue, which originally inherited from Value in order to make it “fit better” into the AA infra. Thankfully Nick Lewycky broke that tie back in a patch on Tue Apr 15 07:22:52 2014, and the codebase has been better off for that change.

All this is to say that I’d prefer to avoid tying it into Value just because that is an expedient way to get something done. If tying into Value is the “right” thing to do for some reason then that is a completely different story.

-Chris

The idea I was proposing is to move instructions to different places
because it helps some kinds of analyses, but could be more expensive
to execute there. This could also be undone again by passes such as
GVN and LICM. Would this also be an abuse of the IR? It does not
involve extending the llvm::Value hierarchy.

This is different for VPlan. VPRecipe has its own class hierarchy
(VPRecipeTy), but I am not sure whether one could construct a 1:1
relationship between them and vectorized IR instructions.

Michael

one could construct a 1:1 relationship between them and vectorized IR instructions.

Doesn't have to be 1:1 to the output widened IR instruction sequences as long as cost modeling (inside vectorizer and also hopefully downstream
optimizers) can be done properly.

An earlier study in the "Vector Idioms" study group (loosely formed as a result of the 2016 Dev Con Hackers Lab; it had active e-mail discussions in Q1/Q2) found
that LLVM already handles multiple forms of some of the common idiomatic forms (e.g., ABS). As such, we semi-concluded that having a utility to
capture all the different sequences of the same idiom is more important than trying to come up with a single "canonical form" and trying to preserve it
---- as long as the complexity of the utility is reasonable. Such a utility should also help in this context (i.e., inside the vectorizer and downstream optimizers).
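[Editorial sketch] To make the "one utility, many spellings" idea concrete, here is a toy structural matcher over an invented expression type (this is not LLVM's PatternMatch infrastructure) that recognizes two distinct forms of the ABS idiom behind a single entry point:

```cpp
// Toy expression tree standing in for IR-like data.
enum class Op { Var, Neg, Select, LessZero, Max };
struct Expr {
  Op Kind;
  const Expr *L = nullptr, *R = nullptr, *C = nullptr;
};

static bool isNegOf(const Expr *E, const Expr *X) {
  return E && E->Kind == Op::Neg && E->L == X;
}

// Recognize two distinct spellings of the ABS idiom:
//   select(x < 0, -x, x)   and   max(x, -x)
// Callers never need to know which form was present; on success,
// `X` is set to the operand whose absolute value is computed.
bool matchAbs(const Expr &E, const Expr *&X) {
  if (E.Kind == Op::Select && E.C && E.C->Kind == Op::LessZero &&
      isNegOf(E.L, E.C->L) && E.R == E.C->L) {
    X = E.R;
    return true;
  }
  if (E.Kind == Op::Max && isNegOf(E.R, E.L)) {
    X = E.L;
    return true;
  }
  return false;
}
```

A real version would match IR instruction patterns, but the shape is the same: each new spelling of the idiom is added inside the utility, not at every call site.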

If tying into Value is the “right” thing to do for some reason

In some sense, we are trying the opposite of the historical example ------ start from the untied version and see whether the rest of the community thinks
tying is the "right" thing to do, at any moment of the evolution of the untied version. For that, we will try to keep the interface as identical as possible
such that the cost of tying back, if such a time ever comes, is minimal ---- and the template-based code sharing idea can also be kept open that way if we aren't
completely tying.

Thanks,
Hideki

Renato, I kicked off this secondary discussion, borrowing the opportunity from Michael's RFC,
but to the point of reviewing https://reviews.llvm.org/D38676, I'd like the review to proceed
separately from this bigger (and most likely longer) discussion. We intentionally made the interfaces
similar such that whatever the outcome of this discussion would be, the changes we have to make later,
if any, are small and mechanical. We just need to agree that
        VPValue/VPUser/VPInstruction
is not a precedent, i.e., it is still subject to ongoing discussion and is expected to abide by the eventual
outcome of this discussion.

Agreed.

To the best of my understanding, if we do not want to modify the IR (i.e., CFG, Instructions, and Uses/Defs hooked up
in the Function) before we decide to let vectorizer transform (i.e., cost modeling phase of LV), we really don't have anything
that we can use today in the LLVM infrastructure. If the collective wisdom concludes investment into that, we are more than
happy to contribute our share of effort, but that longer term work (one year is probably too optimistic) should not block shorter
term development of vectorizer.

We don't want to change the IR, no.

My point was to take whatever already exists from Value and do the
rest in VPValue (same for Inst), but I don't know if that's desirable,
I was just raising the issue.

cheers,
--renato

My point was to take whatever already exists from Value and do the rest in VPValue (same for Inst), but I don't know if that's desirable, I was just raising the issue.

Please continue to monitor our future patches as we expand VPInstruction usage (we'll be sure to include you) and provide valuable feedback.

FWIW, for example, at this point, we don't think it's feasible to derive VPInstruction from llvm::User, since we need to create VPUser class
instances; but as we evolve VPInstruction usage in future patches, we'll look into such a possibility and come back to report our findings.

Thanks,
Hideki

Please continue to monitor our future patches as we expand VPInstruction usage (we'll be sure to include you) and provide valuable feedback.

Will do, thanks!

FWIW, for example, at this point, we don't think it's feasible to derive VPInstruction from llvm::User, since we need to create VPUser class
instances; but as we evolve VPInstruction usage in future patches, we'll look into such a possibility and come back to report our findings.

Indeed, and the VP* classes are, so far, simple and unique enough that
we can cope with it. It will be much easier to experiment and grow
around the current set of classes than try to map all possible routes
now and get a very stiff representation that not only we can't do
anything we want but involves cloning large portions of memory for no
benefit.

cheers,
--renato