Scalable Vector Types in IR - Next Steps?

Hi folks,

We seem to be converging on how the representation of scalable vectors
will be implemented in IR, and we also have support for such vectors
in the AArch64 back-end. We're also fresh out of the release process
and have a good number of months to hash out potential problems before
the next release. What are the next steps to get this merged into trunk?

Given this is a major change to IR, we need more reviews from core
developers before approving. The current quasi-consensus means now is
the time for you to take a closer look. :)

This change per se shouldn't change how any of the passes or lowering
behave, but it will introduce the ability to break things in the
future. Unlike the "new pass manager", we can't create a "new IR", so
it needs to be a change that everyone is conscious of and willing to
take on, with the project committed to stabilizing it before the next
release.

Here are some of the reviews on the matter, mostly agreed upon by the
current reviewers:
https://reviews.llvm.org/D32530
https://reviews.llvm.org/D53137
https://reviews.llvm.org/D47770

And the corresponding RFC threads:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106819.html
http://lists.llvm.org/pipermail/llvm-dev/2017-March/110772.html
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113587.html
http://lists.llvm.org/pipermail/llvm-dev/2018-April/122517.html
http://lists.llvm.org/pipermail/llvm-dev/2018-June/123780.html

There is also an ongoing discussion about vector predication, which is
related to, but not dependent on, the scalable representation in IR:
https://reviews.llvm.org/D57504

And the corresponding RFC thread:
http://lists.llvm.org/pipermail/llvm-dev/2019-January/129791.html

cheers,
--renato

Hi all,

Thanks Renato for the prod.

We (Arm) have had more off-line discussions with some members of the
community and they have expressed some reservations about adding scalable
vectors as a first-class type. They have proposed an alternative to enable
support for C-level intrinsics and autovectorization for SVE.

While Arm's preference is still to support VLA autovec in LLVM (and not just
for SVE; we'll continue the discussion around the RFC), we are evaluating the
details of this alternative -- SVE-capable hardware will begin shipping within
the next couple of years, so we would like to support at least some
autovectorization as well as the C intrinsics by the time that happens.

This alternative proposal has two parts:

  * For the SVE ACLE (C-language extension intrinsics), use an opaque type
    (similar to x86_mmx, but unsized) and just pass intrinsics straight
    through to the backend. This would give users the ability to write
    vector length agnostic (VLA) code for SVE without resorting to assembly
    (sketched in IR below).
  * For SVE autovectorization, use fixed-length autovec -- either for a
    user-specified length, or multiversioned to different fixed lengths.
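
To make the first part more concrete, here's a rough sketch of what the
pass-through IR might look like. This is not taken from the actual patches;
the 'svedata'/'svepred' type names and the intrinsic names are hypothetical
placeholders for whatever the opaque types and intrinsics end up being
called:

  ; Two hypothetical opaque IR types for SVE data and predicate values,
  ; plus intrinsics that are passed straight through to the backend.
  declare svepred @llvm.sve.hypothetical.ptrue()
  declare svedata @llvm.sve.hypothetical.fadd(svepred, svedata, svedata)

  ; A VLA add written purely in terms of the intrinsics; mid-end passes
  ; treat the calls as opaque and the backend selects SVE instructions.
  define svedata @vla_fadd(svedata %a, svedata %b) {
    %pg  = call svepred @llvm.sve.hypothetical.ptrue()
    %sum = call svedata @llvm.sve.hypothetical.fadd(svepred %pg, svedata %a,
                                                    svedata %b)
    ret svedata %sum
  }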

I've spent some time over the last month prototyping an opaque-type SVE C
intrinsic implementation to see what it would look like; here are my notes so far:

  * I initially tried to use a single unsized opaque type.

  * I ran into problems with using just a single type, since predicates use
    different registers and I couldn't find a nice way of reassigning all
    the register classes cleanly.
    - As a result, I added a second opaque type to represent predicates
    - This could be avoided if we added subtype info to the opaque type
      (basically minimum element count and element type); this would mean
      that we would either need to represent the count and element type in
      a serialized IR form, or that the IR reader would need to be able to
      reconstruct the types from the intrinsic names

  * I ran into a problem with the opaque types being unsized -- the C
    intrinsic variables are declared as locals and clang emits
    alloca/load/store IR instructions for them
    - We could special-case alloca/load/store for these types, but that's
      very intrusive and liable to break in the future
    - We could introduce a special 'alloca intrinsic', but that would
      require quite a bit of divergent code in clang as well as a custom
      mem2reg-like pass just for these types
    - I ended up making them sized, but with a size of 0. I don't know
      whether doing this will cause problems later on.
    - While 'load' and 'store' IR instructions are fine for spill/fill
      memory operations on the stack, we need to use intrinsics for
      everything else since we need to know the size of individual
      elements -- there might not be many big-endian systems in operation,
      but we still need to support them (see the IR sketch after this
      list).

  * I reused the same (clang-level) builtin type mechanism that OpenCL does
    for the SVE C-level types, and just codegen to the two LLVM types
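
To illustrate the memory-access point from the bullets above, here's a rough
IR sketch (again with hypothetical type and intrinsic names, not the contents
of the actual patches): plain load/store is only used for whole-register
spill/fill on the stack, while other accesses go through an element-aware
intrinsic so that big-endian lowering knows the element size:

  declare svedata @llvm.sve.hypothetical.ld1.f32(svepred, float*)

  define svedata @example(svepred %pg, float* %ptr, svedata %v) {
    ; Spill/fill of a whole register: plain load/store of the (size 0)
    ; opaque type is acceptable here because saving and restoring a
    ; register needs no per-element interpretation.
    %slot = alloca svedata
    store svedata %v, svedata* %slot
    %fill = load svedata, svedata* %slot

    ; A general load needs to know the element size (here f32), so it
    ; goes through an intrinsic rather than a plain 'load'.
    %data = call svedata @llvm.sve.hypothetical.ld1.f32(svepred %pg, float* %ptr)
    ret svedata %data
  }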

I now have a minimal end-to-end implementation for a small set of SVE C
intrinsics. I have some additional observations based on our downstream
intrinsic implementation:

  * Our initial downstream implementation attempted to do everything in
    intrinsics, so would be similar to the opaque type version. However,
    we found that we missed several optimizations in the process. Part of
    this is due to the intrinsics being higher-level than the instructions
    -- things like addressing modes are not represented in the intrinsics,
    and with a pure intrinsic approach we miss things like LSR
    optimizations.

  * We also thought that the need for custom extensions to optimizations
    like instcombine for SVE intrinsics would be reduced, since someone
    using the intrinsics is already going to the trouble of hand-optimizing
    their code. However, we hadn't appreciated how common it would be to
    generate the intrinsics from C++ templates with constant parameters and
    other code-generation methods. As a result, we now have user requests
    that operations like 'svmul(X, 1.0)' be recognized and folded away, and
    we are trying to find better representations, including lowering to
    normal IR operations in some cases (a before/after IR sketch follows
    this list).

  * Some operations can't be represented cleanly in current IR, but should
    work well with Simon Moll's vector predication proposal.
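
To show the kind of fold users are asking for in the second observation
above, here's a before/after sketch in IR. The intrinsic names are
hypothetical, and the fold assumes merging predication, so that inactive
lanes already take the value of X:

  ; Before: a template-instantiated multiply by a splat of 1.0 reaches the
  ; IR as opaque intrinsic calls that instcombine knows nothing about.
  %one = call svedata @llvm.sve.hypothetical.fdup.f32(float 1.0)
  %mul = call svedata @llvm.sve.hypothetical.fmul(svepred %pg, svedata %x,
                                                  svedata %one)

  ; After the requested fold: the multiply is an identity operation, so
  ; every use of %mul can simply be replaced with %x.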

Any feedback? I've posted my (very rough) initial work to phabricator:

clang: https://reviews.llvm.org/D59245
llvm: https://reviews.llvm.org/D59246

-Graham

Graham Hunter <Graham.Hunter@arm.com> writes:

We (Arm) have had more off-line discussions with some members of the
community and they have expressed some reservations about adding scalable
vectors as a first-class type. They have proposed an alternative to
enable support for C-level intrinsics and autovectorization for SVE.

Can we get a summary of those discussions? What are the concerns?

                          -David

Hi David,

I did ask them to post their arguments on the list, but I guess they've been busy for the last month (or forgot about it).

Who is "them" and who will write up a proposal / RFC on the use of
intrinsics for both lowering and vectorisation?

It goes without saying that those discussions should have been had on
the mailing list, not behind closed doors. Agreeing to implementations
in private is asking to get bad reviews in public, as the SVE process
has shown *over and over again*.

I don't understand why, after so many problems for so many years, this
is still the modus operandi...

The basic argument was that they didn't believe the value gained from enabling VLA autovectorization was worth the added complexity in maintaining the codebase. They were open to changing their minds if we could demonstrate sufficient demand for the feature.

In that case, the current patches to change the IR should be
abandoned, as well as reverting the previous change to the types, so
that we don't carry any unnecessary code forward.

The review you sent seems to be a mechanical change to include the
intrinsics, but the target lowering change seems to be too small to
actually be able to lower anything.

Without context, it's hard to know what's going on.

cheers,
--renato

I did ask them to post their arguments on the list, but I guess they've been busy for the last month (or forgot about it).

Who is "them" and who will write up a proposal / RFC on the use of
intrinsics for both lowering and vectorisation?

It goes without saying that those discussions should have been had on
the mailing list, not behind closed doors.

Renato, I understand your frustration, but I don't want an unproductive
conclusion to be drawn. We should encourage our community members to
talk to each other both on the mailing list and off the mailing list. We
have in-person discussions at the developers' meetings, and my
experience is that sitting in a room with someone, or sometimes talking
with someone over the phone, can really help reach a mutual
understanding more effectively than mailing-list communication. However,
the critical step is that the outcome of that conversation should be
summarized, in a timely manner, for the mailing list (or put in the
relevant code review, bug report, etc.) so that the rest of us can
provide input.

Agreeing to implementations
in private is asking to get bad reviews in public

+1

-Hal

That was my point. Discussing things in person at the dev meetings is
essential, but without follow-through, the people at the meeting will
be following a different path from the rest of the community.

Right now, whatever conclusions were drawn at that meeting are still
"behind closed doors", which pushed David and me to continue reviewing
patches that were essentially dead and to have parallel discussions that
were not only moot, but counter-productive.

--renato

Hi Renato,

It goes without saying that those discussions should have been had on
the mailing list, not behind closed doors.

I have encouraged people to respond on the list or the RFC many times,
but I've not had much luck in getting people to post even if they
approve of the idea.

Agreeing to implementations
in private is asking to get bad reviews in public, as the SVE process
has shown *over and over again*.

There isn't an agreement on the implementation yet; I have posted two
possibilities and am trying to get consensus on an approach from the
community.

The basic argument was that they didn't believe the value gained from enabling VLA autovectorization was worth the added complexity in maintaining the codebase. They were open to changing their minds if we could demonstrate sufficient demand for the feature.

In that case, the current patches to change the IR should be
abandoned, as well as reverting the previous change to the types, so
that we don't carry any unnecessary code forward.

There's no consensus on supporting the opaque types yet either. Even
if we do end up going down that route, it could be modified -- as I
mentioned in my notes, I could introduce a single top-level type to
the IR if I stored additional data in it (making it effectively the
same as the current VectorType, just opaque to existing optimization
passes), and would then be able to lower directly to the existing
scalable MVTs we have.

The review you sent seems to be a mechanical change to include the
intrinsics, but the target lowering change seems to be too small to
actually be able to lower anything.

The new patches are just meant to demonstrate the basics of the opaque
type to see if there's greater consensus in exploring this approach
instead of the VLA approach.

Without context, it's hard to know what's going on.

The current state is just what you stated in your initial email in this
chain; we have a solution that seems to work (in principle) for SVE, RVV,
and SX-Aurora, but not enough people who care about VLA vectorization
beyond those groups.

Given the time constraints, Arm is being pushed to consider a plan B to
get something working in time for early 2020.

-Graham

Disclaimer: I’m only speaking for myself, not Apple.

This is really disappointing. Resorting to multi-versioned fixed length vectorization isn’t a solution that’s competitive with the native VLA support, so it doesn’t look like a credible alternative suggestion (at least not without elaborating it on the mailing list). Without a practical alternative, it’s essentially saying “no” to a whole class of vector architectures of which SVE is only one.

Amara

Disclaimer: I’m only speaking for myself, not Apple.

This is really disappointing. Resorting to multi-versioned fixed length vectorization isn’t a solution that’s competitive with the native VLA support, so it doesn’t look like a credible alternative suggestion (at least not without elaborating it on the mailing list). Without a practical alternative, it’s essentially saying “no” to a whole class of vector architectures of which SVE is only one.

To the extent that this alternative direction represents an exploration
so that we can all evaluate in a more-informed manner, I think that is
valuable. However, let me agree with Amara, I prefer the original
approach. Among many other advantages, users will expect the compiler to
perform arithmetic optimizations on VLA operations (e.g., InstCombines),
and if we can't reuse the existing logic for this purpose, we'll end up
with an inferior result.

Thanks again,

Hal

Agreed with both!

Furthermore, any temporary solution will have to be very similar to what we expect to see natively, or the transition to native may never happen.

Thanks for the support.

To clarify, Arm would very much prefer to proceed with the full scalable
IR type proposal, but we're facing time pressure now.

We would like to be able to reach consensus on an approach around the end
of EuroLLVM this year so that we can begin a full implementation.

The opaque type patches were only intended to show how the third-party
proposal might look; I agree it should be closer to the scalable IR
proposal. The two main points that (imo) would make it easier to switch
later are:
  - Embedding the element type and minimum length, which copies the basic
    semantics of VectorType
  - Serializing in the same way we do in the scalable IR proposal
    (to a '<scalable n x ty>'). We should then just be able to switch
    the Types used by the IR reader and writer.
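
As a rough sketch of what that would give us (the intrinsic name below is
hypothetical), the same operation serialized with the scalable spelling
from the original proposal would read:

  %sum = call <scalable 4 x i32> @llvm.sve.hypothetical.add(
             <scalable 4 x i1> %pg, <scalable 4 x i32> %a, <scalable 4 x i32> %b)

If the opaque type records the same element type and minimum element count,
moving between the two spellings should only require switching the Types
constructed by the IR reader and writer.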

-Graham

EuroLLVM will be a very good occasion to hear all the opinions and reach consensus on the approach.

@Graham, maybe you should set up a round table open to anyone, and make sure that key people (third parties asking for a different approach, people involved in this discussion) will receive the invite?

Francesco

Francesco Petrogalli via llvm-dev <llvm-dev@lists.llvm.org> writes:

We would like to be able to reach consensus on an approach around the end
of EuroLLVM this year so that we can begin a full implementation.

EuroLLVM will be a very good occasion to hear all the opinions and reach consensus on the approach.

@Graham, maybe you should set up a round table open to anyone, and
make sure that key people (third parties asking for a different
approach, people involved in this discussion) will receive the invite?

We tried two round tables at the Nov. LLVMDev and no serious objections
were raised, but we knew we didn't have all the right people there. I
am somewhat skeptical another roundtable without commitment to attend
from all able parties ahead of time will accomplish much.

Speaking for myself (and not Cray), it is frustrating to have had a
bunch of discussion on the mailing list and in reviews where concerns
were raised and to see a lot of radio silence following responses to
those concerns, only to see a message about a potential change in
direction driven by off-list discussions where concerns and responses
to concerns are unknown and therefore not addressable.

I completely understand that ARM needs to make progress and I very much
want to see that progress. I just don't want to see a Plan B leading to
a situation where VLA support doesn't ever make it into LLVM. It is
somewhat embarrassing that gcc already has a release with VLA support
for SVE and LLVM is stuck in the starting blocks.

                            -David

Hi David,

We tried two round tables at the Nov. LLVMDev and no serious objections
were raised, but we knew we didn't have all the right people there. I
am somewhat skeptical another roundtable without commitment to attend
from all able parties ahead of time will accomplish much.

Agreed, but I'll try scheduling one anyway.

Speaking for myself (and not Cray), it is frustrating to have had a
bunch of discussion on the mailing list and in reviews where concerns
were raised and to see a lot of radio silence following responses to
those concerns, only to see a message about a potential change in
direction driven by off-list discussions where concerns and responses
to concerns are unknown and therefore not addressable.

I didn't want private meetings either, but repeatedly requesting public
feedback on the RFC or patches hadn't surfaced the reasoning behind any
concerns that people had.

The agreement reached at the meeting was for the objectors to post their
reasons for objecting and counter-proposal in public so discussion could
take place, and Arm would investigate the details of the counter-proposal.

Unfortunately, that post never happened, so I found myself a bit stuck and
had to post it for them -- not a situation I wanted.

I have always wanted the discussion to take place in public.

I completely understand that ARM needs to make progress and I very much
want to see that progress. I just don't want to see a Plan B leading to
a situation where VLA support doesn't ever make it into LLVM. It is
somewhat embarrassing that gcc already has a release with VLA support
for SVE and LLVM is stuck in the starting blocks.

Agreed.

-Graham

Hi David,

We tried two round tables at the Nov. LLVMDev and no serious objections
were raised, but we knew we didn't have all the right people there. I
am somewhat skeptical another roundtable without commitment to attend
from all able parties ahead of time will accomplish much.

Agreed, but I'll try scheduling one anyway.

Speaking for myself (and not Cray), it is frustrating to have had a
bunch of discussion on the mailing list and in reviews where concerns
were raised and to see a lot of radio silence following responses to
those concerns, only to see a message about a potential change in
direction driven by off-list discussions where concerns and responses
to concerns are unknown and therefore not addressable.

I didn't want private meetings either, but repeatedly requesting public
feedback on the RFC or patches hadn't surfaced the reasoning behind any
concerns that people had.

The agreement reached at the meeting was for the objectors to post their
reasons for objecting and counter-proposal in public so discussion could
take place, and Arm would investigate the details of the counter-proposal.

I've talked with a number of people about this as well, and I think that
I understand the objections. I'm happy that ARM followed through with
the alternate set of patches. Regardless, however, unless those who had
wished to object still wish to object, and then actually do so, we now
clearly have a good collection of contributors actively desiring to do
code review, and we should move forward (i.e., start committing patches
once they're judged ready).

-Hal

Let's start by closing the three in-flight revisions, so that people who
weren't involved in the discussion don't waste time looking at them.

Graham, can you also add a comment to reflect the change in
implementation while closing?

Second, it would be very good if someone, anyone, would write an RFC
on the new proposal. Graham started it on this thread and on the two
reviews, but this thread is now dead. We should have a new one, with
subject "RFC etc" and the contents of the current understanding, so
that it can serve as context for the two reviews.

Thanks!
--renato

Graham Hunter <Graham.Hunter@arm.com> writes:

I have always wanted the discussion to take place in public.

I just want to make sure you know that we all know that. I was not in
any way disparaging the work you and ARM have done on this!

                    -David

"Finkel, Hal J." <hfinkel@anl.gov> writes:

The agreement reached at the meeting was for the objectors to post their
reasons for objecting and counter-proposal in public so discussion could
take place, and Arm would investigate the details of the counter-proposal.

I've talked with a number of people about this as well, and I think that
I understand the objections. I'm happy that ARM followed through with
the alternate set of patches. Regardless, however, unless those who had
wished to object still wish to object, and then actually do so, we now
clearly have a good collection of contributors actively desiring to do
code review, and we should move forward (i.e., start committing patches
once they're judged ready).

I am not sure this is your intended meaning, but if those objecting
don't come forward, I would like to see Graham's current patches
supporting scalable types go in. A number of people have now stated
that they are desirable, people have reviewed them, Graham has made
changes and we're ready to review them some more and iterate on them.

It would set a bad precedent to block patches based on some vague
objections that aren't being discussed publicly. We can debate the
actual patches, that of course needs to happen. But the RFC looks
reasonable to me and apparently others. Let's start moving forward.

                              -David
