[RFC] Upstream development of support for yet-to-be-ratified RISC-V extensions

# Overview and background

RISC-V is a free and open instruction set architecture. It is a modular
specification, with a range of standard extensions (e.g. floating point,
atomics, etc). New standard extensions are developed through RISC-V
Foundation working groups. The specifications for such extensions (e.g. vector
and bit manipulation) are publicly available, but are still in flux and won't
be finalised until "ratified" by the Foundation. This RFC considers how we
should handle review and merging of support for in-development standard
extensions, with a view to maximising collaboration and reduce duplicate effort.

The summarised proposal is that patches for in-development standard
extensions should be encouraged, but support should be gated by a sufficiently
unappetising flag and it should be made clear that it may change substantially
between LLVM releases, up until the standard is ratified.

# Discussion

RISC-V LLVM contributors have started posting patches to Phabricator that
start to implement both the proposed bit manipulation and vector extensions.
Leaving these patches uncommitted until the relevant extension is ratified
matches GCC's current approach, but significantly hampers the ability for
contributors to collaborate on these efforts using normal LLVM development
processes and infrastructure. I believe one motivation for the current GCC
policy is to avoid promoting fragmentation within the RISC-V ecosystem. I'm
hugely concerned that two RISC-V vendors have announced IP offerings
supporting the not yet finalised vector extension, yet don't feel it's the
LLVM community's role to try to police this.

I believe code should be committed to LLVM when it is of sufficient
quality, when it can be shown to benefit the LLVM user or developer
communities, and when there is someone willing to support it. All of these can
be true even for unratified standard RISC-V extensions, with a caveat on
"support". It may not be appropriate to try to police RISC-V fragmentation,
but we also shouldn't needlessly add to it. Shipping compiler support may
further encourage hardware with unfinalised specifications, and so we should
guard it behind suitably discouraging flags and be clear that the
implementation will change to match the specification up until it's finalised.
Naturally we want as many of the benefits of supporting these extensions
upstream with as few downsides as possible. Limiting development to downstream
branches can be a barrier to collaboration, and pushes people away from the
upstream LLVM community.

I believe this situation is somewhat new to LLVM, as typically work on new ISA
extensions/revisions is performed behind closed doors until the final version
is announced.

There are multiple options for inclusion of an unratified standard extension
upstream:
1) Feature is in-tree, but only compiled with an explicit flag. Guarding by
  ifdefs is likely to get ugly, and this also requires extra build bots
  etc.
2) Feature always compiled, but the code path to enable it is disabled
  without a particular build flag. Also needs extra build bots.
3) Feature is always compiled and can be enabled regardless of LLVM
  build flags used.

Option 2) has some precedent in the form of flags like
`-fexperimental-new-pass-manager`. Option 3) doesn't have precedent in LLVM,
but Robin Kruppe pointed out it has similarities to Rust, where experimental
features can only be enabled in nightly builds.

# Proposal

Although we want to discourage downstream reliance on unratified extensions,
there doesn't seem to be a strong motivation for requiring a custom LLVM build
to force this. However, such unratified extensions shouldn't be accessible via
normal RISC-V ISA naming strings (e.g. "rv32imafdc"), and instead flags of the
form `-riscv-experimental-vector-ext` in LLVM and `-mexperimental-vector-ext`
in Clang should be used (i.e. option 3)). We discussed this in our weekly call
however, and there were voices advocating either option 2 or 3. Input welcome.

If going down the option 3 route, the flags could be made more discouraging
through requiring an additional `-riscv-enable-experimental-extension` flag or
making the flag itself longer and scarier. Thoughts?

If there is agreement that this is a sensible approach, we'd work to review
and refine the proposed patches starting bit manipulation and vector extension
support with a goal of getting them committed once of a sufficient quality.

# Credits
Thanks for input from Sam Elliott and Luís Marques on the lowRISC
toolchain team, and for participants of the RISC-V sync-up call for
initial feedback (especially Robin Kruppe and Evandro Menezes).

Best,

Alex Bradbury,
CTO and Co-founder, lowRISC CIC

# Overview and background

RISC-V is a free and open instruction set architecture. It is a modular
specification, with a range of standard extensions (e.g. floating point,
atomics, etc). New standard extensions are developed through RISC-V
Foundation working groups. The specifications for such extensions (e.g. vector
and bit manipulation) are publicly available, but are still in flux and won't
be finalised until "ratified" by the Foundation. This RFC considers how we
should handle review and merging of support for in-development standard
extensions, with a view to maximising collaboration and reduce duplicate effort.

The summarised proposal is that patches for in-development standard
extensions should be encouraged, but support should be gated by a sufficiently
unappetising flag and it should be made clear that it may change substantially
between LLVM releases, up until the standard is ratified.

# Discussion

RISC-V LLVM contributors have started posting patches to Phabricator that
start to implement both the proposed bit manipulation and vector extensions.
Leaving these patches uncommitted until the relevant extension is ratified
matches GCC's current approach, but significantly hampers the ability for
contributors to collaborate on these efforts using normal LLVM development
processes and infrastructure. I believe one motivation for the current GCC
policy is to avoid promoting fragmentation within the RISC-V ecosystem. I'm
hugely concerned that two RISC-V vendors have announced IP offerings
supporting the not yet finalised vector extension, yet don't feel it's the
LLVM community's role to try to police this.

I believe code should be committed to LLVM when it is of sufficient
quality, when it can be shown to benefit the LLVM user or developer
communities, and when there is someone willing to support it. All of these can
be true even for unratified standard RISC-V extensions, with a caveat on
"support". It may not be appropriate to try to police RISC-V fragmentation,
but we also shouldn't needlessly add to it. Shipping compiler support may
further encourage hardware with unfinalised specifications, and so we should
guard it behind suitably discouraging flags and be clear that the
implementation will change to match the specification up until it's finalised.
Naturally we want as many of the benefits of supporting these extensions
upstream with as few downsides as possible. Limiting development to downstream
branches can be a barrier to collaboration, and pushes people away from the
upstream LLVM community.

I believe this situation is somewhat new to LLVM, as typically work on new ISA
extensions/revisions is performed behind closed doors until the final version
is announced.

There are multiple options for inclusion of an unratified standard extension
upstream:
1) Feature is in-tree, but only compiled with an explicit flag. Guarding by
  ifdefs is likely to get ugly, and this also requires extra build bots
  etc.
2) Feature always compiled, but the code path to enable it is disabled
  without a particular build flag. Also needs extra build bots.
3) Feature is always compiled and can be enabled regardless of LLVM
  build flags used.

Option 2) has some precedent in the form of flags like
`-fexperimental-new-pass-manager`. Option 3) doesn't have precedent in LLVM,
but Robin Kruppe pointed out it has similarities to Rust, where experimental
features can only be enabled in nightly builds.

This should read:
Option 3) has some precedent in the form of flags like
`-fexperimental-new-pass-manager`. Option 2) doesn't have precedent in LLVM,
but Robin Kruppe pointed out it has similarities to Rust, where experimental
features can only be enabled in nightly builds.

# Overview and background

RISC-V is a free and open instruction set architecture. It is a modular
specification, with a range of standard extensions (e.g. floating point,
atomics, etc). New standard extensions are developed through RISC-V
Foundation working groups. The specifications for such extensions (e.g. vector
and bit manipulation) are publicly available, but are still in flux and won't
be finalised until "ratified" by the Foundation. This RFC considers how we
should handle review and merging of support for in-development standard
extensions, with a view to maximising collaboration and reduce duplicate effort.

The summarised proposal is that patches for in-development standard
extensions should be encouraged, but support should be gated by a sufficiently
unappetising flag and it should be made clear that it may change substantially
between LLVM releases, up until the standard is ratified.

# Discussion

RISC-V LLVM contributors have started posting patches to Phabricator that
start to implement both the proposed bit manipulation and vector extensions.
Leaving these patches uncommitted until the relevant extension is ratified
matches GCC's current approach, but significantly hampers the ability for
contributors to collaborate on these efforts using normal LLVM development
processes and infrastructure. I believe one motivation for the current GCC
policy is to avoid promoting fragmentation within the RISC-V ecosystem. I'm
hugely concerned that two RISC-V vendors have announced IP offerings
supporting the not yet finalised vector extension, yet don't feel it's the
LLVM community's role to try to police this.

I believe code should be committed to LLVM when it is of sufficient
quality, when it can be shown to benefit the LLVM user or developer
communities, and when there is someone willing to support it. All of these can
be true even for unratified standard RISC-V extensions, with a caveat on
"support". It may not be appropriate to try to police RISC-V fragmentation,
but we also shouldn't needlessly add to it. Shipping compiler support may
further encourage hardware with unfinalised specifications, and so we should
guard it behind suitably discouraging flags and be clear that the
implementation will change to match the specification up until it's finalised.
Naturally we want as many of the benefits of supporting these extensions
upstream with as few downsides as possible. Limiting development to downstream
branches can be a barrier to collaboration, and pushes people away from the
upstream LLVM community.

I believe this situation is somewhat new to LLVM, as typically work on new ISA
extensions/revisions is performed behind closed doors until the final version
is announced.

There are multiple options for inclusion of an unratified standard extension
upstream:
1) Feature is in-tree, but only compiled with an explicit flag. Guarding by
  ifdefs is likely to get ugly, and this also requires extra build bots
  etc.
2) Feature always compiled, but the code path to enable it is disabled
  without a particular build flag. Also needs extra build bots.
3) Feature is always compiled and can be enabled regardless of LLVM
  build flags used.

I personally prefer option 3. Additional build flags are a burden on the
rest of the community and confusing for users in my opinion.

-Tom

I believe this situation is somewhat new to LLVM, as typically work on new ISA
extensions/revisions is performed behind closed doors until the final version
is announced.

There are multiple options for inclusion of an unratified standard extension
upstream:
1) Feature is in-tree, but only compiled with an explicit flag. Guarding by
  ifdefs is likely to get ugly, and this also requires extra build bots
  etc.
2) Feature always compiled, but the code path to enable it is disabled
  without a particular build flag. Also needs extra build bots.
3) Feature is always compiled and can be enabled regardless of LLVM
  build flags used.

I would steer away from needing extra build time flags to enable these
extensions. If most people don't use these flags then there's a risk of
these bitrotting or getting broken by other patches in the backend or
elsewhere and may add to the support burg

Option 2) has some precedent in the form of flags like
`-fexperimental-new-pass-manager`. Option 3) doesn't have precedent in LLVM,
but Robin Kruppe pointed out it has similarities to Rust, where experimental
features can only be enabled in nightly builds.

# Proposal

Although we want to discourage downstream reliance on unratified extensions,
there doesn't seem to be a strong motivation for requiring a custom LLVM build
to force this. However, such unratified extensions shouldn't be accessible via
normal RISC-V ISA naming strings (e.g. "rv32imafdc"), and instead flags of the
form `-riscv-experimental-vector-ext` in LLVM and `-mexperimental-vector-ext`
in Clang should be used (i.e. option 3)). We discussed this in our weekly call
however, and there were voices advocating either option 2 or 3. Input welcome.

If going down the option 3 route, the flags could be made more discouraging
through requiring an additional `-riscv-enable-experimental-extension` flag or
making the flag itself longer and scarier. Thoughts?

If there is agreement that this is a sensible approach, we'd work to review
and refine the proposed patches starting bit manipulation and vector extension
support with a goal of getting them committed once of a sufficient quality.

I would propose a different way of exposing this to the user, exposing
these extensions through the ISA string, but require that an explicit
version number is provided when requesting the extension using the
standard syntax. For example, bitmanip currently sits at version 0.92,
so I would have clang accept for example -march=rv32ib0p92, but error on
-march=rv32ib since it is not yet ratified.

This would avoid the ambiguity of the "which experimental b extension
does the compiler support" problem all the way until an extension is
ratified by making it explicit everywhere. My concern with a blanket
"enable the experimental support for X" approach is if users depend on
them (because they will, regardless of the name of the flag), then when
the supported version by the compiler changes, we should stop a user
from generating code that for their use case would then be invalid.

Alongside this I would add relevant error messages noting the current
version supported e.g. "this clang only supports b extension version
0.91". I appreciate we don't want to support every unratified version of
an extension, but I think maintaining a check that the user wants "the
current supported version" is probably a good balance.

Thanks,
Simon

Thanks for the feedback Simon. I'd considered sticking to just the ISA
string, but my concern was that (and I'll admit this is just a gut
feeling) that it might imply a level of support / stability that
simply isn't there. In particular, people might get the impression
that b0p92 will continue to be supported, when we'd want to avoid
supporting old unratified spec versions (or even reasoning about
whether they remain supported or not due to later spec changes). But I
accept that the separate flag means that we break the property that
you can describe the targeted architecture concisely with a single
string.

Best,

Alex

I believe this situation is somewhat new to LLVM, as typically work on new ISA
extensions/revisions is performed behind closed doors until the final version
is announced.

There are multiple options for inclusion of an unratified standard extension
upstream:

  1. Feature is in-tree, but only compiled with an explicit flag. Guarding by
    ifdefs is likely to get ugly, and this also requires extra build bots
    etc.
  2. Feature always compiled, but the code path to enable it is disabled
    without a particular build flag. Also needs extra build bots.
  3. Feature is always compiled and can be enabled regardless of LLVM
    build flags used.

I would steer away from needing extra build time flags to enable these
extensions. If most people don’t use these flags then there’s a risk of
these bitrotting or getting broken by other patches in the backend or
elsewhere and may add to the support burg

I’d agree that compile-time optionality is best avoided when possible.

I would propose a different way of exposing this to the user, exposing
these extensions through the ISA string, but require that an explicit
version number is provided when requesting the extension using the
standard syntax. For example, bitmanip currently sits at version 0.92,
so I would have clang accept for example -march=rv32ib0p92, but error on
-march=rv32ib since it is not yet ratified

That march flag doesn’t scream out to me “Warning! You’re using an experimental extension which will probably be changed incompatibly in the next release!” like an explicit command-line flag could do.

One way to resolve this would be to require both – you must specify your ISA options with explicit versions in the ISA string, but, doing so would print an error without a separate flag also being enabled, e.g. “error: ISA extension b0p92 is an experimental extension, and support may be removed in future LLVM releases. Pass -mallow-experimental-riscv-isa-extensions to enable.”

One way to resolve this would be to require both – you must specify your ISA options with explicit versions in the ISA string, but, doing so would print an error without a separate flag also being enabled, e.g. “error: ISA extension b0p92 is an experimental extension, and support may be removed in future LLVM releases. Pass -mallow-experimental-riscv-isa-extensions to enable.”

I think this makes a lot of sense, it’s also similar to how experimental features are enabled on cargo (rust’s package manager/build system) where you have to
use “-Z unstable-features” as well as the flag for whatever feature you wanted to use.

https://doc.rust-lang.org/cargo/reference/unstable.html

Example for using the unstable --out-dir option:

cargo +nightly build --out-dir=out -Z unstable-options

I’m probably going to shamelessly steal your idea when we (libre-riscv) add our GPU extensions to PowerISA and/or RISC-V.

Jacob

# Overview and background

RISC-V LLVM contributors have started posting patches to Phabricator that
start to implement both the proposed bit manipulation and vector extensions.
Leaving these patches uncommitted until the relevant extension is ratified
matches GCC's current approach, but significantly hampers the ability for
contributors to collaborate on these efforts using normal LLVM development
processes and infrastructure. I believe one motivation for the current GCC
policy is to avoid promoting fragmentation within the RISC-V ecosystem. I'm
hugely concerned that two RISC-V vendors have announced IP offerings
supporting the not yet finalised vector extension, yet don't feel it's the
LLVM community's role to try to police this.

Thank you for raising this concern and getting ahead of it. I agree with you completely that we should aspire to have people work together, and duplicating effort with similar but different patches doesn’t help things.

I believe code should be committed to LLVM when it is of sufficient
quality, when it can be shown to benefit the LLVM user or developer
communities, and when there is someone willing to support it. All of these can
be true even for unratified standard RISC-V extensions, with a caveat on
"support”.

+1

# Proposal

Although we want to discourage downstream reliance on unratified extensions,
there doesn't seem to be a strong motivation for requiring a custom LLVM build
to force this. However, such unratified extensions shouldn't be accessible via
normal RISC-V ISA naming strings (e.g. "rv32imafdc"), and instead flags of the
form `-riscv-experimental-vector-ext` in LLVM and `-mexperimental-vector-ext`
in Clang should be used (i.e. option 3)). We discussed this in our weekly call
however, and there were voices advocating either option 2 or 3. Input welcome.

If going down the option 3 route, the flags could be made more discouraging
through requiring an additional `-riscv-enable-experimental-extension` flag or
making the flag itself longer and scarier. Thoughts?

I think you’ve got the right tradeoff here. If I’m understanding your plan, it is:

- don’t add extensions to the isa naming string until they are finalized.

- in-progress extensions can land in mainline LLVM, and are protected under a -experimental-foo style of flag. There is no support implied
or compatibility in the future for such flags, they can be changed or removed at any time, they are just for staging.

- it doesn’t seem like there is any need to #ifdef out these flags. A scary enough flag name should be fine. The goal isn’t to prevent malice, it is to prevent accidental reliance.

This all makes sense to me.

If there is agreement that this is a sensible approach, we'd work to review
and refine the proposed patches starting bit manipulation and vector extension
support with a goal of getting them committed once of a sufficient quality.

Fantastic, thank you for the very nice summary and for driving this discussion Alex!

-Chris

That's correct, thanks for the feedback.

I do like the idea from James of having the compiler always spit out a
note when enabling the experimental extension, warning of its
experimental nature. If we had such a warning and additionally
required a `-riscv-enable-experimental-extensions` or similar, then I
think there could be merit in including in the ISA string as Simon
suggests, especially as we're likely to start putting that string in
ELF output etc.

So based on feedback so far, I'd like to narrow the field to:

Option 1)
* Unratified extensions are not available through the ISA naming
string and are enabled via `-riscv-experimental-foo` or similar
* A warning is always emitted when enabling experimental extensions
* Support is always compiled in, and the flags aren't ifdeffed out

Option 2)
* Unratified extensions can be enabled by passing a flag like
-riscv-enable-experimental-extensions and additionally putting the
extension name and version number in the ISA naming string (version
number is always required, and we will only accept the 'current'
version number).
* A warning is always emitted when enabling experimental extensions
* Support is always compiled in, and the flags aren't ifdeffed out

Best,

Alex

Option 2 for me!

Jacob

Option 2 seems more intentional to me, so it gets my vote.

Thank you, Alex.

So based on feedback so far, I'd like to narrow the field to:

Option 1)
* Unratified extensions are not available through the ISA naming
string and are enabled via `-riscv-experimental-foo` or similar
* A warning is always emitted when enabling experimental extensions
* Support is always compiled in, and the flags aren't ifdeffed out

Option 2)
* Unratified extensions can be enabled by passing a flag like
-riscv-enable-experimental-extensions and additionally putting the
extension name and version number in the ISA naming string (version
number is always required, and we will only accept the 'current'
version number).
* A warning is always emitted when enabling experimental extensions
* Support is always compiled in, and the flags aren't ifdeffed out

Option 2 is what I would opt for.

- Simon

Are you suggesting this behavior from Clang or from LLVM? I think it would be a bad thing for LLVM to produce this warning: there isn’t a precedent for this, and it breaks the library-based design goals. Having clang produce a warning could be done, but it would be very noisy (one warning for every .c file in a build) and I’m not sure how much value it provides.

-Chris

Yeah. I didn’t mean to actually suggest that the compiler emit a warning diagnostic when you pass the flag. I only meant that an appropriately named flag would be, itself, a warning sign against inappropriate use, simply by its name, and the user having to explicitly pass it.

That's a good point. It felt like there may be an opportunity to
educate users that they're enabling a feature that might mutate from
release to release, but hopefully the "experimental" string in the
flag name indicates that, and as you say there's not much precedent
for such noisy warnings. After all, you can have a really bad time by
setting -mstack-alignment and not understanding the consequences.

So I'm in favour of dropping the noisy warning idea.

Thanks again for the input, and thanks James for your clarification.

Best,

Alex

+1, I think the “experimental” in the name of the flag is sufficiently scary. Thanks!

-Chris

Something no one has mentioned is the way the RISC-V ratification process works.

Alex touched on this when he mentioned that in other ISAs development
work is done behind closed doors, and only announced when finished.
But it's more than simply this.

In the RISC-V world, ISA extensions will *not* be ratified until
experience is gained with them on real hardware, in real world use.
This necessitates having toolchain support *before* ratification,
otherwise the experience necessary to gain ratification will never
happen.

This point also need to be made somehow to the gcc community.

I agree with the concept of the code always being compiled to avoid
bitrot (as much as possible), and simply guarded with a scary
"experimental" flag.

p.s. it's really great to see Chris take a greater interest in RISC-V
and I look forward to seeing him around the SiFive offices!

Hi all,

Following this discussion (and last week's RISC-V call) I've implemented
a first version of this in https://reviews.llvm.org/D73891. We've
rebased the other bitmanip patches (D65649, D67348 and D71553) to match,
and with these you should be able to use the 0.92 version of the B
extension using this method. Any feedback on the approach taken welcome.

Thanks,
Simon