[RFC] Resolving issues related to extension versioning in RISC-V

Following discussion on this topic at the bi-weekly sync-up call a few weeks ago, a number of us had a focused discussion last Tuesday. We came to a couple of tentative decisions. The plan is to take these back to the sync-up call on Thursday and then, if no one objects, move forward with documentation changes to ratify them.

Decision - We will track only the most recent specification with pragmatic variances. We are actively deciding not to support multiple specification versions at this time. We acknowledge a likely future need, but actively defer the decisions around handling this until we have a concrete example of real hardware having shipped, followed by an incompatible change to the specification.

There was a general acknowledgement that the specifications have had many incompatible changes made, and that we see significant cost and little definite user value in trying to support many versions. We do intend to allow pragmatic divergence from the specification. As a specific example, we plan to continue allowing CSRs to be named without gating on specific instructions. Such divergences will be evaluated individually and documented in our user docs.

We anticipate the future need for specification versioning. Specifically, we anticipate incompatible specification changes being made after hardware ships. We don’t have a specific example of this today, and are deferring most decision making about how to handle it until we do.

There was a general feeling - not deeply explored - that a vendor implementation of a non-ratified extension might get different treatment than an implementation of a ratified extension. We note that both vendor extensions and non-conforming extensions are existing concepts in the ecosystem. We hope to avoid full specification versioning for as long as possible.

Decision - We will not error on extension incompatibilities. We may warn, and we explicitly endorse the notion of having a flag to promote warnings to fatal errors.
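
To make the intended behavior concrete, here is a minimal sketch in C of a diagnostic that warns by default and becomes fatal only when the user opts in. Every name in it (`diag_options`, `fatal_warnings`, `report_extension_mismatch`) is hypothetical and invented for illustration; it is not an existing toolchain API or flag.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcome of a compatibility check. */
enum diag_result { DIAG_OK, DIAG_WARNING, DIAG_ERROR };

/* Hypothetical option state: `fatal_warnings` stands in for whatever
   flag a tool might expose to promote warnings to fatal errors. */
struct diag_options {
    bool fatal_warnings;
};

/* Report an extension incompatibility. By default this is only a
   warning; it is treated as an error only when the user asked for it. */
enum diag_result report_extension_mismatch(const struct diag_options *opts,
                                           const char *ext_a,
                                           const char *ext_b)
{
    bool fatal = opts->fatal_warnings;
    fprintf(stderr, "%s: mixing incompatible extensions '%s' and '%s'\n",
            fatal ? "error" : "warning", ext_a, ext_b);
    return fatal ? DIAG_ERROR : DIAG_WARNING;
}
```

A caller would stop only on `DIAG_ERROR`, so the default configuration never rejects an input purely because of an extension mismatch.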

The case we spent a bunch of time discussing was a single library mixing F and Fintx. This is a great example because the encodings for these two overlap. As a result, running Fintx code on hardware which implements F (or vice versa) is likely to lead to surprising and non-obvious results (e.g. corrupting registers instead of simply crashing with a SIGILL). That makes reporting such a case of user error relatively high value. The problem is that there exist valid use cases where this mixture is not user error. As an example, a soft float library may legitimately provide softfp, F, and Fintx in the same library. Critically, the dispatch mechanism does not have to be IFUNC. It can be something completely outside the toolchain's understanding (e.g. checking an environment variable).
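
As a rough illustration of the soft float case, the sketch below shows a C library that selects between three implementations based on an environment variable, which is dispatch the toolchain cannot see. The variable name `MYLIB_FP_IMPL`, the accepted values, and all function names are hypothetical. The three `fadd_*` bodies are placeholders: in a real library each would presumably be compiled separately for its own extension, which is not shown here.

```c
#include <stdlib.h>
#include <string.h>

typedef float (*fadd_fn)(float, float);

/* Placeholder bodies. In a real library each variant would be built
   for a different target (softfp, F, Fintx); here they are identical. */
static float fadd_softfp(float a, float b) { return a + b; }
static float fadd_f(float a, float b)      { return a + b; }
static float fadd_fintx(float a, float b)  { return a + b; }

enum fp_impl { FP_IMPL_SOFTFP, FP_IMPL_F, FP_IMPL_FINTX };

/* Map a (hypothetical) setting string to an implementation.
   Unknown or missing values fall back to the software version. */
enum fp_impl parse_fp_impl(const char *value)
{
    if (value != NULL) {
        if (strcmp(value, "f") == 0)
            return FP_IMPL_F;
        if (strcmp(value, "fintx") == 0)
            return FP_IMPL_FINTX;
    }
    return FP_IMPL_SOFTFP;
}

/* Choose the implementation at run time from the environment.
   MYLIB_FP_IMPL is an invented name used only for this sketch. */
fadd_fn resolve_fadd(void)
{
    switch (parse_fp_impl(getenv("MYLIB_FP_IMPL"))) {
    case FP_IMPL_F:     return fadd_f;
    case FP_IMPL_FINTX: return fadd_fintx;
    default:            return fadd_softfp;
    }
}
```

Nothing in the resulting object files tells a linker that only one of the three paths can execute on a given machine, which is why a hard error on the F and Fintx mixture would reject this legitimate library.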

Beyond this, we are explicitly deferring decisions on review standards for compatibility checks. We ran out of time for this discussion.