[RFC] Resolving issues related to extension versioning in RISC-V

Overview

We’ve recently discussed a number of issues related to extension versioning in RISC-V. This short piece aims to discuss what I see as the highest priority ones, and does so as a whole rather than piecemeal (as the fixes tend to be inter-related and may have unexpected interactions).

What’s wrong right now? Main top-level issues:

  • Error checking in tools like lld and objdump is too strict (arguably also for the .attribute arch directive and -march command line option)
    • This is particularly acute in lld, as it currently makes it difficult to link .o files from a recent GCC with .o files from a recent Clang.
  • Version attributes for some of the standard extensions don’t accurately reflect the version of the spec we’re conformant to (at least A, F, D, with I being a special case due to Zicsr and Zifencei).

The purpose of this document is twofold:

  • Provide a central point of reference for the short series of patches addressing these issues. The fixes should be incremental, but I think we reviewers need the overall picture.
  • Get feedback and ensure we’re not missing anything in terms of unexpected interactions etc.

I’m discussing the topics from the perspective of the external user interface, rather than the details of the current implementation in RISCVISAInfo.

LLD RISC-V attribute checking - highest priority problem

This is something that I think was unanticipated when D138550 was merged (the problem was raised in passing, but I don’t think it would have been left as a TODO if it had been appreciated how much this would regress behaviour for users).

The basic problem is that LLD will refuse to link a .o from current gcc that now by default has the rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0 attribute. It will also refuse to link a .o with a RISC-V extension name that is unrecognised. I think from a user perspective this is a regression, and if we’re able to I’d like to get this fixed in time for 16.0.0, but if it’s too late at this point then we should address it in 16.0.1.

Proposed new behaviour: The merging logic for RISC-V ELF attributes should be maintained, but LLD should not complain about version numbers or extensions it doesn’t recognise (I’d argue it’s not even worth a warning, but input welcome).
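
As a rough illustration of the proposed behaviour, here is a minimal Python sketch (not LLD's actual C++ code). The helper names parse_arch and merge_arch, the xfoo extension, and the choice to keep the higher version when both inputs list the same extension are assumptions made for illustration. The sketch also assumes the normalised attribute form in which every extension carries an explicit version.

```python
import re

# Hypothetical helpers for illustration only; not LLD's implementation.
_EXT_RE = re.compile(r"^([a-z][a-z0-9]*?)(\d+)p(\d+)$")


def parse_arch(arch):
    """Split e.g. 'rv64i2p1_m2p0' into (xlen, {name: (major, minor)})."""
    head, *rest = arch.lower().split("_")
    m = re.match(r"^rv(32|64)(.*)$", head)
    if m is None:
        raise ValueError(f"not a RISC-V arch string: {arch!r}")
    exts = {}
    for token in [m.group(2), *rest]:
        em = _EXT_RE.match(token)
        if em is None:
            raise ValueError(f"malformed extension token: {token!r}")
        # No lookup against a table of known extensions or versions:
        # anything that is syntactically valid is accepted.
        exts[em.group(1)] = (int(em.group(2)), int(em.group(3)))
    return int(m.group(1)), exts


def merge_arch(a, b):
    """Union the extensions of two inputs, keeping the higher version."""
    xlen_a, exts_a = parse_arch(a)
    xlen_b, exts_b = parse_arch(b)
    if xlen_a != xlen_b:
        raise ValueError("cannot merge rv32 and rv64 inputs")
    merged = dict(exts_a)
    for name, version in exts_b.items():
        # Assumption: the higher version wins when both inputs list it.
        merged[name] = max(merged.get(name, version), version)
    return xlen_a, merged


gcc_obj = "rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0"
other_obj = "rv64i2p0_m2p0_a2p0_xfoo1p0"  # xfoo is a made-up extension
xlen, exts = merge_arch(gcc_obj, other_obj)
```

The point of the sketch is that only malformed strings and an XLEN mismatch are rejected; unknown names and versions simply flow through into the merged result.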

Other issues with (I think) a clear fix

  • objdump and readobj have a similar problem to LLD in that they will error out upon an unrecognised extension or extension version. I propose that a warning would be a better policy.
  • We don’t support zicsr and zifencei, which were moved out of I in 2.1. Philip’s patch D143953 resolves the immediate problem, with the question of whether we want to simultaneously support e.g. 2.0 and 2.1 remaining open.

Issues with a less clear answer

  • We currently report support for A 2.0, F 2.0, and D 2.0. But the current versions (and the ones gcc targets by default) are A 2.1, F 2.2, and D 2.2. I believe we’re wrong to claim we’re targeting the 2.0 versions of these specs, so I propose simply changing the version numbers we claim to target.
    • The more difficult bit is what to do when encountering older version numbers in -march or in .attribute. I propose that accepting them, just as they were before, would be at least as correct as what we’re doing today.
  • Should the checking for .attribute in assembly be relaxed? e.g. we might want to enable someone to set accurate metadata when they’re using .insn to target instructions from a custom extension, or when they know their asm is compatible with the semantics of the listed extension version.
  • If the answer to the above is yes, should the same be true of -march?
    • I observe that GCC/gas accepts higher version numbers for known extensions in both -march and .attribute. It’s inconsistent about accepting unrecognised extension names though - gcc 12.2.0 seems to accept an unrecognised extension name and emit assembly with it in .attribute arch, but gas will then reject it.
    • And would you expect __riscv_$ext defines to be set for these unrecognised extensions and versions?
  • I observe that gcc will “correct” -march=rv64i2p0_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0 to just rv64i2p0_m2p0_a2p1_f2p2_d2p2_c2p0 in the assembly output. Is there any reason we’d want to adopt similar behaviour?
  • EDIT: I’d previously noted here that gcc with -march=rv64imafdc adds in zicsr, but that is of course totally expected as F requires zicsr.
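
To make the version questions above concrete, here is a small Python sketch of one possible acceptance policy. The table contents come from the versions mentioned in this post; the names CURRENT, LEGACY and classify, and the four result labels, are hypothetical and do not correspond to anything in RISCVISAInfo.

```python
# Versions we would claim to target under the proposal above.
CURRENT = {"a": (2, 1), "f": (2, 2), "d": (2, 2)}

# Older versions that would still be accepted, as they are today.
LEGACY = {"a": {(2, 0)}, "f": {(2, 0)}, "d": {(2, 0)}}


def classify(name, version):
    """Classify an (extension, version) pair seen in -march or .attribute."""
    if name not in CURRENT:
        return "unrecognised extension"
    if version == CURRENT[name]:
        return "current"
    if version in LEGACY[name]:
        return "accepted legacy version"
    return "unrecognised version"
```

Whether the two "unrecognised" outcomes should be an error, a warning, or silently accepted is exactly the open question in the bullets above; the sketch only separates the cases.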

Following up on the discussion of extension versioning during the last RISC-V sync-up call, I was thinking about the linker flags that were proposed.

I think there should be at least three of them (1., 2. and/or 3., 4.):

  1. extensions-pedantic-mode: riscv.attributes have to be exactly the same in each linked object file

  2. extensions-permissive-verbose-mode:
    • extensions that are not recognized (not supported) by the version of the LLVM toolchain in use, or that differ between object files, are printed as warnings
    • extensions which are known to conflict between object files (for example rv64i2p0 & rv32i2p0) lead to errors

  3. extensions-permissive-quiet-mode (probably the default one):
    • only extensions which are known to conflict between object files (for example rv64i2p0 & rv32i2p0) lead to errors

  4. extensions-full-permissive-mode: riscv.attributes are not compared between object files at all

A structure could be introduced to record conflicting extensions. It should be easy to extend, because I suppose it is nearly impossible to determine all such conflicts up front.
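
The four modes and the extensible conflict structure could be sketched roughly as follows. This is only an illustration in Python: the Mode enum, CONFLICT_RULES, check, and the xfoo extension are hypothetical names, none of these modes exist as LLD flags, and the only conflict rule included is the XLEN mismatch used as the example above.

```python
from enum import Enum


class Mode(Enum):
    PEDANTIC = 1
    PERMISSIVE_VERBOSE = 2
    PERMISSIVE_QUIET = 3
    FULL_PERMISSIVE = 4


def xlen_mismatch(a, b):
    return a["xlen"] != b["xlen"]


# Extensible list of (label, predicate) pairs; new conflicts get appended.
CONFLICT_RULES = [("XLEN mismatch", xlen_mismatch)]


def check(objs, mode, known):
    """Return (errors, warnings) for a list of parsed attribute dicts."""
    errors, warnings = [], []
    if mode is Mode.FULL_PERMISSIVE or not objs:
        return errors, warnings  # attributes are not compared at all
    first = objs[0]
    for other in objs[1:]:
        if mode is Mode.PEDANTIC:
            if other != first:
                errors.append("attributes differ")
            continue
        # Both permissive modes reject known conflicts.
        for label, rule in CONFLICT_RULES:
            if rule(first, other):
                errors.append(label)
        if mode is Mode.PERMISSIVE_VERBOSE and other["exts"] != first["exts"]:
            warnings.append("extensions differ")
    if mode is Mode.PERMISSIVE_VERBOSE:
        for obj in objs:
            for name in sorted(set(obj["exts"]) - known):
                warnings.append(f"unrecognised extension {name}")
    return errors, warnings


a = {"xlen": 64, "exts": {"i": (2, 1), "m": (2, 0)}}
b = {"xlen": 64, "exts": {"i": (2, 1), "xfoo": (1, 0)}}  # xfoo is made up
c = {"xlen": 32, "exts": {"i": (2, 1)}}
known = {"i", "m"}
```

Keeping the conflicts in a plain list of rules means a newly discovered conflict is a one-line addition rather than a change to the checking logic.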

I’ve been working with RISC-V for a few months, so I don’t have broad knowledge of the whole spectrum of extensions, but I wanted to share my thoughts, which I hope might be taken into consideration during your sync-up calls dedicated to this topic.

PrzemekO

There was full agreement at the RISC-V LLVM sync-up call last week that the change in behaviour described represents a regression, and is undesirable. I’ve posted D144353 ([lld][RISCV] Avoid error when encountering unrecognised ISA extensions/versions in RISC-V attributes) to address this.

Thanks for addressing the regression! This is very helpful for my case.
I was also trying to describe what additional options could potentially be available in the future. A more pedantic option would be very convenient in my case, when I would like to check whether someone built an application using “strange” object files (for example, ones compiled with different extensions enabled).

Sorry for not responding directly - I like your framework for looking at the options here, I was just quickly dropping in a separate update. By way of further update, the patch I linked has now been merged and I’ve made a backport request for 16.0.0.

I see your commit was merged to branch release/16.x ([lld][RISCV] Avoid error when encountering unrecognised ISA extension…, llvm/llvm-project@d42c232) and there is no llvmorg-16.0.0 tag yet, so I assume it should be in release 16.0.0?
BTW, how did you make a backport request for 16.0.0? Is there a procedure for that described anywhere?

Yes, the fix should be in 16.0.0 final.

See here for docs on requesting a backport.

Following discussion on this topic a few weeks ago at the bi-weekly sync-up call, a number of us had a focused discussion on this last Tuesday. We came to a couple of tentative decisions. The plan is to take these back to the sync-up call on Thursday, and then, if no one objects, move forward with changes to documentation to ratify these.

Decision - We will track only the most recent specification with pragmatic variances. We are actively deciding not to support multiple specification versions at this time. We acknowledge a likely future need, but actively defer the decisions around handling this until we have a concrete example of real hardware having shipped and an incompatible change to the specification made afterwards.

There was a general acknowledgement that the specifications have had many incompatible changes made, and that we see significant cost and little definite user value in trying to support many versions. We do intend to allow pragmatic divergence from the specification. As a specific example, we plan to continue allowing CSRs to be named without gating on specific instructions. These will be evaluated individually, and documented in our user docs.

We anticipate the future need for specification versioning. Specifically, we anticipate incompatible specification changes being made after hardware ships. We don’t have a specific example of this today, and are deferring most decision making about how to handle it until we do.

There was a general feeling - not deeply explored - that a vendor implementation of a non-ratified extension might get different treatment than an implementation of a ratified extension. We note that both vendor extensions, and non-conforming extensions are existing concepts in the ecosystem. We hope to avoid full specification versioning for as long as possible.

Decision - We will not error on extension incompatibilities. We may warn, and we explicitly endorse the notion of having a flag to promote warnings to fatal errors.

The case we spent a bunch of time discussing was a single library mixing F and Zfinx. This is a great example because the encodings for these two overlap. As a result, running Zfinx code on hardware which implements F (or vice versa) is likely to lead to surprising and non-obvious results (e.g. corrupting registers instead of simply crashing with a SIGILL), so there’s relatively high value in reporting such a case of user error. The problem is that there exist valid use cases where this mixture is not user error. As an example, a soft float library may legitimately provide softfp, F, and Zfinx in the same library. Critically, the dispatch mechanism does not have to be IFUNC. It can be something completely outside the toolchain’s understanding (e.g. checking an environment variable).
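
A warn-by-default policy with an opt-in promotion to errors might look roughly like the following Python sketch. The names OVERLAPPING, IncompatibleExtensions, diagnose, and the fatal_warnings parameter are hypothetical; no such flag has been specified yet, and the F/Zfinx pair is listed only because it is the example discussed above.

```python
# Illustrative pair only: extensions whose instruction encodings overlap.
OVERLAPPING = [("f", "zfinx")]


class IncompatibleExtensions(Exception):
    """Raised only when warnings are promoted to fatal errors."""


def diagnose(ext_names, fatal_warnings=False):
    """Return warnings for a merged extension set; raise if promoted."""
    names = set(ext_names)
    warnings = [
        f"mixing {a} and {b}: instruction encodings overlap"
        for a, b in OVERLAPPING
        if a in names and b in names
    ]
    if warnings and fatal_warnings:
        raise IncompatibleExtensions("; ".join(warnings))
    return warnings


# Default: the soft float library case links, with a warning.
default_result = diagnose(["i", "f", "zfinx"])

# Opt-in strictness: the same input becomes a hard failure.
try:
    diagnose(["i", "f", "zfinx"], fatal_warnings=True)
    promoted = False
except IncompatibleExtensions:
    promoted = True
```

With this split, the library that dispatches by some mechanism the toolchain cannot see still links by default, while users who consider the mixture an error can opt in to a hard failure.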

We are explicitly deferring decisions on review standards for compatibility checks otherwise. We ran out of time for this discussion.

One point that occurred to me after the meeting is that LTO is a bit more messy. One could conceivably have an F module and a Zfinx module, representing the “I want to do feature-based dynamic dispatch without IFUNCs/function attributes” case, and need to link the two LLVM modules together. I think with our LTO model (which I don’t have much experience with) that would mean creating a TargetMachine with both F and Zfinx? We’d certainly want the output attributes to match the non-LTO equivalent, but those are currently derived from the TargetMachine (or something else with target features, I’ve lost track of quite where we are with all that).