[RFC] Support cheriot target triple as subarchitecture of riscv32

Following onto the broader CHERI upstreaming RFC, I would like to open discussion on adding support for CHERIoT in LLVM’s target triple. For context, CHERIoT is a full system architecture (both HW and SW) for a CHERI-based RISCV32E microcontroller.

While adding target triple enums would not normally merit a full RFC, CHERIoT requires more discussion because the proposal is to add it as a subarchitecture of riscv32, for which there are no extant examples.

What exactly is being proposed?

I am proposing adding support for the riscv32cheriotv1-unknown-unknown and riscv32cheriotv1-unknown-cheriotrtos triples to LLVM, with the cheriotv1 component captured in Triple via the SubArch field. I also propose parsing riscv32cheriot-… as an alias for riscv32cheriotv1-…
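To make the proposed parsing concrete, here is a minimal, standalone C++ sketch. It shows how the arch component could be split into an Arch and a SubArch, with riscv32cheriot accepted as an alias for riscv32cheriotv1. It deliberately does not use llvm::Triple. The enums, ParsedTriple and parseTriple are hypothetical simplifications, and the real parsing lives in Triple.cpp.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical, simplified model of the triple fields discussed; NOT llvm::Triple.
enum class Arch { Unknown, RISCV32 };
enum class SubArch { None, CheriotV1 };
enum class OS { Unknown, CheriotRTOS };

struct ParsedTriple {
  Arch arch = Arch::Unknown;
  SubArch subArch = SubArch::None;
  OS os = OS::Unknown;
};

// Split the arch component (e.g. "riscv32cheriotv1") into Arch + SubArch.
static void parseArchComponent(std::string_view comp, ParsedTriple &t) {
  constexpr std::string_view base = "riscv32";
  if (comp.substr(0, base.size()) != base)
    return;                                 // not riscv32 at all
  std::string_view suffix = comp.substr(base.size());
  if (suffix.empty()) {
    t.arch = Arch::RISCV32;                 // plain riscv32, no SubArch
  } else if (suffix == "cheriotv1" || suffix == "cheriot") {
    t.arch = Arch::RISCV32;                 // "cheriot" is an alias for "cheriotv1"
    t.subArch = SubArch::CheriotV1;
  }                                         // any other suffix stays Unknown
}

// Parses "arch-vendor-os[-env]"; only the arch and OS components are modeled.
ParsedTriple parseTriple(std::string_view triple) {
  ParsedTriple t;
  constexpr auto npos = std::string_view::npos;
  auto p1 = triple.find('-');               // end of the arch component
  parseArchComponent(triple.substr(0, p1), t);
  if (p1 == npos)
    return t;
  auto p2 = triple.find('-', p1 + 1);       // end of the vendor component
  if (p2 == npos)
    return t;
  auto p3 = triple.find('-', p2 + 1);       // an optional environment may follow
  std::string_view os = triple.substr(p2 + 1, p3 == npos ? npos : p3 - p2 - 1);
  if (os == "cheriotrtos")
    t.os = OS::CheriotRTOS;
  return t;
}

// Usage: assert(parseTriple("riscv32cheriot-unknown-unknown").subArch == SubArch::CheriotV1);
```

Because the SubArch and the OS are parsed independently, bare-metal CHERIoT code (OS unknown) and CHERIoT RTOS code share the same SubArch.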

What makes CHERIoT special in this regard?

Relative to a RISCV32E + CHERI baseline, CHERIoT extends (and restricts) the instruction set, changes calling conventions, changes the capability and permission formats, exposes additional source-level annotations, adds relocations, and defines a custom (non-Unix-like) linkage model. All of these divergences are documented in the CHERIoT Architecture Specification.

Critical to this discussion, these divergences are not supported independently of each other, so it is not desirable to toggle them individually. Moreover, CHERIoT users want to target CHERIoT with Clang by passing a single flag that identifies CHERIoT as the target, rather than a long series of individual feature flags.

Could you not just use the OS field of the triple?

While some elements are captured by the OS field (CHERIoT uses OS unknown for bare metal code, and cheriotrtos for code running inside the CHERIoT RTOS), many of the divergences apply across both, with the custom linkage model being the primary exception.

Should this not be a new Arch rather than SubArch?

While CHERIoT has many divergences compared to the baseline, it is still ultimately a derivative of RISCV32E + CHERI, and the number of checks for CHERIoT SubArch in CHERIoT clang is tiny compared to the number of checks for RISCV(32E) and CHERI. As such, splitting it into its own Arch would create significantly more code churn and complexity.

Where can I see the code?

The SubArch as described is implemented in CHERIoT clang today, requiring fairly minimal code changes. The core parsing is here, with other minor pieces throughout Triple.cpp.


I don’t understand why all the things you talk about can’t be based on -march=rv32cheriot[1[p0]] (substituting for the real arch string) or -mcpu=some-cheriot-cpu. We pick a default ABI based on the arch string (and fall back to the triple). We pick a default arch string based on the CPU (or ABI) or triple. So I’m not really sure why you can’t have cheriotrtos in the triple default to a CHERIoT architecture and ABI, cheriot in the ABI or a CHERIoT CPU default to a CHERIoT architecture, and cheriot in the architecture default to a CHERIoT ABI; so long as at least one is provided, the others should all default to something sensible and CHERIoT-y.
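Here is a rough sketch of the defaulting chain described above. It uses placeholder names. kCheriotArch, kCheriotABI and kCheriotCPU are hypothetical stand-ins, since the post itself says to substitute the real arch string. The non-CHERIoT fallbacks are simplified and do not reproduce Clang's full logic in RISCV.cpp.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical placeholder names; the real strings are left open in the thread.
const std::string kCheriotArch = "rv32e_xcheriot1p0";
const std::string kCheriotABI = "il32pc64xcheriot";
const std::string kCheriotCPU = "generic-rv32cheriot";

// What the user passed; every field is optional.
struct Request {
  std::string os;                            // OS component of --target
  std::optional<std::string> march, mabi, mcpu;
};

struct Resolved {
  std::string march, mabi;
};

static bool isCheriotArch(const std::string &march) {
  return march.find("xcheriot") != std::string::npos;
}

Resolved resolveDefaults(const Request &r) {
  Resolved out;
  // Arch: an explicit -march wins; otherwise a CHERIoT CPU, ABI or OS selects
  // the CHERIoT arch. Non-CHERIoT CPUs are ignored in this sketch.
  if (r.march)
    out.march = *r.march;
  else if (r.mcpu == kCheriotCPU || r.mabi == kCheriotABI ||
           r.os == "cheriotrtos")
    out.march = kCheriotArch;
  else
    out.march = "rv32imac";                  // simplified generic fallback
  // ABI: an explicit -mabi wins; otherwise derive it from the resolved arch.
  if (r.mabi)
    out.mabi = *r.mabi;
  else if (isCheriotArch(out.march))
    out.mabi = kCheriotABI;
  else if (out.march.rfind("rv32e", 0) == 0)
    out.mabi = "ilp32e";                     // an E base implies ilp32e
  else
    out.mabi = "ilp32";
  return out;
}

// Usage: assert(resolveDefaults(Request{"cheriotrtos"}).mabi == kCheriotABI);
```

If any one of the triple OS, CPU, ABI or arch is set to its CHERIoT value, the other two options (-march and -mabi) resolve to the CHERIoT pair. Rejecting conflicting explicit combinations would be a separate validation step.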

-march is used to represent HW capabilities (hah) such as ISA extensions and profiles, not software configurations. The RISCV GCC man page documents this fairly clearly, and includes examples of -march not implying a particular -mabi.

While we probably could diverge from this behavior and use -march for this purpose, it seems like it would be a subversion of user expectations. It would also be at least a little bit of a practical problem, as Clang & LLVM are generally good at having the Triple available almost everywhere, but not so much the -march string.

The other alternative I think you are suggesting is that we could infer CHERIoT from -mcpu=[something cheriot] || Triple.getOS() == cheriotrtos? This is also not impossible, but does not seem advisable as it opens up footguns by allowing (even more) combinations of options that are representable but not supported or expected to work. I would also argue that, from the user’s perspective, the intent would be backwards. I think it’s much preferable to have the user expressing their intent to target CHERIoT explicit, rather than inferring it from adjacent signals.

At the end of the day, CHERIoT represents a very specific co-designed hardware & software environment. It supports only certain configurations of the toolchain, and is not binary compatible (and is only somewhat source-compatible) with any other RISC-V or even CHERI environment. While mixing and matching arbitrary internal options in unsupported ways is useful for us as compiler developers, when it comes to user-visible flags I view that as negative value.

I would additionally highlight a few use cases of the SubArch field on other architectures that seem comparable to this:

  • AArch64SubArch_arm64e / AArch64SubArch_arm64ec - Represent specific ISA feature sets combined with the ABIs and other software environments that support their usage, on Darwin and Windows respectively.
  • PPCSubArch_spe - A binary-incompatible-but-still-PPC-based variant.
  • Various ARM sub-architectures - Some of them seem to be ISA revisions, while others are a combination of ISA revision and ABI / software environment.

Well, GCC is different, and I don’t think it helps your case. GCC has no equivalent of --target, so --target=riscv32cheriotv1-cheriotrtos is not something you can pass to it. If the patches to it were written correctly, you could provide a set of -march/-mabi/-mcpu that overrides its configure-time defaults, but that would be your only option. However, at configure time, it does choose a default ABI based on --with-arch: gcc/gcc/config.gcc at eecff13cdcc625f6112b12c899f2d3a4c868273a · gcc-mirror/gcc · GitHub.

In Clang land, we don’t have --with-arch/--with-abi, and we have much more run-time flexibility. So Clang’s logic is to mirror GCC’s --with-arch/--with-abi logic for -march/-mabi as the closest equivalent. See llvm-project/clang/lib/Driver/ToolChains/Arch/RISCV.cpp at 0045bfca9c51e3b61b1e1772adf3a2c391a039c2 · llvm/llvm-project · GitHub.

I don’t see how this wouldn’t be equivalent to, say, rv32e (or an rv32e CPU) defaulting to ilp32e and being incompatible with other ABIs. rv32xcheriot1p0 would default to il32pc64xcheriot or whatever you call it and be incompatible with anything else.

You just forbid those combinations, just as we forbid all manner of incompatible combinations today, be it due to RVE, XLEN or FLEN.
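The rejection described here could look like the sketch below. It treats a CHERIoT extension in -march and the CHERIoT ABI as an all-or-nothing pair, in the same way an E base is tied to ilp32e. The arch and ABI strings are placeholders taken from the thread. The function name and error messages are hypothetical and illustrative, not Clang's actual diagnostics.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Placeholder ABI name from the thread ("or whatever you call it").
const std::string kCheriotABI = "il32pc64xcheriot";

static bool startsWith(const std::string &s, const std::string &prefix) {
  return s.rfind(prefix, 0) == 0;
}

// Returns an error message for an unsupported -march/-mabi pair, or
// std::nullopt if the pair is accepted.
std::optional<std::string> checkArchABI(const std::string &march,
                                        const std::string &mabi) {
  const bool cheriotArch = march.find("xcheriot") != std::string::npos;
  const bool cheriotABI = mabi == kCheriotABI;
  // CHERIoT code without the CHERIoT ABI (or vice versa) is never allowed.
  if (cheriotArch != cheriotABI)
    return "CHERIoT architecture and ABI must be used together";
  // The existing kind of RVE restriction, for comparison.
  if (!cheriotArch && startsWith(march, "rv32e") && mabi != "ilp32e")
    return "RV32E requires the ilp32e ABI";
  return std::nullopt;
}

// Usage: assert(!checkArchABI("rv32e_xcheriot1p0", kCheriotABI));
```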

For some non-RISC-V architectures you likely would do it with SubArch. But the RISC-V way to do this is not that, as it stands. In RISC-V land it’s all about -march and -mabi.

Thanks for writing this up. I’ve put it on the agenda for today’s RISC-V LLVM sync mostly as a heads up - no problem if you’re not around. For topics of this type I find we might have a few questions etc in such a call, but async discussion via discourse threads / issues is probably the best path forwards.

Like @jrtc27 I probably would have defaulted to handling your use case via march + ABI. I don’t see a super strong reason against adding riscv32cheriotv1, but then I don’t see a strong argument for it (and it certainly differs from e.g. how we handle RISC-V profiles). I don’t think availability of these options throughout compilation is a big issue - we can easily enough query RISCVSubtarget and add a helper function if necessary.

I think this RFC does a great job of outlining the thought process behind this specific choice for adding a subarchitecture, but I wonder if it makes sense to zoom out a bit and focus more on the Clang-side command-line interface and changes? e.g. the source-level annotations you wish to expose seem like potentially a big piece of this.

My understanding of going the route you propose:

  • Users can specify --target=riscv32cheriotv1-unknown-cheriotrtos and it defaults to your desired ABI and architectural extension. Additional code will reject switching -mabi to something not supported, or adding architectural extensions that would clash with the cheriot ones.

But in @jrtc27’s alternative:

  • Users specify --target=riscv32-unknown-cheriotrtos. We could choose a different default march string and -mabi based on that. As before, additional code would be needed to reject unsupported -mabi values or architectural extensions that don’t compose with the cheriot ones. For baremetal you’d need to specify the extension or an appropriate -mcpu, and we’d compute the default ABI for that.
  • We’ve loosened up the -march string parsing to accept profiles, though we don’t have a mechanism for custom profiles. I suspect supporting something like -mcpu=generic-rv32cheriot or similar might be a reasonable path though if typing a -march string is too annoying and you can’t rely on inferring it from the OS target.

Thanks for writing this up. I’ve put it on the agenda for today’s RISC-V LLVM sync mostly as a heads up - no problem if you’re not around.

Thank you. I will try to join if possible, but the call is quite late for my current time zone.

Like @jrtc27 I probably would have defaulted to handling your use case via march + ABI.

We actually did this for quite a while, and it created practical problems in that people would forget or omit one flag or the other and then receive unexpected compilation results. I consider it a strong requirement for us to have a single “top-level” flag for CheriotRTOS users to pass to enable the correct compilation mode for the target.

e.g. the source-level annotations you wish to expose seem like potentially a big piece of this.

We have a number of calling conventions, type annotations, function and global attributes, and compiler builtins that only make sense for our platform. Writing them all up here would be a big undertaking, but for the sake of example: we have function attributes for toggling interrupt enabled/disabled state, global attributes for constructing capabilities to MMIO regions, global attributes for constructing capability sealing keys at compile-time, function attributes related to our custom linkage model, etc.

These only make sense on CHERIoT, and they make sense on CHERIoT independent of which implementation is being used (today, that would be Sonata or ICENI). Some of them only make sense when targeting CheriotRTOS, while others make sense in bare metal as well.
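One way to picture these two tiers of applicability is a small gating check. This is a hypothetical sketch. The attribute categories come from the list above, but which of them are RTOS-only is an assumption. Here only the linkage-model attributes are RTOS-only, because the linkage model is the main OS-specific divergence. None of the names correspond to real CHERIoT Clang attributes.

```cpp
#include <cassert>

// Hypothetical categories mirroring the examples listed in the post.
enum class AttrKind { InterruptState, MMIOCapability, SealingKey, LinkageModel };

struct TargetInfo {
  bool isCheriot;      // target is CHERIoT (by SubArch or by inferred arch/ABI)
  bool isCheriotRTOS;  // OS component is cheriotrtos
};

bool isAttrAllowed(AttrKind kind, const TargetInfo &t) {
  if (!t.isCheriot)
    return false;                  // none of these make sense off CHERIoT
  switch (kind) {
  case AttrKind::LinkageModel:
    return t.isCheriotRTOS;        // ASSUMPTION: only meaningful under the RTOS
  case AttrKind::InterruptState:
  case AttrKind::MMIOCapability:
  case AttrKind::SealingKey:
    return true;                   // ASSUMPTION: also valid in bare metal
  }
  return false;
}

// Usage: assert(!isAttrAllowed(AttrKind::LinkageModel, TargetInfo{true, false}));
```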

My understanding of going the route you propose:

  • Users can specify --target=riscv32cheriotv1-unknown-cheriotrtos and it defaults to your desired ABI and architectural extension. Additional code will reject switching -mabi to something not supported, or adding architectural extensions that would clash with the cheriot ones.

Correct, modulo that --target=riscv32cheriotv1-unknown-unknown is also used for bare metal, in keeping with the common naming pattern for bare metal triples. This is what is implemented in CHERIoT Clang and used by CheriotRTOS and downstream user projects today.

  • Users specify --target=riscv32-unknown-cheriotrtos. We could choose a different default march string and -mabi based on that. As before, additional code would be needed to reject unsupported -mabi values or architectural extensions that don’t compose with the cheriot ones. For baremetal you’d need to specify the extension or an appropriate -mcpu, and we’d compute the default ABI for that.

As noted above, I’m not a fan of inferring user intent. I suspect that in practice this will result in the compiler having an internal IsCheriot boolean, which the user can only indirectly manipulate via the OS field, or even less directly in bare metal mode. That kind of UX sounds like pain to me.

We’ve loosened up the -march string parsing to accept profiles, though we don’t have a mechanism for custom profiles.

I’m not very familiar with the support for profiles. Would it make sense to think of “cheriotv1” as a profile?

My main line of thinking was whether your requirements here impact the discussion at all. e.g. is it reasonable that these different language modes etc are inferred based on the subarch triple, or might there be a need for a separate flag to enable them. Based on what you’re saying, it sounds like the kind of thing you’d typically enable to be present or not based on the target, rather than something you might view as a “language extension”.

It seems like a future Cheriot is something that could be a profile if the RVM (microcontroller) profiles are further elaborated. But you’re looking at specifying a bunch of software/ABI things while the profiles are focused on ISA features so it doesn’t really sidestep your concerns about “inferring user intent”.

I think you’ll find that, in my suggestion, there is no possibility of getting unexpected compilation results by forgetting flags. In my suggestion, whichever of the flags you specify, the others are automatically inferred, and if you specify multiple, conflicting flags then you get an error. So it’s a single “top-level” flag in the sense that you do not need to provide a whole set of flags all at once for it to work, but it’s not a single “top-level” flag in the sense that there are multiple different, equivalent flags you could provide.

I do not propose ever allowing a situation where you end up with CHERIoT code but not a CHERIoT ABI, or vice versa. I just do not believe you need to add a SubArch to achieve that, all that does is add to the set of things that differ between non-CHERIoT and CHERIoT.

I’m going to do some prototyping of this, and will report back on how it goes.

Thank you. I’m happy to discuss issues or questions that come up on Signal (rather than overusing this thread) if that’s helpful.

Reporting back on this thread, I was able to get the approach proposed by @jrtc27 working for Cheriot, so this RFC is no longer required.
