The CHERI fork of LLVM has been developed out of tree for about 10 years now, occasionally upstreaming bits that are generally useful. CHERI provides a capability model on top of virtual memory such that every memory operation (load, store, instruction fetch) must be authorised by a capability. This is either explicit (as an operand of the instruction) or implicit (for legacy operations, there is a default capability in a special register).
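To make the explicit/implicit distinction concrete, here is a toy model in C. It is only a sketch under stated assumptions, not how the hardware works. The `toy_cap` struct, `toy_ddc` (a stand-in for the default data capability register) and the helper functions are all hypothetical. Real capabilities use a compressed bounds encoding and carry more permissions and metadata.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy capability (hypothetical): NOT the real CHERI encoding, which
 * compresses the bounds and carries more permissions and metadata. */
typedef struct {
    bool     tag;      /* is this a valid capability? */
    uint64_t base;     /* lowest authorised address */
    uint64_t length;   /* number of authorised bytes */
    uint64_t address;  /* current address (the "pointer" part) */
    bool     can_load; /* a single permission bit, for illustration */
} toy_cap;

/* Hypothetical stand-in for the default data capability register that
 * authorises legacy (integer-addressed) loads. Zero-initialised, so it
 * starts untagged and authorises nothing. */
static toy_cap toy_ddc;

/* Does `cap` authorise a load of `size` bytes starting at `addr`? */
static bool toy_cap_check(const toy_cap *cap, uint64_t addr, uint64_t size) {
    if (!cap->tag || !cap->can_load)
        return false;
    if (addr < cap->base)
        return false;
    /* Equivalent to addr + size <= base + length, written to avoid overflow. */
    return size <= cap->length && addr - cap->base <= cap->length - size;
}

/* Explicit authorisation: the capability is an operand of the load.
 * Returns false where real hardware would trap. */
static bool toy_load_cap(const uint8_t *mem, const toy_cap *cap, uint8_t *out) {
    if (!toy_cap_check(cap, cap->address, 1))
        return false;
    *out = mem[cap->address];
    return true;
}

/* Implicit authorisation: the operand is a plain integer address and the
 * authority comes from the default data capability. (The toy treats the
 * integer as an absolute address.) */
static bool toy_load_legacy(const uint8_t *mem, uint64_t addr, uint8_t *out) {
    if (!toy_cap_check(&toy_ddc, addr, 1))
        return false;
    *out = mem[addr];
    return true;
}
```

In both paths the check happens before memory is touched. The only difference is where the authorising capability comes from.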
The CHERI MIPS and RISC-V prototypes are research artefacts and, as such, their instruction sets evolve very rapidly. This has meant that there’s been limited value in upstreaming LLVM changes, because any given LLVM release would be unlikely to be able to target this week’s version of the ISA. A couple of weeks ago, Arm shipped the first set of Morello boards (see our somewhat longer blog post on the subject). Morello is a modified Neoverse N1 with CHERI extensions and a “superset architecture”: a limited-run prototype intended to explore which subset of the possible CHERI features makes sense for a real CHERI extension to AArch64 (and, more broadly, whether CHERI is a useful feature to add to an ISA).
Arm explicitly describes Morello as a one-off with no backwards-compatibility guarantees and does not commit to adding CHERI support to AArch64 in the future. Nevertheless, because Morello exists in silicon, the Morello ISA is now a stable target. It can serve as an example in-tree CHERI back end until either production CHERI silicon arrives or the CHERI experiment is deemed to have failed, at which point it can be either superseded or removed. Emulators for Morello are available, including Arm’s Fixed Virtual Platform and QEMU, and in the next few months we expect there to be enough Morello machines available for a self-hosting buildbot.
CHERI requires that the entire compiler pipeline, from the front end to code generation, maintain the distinction between pointers and integers. We lower C’s [u]intptr_t to an LLVM pointer type and use explicit intrinsics to extract the address. This provides a simpler model for tracking pointer provenance than the existing one, so I believe that a lot of the target-independent code will end up being more generally useful (particularly for anyone wanting to target GC’d environments, where maintaining this distinction is equally critical). CHERI also provides byte-granularity memory safety, so it requires optimisers to refrain from out-of-bounds reads that are “safe” on conventional hardware, such as reading past the end of an object but within the same page.
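To illustrate why byte-granular bounds rule out these reads, here is another toy sketch in C. All of the names are hypothetical. It compares page-granular protection with byte-granular bounds for a transformation that widens a 1-byte load into an aligned 4-byte load. On conventional hardware such a widening is harmless, because an aligned 4-byte word never crosses a 4 KiB page boundary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy object bounds (hypothetical): the object occupies [base, base + length). */
typedef struct { uint64_t base, length; } toy_bounds;

/* Conventional MMU protection: assumes the page containing `addr` is mapped
 * (the original load succeeded), so the access is fine as long as it does
 * not straddle a 4 KiB page boundary. Requires size >= 1. */
static bool page_allows(uint64_t addr, uint64_t size) {
    const uint64_t page = 4096;
    return addr / page == (addr + size - 1) / page;
}

/* Byte-granular capability bounds: every byte accessed must lie inside
 * the object. Written to avoid overflow. */
static bool cap_allows(toy_bounds b, uint64_t addr, uint64_t size) {
    return addr >= b.base && size <= b.length &&
           addr - b.base <= b.length - size;
}

/* Hypothetical optimisation: replace a 1-byte load at `addr` with an
 * aligned 4-byte load starting here, then extract the wanted byte. */
static uint64_t widened_load_start(uint64_t addr) {
    return addr & ~(uint64_t)3;
}
```

With a 3-byte object at 0x1000, a 1-byte load at 0x1002 passes both checks. The widened 4-byte load starting at 0x1000 passes the page check but fails the capability check, so on CHERI it would trap.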
Having an in-tree target and tests would make it harder to accidentally break these guarantees. It probably makes sense to wait until after flipping the switch for opaque pointers before merging all of the CHERI diffs. Many of them ensure that the address space is preserved across pointer bitcasts (especially in the SimplifyLibCalls infrastructure), and those changes would all go away with opaque pointers. The diffs are not huge, but they are quite invasive, making small changes across the codebase. It would be great to get feedback on whether folks agree that this is the right time to start the upstreaming effort and, if so, what the right process should be.