Authors: Owen Anderson, Jessica Clarke, Alex Richardson, David Chisnall
This RFC is a proposal to gain consensus on upstreaming target support for the CHERI-enabled architectures to the LLVM project. This is an “entire project” RFC, as CHERI support touches many parts of the toolchain: primarily LLVM, Clang, and LLD, with other components such as runtime libraries or LLDB potentially being touched as well.
Upstreaming many of these sub-components of CHERI support will likely merit their own area-specific RFCs. The purpose of this RFC is to seek consensus on the directional goal of upstreaming CHERI support, not to get into the details of the individual parts.
One item of note is that we do propose upstreaming support for the RISC-V CHERI platforms as a component of this work, as we believe it is important to have at least one end-to-end functional toolchain for available hardware [3] in upstream LLVM for testing purposes.
Please see [14] for a previous early-stage discussion of this effort from 2022.
Background
CHERI [2] is a capability architecture developed over more than 10 years as a research project at the University of Cambridge and SRI. It enables hardware-enforced memory safety as well as significant security and isolation improvements. It has been embodied in several host architectures, including MIPS, RISC-V, and Armv8-A.
In terms of hardware and emulator availability:
- CHERIoT is a CHERI-enabled 32-bit RISC-V embedded platform [1]. It was initially developed by Microsoft [9] but is now developed by an open-source community [10] [12]. CHERIoT development boards [7] developed by lowRISC are available today, and SCI Semi will ship commercial CHERIoT hardware in 2026 [3].
- Codasip’s X730 implements CHERI on a 64-bit RISC-V core [11].
- Arm has implemented a CHERI prototype for Armv8-A as the Morello project, for which development boards are available [8] as well as support in their FVP emulators.
- CHERI-QEMU supports Morello and CHERI-RISC-V.
- Various open-source research cores.
Why upstream?
The CHERI toolchain was originally developed out-of-tree because, as an active research project, the details of all components of the platform were rapidly changing, and it would have been burdensome on upstream developers to deal with a shifting and inaccessible target.
With the stabilization of CHERI platforms and the improved access to hardware, end-users of CHERI platforms now want to be able to develop for these platforms using the latest and greatest toolchains. That means using the latest Clang/LLVM, up to and including HEAD. We are also seeing interest in, and intend to support, efforts to build a CHERI-enabled Rust toolchain, which would be greatly facilitated by enabling CHERI in upstream LLVM.
Additionally, many of the core components of CHERI are shared by the various hardware embodiments of the architecture. By moving these common components upstream, compiler developers for all the platforms that share a CHERI heritage will be more easily able to collaborate on improvements to them. As a specific example, work on a standardized form of CHERI for RISC-V is being carried out as the Y extension [6], and many parties are interested in collaborating on support for that extension.
Finally, CHERI exercises LLVM in interesting ways by virtue of having non-integral pointers, non-default address spaces, and various other features. While these are nominally supported by upstream today, regressions and missing support are common due to lack of an upstream target that tests these areas. Upstreaming this support will help prevent the accidental introduction of bugs with respect to IR semantics in this area.
What has changed since the previous discussion? [14]
The primary changes since the 2022 discussion have been an increase in the number of interested parties collaborating on CHERI support in open source and the increased availability of hardware CHERI implementations. The former has increased the need for upstream integration of CHERI support to facilitate collaboration and avoid a proliferation of downstream forks of LLVM. The latter has eased the burden associated with validating CHERI correctness upstream.
What does upstreaming entail?
The core of this proposal is to upstream the required elements from CTSRD-CHERI/llvm-project [4] and CHERIoT-Platform/llvm-project [5] into llvm/llvm-project to make targeting RISC-V CHERI platforms with an upstream build of Clang/LLVM possible. By upstreaming the backend components of at least one CHERI-enabled architecture along with general CHERI support, we will be able to ensure that the upstream codebase is tested end-to-end.
Based on the current state of the downstream CHERI and CHERIoT forks of LLVM, we estimate that this will result in approximately 40 KLOC of code being added to upstream LLVM. This is lower than the figure discussed in [14], partly because the move to untyped pointers made some downstream changes unnecessary.
In keeping with standard LLVM development practices, we propose to upstream these changes in self-contained, incremental pieces, and welcome feedback on everything from code quality to overall architecture.
The major components to be upstreamed include:
- Support for CHERI annotations and intrinsics from the C/C++ source level through the IR level
- RISC-V backend support for CHERI / CHERIoT types and instructions
- Optimizer changes required to safely optimize CHERI types, annotations, and intrinsics, including bug fixes related to use of non-integral pointers, address spaces, etc.
- Linker support for CHERI and CHERIoT RTOS ABIs
- Testsuites throughout
Many of these areas are large enough that we expect to present more detail-oriented RFCs in those areas down the road before proceeding. At this point we are only looking for directional consensus on the upstreaming overall.
Note that this does not include upstreaming support for Morello. Whilst it is currently the most capable CHERI system, it is a prototype, not a commercial product, and only a limited number of systems have been produced. We may propose upstream changes that facilitate Morello, but we will maintain Morello support downstream until it is no longer useful for us and the wider CHERI community. Similarly, research on CHERI will continue for the foreseeable future, and so any research extensions we are working on, even for CHERI-RISC-V, will be maintained downstream, until such time as they go through the standardisation process.
This RFC also does not concern CHERI as a host architecture (outside of runtimes), which is orthogonal to target support.
Ongoing Support
We are committed to ongoing support and development of CHERI and CHERIoT support in LLVM. While CHERI is not strictly a backend in LLVM terms, we are open to feedback on what kinds of CI support we can provide to support upstream.
We are also committed to continuing to track the evolution of CHERI in the RISC-V ecosystem, including future development of an official RISC-V CHERI extension [6] [13] as appropriate.
FAQ About CHERI
Where can I learn about CHERI?
Here are a few resources:
- “An Introduction to CHERI” [16] provides a general introduction to CHERI.
- The CHERI ISA v9 [17] specifies the architecture for current CHERI implementations.
- The CHERIoT Programmers’ Guide [18] presents a software-oriented introduction to the implementation of CHERI in CHERIoT.
- The CHERI C/C++ Programming Guide [19] provides a programmer-oriented guide for pure-capability CHERI C/C++.
- “Formal Mechanised Semantics of CHERI C: Capabilities, Provenance, and Undefined Behaviour” [20] is a more in-depth exploration and discussion of CHERI C/C++’s semantics.
- The draft RISC-V CHERI specification is available [21].
What are some of the challenges of supporting CHERI in LLVM?
- CHERI replaces pointers with capabilities, which are non-integral and non-forgeable. This violates historical assumptions throughout Clang and LLVM that pointers could be losslessly reinterpreted as integers, and vice-versa. This also extends to type-punning through memory. This has been improved in recent releases due to work on non-integral pointers, as well as in-progress work such as [22].
- CHERI targets in “pure-capability” (i.e. all pointers are capabilities) ABIs do not use address space zero as the default address space. This is to allow implementation reuse between hybrid (where both pointers and capabilities exist, the former in address space zero) and pure-capability ABIs.
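To make the first point concrete, here is a minimal sketch in Python of why an integer round trip is lossy for a capability. It is a toy model written for this illustration, not CHERI's actual encoding or any LLVM API: the `Capability` class, its fields, and the helpers `from_int` and `with_address` are all hypothetical names, and details such as bounds compression and permissions are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Toy model: an address plus bounds and a validity tag (all names hypothetical)."""
    base: int         # lowest address the capability may access
    length: int       # size in bytes of the accessible region
    address: int      # current cursor; the only part an integer cast keeps
    tag: bool = True  # validity bit; False means the value carries no authority

    def to_int(self) -> int:
        """Model a pointer-to-integer cast: bounds and tag are dropped."""
        return self.address

    def in_bounds(self, size: int = 1) -> bool:
        """An access is allowed only for a tagged capability within its bounds."""
        return (self.tag
                and self.base <= self.address
                and self.address + size <= self.base + self.length)


def from_int(address: int) -> Capability:
    """Model a naive integer-to-pointer cast: the result is untagged."""
    return Capability(base=0, length=0, address=address, tag=False)


def with_address(cap: Capability, address: int) -> Capability:
    """Model deriving a pointer from an existing capability, keeping bounds and tag."""
    return Capability(cap.base, cap.length, address, cap.tag)


buf = Capability(base=0x1000, length=16, address=0x1004)
round_tripped = from_int(buf.to_int())        # same address, no authority
rederived = with_address(buf, buf.to_int())   # same address, authority preserved
```

In this model `round_tripped` has the same address as `buf` but cannot be used for any access, while `rederived` is indistinguishable from `buf`. Transformations that assume the two are equivalent are the kind of historical assumption described above.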
What are areas of active development and known issues in CHERI support for LLVM?
- RISC-V CHERI is in the process of standardization as the Y extension, but existing implementations use the older RISC-V XCheri (and XCheriot) vendor extensions.
- LLVM’s DataLayout does not have a concept of “default address space corresponding to void *”, only address spaces for specific purposes, instead assuming that void * is address space 0, but pure-capability CHERI code uses a non-zero address space for void *.
- atomicrmw does not support operating on pointers (except for xchg); instead such operations are lowered using a pointer-sized iN. Downstream this is supported, but by stuffing the integer to add/or/etc into a pointer and using that, with the later generated code turning the pointer operand back into an integer. Ideally the value operand would use the index type in this case, with a separate type (like the one load has) added as the in-memory type, which would be permitted to be a pointer. This is also useful outside of CHERI [15].
- Relocations used to initialise capabilities need to support SymA + (SymB - SymA) + Const in order to express a capability with bounds of SymA offset to point at SymB + Const (e.g. a landing pad SymB within a function SymA). This would be folded to SymB + Const for an integer address, but for a capability the assembler has to disable this fold to preserve provenance, just as is the case for IR. MCExpr can represent this, but when linker relaxation is in use MCValue cannot (it can only represent SymA - SymB + Const); when relaxation is not in use, the entire offset can be constant-folded to give SymA + FoldedConst. As a result, linker relaxation is not supported for any CHERI implementations, and the relocations that would be required to encode such constructs are not specified.
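The relocation issue in the last item can be sketched with a small Python model. This is an illustration under simplifying assumptions, not the MC layer or any real relocation format: `Symbol`, `CapReloc`, `unfolded`, and `folded` are hypothetical names, symbol addresses are taken as already final, and a capability is reduced to a (lower bound, upper bound, address) triple derived from the symbol the expression is based on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    address: int
    size: int


@dataclass(frozen=True)
class CapReloc:
    """Hypothetical capability relocation: bounds come from `base`, cursor is base + offset."""
    base: Symbol
    offset: int

    def resolve(self):
        """Return (lower bound, upper bound, address) of the initialised capability."""
        lower = self.base.address
        return (lower, lower + self.base.size, lower + self.offset)


def unfolded(sym_a: Symbol, sym_b: Symbol, const: int) -> CapReloc:
    """SymA + (SymB - SymA) + Const: SymA stays the source of the bounds."""
    return CapReloc(sym_a, (sym_b.address - sym_a.address) + const)


def folded(sym_b: Symbol, const: int) -> CapReloc:
    """SymB + Const: the result of the usual integer fold, now based on SymB."""
    return CapReloc(sym_b, const)


func = Symbol("func", address=0x4000, size=0x100)
landing_pad = Symbol("landing_pad", address=0x4040, size=0)  # label inside func

keep = unfolded(func, landing_pad, 0).resolve()
lose = folded(landing_pad, 0).resolve()
```

Both forms resolve to the same address, but only the unfolded form yields bounds covering the whole of `func`; the folded form is bounded by the zero-sized label. With linker relaxation the difference `SymB - SymA` is not known at assembly time, which is where the representational gap described above appears.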
External Links
[1] https://cheriot.org/cheriot-sail/cheriot-architecture.pdf
[2] https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201406-isca2014-cheri.pdf
[3] https://www.scisemi.com/news-1/press-release-iceni-family/
[4] https://github.com/CTSRD-CHERI/llvm-project
[5] https://github.com/CHERIoT-Platform/llvm-project
[6] https://riscv.github.io/riscv-cheri/
[7] https://www.sunburst-project.org/
[8] https://www.morello-project.org/
[10] https://cheriot.org
[11] https://codasip.com/solutions/riscv-processor-safety-security/cheri/
[12] https://cheriot.org/rtos/sail/2024/07/31/moving-to-the-cheriot-org.html
[13] https://cheriot.org/isa/roadmap/2024/10/31/isa-roadmap.html
[14] https://discourse.llvm.org/t/is-it-time-to-start-upstreaming-the-cheri-support-to-llvm/60032
[15] https://github.com/llvm/llvm-project/issues/120837
[16] https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-941.pdf
[17] https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-987.pdf
[18] https://cheriot.org/book/
[19] https://ctsrd-cheri.github.io/cheri-c-programming/
[20] https://www.cl.cam.ac.uk/~pes20/asplos24spring-paper110.pdf