It has been under development since 2010 and received funding for hardware extensions in 2019. Large portions of FreeBSD, KDE and others are already working on the hardware with the CHERI C/C++ extensions. CHERI also appears to have its own upstreaming effort:
Their long-term goal is to incorporate the ISA extension into future versions of ARM and other ISAs. However, there should be room to implement a software version on top of the work already being done to leverage a hardware extension. This effort should be able to achieve that if it worked with them. That would spare open source projects from having to support multiple dialects of C that all attempt to do the same thing.
If you mean ignored by compilers, then this already works in the current implementation because attributes like __bidi_indexable are actually macros that expand to nothing for compilers that don't support the bounds safety extension.
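As a rough illustration of the "macros that expand to nothing" point, here is a minimal sketch. The fallback definitions are an assumption made for this example, not the contents of any real header, and `sum` is a hypothetical function.

```c
#include <stddef.h>

/* Hypothetical fallback definitions, assumed for illustration only.
   A toolchain that supports the extension would supply real ones. */
#ifndef __counted_by
#define __counted_by(count)
#endif
#ifndef __bidi_indexable
#define __bidi_indexable
#endif

/* With the fallbacks active, the annotation vanishes and this is plain C. */
int sum(const int *__counted_by(count) values, size_t count) {
    int total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

A compiler without the extension sees only `const int *values`, so the annotated source still builds unchanged.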
If you mean ignored by developers, then the fact the attributes are not easily ignored is an intentional feature, not a design flaw.
Bounds attributes are part of the type system because they change the type of the pointer. With that in mind it is very natural to specify the bounds attributes with the type because it is part of the type.
Bounds attributes also serve as a form of documentation, telling the developer how many objects the pointer is supposed to point to in many cases. Again specifying that as part of the type is natural because it is (IMHO) a desirable feature for the developer to see that. Deliberately trying to hide the attributes from the developer by using trailing attributes hinders discoverability.
Specifying part of the pointer type elsewhere as a trailing attribute sounds unnatural to me because we would be splitting the type up (making the code harder to read and thus reason about) and also creating a potential foot-gun.
To illustrate what I mean by a potential foot-gun, let's imagine using your suggested syntax.
Now imagine a decision is made to change the API so that the pointer bounds are now [ptr, ptr + element_size * N) (i.e. the API now works in terms of bytes rather than element count) where element_size is a new parameter.
The function signature has been changed but the developer forgot to update the pointer bounds (easy to do because they aren't written as part of the type) so now the pointer bounds are wrong and there's nothing the compiler can do to warn about this mistake.
For the sake of completeness, here's what a correctly annotated version of the above function would look like with attributes proposed by this RFC.
A name like pointer_bounds would be ambiguous because this RFC proposes multiple pointer annotations that all have "bounds" (e.g. __counted_by, __sized_by, __bidi_indexable). So pointer_bounds probably isn't a good choice.
-fbounds-safety is strictly more powerful than -fsanitize=bounds, but -fbounds-safety does not use -fsanitize=bounds as a matter of implementation. As a result, although there is no functional interference, enabling both will emit twice the checks, and sometimes duplicates will survive the optimizer pipeline. There are no known situations in which Clang's implementation of -fsanitize=bounds could catch an issue that -fbounds-safety would have let through, making -fbounds-safety a strict superset of -fsanitize=bounds. We have recommended to project owners who adopted -fbounds-safety and already used -fsanitize=bounds to disable -fsanitize=bounds. Using the two together does not produce a diagnostic in our implementation, but it would be reasonable to warn about it.
Our adoption experience with C strings is that it's cumbersome to keep mutable char *__null_terminated values. We encourage adopters to use char *__counted_by for as long as they need to modify strings and turn them into const char *__null_terminated when they're done.
It's also the case that many string functions have unverifiable interfaces. For instance, strncmp must use __unsafe_indexable pointers, despite its n parameter, because it's perfectly fine to pass strings that are shorter than n characters as long as the NUL terminator is found in the first n characters (which __counted_by would reject), and it's also fine to pass strings that are not NUL-terminated if they are more than n characters long (which __null_terminated would reject). In cases where you must preserve ABI at all costs and cannot pass bounds, sometimes the least friction is achieved by falling back to unsafe pointers and checking them outside of the model offered by -fbounds-safety. Not being entirely familiar with Linux kernel APIs, it's possible that this is the case here.
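To make the strncmp point concrete, here is a small sketch in plain ISO C showing the two kinds of valid call described above. The function names are hypothetical and the code does not use any -fbounds-safety annotation.

```c
#include <string.h>

/* Both strings are shorter than n. The call is still well defined because
   strncmp stops at the first NUL terminator. A count of n on the
   parameters would reject these 4-byte buffers. */
int compare_short_strings(void) {
    const char *a = "abc"; /* 4 bytes including the terminator */
    const char *b = "abc";
    return strncmp(a, b, 16);
}

/* Neither array is NUL-terminated. The call is still well defined because
   strncmp reads at most n characters from each argument. A requirement
   that the arguments be NUL-terminated would reject these buffers. */
int compare_unterminated(void) {
    const char a[4] = {'a', 'b', 'c', 'd'};
    const char b[4] = {'a', 'b', 'c', 'x'};
    return strncmp(a, b, 4);
}
```

The first call returns 0 and the second returns a negative value, and neither reads out of bounds, which is why a single annotation on the parameters cannot describe every valid call.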
Right. Another interesting, related point is that -fbounds-safety can use such function attributes to infer bounds annotations and create "dependent types", and it already does so to a certain extent. Using the "alloc_size" attribute (e.g., void *alloc(size_t len) __attribute__((alloc_size(1)))), -fbounds-safety treats the function call as if the return type were a dependent type annotated as void *__sized_by(len) alloc(size_t len).
Similarly, this model can be extended to support "access" and relevant function attributes to infer bounds annotations (or dependent types).
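For reference, a minimal sketch of the alloc_size declaration mentioned above, using the GNU-style attribute that GCC and Clang already accept. The malloc-backed body and the make_filled caller are assumptions added for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* alloc_size(1) states that the returned buffer holds `len` bytes, where
   `len` is parameter 1. Per the description above, -fbounds-safety would
   treat the result as if it were `void *__sized_by(len)`. */
void *alloc(size_t len) __attribute__((alloc_size(1)));

/* Hypothetical implementation, assumed for this sketch. */
void *alloc(size_t len) {
    return malloc(len);
}

/* Hypothetical caller: fills a freshly allocated buffer and returns it,
   or NULL if the allocation failed. The caller frees the result. */
unsigned char *make_filled(size_t len, unsigned char value) {
    unsigned char *buf = alloc(len);
    if (buf == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; ++i) {
        buf[i] = value;
    }
    return buf;
}
```

No bounds annotation appears at the call site; the size relationship comes entirely from the attribute on the declaration.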
Since there are some general comments about using function declaration attributes, it's important to reiterate that bounds annotations are not just about function parameters. Bounds annotations, integrated into the type system, provide a comprehensive way to describe the bounds information of pointers used in various contexts (and have them all bounds checked): global and local variables, struct fields, and flexible array members, as shown in the example.
Right, this also requires a run-time check. To be clear, -fbounds-safety also does the run-time check when the pointer has the bounds information while it cannot do this when it's __single.
This is interesting. While both approaches enforce bounds safety, I believe they cater to distinct requirements and target audiences. As you pointed out, CHERI can always verify the bounds of a pointer that is cast back and forth between void * because it changes the pointer representation to contain the bounds information and capability. This necessitates its own ABI. In this context, CHERI resembles a hardware version of a wide pointer (a.k.a. fat pointer) approach, like HardBound [1] (though CHERI might be more than that with its capability model). A software version of CHERI can be compared to software implementations of wide pointer approaches.
-fbounds-safety, on the other hand, is designed for applications that need to maintain the ABI for interactions with the external libraries (without a rebuild) and are looking to incrementally adopt the technique, which is a common challenge for software-based wide pointer approaches. -fbounds-safety achieves this by using bounds annotations that do not modify the ABI.
Furthermore, employing wide pointers everywhere without hardware support (i.e., software-only wide pointer approaches) can incur significant performance and code size overhead. Since -fbounds-safety uses ABI-preserving pointers on the ABI surface, it would incur a smaller performance impact compared to using wide pointers everywhere without special hardware support.
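The size difference behind that trade-off can be sketched as follows. The wide_int_ptr struct and both helper functions are hypothetical and only illustrate the general idea of a software wide pointer; this is not how Clang or CHERI represents pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical software wide pointer: it carries its own bounds, so it is
   three machine pointers wide and changes the ABI of any interface that
   passes it. */
struct wide_int_ptr {
    int *ptr;
    int *lower;
    int *upper; /* one past the last valid element */
};

/* Checked store through a wide pointer. Returns false instead of trapping
   so the behavior is easy to observe. */
bool wide_store(struct wide_int_ptr p, ptrdiff_t index, int value) {
    ptrdiff_t lowest = p.lower - p.ptr;
    ptrdiff_t limit = p.upper - p.ptr;
    if (index < lowest || index >= limit) {
        return false;
    }
    p.ptr[index] = value;
    return true;
}

/* ABI-preserving alternative: an ordinary pointer plus a count that the
   signature already carries. The check uses the count. */
bool counted_store(int *p, size_t count, size_t index, int value) {
    if (index >= count) {
        return false;
    }
    p[index] = value;
    return true;
}
```

The second form keeps the pointer the size of a plain `int *`, which is what lets existing callers and callees keep working without a rebuild.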
Having said that, it would be interesting to explore if -fbounds-safety could utilize CHERI ISA or similar hardware support to enhance performance. Additionally, the wide pointer implementation in Clang could potentially be generalized to accommodate both approaches.
[1] Devietti J, Blundell C, Martin MM, Zdancewic S. Hardbound: Architectural support for spatial safety of the C programming language. ACM SIGOPS Operating Systems Review. 2008 Mar 1;42(2):103-14.
You surely could, but there's not much point. CHERI really shines when you use it fully, not when you mix it with legacy code (which it supports, but only for compatibility; we encourage people to avoid doing so whenever possible).
If the cast in the call to foo is allowed,
the following program has different semantics with and without -fbounds-safety.
With this flag, it must trap, while without it (if annotations are ignored), it is a valid
ISO C program without UB.
void foo(int *__counted_by(count) p, size_t count) {
*(p+3)=1;
}
int main()
{
int a[10];
foo((int *__counted_by(1)) a,1);
}
@vzaliva It sounds like in your example the API is expected to access beyond the provided count, so you'd just have to annotate it as doing that, e.g.
Taking the address of a count parameter is not allowed unless the use site maintains the same count parameter relationship, leaving it to the callee to check any new bounds. So your example would be a compilation error, but this would compile:
void bar(int *cptr, int *__counted_by(*cptr) *p) {
(*cptr)--;
(*p)++; // advance the counted pointer; *p++ would advance p instead
}
void foo(int *__counted_by(count) p, int count) {
bar(&count, &p);
}
Also, it should reference declared identifiers after they've been declared, so len should be the first member of the struct.
Back to my point, nesting declarations within declarations is a road to madness and I personally say hell no to allowing that, like that's a hard deal breaker to me.
As for the relationship comment, this makes me wonder if there's a better way we can define relationships between identifiers.
JeanHeyd proposed _Alias to allow two functions to basically manually mangle them for ABI purposes.
I've got an idea I've been working on to add operator overloading to C without name mangling via an _Overload keyword that would establish a relationship between operator syntax and the named function that implements the operation, and you guys are trying to basically create fat pointers.
Is there a better way to denote these relationships that can be a bit more abstract and reusable?
Another important part of C semantics not discussed in this proposal is handling pointer-to-integer casts, and casts to intptr_t (and back) in particular.
That's a great point. An incorrect annotation like your example could lead to a run-time trap, even for a valid C program that doesn't have undefined behavior. To address this problem, our goal is to report such annotation mistakes at compile time as much as possible. This way, programmers can correct errors quickly without the need for run-time testing.
Interestingly, the presence of such a mistake often suggests that the code already contains some elements of "unsafeness". For instance, the count parameter in this case is supposed to represent the size of p, but it's not utilized because the code lacks the necessary bounds checks. In other words, safer code would have employed the count parameter for bounds checking, as demonstrated in the example below, to prevent out-of-bounds access to p. Consequently, the run-time trap triggered by -fbounds-safety would ultimately expose the absence of bounds checking and an incorrect count value, which can be interpreted as a logic bug.
int foo(int *__counted_by(count) p, size_t count) {
// safe code would have bounds checks like below before accessing `p`
if (count <= 3) return ERROR;
*(p+3)=1;
return SUCCESS;
}
First, it is important to note there is precedent for placing type attributes to the right of the * in pointer types. Clang's Nullability Attributes do this.
Second, the attributes in this RFC act like existing GNU-style type attributes, so the syntax you suggest is already supported in the current implementation as noted here in the RFC.
That being said, placing the attribute after the variable declaration is not sufficient when there are nested pointers.
Consider this example. It is not possible to use an attribute that trails the declaration to describe every pointer.
struct buff_to_ptrs {
// `buf1` is a sized_by pointer that points
// to a buffer bounded by [buf1, buf1 + len).
// The buffer elements are all `void *__indexable`
// pointers.
void *__indexable *__sized_by(len) buf1;
size_t len;
};
Technically it is possible to use a trailing attribute as shown below and this would be equivalent to buff_to_ptrs.
struct buff_to_ptrs_v2 {
// __sized_by(len) only applies to the outermost
// pointer.
void *__indexable * buf1 __sized_by(len);
size_t len;
};
But personally I prefer to annotate all the pointers in a uniform manner where the annotation is immediately to the right of the *.
It is also worth noting that the trailing attribute currently only works because our implementation uses GNU-style attributes under the hood. If we used C2x-style attributes (e.g. something like [[__bidi_indexable]]) then these could only appear to the right of the * in a pointer type (i.e. the trailing declaration syntax you suggested isn't possible).
That doesn't work if the goal is to not break ABI. If an existing struct definition has the pointer come before the count, the members cannot be re-ordered without breaking ABI.
One of bounds-safety's primary goals is to break ABI as little as possible, because that is necessary for incremental adoption to be practical. Thus, we have to allow pointer annotations in structs on pointers that reference members that haven't been declared yet.
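A small sketch of why re-ordering members is an ABI break. Both struct definitions and the helper are hypothetical and exist only to show the effect on member offsets.

```c
#include <stddef.h>

/* Existing layout: pointer first, count second. */
struct packet {
    unsigned char *payload;
    size_t len;
};

/* Same members re-ordered so that the count is declared first. */
struct packet_reordered {
    size_t len;
    unsigned char *payload;
};

/* Returns nonzero when re-ordering moved `payload` to a different offset,
   meaning code built against one layout would misread the other. */
int reorder_changes_layout(void) {
    return offsetof(struct packet, payload)
        != offsetof(struct packet_reordered, payload);
}
```

Already-compiled code that reads `payload` at offset 0 would read `len` instead after the re-ordering, so an annotation on `payload` has to be able to name `len` even though `len` is declared later.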
The problem you mentioned later with multiple pointers would be solved by taking the name of the implicitly referenced pointer variable and making the reference explicit.
__sized_by(Buf, Len);
This is what I was trying to say with my mention of _Alias and _Overload operator.