RFC: Enforcing Bounds Safety in C (-fbounds-safety)

There is already a large body of existing work in this area under active development:

https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/

https://www.cheribsd.org/

It has been under development since 2010 and received funding for hardware extensions in 2019. Large portions of FreeBSD, KDE and others are already working on the hardware with the Cheri C/C++ extensions. Cheri also appears to have its own upstreaming effort:

Their long-term goal is to incorporate the ISA extension into future versions of ARM and other ISAs. However, there should be room to implement a software version on top of the work already being done to leverage a hardware extension. This effort should be able to achieve that if it worked with them. That would spare open source projects from having to support multiple dialects of C that all attempt to do the same thing.

If you mean ignored by compilers, then this already works in the current implementation because attributes like __bidi_indexable are actually macros that expand to nothing for compilers that don’t support the bounds safety extension.
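As a rough sketch of that fallback: the annotations can be defined as empty macros whenever the toolchain has not already supplied them, so the same declaration compiles everywhere. The `#ifndef` guards and the helper `sum_counted` are my own assumptions for illustration, not the header that ships with the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: define the annotations as no-ops only when the toolchain
 * has not already provided them. */
#ifndef __counted_by
#define __counted_by(n)
#endif
#ifndef __bidi_indexable
#define __bidi_indexable
#endif

/* Hypothetical helper: with the fallback in place this is ordinary ISO C,
 * and the annotation still documents how many elements `p` refers to. */
int sum_counted(const int *__counted_by(n) p, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

A compiler without the extension sees a plain `const int *` parameter here, so the ABI and behaviour are unchanged.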

If you mean ignored by developers, then the fact the attributes are not easily ignored is an intentional feature, not a design flaw.

Bounds attributes are part of the type system because they change the type of the pointer. With that in mind, it is very natural to specify the bounds attributes with the type because they are part of the type.

Bounds attributes also serve as a form of documentation, telling the developer how many objects the pointer is supposed to point to in many cases. Again specifying that as part of the type is natural because it is (IMHO) a desirable feature for the developer to see that. Deliberately trying to hide the attributes from the developer by using trailing attributes hinders discoverability.

Specifying part of the pointer type elsewhere as a trailing attribute sounds unnatural to me because we would be splitting the type up (making the code harder to read and thus reason about) and also creating a potential foot-gun.

To illustrate what I mean by a potential foot-gun, let’s imagine using your suggested syntax.

// Equivalent to void foo(int * __counted_by(N) ptr, size_t N);
void foo(int * ptr, size_t N) __pointer_bounds(ptr, N)

Now imagine a decision is made to change the API so that the pointer bounds are now [ptr, ptr + element_size * N) (i.e. the API now works in terms of bytes rather than element count) where element_size is a new parameter.

void foo(void * ptr, size_t element_size, size_t N)  __pointer_bounds(ptr, N)

The function signature has been changed but the developer forgot to update the pointer bounds (easy to do because they aren’t written as part of the type) so now the pointer bounds are wrong and there’s nothing the compiler can do to warn about this mistake.

For the sake of completeness, here’s what a correctly annotated version of the above function would look like with the attributes proposed by this RFC.

void foo(void * __sized_by(element_size * N) ptr,
  size_t element_size,
  size_t N);
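To make the byte-based bound concrete, here is a small sketch of a function with that shape. The name `foo_zero` and its body are hypothetical, and the no-op macro fallback is an assumption so that the snippet builds with any C compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: no-op fallback for compilers without the extension. */
#ifndef __sized_by
#define __sized_by(n)
#endif

/* Hypothetical function: the byte bound is spelled next to the pointer
 * it constrains, so it is in view whenever the parameters change. */
void foo_zero(void *__sized_by(element_size * N) ptr,
              size_t element_size,
              size_t N) {
    /* Touches exactly [ptr, ptr + element_size * N). */
    memset(ptr, 0, element_size * N);
}
```

A call such as `foo_zero(a, sizeof a[0], 4)` on an `int a[4]` stays within the stated bound.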

A name like pointer_bounds would be ambiguous because this RFC proposes multiple pointer annotations that all have “bounds” (e.g. __counted_by, __sized_by, __bidi_indexable). So pointer_bounds probably isn’t a good choice.

-fbounds-safety is strictly more powerful than -fsanitize=bounds, but -fbounds-safety does not use -fsanitize=bounds in its implementation. As a result, although there is no functional interference, enabling both will emit twice the checks, and sometimes duplicates will survive the optimizer pipeline. There are no known situations in which Clang’s implementation of -fsanitize=bounds could catch an issue that -fbounds-safety would have let through, making -fbounds-safety a strict superset of -fsanitize=bounds. We have recommended that project owners who adopted -fbounds-safety and already used -fsanitize=bounds disable -fsanitize=bounds. Our implementation does not diagnose using the two together, but it would be reasonable to warn about it.

Our adoption experience with C strings is that it’s cumbersome to keep mutable char *__null_terminated values. We encourage adopters to use char *__counted_by for as long as they need to modify strings and turn them into const char *__null_terminated when they’re done.
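A minimal sketch of that pattern, assuming no-op macro fallbacks and a hypothetical helper `make_label`: the buffer is treated as counted while it is being written, and only the finished string is handed out as a const NUL-terminated pointer. Under the real model the hand-off would presumably need an explicit checked conversion, which this sketch does not attempt to show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: no-op fallbacks for compilers without the extension. */
#ifndef __counted_by
#define __counted_by(n)
#endif
#ifndef __null_terminated
#define __null_terminated
#endif

/* Hypothetical helper: writes into a counted buffer, then returns it as
 * a read-only NUL-terminated string. Returns NULL if it does not fit. */
const char *__null_terminated make_label(char *__counted_by(cap) buf,
                                         size_t cap, unsigned id) {
    if (cap == 0)
        return NULL;
    int n = snprintf(buf, cap, "item-%u", id);
    if (n < 0 || (size_t)n >= cap)
        return NULL; /* output was truncated */
    return buf;
}
```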

It’s also the case that many string functions have unverifiable interfaces. For instance, strncmp must use __unsafe_indexable pointers, despite its n parameter, because it’s perfectly fine to pass strings that are shorter than n characters as long as the NUL terminator is found in the first n characters (which __counted_by would reject), and it’s also fine to pass strings that are not NUL-terminated if they are more than n characters long (which __null_terminated would reject). In cases where you must preserve ABI at all costs and cannot pass bounds, sometimes the least friction is achieved by falling back to unsafe pointers and checking them outside of the model offered by -fbounds-safety. Not being entirely familiar with Linux kernel APIs, it’s possible that this is the case here.
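The two strncmp cases can be seen in plain ISO C, with no annotations involved. The function names below are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Case 1: both strings are shorter than n. The comparison stops at the
 * NUL terminator, so passing n = 10 is fine even though each array only
 * holds 3 bytes. A __counted_by(n) annotation would reject this. */
int compare_short(void) {
    const char a[] = "ab";
    const char b[] = "ab";
    return strncmp(a, b, 10);
}

/* Case 2: neither array contains a NUL, but each holds at least n
 * characters, so strncmp never reads past them. A __null_terminated
 * annotation would reject this. */
int compare_unterminated(void) {
    const char a[4] = {'a', 'b', 'c', 'd'};
    const char b[4] = {'a', 'b', 'c', 'x'};
    return strncmp(a, b, 3);
}
```

Both calls are valid and both compare equal, which is why neither single annotation describes the interface.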

I worked with Peter Sewell and others on putting a provenance model into C (draft TS by now), so I am aware of Cheri.


Right. Another interesting thing related to this is that, using such function attributes, -fbounds-safety can infer bounds annotations to create “dependent types”, and it already does to a certain extent. Using the ‘alloc_size’ attribute (e.g., void *alloc(size_t len) __attribute__((alloc_size(1))) ), -fbounds-safety treats the function call as if the return type is a dependent type annotated as void *__sized_by(len) alloc(size_t len).
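For reference, a small sketch of an allocator declared this way. The `ALLOC_SIZE_1` wrapper macro and the caller `make_zeroed` are hypothetical names of mine; the attribute itself is the existing GCC/Clang `alloc_size` attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: GCC/Clang attribute syntax; other compilers get a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define ALLOC_SIZE_1 __attribute__((alloc_size(1)))
#else
#define ALLOC_SIZE_1
#endif

/* Parameter 1 gives the size in bytes of the returned object. */
void *alloc(size_t len) ALLOC_SIZE_1;

void *alloc(size_t len) {
    return malloc(len);
}

/* Hypothetical caller: uses the result as if it were declared
 * void *__sized_by(len), i.e. it only touches [p, p + len). */
unsigned char *make_zeroed(size_t len) {
    unsigned char *p = alloc(len);
    if (p != NULL)
        memset(p, 0, len);
    return p;
}
```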

Similarly, this model can be extended to support ‘access’ and relevant function attributes to infer bounds annotations (or dependent types).

Since there are some general comments about using function declaration attributes, it’s important to reiterate that bounds annotations are not just about function parameters. Bounds annotations, integrated into the type system, provide a comprehensive way to describe the bounds information of pointers used in various contexts (and have them all bounds checked): global and local variables, struct fields, and flexible array members, as shown in the example.

int glen;
void *__sized_by(glen) glob;

struct sized_buf {
    void *__sized_by(len) buf;
    size_t len;
};

struct flexible_buf {
    size_t len;
    int fam[__counted_by(len)];
};
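As a hand-written stand-in for the kind of check this enables on a struct field, here is a sketch using the `sized_buf` layout above. The helper `read_byte` is hypothetical, the macro fallback is an assumption, and where the real feature would trap this version returns an error code instead.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: no-op fallback for compilers without the extension. */
#ifndef __sized_by
#define __sized_by(n)
#endif

struct sized_buf {
    void *__sized_by(len) buf;
    size_t len;
};

/* Hypothetical accessor: the explicit comparison mirrors the bounds
 * check that the annotation ties to every access through `buf`. */
int read_byte(const struct sized_buf *sb, size_t idx, unsigned char *out) {
    if (idx >= sb->len)
        return -1; /* out of bounds */
    *out = ((const unsigned char *)sb->buf)[idx];
    return 0;
}
```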

Right, this also requires a run-time check. To be clear, -fbounds-safety also does the run-time check when the pointer has bounds information, but it cannot do so when the pointer is __single.

This is interesting. While both approaches enforce bounds safety, I believe they cater to distinct requirements and target audiences. As you pointed out, CHERI can always verify the bounds of a pointer that is cast back and forth through void * because it changes the pointer representation to contain the bounds information and capability. This necessitates its own ABI. In this context, CHERI resembles a hardware version of a wide pointer (a.k.a. fat pointer) approach, like HardBound [1] (though CHERI might be more than that with its capability model). A software version of CHERI can be compared to software implementations of wide pointer approaches.

-fbounds-safety, on the other hand, is designed for applications that need to maintain the ABI for interactions with the external libraries (without a rebuild) and are looking to incrementally adopt the technique, which is a common challenge for software-based wide pointer approaches. -fbounds-safety achieves this by using bounds annotations that do not modify the ABI.

Furthermore, employing wide pointers everywhere without hardware support (i.e., software-only wide pointer approaches) can incur significant performance and code size overhead. Since -fbounds-safety uses ABI-preserving pointers on the ABI surface, it would incur a smaller performance impact compared to using wide pointers everywhere without special hardware support.
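To make the trade-off concrete, here is a sketch contrasting a software wide pointer with an ABI-preserving pointer-plus-count parameter. The struct `wide_int_ptr` and both functions are hypothetical illustrations of mine, not how Clang represents these pointers internally.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: no-op fallback for compilers without the extension. */
#ifndef __counted_by
#define __counted_by(n)
#endif

/* Hypothetical software wide pointer: three words instead of one, which
 * changes the layout of every struct and signature that carries it. */
struct wide_int_ptr {
    int *ptr;
    int *lower;
    int *upper; /* one past the last valid element */
};

int wide_load(struct wide_int_ptr w, size_t idx, int *out) {
    if (w.ptr < w.lower || w.ptr > w.upper)
        return -1;
    if (idx >= (size_t)(w.upper - w.ptr))
        return -1;
    *out = w.ptr[idx];
    return 0;
}

/* ABI-preserving shape: the pointer keeps its ordinary representation
 * and the bound comes from a parameter the API already had. */
int counted_load(const int *__counted_by(count) p, size_t count,
                 size_t idx, int *out) {
    if (idx >= count)
        return -1;
    *out = p[idx];
    return 0;
}
```

The extra words in the wide form are where the size and performance overhead of a wide-pointers-everywhere approach comes from.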

Having said that, it would be interesting to explore if -fbounds-safety could utilize CHERI ISA or similar hardware support to enhance performance. Additionally, the wide pointer implementation in Clang could potentially be generalized to accommodate both approaches.


[1] Devietti J, Blundell C, Martin MM, Zdancewic S. Hardbound: Architectural support for spatial safety of the C programming language. ACM SIGOPS Operating Systems Review. 2008 Mar 1;42(2):103-14.

You surely could, but there’s not much point. CHERI really shines when you use it fully, not when you mix it with legacy code (which it supports, but only for compatibility; we encourage people to avoid doing so whenever possible).


How about this:

    void bar(int *cptr)
    {
       (*cptr)++;
    } 
    
    void foo(int *__counted_by(count) p, size_t count) {
        // count++; violates the invariant of __counted_by 
        bar(&count); // what about this?
    }

It will be pretty tricky to track down all the places where count might be changed.

If the cast in the call to foo is allowed,
the following program has different semantics with and without -fbounds-safety.
With this flag, it must trap, while without it (if annotations are ignored), it is a valid
ISO C program without UB.

   void foo(int *__counted_by(count) p, size_t count) {
      *(p+3)=1;
   }

   int main()
   {
       int a[10];
       foo((int *__counted_by(1)) a,1);
   }

@vzaliva It sounds like in your example the API is expected to access beyond the provided count, so you’d just have to annotate it as doing that, e.g.

void foo(int *__counted_by(count + 4) p, size_t count) {
    *(p+3)=1;
}

Hope this helps clarify things :smiley:

Taking the address of a count parameter is not allowed unless the use site maintains the same count parameter relationship, leaving it to the callee to check any new bounds. So your example would be a compilation error, but this would compile:

void bar(int *cptr, int *__counted_by(*cptr) *p) {
   (*cptr)--;
   (*p)++;
} 
    
void foo(int *__counted_by(count) p, int count) {
    bar(&count, &p);
}
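Here is a sketch of how a caller could use such a helper while keeping the pointer and its count in step. The names `drop_front` and `sum_all` are hypothetical, the macro fallback is an assumption, and I have not checked this exact form against the implementation's rules for updating a count and its pointer together.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: no-op fallback for compilers without the extension. */
#ifndef __counted_by
#define __counted_by(n)
#endif

/* Hypothetical helper: shrinks the count and advances the pointer in the
 * same place, so the pair still describes the remaining elements. */
void drop_front(int *cptr, int *__counted_by(*cptr) *p) {
    (*cptr)--;
    (*p)++;
}

/* Hypothetical caller: consumes the buffer one element at a time. */
int sum_all(int *__counted_by(count) p, int count) {
    int total = 0;
    while (count > 0) {
        total += *p;
        drop_front(&count, &p);
    }
    return total;
}
```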

Good point, but it’s still distracting putting it in the middle of the declaration.

Put it after the variable has been declared, not in the middle of the declaration.

struct sized_buf {
    void *buf __sized_by(len);
    size_t len;
};

Also, it should reference declared identifiers after they’ve been declared, so len should be the first member of the struct.

Back to my point, nesting declarations within declarations is a road to madness and I personally say hell no to allowing that, like that’s a hard deal breaker to me.

As for the relationship comment, this makes me wonder if there’s a better way we can define relationships between identifiers.

JeanHeyd proposed _Alias to allow two functions to basically manually mangle them for ABI purposes.

I’ve got an idea I’ve been working on to add operator overloading to C without name mangling via an _Overload keyword that would establish a relationship between operator syntax and the named function that implements the operation, and you guys are trying to basically create fat pointers.

Is there a better way to denote these relationships that can be a bit more abstract and reusable?

Another important part of C semantics not discussed in this proposal is handling pointer-to-integer casts, and casts to intptr_t (and back) in particular.

@vzaliva what do you mean? Conversion to a safe pointer is done via the __unsafe_forge_* builtins.

@waffles_the_dog sorry, not sure which of my comments you are responding to. Please clarify.

That’s a great point. An incorrect annotation like your example could lead to a run-time trap, even for a valid C program that doesn’t have undefined behavior. To address this problem, our goal is to report such annotation mistakes at compile time as much as possible. This way, programmers can correct errors quickly without the need for run-time testing.

Interestingly, the presence of such a mistake often suggests that the code already contains some elements of “unsafeness”. For instance, the count parameter in this case is supposed to represent the size of p, but it’s not utilized because the code lacks the necessary bounds checks. In other words, safer code would have employed the count parameter for bounds checking, as demonstrated in the example below, to prevent out-of-bounds access to p. Consequently, the run-time trap triggered by -fbounds-safety would ultimately expose the absence of bounds checking and an incorrect count value, which can be interpreted as a logic bug.

// illustrative status codes (values assumed)
enum { SUCCESS = 0, ERROR = -1 };

int foo(int *__counted_by(count) p, size_t count) {
    // safe code would have bounds checks like below before accessing `p`
    if (count <= 3) return ERROR;
    *(p+3)=1;
    return SUCCESS;
}

First, it is important to note there is precedent for placing type attributes to the right of the * in pointer types. Clang’s Nullability Attributes do this.

Second, the attributes in this RFC act like existing GNU style type attributes so the syntax you suggest is already supported in the current implementation as noted here in the RFC.

So these two struct types are equivalent:

struct sized_buf {
    void *buf __sized_by(len);
    size_t len;
};

struct sized_buf_v2 {
    void * __sized_by(len) buf;
    size_t len;
};

That being said, placing the attribute after the variable declaration is not sufficient when there are nested pointers.

Consider this example. It is not possible to use an attribute that trails the declaration to describe every pointer.

struct buff_to_ptrs {
  // `buf1` is a sized_by pointer that points
  // to a buffer bounded by [buf1, buf1 + len).
  // The buffer elements are all `void *__indexable`
  // pointers.
  void *__indexable *__sized_by(len) buf1;
  size_t len;
};

Technically it is possible to use a trailing attribute as shown below, and this would be equivalent to buff_to_ptrs.

struct buff_to_ptrs_v2 {
  // __sized_by(len) only applies to the outermost
  // pointer.
  void *__indexable * buf1 __sized_by(len);
  size_t len;
};

But personally I prefer to annotate all the pointers in a uniform manner where the annotation is immediately to the right of the *.

It is also worth noting that the trailing attribute currently only works because our implementation uses GNU style attributes under the hood. If we used C2x style attributes (e.g. something like [[__bidi_indexable]]) then these could only appear to the right of the * in a pointer type (i.e. the trailing declaration syntax you suggested isn’t possible).

That doesn’t work if the goal is to not break ABI. If an existing struct definition has the pointer come before the count, the members cannot be reordered without breaking ABI.

One of bounds-safety’s primary goals is to break ABI as little as possible, because that is necessary for incremental adoption to be practical. Thus, we have to allow annotations on pointers in structs to reference members that haven’t been declared yet.

The problem you mentioned later with multiple pointers would be solved by taking the name of the implicitly referenced pointer variable and making the reference explicit.

__sized_by(Buf, Len);

This is what I was trying to say with my mention of _Alias and _Overload operator.