To be clear: I think this proposal absolutely goes into the right direction.
I guess my point is that its should be coordinated with similar work elsewhere and also WG14 and WG21 to avoid that it works well with existing language features. We should also try to make sure that sanitizers and this new feature work coherently. For example, we already -fsanitize=bounds and -fsanitize=object-size and it does not seem that already this is always perfectly integrated.
I would also recommend to approach this a bit more incrementally.
BTW. The GCC extension can be defined away just by hiding it behind a macro. But even for decls where the ordering does not require it, I already have to do the following for headers which need to be C++ compatible:
void foo(int N, char buf[VLA(N)]);
I wish C++ would at least simply ignore the size argument.
(but as I said elsehwere: I like this proposal in general.)
I suppose the annotations (for the known builtins as well as for application code) will only make it easier for builtins like __builtin_dynamic_object_size to glean object sizes but…
… it sounds like this essentially subsumes _FORTIFY_SOURCE, -fsanitize=bounds, etc. This is great, but one of the friction points for adoption has been the code size/performance overhead of bounds checking, so that’s probably going to affect -fbounds-safety too.
The -fbounds-safety extension provides __null_terminated (or __terminated_by(0) ) to annotate C strings. You can find more details in “Annotations for sentinel-delimited arrays”.
Thanks for your insightful feedback! Yes, we are planning to incrementally make changes being guarded by an experimental flag. We also listed possible patch sets. You can find our rough planning for this here. Please let me know if this matches your expectations.
Right, but there are be cases where this isn’t sufficient. Take for example Linux providing an architecturally accelerated version of strcpy() (i.e. in assembly). Generally speaking this is implemented in such a way that during the copy:
source buffer size is checked while finding the source string length
Callers cannot verify string termination happens within src_size bytes of src without losing the performance benefit of the asm, and the size of the copy is not known without doing that first, so callers cannot do any of the verification. The size of the buffers need to be passed in. Hence, __builtin_dynamic_object_size is used to gather that information. This currently works with the alloc_size attribute, so I’d expect it to work with the counted_by attribute as well.
Right, it might be a good idea to make some of them into compiler errors. However, legacy C code tends to massively use a back-and-forth casting with void * , potentially leading to type confusion, and removing them altogether is likely impractical.
So I like the attributes, but there are several reasons I prefer the GCC extension to “late parsing”. IMHO it is conceptually cleaner (IMHO) and fundamentally more powerful. First, in C you can do
int n;
void foo(char buf[n]);
or
void foo(char *p __counted_by(n));
But if you allowed this, then you have problem with
int n;
void foo(char *p __counted_by(n) p, int n);
To which ‘n’ does this refer? And how would you refer to the other? You can limit it to parameters, but then the design is restricted in this way. Or you could prefer parameters but then fall back for others, but this is fragile, as the appearance of another identifier would silently change the meaning of the code. Also with GCC extension you can do other things:
And dependent types are certainly the right long-term direction, so I think we should keep this in mind. And then, all this already works in C (although not yet with all the checking we want to have):
And can be hidden from C++ in a way that makes it compatible.
But the annotations used for -fbounds-safety express a similar thing to (some of) SAL’s annotations. How and when tools interpret them doesn’t change that.
The access attribute specifies that a function to whose by-reference arguments the attribute applies accesses the referenced object according to access-mode. The access-mode argument is required and must be one of four names: read_only, read_write, write_only, or none. The remaining two are positional arguments.
I agree with the use of an attribute that trails the parameter list to address (at least) the cases that would otherwise require late parsing. For example:
Absolutely! We see bounds annotations as attributes to turn a pointer type into dependent type. Once C standard fully integrates the size into a type (like your VM type proposal), it might make sense to map pointers with bounds annotation to the equivalent dependent type.
Also, it’s interesting how bounds annotations are currently integrated with array parameters. With -fbounds-safety, when an array parameter decays into a pointer, it becomes a pointer with the corresponding __counted_by annotation like the example below:
void foo(int n; int arr[n], int n);
// Without `-fbounds-safety` above decays into:
void foo(int *arr, int n);
// With -fbounds-safety it decays into:
void foo(int *__counted_by(n) arr, int n);
What you propose looks very similar to the Cheri extensions to C/C++ for Cheri’s pure capability mode. Section 4.2 of the CHERI C/C++ Programming Guide talks about what happens when you cast back and forth using void *:
The compiler will generate suitable code to propagate the provenance validity
of pointers by using capability load and store instructions. This occurs when using a
pointer type (e.g., void *) or an integer type defined as being able to hold a pointer
(e.g., intptr_t).
Casting between pointer does not allow out of bounds accesses since the “capability” stored in the extra information stays with the pointer.
Beyond that, there are some changes needed, but the Cheri project has found the changes needed to be fairly small:
Page 13 of the presentation from BSDCan 2023 puts the LoC changes at 0.04% of userspace and 0.18% of kernel.
One example of a code change would be cases where you are casting an integer to a pointer to pass it in a pointer type. When that happens, Cheri LLVM will issue a warning (that becomes an error with -Werror). You must cast to (u)intptr_t before casting it to a pointer type to say that you really mean that. The pointers generated are invalid, so attempting to dereference one will trigger a hardware exception in Cheri C/C++. That avoids the following problem that would otherwise occur whenever a pointer is made out of thin air:
That said, this is so much like a software version of Cheri that I suspect that it could be implemented using Cheri’s extensions instead of the new ones proposed here. I would prefer for the two efforts to agree on a single set of extensions since having two different C dialects that ultimately do the same thing would be lousy for projects that must support both.