TLDR: Clang needs wrapping and non-wrapping arithmetic types similar to Rust’s Wrapping struct for C.
Background
The root cause of many security vulnerabilities and bugs is arithmetic overflow. C is traditionally one of the worst offenders for this class of flaw. There are many published security flaws involving arithmetic overflow, including 1 and 2 to name a couple. The sheer quantity of vulnerabilities of this kind is mainly due to the popularity and abundance of software written in C, but partly due to C’s historical lack of support for non-wrapping overflow resolution (so it must be explicitly open-coded with every calculation and is easily forgotten or done incorrectly).
Moreover, Clang is nicely positioned to add non-wrapping types to the frontend as LLVM already supports wrapping operations.
Goals
No undefined behavior (currently available with -fno-strict-overflow)
In-source type-level control for wrapping and non-wrapping arithmetic
Have program-level control over trap/abort vs warn/wrap overflow resolution (currently available with sanitizer)
Current options
Currently, there are no in-source strategies for designating wrapping or non-wrapping integral types. The lazy option is a comment like /* this type really shouldn't wrap! */ but this obviously isn’t enforced by the compiler. The type behavior and semantics need to be evident and enforced by the type itself and not derived from the usage of the type. Compiler builtins exist for overflow-checked arithmetic but are not always what developers want to use; complex cryptography code would require dozens of __builtin_{add,sub,mul}_overflow() calls to construct a single algorithm, for example.
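To make the verbosity concrete, here is a minimal sketch of open-coding a single checked expression, a * b + c, with the GCC/Clang overflow builtins. The helper name mul_add_checked is made up for this illustration and is not part of any proposal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: computes a * b + c. Returns true on success and
 * false if any step overflowed size_t; on failure *out must not be used. */
bool mul_add_checked(size_t a, size_t b, size_t c, size_t *out)
{
    size_t product;

    /* Each builtin returns true when the result did not fit. */
    if (__builtin_mul_overflow(a, b, &product))
        return false;
    if (__builtin_add_overflow(product, c, out))
        return false;
    return true;
}
```

Two operators already need two checks and a temporary; a real algorithm multiplies that quickly.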
Other options like -fno-strict-overflow (or -fwrapv and -fwrapv-pointer) define signed wrapping arithmetic across an entire compilation unit. For many, this is too broad. We need a more granular approach for arithmetic overflow annotations. Recently I landed 3 which added type-filtering using sanitizer case lists for the overflow and truncation sanitizers. This works great for projects that want better control over which types are instrumented by sanitizers but it has some flaws:
These “annotations” are not in-source and are therefore not very reader-friendly.
Implicit promotions cause sanitizer case lists to no longer match types post-promotion.
Only affects sanitizer instrumentation. Wrapping and non-wrapping types in the frontend could be used for better diagnostics and codegen (great for linters too).
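The promotion problem in the second point can be seen with a small C11 sketch. It assumes uint8_t is unsigned char (true on the usual targets); the TYPE_NAME macro and the function names are made up for illustration.

```c
#include <stdint.h>

/* Maps an expression to the name of its type using C11 _Generic. */
#define TYPE_NAME(expr) _Generic((expr), \
    unsigned char: "unsigned char", \
    int: "int", \
    unsigned int: "unsigned int", \
    default: "other")

/* Name of the type of a sum of two uint8_t operands. */
const char *u8_sum_type(void)
{
    uint8_t a = 200, b = 100;
    return TYPE_NAME(a + b);
}

/* The sum is computed in int, so it does not wrap at 255. */
int u8_sum_value(void)
{
    uint8_t a = 200, b = 100;
    return a + b;
}

/* Wrapping only shows up when the int result is converted back to u8. */
uint8_t u8_sum_truncated(void)
{
    uint8_t a = 200, b = 100;
    return (uint8_t)(a + b);
}
```

By the time the addition happens, the operands are plain int, so anything keyed on the original type name no longer matches.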
“But unsigned types ARE the built-in way of having well-defined wrapping types!”
Some specializations of unsigned types make no sense when wrapping, e.g., size types, indices, or reference counting types. We would like to use runtime sanitizers to catch these obvious bugs when these types are involved in overflows but we hit many false positives coming from the other unsigned types. A way to tell the compiler and readers of code which types are expected to wrap or not is required.
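A rough sketch of the two kinds of unsigned use, assuming a 32-bit unsigned int; both function names are invented for this example. The first wants wrapping (one multiply step in the style of an FNV-1a hash), the second does not (a size calculation), yet today both use the same unsigned semantics.

```c
#include <stddef.h>
#include <stdint.h>

/* Wrapping is intended here: hash mixing relies on arithmetic mod 2^32. */
uint32_t hash_step(uint32_t hash, uint8_t byte)
{
    return (hash ^ byte) * 16777619u;
}

/* Wrapping is a bug here: the product silently wraps modulo 2^N and
 * the caller ends up with a size that is far too small. */
size_t naive_array_bytes(size_t count, size_t elem_size)
{
    return count * elem_size;
}
```

An unsigned-overflow sanitizer cannot tell these apart without extra information, which is where the false positives come from.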
Allowing no_sanitize to be used on types (this has the problem of getting lost during regular integer promotions, same as wraps/no_wraps).
(in for Clang 20) Type-filtering using SCLs - this is an important feature (especially for the Linux kernel) but doesn’t solve the whole problem. We really need in-source annotations (read: types) paired with this!
Implementation
Wrapping/non-wrapping types need to persist through the usual arithmetic conversions; this allows less-than-int types to actually be used.
Having non-wrapping types in the frontend opens the door for new compile-time diagnostics, helping developers write fewer bugs.
– Design questions:
How should I handle expressions with wrapping AND non-wrapping types? (In my earlier attempts, the non-wrapping, i.e. new behavior, would be applied to the entire expression.)
Should I go with a canonical type as non-canonical types tend to get lost (like AttributedType)?
Should I inherit from BuiltinType since we essentially want most of the same behavior there?
Should these wrapping/non-wrapping types be denoted by qualifiers? _NoWrap/_Wrap? For example:
_NoWrap unsigned long a;
_Wrap int b;
How to address int promotions, e.g., we have a non-wrapping u8 type participate in arithmetic with an int; should the resulting expression retain the non-wrapping type?
How should I handle expressions with wrapping AND non-wrapping types? (In my earlier attempts, the non-wrapping, i.e. new behavior, would be applied to the entire expression.)
By definition this is ambiguous. Depending on which code you look at one or the other choice could be the right one.
I think there are 3 options here:
Pick some default behaviour as you suggest.
Make the behaviour depend on -fwrapv, -fno-strict-overflow, or -fstrict-overflow being set, i.e. revert to current default behaviour.
Make the compiler warn (e.g. -Wmismatching-wrap-qualifiers) if non-wrap and wrap qualified types are used in the same expression. The programmer can resolve this by casting mismatching variables to the right type. E.g.
_Wrap int x;
_NoWrap int y;
x + y; // warning: wrapping and non-wrapping integers used in same expression
x + (_Wrap int)y; // ok
(_NoWrap int)x + y; // ok
Given I’m suggesting only a warning here, you still need some fall-back default behaviour. In that case I’d probably choose option #2, i.e. strip all wrap/no-wrap qualifiers and adhere to the default per flags.
Should I go with a canonical type as non-canonical types tend to get lost (like AttributedType)?
Canonical type would give you the least surprising behaviour - but I tend to prefer stronger type systems if it helps me avoid mistakes (others might disagree). With a canonical type, more explicit casts may be required to remove the wrapping/non-wrapping qualifier.
Should I inherit from BuiltinType since we essentially want most of the same behavior there?
Should these wrapping/non-wrapping types be denoted by qualifiers? _NoWrap/_Wrap? For example:
Having them type-qualifiers would seem most intuitive to me. This is also close to your original attribute approach, and the usage should be almost the same (right?).
How to address int promotions, e.g., we have a non-wrapping u8 type participate in arithmetic with an int; should the resulting expression retain the non-wrapping type?
As above, I think this should result in a warning and the programmer must explicitly cast one or the other variable so that all type qualifiers match (either all no qualifier, all wraps, or all no-wraps).
At the same time you are able to support the more relaxed behaviour if the programmer chooses to add -Wno-mismatching-wrap-qualifiers (I’d expect this to initially be required for the Linux kernel).
What is meant by “stronger type systems”? What options do I have for Clang in the form of stronger typing? My limited understanding of Clang’s type system leads me to believe you mean adding something to the BuiltinType::Kind enum instead of creating a new AST node (because a new AST node may lead to more casts, implicit or otherwise).
edit: What I just mentioned reminds me of _Sat fixed point types like _Sat _Fract. Now that I’m saying (typing) this out loud, perhaps a similar approach could be used for _Wraps. Thoughts?
What “strong” means is debatable - what I meant here is that the type system does not do implicit (or silent) casts / type promotion, nor silently strips type qualifiers. Essentially what I roughly outlined above (no silent casts by default).
I’m not familiar enough with semantics of _Sat & _Fract to comment on it. I guess the question is: are the typing rules close to what we would expect from _Wraps? If yes, perhaps it’s a reasonable starting point.
According to your proposal, how should _NoWrap unsigned behave? The straightforward meaning would be to map to the LLVM nuw flag, which means undefined behaviour (unless -fsanitize=undefined), but that seems to contradict your goals. Should it deterministically trap?
I think the answer here is informed by the choice for “How should I handle expressions with wrapping AND non-wrapping types?”. In particular, the issue is that literals are typed to “int” by default. For example with this:
_NoWrap u8 var = ...;
...
var++;
the var++ will effectively be resolved as:
var++; // increment and assign
var = var + 1; // expanded to explicit assignment
var = var + (int)1; // literal "1" is an int
var = (int)var + (int)1; // must type promote u8 var to int
var = (u8)((int)var + (int)1); // must truncate back down to u8 for assignment
Given how implicitly tied to “int” much of C ends up being, I think we need to have mixed overflow resolutions take on the non-default state. In other words, if both _Wrap and _NoWrap are in an expression, the entire expression must be treated as nowrap.
While I agree that it might be nice to add a warning for mixed resolutions, I worry it would be so noisy as to be useless. For example, emitting a warning for var++ above will likely be very annoying because there isn’t a sane way to silence it that doesn’t end up being unidiomatically verbose (e.g. var = var + 1U). At the end of the day, var++ should be happily self-contained without exposing the user to int promotion internals if var is _NoWrap.
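For illustration, this is roughly what a self-contained var++ on a _NoWrap u8 could look like when emulated in today's C. The function nowrap_u8_inc is hypothetical, and it reports overflow through its return value only so the sketch stays testable; the proposal itself talks about trapping or sanitizer reports.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical emulation of `var++` on a _NoWrap 8-bit unsigned type.
 * Returns true and leaves *var unchanged when the increment would
 * overflow; returns false after a successful increment. */
bool nowrap_u8_inc(uint8_t *var)
{
    int widened = (int)*var + 1;   /* promotion: the add happens in int */

    if (widened > UINT8_MAX)       /* the range check belongs to u8 */
        return true;
    *var = (uint8_t)widened;       /* truncation is now known to be safe */
    return false;
}
```

The int arithmetic in the middle never overflows; the overflow that matters is the one at the truncation back to u8, which is why the non-wrapping behavior has to survive promotion.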
Using another example, performing an index (_NoWrap) calculation from a signed offset is common (seeking forward/backward). In this case, we’d again want the _NoWrap behavior:
loff_t offset = ...; // could be positive or negative
_NoWrap size_t file_position = ...; // some unsigned file position/size
...
return file_position + offset; // this must not go below 0 nor above SIZE_MAX
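The behavior wanted there can be approximated today with the generic overflow builtin, which accepts operands of different types. This is only a sketch: seek_checked is a made-up name and long long stands in for the kernel's loff_t.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper; `long long` stands in for loff_t.
 * Returns true if position + offset does not fit in size_t,
 * i.e. it would go below 0 or above SIZE_MAX. */
bool seek_checked(size_t position, long long offset, size_t *out)
{
    /* The builtin evaluates the sum as if with infinite precision and
     * reports whether it fits in *out, so mixed signedness is handled. */
    return __builtin_add_overflow(position, offset, out);
}
```

With a _NoWrap size_t, the plain file_position + offset expression would be expected to carry the same check without the helper.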
As a random thought, it seems like all native integral types under this proposal start their lives as _Wrap, yes?
According to the LLVM Language Reference Manual (LLVM 21.0.0git documentation), it seems the result of a nuw operation is a poison value whenever unsigned overflow occurs, which is unfortunate. I suppose _NoWrap unsigned should not be emitting nuw since we do not want undefined behavior; we want deterministic trapping.
Maybe just two: one for wrapping according to 2’s complement and the other for trapping. The trap handling would be controlled by sanitizers.
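As a sketch of those two resolutions in terms of what exists today (the names add_wrap and add_trap are invented here): the GCC/Clang builtin stores the two's complement truncated result, so ignoring its return value gives wrapping, while acting on it gives trapping.

```c
#include <limits.h>

/* Wrapping resolution: the builtin stores the result truncated to int
 * (two's complement); the overflow flag is deliberately ignored. */
int add_wrap(int a, int b)
{
    int result;
    (void)__builtin_add_overflow(a, b, &result);
    return result;
}

/* Trapping resolution: stop execution instead of producing a value. */
int add_trap(int a, int b)
{
    int result;
    if (__builtin_add_overflow(a, b, &result))
        __builtin_trap();
    return result;
}
```

In the proposal the trap side would be routed through the sanitizer runtime rather than a bare __builtin_trap(), so the handling stays configurable.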
I am also thinking of adding frontend diagnostics that are switchable with some flag. If we can compute that some constant expression wraps on a _NoWrap type then we can give compile-time diagnostics.
nuw has its uses, or LLVM would not even have this flag. “We” are also concerned with program performance.
In some sense signed int is already a signed integer with _NoWrap behaviour and unsigned integer is an unsigned integer with _Wrap behaviour. It would be nice to be able to use signed/unsigned behaviour independent of wrapping behaviour.
_NoWrap unsigned long a and _Wrap int b seems like reasonable syntax… but there’s a risk _Wrap conflicts with existing code.
I think we want canonical types because non-canonical types cause a bunch of issues. Using BuiltinType is probably okay (I don’t know if you’d want to inherit, as opposed to just adding new BuiltinType kinds.)
We already have flags to control what happens if an operation that’s defined not to wrap overflows: -fwrapv, and -ftrapv/-fsanitize. I expect unsigned nowrap types are controlled by the existing flags, and signed wrap types are not affected by any of those flags. In contexts like the Linux kernel, I assume people build releases with -fwrapv anyway. We can add additional flags if people want to control unsigned wrap separately from signed wrap, I guess.
Does this interact at all with division? ((_Wrap int)INT_MIN / (_Wrap int)-1). Historically -fwrapv hasn’t, but maybe worth considering.
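For reference, the division case has to be checked by hand today; as far as I know there is no division counterpart to the add/sub/mul overflow builtins. A minimal sketch, with div_checked being a hypothetical name:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical checked division. Returns true when a / b has no
 * representable result: division by zero, or INT_MIN / -1. */
bool div_checked(int a, int b, int *out)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return true;
    *out = a / b;
    return false;
}
```

Whether a _Wrap int should define INT_MIN / -1 as INT_MIN (the two's complement wrapped value) is exactly the open question.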
Promotion is hard… I tend to think we want to encourage explicit casts to make the intended meaning clear, but I’ve never really liked C integer promotions in the first place. If we make the rules too strict, it might be hard to use the types.
We can make up new rules for ++/-- on wrapping types to avoid that particular issue.