Support the Linux kernel’s need for defining overflow resolution behavior for types at the source level by introducing OverflowBehaviorTypes to Clang’s type system.
Key discussion areas for this RFC include C++ support, narrowing semantics and other design or implementation topics.
This RFC is a follow-up to [RFC] [Clang] Canonical wrapping and non-wrapping types, which has valuable background and discussion. To summarize, folks seem to be on board with the general idea of canonical wrapping types.
Implementation
v2 uses an attribute to create an OverflowBehaviorType within the AST. Unlike the attributes from my PR “add wraps and no_wraps attributes”, these types are not prone to getting “lost” or stripped away, because v2 doesn’t use AttributedTypes.
v1 used type specifiers to instantiate built-in integral types like `_Wrap int` or `_NoWrap int`. v2 instead parses `overflow_behavior` attributes to create OverflowBehaviorTypes within the AST. Promotions and codegen take special care to consider OverflowBehaviorTypes. Since OBTs are canonical types in the AST, persisting them through promotions is straightforward. During codegen, we emit LLVM's overflow intrinsics, which already exist across a variety of bitwidths, based on the OBTs present; Clang doesn’t do any of the overflow calculations itself (except for the -Winteger-overflow ICE checks, for which OBTs have a backdoor).
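For intuition, the check behind a `no_wrap` addition can be written by hand with `__builtin_add_overflow`, a GCC/Clang builtin that Clang lowers to the `llvm.sadd.with.overflow`/`llvm.uadd.with.overflow` intrinsics. This is only a sketch of the semantics for a `signed char` operand, not how OBTs are implemented, and the helper name `nowrap_add_schar` is hypothetical. The point is that the check happens at the 8-bit boundary: in plain C, `100 + 28` is evaluated as `int` and never overflows.

```c
#include <stdbool.h>

/* Hypothetical helper: computes a + b and reports whether the exact sum
 * fits in a signed char, i.e. the overflow check is done at the 8-bit
 * boundary rather than at the promoted int width. */
static bool nowrap_add_schar(signed char a, signed char b, signed char *out)
{
    /* __builtin_add_overflow returns true when the infinitely precise
     * sum does not fit in *out; Clang lowers it to an
     * llvm.sadd.with.overflow intrinsic for these signed operands. */
    return !__builtin_add_overflow(a, b, out);
}
```

For example, `nowrap_add_schar(100, 27, &r)` succeeds with `r == 127`, while `nowrap_add_schar(100, 28, &r)` reports overflow.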
To best grasp the behaviors mentioned above, take a look at the Clang sema and codegen tests:
- clang/test/CodeGen/overflow-behavior-types.c
- clang/test/CodeGen/overflow-behavior-types.cpp
- clang/test/Sema/attr-overflow-behavior.c
- clang/test/Sema/attr-overflow-behavior.cpp
I have a Linux tree, forked from Kees’ for-next tree, with the kbuild changes needed to support the INTEGER_WRAP config option as well as OverflowBehaviorTypes and sanitizer special case lists.
With this kernel tree and the LLVM tree linked below, I built an x86 kernel with the INTEGER_WRAP sanitizers on and size_t marked with a no_wrap overflow behavior type. With this configuration I only get splats for arithmetic involving size_t. I’ve had a syzkaller instance running against this image for a couple of weeks with promising results: we are getting solid signal for overflow bugs in the kernel, which we’ve never really had before, since we couldn’t turn on the integer sanitizers without tons of noise. Now, with an SSCL (sanitizer special case list) ignorelist and a size_t overflow_behavior annotation, we get type-level granularity.
Here’s my Linux tree with the necessary kbuild changes: GitHub - JustinStitt/linux at dev/v6.15-rc4/int-wrap-types-poc
Here’s my LLVM tree implementing OverflowBehaviorTypes: GitHub - JustinStitt/llvm-project at overflow-behavior-types (wip)
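One way a header could carry the size_t annotation while still building with compilers that lack OverflowBehaviorTypes is a conditional macro. This is a sketch under several assumptions: `HAVE_OVERFLOW_BEHAVIOR_TYPES` is a hypothetical build-system define, `ob_size_t` is a hypothetical alias, and the attribute argument spelling `no_wrap` is assumed from the `__no_wrap` annotation used in this RFC rather than taken from the LLVM tree.

```c
#include <stddef.h>

/* ASSUMPTION: HAVE_OVERFLOW_BEHAVIOR_TYPES is a hypothetical define set by
 * the build when the compiler supports OverflowBehaviorTypes, and the
 * attribute argument spelling "no_wrap" is assumed, not confirmed. */
#ifdef HAVE_OVERFLOW_BEHAVIOR_TYPES
#define __no_wrap __attribute__((overflow_behavior(no_wrap)))
#else
/* Without OBT support the annotation expands to nothing, so annotated
 * types behave exactly like their underlying integer type. */
#define __no_wrap
#endif

/* Hypothetical annotated alias of size_t; it only carries the no_wrap
 * behavior when the attribute above is actually applied. */
typedef size_t __no_wrap ob_size_t;
```

Because the fallback is an empty macro, `ob_size_t` is an ordinary `size_t` on compilers without OBT support.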
Narrowing semantics
OverflowBehaviorTypes will perform implicit narrowing casts on other operands in order to 1) get both types to the same bitwidth and 2) have the final result type be the same bitwidth and share the overflow behavior of the OBT.
Here is the most obvious example to show this behavior:
```c
void foo(char __no_wrap A) {
  unsigned long B;
  (A + B);
}
```
Relevant AST snippet:
```
`-BinaryOperator 0x55d82d556558 <col:4, col:8> '__no_wrap char':'char' '+'
  |-ImplicitCastExpr 0x55d82d556510 <col:4> '__no_wrap char':'char' <LValueToRValue>
  | `-DeclRefExpr 0x55d82d5564d0 <col:4> '__no_wrap char':'char' lvalue ParmVar 0x55d82d556278 'A' '__no_wrap char':'char'
  `-ImplicitCastExpr 0x55d82d556540 <col:8> '__no_wrap char':'char' <IntegralCast>
    `-ImplicitCastExpr 0x55d82d556528 <col:8> 'unsigned long' <LValueToRValue>
      `-DeclRefExpr 0x55d82d5564f0 <col:8> 'unsigned long' lvalue Var 0x55d82d556450 'B' 'unsigned long'
```
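Written out by hand, the AST above amounts to roughly the following: `B` is first converted down to the OBT's width (the `IntegralCast` node), and only then is the addition checked at that width. This is a sketch, not generated code; `obt_narrowed_add` is a hypothetical name, `signed char` stands in for `char` so the result does not depend on the target's char signedness, and `__builtin_add_overflow` models the overflow check.

```c
#include <stdbool.h>

/* Hypothetical hand-written equivalent of `A + B` from the example above,
 * where A is `__no_wrap char` and B is `unsigned long`. */
static bool obt_narrowed_add(signed char A, unsigned long B, signed char *out)
{
    /* (1) The IntegralCast: B is converted down to the OBT's width.
     * Converting an out-of-range value to a signed integer type is
     * implementation-defined in C; GCC and Clang reduce it modulo 2^8. */
    signed char narrowed_B = (signed char)B;

    /* (2) The addition is then checked at char width, so an overflow is
     * reported on the arithmetic itself. Returns false on overflow. */
    return !__builtin_add_overflow(A, narrowed_B, out);
}
```

With `A = 1` and `B = 0x105`, the narrowing keeps only the low byte, so the sum is `6`; with `A = 120` and `B = 10`, the char-width check reports overflow.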
There are two main reasons for this narrowing cast:
1. So we don’t lose track of the OBT’s bitwidth in later arithmetic.
2. Better integer-overflow signal, since the splat reports on the arithmetic and not on an implicit cast that may cause truncation.
To help explain (1) and (2), consider this arithmetic expression:
```c
__no_wrap char a;
int b; long c;
(a + b + c + 7);
```
Following normal C implicit promotion rules, `a + b` has result type ‘int’, which means any further arithmetic we wish to check for overflow is checked at the wrong bit boundary. And in this particular example, again following normal C promotion rules, the full expression has result type ‘long’, so we would not even get reliable truncation reporting from the sanitizers.
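The promotions described above can be observed with C11 `_Generic`, which selects a branch based on the type of its controlling expression without evaluating it. This sketch uses the standard (non-OBT) types from the example; `TYPE_NAME` and the two helper functions are hypothetical names.

```c
/* Maps an expression's type to its name using C11 _Generic; only the
 * types relevant to the example are listed. */
#define TYPE_NAME(x) _Generic((x), \
    char: "char",                  \
    int: "int",                    \
    long: "long",                  \
    default: "other")

static const char *type_of_a_plus_b(void)
{
    char a = 0;
    int b = 0;
    return TYPE_NAME(a + b); /* char is promoted, result type is int */
}

static const char *type_of_full_expr(void)
{
    char a = 0;
    int b = 0;
    long c = 0;
    return TYPE_NAME(a + b + c + 7); /* ((a + b) + c) + 7 has type long */
}
```

`type_of_a_plus_b()` returns `"int"` and `type_of_full_expr()` returns `"long"`, so without OBTs nothing in the expression is evaluated at `char` width.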
To solve this, we could carry bitwidth information within the OverflowBehaviorType itself and proceed with the usual promotion semantics, then emit the overflow intrinsic that matches the bitwidth “stored” in an OBT field. To reduce initial complexity for myself and for reviewers, I’ve opted to just add narrowing casts with their own simple (yet maybe not-so-obvious) semantics.
Current problems
We ran into a case in the kernel that spawns -Wformat warnings. It looks something like this:
```c
unsigned long long a;
size_t b;
printf("%llx", a + b);
```
The format specifier expects an ‘unsigned long long’, but due to an implicit narrowing cast the argument’s type is actually ‘unsigned long’. Note that ‘long’ is 32 bits on 32-bit targets but 64 bits on 64-bit (LP64) targets, while ‘long long’ is 64 bits on both. Currently I am just building with `-Wno-format`, but I see two realistic solutions for cases like this (I’m happy to hear about any new ideas, though):
1. Change all the occurrences in the kernel to use new specifiers (this gets awkward given the 32-bit vs 64-bit differences mentioned above).
2. Add a compiler workaround that inserts a final implicit cast back up to the expected type.
We should be careful with (2), since adding too many compiler backdoors risks overcomplicating things. The narrowing semantics described above are already “pushing it”.
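For comparison, the source-level counterpart of (2) is an explicit cast at each call site, sketched below with `snprintf` (`format_sum` is a hypothetical name). In standard C `a + b` is already `unsigned long long`, so the cast is a no-op there; under the narrowing semantics it only restores the type `%llx` expects and does not widen the arithmetic itself, which would already have happened at `size_t`'s width.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a + b with "%llx". The explicit cast makes
 * the variadic argument unsigned long long regardless of the type the
 * addition ends up with, so the specifier matches on 32-bit and 64-bit
 * targets alike. Returns snprintf's result (characters that would be
 * written, excluding the terminating NUL). */
static int format_sum(char *buf, size_t len, unsigned long long a, size_t b)
{
    return snprintf(buf, len, "%llx", (unsigned long long)(a + b));
}
```

For example, `format_sum(buf, sizeof buf, 0x10, 0x20)` writes `"30"` and returns `2`.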
TODO
- add a flag to enable OverflowBehaviorTypes: `-foverflow-behavior-types`
- expand on C++ compatibility
- _BitInt types?
- other overflow behaviors (e.g., saturation)
CCs (because you participated in v1)
@kees @mizvekov @Meinersbur @AaronBallman @jyknight @melver @efriedma-quic @ldionne @rnk @reinterpretcast @philnik @vitalybuka
Consensus to move forward with this RFC was called in this message.