Defining what happens when a bool isn’t 0 or 1

C and C++ specify that bool can only be true or false, which Clang translates to the numeric values 1 and 0, respectively. However, in most ABIs that LLVM supports, bool has 8-bit storage, which creates the “opportunity” for 254 invalid states.

It’s not possible to directly assign one of these invalid states to a bool, because Clang converts any assigned value to either true or false. However, if you memcpy something on top of a bool, or if you cast some pointer into a pointer to bool, you may well find yourself with a bool that isn’t in a valid state.
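As a hedged sketch of what that looks like in practice (the function names here are ours, purely for illustration), both of the following compile without complaint, yet any later use of the resulting bool is undefined behavior:

#include <cstring>

// Bit-pattern copy bypasses the usual bool conversion entirely.
bool via_memcpy() {
    unsigned char raw = 2;      // not a valid bool representation
    bool b;
    std::memcpy(&b, &raw, 1);
    return b;
}

// Reading a raw byte through a pointer to bool has the same effect.
bool via_pointer_cast(unsigned char *storage) {
    *storage = 2;
    return *reinterpret_cast<bool *>(storage);
}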

Because old habits of C and C++ developers die hard, we have found that copying a value that isn’t 0 or 1 into a bool can lead to memory corruption. This happens because, at all optimization levels except -O0, Clang adds range metadata to bool loads indicating that the value can only be 0 or 1.

For instance:

struct foo {
    int buf[4];
};

struct request {
    bool which_one;
};

void do_call(foo &f, const request &r) {
    if (r.which_one) {
        f.buf[2] = 100;
    } else {
        f.buf[1] = 100;
    }
}

This code may lead to memory corruption if request::which_one is not 0 or 1. It should only ever set buf[1] or buf[2], but when which_one is in an invalid state, the function can also write to buf[3], or entirely out of bounds of buf.
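To make the mechanism concrete, here is a hedged, source-level sketch of the kind of transformation the optimizer is allowed to perform once it assumes which_one is exactly 0 or 1 (illustrative only, not the literal code Clang generates):

// With which_one assumed to be 0 or 1, the branch can become an indexed
// store. If which_one is 2, this writes to buf[3]; larger values write
// past the end of buf entirely.
void do_call_lowered(foo &f, const request &r) {
    unsigned idx = 1u + static_cast<unsigned>(r.which_one);
    f.buf[idx] = 100;
}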

The same pattern may also compromise control-flow integrity. Take this example:

class foo {
public:
    virtual void if_false();
    virtual void if_true();
    virtual void cannot_be_called();
};

struct request {
    bool which_one;
};

void do_call(foo &f, const request &r) {
    if (r.which_one) {
        f.if_true();
    } else {
        f.if_false();
    }
}

At -Oz targeting x86_64, you get the following:

do_call(foo&, request const&):
    movzx eax, byte ptr [rsi]
    mov rcx, qword ptr [rdi]
    jmp qword ptr [rcx + 8*rax]

Effectively, the compiler looked at which_one, declared that it would only ever be 0 or 1, and decided to call f.vtable[r.which_one](). If which_one had the illegal bit pattern 2, this would call cannot_be_called.

One of the worst aspects of this behavior is that engineers who think of C as a high-level assembler don’t know how to fix this issue. In one instance that we reported, the engineer’s proposed fix was to add r.which_one = !!r.which_one, which does nothing. Confronted with that information, they next tried r.which_one = !!(int)r.which_one, which also does nothing. There is no robust way to cure a bool that has been “poisoned” with a value other than 0 or 1, because the standard has no provision for fixing undefined behavior after it has already happened.
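As a hedged illustration of why such fixes fold away (the launder names are ours, purely for demonstration): under the strict assumption, the compiler already believes a loaded bool is 0 or 1, so the double negation simplifies to a plain copy, which can then be eliminated outright:

// Both "fixes" fold to a no-op copy under the 0-or-1 assumption, leaving
// the poisoned bit pattern in memory untouched.
void launder1(bool *b) { *b = !!*b; }
void launder2(bool *b) { *b = !!(int)*b; }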

In March, the paper Exploiting Undefined Behavior in C/C++ Programs for Optimization: A Study on the Performance Impact was accepted to PLDI ’25. One of its core findings is that several optimizations that exploit undefined behavior increase security exposure in practice without demonstrated performance benefits in real-world programs. Assuming that a bool is always 0 or 1 is one of the specifically called-out behaviors. Our experience matches those findings: not much is gained by assuming that all bool values are 0 or 1.

-fno-strict-bool

Clang supports -fstrict-enums, which controls whether it can be assumed that all values of an enum type fall within the range spanned by its enumerators. We have implemented -fstrict-bool and its inverse, -fno-strict-bool, following the same model, and we propose upstreaming this change.

In the current implementation, -fstrict-bool is enabled by default (whereas -fstrict-enums is disabled by default), except when compiling with either -mkernel or -fapple-kext, both of which are Apple-centric. Our implementation of -fno-strict-bool does not take a stance on how to interpret bool values that are neither 0 nor 1: the emergent behavior is that only the lowest bit is considered. This is correct when evaluated against the C and C++ standards: if a bool holds a value other than 0 or 1, the behavior is undefined, so no interpretation is off-limits. At this time, the purpose of -fno-strict-bool is only to remove the memory-unsafety implications of a bool in an invalid state.
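As a hedged sketch of what the low-bit behavior means for a bool in an invalid state (illustrative only; the exact codegen is not guaranteed):

// With -fno-strict-bool and "low bit" semantics, a bool whose storage holds
// 0x02 reads as false and 0x03 reads as true: the test below behaves as if
// it consulted only bit 0 of the byte.
bool is_set(const bool *b) {
    return *b;   // effectively (*reinterpret_cast<const unsigned char *>(b)) & 1
}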

As we upstream -fstrict-bool , we are seeking input on both of these initial choices:

  • Should the default really be -fstrict-bool? Our experience is that the performance gain is negligible, but we have only verified this on the small part of the system where we chose to enable -fno-strict-bool by default, and we are open to the idea that some platforms benefit from -fno-strict-bool more than others.
  • What is the desirable behavior for bool values that are in an invalid state? Users could find it more intuitive that all non-zero values are truthy, since that’s how it works for integers, but this is not the current implementation.

Please let us know what you think.

5 Likes

(Cc: @rjmccall, @ahatanak)

At least currently, -fsanitize=undefined is able to detect this. Maybe you should turn non-0/1 values into a trap instead. I am not even sure what the semantics of -fno-strict-bool would mean. I am not sure your requested behavior of doing &1 is good semantics either; 0 vs. non-zero is definitely a better idea, but so is 0/1/trapping.
So unlike -fno-strict-enums, there is no obvious right answer. Maybe -ftrapping-bool should be the default, and then have -fbool-lowbit and -fbool-nonzero as separate options too. If someone wants strict bool behavior, they can use -fstrict-bool. This gives the user a few different options.

trapping-bool probably has a larger codesize impact… and some kernel developers don’t like panicking.

We probably want to clarify how this interacts with calling convention rules. With the version of the patch I remember, you still get undefined behavior if you write some assembly code that gets the calling convention wrong, or cast a function pointer incorrectly.

I don’t really see any reason to avoid documenting the behavior. We have to choose some behavior (or else we’re back in the land of indeterminate behavior, which is exactly what you’re trying to avoid). And if we have to choose some behavior, we might as well tell the user what we’re choosing.

Making all non-zero values truthy is better for codegen on many targets in simple cases: for example, RISC-V requires an extra instruction to branch on the low bit. I guess the tradeoff is that boolean and/or/xor operations get more expensive.

-fsanitize=bool exists today for people who want to do that. The main reason I feel strongly that it cannot be the default is that you cannot test ahead of time whether reading a bool is going to trap.

I would like us to address that as well, but it expands the scope from a CFE-only change to an LLVM backend change. The front-end is responsible for the range metadata that it assigns to loads, but it’s not responsible for the interpretation of i1 argument values. For instance:

int foo(_Bool b) {
	return b * 5;
}

This compiles to a function that accepts an i1 and uses select, which is perfectly normal IR:

define noundef range(i32 0, 6) i32 @foo(i1 noundef zeroext %0) local_unnamed_addr #0 {
  %2 = select i1 %0, i32 5, i32 0
  ret i32 %2
}

As I understand it, our choices would be to either:

  • change that from accepting an i1 to accepting an i8, and narrow in the function body;
  • change backends that make assumptions about the value of i1 arguments.

I worry that the first option is an ABI break, and I don’t know how to do the second one. I think it would be doing a disservice to the Clang community to gate the change on my learning how backends work.

As a partial solution, I could add a diagnostic for casting a pointer to a function that takes a bool into a pointer to a function that takes another integer type.

For conciseness, let’s say a value is “boolean” if it’s in {0,1}.

The representation rules for arguments, return values, and memory are all technically independent. If loads of bools properly booleanize the value, arguments and return values can only end up with a non-boolean value because of (1) a compiler bug (since compilers are required to pass a boolean value by the psABI[1]) or (2) some kind of stack corruption, which is very likely to be exploitable or lead to UB regardless. So there’s at least a colorable argument that we don’t really need to define anything but the in-memory representation. I don’t know that I completely buy it, especially on targets like Apple’s arm64e that try to limit the UB-ness of memory corruption, but it’s there.

If we’re going to define a behavior, I agree that it’s best to make all non-zero values true.


  1. This is explicit in the x86_64 psABI. A lot of other psABIs overlook this, or they did the last time I checked, but we can broadly assume a similar rule.

3 Likes

I think the case Eli refers to is this:

static int array[2];

void set_elem(bool index, int value) {
    array[index] = value;
}

typedef void (*func_ptr)(int index, int value);
void takes_func_ptr(func_ptr f);

void foo(void) {
    takes_func_ptr((func_ptr)set_elem);
}

It “mostly works” when provided “correct inputs” on many ABIs, but if takes_func_ptr calls f with a value that isn’t 0 or 1, set_elem may write out of bounds.

This is a related but somewhat different problem, for the reasons I’ve already explained, but also because it isn’t unique to bool, whereas the problem that -fstrict-bool addresses is. I think that x86_64 is safe for non-bool integers, but arm64 isn’t, as the backend won’t bother masking w registers for integers smaller than int32_t, so int8_t and int16_t arguments have the same issue.
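For illustration, here is a hedged variant of the earlier set_elem example with a narrow integer instead of a bool (the names are ours; the exact behavior depends on the target’s calling convention):

#include <cstdint>

static int small_table[256];

// On arm64, the callee typically indexes with the incoming register as-is
// rather than re-masking it to 8 bits, so a caller that reaches this function
// through a mismatched function-pointer cast and passes a wider value can
// cause the same kind of out-of-bounds write.
void set_small_elem(std::uint8_t index, int value) {
    small_table[index] = value;
}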

I, for one, would look forward to decommissioning my trick of a bool value being both true and false at the same time.

I would be open to hearing from folks taking advantage of this, but unless someone provides a compelling case, I would prefer to see -fstrict-bool be the default. I think non-zero should be truthy, but I would like to see what others have to say here.

Any chance you might try to push some changes in WG14 or WG21 in this area? The UB Annex is coming along and we would love to have removal candidates eventually.

1 Like

Yes, that’s true: calling a function using the wrong function type is another way to see the problem. I can’t remember ever having seen a bug where somebody did that, but I suppose that could just be because calling conventions generally make it close to innocuous. At any rate, I’m not strongly opposed to being more conservative for arguments and results.

I think there is a problem that hasn’t been addressed at all yet: libraries are allowed to assume that a bool is zero or one, and IMO they are perfectly justified in that assumption. If we allow bools to be anything other than zero or one, would

int test(bool b) {
  int arr[2] = {2, 4};
  return arr[b];
}

now potentially be an out-of-bounds access inside test?
If not, I think Clang wouldn’t just have to skip emitting the range constraint, but would actually have to emit additional code to make sure b is indeed zero or one in this case.

(Do you mean -fno-strict-bool? -fstrict-bool is the status quo. “Strict” implies the developer is rigorous about it; it’s using the same naming scheme as -fstrict-enums.)

To be clear, this proposal doesn’t allow you to create a bool that is any value other than 0 or 1. We can’t blow up ABI expectations like that. This proposal is about what happens when you use a bool created in an irregular way such that the storage contains a value other than 0 or 1 (for instance, if you memcpy the value 2 into a bool). In this case, the behavior is undefined, and currently, it can lead to memory corruption, including in the case of your example, for more or less the reasons you cite.

Ah, sorry, I didn’t read carefully enough. I didn’t catch that you want to only check the lowest bit. That resolves my concern.

In general, I think having a flag to control this behavior makes sense.

In terms of the default, my intuition is that -fstrict-bool is a reasonable default because it’s the status quo that users are familiar with. However, if performance measurements show it’s a wash, I think -fno-strict-bool is the better security posture.

In terms of behavior for bool values in an invalid state: I think it makes sense to accept them with the same semantics you’d get from integers. Users who want to know about the invalid state can discover it through other means, like -fsanitize=undefined.

I’m surprised this was measured to have no performance impact. From poking around with a couple of examples, it seems like it would have some. E.g., taking this code,

void bool2int(bool *b, int *out) { *out = *b; }
void copybool(bool *b) { *b = *b; }
void invert(bool *b) { *b = !*b; }

And testing via clang test.c -std=c2x -S -o - -O2 -Xclang -disable-llvm-optzns -emit-llvm | sed 's/, !range ![0-9]*//' | opt -S -O2 | llc -o - to emulate the effect of the new option, I see lots of extra code to ensure the output values are only ever 0 or 1:

  • bool2int adds an extra andl $1, %eax
  • copybool was previously a no-op (no instructions!), but now emits andb $1, (%rdi).
  • invert was just 1 instruction xorb $1, (%rdi), and now is 4, movzbl (%rdi), %eax; notb %al; andb $1, %al; movb %al, (%rdi).

I expect that all modern CPUs will blast through and instructions almost as if they weren’t there. For the invert case, it’s 2 instructions instead of 1 on arm64, which is a smaller diff than on x86.

But even with that said, in our workloads, the common case is testing bools, not copying them. In this function:

int select(bool *cond, int a, int b) {
	return *cond ? a : b;
}

the difference is only whether the backend generates a compare with 0 or a bit test for bit 1.

It also goes without saying that the code is identical when creating a bool out of an actual condition (like *cond = a == 150).

In my examples that have memory corruption when the bool isn’t 0 or 1, the difference is also just one and instruction. We find that it’s an acceptable price to pay.

We’re being bitten by this issue with increasing frequency on Sony targets. We deal with many pieces of 3rd party code that we can’t update, and in some circumstances the code contains bugs such as non-zero-or-one Booleans. As clang/LLVM optimises more over time, more transformations are based on the range metadata for Booleans, and our code ends up with more exposure to faults in 3rd party code. We’ve hit at least one scenario where control-flow is compromised by these kinds of bugs, via switches that get turned into jump tables.

We would use -fno-strict-bool immediately if it becomes available. I feel that non-zero values being truthy is the interpretation most likely to meet everyone’s needs; it’s the most popular in a quick straw poll of colleagues.

No opinion on whether it should be the default: there are more (and less) annoying candidates for undefined behaviour that people run into.

3 Likes

Your observation of one and instruction seems to be based on semantics where “trueness” is determined by the value of the low bit, which appears (from this thread) not to be the preferred semantics.

The preferred semantics (“trueness” == non-zero) generates 2 extra instructions in some cases: Compiler Explorer

unsigned long f(_Bool b) {
  return b ? 16 : 0;
}
unsigned long g(unsigned long b) {
  return b ? 16 : 0;
}

What workloads (with which semantics) did you use for your analysis?

Apologies, I meant -fno-strict-bool.

1 Like

It certainly does assume that; I’m going to call this the “low bit semantics” going forward. While that behavior is up for debate, it is one of the options, and it’s what’s implemented in the clang that we released with Xcode 26.

The just-announced releases of XNU and associated kernel extensions all use -fno-strict-bool with “low bit” semantics. We have found that there is no measurable performance impact in kernel workloads as a result of that change. We have not tried to apply the setting to our entire build.

Similarly to my previous response, though, I want to note that there are several ways your example may not be fully representative:

  • bool arguments are not impacted by this change, at least at this time, so that exact program compiles identically with and without -fno-strict-bool.
  • The choice of constants (0 or 16) is really convenient when you can already assume 0 or 1, since the whole expression is just (int)b << 4 in that case (see the small sketch after this list). g is smaller in all cases if you change 16 to 17, except on x86, where f and g become identical. (Of course, smaller might not mean faster, but I don’t have any great way to test that for any of the architectures you built for.)
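A hedged sketch of that shift trick (f_strict is a hypothetical name of ours):

// When b is known to be exactly 0 or 1, the ternary collapses to a shift.
unsigned long f_strict(bool b) {
    return static_cast<unsigned long>(b) << 4;   // same as b ? 16 : 0
}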

The argument is that we don’t find that the occasionally slightly better code makes a performance difference in our (kernel) code bases, whereas memory corruption bugs caused by this assumption always make a difference where they show up.

1 Like