Llvm_unreachable is widely misused

(I probably mis-categorized this, please feel free to move)

llvm_unreachable is being widely misused as an assertion, but in reality, its behavior is totally different depending on whether assertions are enabled. When asserts are enabled, it is an assertion. When asserts are disabled, it becomes an assumption, which is a serious departure from the typical behavior of assertions which is to merely get disabled. This misunderstanding traces back to its own comment as it says,

/// Use this instead of assert(0).  It conveys intent more clearly and
/// allows compilers to omit some unnecessary code.

I’m not denying that there are legitimate uses for llvm_unreachable. But they are a too small minority (see next paragraphs) for such blanket statements to be useful. That is then further expanded on in the coding guide section on asserts which promotes llvm_unreachable as “something much better” than assert(0) without qualification.

Most people writing assertions have the mental modal that they are enforcing something: code past the assertion will be able to assume that the asserted condition has already been enforced.

But __builtin_unreachable does the opposite: instead of enforcing, what it does is tell the compiler to assume that the condition has already been enforced. That may help the compiler generate better code (thus it’s an optimization) but at the cost of great unsafety if the assumption is ever defeated. That can only be the right trade-off of safety for performance in a small minority of performance-critical code.

Regarding the llvm_unreachable macro specifically as opposed to __builtin_unreachable, there is an additional concern: llvm_unreachable is a macro that expands to one thing or its opposite (“enforce” vs “assume previously enforced”) depending on whether asserts are enabled. I understand that can still make sense for some situations but the point is, that should be made explicit.

A quick code search shows llvm_unreachable being widely used across llvm and several dependent projects in a way that’s clearly not falling into that category. My own experience in code reviews is that reviewers will tell me to use llvm_unreachable where an assertion is meant. I’ve also seen this idiom propagate in all sorts of projects that ex-LLVM developers join e.g. this is where the pandemic reached mozilla (it was overturned fairly quickly).

I’m sure I don’t need to tell here how unsafe optimizations a compiler can do when it is given a __builtin_unreachable to work with. This Compiler Explorer playground shows for instance how a switch jump table drops its bounds-check, and how an unexpected if-branch is taken.

Potential action items:

  • Fix the comments and the coding guide to make it clear that llvm_unreachable is only useful when performance is critical and one is willing to trade safety for it, or something like that. Remove blanket statements presenting it as “like assert(0)”.
  • Consider renaming llvm_unreachable to something more explicit, possibly llvm_assume_unreachable.
  • If assert(0) is deemed unsatisfactory for some purpose, consider adding some new macro.
1 Like

I don’t get this. assert(0) is also something different in assert vs. release builds. llvm_unreachable is like that, just with even less chance you accidentally “fall through” the case you did not expect in a release build (which can easily lead to fun errors in the distance).

Renaming seems unnecessary but I guess everyone likes the bike-shed a different color (and I don’t care enough).

I did not see a reason in the post arguing that assert(0) is in any configuration superior to llvm_unreachable. Did I miss something?

1 Like

I feel like I’m stepping into a trap by commenting here, but I think historically and culturally, the fact that assert(X) doesn’t compile to __buitlin_assume(!X) in NDEBUG builds is considered to be a missing feature by the people who wrote the guidance that you referenced. In that light, unreachable is an upgrade over assert(0): it implements the missing feature. I think there’s a strong argument that LLVM could make different decisions on switch lookup table bounds checks, but folding away conditional branches is kind of the point here. The guidance was written by compiler engineers with the intent to provide the compiler as much information as possible so that it could optimize as much as possible.

It’s totally legitimate to question these values. This is just my attempt to capture the zeitgeist from an earlier period in LLVM’s history.

2 Likes

I don’t think I understand the concern here. If an assertion-enabled build asserts, then a not-assertion-enabled build will generally misbehave, sometimes in obvious ways (null pointer dereference) and sometimes in not-so-obvious ways (reading out of bounds memory). I’ve seen enough cases where code was silently miscompiled in non-assertion-enabled builds.

I don’t think the use of __builtin_unreachable here makes a material difference in this regard – but it is, I believe, important to avoid compiler warnings in certain cases. I don’t think any action should be taken here.

In non-debug mode, assert doesn’t get turned into ‘(expr) ? (void)0 : __unreachable()’ because compilers (often) lack a way of restricting 'expr’s evaluation to just compile-time components. It would be unfortunate if that evaluation called some expensive non-marked-constant function. C++ proposal http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1774r5.pdf documents such a language extension. (llvm already has such a feature).

As ever, such a feature allows exciting ‘backwards in time’ optimizations similar to ‘UB never happens’ optimizations.

1 Like

Let me try a more concise phrasing:

  • llvm_unreachable expands to __builtin_unreachable in non-asserts builds
  • __builtin_unreachable is a really dangerous thing to have in one’s code, unless one is really, really sure that it will never be reached.
  • Therefore, llvm_unreachable should be regarded as a dangerous, niche, specialized tool. It should not be promoted as an alternative to assert(0) to a general audience / without lots of qualification.
  • Any code search will show that the typical current use of llvm_unreachable is not sufficiently performance-critical to warrant such a trade-off of safety for performance. Whence “its is being misused”.

Assertions are meant to check invariants: things that cannot possibly occur unless the program is in an undefined state. This has always been the case with llvm_unreachable as well. Both describe assumptions that a function is making about the state of the world as a result of how it is called. Using __builtin_unreachable may eliminate some checks but without it there is no guarantee that the code will behave correctly (and there is a strong reason to assume that it won’t) if these assumptions are not true.

I think your compiler explorer example is fine here. If you removed the __builtin_unreachable, then you would get less efficient code in the cases where the API is used correctly and it would still be wrong in cases where the API is used incorrectly. With llvm_unreachable, you can try again with a debug build and you’ll be told explicitly what you did wrong.

2 Likes

My point is, APIs will be used incorrectly, code will have bugs, etc. This will always have negative consequences, but using __builtin_unreachable makes the consequences worse. I understand that this is a non-issue in an ideal world where we can say that people should write correct code and too bad for them if they don’t, but I don’t live in that world.

1 Like

I still haven’t seen an alternative in this thread. So far it reads: llvm_unreachable is bad, don’t use it.

Arguing llvm_unreachable sometimes makes errors go completely off the rails is reasonable. Arguing assert(0) would be any different does still not make sense to me. If you’d want to have something that always catches the error, build with assertions enabled (also in release mode). If you do, there is no user-facing difference between llvm_unreachable and assert(0).

Again, just using assert(0) will not help in any reasonable way. All you do is to replace some erroneously executed code with some other erroneously executed code in case the assumptions are violated.

FWIW, one way to ensure the result of release builds is not totally random is to replace __builtin_unreachable with __builtin_trap. At least you’d be failing hard when things go wrong.

1 Like

The default alternative is assert(0). (Other alternatives are conceivable too for some use cases, e.g. alternatives aborting even asserts-disabled builds. Which is what you propose here with __builtin_trap).

If you’d want to have something that always catches the error, build with assertions enabled (also in release mode).

It’s a fact that most software that is shipped to end users, is built with assertions disabled. We can’t change that, and there are good reasons for that anyway (certain assertions cause high overhead).

Again, just using assert(0) will not help in any reasonable way. All you do is to replace some erroneously executed code with some other erroneously executed code in case the assumptions are violated.

This is factually incorrect. The corrected version is: In an asserts-disabled build, replacing llvm_unreachable by assert(0) is replacing __builtin_unreachable by some erroneously executed code in case the assumptions are violated. The key is that __builtin_unreachable is not merely some erroneously executed code, it’s Undefined Behavior that the compiler is particularly trained to exploit in optimizations. Again, take a look at my Compiler Explorer. The bounds-check from the switch jumptable gets dropped. That means that if an attacker controls the int value that is being switch’d on, they can potentially jump to a nearly arbitrary address.

My point is, APIs will be used incorrectly, code will have bugs, etc.

Agreed.

This will always have negative consequences, but using __builtin_unreachable makes the consequences worse.

You have not seen shown any evidence to support this claim. Using __builtin_unreachable allows the compiler to make assumptions that the programmer is already making. There are three alternatives:

  • Do the same thing as assert(0): compile away the check entirely in release builds. You are now hitting a code path that assumes a condition that is false. Bad things will happen.
  • Compile it into a __builtin_trap, which requires the compiler insert a branch to a trapping instruction but allows it to make the same assumption later on. You will get bigger code and slightly slower code for correct API users. Incorrect uses will cause the program to crash before it enters an undefined state.
  • Compile it into __builtin_unreachable. You get no extra branches, the compiler is free to optimise assuming the code is never reached. You get faster and smaller code and it has no effect on correct callers of the APIs.

In the first and last cases, incorrect callers of the APIs will leave the program in an undefined state.

I understand that this is a non-issue in an ideal world where we can say that people should write correct code and too bad for them if they don’t, but I don’t live in that world.

Neither do it, which is why I recommend that people test against builds with assertions enabled.

Assertions (including llvm_unreachable) are not error checking. A lot of LLVM APIs provide error checking and will return error values if you use them incorrectly. Assertions are not a substitute for this, they are defence in depth that allow you to find cases during debugging where you’ve managed to slip past error checking and fix them before you ship something with a release build.

1 Like

It sounds like the OP may have inadvertently promoted a practical observation into a principled stand and if we reevaluate a bit, there may be more here to discuss.

I would totally buy and want to hear more if it is the case that the existing coding standards were having unintended consequences and we were using __llvm_unreachable (and asserts) for things in practice that they shouldn’t be. Anecdotally, I’ve run across numerous such uses that struck me as “that’s not really an invariant” while working on various parts of the project, and I have heard this critique verbatim by people who have chosen to not base their compiler on LLVM.

If that’s the discussion we are having, I would probably approach it differently: not focusing on one facet but evaluating how much of a problem we have due to misuse and seeing if there is a way to address it (through documentation clarification, fixes, review, tooling, etc). Not really how it was framed, but that is a discussion that I feel needs to happen, and I would welcome evidence about it (and not just anecdote - which is all I have).

The example you refer to is just that, an example. Some erroneously executed code is in this case more “unsecure” than some other. However, if you’d have checked assert(0) you would see the exact same code is generated for the first example: Compiler Explorer
The second example is translated to a switch but that is unrelated and heuristics could decide not to do that.
However, if you try __builtin_trap() you’ll get something different that will fail properly rather than executing “some code”.

For a simpler example, take this Compiler Explorer.
First, all but trap run into the UB, the “erroneously executed code”.
Second, we can now hide a security vulnerability there and see assert(0) can be the worst of the 3 options: Compiler Explorer
That said, it’s not that assert(0) is worse than __builtin_unreachable; again it’s just different code that is accidentally executed. If we put the vulnerability in the else branch this becomes obvious as now the __builtin_unreachable is the problematic one: Compiler Explorer
Again, note that trap always “does the sensible thing here”.

1 Like

My sense here is that there are two conflicting goals here:

  1. A desire to Increase speed
  2. A desire for bugs to become quickly evident

The people who are developing code almost always prefer 2, since bugs are likely to be common. People who are using code might prefer 1 or 2, depending on their level of paranoia or confidence in the code. As a user of LLVM I can often, but perhaps not always, make this choice for myself, by compiling with asserts enabled or disabled. As a project, we at least need to help users navigate these choices and understand the consequences of this choice, which isn’t necessarily clear. If we choose to allow uses to prioritize speed over ‘bugs becoming obvious’, then we probably want to select what level of unsafety would would like to allow, even if someone chooses to prefer speed over safety:

a) bugs may cause the compiler to crash with a stack trace (inconvenient, but easy to trace)
b) bugs may cause the compiler to crash without an error message (Inconvenient and harder to trace, but obvious)
c) bugs may cause the compiler to generate incorrect code (very inconvenient, hard to trace and perhaps insidiously unobvious)
d) bugs may cause the compiler to crash in a way that crashes the OS (Extraordinarily inconvenient, but at least obvious)
e) bugs may cause the compiler to generate incorrect code that does something undesirable “rm -rf” (Really Bad, at least obvious)
f) bugs may cause the compiler to expose information in my web browser in such a way that someone can steal my bank account. (Really Really Bad…)

Personally, my threshold is around a) and I always compile with asserts enabled (At least after I figured out that that wasn’t the default!) I think anything beyond ‘c’ is a really big mistake and something to be avoided. I hope we can agree that e or f should never be even a remote possibility, regardless of the potential speedup.

2 Likes

Anecdotally, I’ve run across numerous such uses that struck me as “that’s not really an invariant” while working on various parts of the project, and I have heard this critique verbatim by people who have chosen to not base their compiler on LLVM.

I agree, LLVM has a lot of assertions that are not invariants, and are actually reachable. If we ever got serious about fuzzing, we’d probably discover that it’s actually really hard to make broad assertions about the state of the program, and we’d have to restructure lots of code to do more error recovery. I think evangelizing fuzzing would do more to promote software safety than changing our documentation recommendations for llvm_unreachable.

6 Likes

Where would the line be between interface and implementation? As in, what happens to invariants that would hold true given a earlier check (e.g., closer to the “interface”).

I think such assertions, ones that confirm things checked earlier, should remain, and are a useful defense against changes to those earlier checks closer to the interface.

1 Like

I agree; however, I am not sure it is clear what we would count (for this purpose) as an “external” interface beyond command line options, environment variables, files, filesystems, interactive input, etc.

If we up our game on llvm.trap handling we can probably get similarly fast code with__builtin_trap as we have with __builtin_unreachable today. That would mean we have a) in debug mode mode and b) in release mode. While performance impact has to be actually verified (esp. with other compilers), I find b) in release mode w/o assertions a reasonable compromise.

2 Likes

I think evangelizing fuzzing would do more to promote software safety than changing our documentation recommendations for llvm_unreachable.

That sounds like an interesting idea to fuzz the compiler. I wonder if there is any ongoing work that has already started?