[RFC] Make asserts runtime togglable

One problem we are regularly running into is that most LLVM builds ship without asserts enabled, and that at least a few times each release we were seeing asserts triggering on code in wild. We previously tried to run with asserts enabled for our custom clang deployment here at Epic Games, and had to turn it off because we kept running into broken things upstream (like this one we reported).

It’s painful for us to maintain an assert-built-clang and a non-assert-built-clang in our build systems, so what I want to investigate is moving the asserts in LLVM to be controllable via an LLVM option.

Here’s my rough strawperson idea for this:

  • We’d add a new mode to the CMake assertion-enabled variable for RUNTIME.
  • We’d add a new llvm_assert that would default to C’s assert with asserts ON like before, a no-op with OFF and map to a new assert backed by a cl::opt-enablement when RUNTIME is specified.
  • We’d have to check all the cases where NDEBUG is used to add data structures for asserting too.
  • Ideally we’d want to be sure we could enable this option via the C API (I floated this idea to a bunch of gamedev friends and someone with a custom language that uses the C API really really really wanted to be able to use this too).

Assuming the above would acceptable - we could then look at potentially switching the default assert mode to be always-on (behind an option), once we could investigate the runtime and executable size costs. I want that step to be out-of-scope from the initial proposal, as there be dragons that don’t need to be fought quite yet!

3 Likes

Thanks for reporting assertions upstream. If they don’t reported, they probably won’t get fixed.

One reason that Release builds generally turn off assertions is compile-time performance; I’ve heard numbers as high as 10% for Release+Asserts.

I actually have a separate use-case for an LLVM-specific wrapper around assert() which is related to code coverage: We can make sure that every assertion does get checked by some test. This is much more feasible with llvm_assert() versus plain assert().

That said, the reason I didn’t pursue it was implementation churn. Replacing every assert() in the codebase with llvm_assert() would be a big chunk of work. But, if we have enough motivation, anything is possible.

Even having a runtime-enabled assert would mean setting NDEBUG I think.

We could just keep them as being assert for now, or bite the bullet and pay the cost for the extra structs. I’m not bothered either way - just wanted to call out that I had already thought about that side of the problem!

10% is very optimistic :slight_smile: The overhead tends to be more in the 30-40% range, see this comparison.

It’s worth noting that asserts also add 20% to the clang binary size. That’s a price you’d definitely pay, even if the asserts are behind an option (actually more, because you also need all those option checks…)

As far as publicly exported structures are concerned, such cases should already beusing LLVM_ENABLE_ABI_BREAKING_CHECKS rather than NDEBUG. At least in theory, I think in practice this separation is not well maintained.

But yes, more generally there is a lot of cases like helper functions that are only used by asserts being guarded by NDEBUG. But also many cases where expensive assertion code is guarded by NDEBUG (probably some of our most expensive assertions are not actually inside an assert call). So neither setting NDEBUG nor unsetting it would give you the right behavior.

While replacing assert → llvm_assert is easy, we currently have more than 1000 uses of #ifndef NDEBUG, and doing something appropriate for all of them is going to be a very substantial amount of work.

Is the right fix there for these to also move to being behind LLVM_ENABLE_ABI_BREAKING_CHECKS?

The following is from prior experience with analogous ideas in previous projects, nothing LLVM specific.

Runtime toggleable asserts tend to add a significant runtime cost even in the disabled state. You end up with a huge number of small conditional blocks. With an assert, you’re hopefully branching to a terminating call which helps, but with the conditional variant, every assert condition now has a path which continues into normal execution.

Another idea I’ve seen used successfully is to separate assertions (statically) into debug assertions and runtime assertions. The former are what we currently have. The later are a separate macro which (by contract) checks the assert condition in all builds. Most asserts stay asserts, but a few really important ones are selectively made into the runtime form. (Spelling wise, I’ve seen the name “guarantee(c);” for these.) Note that our use of conditional llvm_unreachable() approximates this.

I’ll note that you can also combine the ideas, and have a separate macro for runtime enableable assertions.

1 Like

Not really. The purpose of LLVM_ENABLE_ABI_BREAKING_CHECKS is to allow linking together objects built with and without assertions (as long as both have the same LLVM_ENABLE_ABI_BREAKING_CHECKS value), so anything that’s not required for that shouldn’t use the option.

To expand on this, we have the -DLLVM_UNREACHABLE_OPTIMIZE=OFF option to turn llvm_unreachable() into runtime traps. The overhead of this option is about 0.5%, so two orders of magnitude cheaper than full assertions, which makes it suitable for production use.

5 Likes

Ok I’ve left this (almost) a week and we’ve even been on the front pages of LLVM Weekly. The only push-back on this has been that the scope of this is huge, I’m not reading that anyone is fundamentally against this.

I’ll let someone senior of the project shout me down, but this all being the case I’m thinking I’ll try land an llvm_assert in a header as a standalone PR, have that just map to cassert’s assert, then piecemeal (trying to touch smaller areas at a time) move our assert’s over to this.

Might take a while and there’ll be some churn, but I think that is the easiest first step towards the goal?

1 Like

Which part do you find painful? Building two clang binaries, or being able to select whether to use asserts or non-asserts clang in particular compilations that use clang?

In google’s internal codebase, we build two versions of “clang” for every compiler release: one with asserts enabled, and one without. It’s then trivial to choose to run a build with the asserts-enabled clang, or even (via a wrapper script that can dispatch to one or the other), have only some files built with asserts-enabled clang.

I suspect we would not want to transition to a unified optional-asserts binary – my expectation is that the overhead on “normal” builds from having the asserts be runtime-conditional would be too high for that to be worth it.

Perhaps an alternative would be to implement support for similar “build two copies, and a dispatch script” upstream?

6 Likes

As far as I can tell the only feedback on this thread at all has been:

  • This is huge
  • Just updating the assert calls won’t necessarily do what you want
  • This will probably have too much overhead to be generally useful

So while there haven’t been strong objections there also isn’t a lot of evidence of support either.

My main concern with doing this is that we already have way too many build options, and adding yet another adds that much more testing burden, so I’d be hesitant to go forward with this without a little bit more motivation.

3 Likes

To add some specifics – some code is explicitly ifndef NDEBUG as it might compute some pre-requisites for asserts and there is no code at all for non-asserts build here. Wrapping this code into some runtime check might affect optimizations and increase the code size.

1 Like

Yeah I know - that’s why I called out NDEBUG in my initial post. I know its not free for either opts or binary size to have asserts on (even if runtime disabled).

Yeah I guess I could copy that approach.

I guess my fundamental thing when pitching this was we found clang was often broken on upstream code when we enabled asserts, and I suppose I naively thought we could get the additional cost of having asserts compiled-in but runtime-disabled to be low enough that we’d eventually be able to switch the default to it.

Unreal Engine is a pretty huge C++ codebase and our asserts (check’s and ensure’s) are enabled in a bunch of our critically performant Fortnite servers with almost no performance cost (it’s just a highly predictable branch if you do it right).

I hoped that if the above all could prove true we’d be able to give a tool more broadly to clang users (hey run with --enable-asserts!) if they hit issues.

But totally happy to just bail and not do this huge amount of work too if people aren’t keen! Just trying to default to ‘what can I do that would give the most benefit to the most users’ :slight_smile:

I’d say 30-40% performance overhead and >20% space overhead is pretty strong evidence against such a move. (Not that @nikic was strongly objecting, only that the data itself is a pretty strong rejection that the cost is manageable).

Furthermore, the NDEBUG macro isn’t the only issue with finding asserts. To me, the main issue is that we have all been pretty liberally spreading asserts all over the code “with the clear expectation that this will only show up in Asserts builds”. Complex macro structures, complex branching structure, function calls, side-effects…

The alternatives all bank on some form of runtime checks or build options that do not address the fundamental design of asserts in LLVM, and add their own build and run-time costs.

This illustrates my point above. LLVM asserts were mostly done as a debug time checker (very expensive, very liberal, very complex), while Unreal was mostly production time checks with very predictable branches.

Totally different “kinds of asserts” unfortunately.

I don’t think this is about “most users”, but about the assert design policy for the last 20 years that have spread throughout all LLVM projects.

Even if some of us may want this dynamic assert mechanism, converting the current asserts to dynamic would be a disaster.

There could be a parallel effort to create a new type of assert and start converting the “cheap ones” to them, but some would still say they’re too expensive and would still like them to be disabled on the build, so you end up with a new build type and all its combinations with the others.

I guess Unreal’s developer base is so much simpler than LLVM’s that some executive decisions are easier to do and make more sense. After all, LLVM isn’t a compiler, but a compilation infrastructure to build compilers, which the upstream Clang is one example of.

From the point of making Clang more robust, it would be beneficial to add an extra assertions build to our distribution, which could be used via e.g. a driver flag. Then we can ask users with large codebases like you to try it out and report crashes.

That said, we’re far from being in a short supply of opened bugs, so I hope no one will be too disappointed if their bugs will lay dormant for a long while. But we definitely can try to increase the ratio of users who report issues out of all people who experience them.

I’d like to thank you for bringing this up. I believe there is a room for improvement here, even if implementation is not the one you initially envisioned. Your efforts are not in vain.

Adding another data point: Chromium used to build its toolchain with assertions enabled. We valued the additional safety, determinism, and debuggability. However, users ultimately wanted 20% faster compiles, and it became untenable to say no after Hans looked at the data, so we ultimately disabled assertions.

Compilers are so deterministic, it feels more practical to build two packages, one with asserts, one without, and add build logic to switch between the packages if anyone ever wants to see if an assertion fired. You can also set up non-interactive continuous testing with the asserts build that nobody waits for.

5 Likes