RFC: C++ Buffer Hardening

This looks like a great idea.

One thing I always missed from the C++ standard library, compared with the classical frameworks delivered with C++ compilers (Turbo Vision, OWL, VCL, MFC, PowerPlant, AppToolbox, Tools.h++, …), was their better defaults for writing safe code in C++.

Somehow this got lost in the standard library, so tracking down all the compiler flags that bring that safety back, even in release builds, became a compiler-specific exercise.

Thousands of applications have been shipped with those security knobs turned on, and the performance issues they might have suffered from were in other code paths, like wrongly chosen algorithms and data structures.

So I welcome the possibility of having C++ Buffer Hardening available.

It’s difficult to argue that pre-condition checking and out-of-bounds checking are unreasonable concepts, but IMO these types of bugs are really scarce, so I find the net benefit questionable.

In industries where these types of bugs are unacceptable, as mentioned above in one of the comments (aerospace, automotive and the like), the processes go a long way to make sure that these types of situations never happen. This is done through extensive testing, so there is probably only a very narrow set of situations left where this can still happen, and the question is what to do about it. Aborting is usually not an option. Registering a handler is not practical, for the same reason that catching the exception and gracefully handling it in some other reasonable way is also not possible. If you’re lucky you can log the error, and that’s pretty much it. Fantasizing about retrying operation XYZ when an unexpected error comes out is really far-fetched, which is why I don’t think this machinery will be as useful in these types of use cases as implied.

The benefit I do see is that I will now be able to run the test suite against both hardened and non-hardened library variants, revealing more bugs prior to deploying the code to production. The question from my intro paragraph, however, still remains: how much more?

4 Likes

It has also been tried - Microsoft had “secure STL” levels. And people hated it. For example, they didn’t use iterators on vectors, instead relying on accessing .data() directly.

It was only enabled by default in debug mode and really helped to find some bugs early on. In the release build, this functionality is disabled (_ITERATOR_DEBUG_LEVEL == 0).

Are there any plans to continue the _FORTIFY_SOURCE story? GCC is getting better at it.

That’s a C library feature. I think the closest analog of libc++'s debug mode is libstdc++'s debug mode – and possibly the concept checks?

I know that it is a C feature. But there are gaps in the STL and I have to use some C features.

I meant the performance-wise behaviour change.

Bounds checking - I ran the test cases locally, both as written and again with a much larger test vector.

The bigger size will obviously have an effect through better branch prediction; small sizes are important.
Since you mentioned WebKit - here is a speedup from Chromium’s base UTF conversions that removed bounds checks: https://source.chromium.org/chromium/chromium/src/+/63dbcdf2bfd553bc91524ec0a77dfd32a4d4a427

I’d suggest we agree that there are reasonable cases where the performance cost is not trivial.

Can you clarify what you mean by assertion failures? Is this assert or something else? I had a patch a while ago that used __builtin_trap in a bunch of places and it made the code size only slightly larger and made a few things faster because the compiler could assume that the invalid condition hadn’t happened in later code.
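
For context, a minimal sketch of the kind of trapping assertion being described; the macro and function names here are illustrative, not the actual code from that patch:

```cpp
#include <cstddef>

// Hypothetical trapping assertion macro; with no message string, the failure
// path is just a trap instruction, so code size barely grows.
#define HARDEN_ASSERT(cond)                                                \
  do {                                                                     \
    if (!(cond))                                                           \
      __builtin_trap();                                                    \
  } while (0)

int checked_read(const int* data, std::size_t size, std::size_t index) {
  HARDEN_ASSERT(index < size);
  // Past this point the optimizer may assume index < size on the normal
  // path, which can let it drop redundant checks in later code.
  return data[index];
}
```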

In snmalloc, we have a memcpy implementation that, if the pointers given to it are heap pointers (optionally checking only the destination), looks up the bounds of the allocation and checks for overflow. We did some work on the code that reports the failures to minimise the branches on the hot path so that the code that reports the human-friendly error message does not disrupt the fast path (we also have a small subset of std::format implemented, which uses an on-stack buffer so that we can print formatted output even if the program’s heap is in an undefined state, which was a helpful building block for this).
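
For readers curious what that looks like in outline, here is a rough sketch of the shape of such a checked memcpy; the helper names are placeholders, not snmalloc’s actual API, and the failure path is deliberately kept out of line as described above:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-ins for the allocator's metadata lookup; the real
// implementation queries the allocator's own bounds information.
static bool is_heap_pointer(const void*) { return false; /* placeholder */ }
static std::size_t bytes_left_in_allocation(const void*) { return SIZE_MAX; }

// Cold, out-of-line failure path so the hot path stays compact; the real code
// also avoids touching the heap here, since the heap may be corrupted.
[[noreturn]] static void report_bounds_failure(const void* p, std::size_t len) {
  std::fprintf(stderr, "memcpy of %zu bytes overflows allocation at %p\n", len, p);
  std::abort();
}

static void* checked_memcpy(void* dst, const void* src, std::size_t len) {
  // Optionally only the destination is checked; both are shown for symmetry.
  if (is_heap_pointer(dst) && bytes_left_in_allocation(dst) < len)
    report_bounds_failure(dst, len);
  if (is_heap_pointer(src) && bytes_left_in_allocation(src) < len)
    report_bounds_failure(src, len);
  return std::memcpy(dst, src, len);
}
```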

Android is strongly supportive of this proposal. Chrome has been putting energy into this by adding hardening asserts to libc++ and enabling existing hardening flags in Chrome.

I always #define _ITERATOR_DEBUG_LEVEL 1 on release builds.

It was never an issue in my domain; most serious performance problems were related to bad data structures or algorithms.
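
For anyone unfamiliar with MSVC’s knob: _ITERATOR_DEBUG_LEVEL affects the container ABI, so it has to be consistent across every TU and library you link, which is why it is usually set project-wide rather than per file. A setup along these lines (illustrative, not a recommendation) gives checked iterators in an optimized build:

```cpp
// Build the whole project with the define, e.g.
//   cl /O2 /D_ITERATOR_DEBUG_LEVEL=1 main.cpp
// Level 1 ("checked iterators") keeps the bounds checks without the heavier
// level-2 debug bookkeeping.
#include <vector>

int main() {
  std::vector<int> v{1, 2, 3};
  return v[3];  // out of bounds: reported by the checked-iterator machinery
                // instead of silently reading past the buffer
}
```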

The short answer is no, there is no good way of shipping multiple versions of the library and have users select which one to use in the scope of this proposal. However, I do have a draft proposal already written that explains how to do that, what the tradeoffs would be, etc. What I’ve used in that proposal is called -fsanitize=library, and then you have a bunch of subchecks like -fsanitize=library-iterator-ownership, etc. You end up needing to ship several versions of libc++.dylib, each with a different ABI, and things get a bit ugly. This may very well happen in the future since this is a simple extension to our current proposal, however for now we are scoping this proposal as described to avoid having to consider a bunch of additional tradeoffs and complexity.

Libc++ implements the C++ Standard, and that is a crucial point that we do not plan on changing. None of these additional checks would change the API (either compile-time or even runtime like the guaranteed complexity of some algorithms). The feature we are proposing is basically something that folks should never notice, except if they would otherwise run into undefined behavior.

As such, we would not be adopting anything from gsl::span that makes it different from std::span in an observable way. That would go against the spirit of libc++.

There’s plenty of folks wanting to have their C++ code abort instead of causing a security issue when they make a bounds calculation mistake. And just to be clear, this would totally not be enabled by default. It would be up to vendors to decide whether they want to enable most runtime checks in the library that they ship, and I would expect most vendors not to enable these checks, except perhaps the cheapest ones. What we are saying is that we will provide a coherent framework for enabling those checks for those that want it.

The current assertion handler design is D121478 [libc++] Add a lightweight overridable assertion handler.
I also experimented with compile-time customization in D121123, but it was not very ergonomic. There are some challenges with compile-time customization, like the fact that you need all translation units to have a declaration of the handler available. I could revisit that and try to see whether a compile-time customization could be made possible, but it would definitely be less ergonomic than a link-time override. For completeness, I also experimented with a purely runtime customization in D119969 (which is what we have in the current broken Debug mode), but it seemed like the wrong mechanism.

This is the implementation: https://github.com/llvm/llvm-project/blob/main/libcxx/include/__assert#L44-L57. This is the implementation of the default assertion handler: https://github.com/llvm/llvm-project/blob/main/libcxx/include/__verbose_abort#L32-L36. We welcome any suggestions to improve how this is implemented.
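
To make the link-time override concrete, the user-side definition looks roughly like the following; the exact declaration (name, namespace, noreturn/noexcept attributes) has shifted across libc++ versions, so consult the headers linked above rather than copying this verbatim:

```cpp
#include <cstdarg>
#include <cstdio>
#include <cstdlib>

// Illustrative link-time override of libc++'s verbose-abort handler; check
// the __verbose_abort header for the exact declaration in your version.
namespace std {
[[noreturn]] void __libcpp_verbose_abort(char const* format, ...) {
  va_list args;
  va_start(args, format);
  std::vfprintf(stderr, format, args);  // route the message wherever suits you
  va_end(args);
  std::abort();
}
} // namespace std
```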

In a big company this will likely be done by some compiler team that is going to make a decision for everyone. And then it’s going to be sabotaged by actual developers doing work.

On a more practical side - will the standard library itself opt out of the checks?
Otherwise you are likely to see a lot of custom implementations of sort going into production.

If your compiler team makes a decision that you disagree with w.r.t. adding checks, you should take it up to them.

The standard library (its headers and the dylib) will use whatever checks they are configured with, which depends on that compiler team. Checks that don’t impact the ABI can also be controlled by developers (regardless of what the compiler team decided) on a per-TU basis.
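
For illustration, the per-TU control has so far taken the form of a macro passed when compiling that translation unit; the exact spelling has evolved across releases (older ones used _LIBCPP_ENABLE_ASSERTIONS, newer ones use _LIBCPP_HARDENING_MODE), so treat the example below as a sketch rather than the canonical flag:

```cpp
// Illustrative per-TU opt-in, assuming a libc++ where the knob is spelled
// _LIBCPP_ENABLE_ASSERTIONS, passed on the command line for just this TU:
//   clang++ -stdlib=libc++ -O2 -D_LIBCPP_ENABLE_ASSERTIONS=1 -c hot_path.cpp
#include <vector>

int read_past_end(const std::vector<int>& v) {
  return v[v.size()];  // caught by the library's assertion when enabled,
                       // undefined behaviour otherwise
}
```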

I’m strongly in favor of hardening libc++. As a long-time MSVC user, I’ve really enjoyed using their hardening in debug mode to catch significant problems early (so much so that I’ve been trying to push my employer to add a community build bot that does an MSVC debug build so we get post-commit CI testing with the extra checking). Having similar functionality in libc++ is great! The goal of being able to enable this hardening in optimized builds is even better!

To that end, do you have performance goals in mind for how to measure success for enabling in non-debug builds?

Do you have a policy in mind for which features in libc++ qualify for this hardening? Also, how do you intend to convey information to users about what is and isn’t hardened and in what ways (are you going to update documentation somewhere)?

Thank you for this detail! I think that’s going to save quite a few headaches from not having a stable ABI for the checked versions.

I’m uncomfortable with this approach without further demonstration that it’s viable in terms of finding true positive issues vs swamping the user with diagnostics. Will this diagnostic be triggered by code in system headers? Have you tried this approach out yet on real world code bases – how many diagnostics do you get? And you mentioned this is potentially useful for C – are you intending to enable the diagnostic for C as well as C++? (How do C users eliminate pointer arithmetic to silence the warnings?)

I’m especially wary of introducing anything like the C++ Core Guideline rules into the compiler proper. The C++ Core Guidelines are pretty reasonable for folks starting a new project from scratch, but effectively zero effort has been put into their enforcement recommendations and my experience with trying to support the coding standard in clang-tidy is that it’s not worth the effort (we frequently go back to the C++ Core Guidelines authors with questions and the responses are usually along the lines of “please tell us what enforcement should look like” which is far too much effort for reviewers to have to put into a review).

It almost certainly will be outside of -Wall unless the implementation is drastically different from what I’m imagining in terms of diagnostic behavior. However, we have historically had quite a bit of evidence that off-by-default warnings are a lot of effort for very little gain because the vast majority of users will never enable them. Typically, we ask “what can be done to enable this diagnostic by default?” and I wonder the same thing here.

This also makes me a little uncomfortable; our typical policy for fix-its is that they have to actually fix the code, not break it differently. And in this case, it may take working code and break it rather than taking broken code and breaking it differently, which is a bit worse. This matters because fix-it hints can be automatically applied when they’re associated with a warning and there’s only one hint for that line of code – so it would be very easy for a user to do significant damage to their code base unless we attach these fix-its to a note. But if we attach them to a note, we now potentially double the number of diagnostics we emit for something I’m already worried is going to be too chatty.

Despite my reservations, I’m not opposed to the safe buffers part of the proposal at this point, just concerned about the value given the expected chattiness on real world code.

3 Likes

Not yet, and I think it will be quite challenging because it will definitely depend on each codebase’s tolerance for performance regressions. I can imagine some codebases where even a 0.5% regression would not be acceptable (so they’d likely never enable this), but also other codebases where a 20% regression would be acceptable, if the carrot is getting closer to a memory safe language. That’s one of the reasons we intentionally leave the burden on vendors (or toolchain teams at large companies, or name-your-equivalent) to decide how (and whether) to roll this out.

So far, the only semi-formal criteria I’ve been able to come up with is that no implementation of a check should require blocking thread synchronization (basically taking a global lock). Apart from that, we also won’t do anything that would change the complexity of algorithms, but that’s a byproduct of conforming to the Standard in most cases.

IMO, anywhere in the C++ Standard Library where the behavior is undefined is a place where we could have a check. In practice, I expect that we’ll need to carefully consider each proposed check individually. And perhaps we’ll begin to develop more formal criteria as we gain experience with this.

If someone has a check in mind that they’d like to add, I’d encourage first reaching out on the libc++ Discord channel to float the idea around to see if it could be viable – at least until we’re able to formulate more formal guidelines.

1 Like

Where are you getting “really scarce”?

There have been innumerable exploits from out of bounds accesses over the years, and those are just the ones we know about as they were used in actual attacks rather than misc crashes/incorrect behaviour potentially long after the error.

In the environments where termination isn’t considered acceptable, the alternative of “keep going with corrupted memory” is even less acceptable. We have a lot of evidence showing that “zero bugs” is not feasible (and even safe languages like Rust will crash on bounds errors).

In terms of testing: there are lots of projects that already run all of their testing and fuzzing in debug builds, and those already get runtime bounds checks today.

2 Likes

If you work at a company sufficiently large to have a compiler team, and that team turns on bounds checking for libc++, I would expect removal of those bounds checks by a developer to be caught during code review. If it landed anyway, I would expect such decisions to be caught by that company’s security team[s]. I suspect that if it did make it all the way to production and was subsequently used in an exploit chain, there would be some unhappy people in BigCorp.

But more seriously, there is nothing we can do to stop a sufficiently determined developer from writing unsafe code, but we can make the default behaviour safe for everyone else.

1 Like

I said I tried large and small. Branch prediction, including indirect load prediction, is usually primed within very few iterations - a standard entry-level predictor isn’t hugely slower than a top-end one, and IIRC it takes about 3 branches to become primed. Moreover, if your working set is small enough that the branch predictor isn’t primed, all the other predictors in the loop would also not be primed, which would make loading the values from the buffer far more expensive than the branch.

As I said, I also tested large vs. small work sets; I included the larger one as that got the timing up to a level where any difference in performance should have been visible.

What? That change is not “removing bounds checks”. They added fast paths for ASCII, and they got rid of vector::push_back to grow the result buffer. Those are incredibly significant performance changes on their own, and I’m surprised they didn’t have them a decade ago. (Now to see if WebKit has them…)

Since “they” are “me”, I probably know what I am talking about. A big part of that was replacing “reserve” + “push_back” with “resize”.
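
Not the actual Chromium change, but a minimal illustration of that pattern swap (a plain byte copy stands in for the real UTF conversion):

```cpp
#include <cstddef>
#include <string>

// Growing the output element by element: reserve() avoids reallocation, but
// every push_back still re-checks capacity and bumps the size.
std::string convert_with_push_back(const char* in, std::size_t n) {
  std::string out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    out.push_back(in[i]);
  return out;
}

// Sizing up front and writing through the data pointer removes that
// per-element bookkeeping (assuming the final length is known, or can be
// bounded and trimmed afterwards).
std::string convert_with_resize(const char* in, std::size_t n) {
  std::string out;
  out.resize(n);
  char* dst = out.data();
  for (std::size_t i = 0; i < n; ++i)
    dst[i] = in[i];
  return out;
}
```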

I am not sure what you are saying. Are you saying that quick bench result is a fiction?

Ok - here is another example: boundary checks preventing vectorisation: Compiler Explorer
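
The linked example isn’t reproduced here, but the effect can be sketched in a few lines; whether a given compiler version manages to vectorise the checked loop varies, so treat this as illustrative:

```cpp
#include <cstddef>
#include <vector>

// Sums the first n elements with unchecked indexing: no side exits in the
// loop body, so it is a straightforward auto-vectorization candidate
// (the caller is trusted to pass n <= v.size()).
int sum_first_unchecked(const std::vector<int>& v, std::size_t n) {
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += v[i];
  return total;
}

// With at(), the compiler cannot remove the per-element check (it cannot
// relate n to v.size()), so every iteration keeps a bounds branch and a
// potential throw, which in practice tends to block vectorization.
int sum_first_checked(const std::vector<int>& v, std::size_t n) {
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += v.at(i);
  return total;
}
```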

Thank you for mentioning this - I was not aware of its existence.

Looking at “Buffer Boundary Violation [HCB] / 6.8.2 Guidance to language users” and “6.9 Unchecked Array Indexing [XYZ] / 6.9.2 Guidance to language users”, I would say our plans are very much aligned with the advice.