RFC: C++ Buffer Hardening

Are there actual statistics for (reasonably modern) C++? Of course there have been plenty of CVEs with buffer overflows, but the vast majority of them seem to have been in C code. We’ve been starting to categorize bugs on our pretty large C++ code base a while ago, and have around 2.5% of crash/UB bugs due to out-of-bounds accesses. (We don’t separately count security issues.) Most of them have been in older code that is still closer to C than C++.

It seems totally legitimate to me to question whether a number of abstractions enabled by modern C++ aren’t already good enough. Do we have to completely forbid any kind of pointer arithmetic (and thus custom abstractions)? Does every memory access have to be bounds-checked (in production no less), even if it’s obvious from the code that everything is fine?

Of course you can prove formal correctness of a program. It’s certainly not feasible for every kind of software, but especially in the area of security-critical software there are a number of projects out there that went through the hoops. Of course there are things outside of the model (think side channel attacks), but they are certainly also out of scope for boundary checks.

Some issues are not so trivial as this:

  • Index vectors (i.e. vectors that contains indices to another vector) are problematic, because we’d need the compiler to realize that all integers in some arbitrarily large array are bounded.
  • Because char can alias everything, things like std::vector<char> (and I think also std::string) can make it hard to hoist boundary checks, because the character buffer can alias with everything, including, sadly, the fields of std::vector/std::string itself that store the size or end pointer. This might get tricky.

That’s not hardcoded into libstdc++ though, every user can decide for themselves whether to enable it. That’s different from what is being proposed here:

So a given binary version of libc++ will either always have the checks on or not. @ldionne went into some details later. I guess the difficulty arises from wanting to have everything boundary checked, versus libstdc++'s “best effort” assertions that don’t guarantee the standard library to be fully safe against overruns.

The main difficulty in fact is that some of the checks we want to implement require an ABI change. ABI-affecting properties need to be pinned down by vendors at configuration time because all the software in the stack needs to be built using the same ABI. Any check that is not ABI-affecting in our proposal will still be controllable by users regardless of how libc++ has been configured in the first place.

Ah, indeed some iterators will need to carry additional data to enable range checks.

Ok let me try to address a few points that so far have been missing:

  • We have a strong signal for ourselves that the approach is both useful for and desired by a large-ish chunk of internal developers. I’ll share what I can about that below.
  • Despite that, we are currently relatively early in the development. As good citizen, we’ve sent out the RFC early to gather feedback. As our work progresses we’ll definitely provide additional data and more precise specifications but for now we’re working out the details ourselves.

So we’re running into what sounds to me like an LLVM policy conflict: on one hand, we (as a community) encourage avoiding large downstream efforts and sunk cost fallacies, on the other hand we’re encouraged to have everything figured out before we begin(?) So even regardless of this effort, I want to arrive at a shared understanding of how we as a community intend to approach such situations. So below I lay down the practical situation we’re in, hoping that it’ll resolve the current concerns, as well as act as a model upon which we can resolve similar situations in the future.

Our (Positive) Experimental Data So Far

Basically, there’s been dozens of internal projects who have already been, for several years, voluntarily enforcing the proposed set of warnings upon themselves. We’ve been circulating an unofficial compiler patch that implements a basic version of the warnings we’re proposing, that they incorporated into their workflows. Instead of hardened libc++ they were using a custom class that acted like std::span with bounds checks. They didn’t have fixits so they needed to adopt spans manually. They’ve already found the performance tradeoff acceptable. Their sole purpose was improving security of their codebases.

Our goal with this project is to take the experience that we’ve gained from the prototype to inform the development of a warning that is available to all clang users and, in particular, improve the user experience beyond what we had in the basic prototype. Even though this tool is clearly not for everybody and we definitely don’t envision it being applied to all codebases, the large amount of interest we’ve observed so far is alone a good indication that there may be other parties all over the world interested in a similar solution.

So, while definitely experimental, our tool is much less experimental than a lot of other similarly-looking suggestions. Our preliminary investigation goes way deeper than just saying “Hmm these cppcoreguidelines sound cool, let’s build a clang tool to enforce them”.

Given that; And given the overall positive feedback we see in this discourse thread; I am optimistic that despite a significant effort-for-security tradeoff we still have relatively large audience. I also think that this amount of data is roughly as much as we can expect for any similar proposal; I don’t think it’s realistic to ask for much more data at this early stage of development. But we’ll definitely keep providing more and more data points as things progress.

Our (Positive) Relationship With Incremental Development

We’ve sent out this RFC early in the spirit of LLVM’s policy on incremental development, tring to avoid long-term downstream development branches.

So far we’ve seen both our company and other companies develop large features downstream and then only publish them when they’re finished enough for day-to-day use. However we believe that developing them openly from the start would have been a better thing to do in a lot of these cases. Not only it adheres to the policy, but it also saves effort on merge conflicts, it avoids wasted work when the discussion leads to change of direction, it keeps everyone informed to avoid duplication of effort, the benefits are numerous. We’ve observed that such open model works really well in other areas, such as my area – the clang static analyzer, so we want to do more of that.

Of course, LLVM-style incremental development on trunk is impossible without some initial preparation. There needs to be a staging area where features can live temporarily until they mature. There need to be separate criteria for accepting a feature in a staging area versus accepting a feature as stable. For example, the static analyzer has “alpha” checkers that are hidden from the user until they mature, and it has a somewhat specific checklist for “moving out of alpha”. In order to develop clang warnings incrementally, we’ll need a similar staging area in clang warnings. Say, treating clang-tidy as a staging area of clang warnings doesn’t sound quite right to me. Not only because clang-tidy is a lot more than that and it’s unfair to devalue its other strengths, but also because there’s a non-insignificant amount of effort necessary to move a warning from clang-tidy to clang proper. Incremental development is supposed to be composed of steps in the right direction, while putting a warning into a wrong tool is a step in an opposite direction. Whatever tool we ultimately choose, I think it shouldn’t be just because it’s “experimental”, we’d much rather be able to focus on the ultimate greater good when making this decision.

So we’re interested in trying to “marry” these two concepts – the usefulness of incremental development with the understandably high bar for compiler features. We’re interested in working out the requirements and criteria for initiating incremental development and managing the early-stage uncertainty. We’re interested in developing the necessary staging areas, if necessary, say we could do -Wexperimental-… or cc1-only warning flags. A staging mechanism would be highly beneficial beyond just this proposal and would likely encourage others in community to take a more incremental approach to their work as well.

3 Likes

We think the warning should be in clang because our experience with it internally indicates it will be a better user experience. We’re encouraging developers to never write unsafe code in the first place, right at their desk as diagnostics produced at any later stage of code integration (like code getting merged to a repository or CI testing) cause additional overhead. At the same time requiring all users to add an extra tool to their “write code, compile, repeat” workflow would regress the experience and impact the adoption.

Security guarantees are the main motivation for any prospective adopter project. Clang is a tool that all relevant code goes through. This makes verification of the security guarantees a simple matter of checking build settings of the project. Reliance on CI systems running external tools provides weaker security guarantees and this would also influence adoption.

For many projects the only feasible way to adopt the new warnings is having them emitted directly by clang. The burden of having to set up new infrastructure for clang-tidy can be prohibitive.

Our observations are based on adoption by internal projects but given the scale we believe it is a good signal for our expectations in general.

There are many codebases that simply don’t use (or significantly restrict use of) certain features because they are difficult to use correctly and can cause security or other problems.

Clang provides a lot of value to such projects by relevant off-by-default warnings, which are widely-used. For example:
https://clang.llvm.org/docs/DiagnosticsReference.html#wvla
https://clang.llvm.org/docs/DiagnosticsReference.html#wformat-nonliteral
https://clang.llvm.org/docs/DiagnosticsReference.html#walloca
https://clang.llvm.org/docs/DiagnosticsReference.html#wold-style-cast

We think our proposal fits the same bucket as these warnings and don’t quite see it as a language extension or a language variant.

That’s actually a really good description of the kinds of projects that we’re after. Modern C++ provides useful and relatively safe abstractions but a lot of old code simply doesn’t use them, so we’re trying to encourage them to do so.

That’s right, eliminating custom containers was never part of the plan. It’s fine to opt out of the warning for the container implementation code, either for entire .cpp file or with “unsafe” opt-out markers for small sections of code. Same with performance bottlenecks in general. We simply want to minimize the opt-out areas and have them stand out, clearly marked for audit. If the user isolates all unsafe operations in a well-encapsulated container class, instead of passing buffers as raw pointers all over the place, we see it as a definite success.

Yes libc++ has to be chosen somewhat globally due to ABI concerns. But warning usage can have arbitrary granularity, warnings can be applied incrementally, or optionally, to pieces of code as small or as large as necessary. Users are supposed to use their best judgement when they decide to secure their custom containers, we’re just providing a secure out-of-the-box solution that everybody has access to, and in many cases that solution is an easy drop-in replacement.

Agreed that there’s a tension between these two goals. I think the way it’s intended to be interpreted is that you have to convince the community that you can achieve reasonable end results despite not having everything figured out up front. Sometimes that’s easy (for example, when another implementation has done something similar) and other times that’s harder.

I think this is a good discussion for us to have a community, but in a separate thread from this one. (FWIW, I was planning to start such a discussion before your RFC anyway! I have concerns about some things in this space that I wanted to discuss in more detail.)

Thank you, that’s really good to know! Did you have projects that failed to adopt the new diagnostics and ended up turning them off? If so, do you have insights into what caused that? Also, do those self-selected projects span a wide variety of industries or are they largely in the same space? (Basically, are a significant number of the projects ones which naturally do not use a significant amount of array access and pointer arithmetic?)

If you run your patch over LLVM + Clang + clang-tools-extra + compiler-rt, how many diagnostics does it generate and what does it do to compile times?

-Walloca seems to be enabled < 100 times. (context:global -file:.*t… - Sourcegraph)
-Wold-style-cast looks like it’s enabled a fair amount, but if you look at the results, the vast majority of them are third-party headers disabling the warning locally or in comments, rather than folks enabling the warning. The actual usage seems to be in the < 100 realm as well. (context:global -file:.*t… - Sourcegraph)
-Wformat-nonliteral seems to be in the same situation where there’s a lot of code to disable the use, and not a lot of code enabling it. The actual usage seems to be in the < 100 realm also. (context:global -file:.*t… - Sourcegraph)

-Wvla is actually enabled a fair amount, though. (context:global -file:.*t… - Sourcegraph)

This is what I meant when I said we have significant evidence that off-by-default warnings do not carry their weight. From my recollection, most (all?) of the diagnostics you listed a pretty old and come from a time when we didn’t have as strong of a reaction to off-by-default warnings.

Don’t forget that those warnings are enabled by -Weverything, which is used by a decent number of projects. Indeed, if there weren’t a bunch of projects enabling them at the project level, it would be surprising to see so much code disabling them at the file level.

Of course, it’s not clear how many of the projects using -Weverything actually want those warnings in particular. Out of the source files using #pragma clang diagnostic ignored "-Wold-style-cast", for instance, some apply the pragma to the whole file and evidently don’t want the warning, but others only apply it to short section of code (e.g. a set of #includes or a macro invocation), suggesting they do want the warning for the rest of the code. Meanwhile, other projects apply -Wno-old-style-cast as a compiler flag. But plenty of projects don’t mention it at all, providing little evidence either way.

1 Like

On Friday, I met with @NoQ, @devincoughlin, and @jankorous to discuss the RFC over a conference call. We figured that would be a quicker way for us to come to a shared understanding of what was proposed and what we want to do moving forward. This is a brief summary of the discussion and how things have changed. (And btw, thank you for that meeting, it was very helpful!)

We spent a fair amount of time discussing points around user experience and the extension requirements for Clang. My primary concerns boiled down to this being a new programming model that effectively subsets the language and generates a significant amount of diagnostics for working, well-defined code. How often will users be frustrated by the amount of diagnostics or unable to address the diagnostics without disabling them, how often does addressing the warning introduce a new deficiency in the code, how many warnings will this generate for existing code, and that sort of thing.

The points raised that I found persuasive were:

  • The experiments done internally at Apple were self-selected projects of varying sizes and ages, so there was a reasonable mixture of projects interested in using the functionality, not just new projects. While the user community is not “everyone, in general”, it is still a significant user community of security-minded folks, especially for projects like parsers or other text manipulation applications. None of us are aware of any coding standards which prohibit use of pointer arithmetic. It would strengthen the case for adding this functionality to Clang if there was such a rule as it signals even more evidence of a user community, but it’s not a strict requirement that we find such a coding standard.

  • The experiments done showed that users were able to use the model successfully. One of the big concerns I had was that prior experience with wholesale rewriting of existing, working code can lead to bad outcomes (introducing more bugs than are fixed), such as described in N1969 — Updated Field Experience With Annex K — Bounds Checking Interfaces when Cisco attempted to rewrite a bunch of working code to use Annex K functions. The team at Apple mentioned that this is a valid concern. During their experiment, the situation they saw more frequently was that broken pointer arithmetic code was rewritten to use broken std::span code instead, so the defect moved around rather than got repaired or made worse. In their security team’s estimation, more code was repaired than harmed based on the results they were seeing internally.

  • We do not typically add new off-by-default warnings because users don’t enable them enough to justify them. We discussed -Wvla as an existing example of an off-by-default warning, and we eventually concluded that it is more relevant than it first appears. -Wvla was introduced to tell people about accidental use of VLAs because the syntactic space was reused with startlingly different semantic effects. In this case, the warning will be to tell people about purposeful use of pointer arithmetic. So the two situations are a bit different at first blush. However, people do use -Wvla to tell them about use of a language feature that they want to excise from the code base for security reasons; that’s the same situation here with pointer arithmetic. In that regard, they’re serving quite similar purposes.

  • This will be diagnosing pointer arithmetic, not pointer use. e.g., ptr + 1 and ptr + N will be diagnosed, but ptr + 0 (and by extension ptr) will not be diagnosed. I had mistakenly believed this would be warning about other direct pointer uses (as opposed to using smart pointers, references, etc). This reduces the amount of code I was envisioning would be diagnosed.

  • This proposal is not strongly tied to use of libc++. Yes, using libc++ will get you extra hardening, but switching from pointer arithmetic to alternatives like std::span also give the intended benefits even with libstdc++ or MSVC STL because the static analysis work being proposed as part of this is keyed off the use of STL interfaces.

That said, I do still have some lingering concerns that I hope can be addressed in some manner. Specifically:

  • I think we’re going to get surprisingly bad behavior out of -Weverything from this diagnostic. For some projects enabling that diagnostic, I would expect the number of warnings from this proposal will dwarf the other diagnostics generated and sometimes cause significant performance problems due to the number of diagnostics needing to be emitted. I don’t know that we’ve ever opted any diagnostics out of -Weverything before, and if we haven’t, I don’t think we should start. But if there’s something else we can do to ameliorate this situation somewhat, that would be a good thing. That said, if we can’t… oh well; people should not be using -Weverything for production uses.

  • This feature needs a design document for the community to truly evaluate the RFC. The 50,000ft overview we have here in the RFC is enough to give us an idea of the feature, but it’s not really giving a full picture. I feel like I have more of a full picture after our discussion, but that doesn’t help anyone who wasn’t on the call. Given that I’m seeing a much more plausible path forward for this part of the RFC than I was last week, I feel a bit more confident that asking you to write a more complete design document for the feature won’t be a waste of your time. I would appreciate it if you were to write up an .rst file that can serve as the basis for the feature’s documentation so that folks know what kind of command line options, pragmas, fix-it hints, code examples, static analysis checks, etc are being impacted.

All in all, I think the points covered during our discussion convinced me this proposal is reasonable. Having a more detailed specification for all the components of the proposal might raise additional concerns or ideas, but I no longer have concerns that this functionality cannot be added to Clang due to a fundamental issue with the basic design.

4 Likes

See AUTOSAR C++14 and any MISRA guidelines:

Rule M5-0-15 (required, implementation, automated)
Array indexing shall be the only form of pointer arithmetic.

I would say the safety-critical community (automotive, avionics, etc) would greatly benefit from this proposal.

1 Like

That’s a different rule because array indexing is also diagnosed by the safe buffers proposal.

Thank you for meeting with us on Friday - we found the discussion just as helpful!

I think we’re going to get surprisingly bad behavior out of -Weverything from this diagnostic. For some projects enabling that diagnostic, I would expect the number of warnings from this proposal will dwarf the other diagnostics generated and sometimes cause significant performance problems due to the number of diagnostics needing to be emitted. I don’t know that we’ve ever opted any diagnostics out of -Weverything before, and if we haven’t, I don’t think we should start. But if there’s something else we can do to ameliorate this situation somewhat, that would be a good thing. That said, if we can’t… oh well; people should not be using -Weverything for production uses.

Including the warning in -Weverything is problematic indeed. Our observation is that especially security-conscious projects tend to use it together with -Werror. We would like to avoid breaking projects if at all possible.

We will provide a more detailed design document.

I want to push back on this a little. -Weverything exists precisely because too many projects used -Wall for Clang to put all of its diagnostics in -Wall. The value of -Weverything is that it helps users discover every diagnostic that exists. If we hide some diagnostics from that group, it makes -Weverything less useful for its intended use case. It instead becomes something like -Wall -Wextra.

If someone out there is using -Werror -Weverything, I argue that they actually want their build to break when they get a new compiler revision so they can evaluate new warnings on a case by case basis.

What is most important, to me, is that users have a way to unblock compiler updates by opting into the old behavior. For new warnings, this is very straightforward: simply add -Wno-new-thing to the build.

9 Likes

Do you intend to mark pointer arithmetic standard library functions (e.g. std::next) as being unsafe, as well?

It would seem a bit silly if a diff like this “fixed” the warning envisioned by this proposal.

  int *x = ...;
- int *y = x + 456;
+ int *y = std::next(x, 456);
1 Like

The user manual specifically warns projects about the dangers of using -Weverything and declares that upgrading the compiler will be difficult if you do use it. If a project chooses to use it, that project also signs up for the accompanying difficulties. Avoiding such breaks make it harder for those who use -Weverything as intended.

Blockquote
Since :option:-Weverything enables every diagnostic, we generally don’t
recommend using it. -Wall -Wextra are a better choice for most projects.
Using :option:-Weverything means that updating your compiler is more difficult
because you’re exposed to experimental diagnostics which might be of lower
quality than the default ones. If you do use :option:-Weverything then we
advise that you address all new compiler diagnostics as they get added to Clang,
either by fixing everything they find or explicitly disabling that diagnostic
with its corresponding Wno- option.

We’ve been using -Weverything for more than 2 years, bump the compiler (trunk) regularly, and could not be any happier. We have discovered so many problems in our code thanks to this. Bumping the compiler is a no risk / effort operation. Warnings that are experimental/immature can simply be turned off via -Wno-. And since they are listed, we can visually see what’s disabled and come back to them in the future.

What’s most important is to have a process for handling false positives. There will always be false positives, no matter if you run a released version, trunk, enable -Weverything or -Wall. We are all human after all :slight_smile:

For example, GCC has lots of false positives in -Wall and -Wextra for warnings that are years old. That doesn’t mean we should disable -Wall and -Wextra.

Thank you all for feedback on the “let’s not include the waring in -Weverything” idea.
It sounds like our concern about breaking projects does not warrant the erosion of semantics it would cause.

1 Like

That’s a really good point which we’ve missed!
We should treat std::next (and other functions in the iterator header) as unsafe.

1 Like