RFC: C++ Buffer Hardening

I don’t believe that’s written properly. You dereference it before checking for end.

And also it’s not vectorized - see the first output.

Hi, I recently became aware of this project and I hope this an appropriate place to post this. This is particularly addressed to those implementing the “adoption tooling”/“fixits”. From the description, this tooling seems to have some commonality with an open source tool that I’ve worked on that aims to automatically convert C code to a safe subset of C++: GitHub - duneroadrunner/SaferCPlusPlus-AutoTranslation2: auto-conversion of C source files to a memory-safe subset of C++

I don’t know how far along the implementation of your adoption tool is, but I can tell you a bit about my experience. Ideally the autoconversion tool would be so reliable that it could even be used as just a step in the build process, so that a safe(r) executable could be built from any code (at least theoretically). But that is of course easier said than done. The two main challenges I encountered when converting 3rd party code were reliably determining whether a pointer is being used as an iterator, and dealing with “non-trivial” usage of macros. I have not yet fully dealt with the latter, but the former has been largely addressed I think. It involves checking for iterator operations (pointer arithmetic) on the pointer itself of course. But also “transitive inheritance” of iterator status. I.e. Whether a pointer has been assigned the value of another pointer which has been determined to be an iterator. Or whether it has been compared to an iterator pointer, etc. And don’t forget the case where if one branch of a conditional operator is a pointer iterator, then the other one probably is too :slight_smile: Unfortunately my code is not well documented yet, but if anyone wants to have a look, some relevant part is here: https://github.com/duneroadrunner/scpptool/blob/9c4d361e5ed6b636334174bbe1bc9750958a9e91/src/converter_mode1.h#L1979-L1997

Btw, is there some venue where outsiders can keep up-to-date on this project?

Given that the functionality is supposed to be built as clang warnings, you can always do a

#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wunsafe-buffer-usage"
// pointer arithmetic
#pragma clang diagnostic pop

Please be mindful that these pragmas are not portable, they are relatively ugly and difficult to maintain. To get it to work in portable code, you probably need twice as much complexity.

1 Like

Branch prediction, including indirect load prediction is usually primed through very few iterations

Branch predictors are fantastic on recent large cores (e.g., Intel, AMD, Apple). And the direct cost of a well predicted branch can be small… but there are side-effects: it can prevent other optimizations (e.g., autovectorization) and adding a (predictable) branch to existing code can increase branch mispredictions.

1 Like

and to add insult to injury making them less ugly usually (is there even another way?) means macros, which do not mesh well with modules.

also this warning currently fires at the one place where (afaict) there is no getting around it. int main(int argc, char const** argv)

1 Like

This example is exemplary of why the problem is hard–because you are applying it to every element
access. What is needed is a way to look at the access from “higher up”.

Take loops for example:: {Because this is where programs spend their time}

Anytime you have a loop that starts at (or after) vector[0] and ends at (or before) vector[max]
you do not need any checks !! the loop control has already performed them. So, whatever
security enhancing mechanics are applied, if you can recognize the loop and the boundaries
of the accesses in the loop, you are in a position to need no checks in the code in the loop.

After getting to the point where you recognize “safe” looping on one or more vectors, the
miscellaneous checked accesses (out of the loops) should not be “that” much of a performance
degradation.

All the “unsafe” buffer warnings are enabled for C with -Weverything, which is a complete waste of time because there is no alternative. It’s also extremely unprincipled:

int foo(int *p) { return *p; } // does not warn (but p could be one past the end)
int bar(int *p) { return *(p + 1); } // warns (as does p[1])

I disagree with their presence in C++, but at least one can avoid them by using std::array and friends. However, in C, pointers and arrays (if you even think of them as distinct) are all you get, and so enabling the warning effectively means you’re banning arrays. Unlike (almost?) every other warning, there isn’t even a way I can tell to silence it, like how you can add attributes to unused identifiers, cast function return values to void, add redundant parentheses, add explicit casts, and so on, for the myriad of other warnings enabled by -Weverything.

Whilst I appreciate that you shouldn’t be enabling -Weverything on real-world code, it is useful to enable on snippets, such as on godbolt.org, to check you’ve not done anything stupid. But now if you do that it spams you with warnings about using arrays, which is entirely unhelpful.

-Wno-unsafe-buffer-usage? This is kind-of what you are subscribing to when using -Weverything.

Sure, but that’s not silencing instances of the warning, that’s turning the warning off. Which makes me question why it’s even possible to enable it in C when the language does not give you a safe alternative.

1 Like

The reason why we didn’t prevent the warning to be enabled for C is that the general idea of replacing pointer arithmetic with safe constructs is language agnostic.

Motivated C projects might want to create their own safe APIs, denote them as safe via the opt-out pragma (rG829bcb06ec43) and use the warning to drive migration from pointer arithmetic to the safe APIs.

Now that -fbounds-safety will provide a better solution for C the above option becomes less interesting.
https://llvm.swoogo.com/2023eurollvm/session/1414468/keynote-“-fbounds-safety”-enforcing-bounds-safety-for-production-c-code

Note: we have recently posted a separate updated RFC related to the Hardened C++ Standard Library section. Looking forward to your feedback!

2 Likes

What a great idea, avoiding buffer overruns is a very good idea.
Lots of informative replies too.

Re the post article “For example, accessing a std::span or a std::vector outside of its bounds would abort the program”

– if this is calling std::terminate() or abort() that makes it impossible to use of safety critical software (software that can’t crash). Why not default to throwing a C++ exception? std::logic_error() etc, that would give the application chance to handle the unexpected issue.

Personally I really appreciate my own compile time code which prevents buffer overruns, checks for null pointers, and other such things to validate function parameters. This all works at compile time, not at runtime!

It is still a precondition violation, so a recoverable exception does not make sense. If any software is hitting this, it is buggy.

Converting such a bug into a definite (controlled!) crash, vs arbitrary unknown undefined behavior (such as stomping over random unrelated memory) does not make you worse off, even in safety critical software.

James, Thank you for you reply. A lot of software engineers would agree, assert() and other runtime checks will raise a SIGABRT which core dumps with a backtrace for a developer to check. The problem is that in a safety critical system, say automotive, aviation, a runtime throw logic_error would at least let the software log and carry on running when there is a range error. But I would agree, any use-after-free is very hard to detect, and those will often still crash. At least ranges, and nullptr checks are easy to do. Bjarne was talking about C++ profiles which cover these ideas too.

Not necessarily. I would argue that in the safety-critical domain you should have full control of what assert() does and when it is enabled (e.g. Release), by creating your own assert macro and assertFail function that does what you expect to do. All that can be done without exceptions, which are banned in these domains due the usage of dynamic memory.

Maybe you can share an example of what kind of macro you are thinking of? Some projects like Gnome make extensive use of macros, that I am personally not fond of. I’ve seen some code like assert_check(cond, action) that then do some operation, although it makes the code a bit messy. I like macros to have the same code pathway in Debug or Release, the only addition to the Debug build might be an additional log call, or a debugbreak(), stepping over it, would carry on as normal. It’s good as you say to ban C++ exceptions, mark as nothrow, to avoid dynamic memory allocation on a safety critical system. Although a C++ exception for something simple may be fine… all depends on the use case.

I’m thinking of the same macro as the standard assert macro:

https://sourceware.org/git/?p=glibc.git;a=blob;f=assert/assert.h;h=62670e4bbbb39bdc522c6fe2ad4ef6a67d3c5cad;hb=refs/heads/master#l99

But creating your own instead. Then you are in full control of whether it’s enabled in Debug/Release, and what exactly your own __assert_fail function does.

AFAIK there is no other way to implement assert without macros, because you need to be able to get things like __FILE__, __LINE__, etc at the right location. But maybe in C++20 it’s possible without a macro.

Hi Carlos, that __assert_fail() is the case that is handling failures (so it’s after the condition check, it has #expr so the condition is formatted as a string); I take your point, a macro that checks condition and has an application defined action. It’s what I have made, and projects like Gnome seem to use.

For C++ can use a macro again, but pass a filesystem object as the last argument of the a macro, gives the full backtrace then.

std::source_location loc = std::source_location::current()

This only works for toy examples where all code is visible and inlined. It’s neither practical nor robust for large-scale systems.

Why do you feel that?
It’s exactly the way to make a safety critical system robust, secure. No need for all code or data to be visible.