clang and -D_FORTIFY_SOURCE=1

Hi folks (CCing llvm-dev, but that's probably more of a cfe-dev topic),

As a follow-up to that old thread about -D_FORTIFY_SOURCE=n

    http://lists.llvm.org/pipermail/cfe-dev/2015-November/045845.html

And, more recently, to this fedora thread where clang/llvm -D_FORTIFY_SOURCE
support is claimed to be only partial:

    https://pagure.io/fesco/issue/2020

I dig into the glibc headers in order to have a better understanding of what's
going on, and wrote my notes here:

    https://sergesanspaille.fedorapeople.org/fortify_source_requirements.rst

TL;DR: clang does provide a similar compile-time checking as gcc, but no runtime
checking. To assert that I wrote a small test suite:

    https://github.com/serge-sans-paille/fortify-test-suite

And indeed, clang doesn't pass it, mostly because it turns call to
__builtin__(.*)_chk into calls to __builtin__\1.

We need to support the runtime behavior of the following builtins:

- __builtin___memcpy_chk
- __builtin___memmove_chk
- __builtin___mempcpy_chk
- __builtin___memset_chk
- __builtin___snprintf_chk
- __builtin___sprintf_chk
- __builtin___stpcpy_chk
- __builtin___strcat_chk
- __builtin___strcpy_chk
- __builtin___strncat_chk
- __builtin___strncpy_chk
- __builtin___vsnprintf_chk
- __builtin___vsprintf_chk

And I'd like to implement them at clang level, leveraging their existing
implementation. Is that the right way to go / any comments / issue with that
approach ?

Are you sure you've diagnosed the issue correctly? __builtin___memcpy_chk works correctly, as far as I know.

-Eli

I remember looking at the fortify macros recently, and iirc the issue was that the __builtin_object_size builtin, when used in an inline function, can't evaluate the size of the object in the context where it is inlined, which the glibc fortify macros/inline functions depend on.

This has been discussed before, e.g. here: http://lists.llvm.org/pipermail/cfe-dev/2015-November/045846.html

// Martin

The glibc implementation of FORTIFY_SOURCE is indeed not compatible with clang, because it uses compiler extensions which were trivial to implement in GCC’s architecture, but difficult/impossible in clang’s.

However, clang does provide other mechanisms by which the same results can be achieved.

If anyone is interested in glibc fully supporting fortify mode with clang, way to achieve this would be to look at the implementation in the bionic libc’s headers, and then implement the same technique in glibc.

That discussion is no longer relevant. __bos is only lowered to a
result, if the answer is definitely known and otherwise kept through a
good chunk of the pass pipeline, most importantly until after inlining
is done.

Joerg

Oh, ok.

Hmm, when I looked at some cases of it (_FORTIFY_SOURCE macros in mingw-w64, modelled after the ones in glibc), __bos didn't seem to be able to figure out the size as long when the use of it was in an inlined function (built with optimization) while the buffer to measure was in the caller's scope. As I found that ML post I didn't dig into the matter further. Maybe I should look at it again why it didn't work, if it is supposed to work these days...

// Martin

FWIW, there’s an existing implementation of a unified gcc+clang FORTIFY on top of glibc 2.27: https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+/master/sys-libs/glibc/files/local/glibc-2.27-clang-fortify.patch . For the reasons James mentioned, this needs a hefty amount of macro magic, and there’re always going to be discrepancies in what GCC catches vs what Clang catches(*).

Upstream glibc was generally receptive to the changes when I proposed them a while back, but I haven’t found the time to push it through. I’m more than happy to help as I can if someone wants to pick that up and run with it.

(*) - especially in the realm of compiler diags, and as we need to inline N callers and are expecting -D_FORTIFY_SOURCE=2-level accuracy. Happy to elaborate if it’d be helpful.

Are you sure you’ve diagnosed the issue correctly? __builtin___memcpy_chk works correctly, as far as I know.

100% sure. Let’s have a look at the output of

#include <string.h>
static char dest[10];
char* square(int n) {
memcpy(dest, “hello”, n);
return dest;
}

compiled with -D_FORTIFY_SOURCE=1 -O1 : https://godbolt.org/z/UvABWp

Clang issues a call to memcpy, while gcc issues a call to __memcpy_chk.
The call to __memcpy_chk performs extra runtime checks memcpy doesn’t,
and clang doesn’t generate the extra checks inline either. This is a separate
concern from the accuracy of __builtin_object_size, just a different runtime behavior.

Clang could generate the call to __memcpy_chk if its declaration is available, which is the case for the glibc.

No, this doesn't seem to be the case.

I expanded your testcase by removing the string.h include and replacing it with the relevant bits of it:

https://godbolt.org/z/NKUQbW

The regular call to memcpy ends up without the extra safety checks and just generates a call to memcpy. But if I call __builtin___memcpy_chk directly, it does generate a call to __memcpy_chk.

So I would conclude that the builtin works just fine, but the issue is with the exact rules for when and how the inline function is preferred over the externally available version.

I added a "#define VISIBIILTY extern" to ease testing and changing it. If you change that into static, you'll see that the inline version of the function actually ends up used, and it does generate a call to __memcpy_chk even there.

And this does seem to show that __builtin_object_size does work fine from within an inlined function now, contrary to my earlier comments.

// Martin

So I would conclude that the builtin works just fine, but the issue is
with the exact rules for when and how the inline function is preferred
over the externally available version.

Fascinating. I can elaborate a bit more based on https://godbolt.org/z/EZGfqd : there’s a special handling for memcpy (and probably other similar functions) in clang that prevents him from considering the inline version.I’ll dig in clang source to find out more about this behavior.

Got it. There are two places (one in clang, one in llvm) where being named `memcpy’ is a free pass for a function, independently of the presence of an actual definition. The attached patch makes sure that if a function named after an intrinsic has a body, calls to that function are not lowered to intrinsics. It has the (expected) side effect of getting a behavior closer to GCC wrt. -D_FORTIFY_SOURCE=n, and wrt. using function named after famous ones.

Do we want to go that way?

memcpy.patch (1.25 KB)