GCC 5 and -Wstrict-aliasing in JSON.h

Hello,

For the IWYU project, we have a buildbot on Ubuntu 16.04 and its
bundled GCC (which I think is some GCC 5 variant).

We're getting a number of -Wstrict-aliasing warnings from JSON.h on this line:
https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/JSON.h#L455

I'm not sure if GCC has a point here but GCC 7.2 does not complain, so
I'm going to guess no.

Would you consider patches to disable -Wstrict-aliasing wholesale for
GCC 5 and older?

- Kim

Hi Kim,

Thanks for your report!

My GCC 4.9.3 does *not* reproduce the issue with the reduced testcase [1].

But it does reproduce the issue when compiling LLVM with the GCC toolchain [2]:

In file included from /home/loongson/llvm/lib/Support/JSON.cpp:10:0:
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = bool]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:393:23: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
      return *reinterpret_cast<T *>(Union.buffer);
                                                ^
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = double]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:398:25: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = long int]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:400:26: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = std::basic_string<char>]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:418:46: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = llvm::StringRef]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:420:34: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = llvm::json::Object]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:424:62: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = llvm::json::Array]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:430:60: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
In file included from /home/loongson/llvm/lib/TableGen/JSONBackend.cpp:20:0:
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = bool]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:393:23: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
      return *reinterpret_cast<T *>(Union.buffer);
                                                ^
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = double]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:398:25: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = long int]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:400:26: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = std::basic_string<char>]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:418:46: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = llvm::StringRef]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:420:34: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = llvm::json::Object]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:424:62: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
/home/loongson/llvm/include/llvm/Support/JSON.h: In instantiation of ‘T& llvm::json::Value::as() const [with T = llvm::json::Array]’:
/home/loongson/llvm/include/llvm/Support/JSON.h:430:60: required from here
/home/loongson/llvm/include/llvm/Support/JSON.h:455:47: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

Hi GCC developers,

Thanks for your patch to gcc-7-branch!

Could you test compiling LLVM with older GCC versions (4/5/6)? Does your patch need to be backported to those older GCC versions?

1.
$ cat punning.cpp
template<unsigned _Len, unsigned _Align>
struct aligned_storage
{
   union type
     {
       unsigned char __data[_Len];
       struct __attribute__((__aligned__((_Align)))) { } __align;
     };
};

aligned_storage<sizeof(int), alignof(int)>::type storage;

int main()
{
   *reinterpret_cast<int*>(&storage) = 42;
}

2.
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/mips64el-redhat-linux/4.9.3/lto-wrapper
Target: mips64el-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-arch=loongson3a --enable-languages=c,c++,objc,obj-c++,fortran,go,lto --enable-plugin --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.9.3/obj-mips64el-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.9.3/obj-mips64el-redhat-linux/cloog-install --enable-gnu-indirect-function --with-long-double-128 --build=mips64el-redhat-linux
Thread model: posix
gcc version 4.9.3 20150626 (Red Hat 4.9.3-8) (GCC)

But there is *no* warning when bootstrapping with the LLVM toolchain, and Sam should test it before code review: https://reviews.llvm.org/D45753

GCC versions before 6.4 are not supported, so no backports will happen
to 4.x or 5 releases and I doubt anybody's going to routinely test
anything with them. If there's a bug in GCC 4.9.3 that you can't work
around then you should upgrade to a supported release (or use a
distribution that provides support for older releases).

Hi Jonathan,

Thanks for your response!

So the workaround for Kim's issue is to bootstrap an old version of LLVM with GCC 4/5 to build an old clang, then bootstrap the latest LLVM with that old clang.

It would be much easier to just use -fno-strict-aliasing with the older GCC.

Author of the problematic code here. Thanks everyone, and sorry to have caused difficulty!

Obviously if there really is something illegal here we should fix it in LLVM, but it looks like this warning is a false positive (anyone disagree?)

Still if there’s a simple source-level workaround, or we can suppress the warning with a #pragma, I’d be happy to do that. GCC 4.9.3 is a supported compiler for LLVM and the more configurations we build cleanly in, the better.

If this is a useful direction, could someone with an affected environment send me a small patch? I don’t have the right setup to verify myself.

Cheers, Sam

Hi Sam,

Thanks for your response!

Please review it, thanks a lot!

A patch by Loongson!

diff --git a/include/llvm/Support/JSON.h b/include/llvm/Support/JSON.h
index da3c5ea..fd60b40 100644
--- a/include/llvm/Support/JSON.h
+++ b/include/llvm/Support/JSON.h
@@ -452,7 +452,10 @@ private:
      new (reinterpret_cast<T *>(Union.buffer)) T(std::forward<U>(V)...);
    }
    template <typename T> T &as() const {
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wstrict-aliasing"
      return *reinterpret_cast<T *>(Union.buffer);
+#pragma GCC diagnostic pop
    }

    template <typename Indenter>

> Author of the problematic code here. Thanks everyone, and sorry to have caused difficulty!
>
> Obviously if there really is something illegal here we should fix it in LLVM, but it looks like this warning is a false positive (anyone disagree?)

False positive!

> Still if there's a simple source-level workaround, or we can suppress the warning with a #pragma, I'd be happy to do that. GCC 4.9.3 is a supported compiler for LLVM and the more configurations we build cleanly in, the better.
>
> If this is a useful direction, could someone with an affected environment send me a small patch? I don't have the right setup to verify myself.

I tested it on mips64el, built with gcc-4-branch:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/mips64el-redhat-linux/4.9.3/lto-wrapper
Target: mips64el-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-arch=loongson3a --enable-languages=c,c++,objc,obj-c++,fortran,go,lto --enable-plugin --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.9.3/obj-mips64el-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.9.3/obj-mips64el-redhat-linux/cloog-install --enable-gnu-indirect-function --with-long-double-128 --build=mips64el-redhat-linux
Thread model: posix
gcc version 4.9.3 20150626 (Red Hat 4.9.3-8) (GCC)

Thanks all for pitching in to help!

> Obviously if there really is something illegal here we should fix it in
> LLVM, but it looks like this warning is a false positive (anyone disagree?)

The little I've read about strict aliasing rules leaves me firmly
incompetent to judge what's valid and not :-)

But since both Clang and GCC 6+ are happy with this, it seems
plausible that this would be a false positive.

> Still if there's a simple source-level workaround, or we can suppress the
> warning with a #pragma, I'd be happy to do that. GCC 4.9.3 is a supported
> compiler for LLVM and the more configurations we build cleanly in, the
> better.

I *think* Leslie's warning-disable patch should work even with MSVC
(it should ignore unknown #pragmas, right?). I can't think of a
straight-up code change that would fix this.

Cheers,
- Kim
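
For reference, here is a minimal sketch of the pragma approach with a compiler guard, so that compilers which do not define `__GNUC__` never see the GCC-specific pragmas. `Storage`, its `buffer` member and `roundTrip` are hypothetical stand-ins for illustration, not the actual JSON.h code. Note that this only silences the diagnostic; it does not change how the compiler treats the cast.

```cpp
#include <new>

// Hypothetical stand-in for the buffer inside json::Value (not LLVM's code).
struct Storage {
  alignas(double) char buffer[sizeof(double)];

  template <typename T> T &as() {
#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wstrict-aliasing"
#endif
    // Same shape as the line GCC warns about in JSON.h.
    return *reinterpret_cast<T *>(buffer);
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
  }
};

// Only reads back the type that was placement-new'ed into the buffer.
inline double roundTrip(double D) {
  Storage S;
  new (S.buffer) double(D);
  return S.as<double>();
}
```

Clang also defines `__GNUC__` and accepts `#pragma GCC diagnostic`, so the guard mainly keeps the pragmas away from other compilers.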

If GCC 4.9.3 thinks there's an aliasing violation it might
misoptimise. It doesn't matter if it's right or not, it matters if it
treats the code as undefined or not.

And apparently GCC does think there's a violation, because it warns.

Unless you're sure that not only is the code OK, but GCC is just being
noisy and doesn't misoptimise, then I think using -fno-strict-aliasing
is safer than just suppressing the warning.

Good point, I can see how that would play out nicer.

So this would probably need to be addressed in the LLVM build system, I’ll try and work up a patch tomorrow.

Thanks,

- Kim

When I used to do such type punning in C, I got similar warnings. Then I looked for some solutions... I can't recall the principle now and I fail to find it in the C or C++ standard. Despite that, the solution is simple:

Only an lvalue of a pointer to (possibly CV-qualified) `void` or a pointer to a character type (in C) / any of `char`, `unsigned char` or `std::byte` (in C++) can alias objects.

That is to say, in order to eliminate the aliasing problem an intermediate lvalue pointer is required.

Hence, altering

     return *reinterpret_cast<T *>(Union.buffer);

to

     auto p = static_cast<void *>(Union.buffer);
     return *static_cast<T *>(p);

will probably resolve this problem.
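
A self-contained sketch of that suggestion, for anyone who wants to try it on an affected compiler. `Slot`, `create`, `get` and `destroy` are hypothetical names made up for this example; JSON.h has its own storage type. The sketch assumes the object is always created with placement new before `get()` is called.

```cpp
#include <new>
#include <utility>

// Hypothetical single-type slot: raw bytes that hold at most one T.
template <typename T> class Slot {
  alignas(T) unsigned char Buffer[sizeof(T)];

public:
  // Construct a T inside the buffer with placement new.
  template <typename... U> void create(U &&...V) {
    ::new (static_cast<void *>(Buffer)) T(std::forward<U>(V)...);
  }

  // Access goes through an intermediate void* and two static_casts,
  // instead of a single reinterpret_cast<T *>(Buffer).
  T &get() {
    void *P = static_cast<void *>(Buffer);
    return *static_cast<T *>(P);
  }

  // Explicitly end the lifetime of the contained object.
  void destroy() { get().~T(); }
};
```

Usage would be along the lines of `Slot<int> S; S.create(42); int &R = S.get(); S.destroy();`. Whether this changes anything beyond the warning is exactly what is being debated here.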

Hi LH_Mouse,

Thanks!

I'm worried that this might only serve to trick the compiler.

Explicitly using `-fno-strict-aliasing` for GCC < 6 would seem more
direct to me -- as Jonathan says, if the compiler classifies a strict
aliasing rule violation as undefined behavior, and that is further
used to optimize in an unexpected manner, it doesn't matter whether it
warns or not. Then again, I guess disabling strict aliasing would also
disable optimizations that are generally useful for LLVM as a whole.

I'm reading up on safe aliasing techniques, but so far nothing stands
out to me as applicable in this scenario.

- Kim

> I'm worried that this might only serve to trick the compiler.

It shouldn't. If it was merely a trick then `std::aligned_storage` would be completely unusable.

> Explicitly using `-fno-strict-aliasing` for GCC < 6 would seem more
> direct to me -- as Jonathan says, if the compiler classifies a strict
> aliasing rule violation as undefined behavior, and that is further
> used to optimize in an unexpected manner, it doesn't matter whether it
> warns or not. Then again, I guess disabling strict aliasing would also
> disable optimizations that are generally useful for LLVM as a whole.
>
> I'm reading up on safe aliasing techniques, but so far nothing stands
> out to me as applicable in this scenario.

The C++ standard requires objects created in this way to be created with new-expressions (here, placement new). Although [intro.object]-3 only defines /provides storage/ for arrays of a character type or `std::byte`, the specification of `aligned_storage` and `aligned_union` in [meta.trans.other] doesn't require the type used as uninitialized storage to be an array of a character type or `std::byte` - in fact it cannot be, because alignment information is not part of the nominal type system of C and will be lost when obtaining the `type` member.

Focusing on the cast: as long as the compiler is unable to know whether a placement new has been made on the union (i.e. whether it is providing storage for another object), I don't think a standard-conforming compiler is ever allowed to ignore that possibility.

> Only an lvalue of a pointer to (possibly CV-qualified) `void` or a
> pointer to a character type (in C) / any of `char`, `unsigned char` or
> `std::byte` (in C++) can alias objects.

Yes.

> That is to say, in order to eliminate the aliasing problem an
> intermediate lvalue pointer is required.

Not exactly. You can cast a pointer to a pointer to some character
type or the type of the object stored in memory. It does not matter
whether you use an intermediate type or not. Having not seen the test
case, I can't tell whether this rule is followed.

What you can't do is store as a double, cast the double pointer to a
character pointer, cast that pointer to some other pointer type, and
read from memory. GCC won't give you a warning for that, but it's
still undefined.
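
To make the distinction concrete, here is a small sketch. The commented-out function shows the pattern described above (store as a double, go through a character pointer, read as another type). The `memcpy` version is one well-defined way to look at the same bytes. `bitsOf` is a hypothetical helper name, and the sketch assumes `double` and `std::uint64_t` have the same size.

```cpp
#include <cstdint>
#include <cstring>

// Undefined: the object in memory is a double, but it is read through a
// std::uint64_t lvalue. Routing the cast through char* does not help.
//
//   std::uint64_t bad(double D) {
//     char *C = reinterpret_cast<char *>(&D);
//     return *reinterpret_cast<std::uint64_t *>(C);
//   }

// Defined: copy the object representation into an object of the target type.
inline std::uint64_t bitsOf(double D) {
  static_assert(sizeof(std::uint64_t) == sizeof(double), "size mismatch");
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof Bits);
  return Bits;
}
```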

JSON.h seems to hope that if you cast a pointer to T to a pointer to
some union type, magic will happen. It won't work, unless the object
stored in the memory at that address was stored as the union type.

Do not lie to the compiler or it will get its revenge.

json::Value in JSON.h is a discriminated union.
The storage is a char array of appropriate type and alignment. The storage holds one object at a time, it’s initialized (and for nontrivial types, destroyed) at the right times to ensure this. The cast is only to the type of object that’s already there, there’s no magic here.
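
For readers without the source at hand, a much-reduced sketch of that design might look like the following. This is not the JSON.h code: `MiniValue`, `Kind`, `create` and the accessor names are assumptions made up for the example, and it only handles two alternatives. The point is the invariant: `as<T>()` is only ever called for the `T` that `Type` says is currently stored.

```cpp
#include <new>
#include <string>
#include <utility>

// Hypothetical miniature discriminated union (illustration only).
class MiniValue {
  using Str = std::string;
  enum Kind { K_Null, K_Number, K_String };
  Kind Type = K_Null;

  // Raw storage, large and aligned enough for either alternative.
  alignas(double) alignas(Str)
      char Buffer[sizeof(Str) > sizeof(double) ? sizeof(Str) : sizeof(double)];

  // Begin the lifetime of a T inside the buffer.
  template <typename T, typename... U> void create(U &&...V) {
    ::new (static_cast<void *>(Buffer)) T(std::forward<U>(V)...);
  }

  // Only valid when a T was previously created in the buffer.
  template <typename T> T &as() const {
    void *P = const_cast<char *>(Buffer);
    return *static_cast<T *>(P);
  }

  // Run the destructor for nontrivial alternatives.
  void destroy() {
    if (Type == K_String)
      as<Str>().~Str();
    Type = K_Null;
  }

public:
  MiniValue() = default;
  MiniValue(double D) : Type(K_Number) { create<double>(D); }
  MiniValue(Str S) : Type(K_String) { create<Str>(std::move(S)); }
  MiniValue(const MiniValue &) = delete;
  MiniValue &operator=(const MiniValue &) = delete;
  ~MiniValue() { destroy(); }

  // Accessors check the discriminator before touching the buffer.
  const double *getAsNumber() const {
    return Type == K_Number ? &as<double>() : nullptr;
  }
  const Str *getAsString() const {
    return Type == K_String ? &as<Str>() : nullptr;
  }
};
```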

Yes, if Andrew's explanation agrees with the standard, then this code
should be benign (if it breaks any of these rules, that's a bug in and
of itself).

So maybe we're back to figuring out how to silence GCC's warning machinery :-)

- Kim

I did some more extensive testing and found that all GCCs older than 7
trigger the warning, but only if CMAKE_BUILD_TYPE=Release (which I
guess indicates optimizations are enabled).

There's a patch up for disabling the warning here:
https://reviews.llvm.org/D50607.

I still feel a little uncomfortable, because I think Jonathan makes an
excellent point -- if GCC thinks there's a strict-aliasing violation
(whether the standard agrees or not) and classifies this as undefined
behavior, we might invoke the optimizer's wrath. The warning is a nice
hint that this could happen.

While writing this I realized that LH_Mouse's double static_cast also
silences the warning, and I think that's a nicer workaround, so I put
up a patch for that too, here:
https://reviews.llvm.org/D50608

Let me know what you think,
- Kim

Indeed. And I'm not convinced that the pointer cast is necessary anyway:
if the type passed in is a union, why not simply take the union member of
the appropriate type?

I don't think that GCC would produce this warning unnecessarily. You
are in dangerous waters.

As it turns out, Union is not a union ¯\_(ツ)_/¯. (I thought it was, up
until this point.)

It's a template producing a char array suitably aligned for any number
of mutually exclusive types, much like
https://en.cppreference.com/w/cpp/types/aligned_union.

I can't tell if that makes the waters less dangerous, but there's an
invariant for this code that it only reinterpret_casts out Ts from the
char buffer that it has previously put in there using placement new.
As far as I can tell, that should be safe. The local construct in the
as<T>() member function template, something like 'return
*reinterpret_cast<T *>(buffer);' is generally unsafe. But the calling code
always maintains the type invariant (only invoking as<T> for the right
T), so I'd argue this is fine.

Whether all compilers see through this is another question, and I
trust you that there may be dragons.

Thanks,
- Kim

> I still feel a little uncomfortable, because I think Jonathan makes an
> excellent point -- if GCC thinks there's a strict-aliasing violation
> (whether the standard agrees or not) and classifies this as undefined
> behavior, we might invoke the optimizer's wrath. The warning is a nice
> hint that this could happen.

> Indeed. And I'm not convinced that the pointer cast is necessary anyway:
> if the type passed in is a union, why not simply take the union member of
> the appropriate type?

As this piece of code occurs in 'JSON.h', I presume it is some sort of
variant implementation, in which case handwritten specializations of this
template, each taking the expected member, might be a bit superfluous.

> I don't think that GCC would produce this warning unnecessarily. You
> are in dangerous waters.

It is still in doubt whether a union can /provide storage for/
other objects. Other than that, I think it should be safe to assume the
behavior is well defined here, as long as the object constructed in the
union is always referenced through the _correct_ type of reference (that
is, a reference type to the dynamic type, or a type similar to the
dynamic type, possibly cv-qualified or different only in signedness, of
the object that has been constructed in the union), and the union itself
is never accessed directly, because no rule in [basic.lval]-11 is ever
violated.