[RFC] Bump up clang's __GNUC_MINOR__

When clang was invented as a gcc-compatible compiler it started to pose as the latest Apple GCC that was available at the time: GCC 4.2.1. This made a lot of sense because features from newer GCC versions were missing and 4.2-level code got the most testing. However, GCC 4.2.1 was released in 2007 and GCC development hasn't stalled since, with clang following suit (and inventing new features). This led to a weird disparity between "only gcc 4.2" and "awesome new features".

A few examples:
* __builtin_bswap (added in GCC 4.3) and __builtin_unreachable (GCC 4.5) are two builtins that are primarily used for optimization. Portable code will have a generic fallback for other compilers. If we level up our gcc version we'll get better code for free.
* Since GCC doesn't have an equivalent to __has_feature, C++11 code tends to make use of the new features based on the compiler version. GCC 4.2 had hardly any C++11 support at all.
* Things like __attribute__((deprecated("message"))) have been supported by clang for a long time, but portable code only enables them for GCC >= 4.5.
* If the code isn't aware of clang we may get workarounds for compiler bugs that were fixed in GCC a long time ago and never existed in clang.
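The version-gating pattern behind these examples can be sketched as follows. This is a minimal illustration, not code from any particular project: `GNUC_PREREQ`, `MY_DEPRECATED`, `my_bswap32` and `my_old_swap` are hypothetical names. With a compiler that claims GCC 4.2, both guards evaluate to false, so the generic byte swap is used and the deprecation message is dropped, even if the compiler actually supports both features.

```c
#include <stdint.h>

/* True when the compiler claims to be at least GCC maj.min. */
#if defined(__GNUC__)
#  define GNUC_PREREQ(maj, min) \
     ((__GNUC__ > (maj)) || (__GNUC__ == (maj) && __GNUC_MINOR__ >= (min)))
#else
#  define GNUC_PREREQ(maj, min) 0
#endif

/* The message form of the attribute is gated on GCC >= 4.5. */
#if GNUC_PREREQ(4, 5)
#  define MY_DEPRECATED(msg) __attribute__((deprecated(msg)))
#else
#  define MY_DEPRECATED(msg)
#endif

/* Byte swap: builtin when the claimed version is >= 4.3, shifts otherwise. */
static inline uint32_t my_bswap32(uint32_t x)
{
#if GNUC_PREREQ(4, 3)
    return __builtin_bswap32(x);
#else
    return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
           ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
#endif
}

/* Hypothetical legacy entry point, kept only to show the attribute. */
MY_DEPRECATED("use my_bswap32 instead")
static inline uint32_t my_old_swap(uint32_t x)
{
    return my_bswap32(x);
}
```

Both branches of `my_bswap32` compute the same result; only the generated code differs, which is why the claimed version matters for performance but not for correctness here.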

Of course this doesn't come without problems.
* New attributes were added. I'm relatively sure we support (or at least silently ignore) most of them, but some code may get an exploding number of warnings due to ignored attributes when we flip the version number.
* New builtins were added. This is even worse, as compiling will fail if such a builtin is encountered. I dug through GCC release notes from 4.3 to 4.7 and only found __builtin_assume_aligned and __builtin_complex (both added in 4.7) that are missing in clang. There may be others that weren't listed in the release notes.

With that info in mind I propose changing clang's gcc version to 4.6.3 (currently the latest in the 4.6 series) and then bumping it as we gain source-level compatibility with newer GCC versions. If you have any concerns about code that may break if we do that, please share so we can avoid problems.

- Ben

Some time ago I considered changing clang to impersonate the version
of gcc whose libraries we were using. The problem I was trying to
solve was fixed by patching glibc, but, as you note, there are others
left. I now agree that if we are not going to impersonate gcc 4.2.1,
we should impersonate a newer fixed version.

Pretending to be an old gcc protects us from some of the more obscure
gcc features used in glibc. test/CodeGen/pr9614.c has some cases that
we already have to handle.

I am OK with changing which gcc we impersonate, but we have to try
building quite a lot of software with different glibc headers to see
if more special cases would be needed.

CCing Chandler since he convinced me to patch glibc last time :)

Cheers,
Rafael

i'd like to add that pretending to be gcc 4.5 or newer will also require support
for __attribute__((__cold__)) and "asm goto" stmts that are used in linux extensively
(and there's possibly more, these are the ones i remember from the code).

For attribute __hot__ and __cold__ we could get away with just dropping the attribute silently, as it has no observable effect except performance. Implementing it in clang should be easy though; we do something similar for static constructors already.

asm goto is crazy and requires backend support. I don't think that will be implemented in llvm any time soon :(

- Ben

Makes sense to me, we should do this early in the release cycle though (i.e. soon).

-Chris

I have previously objected because I worry about getting sucked into busy-work trying to support “just one more” GCC extension. Also, I think that eventually projects will need to start using clang-specific preprocessor guards to enable features.

That said, today I’m a lot less nervous about the “just one more” extension thing – partly because we’re already stuck in that cycle, and partly because we’ve got a lot more contributors so it seems less scary to have to keep up.

I think I’m down with the change as well, but I’d like to carefully document what we mean by the emulation so that we have an answer when we get questions (as I have repeatedly in the past) of the form: “Why does Clang claim to support GCC 4.X.Y when it doesn’t support feature Foo?” Essentially I think we should state that Clang is acting as a subset of a “modern” GCC, supporting most but not all of its features. Then folks can guard with Clang-specific macros if they need to differentiate that subset.

I’m looking into making this change, but there is at least one roadblock: __builtin_va_arg_pack (and _len). glibc makes quite heavy use of these in its public headers when the compiler is newer than GCC 4.3. I think we will have to add support for them in order to make this change. I’ll try to look into that next…

-Chandler

This is also http://llvm.org/bugs/show_bug.cgi?id=7219

Sadly, this is one of the more annoying builtins :(

- Ben

Looks like we really underestimated this builtin; it's not really feasible to implement without an IRGen-level inliner. This is a huge amount of work for a little feature and I'm inclined to put it into the hall of broken gcc extensions right next to __builtin_apply.

On the attribute side there are at least 3 more attributes we'd need for glibc:

- __attribute__((warning("msg"))) and __attribute__((error("msg"))). They look easy, but the painful part is that they rely on CFG information so that no warning/error is emitted for dead code. They are intended for use with __builtin_constant_p and inlining, something that we don't fully support either. At the moment we could get away with just using the normal mechanisms for dead-code warning suppression in clang.
- __attribute__((artificial)): this looks like a crude hack to suppress debug info for inline functions. I'm not sure about the implications.

So far we got away rather well with only implementing a more or less sane subset of gcc extensions, even if our __GNUC_MINOR__ suggests that we support all of them up to gcc 4.2 (including __builtin_apply and nested functions). So now, thanks to the piece of software glibc is, we're stuck at gcc 4.2 even though our subset of extensions expands greatly beyond that.

- Ben

What version of glibc do we have to support? It does look like the
current maintainers are more reasonable, so if the really bad parts
were fixed we might be able to bump __GNUC_MINOR__ one day.

Would it be worth trying to bump it to only 4.3 for now?

Cheers,
Rafael

__builtin_va_arg_pack was introduced in 4.3 and is used by glibc as a wrapper for many vararg functions when FORTIFY_SOURCE is enabled. We can't afford breaking open(2) for everyone on Linux :(

Arguing against uses of __builtin_va_arg_pack is also hard, as the only other way to implement the same features for FORTIFY_SOURCE is a variadic macro. While this is perfectly fine with the standard, some code may break if it sees a macro instead of a real function.
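The two approaches can be sketched side by side. This is only an illustration under stated assumptions, not glibc's actual header code: `sum_ints`, `checked_sum` and `HAVE_VA_ARG_PACK` are hypothetical names, and the `count < 0` test merely stands in for a fortify-style argument check. Clang is excluded from the builtin path explicitly because, as discussed in this thread, it does not implement `__builtin_va_arg_pack`.

```c
#include <stdarg.h>

/* Use the builtin only for real GCC >= 4.3. */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#  define HAVE_VA_ARG_PACK 1
#else
#  define HAVE_VA_ARG_PACK 0
#endif

/* Hypothetical variadic function standing in for open(2) or printf(3). */
static int sum_ints(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

#if HAVE_VA_ARG_PACK
/* Wrapper as a real function: it must always be inlined so that the
   caller's anonymous arguments can be forwarded unchanged. */
static inline __attribute__((always_inline))
int checked_sum(int count, ...)
{
    if (count < 0)   /* stand-in for a fortify-style check */
        return -1;
    return sum_ints(count, __builtin_va_arg_pack());
}
#else
/* Fallback as a variadic macro: evaluates count twice and needs at
   least one variadic argument. */
#  define checked_sum(count, ...) \
     ((count) < 0 ? -1 : sum_ints((count), __VA_ARGS__))
#endif
```

The macro form is valid C, but it changes how the name behaves: any unrelated use of the identifier followed by a parenthesis, such as a struct member called through a function pointer, is expanded as well. That is one way code may break when it sees a macro instead of a real function.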

It may be possible to persuade the glibc maintainers into using the same fallback path as gcc 4.2 uses (some stuff is implemented with macros and some checks seem to be missing). But turnaround time for glibc is really slow. Debian stable is still stuck at a 3 year old version and RHEL is probably even older.

- Ben

have you considered making the simulated/claimed gcc version configurable
on the clang command line? this way distros/etc could just set it in the
CFLAGS used for compiling glibc, so it'd be a very simple change in the
existing build/packaging systems.

cheers,
PaX Team

that's easy:
clang -U__GNUC_MINOR__ -D__GNUC_MINOR__=7 foo.c
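To see which GCC version a given set of flags makes the compiler claim, a small helper like the one below can be compiled with and without the override. It is a sketch for illustration only; `claimed_gcc_version` is a hypothetical name, while `__GNUC__`, `__GNUC_MINOR__` and `__GNUC_PATCHLEVEL__` are the standard GNU version macros.

```c
#include <stdio.h>

/* Formats the GCC version the compiler claims to be into buf.
   Returns the number of characters snprintf wanted to write. */
static int claimed_gcc_version(char *buf, size_t cap)
{
#if defined(__GNUC__) && defined(__GNUC_MINOR__) && defined(__GNUC_PATCHLEVEL__)
    return snprintf(buf, cap, "%d.%d.%d",
                    __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
    /* One of the macros is missing: not posing as GCC at all. */
    return snprintf(buf, cap, "not GNU-compatible");
#endif
}
```

Printing the buffer from a `main` built with the command above should show the overridden minor version, since the macro is substituted at preprocessing time.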

However the problem is not compiling glibc. That's something completely different that won't ever work with clang unless the glibc maintainers move. It uses extensions that are broken beyond repair like __builtin_apply and -fno-toplevel-reorder, making it a lot worse to deal with than the linux kernel.

The problem is that glibc uses those extensions in the standard header files (if FORTIFY_SOURCE is specified) so any code base using e.g. stdio.h and FORTIFY_SOURCE will have to change the gcc version clang simulates, which is not acceptable.

- Ben

I think if you claim that the compiler is a drop-in replacement for a certain GCC version you must support all of its intrinsics, and glibc is merely a test case.

This isn’t about compiling glibc, it’s about compiling the public glibc headers. As a consequence, any code which (transitively) includes a glibc header would break.

Technically, we don't do that for 4.2 either, since we don't implement the __builtin_apply() stuff, which GCC 4.2 does (or, rather, claims to - it works in some cases on some platforms), nor do we support nested functions (unless someone added that while I was asleep).

Perhaps I missed something, but I am not certain what the advantage of bumping the gcc version that we claim is. We are not gcc, we are clang. We advertise GCC 4.2 compatibility, because that makes porting code easier - anything that worked with GCC 4.2 should work with clang.

We also provide a rich set of feature test macros that allow people to check for specific features. By chasing a specific GCC version, we would be telling people that GCC is the standard compiler, look to it for a reference. I don't believe this is the correct message to be sending. The definition of __GNUC__ should be regarded as a legacy compatibility hack, not a maintained feature.

If the real problem is glibc not turning on features that we support, then the correct solution is to provide glibc with patches that use __has_feature() and friends to turn them on selectively. An even better solution would be to engage with the GCC community and persuade them of the merits of implementing __has_feature() and __has_builtin(). A couple of codebases that I work on have started using these as the default way of testing for features - if gcc doesn't implement the test macros then they are in the exact situation that clang is in with respect to glibc: the compiler is simply assumed not to support them and we fall back to a slower code path for that compiler.
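A minimal sketch of feature-based detection, as opposed to version-based detection, might look like this. `UNREACHABLE` and `level_tag` are hypothetical names chosen for the example; `__has_builtin`, `__has_feature` and the `c_static_assert` feature name are clang's. Defining the test macros to 0 when they are missing reproduces the situation described above: a compiler without them is simply assumed not to support the feature and gets the fallback path.

```c
#include <stdlib.h>

/* Compilers without the test macros answer "no" to every query. */
#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif
#ifndef __has_feature
#  define __has_feature(x) 0
#endif

/* Ask for the builtin by name instead of guessing from a version. */
#if __has_builtin(__builtin_unreachable)
#  define UNREACHABLE() __builtin_unreachable()
#else
#  define UNREACHABLE() abort()   /* slower but safe fallback */
#endif

/* Language features are queried the same way. */
#if __has_feature(c_static_assert)
_Static_assert(sizeof(char) == 1, "char is one byte");
#endif

/* Maps a level (reduced modulo 3) to a one-letter tag. */
static char level_tag(unsigned level)
{
    switch (level % 3u) {
    case 0: return 'I';
    case 1: return 'W';
    case 2: return 'E';
    }
    UNREACHABLE();   /* all three remainders are handled above */
}
```

No version number appears anywhere in this code, so it keeps working regardless of which GCC release the compiler claims to be.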

David

-- Sent from my Apple II

> However the problem is not compiling glibc. That's something
> completely different that won't ever work with clang unless the glibc
> maintainers move. It uses extensions that are broken beyond repair
> like __builtin_apply and -fno-toplevel-reorder, making it a lot worse
> to deal with than the linux kernel.

__builtin_return/__builtin_apply seem to be used in dlfcn/eval.c only
which is no longer built by glibc since 2004 according to git log.

-fno-toplevel-reorder can be a real problem, i wonder where glibc relies
on its behaviour though.

are you aware of any other incompatibility that prevents clang from
compiling glibc (i'm just preparing mentally for the day i get to try
this one day ;)?

> The problem is that glibc uses those extensions in the standard header
> files (if FORTIFY_SOURCE is specified) so any code base using e.g.
> stdio.h and FORTIFY_SOURCE will have to change the gcc version clang
> simulates, which is not acceptable.

ok, i misunderstood the original issue as being with glibc itself, not
with its headers used everywhere else. in that case i suggest that you
simply ignore this problem and stipulate that distros building with
clang not define _FORTIFY_SOURCE (it's not a big loss anyway).

> > However the problem is not compiling glibc. That's something
> > completely different that won't ever work with clang unless the glibc
> > maintainers move. It uses extensions that are broken beyond repair
> > like __builtin_apply and -fno-toplevel-reorder, making it a lot worse
> > to deal with than the linux kernel.

> __builtin_return/__builtin_apply seem to be used in dlfcn/eval.c only
> which is no longer built by glibc since 2004 according to git log.

ah, ok. I guess they left it in the tree for entertainment purposes then.

> -fno-toplevel-reorder can be a real problem, i wonder where glibc relies
> on its behaviour though.

Their runtime stuff (initfini.c) uses a weird mixture of module-level inline
asm and C code which breaks when the compiler doesn't preserve the
order.

> are you aware of any other incompatibility that prevents clang from
> compiling glibc (i'm just preparing mentally for the day i get to try
> this one day ;)?

Apart from integrated-as issues, which are mostly missing support for some
macros and some other minor issues, missing support for tls_model is an
issue, but that should be fixable if someone really cares.

Their configure script also checks for http://llvm.org/bugs/show_bug.cgi?id=12554
which is … crazy.

> > The problem is that glibc uses those extensions in the standard header
> > files (if FORTIFY_SOURCE is specified) so any code base using e.g.
> > stdio.h and FORTIFY_SOURCE will have to change the gcc version clang
> > simulates, which is not acceptable.

> ok, i misunderstood the original issue as being with glibc itself, not
> with its headers used everywhere else. in that case i suggest that you
> simply ignore this problem and stipulate that distros building with
> clang not define _FORTIFY_SOURCE (it's not a big loss anyway).

It is a hardening feature and there are hundreds of projects out there that
enable it by default in their makefiles. It may not give a lot of value but
discouraging its use will only bring us "clang is insecure" comments and
angry users.

- Ben

Fully agreed. There are very few things that *fail* because of that,
e.g. current db4 is one example because they want to redefine a builtin.
Apart from that, it is mostly a question of quality of implementation, if
additional builtins are used.

Joerg