architecture endianness and preprocessor defines

everyone--

One of the things I've long disliked about how GCC works is that its developers have still not really sorted out how to handle architectures that can operate in either big- or little-endian mode. I'd like to know if the LLVM CFE developers have any thoughts on how to improve matters here.

Here's what GCC does today, and how that situation produces consequences downstream:

+ The various architecture configurations define built-in preprocessor definitions like __BIG_ENDIAN__ and __LITTLE_ENDIAN__.

+ These are hard-coded for architectures that don't have any choice, e.g. IA32, but they're switched by the -mbig-endian and -mlittle-endian options on architectures that can be configured to run in either mode.

+ These built-in definitions aren't consistently defined across all the architectures either, so on some architectures you get __BIG_ENDIAN and on others you get __BIG_ENDIAN__. Isn't that wonderful?
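To make the consequence for user code concrete, here is a minimal sketch of the defensive check this situation tends to force. It assumes only the two double-underscore spellings named above, and the names ENDIAN_GUESS, host_is_big_endian and endian_guess_is_consistent are hypothetical, made up for illustration. The single-underscore spelling is deliberately left untested, because some C library headers define both __BIG_ENDIAN and __LITTLE_ENDIAN as numeric constants on every target.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time guess from the built-in macros. Neither macro is
   guaranteed to be defined by every compiler for every target, so a
   third "unknown" state is needed. */
#if defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)
#  define ENDIAN_GUESS 1        /* big-endian */
#elif defined(__LITTLE_ENDIAN__) && !defined(__BIG_ENDIAN__)
#  define ENDIAN_GUESS 0        /* little-endian */
#else
#  define ENDIAN_GUESS (-1)     /* unknown at preprocessing time */
#endif

/* Run-time probe: look at the first byte in memory of a known value.
   Returns 1 on a big-endian machine, 0 otherwise. */
static int host_is_big_endian(void)
{
    uint32_t probe = 0x01020304u;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* The compile-time guess, when there is one, should agree with what
   the machine actually does. */
static int endian_guess_is_consistent(void)
{
    return ENDIAN_GUESS == -1 || ENDIAN_GUESS == host_is_big_endian();
}
```

The run-time probe is only a fallback; it cannot drive #if decisions, which is why consistent built-in definitions matter in the first place.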

One of the additional hassles with GCC is that its "multilib" feature doesn't consistently build the C runtime environment, i.e. crtstuff.c, for both big- and little-endian modes. This is why there are all those GCC target triples that look like "armeb-netbsd-elf" and "mipsel-wrs-vxworks" and "armle-linux-gnu" in the configure script. Notice that the suffixes aren't used consistently across operating system platforms?

The suffix on the architecture name ends up getting translated into the endianness of the C runtime environment modules used by the linker (except when -nostdlib is used... sigh). If it weren't for this, you'd be able to build GCC for ARM or MIPS or whatever, without adding that suffix to the architecture part of the triple, and the -mbig-endian and -mlittle-endian switches would select the proper C runtime environment. Sadly, that doesn't happen like it should.

I'm not sure how much Clang should need to know about the C runtime environment that will eventually get linked up with final executable machine objects, but it would be nice if you didn't have to apply this horrible corruption to the architecture part of the target triple. I'd rather the command driver were responsible for sorting out which runtime environments to link into what executables, and it should be able to do the right thing with just the command line switches.

That still leaves the C preprocessor built-ins, which are clearly in Clang's domain to manage. Here's what I propose: Clang should define a small set of general preprocessor built-ins that identify the CPU architecture family specified in the target triple, e.g. __ia32__, __x86_64__, __arm__, __powerpc__, __mips__, etc; it should also define __LITTLE_ENDIAN__ and __BIG_ENDIAN__ as appropriate, and it should offer the -mbig-endian and -mlittle-endian switches for explicitly specifying the endianness on architectures that can execute in either mode. The command driver can then do the right thing (or the wrong thing) as necessary.
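As a rough sketch of what user code might look like under this proposal (not anything Clang or GCC promises today), the fragment below uses the macro names exactly as proposed above. __ia32__ is the spelling from this proposal rather than an established macro, and arch_family and store_be32 are hypothetical helper names chosen for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Report the CPU architecture family using the proposed built-ins.
   Falls back to "unknown" when none of them is defined. */
static const char *arch_family(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__ia32__)
    return "ia32";
#elif defined(__arm__)
    return "arm";
#elif defined(__powerpc__)
    return "powerpc";
#elif defined(__mips__)
    return "mips";
#else
    return "unknown";
#endif
}

/* Store v into out[0..3] most significant byte first. */
static void store_be32(unsigned char out[4], uint32_t v)
{
#if defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)
    /* Native layout is already big-endian: copy the bytes as they are. */
    memcpy(out, &v, 4);
#else
    /* Little-endian or unknown: the shift form is correct everywhere. */
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
#endif
}
```

The point of the sketch is that a single source file keeps working when the same architecture is retargeted with -mbig-endian or -mlittle-endian, provided the two endianness macros follow the switch.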

I'd like to know if the Clang developers are interested in resisting the endianness suffixes on the architecture parts of the target triple specification. I hope the answer is yes.

I'll apologize that you've had such a hard time with this, but I've also
never seen an email from you asking about these things. That said,
many things you mentioned are just wrong.

One of the additional hassles with GCC is that its "multilib" feature
doesn't consistently build the C runtime environment, i.e. crtstuff.c,
for both big- and little-endian modes. This is why there are all
those GCC target triples that look like "armeb-netbsd-elf" and
"mipsel-wrs-vxworks" and "armle-linux-gnu" in the configure script. Notice
that the suffixes aren't used consistently across operating system
platforms?

This is true; however, the suffixes are usually created by the people
doing the work for the platform for the specific target triple...

The suffix on the architecture name ends up getting translated into
the endianness of the C runtime environment modules used by the linker
(except when -nostdlib is used... sigh). If it weren't for this,
you'd be able to build GCC for ARM or MIPS or whatever, without adding
that suffix to the architecture part of the triple, and the
-mbig-endian and -mlittle-endian switches would select the proper C runtime
environment. Sadly, that doesn't happen like it should.

Interestingly enough you're absolutely wrong here. The suffixes merely
allow you to select a different default. They are an alias. Nothing else.
The switches allow you, if the target has support for it, to change mode.
Many of the OS targets aren't bi-endian and don't support changing.
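Read that way, the relationship between suffix and switch can be sketched as a small model. This is an illustration of the description above, not GCC's actual configure or driver logic: effective_endian, has_suffix and the enum are hypothetical names, the caller supplies what the bare architecture name means, and the sketch ignores the case of targets that are not bi-endian.

```c
#include <stddef.h>
#include <string.h>

enum endian { ENDIAN_LITTLE, ENDIAN_BIG };

/* Return nonzero when s ends with suffix. */
static int has_suffix(const char *s, const char *suffix)
{
    size_t n = strlen(s);
    size_t m = strlen(suffix);

    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* arch:           architecture part of the triple, e.g. "armeb"
   family_default: endianness implied by the bare architecture name
   flag:           NULL, "-mbig-endian" or "-mlittle-endian" */
static enum endian effective_endian(const char *arch,
                                    enum endian family_default,
                                    const char *flag)
{
    enum endian e = family_default;

    /* The suffix only picks a different default. */
    if (has_suffix(arch, "eb"))
        e = ENDIAN_BIG;
    else if (has_suffix(arch, "el") || has_suffix(arch, "le"))
        e = ENDIAN_LITTLE;

    /* An explicit switch overrides whatever the default was. */
    if (flag != NULL) {
        if (strcmp(flag, "-mbig-endian") == 0)
            e = ENDIAN_BIG;
        else if (strcmp(flag, "-mlittle-endian") == 0)
            e = ENDIAN_LITTLE;
    }
    return e;
}
```

For example, effective_endian("armeb", ENDIAN_LITTLE, NULL) yields ENDIAN_BIG, while passing "-mlittle-endian" as the flag yields ENDIAN_LITTLE for the same triple.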

For at least the last 5 years, almost all of the preprocessor builtins
have been defined using a standard method that provides them in 3 different
canonical forms, 2 of which you mentioned in your mail for big endian.

This is also mostly a complaint about configure and not gcc and mostly
not applicable to cfe either (though it may be a general llvm discussion).
If you want to respond I suggest we take this discussion to private
email or the llvm list (or the gcc list if you want to change how gcc does
it).

-eric

One of the things I've long disliked about how GCC works is that its
developers have still not really sorted out how to handle
architectures that can operate in either big- or little-endian mode.
I'd like to know if the LLVM CFE developers have any thoughts on how
to improve matters here.

You bring up a lot of interesting issues. Some meta answers :)

Here's what GCC does today, and how that situation produces
consequences downstream:

+ The various architecture configurations define built-in preprocessor
definitions like __BIG_ENDIAN__ and __LITTLE_ENDIAN__.

We aim to be GCC compatible with preprocessor directives. This is important for compatibility with existing code.

One of the additional hassles with GCC is that its "multilib" feature
doesn't consistently build the C runtime environment, i.e. crtstuff.c,
for both big- and little-endian modes. This is why there are all
those GCC target triples that look like "armeb-netbsd-elf" and
"mipsel-wrs-vxworks" and "armle-linux-gnu" in the configure script. Notice
that the suffixes aren't used consistently across operating system
platforms?

I agree that this is irritating. Two issues: 1) we will support the GCC target triples, at least when/if people contribute support for them. 2) clang is explicitly designed to support building a single tool chain in place that supports multiple targets. The ultimate goal is that you should be able to configure clang with "--targets='armeb-netbsd-elf mipsel-wrs-vxworks armle-linux-gnu'" and get support in the toolchain for all of them. We already have support for handling this (-arch option and friends). When we bring up the "libgcc" runtime library stuff, we'll make sure it can be built for multiple targets.

The suffix on the architecture name ends up getting translated into
the endianness of the C runtime environment modules used by the linker
(except when -nostdlib is used... sigh). If it weren't for this,
you'd be able to build GCC for ARM or MIPS or whatever, without adding
that suffix to the architecture part of the triple, and the
-mbig-endian and -mlittle-endian switches would select the proper C runtime
environment. Sadly, that doesn't happen like it should.

Just because we will support the existing GCC target triples (again, when/if people contribute support for them) it doesn't mean we can't support simplified triples also.

That still leaves the C preprocessor built-ins, which are clearly in
Clang's domain to manage. Here's what I propose: Clang should define
a small set of general preprocessor built-ins that identify the CPU
architecture family specified in the target triple, e.g. __ia32__,
__x86_64__, __arm__, __powerpc__, __mips__, etc; it should also define
__LITTLE_ENDIAN__ and __BIG_ENDIAN__ as appropriate, and it should
offer the -mbig-endian and -mlittle-endian switches for explicitly
specifying the endianness on architectures that can execute in either
mode. The command driver can then do the right thing (or the wrong
thing) as necessary.

We have to support the existing ones. Requiring people to 'port' their code to clang from GCC is not desirable.

That said, we *can* support nicer and cleaner interfaces as well for feature queries. Over time, we can encourage people (who don't care about writing portable code (?)) to use these and/or try to get the GCC folks to adopt similar features.

-Chris