AArch64 Clang CLI interface proposal

Hi,

Recently, I committed a patch adding default features for ‘-mcpu’. After that, Eric replied to me that there is a proposal to use ‘-march’ instead of ‘-mcpu’. As half a year has passed since the original proposal, some of the background may have changed. One thing worth mentioning is that, during this time, Apple contributed its backend and introduced a new CPU type: cyclone. The AArch64 target now supports four CPU types: cyclone, cortex-a53, cortex-a57 and generic. The first three enable the full feature set from fp to crypto, while generic enables only fp and neon by default.

As time goes by, more and more CPU types will be introduced with different combinations of features, and end users may not know which instruction sets each CPU supports. So from my point of view, it’s not wise to put CPU names into ‘-march’. ‘-march’ should only select architecture-level features, that is, decide which instruction sets are available. If a binary is compiled with ‘-march=+neon+crypto’, it should be able to run on every CPU that supports neon and crypto. End users then won’t need to wonder whether a binary built with ‘-march=cyclone’ can safely run on another CPU without crypto, like generic (such CPUs will quite possibly become reality in the future).

In summary, I suggest that ‘-march’ accept only architecture-level features, naming the instruction sets explicitly, to get better portability among AArch64-based CPUs.

To select a CPU type as the optimization target, I suggest using ‘-mtune’, which supplies microarchitecture information so that code can be fully optimized for that CPU. ‘-mtune’ won’t change the architecture-level feature selection, but will enable the microarchitecture features specific to the CPU. For example, ‘-mtune=cyclone’ won’t enable crypto, but will enable zcm and zcz, as well as the CPU-specific passes and scheduler.

The last change is about ‘-mcpu’. If a user doesn’t care about portability and only wants good performance on a certain CPU, they can use ‘-mcpu=XXX+[no]featureA’, which is an alias of ‘-march=default features of XXX+[no]featureA -mtune=XXX’. All default features will be enabled and the tuning target will be selected. So it’s just shorthand for ‘-march’ plus ‘-mtune’.
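
As a rough illustration of that alias, here is a minimal Python sketch of the expansion. It is not Clang code: the table and the names `CPU_DEFAULT_FEATURES`, `expand_mcpu` and `to_flags` are made up for this example, and the feature lists only restate the defaults described earlier in this mail (fp through crypto for the three named CPUs, fp and neon for generic).

```python
# Hypothetical table of default architecture features per CPU,
# restating the defaults described in the proposal text.
CPU_DEFAULT_FEATURES = {
    "cyclone": ["fp", "neon", "crypto"],
    "cortex-a53": ["fp", "neon", "crypto"],
    "cortex-a57": ["fp", "neon", "crypto"],
    "generic": ["fp", "neon"],
}


def expand_mcpu(value):
    """Expand 'XXX+[no]featureA...' into (march_features, mtune_cpu)."""
    cpu, *modifiers = value.split("+")
    if cpu not in CPU_DEFAULT_FEATURES:
        raise ValueError(f"unknown CPU: {cpu}")
    features = list(CPU_DEFAULT_FEATURES[cpu])
    for mod in modifiers:
        if not mod:
            raise ValueError("empty feature modifier")
        if mod.startswith("no"):
            # '+nofeature' removes a feature if it is currently enabled.
            name = mod[2:]
            if name in features:
                features.remove(name)
        elif mod not in features:
            # '+feature' adds a feature on top of the CPU defaults.
            features.append(mod)
    return features, cpu


def to_flags(value):
    """Return the '-march'/'-mtune' pair that '-mcpu=<value>' would stand for."""
    features, cpu = expand_mcpu(value)
    march = "".join("+" + f for f in features)
    return [f"-march={march}", f"-mtune={cpu}"]
```

Under these assumptions, `to_flags("cyclone+nocrypto")` yields `-march=+fp+neon` together with `-mtune=cyclone`: the instruction-set selection and the tuning target stay separate, and ‘-mcpu’ only bundles them.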

I think these changes make it easy to build binaries that run on multiple CPUs, and bring us closer to gcc compatibility. Is this new proposal reasonable?

Best Regards,
Kevin Qin

Hi Kevin,

I assume you've looked at the GCC documentation in this area, since
your ideas are very similar:
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html. I actually
think that looks like a rational set of conventions too.

The main difference appears to be that GCC requires "armv8-a" in
-march before any features, which I quite like. But even if I didn't
it's close enough that we should follow anyway, for compatibility if
nothing else.

The last change is about '-mcpu'. If a user doesn't care about portability
and only wants good performance on a certain CPU, they can use
'-mcpu=XXX+[no]featureA', which is an alias of '-march=default features of
XXX+[no]featureA -mtune=XXX'. All default features will be enabled and the
tuning target will be selected. So it's just shorthand for '-march' plus
'-mtune'.

I think we should also specifically warn on the use of -mcpu:
aggressively deprecate it on AArch64. AArch64 is a new architecture,
there's no real excuse for using the legacy options.

I think these changes make it easy to build binaries that run on multiple
CPUs, and bring us closer to gcc compatibility. Is this new proposal
reasonable?

I like it!

Cheers.

Tim.

The main difference appears to be that GCC requires "armv8-a" in
-march before any features, which I quite like. But even if I didn't
it's close enough that we should follow anyway, for compatibility if
nothing else.

Agreed.

I think we should also specifically warn on the use of -mcpu:
aggressively deprecate it on AArch64. AArch64 is a new architecture,
there's no real excuse for using the legacy options.

Not entirely true. Legacy systems will be easily migrated to AArch64
by replacing previous flags with updated values (possibly migrated
from even older values still).

I'm fairly certain that GCC will enable any kind of legacy options,
which will encourage people to keep using it and complain that Clang
doesn't, and you know the rest of the story.

We could label them "lazy" but whoever had to deal with build systems'
crap knows how hard it is to make even the smallest of changes to a
legacy system already migrated twice. The guys that wrote the original
are all dead and your life is not long enough.

I'm not advocating to keep legacy, only that we think about it. Clang
already has the pretend-GCC-behaviour in many ways, and if that is
implemented orthogonally from the modern compiler (facade pattern for
those GOF-inclined), then we won't have to break our compiler but
still understand legacy for its merits.

One way to do this is to make triple/mcpu/mfpu parsing a way of
setting flags, and then feeding the flags back to -cc1, so that -cc1
doesn't have triples/mcpu/mfpu by definition.

I think these changes make it easy to build binaries that run on multiple
CPUs, and bring us closer to gcc compatibility. Is this new proposal
reasonable?

I like it!

+1.

--renato

Hi Tim,

I'm fairly certain that GCC will enable any kind of legacy options,
which will encourage people to keep using it and complain that Clang
doesn't, and you know the rest of the story.

Yep. So we'll have to keep it around for quite a while. But we should
make it noisily annoying to encourage correct behaviour.

Cheers.

Tim.

Hi Kevin,

Because a triple is necessary for clang and it has already selected the architecture.

This is one of the worst parts about the Clang CLI for cross compilation at the moment. I’d really like, if we’re changing the CLI, to allow users to remove it. For example, if I specify -march=armv7-a, it shouldn’t need me to put “-target arm” before it to work!

On a similar note, how does this proposal deal with instruction set selection? What does “-march=armv8-a” select: AArch32 or AArch64? Or is that expected to be handled by “-target” (grim!)?

Cheers,

James

One thing that I've been pondering for a little while is making clang's use of the triple a config file. In particular, we'd like to invoke things like mips4-unknown-freebsd-clang and have it really mean 'clang -target mips64-unknown-freebsd -mcpu=mips4 -msoft-float --sysroot=/usr/local/sysroots/mips4 {whatever}'. For cross compiles, we'd only need one clang, a symlink, and a config file. I pondered a more structured config file, but really there isn't much that we'd want to do that isn't already covered by command-line arguments, so just having it look for ${LOCALBASE}/etc/llvm/mips4-unknown-freebsd-clang.conf would be fine.
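
A minimal Python sketch of that lookup, purely for illustration: `config_path` and `expand_invocation` are hypothetical names, the `/usr/local` fallback for `LOCALBASE` is an assumption, and the config file is assumed to hold nothing but ordinary command-line arguments.

```python
import os
import posixpath
import shlex


def config_path(argv0, localbase=None):
    """Map a prefixed name such as 'mips4-unknown-freebsd-clang' to its
    config file; return None when invoked as plain 'clang'."""
    if localbase is None:
        # Assumed fallback when LOCALBASE is not set in the environment.
        localbase = os.environ.get("LOCALBASE", "/usr/local")
    name = posixpath.basename(argv0)
    if not name.endswith("-clang"):
        return None
    return posixpath.join(localbase, "etc", "llvm", name + ".conf")


def expand_invocation(argv, config_text):
    """Put the arguments stored in the config file in front of the user's own."""
    stored = shlex.split(config_text, comments=True)
    return ["clang"] + stored + list(argv[1:])
```

With a config file containing `-target mips64-unknown-freebsd -mcpu=mips4 -msoft-float --sysroot=/usr/local/sysroots/mips4`, an invocation of `mips4-unknown-freebsd-clang -c a.c` would expand to a plain `clang` command line with those arguments placed first.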

We can do this via shell scripts now, but even with exec the extra shell invocation has a measurable overhead and in some cases (e.g. i386-unknown-freebsd-clang on FreeBSD/x86-64) we'd like to just use the default values for the triple.

David

This is one of the worst parts about the Clang CLI for cross compilation at
the moment. I'd really like, if we're changing the CLI, to allow users to
remove it. For example, if I specify -march=armv7-a, it *shouldn't* need me
to put "-target arm" before it to work!

Good lord, that's horrendous!

On a similar note, how does this proposal deal with instruction set
selection? What does "-march=armv8-a" select: AArch32 or AArch64? Or is
that expected to be handled by "-target" (grim!)?

Can't we do the same as -thumb? Like -a32 and -a64, with default to -a64?

--renato

On this subject, what if the build system (autoconf / cmake / etc)
could, based on its own arguments, build the command line
automatically, based on the host architecture, target triple and other
flags? Wouldn't that solve the problem of having config files AND the
problem of an extra bash script?

However, I think this is independent of the CLI change, which needs to
be saner no matter how we solve cross-compilation. I still believe
that the CLI could have different ways of detecting target properties
(triple, flags, config files) but all of them should plug into the
same common infrastructure, i.e. arch + feature flags to -cc1.

cheers,
--renato

Our experience with the FreeBSD ports tree is that things that require changing a build system are difficult.

David

I know... :(

--renato

This is one of the worst parts about the Clang CLI for cross compilation at
the moment. I'd really like, if we're changing the CLI, to allow users to
remove it. For example, if I specify -march=armv7-a, it *shouldn't* need me
to put "-target arm" before it to work!

Good lord, that's horrendous!

*shrug* it's that or we start figuring out how to make -arch work on
all hosts to all targets. (Hint: Not possible without a triple)

i.e. if you specify -march=armv7-a what OS do you want? You could say
that you want the same OS that you're on I guess, but...

-eric

Hi James,

I understand the annoyance with ‘-target’. I could let ‘-march’ override the triple, but the trouble is how to define the default values for the OS and ABI. I can see two solutions:

A. Get the OS and ABI from the host triple. For example, if the host triple is “x86_64-unknown-linux-gnu”, then using ‘-march=armv8-a’ without specifying ‘-target’ will give ‘aarch64-unknown-linux-gnu’.
B. Use “unknown” for the OS and ABI. Then ‘-march=armv8-a’ is equivalent to ‘-target aarch64’, which represents ‘aarch64-unknown-unknown’.

Solution A is convenient for users running the same operating system on the host and the target device, but it also introduces an implicit “environment flag” into the command line, which may cause confusion when migrating a build system to another OS.
Solution B seems more reasonable, but I doubt this kind of triple is really useful.
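
To make the two options concrete, here is a small Python sketch. The function `default_triple` and its `policy` argument are hypothetical, and the mapping from ‘armv8-a’ to ‘aarch64’ simply follows the example above; whether ‘armv8-a’ should mean AArch32 or AArch64 is still an open question.

```python
def default_triple(march, host_triple, policy):
    """Pick a target triple for '-march=<march>' when no '-target' is given.

    policy "A": reuse everything after the architecture from the host triple.
    policy "B": use 'unknown' for everything except the architecture.
    """
    # Hypothetical mapping; only the case discussed above is listed.
    arch = {"armv8-a": "aarch64"}[march]
    if policy == "A":
        # Drop the host architecture and keep vendor, OS and ABI.
        _host_arch, rest = host_triple.split("-", 1)
        return f"{arch}-{rest}"
    if policy == "B":
        return f"{arch}-unknown-unknown"
    raise ValueError(f"unknown policy: {policy}")
```

With policy A the result depends on the machine the compiler runs on, which is exactly the hidden environment dependency mentioned above; with policy B the result is the same everywhere but says nothing about the OS or ABI.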

Do you have any good ideas?

Regards,
Kevin

*shrug* it's that or we start figuring out how to make -arch work on
all hosts to all targets. (Hint: Not possible without a triple)

Everything is "possible". :)

My view is that there are many poor solutions, like triples, but that
triples have already been abused so much that the other poor ones are
looking a little better. The legacy in triples makes dealing with edge
cases a lot harder.

i.e. if you specify -march=armv7-a what OS do you want? You could say
that you want the same OS that you're on I guess, but...

That's where it gets complicated... From x86_64 to x86, yes, I would
assume the environment is the same, but from x86 to ARM, I wouldn't.
To have that kind of logic on the triple parsing code would be
nefarious.

Another example is the case where "-triple armv7-linux-gnueabi -thumb"
becomes "-triple thumbv7-linux-gnueabi" within the driver, and
getArchCPU() returns "cortex-a8" the first two times it's called
and "arm7tdmi" the last time, which ends up being the CPU of choice.
This is a bug, yes, but it's a result of having triples being used as
input AND data structure AND output to -cc1 in the driver.

That's why I suggested to have a clean triple parser, that only parses
triples and produces arch+flags, which will then be passed to the
-cc1. David's configuration files could have the exact same effect and
even be pluggable on the driver in the same way the triple parsing and
-march+flags are. A GNU compatibility layer would also be easy to
implement that way, and be completely separated from pure-Clang
parsing, if that infrastructure existed.
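
A very rough Python sketch of what I mean, not the real driver: `TargetSpec`, `from_triple`, `with_flag` and `default_cpu` are made-up names, and the CPU table only covers the armv7 example above. The triple is parsed once into data, "-thumb" adds a feature instead of rewriting the triple, and the default CPU depends on the architecture alone.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TargetSpec:
    arch: str             # e.g. "armv7"
    system: str           # remaining triple components, kept opaque here
    features: tuple = ()  # feature flags collected from every front end


def from_triple(triple):
    """Triple front end: split the string once; nothing later re-parses it."""
    arch, _, system = triple.partition("-")
    return TargetSpec(arch=arch, system=system)


def with_flag(spec, flag):
    """Flag front end: '-thumb' adds a feature and leaves the arch untouched."""
    if flag == "-thumb":
        return replace(spec, features=spec.features + ("thumb",))
    raise ValueError(f"unhandled flag: {flag}")


# Hypothetical default-CPU table; only the case from the example above.
DEFAULT_CPU = {"armv7": "cortex-a8"}


def default_cpu(spec):
    # Looks at the architecture only, so adding 'thumb' cannot change the answer.
    return DEFAULT_CPU[spec.arch]
```

Config files or a GNU compatibility layer would then be further front ends producing the same `TargetSpec`, and only that structure would be handed on.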

But I've seen the code that deals with flags and triples and it has
more hard-coded flag-guessing than lines of code, so I'm not sure
anyone would be willing to do that refactoring... :(

cheers,
--renato

*shrug* it's that or we start figuring out how to make -arch work on
all hosts to all targets. (Hint: Not possible without a triple)

Everything is "possible". :)

My view is that there are many poor solutions, like triples, but that
triples have already been abused so much that the other poor ones are
looking a little better. The legacy in triples makes dealing with edge
cases a lot harder.

This is where you choose simplicity rather than trying to lump
everything into a triple. -march and the various feature flags
can/should work on top of a basic triple to do everything.

-eric