The Trouble with Triples

Hi,

In http://reviews.llvm.org/D10969, Eric asked me to explain the wider context of the TargetTuple object that was replacing Triple on llvmdev so here it is.

Before I start, I’m sure I don’t know the full extent of GNU triple ambiguity and lack of canonicity. Additional examples are welcome.

The Problem

As you know, LLVM uses a GNU Triple as a target description that can be relied upon to make decisions. It’s used to decide things such as the default CPU, the alignment of types, the object format, the names for libcalls, and a wide variety of others.

In using it like this, LLVM assumes that triples are unambiguous and have a specific defined meaning. Unfortunately, this assumption fails for a number of reasons.

The first reason is that compiler options can overrule the triple but leave it unchanged. For example, in GCC, mips-linux-gnu-gcc normally produces 32-bit MIPS-I output using the O32 ABI, but ‘mips-linux-gnu-gcc -mips64’ normally produces 64-bit MIPS-III output using the N32 ABI. Like GCC, compiler options to mips-linux-gnu-clang should overrule the triple (and mostly do, although MIPS has a few crashing cases caused by triple misuse). However, we don’t mutate the triple to reflect this, so any decisions based on the overridable state cannot rely on the triple to accurately reflect the desired behaviour.
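To make that concrete, here is a minimal sketch of the mismatch. It is not LLVM or GCC code: the `Effective` struct and `resolve` function are hypothetical, and the only behaviour modelled is the mips-linux-gnu case described above.

```cpp
// Hypothetical illustration only; none of these names exist in LLVM.
#include <string>
#include <vector>

namespace sketch {

struct Effective {
  std::string Triple; // the triple as the user gave it
  std::string ABI;    // the ABI actually in effect
};

// Mirrors the behaviour described above for mips-linux-gnu only:
// O32 by default, N32 once -mips64 is seen.
inline Effective resolve(const std::string &Triple,
                         const std::vector<std::string> &Args) {
  Effective E{Triple, "o32"};
  for (const std::string &Arg : Args)
    if (Arg == "-mips64")
      E.ABI = "n32";
  return E; // note that E.Triple was never updated to match
}

} // namespace sketch
```

Anything that later inspects only `Triple` still sees mips-linux-gnu and would assume O32, which is the kind of stale decision described above.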

It’s worth mentioning here that some targets have hacks to partially mutate the triple in clang to work around issues they would otherwise have in the backend but this is done on an ad-hoc basis for specific details (e.g. mips <-> mipsel for -EL and -EB).

The second reason is that there is no canonical meaning for a given GNU Triple; it varies between vendors and over time. There is also no requirement for vendors to have a unique GNU Triple for their toolchain. For GCC, it’s fairly common for distributors to change the meanings of triples using options like --with-arch, --with-cpu, --with-abi, etc. There are also some target-specific options such as --with-mode to select ARM/Thumb by default and --with-nan for MIPS NaN encoding selection. Different vendors use different configure options and may change them at will. When they do change them, the vendors often want to keep the same triple so that they can drop in the new version without causing wider impact on their environment.

For example, assuming I’m reading debian/rules2 for Debian’s gcc-4.9 package correctly, i386-linux-gnu means i486 on Debian Etch and Lenny but means i586 on more recent versions. On a similar note, on Debian, mips-linux-gnu targets MIPS-II (optimised for typical MIPS32 implementations) rather than the usual MIPS-I. The last example of this ambiguity I’d like to reference is the one mentioned by https://wiki.debian.org/Multiarch/Tuples#Why_not_use_GNU_triplets.3F. In that example, hard-float and soft-float on ARM both used arm-linux-gnueabi but were mutually incompatible. The Multiarch tuples described on that page are an attempt to resolve the ambiguity but I’m told that they aren’t likely to be universally adopted.

The third reason is that different triples can mean the same thing. Jim Grosbach has mentioned that the prefixes of the GNU Triple are different between Linux and Darwin for ARM despite sharing the same meaning (presumably subject to the issues above). As a result, decisions based on the string have to take care of multiple possible values. Mips has a similar issue too, since a host triple (and therefore default target triple) of mips64-linux-gnu needs to behave like mips-linux-gnu on a 32-bit Mips port of Debian.

Although not included in the description of the assumption above, one additional flaw in the use of GNU Triples is that they are sometimes inadequate as a description of the target. One example affecting MIPS in particular is that the ABI is not represented in the GNU Triple, so we require significant API changes to get this information where we need it. It would be helpful to be able to pass such information through the existing plumbing.

The Planned Solution

The plan is to split the GNU Triple represented by the llvm::Triple object into two pieces. The first piece is the existing llvm::Triple and is responsible for parsing the GNU triple and canonicalizing it. The second piece is a mutable target description named llvm::TargetTuple. TargetTuple is responsible for interpreting the triple according to the vendor’s rules, providing an interface to allow mutation by tools, and authoritatively defining the target being targeted without the ambiguity of GNU Triples. As an example, ‘mips-linux-gnu-clang -EL …’ would:

// Parse the GNU Triple.
llvm::Triple GnuTriple("mips-linux-gnu");

// Convert it to a TargetTuple according to the (possibly customized) meanings
// in use by the vendor.
llvm::TargetTuple TT(GnuTriple);

// Then mutate the TargetTuple according to the compiler options (or the
// equivalent, depending on the tool; for example, disassemblers would mutate
// it according to the object headers).
if (hasOption("-EL"))
  TT.setLittleEndian();

At this point, TT would be “+mipsel-unknown-linux-gnu-elf32-some-other-stuff” (exact serialization is t.b.d and may end up target dependent) which we can then rely on in the rest of LLVM. This split resolves the issue of llvm::Triple objects not being reliable when used as a target description since TargetTuple will reflect the result of interpreting the triple as well as applying appropriate options. It also provides a suitable place for vendors to define the meanings of their GNU Triples.
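For readers who want something they can compile, the following is a toy, self-contained version of that flow. It is only a sketch under heavy assumptions: `sketch::Triple` and `sketch::TargetTuple` are stand-ins rather than the real llvm::Triple or the proposed class, they only understand a mips/mipsel prefix, and `str()` is a placeholder since the real serialization is still t.b.d.

```cpp
// Illustrative sketch only: these are NOT the real llvm::Triple or the
// proposed llvm::TargetTuple; every name and field here is hypothetical.
#include <string>

namespace sketch {

// Stand-in for the parsed GNU triple: it only splits off the arch prefix.
struct Triple {
  std::string Arch; // text before the first '-', e.g. "mips"
  std::string Rest; // everything after it, e.g. "linux-gnu"
  explicit Triple(const std::string &Str) {
    std::string::size_type Dash = Str.find('-');
    Arch = Str.substr(0, Dash);
    Rest = Dash == std::string::npos ? "" : Str.substr(Dash + 1);
  }
};

// Mutable target description seeded from a triple, then adjusted by options.
class TargetTuple {
  bool LittleEndian;
  std::string Rest;

public:
  explicit TargetTuple(const Triple &T)
      : LittleEndian(T.Arch == "mipsel"), Rest(T.Rest) {}
  void setLittleEndian() { LittleEndian = true; }
  bool isLittleEndian() const { return LittleEndian; }
  // Placeholder rendering; the real serialization is undecided.
  std::string str() const {
    return std::string(LittleEndian ? "mipsel" : "mips") + "-" + Rest;
  }
};

} // namespace sketch
```

The point of the split is visible even in the toy: the `Triple` object keeps saying "mips" after `setLittleEndian()`, while the tuple is the thing that reflects what was actually requested.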

One significant detail is the way vendors customize the meaning of their Triples. Currently, the plan is to nominate a constructor (TargetTuple::TargetTuple(const Triple &)) that a vendor can patch to redefine their triples, with the default implementation being the ‘usual’ meaning (the meaning that should be used in the absence of customization). One nice benefit of this configure-by-source-patch approach is that vendors can customize multiple triples as easily as their native triple or intended target triple. To use Debian as an example again, they would be able to customize all their supported triples such that ‘clang -target arm-linux-gnueabihf’ on the amd64 port targets their armhf port using the same customization that makes ‘clang’ on the armhf port do the right thing natively. Android and toolchains for heterogeneous platforms would likely benefit from this too. This configure-by-source-patch approach seems to make some people uncomfortable so we may have to find another way to configure the triples (tablegen?).
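As a rough illustration of what configure-by-source-patch could look like, here is a sketch with a single function a vendor would edit. Everything in it is hypothetical (the struct, both function names, and the "mips1"/"mips2" strings used as CPU labels); the only fact taken from this mail is that Debian's mips-linux-gnu means MIPS-II rather than the usual MIPS-I.

```cpp
// Hypothetical sketch of "configure by source patch". Not LLVM code.
#include <string>

namespace sketch {

struct TargetTuple {
  std::string Triple; // the triple the tuple was built from
  std::string CPU;    // default ISA level implied by that triple
};

// The 'usual' meaning, used when no vendor customization applies.
inline TargetTuple usualMeaning(const std::string &Triple) {
  if (Triple == "mips-linux-gnu")
    return {Triple, "mips1"};
  return {Triple, "generic"};
}

// The single place a vendor would patch. This variant mimics the Debian
// behaviour described above (MIPS-II rather than MIPS-I).
inline TargetTuple makeTargetTuple(const std::string &Triple) {
  TargetTuple TT = usualMeaning(Triple);
  if (Triple == "mips-linux-gnu")
    TT.CPU = "mips2"; // vendor override
  return TT;
}

} // namespace sketch
```

Because the override is keyed on the triple rather than on the host, the same patch would apply whether the compiler is running natively or cross-compiling.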

To reach this result the plan is to do the following:

  1. Replace any remaining std::string’s and StringRef’s containing GNU triples with Triple objects.

  2. Split the llvm::Triple class into llvm::Triple and llvm::TargetTuple classes. Both are identical in implementation and almost identical in interface at this stage.

  3. Gradually replace Triples with TargetTuples until the C APIs and the LLVM-IR are the only place inside LLVM where Triples are still used.

  4. Change the implementation of TargetTuple to whatever is convenient for LLVM’s internals and decide on a serialization.

  5. Replace serialized Triples with serialized TargetTuples in LLVM-IR.

     a. Maintain backwards compatibility with IR using triples, at least for a while.

  6. Add TargetTuple support to the C API. Exact API is t.b.d.

  7. Have the API users mutate the TargetTuple appropriately.

Renato: This has been revised slightly from the last one we discussed due to public C++ APIs being used internally as well as externally.

Where we are now

I’ve just started posting patches for steps 2 and 3 of the plan. My working copy is nearly at step 4.

What’s next

Upstream steps 2 and 3 and then begin replacing the TargetTuple implementation as per step 4.

Previous Discussions

http://thread.gmane.org/gmane.comp.compilers.llvm.devel/86020/focus=86073. I should mention that I’ve since been made aware that the original topic of private label prefixes could be solved in a much simpler way than previously thought. The triple related discussion is still relevant though.

I understand from Renato that there are more threads over the last few years but I haven’t looked for them.

Daniel Sanders

Leading Software Design Engineer, MIPS Processor IP

Imagination Technologies Limited

www.imgtec.com

> The first reason is that compiler options can overrule the triple but leave
> it unchanged. For example, in GCC mips-linux-gnu-gcc normally produces
> 32-bit MIPS-I output using the O32 ABI, but 'mips-linux-gnu-gcc -mips64'
> normally produces 64-bit MIPS-III output using the N32 ABI. Like GCC,
> compiler options to mips-linux-gnu-clang should (and mostly do but MIPS has
> a few crashing cases caused by triple misuse) overrule the triple. However,
> we don't mutate the triple to reflect this so any decisions based on the
> overridable state cannot rely on the triple to accurately reflect the
> desired behaviour.

Another very annoying fact is that the Clang driver re-parses triples
many times, and sometimes they change the triple based on a CPU, and
then end up with a different CPU.

There was a bug that if you passed "thumbv7", it would not recognise,
pick "ARM7TDMI" CPU, and later change the triple to "armv4t" because
of that, and pass *that* to the internal processes (GAS, IAS, linker).

This bug has been fixed by adding "thumb" to it, but the underlying
reason it happened means there are plenty of other similar bugs
waiting to happen. We need to fix the mechanism in which we understand
targets, and having an unambiguous description that spans across *all*
LLVM projects (including Clang, LLD, LLDB) and tools (llc, lli,
llvm-mc, etc) is the ultimate goal.

Most targets do not have those problems, but Mips and ARM are a big
mess. That's why we're so interested in making that happen.
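The thumbv7 failure can be reproduced in miniature. The sketch below is a toy, not the Clang driver: the two lookup tables are hypothetical and contain only enough entries to show how an unrecognised architecture falls back to a default CPU and then gets rewritten from that CPU.

```cpp
// Toy model of a lossy arch -> CPU -> arch round trip. Hypothetical tables;
// this is not how the Clang driver is actually structured.
#include <string>

namespace sketch {

// Pick a default CPU for an architecture name; unknown names fall back.
inline std::string defaultCPUForArch(const std::string &Arch) {
  if (Arch == "armv7")
    return "cortex-a8";
  return "arm7tdmi"; // fallback for anything unrecognised
}

// Derive the architecture back from the chosen CPU.
inline std::string archForCPU(const std::string &CPU) {
  if (CPU == "cortex-a8")
    return "armv7";
  return "armv4t";
}

// What a "re-parse" effectively does when it goes via the CPU.
inline std::string reparseArch(const std::string &Arch) {
  return archForCPU(defaultCPUForArch(Arch));
}

} // namespace sketch
```

With these tables "armv7" survives the round trip, but "thumbv7" is not recognised, falls back to the arm7tdmi default, and comes out as "armv4t". Adding one more spelling to the table fixes that input but not the mechanism.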

> This configure-by-source-patch approach seems to make
> some people uncomfortable so we may have to find another way to configure
> the triples (tablegen?).

Another option that would make *all* distributions happy would be to
adopt the same approach as GCC and have CMake options for default ABI
choices.

This would be harder to implement, but we can hide the mess under a
separate class (TargetABI?). I actually prefer this solution to either
tablegen or patch-sets.

> a. Maintain backwards compatibility with IR using triples, at least
> for a while.

Probably forever... :(

Again, this can be isolated in TargetABI or some other place.

> Renato: This has been revised slightly from the last one we discussed due to
> public C++ API's being used internally as well as externally.

Roger.

I also wanted to add the TargetParser changes and how this all fits
into the plan.

TargetParser is a target-specific class that knows how to parse
strings and convert to platform choices (and back) in a way that is
both unambiguous and ubiquitous.

Meaning, all tools and projects should use the TargetParser to
understand what are the platform specific options, what do they mean
and how are they related to each other.

The TargetTuple is the perfect companion, since we can use the
TargetParser to understand triples, cpu/fpu names, extensions, etc,
and relate them inside the Tuple in an ABI-specific way (via the ABI
modifiers).

Right now, both Clang and the LLVM ARM assembler are using the
ARMTargetParser, but the idea is to expand this to the Triple when
TargetTuple begins step #4 of Daniel's plan and onwards.

We'll probably have a MIPSTargetParser, too. And probably refactor the
hierarchy of that, to make sure we can get polymorphism and so on. But
I wanted to keep-it-simple-and-stupid before we had a reason not to.

> I understand from Renato that there are more threads over the last few years
> but I haven't looked for them.

There were numerous discussions about the driver's complicated
structure, never-ending bugs, parsing mismatches and triple
shenanigans over the last 5 years. I couldn't possibly link all of
them here. :)

Specifically between Mips and ARM, I think me and Reed had a few
specific discussions a few years ago, but I can't seem to find them.
That was probably 2010/2011.

cheers,
--renato

From: Renato Golin [mailto:renato.golin@linaro.org]
Sent: 08 July 2015 16:09
To: Daniel Sanders
Cc: LLVM Developers Mailing List (llvmdev@cs.uiuc.edu); Eric Christopher
(echristo@gmail.com); Jim Grosbach (grosbach@apple.com)
Subject: Re: The Trouble with Triples

> The first reason is that compiler options can overrule the triple but leave
> it unchanged. For example, in GCC mips-linux-gnu-gcc normally produces
> 32-bit MIPS-I output using the O32 ABI, but 'mips-linux-gnu-gcc –mips64'
> normally produces 64-bit MIPS-III output using the N32 ABI. Like GCC,
> compiler options to mips-linux-gnu-clang should (and mostly do but MIPS has
> a few crashing cases caused by triple misuse) overrule the triple. However,
> we don't mutate the triple to reflect this so any decisions based on the
> overridable state cannot rely on the triple to accurately reflect the
> desired behaviour.

Another very annoying fact is that the Clang driver re-parses triples
many times, and sometimes they change the triple based on a CPU, and
then end up with a different CPU.

There was a bug that if you passed "thumbv7", it would not recognise,
pick "ARM7TDMI" CPU, and later change the triple to "armv4t" because
of that, and pass *that* to the internal processes (GAS, IAS, linker).

This bug has been fixed by adding "thumb" to it, but the underlying
reason it happened means there are plenty of other similar bugs
waiting to happen. We need to fix the mechanism in which we understand
targets, and having an unambiguous description that spans across *all*
LLVM projects (including Clang, LLD, LLDB) and tools (llc, lli,
llvm-mc, etc) is the ultimate goal.

Most targets do not have those problems, but Mips and ARM are a big
mess. That's why we're so interested in making that happen.

This reminded me of something I noticed in passing and haven't investigated yet. I think TargetRegistry::lookupTarget(const std::string &ArchName, Triple&, std::string&) can do this kind of thing when ArchName is a registered architecture but is not understood by Triple::getArchTypeForLLVMName(). I haven't had chance to try it but it looks like you'd get the intended target architecture without updating the triple to match (so x86_64-linux-gnu could target a completely different architecture).

> This configure-by-source-patch approach seems to make
> some people uncomfortable so we may have to find another way to configure
> the triples (tablegen?).

Another option that would make *all* distributions happy would be to
adopt the same approach as GCC and have CMake options for default ABI
choices.

This would be harder to implement, but we can hide the mess under a
separate class (TargetABI?). I actually prefer this solution to either
tablegen or patch-sets.

I can see a way to make the CMake option approach work nicely for native. The constructor can check for the default triple and apply the effects of the CMake options to it. I don't think there's a good way to support the multiple triples or heterogeneous use cases via CMake options but support for that was more a happy coincidence than intentional design.

If we take this route, I'm hoping I don't need to do autoconf since I don't know it very well.

> a. Maintain backwards compatibility with IR using triples, at least
> for a while.

Probably forever... :(

Again, this can be isolated in TargetABI or some other place.

As far as I know there is no backwards compatibility promise, but equally it doesn't seem reasonable to give no notice before removing it. I'm therefore thinking that we can deprecate it in one release (3.7 or 3.8), then remove it in the next.

> I can see a way to make the CMake option approach work nicely for native. The constructor can check for the default triple and apply the effects of the CMake options to it. I don't think there's a good way to support the multiple triples or heterogenous use cases via CMake options but support for that was more a happy coincidence rather than intentional design.

Well, I'd say the CMake options would change the behaviour for the
target architecture, not the host, which is a GCC thing, not an LLVM
thing.

Some people have suggested config files. So we'd have (say)
Targets.cfg on LLVM's source tree copied to the build tree, unless you
specify -DTARGETS_CONFIG=/foo/bar/UbuntuTargets.cfg, and that would
populate the defaults in TargetABI. Of course, this would be a big
change and it's probably for after we do all we already planned to. :)

> As far as I know there is no backwards compatibility promise, but equally it doesn't seem reasonable to give no notice before removing it. I'm therefore thinking that we can deprecate it in one release (3.7 or 3.8), then remove it in the next.

Well, the problem here is that changing build systems is even harder
than changing user code. So changes in how triples or legacy/GNU
command line options are interpreted end up being kept *a lot* longer
than other LLVM specific features.

cheers,
--renato

From: Renato Golin [mailto:renato.golin@linaro.org]
Sent: 08 July 2015 19:01
To: Daniel Sanders
Cc: LLVM Developers Mailing List (llvmdev@cs.uiuc.edu); Eric Christopher
(echristo@gmail.com); Jim Grosbach (grosbach@apple.com)
Subject: Re: The Trouble with Triples

> I can see a way to make the CMake option approach work nicely for native.
> The constructor can check for the default triple and apply the effects of the
> CMake options to it. I don't think there's a good way to support the multiple
> triples or heterogenous use cases via CMake options but support for that
> was more a happy coincidence rather than intentional design.

Well, I'd say the CMake options would change the behaviour for the
target architecture, not the host, which is a GCC thing, not an LLVM
thing.

I agree that the target architecture is the one that should be configured, but which architecture is that? In GCC, this is obvious because there is only one target triple in each build of the compiler. Similarly, in clang there is only one native triple in each build so that case has an obvious answer too. However, for cross-compilation with clang we have all possible targets to choose from. How would CMake know whether to apply the customizations specified in the CMake variables to 'clang -target armv7-linux-gnu', 'clang -target mips-mti-linux-gnu', or 'clang -target x86_64-linux-android'?

Some people have suggested config files. So we'd have (say)
Targets.cfg on LLVM's source tree copied to the build tree, unless you
specify -DTARGETS_CONFIG=/foo/bar/UbuntuTargets.cfg, and that would
populate the defaults in TargetABI. Of course, this would be a big
change and it's probably for after we do all we already planned to. :)

Some of my colleagues from other projects have suggested the same thing off-list. It sounds like a good solution to me. I haven't given much thought to the details yet, but the one concern that springs to mind is that a simple config file (e.g. a triple -> tuple map) is likely to repeat itself a lot, and avoiding that redundancy moves the config file towards a small scripting language. Finding the right balance might be tricky.

> As far as I know there is no backwards compatibility promise, but equally it
> doesn't seem reasonable to give no notice before removing it. I'm therefore
> thinking that we can deprecate it in one release (3.7 or 3.8), then remove it in
> the next.

Well, the problem here is that changing build systems is even harder
than changing user code. So changes in how triples or legacy/GNU
command line options are interpreted end up being kept *a lot* longer
than other LLVM specific features.

cheers,
--renato

I don't think this IR change is the same as changing build systems. My thinking is that llvm::Module has a TargetTuple and AssemblyWriter/BitcodeWriter will always write out this tuple. With this, natural recompilation should remove the 'target triple' statements from all IR in the wild in a reasonable timescale.

> I agree that the target architecture is the one that should be configured, but which architecture is that? In GCC, this is obvious because there is only one target triple in each build of the compiler. Similarly, in clang's there is only one native triple in each build so that case has an obvious answer too. However, for cross-compilation with clang we have all possible targets to choose from. How would CMake know whether to apply the customizations specified in the CMake variables to 'clang -target armv7-linux-gnu', 'clang -target mips-mti-linux-gnu', or 'clang -target x86_64-linux-android'?

That's why I said this is a "GCC thing"... :)

Apart from using a config file, adding multiple triples to the CMake
command line would work ok-ish. The other unspecified targets would
keep their defaults, if built.

Something like:

$ cmake $llvm_src -DLLVM_TARGETS_TO_BUILD="ARM;AArch64;X86" \
    -DLLVM_TARGETS_DEFAULTS="armv7a-linux-gnueabihf;aarch64-linux-gnu;x86_64-linux-gnu"

That would work pretty easily, in the same way TARGETS_TO_BUILD already works.

> Some of my colleagues from other projects have suggested the same thing off-list. It sounds like a good solution to me. I haven't given much thought to the details yet, but the one concern that springs to mind is that a simple config file (e.g. a triple -> tuple map) is likely to repeat itself a lot, and avoiding that redundancy moves the config file towards a small scripting language. Finding the right balance might be tricky.

Adding another DSL would be a barrier... I believe that's why you
suggested tablegen.

I'm only foreseeing a couple of fields per target anyway, so a simple
json file with overriding semantics would work. The default one in
LLVM may be big and ugly, and distros only override what they want,
making their patches as simple as they need to be.
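The "overriding semantics" can be sketched without committing to a file format. In the example below the JSON parsing is deliberately left out and the data is held in plain maps; the type aliases, the function name and the field names ("cpu", "abi") are all assumptions made for illustration.

```cpp
// Sketch of per-field overriding: a distro file replaces only the fields it
// names. Parsing is omitted; names and fields here are hypothetical.
#include <map>
#include <string>

namespace sketch {

using Fields = std::map<std::string, std::string>;    // field -> value
using TargetDefaults = std::map<std::string, Fields>; // triple -> fields

// Overlay Overrides on Base, leaving unmentioned fields untouched.
inline TargetDefaults applyOverrides(TargetDefaults Base,
                                     const TargetDefaults &Overrides) {
  for (const auto &Target : Overrides)
    for (const auto &Field : Target.second)
      Base[Target.first][Field.first] = Field.second;
  return Base;
}

} // namespace sketch
```

With a base entry of cpu=mips1, abi=o32 for mips-linux-gnu and a distro override that only sets cpu=mips2, the merged entry keeps abi=o32. That is what lets the distro file stay small even if the default one is big.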

> I don't think this IR change is the same as changing build systems. My thinking is that llvm::Module has a TargetTuple and AssemblyWriter/BitcodeWriter will always write out this tuple. With this, natural recompilation should remove the 'target triple' statements from all IR in the wild in a reasonable timescale.

Oh, I thought you were referring to changing how triples are
interpreted by the driver, which unfortunately has to be done the GCC
way for all legacy ones. :(

In the module, I agree, it should be ok to deprecate that fast.

cheers,
--renato

The use-case that I’d really like to go from mostly-working to actually-working is the ability to create symlinked versions of clang with a triple prefix and have it behave sensibly. We can symlink clang to mips64-unknown-freebsd-clang and get a working cross-compiler, more or less, except that we also want to specify things like the default sysroot. Having the bit in the name of the compiler just be a name for a config file containing a set of command-line options would be very nice - we’d have a set of predefined names, and then if someone wanted to provide a androidsdk-v47-arm.conf (or whatever) and just drop it into a known location then they’d be able to use androidsdk-v47-arm-clang as a cross compiler.

David
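A minimal sketch of the name-to-config step described in the message above, assuming a POSIX-style path and a "<prefix>-clang" naming scheme. `configNameFromProgram` is a hypothetical helper, not something the Clang driver provides, and finding and reading the .conf file is left out.

```cpp
// Hypothetical helper: derive a configuration name from the program name.
#include <string>

namespace sketch {

inline std::string configNameFromProgram(const std::string &Argv0) {
  // Drop any leading directory components.
  std::string::size_type Slash = Argv0.find_last_of('/');
  std::string Name =
      Slash == std::string::npos ? Argv0 : Argv0.substr(Slash + 1);
  // A prefixed driver is named "<prefix>-clang"; plain "clang" has no prefix.
  const std::string Suffix = "-clang";
  if (Name.size() > Suffix.size() &&
      Name.compare(Name.size() - Suffix.size(), Suffix.size(), Suffix) == 0)
    return Name.substr(0, Name.size() - Suffix.size());
  return "";
}

} // namespace sketch
```

The returned string is treated purely as a name, so "androidsdk-v47-arm" is as acceptable as "mips64-unknown-freebsd"; nothing tries to decompose it into arch, vendor and OS.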

This already works well with Clang, but is restricted to the triples
that actually make sense. If you need to change anything that the
triple can't, you're on your own.

But I agree this is a sensible alternative, or even in conjunction,
with CMake options or config files.

cheers,
--renato

Right, the problem is that the triple almost never contains enough information for a cross-compile (which, at a minimum, needs to know where the default sysroot is, where to find cross-linkers, and may also need to target a specific CPU variant or turn on soft float). It works moderately well for trivial cases (e.g. targeting x86 Linux from x86-64 Linux, and even targeting ARM Linux from x86-64 Linux, as long as ARM Linux doesn’t mean Android or WebOS or some flavour of Linux that’s different from the host).

I would really like to completely remove the triple as something that can be decomposed into meaningful components and just have it become a name, which is only used by the front end to identify a configuration.

David

From: Renato Golin [mailto:renato.golin@linaro.org]
Sent: 09 July 2015 10:33
To: Daniel Sanders
Cc: LLVM Developers Mailing List (llvmdev@cs.uiuc.edu); Eric Christopher
(echristo@gmail.com); Jim Grosbach (grosbach@apple.com)
Subject: Re: The Trouble with Triples

> I agree that the target architecture is the one that should be configured,
> but which architecture is that? In GCC, this is obvious because there is only
> one target triple in each build of the compiler. Similarly, in clang's there is only
> one native triple in each build so that case has an obvious answer too.
> However, for cross-compilation with clang we have all possible targets to
> choose from. How would CMake know whether to apply the customizations
> specified in the CMake variables to 'clang -target armv7-linux-gnu',
> 'clang -target mips-mti-linux-gnu', or 'clang -target x86_64-linux-android'?

That's why I said this is a "GCC thing"... :)

Apart from using a config file, adding multiple triples to the CMake
command line would work ok-ish. The other unspecified targets would
keep their defaults, if built.

Something like:

$ cmake $llvm_src -DLLVM_TARGETS_TO_BUILD="ARM;AArch64;X86" \
    -DLLVM_TARGETS_DEFAULTS="armv7a-linux-gnueabihf;aarch64-linux-gnu;x86_64-linux-gnu"

Would work pretty easy in the same way TARGETS_TO_BUILD already work.

That makes sense to me with a small tweak. Different triples having different customizations is likely to be quite common for ARM and MIPS in particular so I'd suggest using lists of triple=tuple pairs. For example:
  -DLLVM_TARGETS_DEFAULTS="armv7a-linux-gnueabihf=...armv7atuple...;aarch64-linux-gnu=...aarch64tuple...;x86_64-linux-gnu=...x86_64tuple..."
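Splitting such a list is straightforward. The sketch below assumes ';' separates entries and the first '=' separates triple from tuple, as in the example above; `parseTargetDefaults` is a hypothetical name, and the tuple text is kept as an opaque string because its serialization is undecided.

```cpp
// Sketch: split "triple=tuple;triple=tuple" into a map. Hypothetical helper.
#include <map>
#include <sstream>
#include <string>

namespace sketch {

inline std::map<std::string, std::string>
parseTargetDefaults(const std::string &List) {
  std::map<std::string, std::string> Result;
  std::istringstream In(List);
  std::string Entry;
  while (std::getline(In, Entry, ';')) {
    std::string::size_type Eq = Entry.find('=');
    // Skip entries with no '=' or with an empty triple.
    if (Eq == std::string::npos || Eq == 0)
      continue;
    Result[Entry.substr(0, Eq)] = Entry.substr(Eq + 1);
  }
  return Result;
}

} // namespace sketch
```

Silently skipping malformed entries is only a choice made to keep the sketch short; a real build option would presumably want to report them.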

> Some of my colleagues from other projects have suggested the same thing
> off-list. It sounds like a good solution to me. I haven't given much thought to
> the details yet, but the one concern that springs to mind is that a simple
> config file (e.g. a triple -> tuple map) is likely to repeat itself a lot, and
> avoiding that redundancy moves the config file towards a small scripting
> language. Finding the right balance might be tricky.

Adding another DSL would be a barrier... I believe that's why you
suggested tablegen.

That's right.

I'm only foreseeing a couple of fields per target anyway, so a simple
json file with overriding semantics would work. The default one in
LLVM may be big and ugly, and distros only override what they want,
making their patches as simple as they need to be.

That makes sense to me.

> Right, the problem is that the triple almost never contains enough information for a cross-compile (which, at a minimum, needs to know where the default sysroot is, where to find cross-linkers, and may also need to target a specific CPU variant or turn on soft float). It works moderately well for trivial cases (e.g. targeting x86 Linux from x86-64 Linux, and even targeting ARM Linux from x86-64 Linux, as long as ARM Linux doesn’t mean Android or WebOS or some flavour of Linux that’s different from the host).

Yes, multi-arch / multi-lib kind of thing works well if we're only
talking about Linux.

> I would really like to completely remove the triple as something that can be decomposed into meaningful components and just have it become a name, which is only used by the front end to identify a configuration.

That's what I'm sceptical about. I don't think that's possible.

What we can (and should) do is to make it redundant. If we have a
better solution, and it's clearly beneficial, people will start using
it and maybe GNU tools could even implement that. Legacy systems will
still use triples forever, though.

cheers,
--renato

Yes, and that's why a config file would be the answer for those.

It may be beneficial to have the CMake option for simple triples, for
now as an implementation example, but later override it with the
config file.

We could even keep the CMake options in conjunction with the config
file, but that'd have to be well documented so people understand what
overrides what.

cheers,
--renato

My slightly modified version of clang, ecc (http://ellcc.org), has been using config files for quite a while. I've mentioned them on the mailing list before.

In the current implementation, if either the program name (via a symlink) or the argument to the -target option matches the name of a config file, that file is read and used to control the driver. I used the pre-existing YAML parser to parse the config files. A typical config file looks like this:

based_on: microblaze-ellcc-linux
compiler:
   options:
     - -target microblaze-ellcc-linux
     - -D__ELK__=1
   c_include_dirs:
     - '$R/include/elk/microblaze'
     - '$R/include/elk'
     - '$R/include/microblaze'
     - '$R/include'
linker:
   options:
     - -Telk.ld
     - -m elf32mb_linux
   static_crt1: $R/lib/microblaze-elk-eng/crt1.o
   dynamic_crt1: $R/lib/microblaze-elk-eng/Scrt1.o
   crtbegin: $R/lib/microblaze-linux-eng/crtbegin.o
   crtend: $R/lib/microblaze-linux-eng/crtend.o

   library_paths:
     - -L$R/lib/microblaze-elk-eng
     - -L$R/lib/elk
     - -L$R/lib/microblaze-linux-eng
   c_libraries:
     - -lelk
     - '-('
     - -lc
     - -lcompiler-rt
     - '-)'

The "based_on" field allows a configuration file to be based on another config file. Base config files can be compiled into the driver like "microblaze-ellcc-linux" in this example. Other examples of config files can be found at http://ellcc.org/viewvc/svn/ellcc/trunk/libecc/config/

-Rich
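The "based_on" chaining can be sketched as a simple walk up the chain. This is not ecc's implementation: the `Config` struct, the function name, and in particular the choice to emit base options first so that derived ones come last are assumptions made for illustration.

```cpp
// Sketch of resolving a based_on chain. Hypothetical types; ordering of the
// merged options is an assumption, not taken from ecc.
#include <map>
#include <set>
#include <string>
#include <vector>

namespace sketch {

struct Config {
  std::string BasedOn;              // empty when there is no base
  std::vector<std::string> Options; // options contributed by this file
};

// Collect options base-first so that a derived file's options come last.
// Returns false if a config is missing or the chain loops.
inline bool resolveOptions(const std::map<std::string, Config> &Configs,
                           const std::string &Name,
                           std::vector<std::string> &Out) {
  std::vector<const Config *> Chain;
  std::set<std::string> Seen;
  std::string Current = Name;
  while (!Current.empty()) {
    auto It = Configs.find(Current);
    if (It == Configs.end() || !Seen.insert(Current).second)
      return false;
    Chain.push_back(&It->second);
    Current = It->second.BasedOn;
  }
  for (auto C = Chain.rbegin(); C != Chain.rend(); ++C)
    Out.insert(Out.end(), (*C)->Options.begin(), (*C)->Options.end());
  return true;
}

} // namespace sketch
```

The cycle check matters once users can drop their own files into a known location, since nothing stops two of them from naming each other as a base.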

That's more or less what was proposed back then.

I'm not sure how all the options will be laid out, but we'll need some
documentation outlining precisely what each one means and how they
behave, so that others can extend it without having to look at the
source code.

Also, we can hide most complexity on the default configurations for
each target (like you do on {arch}-ellcc-{env}).

Maybe this would be a good BoF for the US LLVM Meeting?

cheers,
--renato

Hi Daniel,

I’m not sure I agree with the basic idea of using the target triple as a way of encoding all of the pieces of target data as a string. I think in a number of cases what we need to do is either open up API to the back end to specify things, or encode the information into the IR when it’s different from the generic triple. Ideally the triple will have enough information to do basic layout and anything else can be either gotten from the IR or passed via option.

My suggestion on a route forward here is that we should look at the particular API and areas of the backend that you’re having an issue with and figure out how to best communicate the data you’d like to the appropriate area. I realize this probably seems a little vague and handwavy, but I don’t know what areas you’ve been having problems with lately. I’ll absolutely help with this effort if you need assistance or guidance in any way.

Thanks!

-eric

> I'm not sure I agree with the basic idea of using the target triple as a way
> of encoding all of the pieces of target data as a string.

Hi Eric,

That's not the idea at all.

The Triple object will remain unchanged.

The Tuple will be the API to handle getting/setting parameters
depending on the Triple, compiler flags, attributes, etc.

There will be no string representation of all options, as that would
be impossible, or at least, highly undesirable, especially in the IR.

The Tuple is for the sole use of front-ends, middle-ends and back-ends
to communicate and understand the *same* meaning regarding the *same*
input.

Having a Tuple class that encodes details of the targets goes a long way
to ensure that, since you can directly pass the Tuple when you build
the Target objects, and the information it provides will be identical,
no matter where it is. Right now, we have multiple representations of
the targets' information because the Triple object cannot encode every
aspect of them, especially problematic between Clang and LLVM.

The decision to create a new class (Tuple) is because Triple already
has a legacy-heavy meaning, which should not change.

This is not about the serialization of the target information, but
rather the consistent manipulation of it inside the compiler.
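
To make the idea a bit more concrete, here is a rough sketch of the shape such a class could take. It is not the code from the patches: the names (TargetTuple, applyOption, frontendPointerBits, backendAbiName) are made up for illustration, and it only models the MIPS example from the original mail, where mips-linux-gnu defaults to O32 and -mips64 switches to N32.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch only; none of these names come from the real patches.
enum class ABI { O32, N32, N64 };

class TargetTuple {
public:
  // Seed the tuple with the defaults traditionally implied by the triple.
  explicit TargetTuple(const std::string &Triple) {
    assert(!Triple.empty() && "need a triple to seed the defaults");
    // "mips64..." triples are assumed to default to N64, the rest to O32.
    Abi = Triple.rfind("mips64", 0) == 0 ? ABI::N64 : ABI::O32;
  }

  // Compiler options mutate the tuple instead of being tracked elsewhere.
  // Only the one option from the example is modelled; others are ignored.
  void applyOption(const std::string &Opt) {
    if (Opt == "-mips64")
      Abi = ABI::N32;
  }

  ABI getABI() const { return Abi; }

private:
  ABI Abi = ABI::O32;
};

// Stand-in for a front-end query: N32 and O32 both use 32-bit pointers.
unsigned frontendPointerBits(const TargetTuple &T) {
  return T.getABI() == ABI::N64 ? 64 : 32;
}

// Stand-in for a back-end query made on the very same object.
std::string backendAbiName(const TargetTuple &T) {
  switch (T.getABI()) {
  case ABI::O32: return "o32";
  case ABI::N32: return "n32";
  case ABI::N64: return "n64";
  }
  return "unknown";
}
```

The point of the sketch is only that both consumers read the same mutated object, so they cannot disagree about what "mips-linux-gnu plus -mips64" means.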

My suggestion on a route forward here is that we should look at the
particular API and areas of the backend that you're having an issue with and
figure out how to best communicate the data you'd like to the appropriate
area.

That is precisely what he's doing. :)

cheers,
--renato

The Triple object will remain unchanged.

The Tuple will be the API to handle getting/setting parameters
depending on the Triple, compiler flags, attributes, etc.

This part doesn’t seem obvious from the direction the patches are going.

There will be no string representation of all options, as that would
be impossible, or at least, highly undesirable, especially in the IR.

Yes.

The Tuple is for the sole use of front-ends, middle-ends and back-ends
to communicate and understand the same meaning regarding the same
input.

Definitely don’t want this in the middle end at all. That all can be part of the TargetMachine/TargetSubtargetInfo interface.

Having a Tuple class that encodes details of the targets goes a long way
to ensure that, since you can directly pass the Tuple when you build
the Target objects, and the information it provides will be identical,
no matter where it is. Right now, we have multiple representations of
the targets’ information because the Triple object cannot encode every
aspect of them, especially problematic between Clang and LLVM.

This part I agree with.

The decision to create a new class (Tuple) is because Triple already
has a legacy-heavy meaning, which should not change.

Agreed with at least the “because” part.

This is not about the serialization of the target information, but
rather the consistent manipulation of it inside the compiler.

My suggestion on a route forward here is that we should look at the
particular API and areas of the backend that you’re having an issue with and
figure out how to best communicate the data you’d like to the appropriate
area.

That is precisely what he’s doing. :)

OK. What does the general class design look like then? The text from the original mail was fairly confusing, as I thought he was doing something that you say he isn’t doing :)

-eric

This part doesn't seem obvious from the direction the patches are going.

Until now, most of what he has done was to refactor the Triple class,
with no functional change, and to create a thin layer around the
Triple (called Tuple) and pass those instead. This is in line with that
premise.

The current patch is the first one to actually have some more
substantial change, so it's good that you stopped it now, before we
start breaking everything.

Now that you know what it is meant to be, could you have another quick
look at the patch and see whether it makes more sense in that light?
If it's still not good enough, we'll have to go through a new round of
design discussions.

Definitely don't want this in the middle end at all. That all can be part of
the TargetMachine/TargetSubtargetInfo interface.

Ah, yes! But the TargetMachine (& pals) are created from information
from the Triple and the other bits that front-ends keep for
themselves.

So, in theory, if the Tuple is universal, creating them with a Tuple
(instead of a Triple+stuff) will free the front-ends of keeping the
rest of the info on their own, and TargetMachine/SubTargetInfo/etc
will be more homogeneous across different tools / front-ends than it
is today.

Another strong point is: we're not trying to change any other machine
description / API. This is just about the user options and defaults,
that are used to *create* machine descriptions.

The decision to create a new class (Tuple) is because Triple already
has a legacy-heavy meaning, which should not change.

Agreed with at least the "because" part.

There was also the name. Triple is very far from the truth. :)

But changing the Triple class could cause ripples in the mud that
would be hard to see at first, and hard to change later, after people
started relying on it.

The final goal is that the Triple class would end up being nothing
more than a Triple *parser*, with the current legacy logic, setting up
the Tuple fields and using them to select the rest of the default
fields.
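
As a very rough picture of "Triple as a parser", something like the following could fill in the Tuple fields. Everything here is hypothetical (TupleFields, parseTriple), it only knows the MIPS architecture names, and it skips the normalisation that real triples need:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical field set; a real class would carry far more than this.
struct TupleFields {
  std::string Arch, Vendor, OS, Env;
  bool BigEndian = true;
  unsigned PointerBits = 32;
};

// Legacy-style parsing: split the triple, then derive defaults from it.
// Assumes "arch-os-env" or "arch-vendor-os-env" and MIPS arch names only.
TupleFields parseTriple(const std::string &Triple) {
  std::vector<std::string> Parts;
  std::stringstream SS(Triple);
  for (std::string Part; std::getline(SS, Part, '-');)
    Parts.push_back(Part);
  assert((Parts.size() == 3 || Parts.size() == 4) && "unsupported form");

  TupleFields F;
  F.Arch = Parts[0];
  F.Vendor = Parts.size() == 4 ? Parts[1] : "unknown";
  F.OS = Parts[Parts.size() - 2];
  F.Env = Parts[Parts.size() - 1];

  // Defaults implied by the architecture name alone; later stages
  // (compiler flags, attributes) would be free to overwrite them.
  F.BigEndian = !(F.Arch == "mipsel" || F.Arch == "mips64el");
  F.PointerBits = F.Arch.rfind("mips64", 0) == 0 ? 64 : 32;
  return F;
}
```

After this step nothing else would need to look at the string again; the fields are the source of truth.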

OK. What does the general class design look like then? The text from the
original mail was fairly confusing, as I thought he was doing something
that you say he isn't doing :)

Daniel, can you send your current plan for the Tuple class?

cheers,
--renato

Definitely don’t want this in the middle end at all. That all can be part of
the TargetMachine/TargetSubtargetInfo interface.

Ah, yes! But the TargetMachine (& pals) are created from information
from the Triple and the other bits that front-ends keep for
themselves.

Yep.

So, in theory, if the Tuple is universal, creating them with a Tuple
(instead of a Triple+stuff) will free the front-ends of keeping the
rest of the info on their own, and TargetMachine/SubTargetInfo/etc
will be more homogeneous across different tools / front-ends than it
is today.

Another strong point is: we’re not trying to change any other machine
description / API. This is just about the user options and defaults,
that are used to create machine descriptions.

Agreed. This sounds like the direction I’ve been wanting to go for a bit with some target options being passed along at target machine creation time etc. It’s hard though :)

The decision to create a new class (Tuple) is because Triple already
has a legacy-heavy meaning, which should not change.

Agreed with at least the “because” part.

There was also the name. Triple is very far from the truth. :)

Oh I don’t know. It’s a triple :)

But changing the Triple class could cause ripples in the mud that
would be hard to see at first, and hard to change later, after people
started relying on it.

Agreed.

The final goal is that the Triple class would end up being nothing
more than a Triple parser, with the current legacy logic, setting up
the Tuple fields and using them to select the rest of the default
fields.

Hrm.

OK. What does the general class design look like then? The text from the
original mail was fairly confusing, as I thought he was doing something
that you say he isn’t doing :)

Daniel, can you send your current plan for the Tuple class?

Please. I don’t think we’re far off in goals, just perhaps implementation :)

Thanks!

-eric

Hi Eric,

Thanks for getting back to me on this.

I'm not sure I agree with the basic idea of using the target triple as a way of
encoding all of the pieces of target data as a string. I think in a number of
cases what we need to do is either open up an API to the back end to specify things,
or encode the information into the IR when it's different from the generic triple.
Ideally the triple will have enough information to do basic layout and anything
else can be either gotten from the IR or passed via an option.

(From the context, you might have meant 'tuple' where you've written 'triple'. I'm answering on the assumption that you meant 'triple'.)

The GNU triple is already used as a way of encoding a large amount of the target data in a string. Unfortunately, while this data is passed throughout LLVM, it isn't reliable because GNU triples are ambiguous and inconsistent. For example, in GCC toolchains mips-linux-gnu probably means a MIPS target on GNU/Linux, but anything beyond that (ISA revision, default ABI, multilib layout, etc.) is up to the person who built the toolchain and may change over time. Another example is that Debian's definition for i386-linux-gnu has been i486 and i586 at various points in time.
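
To illustrate the ambiguity, the sketch below treats the meaning of a triple as a lookup keyed on who built the toolchain as well as on the string itself. The function names and the distributor labels ("debian-lenny", "debian-recent", "upstream-gcc") are invented for this example; the CPU values are the ones from the examples in this thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical table: (distributor build, GNU triple) -> default CPU/ISA.
using Key = std::pair<std::string, std::string>;

const std::map<Key, std::string> &defaultCpuTable() {
  static const std::map<Key, std::string> Table = {
      {Key("debian-lenny", "i386-linux-gnu"), "i486"},
      {Key("debian-recent", "i386-linux-gnu"), "i586"},
      {Key("debian-recent", "mips-linux-gnu"), "mips2"},
      {Key("upstream-gcc", "mips-linux-gnu"), "mips1"},
  };
  return Table;
}

// Returns an empty string when the pair is unknown: the triple alone is
// not enough to answer the question.
std::string resolveDefaultCpu(const std::string &Distributor,
                              const std::string &Triple) {
  assert(!Triple.empty() && "need a triple to look up");
  auto It = defaultCpuTable().find(Key(Distributor, Triple));
  return It == defaultCpuTable().end() ? std::string() : It->second;
}
```

The same string gives different answers depending on the first half of the key, which is exactly the information the triple does not carry.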

The proposed TargetTuple is a direct replacement for the GNU triple and is intended to resolve this ambiguity and move away from a string-based implementation (we need to keep a string serialization though, see below). Essentially, I'm trying to push the ambiguity out of the internals and give the distributor control of how the ambiguity is resolved for their environment. Once that is done, we'll be able to rely on the TargetTuple for information about the target such as ABIs, architecture revisions, endianness, etc.

I agree that we should open up the API to specify the appropriate data, and that is something that TargetTuple will acquire during steps 4 and 7 of the plan (mostly step 7, where compiler/tool options begin mutating the target tuple). I don't agree with keeping the GNU triple around though, for two main reasons. The first is that most people believe that GNU triples accurately describe the target and there will be a strong temptation to inappropriately base logic on them. The second is that the meaning of the triple varies between toolchain builds and over time, and there is a significant potential for bugs where different parts of the toolchain use different meanings for the same GNU triple (due to rebuilding or switching toolchains, or moving objects from system to system). We ought to resolve the ambiguity once and then stick to that interpretation.

The string serialization I mentioned above is useful for LLVM-IR as part of a direct replacement for the 'target triple' statement. We could split this statement up into smaller pieces but the migration to target tuples is already difficult so I think it would be best to do a direct replacement first and redesign the IR statements later if we want to. The serialization is also useful for command line options on internal tools such as llc to give us precise control over our tests that the GNU triple can't deliver. This will be particularly important when distributors can apply their own disambiguations to GNU triples. The serialization may also be useful as part of a C API but I haven't given the C API much thought beyond preserving the current API.
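
Purely as an illustration of what "a string serialization" means here, the sketch below round-trips a few fields through a made-up key=value format. The format, the field set, and the names (TupleState, serialize, deserialize) are all invented for this example and are not what the patches use.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical subset of the state a tuple might need to serialize.
struct TupleState {
  std::string Arch, ABI, Endian;
};

// Invented format: comma-separated key=value pairs in a fixed order.
std::string serialize(const TupleState &T) {
  return "arch=" + T.Arch + ",abi=" + T.ABI + ",endian=" + T.Endian;
}

// Parses the invented format back; unknown keys are silently ignored.
TupleState deserialize(const std::string &Text) {
  TupleState T;
  std::stringstream SS(Text);
  for (std::string Field; std::getline(SS, Field, ',');) {
    std::string::size_type Eq = Field.find('=');
    assert(Eq != std::string::npos && "expected key=value");
    std::string Key = Field.substr(0, Eq);
    std::string Value = Field.substr(Eq + 1);
    if (Key == "arch")
      T.Arch = Value;
    else if (Key == "abi")
      T.ABI = Value;
    else if (Key == "endian")
      T.Endian = Value;
  }
  return T;
}
```

Unlike a GNU triple, a string like this says exactly which ABI and endianness a test wants, whatever defaults a distributor has configured.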

Hopefully, that helps clear up your concerns. Let me know if there's anything that still seems strange.

My suggestion on a route forward here is that we should look at the particular
API and areas of the backend that you're having an issue with and figure out
how to best communicate the data you'd like to the appropriate area. I realize
this probably seems a little vague and handwavy, but I don't know what areas
you've been having problems with lately. I'll absolutely help with this effort if
you need assistance or guidance in any way.

The MIPS specific problems are broad and varied. Some of the bigger ones are:
* Building clang on a 32-bit Debian install running on a 64-bit MIPS processor produces a compiler that cannot target the native system. The release packages work around this by 'cross-compiling' from the host triple to the target triple, which are different strings (mips-linux-gnu vs mips64-linux-gnu) but have the same meaning.
* It's not possible to produce a clang that can generate code for both 32-bit and 64-bit MIPS without one of them needing a -target option to change the GNU triple. This is because we based the logic on the triple and lack anything else to use.
* Various details (ELF headers, label prefixes, exception personality, JIT target, etc.) depend on the ABI and OS distribution rather than just 32-bit vs 64-bit.
* It's not possible to implement clang in a way that can support all of mips-linux-gnu's possible meanings. mips-mti-linux-gnu and mips-img-linux-gnu have the same problem to a lesser degree.