RFC: Deprecate config.guess

Hi,

I was looking into fixing host detection (which is currently done using
config.guess) for my operating system and came across these old
reviews[1][2] with objections to updating it due to its license.

If we can't update config.guess any more, then I would like to propose that
we (eventually) remove it completely.

I've submitted this patch[3], which turns off config.guess by default
and tries to determine the host triple using the host compiler. Given
all the different combinations for triples, I don't know if there is
any better way to test this than to turn it off and wait for the bug
reports. My hope is that over time, we will be able to replace the
functionality from config.guess that users actually care about and
at some point we'll be able to remove config.guess completely.

Thanks,
Tom

[1] D55445 [cmake] Update config.guess to gnuconfig git 2018-12-07
[2] D99625 llvm/cmake/config.guess: update to current version
[3] D109837 cmake: Remove config.guess

I think this is a good change and one that we should really work on, but I’m not sure compiler front-ends are a drop-in replacement for all variations.

Perhaps add the option and make it ON by default for now, which will still work for all current platforms, and add the possibility of turning guessing OFF so that those who aren’t supported by our very old config.guess version can use the compiler.

Fixing the triple in the front-end isn’t really trivial: there are a number of replacements on the triple before it hits the middle-end, so this may actually create problems that aren’t easy to fix down the line.

Just like moving from automake to CMake, I think we need to do this slowly, changing buildbots until all are running the new format, and only then change the default behaviour.

cheers,
–renato

> I think this is a good change and one that we should really work on, but I'm not sure compiler front-ends are a drop-in replacement for all variations.

+1 :)

> Perhaps add the option and make it ON by default for now, which will still work for all current platforms, and add the possibility of turning guessing OFF and allowing those that aren't supported by our very old config.guess version to use the compiler.

I think config.guess can be OFF by default. Doing two stages probably
doesn't bring too much benefit.
The nature of such a change is that only when I flip the default,
users will actually notice.
But I have some reasons to think that switching now will actually be better. Read on.

For the patch (D109837 cmake: Remove config.guess), I suggested that we
can use {gcc,clang} -dumpmachine.
Users of alternative compilers have already contributed relevant logic
to llvm/cmake/modules/GetHostTriple.cmake,
and we should just focus on gcc/clang, which are more relevant for the
existing config.guess use cases.
I believe {gcc,clang} -dumpmachine is correct in more cases than config.guess.
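
As a rough illustration of the -dumpmachine idea (a Python sketch, not the CMake logic from D109837; the function names and the default compiler name "cc" are made up for this example), querying the host compiler could look like this:

```python
import subprocess


def normalize_triple(raw):
    """Return the first non-empty line of `-dumpmachine` output, or None."""
    for line in raw.splitlines():
        line = line.strip()
        if line:
            return line
    return None


def compiler_host_triple(compiler="cc"):
    """Ask a GCC- or Clang-compatible driver for its target triple.

    Returns None if the compiler cannot be run or exits with an error.
    `compiler` is whatever driver is on PATH (an assumption of this sketch).
    """
    try:
        result = subprocess.run(
            [compiler, "-dumpmachine"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return normalize_triple(result.stdout)
```

Calling compiler_host_triple("clang") or compiler_host_triple("gcc") returns whatever triple that driver was configured with, which is the point of the proposal: the answer comes from the compiler rather than from `uname`.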

When is config.guess wrong? Here are two examples that I can attest to:

1. I have an unfinished upgrade from FreeBSD 12.2 to FreeBSD 13.0.
The host compiler's `/usr/bin/clang -dumpmachine` output remains
x86_64-unknown-freebsd12.2 while config.guess starts to say
x86_64-unknown-freebsd13 because it just checks `uname -a`.

2. On musl-based Linux distributions, all architectures other than riscv*
get incorrect *-linux-gnu triplets.
So on Alpine Linux amd64, I need to set
-DLLVM_HOST_TRIPLE=x86_64-alpine-linux-musl
because config.guess says x86_64-unknown-linux-gnu, which is wrong.

I think this may be wrong for *-suse-* and *-redhat-* as well but I
don't have such machines at hand.
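
To make the two mismatches above concrete, here is a small Python sketch that reports which fields of two triples differ. It is only an illustration: the helper names are hypothetical, and it assumes the simple arch-vendor-os[-environment] layout, which does not cover every triple seen in practice.

```python
def split_triple(triple):
    """Split a triple into (arch, vendor, os, environment).

    Simplification: assumes arch-vendor-os[-environment]; the environment
    is "" when the triple has only three components.
    """
    parts = triple.split("-", 3)
    if len(parts) < 3:
        raise ValueError(f"unexpected triple: {triple!r}")
    arch, vendor, os_name = parts[:3]
    env = parts[3] if len(parts) == 4 else ""
    return arch, vendor, os_name, env


def triple_differences(guessed, from_compiler):
    """Return {field: (guessed_value, compiler_value)} for differing fields."""
    names = ("arch", "vendor", "os", "environment")
    pairs = zip(names, split_triple(guessed), split_triple(from_compiler))
    return {name: (g, c) for name, g, c in pairs if g != c}
```

For the Alpine case, comparing x86_64-unknown-linux-gnu with x86_64-alpine-linux-musl flags the vendor and environment fields; for the FreeBSD case, only the os field (freebsd13 versus freebsd12.2) differs.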

> I think config.guess can be OFF by default. Doing two stages probably
> doesn't bring too much benefit.
> The nature of such a change is that only when I flip the default,
> users will actually notice.

s/I/we/ :)

When GCC or Clang is unavailable and the compiler is not one that already
has encoded logic (see MSVC, z/OS),
the CMake code will give a nice warning that -DLLVM_HOST_TRIPLE=
should be specified explicitly.
This seems much better than config.guess potentially inferring a wrong triplet.
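
The fallback behaviour described here can be sketched as follows. This is a Python illustration, not the actual CMake code: the function and parameter names are hypothetical, and giving an explicitly supplied value precedence over the compiler's answer is an assumption of the sketch.

```python
import warnings


def resolve_host_triple(explicit=None, compiler_triple=None):
    """Pick a host triple without ever guessing.

    explicit: value supplied by the user (e.g. via -DLLVM_HOST_TRIPLE=).
    compiler_triple: what the host compiler reported, or None if unavailable.
    """
    if explicit:
        # Assumed precedence: a user-supplied value always wins.
        return explicit
    if compiler_triple:
        return compiler_triple
    # No reliable source: warn instead of inferring a possibly wrong triple.
    warnings.warn(
        "could not determine the host triple; "
        "specify -DLLVM_HOST_TRIPLE= explicitly"
    )
    return None
```

The design choice being illustrated is that returning nothing plus a warning is preferable to silently returning a triple that may be wrong.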

> I think this may be wrong for *-suse-* and *-redhat-* as well but I
> don't have such machines at hand.

config.guess is wrong on Red Hat systems (which is why I first started
looking into this).

-Tom

> The nature of such a change is that only when I flip the default,
> users will actually notice.

Getting a different triple may be worse than the wrong triple.

The wrong triple crashes something and is easy to identify. A different, but valid, triple may bring slight architectural changes down the line that are difficult to spot.

Changing the triple on the front end isn’t always trivial either.

There’s a whole dance of changing triples in the clang driver that sometimes defies logic. The command line options passed to the actual front-end depend on the triple and the path taken, which, in turn, can change code generation in unpredictable ways down the line.

> Users of alternative compilers have already contributed relevant logic
> to llvm/cmake/modules/GetHostTriple.cmake,
> and we should just focus on gcc/clang which are more relevant for the
> existing config.guess use cases.
> I believe {gcc,clang} -dumpmachine is correct in more cases than config.guess.

Not only that, but using the compiler driver to “predict” what the compiler front-end needs is the obvious thing to do.

I’m not against the change; I think we should have done this a long time ago. But I think we can give people some grace time to test on their side, especially downstream users and less popular platforms that still use clang/gcc to build LLVM.

Giving them a CMake flag to test out for a few weeks wouldn’t hurt before we turn it on by default.

cheers,
–renato

I'm glad we are on the same page for the direction, but I don't think
giving more grace time would be beneficial.

*-suse-*, *-redhat-*, and *-linux-musl triples are already incorrect.
This led to unnecessary riscv-* changes such as
D63497 (Add support for openSUSE RISC-V triple) and D74399 ([Driver][RISCV] Add RedHat Linux RISC-V triple).
For less popular platforms, config.guess has really caused more harm than
benefit.
Some of the less popular platforms (I happen to like exploring such
platforms sometimes) may need LLVM_DEFAULT_TARGET_TRIPLE
and LLVM_HOST_TRIPLE to cancel out the config.guess harm.
In D109837 (cmake: Remove config.guess), Tom mentioned that
llvm/cmake/modules/GetHostTriple.cmake can just notify the user about
fetching the latest config.guess.
This will give users of the less popular compilers (technically, llvm-project
doesn't even support such compilers; see the "Getting Started with the
LLVM System" documentation) a transition period.

We've been iterating on the patch and I've come to agree with Fāng-ruì
that it would be better to just remove config.guess. config.guess is not
used on Windows and our new detection code works with clang and gcc, so
that is going to cover a majority of the cases already. I don't see much
advantage to keeping config.guess around as a fallback when we are just
going to remove it at some point anyway.

-Tom