Updating config.guess, license change

We have a file llvm/cmake/config.guess, inherited from back when we had an Autoconf build system, which guesses the triple of the current system. As support for new targets gets added, this file needs updating to detect those new targets. In order to detect a particular target, I proposed updating it to the current upstream version in D99625.

When we had an Autoconf build system, this file contained a special license exception:

# As a special exception to the GNU General Public License, if you
# distribute this file as part of a program that contains a
# configuration script generated by Autoconf, you may include it under
# the same distribution terms that you use for the rest of that program.

In D16471, the Autoconf build system was removed, so this license exception no longer applied and the file became plain GPLv2. As we only ever call this script to read its output, rather than including the script in some way in the generated LLVM, this should not be a problem: the GPLv2 places no restrictions on how its output is used.

The current upstream version is no longer licensed under GPLv2; it is licensed under GPLv3, though the license exception remains.

My understanding is that we are on an older version mostly out of laziness, not because the GPLv3-licensed version is something we need to avoid: we are on the version we happened to use with autoconf, not the most recent GPLv2 version. Nonetheless, on review it was suggested to bring this change to the list; laziness does not rule out the license also potentially being a problem.

So, what is the best way to go here? I think we have three options:

1- Use the upstream version under GPLv3.
2- Use the upstream version along with a dummy configure script
    generated by Autoconf that just spits out an error saying Autoconf
    cannot be used to build LLVM.
3- Independently add support for new targets to the GPLv2 version of
    config.guess.

Option 1 has my preference. As we still only call the script and read its output, the GPLv3 should be no more of a problem than the GPLv2. Nonetheless, it would be good to have confirmation that this is okay.

Option 2 very much abuses the spirit of the config.guess license exception, so I would like to avoid that.

Option 3 is possible but requires duplicating work already done by upstream.

Thoughts? Comments?

Cheers,
Harald van Dijk

Is there a possibility to ask CMake for the host's triple?

Michael

There isn't, unfortunately: CMake does not use triples and does not provide it in a variable. It does provide us with predefined variables that provide information similar to what uname gives, which could be used to build the triple ourselves, but that would amount to rewriting config.guess in CMake.
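To make "build the triple ourselves" concrete, here is a minimal Python sketch of what such a mapping might look like. It is not what config.guess or GetHostTriple.cmake actually does: the function name and the lookup table are hypothetical, only Linux is covered, and the inputs are assumed to be uname-style strings comparable to what CMake exposes.

```python
# Hypothetical alias table: uname-style machine names -> triple architecture.
ARCH_ALIASES = {"amd64": "x86_64", "arm64": "aarch64"}

# Architectures for which the "pc" vendor is used on Linux in this sketch;
# everything else gets "unknown".
PC_ARCHES = {"x86_64", "i686"}


def guess_triple(system, machine):
    """Build a triple from uname-like values (illustrative, Linux only)."""
    arch = ARCH_ALIASES.get(machine.lower(), machine.lower())
    if system.lower() != "linux":
        # Every further host would need its own rules, checked on a real
        # system. That per-target knowledge is the expensive part.
        raise ValueError(f"no rule for {system}/{machine}")
    vendor = "pc" if arch in PC_ARCHES else "unknown"
    # Note: uname-style data alone cannot tell gnu from gnux32 or musl,
    # so this sketch always assumes "gnu".
    return f"{arch}-{vendor}-linux-gnu"
```

For example, guess_triple("Linux", "x86_64") gives x86_64-pc-linux-gnu. Every host outside the table raises an error, which is the gap that would have to be filled in one system at a time.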

You can construct the triple from the variables CMake provides, we already do this for some of the targets: https://github.com/llvm/llvm-project/blob/00d5f1ecccc6d8ece9ac6dd19e9ad807c8a60097/llvm/cmake/modules/GetHostTriple.cmake

I’d prefer expanding GetHostTriple.cmake until it covers all the targets we support. I don’t think it’d necessarily require reimplementing all of config.guess, since config.guess covers a lot of targets, most of which aren’t supported by LLVM.

You can construct the triple from the variables CMake provides, we already do this for some of the targets: https://github.com/llvm/llvm-project/blob/00d5f1ecccc6d8ece9ac6dd19e9ad807c8a60097/llvm/cmake/modules/GetHostTriple.cmake

"Some of the targets" used to be only Windows, where we do not necessarily have a way to run config.guess at all. That seems like a perfectly good reason to me to add an exception for.

I do not know why <https://reviews.llvm.org/D74256> then also added logic for AIX there. Either config.guess is buggy, or it is not.

If config.guess is buggy, it is also buggy for every single autoconf-based project there is and will need to be fixed there as well. There has been no relevant change to upstream config.guess for AIX that I can see, but I may have missed something.

If config.guess is not buggy, the change is wrong.

Am I missing something here, perhaps some scenario that LLVM supports but config.guess does not, that would make it correct to change that only in LLVM, and not in upstream config.guess?

I'd prefer expanding GetHostTriple.cmake until it covers all the targets we support. I don't think it'd necessarily require reimplementing all of config.guess, since config.guess covers a lot of targets, most of which aren't supported by LLVM.

Not all, sure, but I do think you're underestimating how much work it will be, at least for an outside contributor who cannot ask people who have access to relevant systems to run some quick tests. Getting an overview of all targets supported by LLVM, the values the relevant CMake variables will be set to, and the triples that config.guess figures out for those systems, that would still be a massive amount of work. Once we have that, writing it up in CMake should be fairly easy, sure, but that's not where the main work will be. All that just to get a result we can get without any extra effort already.

Cheers,
Harald van Dijk

You can construct the triple from the variables CMake provides, we
already do this for some of the targets:
https://github.com/llvm/llvm-project/blob/00d5f1ecccc6d8ece9ac6dd19e9ad807c8a60097/llvm/cmake/modules/GetHostTriple.cmake

“Some of the targets” used to be only Windows, where we do not
necessarily have a way to run config.guess at all. That seems like a
perfectly good reason to me to add an exception for.

I do not know why <https://reviews.llvm.org/D74256> then also added
logic for AIX there. Either config.guess is buggy, or it is not.

If config.guess is buggy, it is also buggy for every single
autoconf-based project there is and will need to be fixed there as well.
There has been no relevant change to upstream config.guess for AIX that
I can see, but I may have missed something.

If config.guess is not buggy, the change is wrong.

Am I missing something here, perhaps some scenario that LLVM
supports but config.guess does not, that would make it correct to change
that only in LLVM, and not in upstream config.guess?

LLVM uses triples to identify the host/target architecture (including the difference between 32-bit and 64-bit variants of the “same architecture”). It cannot be said that config.guess must do the same (that would depend on its users). That the selection, on AIX, between the 32-bit and 64-bit variants happens later for an autoconf-based build is entirely possible. As it is, we’re able to do the selection early in the CMake-based build only because we’re using a compiler that respects the OBJECT_MODE environment variable (which we also set up for Clang on AIX to do).

LLVM uses triples to identify the host/target architecture (including the difference between 32-bit and 64-bit variants of the "same architecture"). It cannot be said that config.guess must do the same (that would depend on its users).

It is intended, I believe, that config.guess does the same. That's why upstream config.guess can differentiate between x86_64-pc-linux-gnu and x86_64-pc-linux-gnux32 -- the reason I wanted to update it -- despite the fact that the only way to reliably detect the difference between the two is to look at what the currently selected compiler happens to do, exactly the same way the difference between powerpc-ibm-aix and powerpc64-ibm-aix is now detected in CMake in LLVM.
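To illustrate the kind of compiler-based check meant here, a minimal Python sketch follows. It is not the actual config.guess logic. It assumes the compiler's predefined macros are available as text in "#define NAME VALUE" form (for example from cc -dM -E), and that __ILP32__ on an x86-64 compiler indicates the x32 ABI. The function names are made up for the example.

```python
def parse_predefined_macros(dump):
    """Collect macro names from '#define NAME VALUE' lines."""
    names = set()
    for line in dump.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1])
    return names


def x86_64_linux_triple(macros):
    """Pick gnu vs. gnux32 from the compiler's predefined macros.

    Assumption: __ILP32__ together with __x86_64__ (or __amd64__)
    means the compiler targets the x32 ABI.
    """
    if not ({"__x86_64__", "__amd64__"} & macros):
        raise ValueError("compiler does not target x86-64")
    abi = "gnux32" if "__ILP32__" in macros else "gnu"
    return f"x86_64-pc-linux-{abi}"
```

The kernel and uname output are identical in both cases, so only the compiler's behaviour separates the two results.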

That the selection, on AIX, between the 32-bit and 64-bit variants happens later for an autoconf-based build is entirely possible. As it is, we're able to do the selection early in the CMake-based build only because we're using a compiler that respects the OBJECT_MODE environment variable (which we also set up for Clang on AIX to do).

Individual projects may add their own custom exceptions, but autoconf itself will not override the output of config.guess. If config.guess tells autoconf that the host is powerpc-ibm-aix, then configure will trust that that is correct. configure scripts that then check the host against powerpc-* vs. powerpc64-* will not receive a corrected host based on how the compiler behaves; it is config.guess's job to return the correct host.
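As a rough illustration of such a host check, here is a small Python sketch that mimics the glob matching a configure script might do in a shell case statement. The function name and the labels are hypothetical, and the sample triples are only examples.

```python
from fnmatch import fnmatchcase


def classify_host(host):
    """Match a host triple against glob patterns, first match wins."""
    for pattern, label in (("powerpc64-*", "64-bit"), ("powerpc-*", "32-bit")):
        if fnmatchcase(host, pattern):
            return label
    return "other"
```

If the guessed host is powerpc-ibm-aix, this takes the 32-bit branch no matter which mode the compiler is actually in; nothing later in the process corrects it.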

Perhaps so few projects contain powerpc-specific or powerpc64-specific logic in their configure scripts, or perhaps powerpc64-ibm-aix users are already so accustomed to specifying the host explicitly for configure scripts, that this problem goes largely unnoticed.

Cheers,
Harald van Dijk