[llvm-lit] Old echo causes test failure on windows

Hi llvm-dev,

I wanted to bring discussion from D110986 to a wider audience.

After D110986 added a unicode character to the test, llvm\test\tools\llvm-cxxfilt\delimiters.test

started failing in our (Sony’s) downstream Windows CI build (and locally for me) because the

unicode character was being converted to ‘?’ at some stage in the test. The upstream Windows bots

seemed to have no such trouble. I added a workaround in D111072 that we landed because a Chromium

bot was running into the same issue.

Digging a little deeper it turns out that the echo that lit was picking up for me (`echo (GNU

coreutils) 5.3.0` from GnuWin32) converts unicode characters to ‘?’.

llvm-lit has its own built in echo which handles unicode without an issue. This is used in the workaround

D111072; lit’s echo is used iff the echo command isn’t used in a pipeline.

N.B. echo (GNU coreutils) 8.32 shipped with git for Windows also appears to handle unicode without

an issue.

I have a few questions:

  1. Is there a reason we cannot always use lit’s echo, as Hans suggests in D111072?

  2. If we can’t always use lit’s echo, maybe we could add an error/warning if a “bad” echo is

detected at build-config or test-running time?

Thanks,

Orlando

At a minimum, I think we should stop using (and recommending) the bit-rotting GnuWin32 tools, and instead use the tools that come with git-for-Windows. Lit currently checks for 5 tools (cmp, diff, echo, grep, sed) and all of these are in the git-Windows set. This will reduce the set of packages we depend on, even if it doesn’t reduce the set of actual tools we depend on.

–paulr

I assumed that when using the internal shell (non-optional under
Windows), lit would only use its implementation of shell-builtins [1].
If this is not the case yet, I'd vote to do so since I would
understand the option to replace the environment's shell. Shell's
generally prioritize their builtins over executables found in $PATH, I
think lit should do the same.

Michael

[1] https://github.com/llvm/llvm-project/blob/2ac199993764e068494a69a85af098c0ae1ff37e/llvm/utils/lit/lit/TestRunner.py#L624

I assumed that when using the internal shell (non-optional under
Windows), lit would only use its implementation of shell-builtins [1].
If this is not the case yet, I'd vote to do so since I would
understand the option to replace the environment's shell. Shell's
generally prioritize their builtins over executables found in $PATH, I
think lit should do the same.

It looks like that is the case for builtins that are not echo [2]. Presumably
the reason is related to the comment nearby which was added in D35093:

        # FIXME: Standardize on the builtin echo implementation. We can use a
        # temporary file to sidestep blocking pipe write issues.

That might be a clue to the question "why can't we always use lit's echo", but I
don't know what the required work is just looking at this comment.

cc Reid (author of D35093) in case you remember what needs to be done for
lit's echo to be used here?

Thanks,
Orlando

Michael

[1] https://urldefense.com/v3/https://github.com/llvm/llvm-project/blob/2ac199993764e068494a69a85af098c0ae1ff37e/llvm/utils/lit/lit/TestRunner.py*L624;Iw!!JmoZiZGBv3RvKRSx!p1AhbwjYZ8K0b1l4vzhjIlkFVpysqYbPGSLfYkiD4ou7gB-sXy64tQtNw80KvO-GEQ$

[2] https://github.com/llvm/llvm-project/blob/2ac199993764e068494a69a85af098c0ae1ff37e/llvm/utils/lit/lit/TestRunner.py#L680

That list that lit checks for looks out-of-date and insufficient these days. Off the top of my head, I know for example that we need od, cd, rm, and mkdir, and there are probably several others that are necessary.

cd and mkdir are builtin commands in cmd.exe, but I accept that the list is incomplete. I suppose really it’s a representative set rather than an exhaustive list, intended to verify that the place it points to is reasonable.

Re. my reference to bit-rotting GnuWin32 tools, it looks like references to GnuWin32 in llvm/docs/GettingStartedVS.rst were removed by https://reviews.llvm.org/D108513 in August, but it is still mentioned in llvm/docs/CMake.rst, that should be updated to cite the git-bash tools.

–paulr

https://reviews.llvm.org/D84380 was supposed to remove the dependency
on GnuWin32 and use git-for-Windows instead.

Only the Clang doc still refers to GnuWin32:
https://clang.llvm.org/get_started.html
The Clang getting_started, at least the how to build part, feels
redundant with LLVM's. Maybe it should link to LLVM's instead?

I myself am using MSYS2 (not from Git-for-Windows) with its bin
directory in %PATH%.

Michael

FWIW, my LLVM knowledge on Windows setup was based on the assumption of “use GNUWin32” because I think that’s what it was when I set up my machine for building LLVM. I guess a) documentation needs to be more specific, and b) if it wasn’t already announced, we need to make it clear on llvm-dev that Windows users should be using version X of the tools.