Generic LLVM IR -> Windows x64 argument passing issue in LLVM 11.0.0 and later

We (Reason Studios) make some use of “generic” LLVM IR code.

We have been using LLVM 7.0.0 for Mac and Windows/x64 since 2019, and are now updating to a current version. (The update was motivated by gaining support for arm64 on macOS, but also 7.0.0 is old now.)

After this update, some of our test cases don’t work on Windows/x64.

I have narrowed it down to “passing a small type as an argument using byval” which only seems to work properly for the first four arguments.

The details should be clear from this archive of files, which can be used to reproduce the issue (see details below).

I tested a number of LLVM stable releases and found that the last version where this works is LLVM 10.0.0.
Starting from LLVM 11.0.0 (all the way to 16.0.5), arguments five and later do not arrive intact.
So this type of test case stopped working at some point between LLVM 10.0.0 and 11.0.0.
The fact that the first four arguments work makes me think the x64 calling convention is somehow involved (as it treats the first four args differently).
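
To make the register/stack split concrete, here is a minimal sketch in Python of where the Windows x64 convention places integer or pointer arguments. The function name `win64_arg_location` and its parameters are made up for illustration. The register order (RCX, RDX, R8, R9) and the 32-byte shadow space come from the Microsoft x64 calling convention; floating-point and larger aggregate arguments are not modelled.

```python
INT_ARG_REGS = ("rcx", "rdx", "r8", "r9")  # first four integer/pointer args
SHADOW_SPACE = 32                          # bytes reserved by the caller
SLOT_SIZE = 8                              # every stack argument slot is 8 bytes


def win64_arg_location(index, at_callee_entry=False):
    """Return where the zero-based integer/pointer argument `index` lives.

    Offsets are relative to rsp just before the call instruction, or,
    with at_callee_entry=True, relative to rsp on entry to the callee
    (after the call has pushed the 8-byte return address).
    """
    if index < len(INT_ARG_REGS):
        return INT_ARG_REGS[index]
    offset = SHADOW_SPACE + (index - len(INT_ARG_REGS)) * SLOT_SIZE
    if at_callee_entry:
        offset += SLOT_SIZE  # account for the return address
    return f"[rsp+{offset}]"


print(win64_arg_location(3))                        # prints r9
print(win64_arg_location(4))                        # prints [rsp+32]
print(win64_arg_location(4, at_callee_entry=True))  # prints [rsp+40]
```

In this model the fifth argument (index 4) is the first one that goes through memory rather than a register, which matches the observation that only arguments five and later are affected.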

A few observations:

  • If I enable optimizations (-O1 or higher), no code is generated, as if the optimizer assumes the external function cannot be called (i.e. the checks can’t fail from the perspective of the IR).
  • If I compile the equivalent C++ code for Windows and ask clang to emit the IR, it does not use byval like this. So I guess mainstream use cases with LLVM as a C++ compiler for Windows/x64 will never go via such IR code. Unfortunately we are stuck with quite a bit of it…
  • The same test works fine on macOS/x64.

Trying to find an answer to these questions:

  • Is this IR code bad / unsupported somehow (at least on Windows), and why?
  • What might have changed that made this no longer work in LLVM 11 and later (some time after LLVM 10.0.0)?
  • Does anyone know a practical way to make it work?

Any other comments are also welcome.

File list:

  • Test.ll: LLVM IR test code.
  • Script to build a native executable for Windows.
  • testglue/glue1.cpp: Minimal test environment, to let the test code report if things are wrong.
  • src/Test.cpp: Equivalent C++ source for reference.
  • readme.txt: This text.

How to build and run the test:

  1. Install MSVC / MS build C++ tools for Desktop/x64 on a Windows system.
  2. Start a command prompt with that compiler environment activated for targeting x64 (shortcut such as “x64 Native Tools Command Prompt for …”)
  3. Run the script using Python 3.x, with whichever of these launchers is installed:
python -u --clang="C:\Program Files\LLVM\bin\clang.exe" --run-test
python3 -u --clang="C:\Program Files\LLVM\bin\clang.exe" --run-test
py -3 -u --clang="C:\Program Files\LLVM\bin\clang.exe" --run-test

The results vary when passing different versions of clang.exe as the --clang argument:

Test result if successful (LLVM 10.0.0 and earlier): No output, returns zero.

Test result on failure (LLVM 11.0.0 and later; I could not find a Windows installer for 10.0.1): Logs and returns non-zero.

Observed log to stdio on failure (LLVM 11.0.0 and later):

CC2023_FailAssertLog(): i5.f_ptr == np
Assertion failed: false, file testglue/glue1.cpp, line 6
...python traceback omitted...
subprocess.CalledProcessError: Command '['Test.exe']' returned non-zero exit status 3221226505.

Edit: Log snippet differs slightly in readme.txt - the test code was tweaked to be shorter, but is equivalent.
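
As a side note on the log above, the exit status is easier to recognise in hexadecimal. A small Python sketch (the helper name `as_ntstatus` is hypothetical):

```python
def as_ntstatus(exit_status):
    """Format a Windows process exit status as a 32-bit hex code."""
    return f"0x{exit_status & 0xFFFFFFFF:08X}"


print(as_ntstatus(3221226505))  # prints 0xC0000409
```

0xC0000409 appears to be the status Windows reports when a process terminates itself via fast-fail, which is what the MSVC runtime's abort() typically does after a failed assert. If that reading is right, the exit status is consistent with the assertion in testglue/glue1.cpp firing and is not a separate crash.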

I didn’t have time to look deeply at the IR yet, but I did run bisection on your test case.

(Just running

clang -target x86_64-pc-win32 -O0 \src\temp\llvm-IR-win64-issue\Test.ll \src\temp\llvm-IR-win64-issue\testglue\glue1.cpp && a.exe

did the trick).

Bisection points to:

commit ea85ff82c82687496453bc14c4ac60548a42d8f3
Author: Liu, Chen3 <>
Date:   Tue Jul 7 21:22:27 2020 +0800

    [X86] Fix a bug that when lowering byval argument

    When an argument has 'byval' attribute and should be
    passed on the stack according calling convention,
    a stack copy would be emitted twice. This will cause
    the real value will be put into stack where the pointer
    should be passed.

    Differential Revision:

I’d suggest trying to reduce your IR reproducer a bit more, to the point where the different behaviour is easy enough to see in the assembly. At that point it will hopefully become clear if it’s LLVM or the IR that’s incorrect.

Thanks for your input!

I was hoping it was a known issue or an already-reported bug that I had missed.
Will work on narrowing it down further, and look into the commit you found.

I tried the following:

  1. Download the .patch for ea85ff8…
  2. Apply it in reverse to the LLVM repo at llvm 16.0.5
  3. Rebuild llvm and run the test case with the resulting clang.exe

This makes the test in the original post pass, so you definitely found the right commit.
However this would of course also reintroduce the bug addressed by that commit, so it’s not a solution.
I am still unable to explain the issue in more detail than the OP.

You can try using bugpoint or llvm-reduce to reduce your IR example. You can also build clang before and after this commit and diff their output.
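
The before/after comparison could be scripted roughly as follows. This is a sketch, not a tested workflow: the helper names `emit_asm` and `diff_asm` are made up, and the flags are the ones from the bisection command above plus -S and -o - so that clang prints assembly to stdout.

```python
import difflib
import subprocess


def emit_asm(clang, ir_path):
    """Compile an .ll file to Windows x64 assembly text with the given clang."""
    result = subprocess.run(
        [clang, "-target", "x86_64-pc-win32", "-O0", "-S", "-o", "-", ir_path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def diff_asm(before, after):
    """Return a unified diff (list of lines) between two assembly listings."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before",
            tofile="after",
            lineterm="",
        )
    )
```

Usage would be along the lines of `diff_asm(emit_asm(clang_before, "Test.ll"), emit_asm(clang_after, "Test.ll"))`, where `clang_before` and `clang_after` are placeholder paths to the two locally built clang executables.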

Thanks for the report! If the LLVM function prototype and argument attributes match, LLVM should generate correct code, or be able to “shake hands with itself”, regardless of whatever ABI bug that patch was meant to solve. There should be a way to fix forward.

I suspect that LLVM is handling byval incorrectly on one side of the calling convention, either the call setup or the prologue argument reading. I think Clang never uses byval on Windows x86_64, so it’s not surprising that this is undertested.

I think what happened is that commit ea85ff8 (“[X86] Fix a bug that when lowering byval argument”, llvm/llvm-project on GitHub) changed the call side but not the callee side.

For example:

$ cat /tmp/a.ll
@x = external global i64
@y = external global i64

define i64 @foo() {
  %c = call i64 @bar(i64* byval(i64) @x, i64* byval(i64) @x, i64* byval(i64) @x, i64* byval(i64) @x, i64* byval(i64) @y)
  ret i64 %c
}

define i64 @bar(i64* byval(i64), i64* byval(i64), i64* byval(i64), i64* byval(i64), i64* byval(i64) %a) {
  %r = load i64, i64* %a
  ret i64 %r
}

$ build4/bin/llc -mtriple x86_64-pc-win32 -filetype=asm -O0 -o - /tmp/a.ll
        .def    @feat.00;
        .scl    3;
        .type   0;
        .globl  @feat.00
.set @feat.00, 0
        .file   "a.ll"
        .def    foo;
        .scl    2;
        .type   32;
        .globl  foo                             # -- Begin function foo
        .p2align        4, 0x90
foo:                                    # @foo
.seh_proc foo
# %bb.0:
        subq    $120, %rsp
        .seh_stackalloc 120
        movq    x(%rip), %rax
        movq    %rax, 112(%rsp)
        movq    x(%rip), %rax
        movq    %rax, 96(%rsp)
        movq    x(%rip), %rax
        movq    %rax, 80(%rsp)
        movq    x(%rip), %rax
        movq    %rax, 64(%rsp)
        movq    y(%rip), %rax
        movq    %rax, 48(%rsp)                <--- y is written to the stack here
        movq    %rsp, %rax
        leaq    48(%rsp), %rcx
        movq    %rcx, 32(%rax)                 <--- the address of y on the stack is written to the stack here
        leaq    112(%rsp), %rcx
        leaq    96(%rsp), %rdx
        leaq    80(%rsp), %r8
        leaq    64(%rsp), %r9
        callq   bar
        addq    $120, %rsp
                                        # -- End function
        .def    bar;
        .scl    2;
        .type   32;
        .globl  bar                             # -- Begin function bar
        .p2align        4, 0x90
bar:                                    # @bar
# %bb.0:
        leaq    40(%rsp), %rax
        movq    (%rax), %rax
                                        # -- End function

In bar we’re loading y’s address on the stack, but we never dereference it to get y.
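
The mismatch can be modelled in a few lines of Python. This is only an illustration of the assembly above, not of LLVM's implementation: memory is a dict keyed by offsets from the caller's %rsp, and all names are invented. Offset 32 in the caller corresponds to 40(%rsp) inside bar, because the call pushes an 8-byte return address.

```python
ARG5_SLOT = 32  # caller-relative; read as 40(%rsp) inside bar
COPY_SLOT = 48  # where foo stores its temporary copy of y


def caller_setup(stack, y_value):
    """What foo does for the fifth argument."""
    stack[COPY_SLOT] = y_value    # movq %rax, 48(%rsp)
    stack[ARG5_SLOT] = COPY_SLOT  # leaq 48(%rsp), %rcx ; movq %rcx, 32(%rax)


def callee_as_generated(stack):
    """What bar does: treat the argument slot itself as the i64."""
    return stack[ARG5_SLOT]       # leaq 40(%rsp), %rax ; movq (%rax), %rax


def callee_matching_caller(stack):
    """What bar would need to do to agree with foo: follow the pointer."""
    return stack[stack[ARG5_SLOT]]


stack = {}
caller_setup(stack, y_value=1234)
print(callee_as_generated(stack))     # prints 48 (the address, not y)
print(callee_matching_caller(stack))  # prints 1234
```

Either side could be changed to make the two agree; the point of the model is only that caller and callee currently disagree about whether the stack slot holds the value or a pointer to it.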

Taking a stab at fixing here: D153020 [X86] Fix callee side of receiving byval args on the stack

Thanks so much for the quick feedback and fix!