invalid code generated on Windows x86_64 using skylake-specific features

I have this code, which works fine on macOS and Linux hosts:

const char *target_specific_cpu_args;
const char *target_specific_features;
if (g->is_native_target) {
    target_specific_cpu_args = ZigLLVMGetHostCPUName();
    target_specific_features = ZigLLVMGetNativeFeatures();
} else {
    target_specific_cpu_args = "";
    target_specific_features = "";
}

g->target_machine = LLVMCreateTargetMachine(target_ref, buf_ptr(&g->triple_str),
        target_specific_cpu_args, target_specific_features, opt_level, reloc_mode,
        LLVMCodeModelDefault);

char *ZigLLVMGetHostCPUName(void) {
    std::string str = sys::getHostCPUName();
    return strdup(str.c_str());
}

char *ZigLLVMGetNativeFeatures(void) {
    SubtargetFeatures features;

    StringMap<bool> host_features;
    if (sys::getHostCPUFeatures(host_features)) {
        for (auto &F : host_features)
            features.AddFeature(F.first(), F.second);
    }

    return strdup(features.getString().c_str());
}
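
For context, here is roughly how those pieces fit together end to end, as a minimal sketch using the LLVM-C API (error handling trimmed; create_native_target_machine is a made-up name, and the two Zig wrappers exist because this LLVM version's C API does not expose host CPU/feature queries):

/* Minimal sketch of the full setup, using the LLVM-C API plus the two
   wrappers above. */
#include <llvm-c/TargetMachine.h>

char *ZigLLVMGetHostCPUName(void);
char *ZigLLVMGetNativeFeatures(void);

LLVMTargetMachineRef create_native_target_machine(void) {
    LLVMInitializeNativeTarget();

    char *triple = LLVMGetDefaultTargetTriple();
    LLVMTargetRef target;
    char *error = NULL;
    if (LLVMGetTargetFromTriple(triple, &target, &error))
        return NULL;                      /* `error` holds the message */

    /* Passing "" for both CPU and features is the known-good fallback. */
    return LLVMCreateTargetMachine(target, triple,
            ZigLLVMGetHostCPUName(),      /* e.g. "skylake" */
            ZigLLVMGetNativeFeatures(),   /* the +/- feature string below */
            LLVMCodeGenLevelDefault, LLVMRelocDefault, LLVMCodeModelDefault);
}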

On the Windows laptop that I am testing on, I get these values:

target_specific_cpu_args: skylake

target_specific_features: +sse2,+cx16,-tbm,-avx512ifma,-avx512dq,-fma4,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes,+xsaves,-avx512er,-avx512vpopcntdq,-clwb,-avx512f,-clzero,-pku,+mmx,-lwp,-xop,+rdseed,-sse4a,-avx512bw,+clflushopt,+xsave,-avx512vl,-avx512cd,+avx,-rtm,+fma,+bmi,+rdrnd,-mwaitx,+sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,-prefetchwt1,+f16c,+ssse3,+sgx,+cmov,-avx512vbmi,+movbe,+xsaveopt,-sha,+adx,-avx512pf,+sse3

It successfully creates a binary, but when run, the binary crashes with:

Unhandled exception at 0x00007FF7C9913BA7 in test.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.

The disassembly of the crashed instruction is:

00007FF7C9913BA7 vmovdqa xmmword ptr [rbp-20h],xmm0
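
(vmovdqa is the alignment-checking form of the 16-byte vector store: it raises a general-protection fault, rather than a page fault, whenever its address operand is not 16-byte aligned, and a #GP carries no faulting address, which is consistent with the bogus 0xFFFFFFFFFFFFFFFF location above. A quick illustration with intrinsics; store128 is a made-up helper:)

#include <immintrin.h>
#include <stdint.h>

/* vmovdqa traps on any address that is not 16-byte aligned,
   while vmovdqu accepts anything. */
void store128(uint8_t *p, __m128i v) {
    if (((uintptr_t)p & 15) == 0)
        _mm_store_si128((__m128i *)p, v);   /* compiles to vmovdqa */
    else
        _mm_storeu_si128((__m128i *)p, v);  /* compiles to vmovdqu */
}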

There is no call stack or source in the MSVC debugger. The .pdb produced is exactly 64 KB. The file was linked with:

lld -NOLOGO -DEBUG -MACHINE:X64 /SUBSYSTEM:console -OUT:.\test.exe -NODEFAULTLIB -ENTRY:_start ./zig-cache/test.obj ./zig-cache/builtin.obj ./zig-cache/compiler_rt.obj ./zig-cache/kernel32.lib

When I change the call to LLVMCreateTargetMachine so that both target_specific_cpu_args and target_specific_features are the empty string, the produced binary is valid and runs successfully.

Is this an LLVM bug? Am I using the API incorrectly? Is there more information I can provide to the llvm-dev mailing list that would make it easier to help me?

I suspect that there are two issues here:

  • I have incorrect alignment somewhere
  • MSVC / .pdb / CodeView debugging is not working correctly

I think fixing the latter would help solve the former.

I will send out a new email later talking about the issues I’m having debugging llvm-generated binaries with MSVC.

Can you post test.obj somewhere, and maybe the LLVM IR if you can get it? If it really was reading address 0xFFFFFFFFFFFFFFFF, then RBP must have been completely corrupted, probably by the prologue.

I figured it out. I was using this implementation of __chkstk from compiler-rt:

DEFINE_COMPILERRT_FUNCTION(___chkstk)
        push   %rcx
        cmp    $0x1000,%rax
        lea    16(%rsp),%rcx    // rsp before calling this routine -> rcx
        jb     1f
2:
        sub    $0x1000,%rcx
        test   %rcx,(%rcx)
        sub    $0x1000,%rax
        cmp    $0x1000,%rax
        ja     2b
1:
        sub    %rax,%rcx
        test   %rcx,(%rcx)

        lea    8(%rsp),%rax     // load pointer to the return address into rax
        mov    %rcx,%rsp        // install the new top of stack pointer into rsp
        mov    -8(%rax),%rcx    // restore rcx
        push   (%rax)           // push return address onto the stack
        sub    %rsp,%rax        // restore the original value in rax
        ret
END_COMPILERRT_FUNCTION(___chkstk)

(source https://github.com/llvm-project/llvm-project-20170507/blob/release_50/compiler-rt/lib/builtins/x86_64/chkstk2.S)

When I replaced it with a simple ret, everything worked.
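
(For anyone reproducing the dependency: the Windows x64 backend emits a __chkstk call in the prologue of any function whose frame outgrows roughly one page, 4 KiB by default, so a sketch like this is enough to pull the symbol in. big_frame is a made-up name:)

/* A frame larger than one 4 KiB page forces a stack-probe call
   (__chkstk on MSVC-style targets) in the prologue. */
void big_frame(void) {
    volatile char buf[8192];   /* two pages */
    buf[0] = 1;
    buf[8191] = 2;
}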

The disassembled ntdll implementation is:

__chkstk:
1800a9f60: 48 83 ec 10                   subq   $16, %rsp
1800a9f64: 4c 89 14 24                   movq   %r10, (%rsp)        // save r10 and r11
1800a9f68: 4c 89 5c 24 08                movq   %r11, 8(%rsp)
1800a9f6d: 4d 33 db                      xorq   %r11, %r11
1800a9f70: 4c 8d 54 24 18                leaq   24(%rsp), %r10      // rsp at the call site
1800a9f75: 4c 2b d0                      subq   %rax, %r10          // prospective new bottom of frame
1800a9f78: 4d 0f 42 d3                   cmovbq %r11, %r10          // clamp to 0 on underflow
1800a9f7c: 65 4c 8b 1c 25 10 00 00 00    movq   %gs:16, %r11        // current stack limit from the TEB
1800a9f85: 4d 3b d3                      cmpq   %r11, %r10
1800a9f88: 73 15                         jae    21 <__chkstk+0x3F>  // already committed; done
1800a9f8a: 66 41 81 e2 00 f0             andw   $61440, %r10w       // page-align the target
1800a9f90: 4d 8d 9b 00 f0 ff ff          leaq   -4096(%r11), %r11   // step down one page
1800a9f97: 45 84 1b                      testb  (%r11), %r11b       // touch it
1800a9f9a: 4d 3b d3                      cmpq   %r11, %r10
1800a9f9d: 75 f1                         jne    -15 <__chkstk+0x30> // loop until the target page
1800a9f9f: 4c 8b 14 24                   movq   (%rsp), %r10        // restore r10 and r11
1800a9fa3: 4c 8b 5c 24 08                movq   8(%rsp), %r11
1800a9fa8: 48 83 c4 10                   addq   $16, %rsp
1800a9fac: c3                            retq                       // probe-only: rsp and registers preserved

I tried __chkstk_ms from compiler-rt, which has this definition:

DEFINE_COMPILERRT_FUNCTION(___chkstk_ms)
        push   %rcx
        push   %rax
        cmp    $0x1000,%rax
        lea    24(%rsp),%rcx
        jb     1f
2:
        sub    $0x1000,%rcx
        test   %rcx,(%rcx)
        sub    $0x1000,%rax
        cmp    $0x1000,%rax
        ja     2b
1:
        sub    %rax,%rcx
        test   %rcx,(%rcx)
        pop    %rax
        pop    %rcx
        ret
END_COMPILERRT_FUNCTION(___chkstk_ms)

except I called it __chkstk, since that’s the symbol that LLVM generated a dependency on. It passed all my tests, with optimizations on and off.
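
The key difference I can see is that this variant only probes: it restores rax and rcx and never installs a new stack pointer. In C terms it is doing roughly this (my conceptual sketch, not part of compiler-rt; probe_stack is a made-up name and the exact page-boundary handling is elided):

/* Touch one byte in every page between the old stack pointer and the
   new bottom of the frame, top-down, so each guard page is faulted in
   order. The stack pointer itself is never modified. */
static void probe_stack(char *sp, unsigned long long frame_size) {
    while (frame_size > 0x1000) {
        sp -= 0x1000;
        *(volatile char *)sp;            /* like `test %rcx,(%rcx)` */
        frame_size -= 0x1000;
    }
    *(volatile char *)(sp - frame_size); /* remainder */
}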

Can anyone shed some light on this?

The crashes are gone, but I’m still getting weird behavior with native CPU features turned on. Example:

const assert = @import("std").debug.assert;

test "f128" {
    if (make_f128(1.0) == 1.1) @panic("wrong");
}

fn make_f128(x: f128) -> f128 { x }

; Function Attrs: nobuiltin nounwind
define internal fastcc fp128 @make_f128(fp128) unnamed_addr #0 !dbg !357 {
Entry:
  %x = alloca fp128, align 16
  store fp128 %0, fp128* %x, align 16
  call void @llvm.dbg.declare(metadata fp128* %x, metadata !362, metadata !219), !dbg !363
  %1 = load fp128, fp128* %x, align 16, !dbg !364
  ret fp128 %1, !dbg !367
}

; Function Attrs: nobuiltin nounwind
define fastcc void @f128() #0 !dbg !312 {
Entry:
  %0 = call fastcc fp128 @make_f128(fp128 0xL00000000000000003FFF000000000000), !dbg !315
  %1 = fcmp fast oeq fp128 %0, 0xLA0000000000000003FFF199999999999, !dbg !317
  br i1 %1, label %Then, label %Else, !dbg !317

Then:                                             ; preds = %Entry
  call void @panic(%"u8"* bitcast ({ i8*, i64 }* @7 to %"u8"*)), !dbg !318
  unreachable, !dbg !318

Else:                                             ; preds = %Entry
  ret void, !dbg !319
}

This calls the panic function, even though these two f128 values clearly do not equal each other. When I revert to not using target-native features, the test passes.
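
(A standalone C analogue for reproducing this outside Zig, assuming Clang's __float128 support on x86_64; make_f128 mirrors the Zig function above, and the volatile keeps the compiler from folding the comparison away:)

/* Expect exit status 0; the miscompiled native-feature build would
   return 1, mirroring the Zig test taking the panic branch. */
__attribute__((noinline)) static __float128 make_f128(__float128 x) {
    return x;
}

int main(void) {
    volatile double one = 1.0;
    return make_f128(one) == 1.1;  /* 1.0 != 1.1, so this should be false */
}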