variadic functions on X86_64 should (conditionally) save XMM regs even if -no-implicit-float

Specifying -no-implicit-float prevents LLVM from using non-GPR registers for purely integer operations. This is useful for operating systems (such as Wind River's VxWorks) that support tasks that do not save all registers on context switch.

This presents an interesting problem for variadic functions that may optionally take non-integer arguments (e.g. printf style functions). Should non-GPR registers be spilled to the stack when -no-implicit-float is specified?

Ideally we would do so only if non-GPR register arguments are actually passed by the caller. This would require a runtime check, and whether or not such a check is feasible depends on the target and ABI.

However, for X86_64 the standard va_start code already has such a check: an upper bound on the number of vector (XMM) register arguments is passed in %al, and the code normally generated for variadic functions (in the absence of -no-implicit-float) includes a guard around the XMM spill code that checks for %al != 0.
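
For illustration, the guard can be modelled in plain C. This is only a sketch of the behaviour, not the code LLVM emits. It assumes the System V x86-64 register save area layout (six 8-byte GPR slots followed by eight 16-byte XMM slots); the names reg_file and model_va_prologue are made up for this example.

```c
#include <stdint.h>
#include <string.h>

enum { NUM_GPR_ARGS = 6, NUM_XMM_ARGS = 8 };

/* Models both the incoming argument registers and the register save
 * area: six 8-byte GPR slots followed by eight 16-byte XMM slots. */
struct reg_file {
    uint64_t gpr[NUM_GPR_ARGS];
    uint8_t  xmm[NUM_XMM_ARGS][16];
};

/* Hypothetical model of a variadic prologue. `al` stands in for the
 * value the caller placed in %al. Returns how many XMM registers the
 * prologue had to read. */
int model_va_prologue(struct reg_file *save_area,
                      const struct reg_file *incoming,
                      uint8_t al)
{
    /* The GPR argument registers are always spilled. */
    memcpy(save_area->gpr, incoming->gpr, sizeof save_area->gpr);

    /* Guard: with %al == 0 no XMM register is touched at all. */
    if (al == 0)
        return 0;

    /* Otherwise the XMM argument registers are spilled as well. */
    memcpy(save_area->xmm, incoming->xmm, sizeof save_area->xmm);
    return NUM_XMM_ARGS;
}
```

The point of the model is the early return: a task that never passes floating point varargs never executes the XMM copy, which is the property a no-FP task relies on.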

Therefore I believe it would be "in the spirit" of -no-implicit-float to remove the NoImplicitFloatOps check from the following:

X86ISelLowering.cpp : get64BitArgumentXMMs()
if (isSoftFloat || NoImplicitFloatOps || !Subtarget.hasSSE1())
    // Kernel mode asks for SSE to be disabled, so there are no XMM argument
    // registers.
    return None;

Does this seem like a reasonable idea?

Thanks,

Salim Nasser
Wind River

Hello

Apologies for hijacking an old thread, but it seems to have received no response.
It seems we have two problems here, and both of them are x86
specific. Here is a quick overview of the problem: noimplicitfloat
should in theory prevent the compiler from generating floating point
operations by itself. Currently it is used during optimization and
codegen as follows:
  - It inhibits some optimizations like vectorization
  - It is used to determine the optimal set of operations for things like
inlined memcpy / memset implementations

These features are common across the middle-end and across various backends
(x86, ARM, AArch64 and PowerPC included).

However, x86 has one important difference: noimplicitfloat actually
*changes the ABI*. Worse, it breaks varargs. Here is how: it is only
taken into account when receiving arguments. Essentially, if we
have a varargs function, it is forced to receive floating
point arguments on the stack. However, no changes are made on the caller
side: the floating point arguments are still passed in XMM registers.
Therefore, even if we compile both caller and callee with
-mno-implicit-float, we are still unable to pass floating point
arguments properly. This is clearly a bug.
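
A small check for this mismatch could look like the sketch below. The function name sum_doubles is made up for this example, and the claim that it misbehaves is an assumption based on the description above: built for x86-64 with -mno-implicit-float on an affected compiler, the callee would look for the doubles on the stack while the caller leaves them in XMM registers.

```c
#include <stdarg.h>

/* Hypothetical test function: adds up `count` variadic double arguments.
 * On x86-64 the caller passes the first eight doubles in %xmm0-%xmm7,
 * so the callee must spill those registers in va_start to find them. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);

    return total;
}
```

With a correct build, sum_doubles(3, 1.0, 2.0, 3.5) returns 6.5. Comparing that result with and without -mno-implicit-float is one way to see whether a given compiler is affected.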

I believe that we do indeed need to remove the noimplicitfloat case on the callee side:
0. This fixes the longstanding bug.
1. We definitely should not change the ABI, and the x86-64 ABI mandates that
we pass / receive floating point arguments in XMM registers.
2. There is no problem for kernel code with the SSE2 unit disabled: the
use on the callee side is guarded by an explicit check of the number of FP
arguments (again, prescribed by the ABI).
3. We have explicit checks for the no-SSE2 cases here and there, so
-mno-sse / -mkernel / -msoft-float are unaffected.
4. The use of floating point arguments should be considered an
"explicit use of FP" and therefore should be allowed by the spirit of
the -mno-implicit-float flag.
5. We have -mgeneral-regs-only these days, so the stricter cases
are properly recognized as well.

Any thoughts?

PS: This fixes PR36507 and the fix is in https://reviews.llvm.org/D62639

Thanks for reviving this topic!

Interestingly, we have essentially the same fix you mention below (https://reviews.llvm.org/D62639) as a local change in our Wind River version of LLVM. The reason we didn't try to push it upstream (and in fact have considered removing it) is due to an unfortunate side-effect which is either "expected" or a "bug" depending on your perspective.

The problem is that now, when optimization is disabled, the compiler will *unconditionally* access XMM registers in the prolog of varargs functions. This is *not* the usual code to spill floating point varargs arguments (which is correctly guarded by testing %al). Instead the compiler:

Unconditionally:
- spills the XMM argument registers

Conditionally:
- reloads those values
- stores them in the varargs area

In short, we end up with the same situation we have on AArch64, i.e. (depending on the optimization level) varargs functions compiled with -mno-implicit-float unconditionally use floating point instructions (despite the valiant test of %al).

Here's an example:

$ cat v.c
#include <stdarg.h>

void v(const char* fmt, ...)
{
        va_list va;
        va_start(va, fmt);
        va_end(va);

}

$ clang --target=x86_64 -mno-implicit-float -S v.c

$ cat v.s
...
v: # @v
        .cfi_startproc
# %bb.0:
        ...
        testb %al, %al <= check for floating point args
        movaps %xmm7, -224(%rbp) # 16-byte Spill <= this is *unconditional*
        movaps %xmm6, -240(%rbp) # 16-byte Spill
        ...
        movaps %xmm0, -336(%rbp) # 16-byte Spill <= this is *unconditional*
        je .LBB0_2 <= branch based on the previous testb %al, %al
# %bb.1:
        movaps -336(%rbp), %xmm0 # 16-byte Reload <= (conditional) reload the original value of XMM0
        movaps %xmm0, -160(%rbp) <= (conditional) store XMM0 to varargs area
        ...

Salim
Wind River

The problem is that now, when optimization is disabled, the compiler will *unconditionally* access XMM registers in the prolog of varargs functions. This is *not* the usual code to spill floating point varargs arguments (which is correctly guarded by testing %al). Instead the compiler:

I believe this is a bug that needs to be handled separately.

I think your reasoning makes sense, and we should make the XMM saving in llvm.va_start conditional.

In my experience, these XMM spills are what users crash on when they misalign their stack. But I don’t think that’s really relevant at all to this particular question relating to implicit FP codegen.
