ordering of asm constraints in clang CodeGen

Hi,

This simple code (from bug 879), compiled on X86:

double
ldexp(double value, int exp)
{
        double temp, texp, temp2;
        texp = exp;

        __asm ("fscale " : "=u" (temp2), "=t" (temp) : "0" (texp), "1" (value));
        return (temp);
}

is handled by CGStmt.cpp:EmitAsmStmt(), which constructs the output
constraints (i.e. "=u" and "=t") simply from left to right. So this
particular example produces the output constraints:

      st(1), st
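The left-to-right construction described above can be illustrated with a small stand-alone sketch. This is not clang's actual code: the helper names x87_reg_for and build_output_list are hypothetical, and the mapping only covers the two x87 constraint letters used in this message ("t" for st, "u" for st(1)).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of clang): map an x87 constraint letter
   to the register name used in this message. */
static const char *x87_reg_for(const char *constraint)
{
    /* Skip the leading '=' output modifier if present. */
    if (constraint[0] == '=')
        constraint++;
    if (strcmp(constraint, "t") == 0)
        return "st";
    if (strcmp(constraint, "u") == 0)
        return "st(1)";
    return "?";   /* anything else is outside the scope of this sketch */
}

/* Hypothetical helper: join the registers in source order, left to right. */
static void build_output_list(const char *const *outs, size_t n,
                              char *buf, size_t buflen)
{
    assert(buflen > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            strncat(buf, ", ", buflen - strlen(buf) - 1);
        strncat(buf, x87_reg_for(outs[i]), buflen - strlen(buf) - 1);
    }
}
```

Feeding it the outputs in source order, { "=u", "=t" }, gives the list shown above, with st(1) ahead of st.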

According to X86FloatingPoint.cpp:handleSpecialFP():

  // FpGET_ST1 should occur right after a FpGET_ST0 for a call or inline asm.
  // The pattern we expect is:
  // CALL
  // FP1 = FpGET_ST0
  // FP4 = FpGET_ST1
  //
  // At this point, we've pushed FP1 on the top of stack, so it should be
  // present if it isn't dead. If it was dead, we already emitted a pop to
  // remove it from the stack and StackTop = 0.

(Note that st(0) is the same as st.)
So the expected order of constraints is FpGET_ST0 followed by FpGET_ST1, whereas
clang emits them in reverse. This leads to the following assertion being hit:

Assertion failed: (StackTop == 0 && "Stack should be empty after a call!"), function handleSpecialFP, file X86FloatingPoint.cpp, line 964.
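One way to check whether the output order is what matters is to write the same asm with the outputs swapped, so that "=t" comes before "=u". The sketch below does that and swaps the matching constraints "0" and "1" as well, so the values still land in the same registers. Whether this avoids the assertion is an assumption that nothing in this message confirms. The name ldexp_swapped and the non-x86 fallback are hypothetical additions so that the sketch builds anywhere.

```c
#include <assert.h>

/* Hypothetical name, chosen to avoid clashing with ldexp() from <math.h>. */
double ldexp_swapped(double value, int exp)
{
#if defined(__i386__) || defined(__x86_64__)
    double temp, temp2;
    double texp = exp;

    /* Outputs are now listed as st ("=t") first, then st(1) ("=u").
       The matching constraints are swapped too, so value is still
       loaded into st and texp into st(1), as in the original code. */
    __asm ("fscale" : "=t" (temp), "=u" (temp2) : "0" (value), "1" (texp));
    (void)temp2;    /* fscale leaves st(1) unchanged; the value is unused */
    return temp;
#else
    /* Portable fallback so the sketch still builds on non-x86 targets. */
    double r = value;
    for (; exp > 0; exp--)
        r *= 2.0;
    for (; exp < 0; exp++)
        r /= 2.0;
    return r;
#endif
}
```

Semantically this matches the original function: fscale scales st by two to the power of the truncated value in st(1), and the result is read back from st.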

Can someone shed some light on this? Is it true that LLVM expects this order?
If so, shouldn't clang emit the constraints in this order? I need a little more
information about the situation to be able to fix it.

Thank you!

Roman Divacky

Hi Roman,

As we discussed on IRC, this is really an LLVM backend issue. My memory is fuzzy here, but the last time I tried to tackle this, I ran into problems where we really wanted a single merged FpGET_ST0/1 instruction, produced by isel (I think), that the fpstackifier could deal with as one operation. If we don't have that, then other loads, stores, etc. can get in between the machine instructions.

-Chris