SSE Scalar Convert Intrinsics

I have a question about the SSE scalar convert intrinsics.

cvtsd2si is defined thusly:

  def int_x86_sse2_cvtsd2si64 : GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
              Intrinsic<[llvm_i64_ty, llvm_v2f64_ty], [IntrNoMem]>;

This matches the signature of the GCC intrinsic. The fact that the GCC
intrinsic has a type mismatch on the input (vector rather than scalar)
is strange, but ok, we'll run with it.

Until this:

def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst), (ins f128mem:$src),
                         "cvtsd2si\t{$src, $dst|$dst, $src}",
                         [(set GR32:$dst, (int_x86_sse2_cvtsd2si
                                           (load addr:$src)))]>;

Er, this makes us load a 128-bit quantity, which is almost certainly not
what we want.
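For illustration only (this is not from the original post): a minimal C sketch of the same mismatch as seen from the intrinsic side, assuming an x86 target with SSE2 enabled and the standard `<emmintrin.h>` header. `_mm_cvtsd_si32` takes a whole `__m128d`, but only the low double affects the result, and a value in memory can be fed to it through the 64-bit scalar load `_mm_load_sd`. The helper names `convert_low_lane` and `convert_from_memory` are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target with SSE2 */

/* The intrinsic takes a two-element vector, but only the low lane is
 * converted. The high lane is ignored. */
static int convert_low_lane(double low, double high) {
    __m128d v = _mm_set_pd(high, low);  /* note the argument order: (high, low) */
    return _mm_cvtsd_si32(v);
}

/* Converting a double that lives in memory needs only a 64-bit scalar
 * load (_mm_load_sd), not a 128-bit vector load. */
static int convert_from_memory(const double *p) {
    return _mm_cvtsd_si32(_mm_load_sd(p));
}
```

Calling `convert_low_lane(42.0, -7.0)` and `convert_low_lane(42.0, 1e300)` gives the same answer, because the high lane never participates in the conversion.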

Do we need two intrinsics for these scalar converts, one to satisfy the
(arguably broken) GCC interface and one to really reflect the operation
as specified by the ISA?

                                   -Dave

> I have a question about the SSE scalar convert intrinsics.
>
> cvtsd2si is defined thusly:
>
> def int_x86_sse2_cvtsd2si64 : GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
>              Intrinsic<[llvm_i64_ty, llvm_v2f64_ty], [IntrNoMem]>;
>
> This matches the signature of the GCC intrinsic. The fact that the GCC
> intrinsic has a type mismatch on the input (vector rather than scalar)
> is strange, but ok, we'll run with it.
>
> Until this:
>
> def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst), (ins f128mem:$src),
>                         "cvtsd2si\t{$src, $dst|$dst, $src}",
>                         [(set GR32:$dst, (int_x86_sse2_cvtsd2si
>                                           (load addr:$src)))]>;
>
> Er, this makes us load a 128-bit quantity, which is almost certainly not
> what we want.

Yes, that looks wrong, even if it happens to end up working.

> Do we need two intrinsics for these scalar converts, one to satisfy the
> (arguably broken) GCC interface and one to really reflect the operation
> as specified by the ISA?

That's what's done for most other instructions, unfortunately.
For cvtsd2si, there's currently no "normal" version in the tree,
but if you add one, it wouldn't be alone.

One thing we'd like to do at some point is have front-ends lower
intrinsics for scalar instructions into
extractelement+op+insertelement, so that we don't need two
versions of each of the instructions. Doing this for everything
will require some work to make sure that the extra insert/extract
operators don't incur unnecessary copying, but that's also
something we'd like to do regardless.
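For illustration only (not from the original thread): a rough C analogue of the extractelement+op+insertelement idea, assuming SSE2 and `<emmintrin.h>`. It uses a scalar add rather than a convert, because the add shows all three steps. The function names are hypothetical, and the sketch says nothing about how LLVM would actually implement the lowering.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target with SSE2 */

/* The dedicated scalar intrinsic: adds the low lanes and passes the high
 * lane of a through unchanged. */
static __m128d add_low_intrinsic(__m128d a, __m128d b) {
    return _mm_add_sd(a, b);
}

/* The same operation spelled as extract + ordinary scalar op + insert. */
static __m128d add_low_generic(__m128d a, __m128d b) {
    double lo = _mm_cvtsd_f64(a) + _mm_cvtsd_f64(b);  /* extract lane 0 of each, add */
    return _mm_move_sd(a, _mm_set_sd(lo));            /* insert into lane 0 of a */
}

/* Returns 1 when both lanes of x and y compare equal. */
static int same_lanes(__m128d x, __m128d y) {
    return _mm_movemask_pd(_mm_cmpeq_pd(x, y)) == 3;
}
```

If the two forms always agree, only the generic one needs to exist at the IR level, provided instruction selection can fold the extract and insert away so that no extra copies are emitted.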

Dan

> def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst), (ins f128mem:$src),
>                         "cvtsd2si\t{$src, $dst|$dst, $src}",
>                         [(set GR32:$dst, (int_x86_sse2_cvtsd2si
>                                           (load addr:$src)))]>;
>
> Er, this makes us load a 128-bit quantity, which is almost certainly not
> what we want.

I agree, that doesn't look right.

> Do we need two intrinsics for these scalar converts, one to satisfy the
> (arguably broken) GCC interface and one to really reflect the operation
> as specified by the ISA?

We really need zero intrinsics... it's quite easy to map onto existing
LLVM instructions. See the definition of CVTSD2SIrm.

-Eli

Agreed!

Nate

> Do we need two intrinsics for these scalar converts, one to satisfy the
> (arguably broken) GCC interface and one to really reflect the operation
> as specified by the ISA?

> That's what's done for most other instructions, unfortunately.
> For cvtsd2si, there's currently no "normal" version in the tree,
> but if you add one, it wouldn't be alone.

Ok.

> One thing we'd like to do at some point is have front-ends lower
> intrinsics for scalar instructions into
> extractelement+op+insertelement, so that we don't need two
> versions of each of the instructions. Doing this for everything
> will require some work to make sure that the extra insert/extract
> operators don't incur unnecessary copying, but that's also
> something we'd like to do regardless.

So then how does one do a memop intrinsic? Does it mean we can't
match to the memop versions of instructions?

                             -Dave

In some cases, yes. But not all of the X86 instructions are accessible
through LLVM IR. And sometimes we like the ability to have our frontend
lower to intrinsics so we know EXACTLY what code will come out the other
end.

And see my previous post about sint_to_fp with a memory operand not working
in TableGen ("TableGen Type Inference"). I'll be debugging that next week,
probably.

                               -Dave

Memory operands would be lowered to explicit loads and stores,
which would be pattern-matched into memop instructions by
instruction selection.

Dan

Like this one :) Sorry, I was confusing it with CVTTSD2SIrm.

-Eli
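For illustration only (not from the original thread): the two mnemonics differ by a single letter but not in behavior. As documented by Intel, `cvtsd2si` rounds according to the current MXCSR rounding mode, while `cvttsd2si` always truncates toward zero, which is what an ordinary C cast does. That is presumably why the truncating form maps onto a plain conversion and the rounding form does not. A minimal C sketch, assuming SSE2, `<emmintrin.h>`, and the default round-to-nearest-even mode; the helper names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target with SSE2 */

/* cvtsd2si: rounds using the current MXCSR rounding mode
 * (round-to-nearest-even unless the program has changed it). */
static int convert_rounding(double x) {
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* cvttsd2si: always truncates toward zero, like a C (int) cast. */
static int convert_truncating(double x) {
    return _mm_cvttsd_si32(_mm_set_sd(x));
}
```

Under the default rounding mode, `convert_rounding(2.7)` gives 3, while `convert_truncating(2.7)` gives 2, the same as `(int)2.7`.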

Ok, that'd be cool. :)

                             -Dave

It happens. :)

Stupid Intel mnemonics...

                             -Dave