SSE Scalar Convert Intrinsics

David_A_Greene · June 5, 2009, 3:51pm

I have a question about the SSE scalar convert intrinsics.

cvtsd2si is defined thusly:

def int_x86_sse2_cvtsd2si64 : GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
             Intrinsic<[llvm_i64_ty, llvm_v2f64_ty], [IntrNoMem]>;

This matches the signature of the GCC intrinsic. The fact that the GCC
intrinsic has a type mismatch on the input (vector rather than scalar)
is strange, but ok, we'll run with it.

Until this:

def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst),
                         (ins f128mem:$src),
                         "cvtsd2si\t{$src, $dst|$dst, $src}",
                         [(set GR32:$dst, (int_x86_sse2_cvtsd2si
                                           (load addr:$src)))]>;

Er, this makes us load a 128-bit quantity, which is almost certainly not
what we want.
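
To make the mismatch concrete, here is a minimal C sketch, assuming an x86 target with SSE2 and the standard <emmintrin.h> header. The helper name convert_low is made up for illustration. The intrinsic takes a whole two-element vector, but only element 0 takes part in the conversion, so whatever sits in the upper element should not affect the result.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: build a vector from two doubles and convert it.
 * _mm_set_pd takes the high element first, so element 0 is `low`. */
int convert_low(double low, double high)
{
    __m128d v = _mm_set_pd(high, low);
    return _mm_cvtsd_si32(v);  /* cvtsd2si: reads element 0 only */
}
```

Calling convert_low(3.0, 1e300) and convert_low(3.0, -7.0) should give the same answer, which is why a 64-bit memory operand is all the instruction needs.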

Do we need two intrinsics for these scalar converts, one to satisfy the
(arguably broken) GCC interface and one to really reflect the operation
as specified by the ISA?

-Dave

Dan_Gohman3 · June 5, 2009, 8:19pm

> I have a question about the SSE scalar convert intrinsics.
>
> cvtsd2si is defined thusly:
>
> def int_x86_sse2_cvtsd2si64 : GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
>              Intrinsic<[llvm_i64_ty, llvm_v2f64_ty], [IntrNoMem]>;
>
> This matches the signature of the GCC intrinsic. The fact that the GCC
> intrinsic has a type mismatch on the input (vector rather than scalar)
> is strange, but ok, we'll run with it.
>
> Until this:
>
> def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst),
>                          (ins f128mem:$src),
>                          "cvtsd2si\t{$src, $dst|$dst, $src}",
>                          [(set GR32:$dst, (int_x86_sse2_cvtsd2si
>                                            (load addr:$src)))]>;
>
> Er, this makes us load a 128-bit quantity, which is almost certainly not
> what we want.

Yes, that looks wrong, even if it ends up doing something that
ends up working.

> Do we need two intrinsics for these scalar converts, one to satisfy the
> (arguably broken) GCC interface and one to really reflect the operation
> as specified by the ISA?

That's what's done for most other instructions, unfortunately.
For cvtsd2si, there's currently no "normal" version in the tree,
but if you add one, it wouldn't be alone.

One thing we'd like to do at some point is have front-ends lower
intrinsics for scalar instructions into
extractelement+op+insertelement, so that we don't need two
versions of each of the instructions. Doing this for everything
will require some work to make sure that the extra insert/extract
operators don't incur unnecessary copying, but that's also
something we'd like to do regardless.
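
As a rough illustration of the extractelement+op+insertelement idea, here is a C sketch using addsd as the example, assuming SSE2 and <emmintrin.h>. The function names are hypothetical, and the C intrinsics only stand in for the IR operations named in the comments; this is not what any front-end actually emits.

```c
#include <emmintrin.h>

/* Intrinsic form: a single call that corresponds to addsd. */
__m128d add_low_intrinsic(__m128d a, __m128d b)
{
    return _mm_add_sd(a, b);
}

/* Lowered form: extract both low elements, do a plain scalar add,
 * then insert the sum back into element 0 of `a`. */
__m128d add_low_lowered(__m128d a, __m128d b)
{
    double x = _mm_cvtsd_f64(a);             /* like extractelement a, 0 */
    double y = _mm_cvtsd_f64(b);             /* like extractelement b, 0 */
    double sum = x + y;                      /* ordinary scalar add */
    return _mm_move_sd(a, _mm_set_sd(sum));  /* like insertelement a, sum, 0 */
}

/* Hypothetical check: returns 1 when both forms agree in both elements
 * (value comparison, so it is only meaningful for non-NaN inputs). */
int lowered_matches(double a0, double a1, double b0, double b1)
{
    __m128d a = _mm_set_pd(a1, a0);
    __m128d b = _mm_set_pd(b1, b0);
    __m128d eq = _mm_cmpeq_pd(add_low_intrinsic(a, b), add_low_lowered(a, b));
    return _mm_movemask_pd(eq) == 3;
}
```

The open question is whether the extract and insert steps disappear again during code generation, so that the lowered form costs no more than the single instruction.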

Dan

Eli_Friedman1 · June 5, 2009, 8:22pm

> def Int_CVTSD2SIrm : SDI<0x2D, MRMSrcMem, (outs GR32:$dst),
>                          (ins f128mem:$src),
>                          "cvtsd2si\t{$src, $dst|$dst, $src}",
>                          [(set GR32:$dst, (int_x86_sse2_cvtsd2si
>                                            (load addr:$src)))]>;
>
> Er, this makes us load a 128-bit quantity, which is almost certainly not
> what we want.

I agree, that doesn't look right.

> Do we need two intrinsics for these scalar converts, one to satisfy the
> (arguably broken) GCC interface and one to really reflect the operation
> as specified by the ISA?

We really need zero intrinsics... it's quite easy to map onto existing
LLVM instructions. See the definition of CVTSD2SIrm.

-Eli

Nate_Begeman1 · June 5, 2009, 8:33pm

Agreed!

Nate

David_A_Greene · June 5, 2009, 10:16pm

> > Do we need two intrinsics for these scalar converts, one to satisfy the
> > (arguably broken) GCC interface and one to really reflect the operation
> > as specified by the ISA?
>
> That's what's done for most other instructions, unfortunately.
> For cvtsd2si, there's currently no "normal" version in the tree,
> but if you add one, it wouldn't be alone.

Ok.

> One thing we'd like to do at some point is have front-ends lower
> intrinsics for scalar instructions into
> extractelement+op+insertelement, so that we don't need two
> versions of each of the instructions. Doing this for everything
> will require some work to make sure that the extra insert/extract
> operators don't incur unnecessary copying, but that's also
> something we'd like to do regardless.

So then how does one do a memop intrinsic? Does it mean we can't
match to the memop versions of instructions?

-Dave

David_A_Greene · June 5, 2009, 10:19pm

In some cases, yes. But not all of the X86 instructions are accessible
through LLVM IR. And sometimes we like the ability to have our frontend
lower to intrinsics so we know EXACTLY what code will come out the other
end.

And see my previous post about sint_to_fp with a memory operand not working
in TableGen ("TableGen Type Inference"). I'll be debugging that next week,
probably.

-Dave

Dan_Gohman3 · June 5, 2009, 10:41pm

Memory operands would be lowered to explicit loads and stores,
which would be pattern-matched into memop instructions by
instruction selection.
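
In C terms the same shape looks roughly like the sketch below, assuming SSE2 and <emmintrin.h>; the function names are made up. The source spells out a separate 64-bit load followed by the convert, and it is then up to the compiler whether the two get folded into one instruction with a memory operand.

```c
#include <emmintrin.h>

/* Explicit scalar load followed by the convert. */
int convert_from_memory(const double *p)
{
    __m128d v = _mm_load_sd(p);  /* loads 64 bits, zeroes the upper element */
    return _mm_cvtsd_si32(v);    /* cvtsd2si on element 0 */
}

/* Hypothetical wrapper: only a single double is addressable here,
 * so a 128-bit load from this address would read past the object. */
int convert_single_slot(double x)
{
    double slot = x;
    return convert_from_memory(&slot);
}
```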

Dan

Eli_Friedman1 · June 5, 2009, 10:48pm

Sorry, I was confusing it with CVTTSD2SIrm.
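
For anyone else tripping over the two mnemonics, here is a small C sketch of the difference, assuming SSE2, <emmintrin.h>, and the default MXCSR rounding mode (round to nearest even). The function names are hypothetical. cvttsd2si truncates toward zero, like a C cast or an fptosi, while cvtsd2si rounds according to the current rounding mode, so a plain cast cannot express it.

```c
#include <emmintrin.h>

/* cvtsd2si: rounds using the current MXCSR rounding mode. */
int convert_round(double x)
{
    return _mm_cvtsd_si32(_mm_set_sd(x));
}

/* cvttsd2si: always truncates toward zero, like (int)x. */
int convert_truncate(double x)
{
    return _mm_cvttsd_si32(_mm_set_sd(x));
}
```

With the default rounding mode, convert_round(2.7) gives 3 while convert_truncate(2.7) gives 2.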

-Eli

David_A_Greene · June 5, 2009, 11:05pm

Ok, that'd be cool.

-Dave

David_A_Greene · June 5, 2009, 11:06pm

It happens.

Stupid Intel mnemonics...

-Dave
