Generating movq2dq using IRBuilder

Hi all,

How do I generate the movq2dq SSE2 instruction using the IRBuilder? There is no zext from 64-bit to 128-bit (corresponding to MMX to XMM register transfer) as far as I can tell. So I’ve tried inserting an i64 into a v2i64, which generates valid code, but it emits a number of stores and loads on the stack instead of a single movq2dq.

Looking through the code, I found a pattern for the instruction in X86GenDAGISEL.inc, but it describes an i64 to v2i64 bitcast (which isn’t allowed by IRBuilder). Also, it is described as MMX_MOVQ2DQrr and only checks for MMX support, while it’s really an SSE2 instruction.

Actually zext from 32 to 64 and 32 to 128 bit would also be useful, using movd and movq instructions. I couldn’t find ways to generate these instructions. I believe they should also be supported as intrinsics, so if anyone could check whether or not that works, and if so, how I could do it using the IRBuilder, that would be very much appreciated.
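
For reference, here is a minimal sketch of the zero-extending moves at the C++ level, assuming an x86-64 compiler that provides the SSE2 header emmintrin.h. It uses the compiler intrinsics _mm_cvtsi32_si128 and _mm_cvtsi64_si128 (which typically compile to movd and movq), not LLVM IR intrinsics, and the Lanes type and the function names are made up for illustration. On other targets it falls back to plain C++ that produces the same lanes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

#if defined(__SSE2__) && defined(__x86_64__)
#include <emmintrin.h>  // SSE2 compiler intrinsics
#define HAVE_SSE2_X64 1
#else
#define HAVE_SSE2_X64 0
#endif

// Hypothetical view of a 128-bit register as two 64-bit lanes;
// lane 0 is the low quadword.
using Lanes = std::array<std::uint64_t, 2>;

// 32 -> 128 bit zero extension: the value lands in the low 32 bits,
// every other bit of the result is zero.
inline Lanes zext32_to_128(std::uint32_t x) {
#if HAVE_SSE2_X64
    __m128i v = _mm_cvtsi32_si128(static_cast<int>(x));
    Lanes out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
#else
    return Lanes{{x, 0}};  // portable fallback with the same result
#endif
}

// 64 -> 128 bit zero extension: the value lands in the low quadword,
// the high quadword is zero.
inline Lanes zext64_to_128(std::uint64_t x) {
#if HAVE_SSE2_X64
    __m128i v = _mm_cvtsi64_si128(static_cast<long long>(x));
    Lanes out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
#else
    return Lanes{{x, 0}};  // portable fallback with the same result
#endif
}
```

With this, zext32_to_128(0xDEADBEEFu) has 0xDEADBEEF in lane 0 and zero in lane 1, which is the result I would like to get from a single instruction.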

Cheers,

Nicolas Capens

In the same breath I’d also like to kindly ask if someone could have a look at the reverse operations, namely trunc from 128 to 64 bit using movdq2q, and 128 to 32 and 64 to 32 using movd. This also seems related to Bug 2585. Thanks again.

The operations you're describing can be represented as insertelement
and extractelement in LLVM IR.
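
As a rough illustration of that mapping, here is a plain C++ model of the semantics, not LLVM API code. The Vec alias and the helper names are hypothetical. It assumes the scalar is inserted into lane 0 of an all-zero vector (zeroinitializer); inserting into undef would leave the other lanes unspecified, so it would not model a zero-extending move such as movq2dq.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an LLVM vector value such as <2 x i64>.
template <typename T, std::size_t N>
using Vec = std::array<T, N>;

// Models `insertelement <N x T> v, T x, i32 idx`:
// returns a copy of v with lane idx replaced by x.
template <typename T, std::size_t N>
Vec<T, N> insert_element(Vec<T, N> v, T x, std::size_t idx) {
    assert(idx < N);
    v[idx] = x;
    return v;
}

// Models `extractelement <N x T> v, i32 idx`: reads lane idx.
template <typename T, std::size_t N>
T extract_element(const Vec<T, N>& v, std::size_t idx) {
    assert(idx < N);
    return v[idx];
}

// i64 -> <2 x i64>, upper lane zero (the 64 to 128 bit "zext").
inline Vec<std::uint64_t, 2> zext_i64_to_v2i64(std::uint64_t x) {
    return insert_element(Vec<std::uint64_t, 2>{}, x, 0);
}

// <2 x i64> -> i64, keeping the low lane (the 128 to 64 bit "trunc").
inline std::uint64_t trunc_v2i64_to_i64(const Vec<std::uint64_t, 2>& v) {
    return extract_element(v, 0);
}

// i32 -> <4 x i32>, upper three lanes zero (the 32 to 128 bit "zext").
inline Vec<std::uint32_t, 4> zext_i32_to_v4i32(std::uint32_t x) {
    return insert_element(Vec<std::uint32_t, 4>{}, x, 0);
}

// <4 x i32> -> i32, keeping the low lane (the 128 to 32 bit "trunc").
inline std::uint32_t trunc_v4i32_to_i32(const Vec<std::uint32_t, 4>& v) {
    return extract_element(v, 0);
}
```

Whether the backend then selects a single movq2dq, movq or movd for these patterns is a separate question from how they are written in the IR.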

I don't know of anyone actively working on MMX tuning for LLVM, so
if you'd like to see it improve, consider yourself encouraged to
get involved directly :-).

Dan

I noticed that, when doing operations on 64-bit vectors, MMX instructions are often emitted even when SSE3 is available. Is this really the intent or is it just that SSE versions of certain patterns have not been added, and therefore it falls back to MMX versions? It's not really encouraged to use MMX (or x87 for that matter) on modern microarchitectures if you can get away with SSE.

Just off the top of my head, I'd say that the pattern probably hasn't
been added. You're right that we should use SSE whenever available.
Could you send an example of a program that's using MMX when the
equivalent SSE instruction is available?

-bw

Hi Dan,

Yes, they could be represented with insertelement and extractelement, but I
don't think they actually generate optimal code using movq2dq and such. Else
both bugs 2584 and 2585 would be fixed.

Anyway, I'm actually already encouraged to get involved myself. I'm quite
experienced with MMX and SSE but I'm still trying to learn more about how
LLVM does instruction selection and such.

By the way, I noticed that movq2dq and such are missing from the intrinsics
as well. Maybe I could make myself useful by starting to add them? Do you
know whether http://llvm.org/docs/ExtendingLLVM.html#intrinsic is still a
good description on how to get started?

Thank you,

Nicolas

Hi Stefanus,

I'm not sure using MMX instructions when doing operations on 64-bit vectors is
so terrible. With x86-64 you have double the registers, but it comes at the
cost of longer instruction encodings. So there's probably no benefit using
SSE. Or am I missing something?

Cheers,

Nicolas

The main reason is that, if you're also using SSE to do 128-bit operations (which is not unlikely), and you're moving data back and forth between 64-bit and 128-bit work, you need to insert moves between XMM and MMX registers.

Furthermore, it's another set of registers that may need to be restored by the function. This depends on your ABI and what exactly you're doing.

Stefanus

> Hi Dan,
>
> Yes, they could be represented with insertelement and extractelement, but I
> don't think they actually generate optimal code using movq2dq and such. Else
> both bugs 2584 and 2585 would be fixed.
>
> Anyway, I'm actually already encouraged to get involved myself. I'm quite
> experienced with MMX and SSE but I'm still trying to learn more about how
> LLVM does instruction selection and such.

Nice. Perhaps Dan's talk at the Developer Meeting was helpful. :-)

> By the way, I noticed that movq2dq and such are missing from the intrinsics
> as well. Maybe I could make myself useful by starting to add them? Do you
> know whether http://llvm.org/docs/ExtendingLLVM.html#intrinsic is still a
> good description on how to get started?

No. We don't need intrinsics for most of the vector instructions. They can and should be lowered into the right combination of vector_shuffle, extractelement, etc. The right way to go about this is to figure out why x86 isn't doing what's expected. That probably involves fixing 2584 / 2585 and perhaps more.

We usually handle SSE selections very well. Unfortunately MMX has not received much love. Perhaps you can take that lead? :-)

Evan