Should more vector [zs]extloads be legal for X86 SSE4.1?

Hi Chandler, all,

Why aren't the vector [zs]extloads introduced by SSE4.1/AVX2 declared
legal? Is it a simple oversight, or did I miss a deeper reason?

While cleaning up PMOV*X patterns, I stumbled upon this braindead testcase:

        %0 = load <8 x i8>* %src, align 1
        %1 = zext <8 x i8> %0 to <8 x i16>

turning into:

        pmovzxbw (%rsi), %xmm0
        pand <0xff,0xff,...>, %xmm0, %xmm0

v8i8 isn't legal, so the load became an anyext load from v8i8 to
v8i16, with the pand masking out the unwanted/zero bits.

In that example, if you declare zextloads from v8i8 legal, and add the
simple corresponding pattern, the pand isn't generated anymore, as
expected.

So, unless I'm missing something, shouldn't we declare them legal?
Insights much appreciated, thanks!

- Ahmed

Hi Chandler, all,

Why aren't the vector [zs]extloads introduced by SSE4.1/AVX2 declared
legal? Is it a simple oversight, or did I miss a deeper reason?

While hacking on this, I tried to make them legal, and failed. I don't
recall everything that went wrong though, and perhaps you'll have better
luck than I did.

While cleaning up PMOV*X patterns, I stumbled upon this braindead testcase:

        %0 = load <8 x i8>* %src, align 1
        %1 = zext <8 x i8> %0 to <8 x i16>

turning into:

        pmovzxbw (%rsi), %xmm0
        pand <0xff,0xff,...>, %xmm0, %xmm0

v8i8 isn't legal, so the load became an anyext load from v8i8 to
v8i16, with the pand masking out the unwanted/zero bits.

I've seen this too. It's horrible.

In that example, if you declare zextloads from v8i8 legal, and add the
simple corresponding pattern, the pand isn't generated anymore, as
expected.

Won't type legalization insist on legalizing the <8 x i8> type even though
we can do the extload? My memory of this is very dim. If this "just works"
as expected, then by all means, lets do it.

Speaking of which, I should actually go nuke the old shuffle lowering. Some
of my problems may have only been problems with it.

Hi Chandler, all,

Why aren't the vector [zs]extloads introduced by SSE4.1/AVX2 declared
legal? Is it a simple oversight, or did I miss a deeper reason?

While hacking on this, I tried to make them legal, and failed. I don't
recall everything that went wrong though, and perhaps you'll have better
luck than I did.

So I gave it a shot, but to avoid an ugly hack in X86, I tried to do
the right thing and changed the extload legalization logic to have a
2D array (memory->result types, not just memory), which just seems
like the saner thing to do:
http://reviews.llvm.org/D6532

The X86 stuff is much simpler:
http://reviews.llvm.org/D6533

While cleaning up PMOV*X patterns, I stumbled upon this braindead
testcase:

        %0 = load <8 x i8>* %src, align 1
        %1 = zext <8 x i8> %0 to <8 x i16>

turning into:

        pmovzxbw (%rsi), %xmm0
        pand <0xff,0xff,...>, %xmm0, %xmm0

v8i8 isn't legal, so the load became an anyext load from v8i8 to
v8i16, with the pand masking out the unwanted/zero bits.

I've seen this too. It's horrible.

In that example, if you declare zextloads from v8i8 legal, and add the
simple corresponding pattern, the pand isn't generated anymore, as
expected.

Won't type legalization insist on legalizing the <8 x i8> type even though
we can do the extload? My memory of this is very dim. If this "just works"
as expected, then by all means, lets do it.

From what I've seen, not if the illegal type is only an extload's (or

truncstore's) memory type: then, anything is OK.

- Ahmed

So I gave it a shot, but to avoid an ugly hack in X86, I tried to do
the right thing and changed the extload legalization logic to have a
2D array (memory->result types, not just memory), which just seems
like the saner thing to do:
http://reviews.llvm.org/D6532

I think the lack of this is what caused me to give up, so, awesome!

The X86 stuff is much simpler:
http://reviews.llvm.org/D6533

I'll try to take a look at these soon, but the first one should get eyes
other than mine on it (Tim? Jim? Others?).