Haswell New Instructions

Nicolas_Capens · June 13, 2011, 11:41am

Hi all,

Intel has just revealed its AVX2 instruction set, to be supported by the 2013 Haswell architecture, and it’s looking quite revolutionary: http://software.intel.com/en-us/forums/showthread.php?t=83399&o=a&s=lr

It includes powerful ‘gather’ instructions, which allow reading multiple vector elements from non-contiguous memory locations. It also extends all integer vector instructions to 256-bit, and can shift vector elements by independent counts. This offers tremendous opportunity for auto-vectorizing loops, since practically every scalar operation will have a direct vector equivalent. It also facilitates implementing throughput computing languages like OpenCL.
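To make the two operations concrete, here is a minimal scalar model in C. It is only a sketch of the semantics described above, not the AVX2 intrinsics themselves: the function names, the fixed lane count of eight, and the choice to return 0 for shift counts of 32 or more are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LANES = 8 }; /* eight 32-bit lanes fill one 256-bit register */

/* Scalar model of a gather (hypothetical helper):
   lane i reads table[index[i]], so the loads need not be contiguous. */
static void gather_i32(int32_t out[LANES], const int32_t *table,
                       const int32_t index[LANES])
{
    assert(out && table && index);
    for (size_t i = 0; i < LANES; ++i)
        out[i] = table[index[i]];
}

/* Scalar model of a per-lane shift (hypothetical helper):
   every lane is shifted left by its own count. Counts of 32 or more
   produce 0 here, which is an assumption of this sketch and also keeps
   the C shift well defined. */
static void shift_left_var_u32(uint32_t out[LANES],
                               const uint32_t value[LANES],
                               const uint32_t count[LANES])
{
    assert(out && value && count);
    for (size_t i = 0; i < LANES; ++i)
        out[i] = count[i] < 32 ? value[i] << count[i] : 0u;
}
```

A vectorizer would turn a scalar loop whose body is `out[i] = table[index[i]]` into one such gather per eight iterations.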

So I was wondering whether in LLVM a gather operation is best represented with a ‘load’ instruction taking vector operands, or whether it’s better to define it as a separate ‘gather’ instruction. What would be the pros and cons of each approach, and what do you think should be the long-term goals for the LLVM instruction set?

Cheers,

Nicolas

Jose_Fonseca · June 13, 2011, 12:37pm

> Hi all,
>
> Intel has just revealed its AVX2 instruction set, to be supported by the 2013 Haswell architecture, and it’s looking quite revolutionary: http://software.intel.com/en-us/forums/showthread.php?t=83399&o=a&s=lr
>
> It includes powerful ‘gather’ instructions, which allow reading multiple vector elements from non-contiguous memory locations. It also extends all integer vector instructions to 256-bit, and can shift vector elements by independent counts. This offers tremendous opportunity for auto-vectorizing loops, since practically every scalar operation will have a direct vector equivalent. It also facilitates implementing throughput computing languages like OpenCL.
>
> So I was wondering whether in LLVM a gather operation is best represented with a ‘load’ instruction taking vector operands, or whether it’s better to define it as a separate ‘gather’ instruction. What would be the pros and cons of each approach, and what do you think should be the long-term goals for the LLVM instruction set?
>
> Cheers,
>
> Nicolas

David_A_Greene1 · June 15, 2011, 7:31pm

Jose Fonseca <jfonseca@vmware.com> writes:

> The important thing IMO, is to not represent the gather operation as
> an instruction which takes a vector of pointers, because that's too
> restrictive for architectures with 64-bit pointers.

How is it restrictive?

> What one most frequently wants to do in those architectures is to specify a
> 64-bit scalar base pointer with a vector of 32-bit offsets.

Or 64-bit offsets. We should not restrict offsets to 32 bits.

-Dave

David_A_Greene1 · June 15, 2011, 7:32pm

"Nicolas Capens" <nicolas.capens@gmail.com> writes:

> Hi all,
>
> Intel has just revealed its AVX2 instruction set, to be supported by the 2013 Haswell architecture, and it's looking quite
> revolutionary: http://software.intel.com/en-us/forums/showthread.php?t=83399&o=a&s=lr

Hooray!

But boo! No 64-bit integer multiply yet.

In any case, once I get the major AVX changes in, it should be almost
trivial to add HNI support. I've got some more patches ready to go in
ASAP. We're very close to sending "the big one" up.

-Dave

David_A_Greene1 · June 15, 2011, 9:40pm

greened@obbligato.org (David A. Greene) writes:

> Jose Fonseca <jfonseca@vmware.com> writes:
>
>> The important thing IMO, is to not represent the gather operation as
>> an instruction which takes a vector of pointers, because that's too
>> restrictive for architectures with 64-bit pointers.
>
> How is it restrictive?

Ah, I think you mean you don't want it ONLY to allow a vector of
pointers. I absolutely agree with this view.

>> What one most frequently wants to do in those architectures is to specify a
>> 64-bit scalar base pointer with a vector of 32-bit offsets.
>
> Or 64-bit offsets. We should not restrict offsets to 32 bits.

To reiterate, a base address + vector of indices gets my vote. If the
base happens to be zero and the indices happen to be pre-scaled pointer
values, so be it.
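As a rough illustration of why the base-plus-indices form also covers the vector-of-pointers case, here is a scalar sketch in C. The helper name, the four-lane width, and the use of byte offsets are assumptions for this example, not a proposed LLVM interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 4 };

/* Hypothetical helper: lane i loads a 32-bit value from base + offset[i].
   Offsets are in bytes, i.e. already scaled. They are 64 bits wide here;
   a 32-bit-offset variant would look the same with a narrower type. */
static void gather_base_offsets(int32_t out[LANES], uintptr_t base,
                                const int64_t offset[LANES])
{
    assert(out && offset);
    for (size_t i = 0; i < LANES; ++i) {
        uintptr_t addr = base + (uintptr_t)offset[i];
        memcpy(&out[i], (const void *)addr, sizeof out[i]);
    }
}
```

Passing an array's address as `base` with small offsets gives the common indexed case. Passing `0` as `base` and full pointer values converted to integers as the offsets gives the vector-of-pointers case with the same operation.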

This raises the question of whether indices get scaled by the vector
element type size. This would complicate the semantics of load, I
think, because getelementptr is really the instruction that does the
scaling. If we have a gather operation (whether a load with a vector of
indices or a special instruction) it seems that we will need some kind
of vector getelementptr as well.
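One way to picture that split is the following scalar sketch in C, where the scaling lives in a separate address-computation step and the gather itself only sees final addresses. Both helper names and the four-lane width are hypothetical; this models the idea, not actual LLVM IR semantics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LANES = 4 };

/* Step 1, the "vector getelementptr" (hypothetical helper):
   scale each index by the element size and add the scalar base. */
static void vector_gep(uintptr_t addr[LANES], const void *base,
                       const int64_t index[LANES], size_t elem_size)
{
    assert(addr && index && elem_size > 0);
    for (size_t i = 0; i < LANES; ++i)
        addr[i] = (uintptr_t)base + (uintptr_t)index[i] * elem_size;
}

/* Step 2, the gather (hypothetical helper):
   load one double per lane from a final address, with no scaling. */
static void gather_f64(double out[LANES], const uintptr_t addr[LANES])
{
    assert(out && addr);
    for (size_t i = 0; i < LANES; ++i)
        memcpy(&out[i], (const void *)addr[i], sizeof out[i]);
}
```

With this split the load keeps simple semantics, and the element-size multiplication stays where scalar code already performs it.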

-Dave

Jose_Fonseca · June 17, 2011, 5:26pm

greened@obbligato.org (David A. Greene) writes:

>> Jose Fonseca <jfonseca@vmware.com> writes:
>>
>>> The important thing IMO, is to not represent the gather operation as
>>> an instruction which takes a vector of pointers, because that's too
>>> restrictive for architectures with 64-bit pointers.
>>
>> How is it restrictive?
>
> Ah, I think you mean you don't want it ONLY to allow a vector of
> pointers. I absolutely agree with this view.
>
>>> What one most frequently wants to do in those architectures is to specify a
>>> 64-bit scalar base pointer with a vector of 32-bit offsets.
>>
>> Or 64-bit offsets. We should not restrict offsets to 32 bits.
>
> To reiterate, a base address + vector of indices gets my vote. If the
> base happens to be zero and the indices happen to be pre-scaled pointer
> values, so be it.

Precisely.

> This raises the question of whether indices get scaled by the vector
> element type size. This would complicate the semantics of load, I
> think, because getelementptr is really the instruction that does the
> scaling. If we have a gather operation (whether a load with a vector of
> indices or a special instruction) it seems that we will need some kind
> of vector getelementptr as well.
>
> -Dave

Good question.

At any rate, I agree with everybody here that we should start with intrinsics.

Jose
