"Rotem, Nadav" <nadav.rotem@intel.com> writes:

I agree that a single vector index is sufficient for many cases. Matt Pharr (from the ISPC compiler), showed me an interesting case where there is a single pointer into an array. In this case we need to have two indices, where the first index is zero. Once the basic patch is in, we can start looking at adding support for arrays and multiple indices.

Ah, I see. I forgot about the GEP zero offset case. This is indeed a

common case. In fact I would like to see a couple of additions:

%PV = getelementptr i32* %base, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 w, i32 x, i32 y, i32 z>

That is, make the base address a scalar and all vector indices get

scaled and added to it to produce the vector of pointers. This is the

case Matt gave you and is very common. It's used for striding through

an array or G/S where all of the elements are in the same array.

This avoids the need to broadcast the base address to a vector. We

could match the broadcast in codegen and revert it to an instruction

that takes a scalar base, but it's a bit trickier. We'd probably have

to make special target ISD nodes _a_la_ the X86 shuffle stuff. The

above would make this more target-independent. It is easy enough for a

target to lower the base address to a broadcast if necessary.

%PV = getelementptr i32* %base, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i32 %index, i32 %stride

This is for common strided accesses, for example:

x[i], x[i+2], x[i+4], x[i+6]

The stride component would be multiplied against an incremented index:

%PV = getelementptr i32* %base, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i32 %i, i32 %stride

=> <4 x i32 *><0+%i*%stride*bitwidth, 0+(%i+1)*%stride*bitwidth, 0+(%i+2)*%stride*bitwidth, 0+(%i+3)*%stride*bitwidth>

This avoids the need to generate a vector of indices that are a regular

strided pattern. Similar to the above, it simplifies matching on vector

architectures that directly support strided accesses.

I would really like to see these two versions of GEP added at some

point. They would simplify isel quite a bit.

Some other ideas:

An alternative is to introduce a cidx (compressed index) instruction:

%IV = cidx i32 %stride, i32 %vector_length

(if %vector_length is 4) => <4 x i32><%stride*0, %stride*1, %stride*2, %stride*3>

This generates the strided index pattern which could be fed into the GEP

as an index vector.

Or:

%IV = cidx i32 %i, i32 %stride, i32 %vector_length

(if %vector_length is 4) => <4 x i32><%stride*%i, %stride*(%i+1), %stride*(%i+2), %stride*(%i+3)>

You can see examples of these and other instructions from the Cray X1

here:

http://docs.cray.com/books/S-2314-51/S-2314-51-manual.pdf

2.5.4 is the section of interest for index generation.

In particular, the scan() and cmprss() (compress) instructions are also

very useful to generate index sets.

Note that the X1 index generation instructions also take a mask

(predicate) operand. The difference between cidx() and scan() is how

the mask and index are interpreted. scan() in particular is useful for

vector compress/expand operations. This is something else to think

about if you plan to tackle predication in the future.

These are just some additional ideas to think about as you continue to

design this.

-Dave