"Rotem, Nadav" <firstname.lastname@example.org> writes:
I agree that a single vector index is sufficient for many cases. Matt Pharr (from the ISPC compiler), showed me an interesting case where there is a single pointer into an array. In this case we need to have two indices, where the first index is zero. Once the basic patch is in, we can start looking at adding support for arrays and multiple indices.
Ah, I see. I forgot about the GEP zero offset case. This is indeed a
common case. In fact I would like to see a couple of additions:
%PV = getelementptr i32* %base, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 w, i32 x, i32 y, i32 z>
That is, make the base address a scalar and all vector indices get
scaled and added to it to produce the vector of pointers. This is the
case Matt gave you and is very common. It's used for striding through
an array or G/S where all of the elements are in the same array.
This avoids the need to broadcast the base address to a vector. We
could match the broadcast in codegen and revert it to an instruction
that takes a scalar base, but it's a bit trickier. We'd probably have
to make special target ISD nodes _a_la_ the X86 shuffle stuff. The
above would make this more target-independent. It is easy enough for a
target to lower the base address to a broadcast if necessary.
%PV = getelementptr i32* %base, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i32 %index, i32 %stride
This is for common strided accesses, for example:
x[i], x[i+2], x[i+4], x[i+6]
The stride component would be multiplied against an incremented index:
%PV = getelementptr i32* %base, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i32 %i, i32 %stride
=> <4 x i32 *><0+%i*%stride*bitwidth, 0+(%i+1)*%stride*bitwidth, 0+(%i+2)*%stride*bitwidth, 0+(%i+3)*%stride*bitwidth>
This avoids the need to generate a vector of indices that are a regular
strided pattern. Similar to the above, it simplifies matching on vector
architectures that directly support strided accesses.
I would really like to see these two versions of GEP added at some
point. They would simplify isel quite a bit.
Some other ideas:
An alternative is to introduce a cidx (compressed index) instruction:
%IV = cidx i32 %stride, i32 %vector_length
(if %vector_length is 4) => <4 x i32><%stride*0, %stride*1, %stride*2, %stride*3>
This generates the strided index pattern which could be fed into the GEP
as an index vector.
%IV = cidx i32 %i, i32 %stride, i32 %vector_length
(if %vector_length is 4) => <4 x i32><%stride*%i, %stride*(%i+1), %stride*(%i+2), %stride*(%i+3)>
You can see examples of these and other instructions from the Cray X1
2.5.4 is the section of interest for index generation.
In particular, the scan() and cmprss() (compress) instructions are also
very useful to generate index sets.
Note that the X1 index generation instructions also take a mask
(predicate) operand. The difference between cidx() and scan() is how
the mask and index are interpreted. scan() in particular is useful for
vector compress/expand operations. This is something else to think
about if you plan to tackle predication in the future.
These are just some additional ideas to think about as you continue to