Making GEP into vector illegal?

Hi,

I ran into a case where ScalarReplAggregates can not promote an array of vectors into registers, e..g,
     %a = alloca [8 x <2 x float>], align 8
         %arrayidx74597 = getelementptr [8 x <2 x float>]* %a, i32 0, i32 1, i32 0 ; <float*> [#uses=2]
  %tmp76 = load float* %arrayidx74597, align 8
         %arrayidx74598 = getelementptr [8 x <2 x float>]* %a, i32 0, i32 1, i32 1 ; <float*> [#uses=2]
  %tmp77 = load float* %arrayidx74598, align 4

Though we could change the algorithm to look through the vector index, it lead to an interesting question about if we should allow a getelementptr (GEP) into a vector.

A vector is not like an short array of elements. It is more of an entity in itself. When dealing with vectors, it seems to me that it is cleaner to think of them as an entity and GEP to the vector and then use extract element to access the element of the vector instead of using a GEP to get a pointer into a vector and loading it directly. If a client wants to do this, it would seem cleaner to force the client to bitcast it to an array and then do a GEP to the element. This would it more similar to what clients have to do to get to a pointer to a byte in a word today. For ScalarReplAggr, we would want to promote the vector itself and not to treat it as promoting the float elements.

I would like to make it illegal to GEP into a vector as I think it is cleaner and more consistent. Opinions? Comments?

   -- Mon Ping

I completely agree with you. Our current approach is similar to allowing GEP to index into the 3rd byte of an i32 for example.

-Chris

Hi Mon Ping,

I would like to make it illegal to GEP into a vector as I think it is
cleaner and more consistent. Opinions? Comments?

now that arrays are first class types, I think vectors should become
a subclass of ArrayType. This would get rid of a lot of duplicated
code, and also fix a bunch of problems with vectors (eg: that sizes
are wrong; currently we think that a <n x i1> has length n bits, and
that a vector <n x x86_fp80> has length 80*n bits, both of which are
untenable). From this point of view you have to allow GEP into a
vector; the conclusion I suppose is that codegen needs to replace
GEP+load or GEP+store with an extract or insert operation.

Ciao,

Duncan.

I agree that disallowing GEP from indexing into vector elements seems
a little cleaner and more consistent. I have used GEPs to index into
vectors once (in code not in the LLVM tree), but I'm pretty sure that
code could be rewritten to avoid needing that.

It's currently illegal to bitcast directly to or from an array type
though, so the workaround for people who need to index into vector
elements with a GEP is to bitcast the GEP's pointer operand.

Dan

Hi Mon Ping,

I would like to make it illegal to GEP into a vector as I think it is
cleaner and more consistent. Opinions? Comments?

now that arrays are first class types, I think vectors should become
a subclass of ArrayType. This would get rid of a lot of duplicated
code, and also fix a bunch of problems with vectors (eg: that sizes
are wrong; currently we think that a <n x i1> has length n bits, and
that a vector <n x x86_fp80> has length 80*n bits, both of which are
untenable).

I'm happy about factoring the code better, but a vectortype isn't an arraytype (isa<ArrayType>(V) should be false). Maybe a common base class (like sequential type) would be better?

From this point of view you have to allow GEP into a
vector; the conclusion I suppose is that codegen needs to replace
GEP+load or GEP+store with an extract or insert operation.

With that logic, there is no difference at all between an array and vector... I disagree very strongly about this.

-Chris

[...]

From this point of view you have to allow GEP into a
vector; the conclusion I suppose is that codegen needs to replace
GEP+load or GEP+store with an extract or insert operation.

With that logic, there is no difference at all between an array and
vector... I disagree very strongly about this.

Can you elaborate your arguments for that disagreement?

(I like the notion that a "vector type" is just a "fixed-length array type" for which I've hinted that the critical operations will be array-wise rather than element-wise.)

  Daveed

Hi,

Something like a sequential type makes sense especially in light of what Duncan is point out. I agree with Chris that a vector shouldn't be treated as a short array. Vectors have different alignment rules and operations. It make senses to talk about doing operations on vectors like add or speak of having a mask for a vector. I don't feel like that it make sense to talk of arrays in that context. Also, I don't think of looping over a vector in the same sense of an array. Also for me, a pointer to an 2nd vector element feels very similar to getting a pointer to a 2nd word of an 64 bit integer and less than a pointer to the 2nd element in an array. If we go toward treating the array model, theoritically one could use extract or insert for an array or we get rid of those operations, and have clients uses GEP to modify a vector element. Either of them seems wrong to me for vectors.

-- Mon Ping

Hi,

I didn't know that about bitcast. This becomes important if someone wants a pointer to a vector element as we would either need to support the bistcast or we just don't allow that. For most vector code that I'm aware of, no one really wants to take an address of vector element as by doing so, one is kind of scalarizing the vector which would be bad for performance. If someone has a case where this is useful, I would like to see it as it make help us make a better decision in this area.

   -- Mon Ping

In Joe programmer language (i.e. C :wink: ), are we basically talking about disallowing:

float4 a;
float* ptr_z = &a.z;

?

Won't programmers just resort to:

float4 a;
float* ptr_z = (float*)(&a) + 3;

?

In Joe programmer language (i.e. C :wink: ), are we basically talking
about disallowing:

float4 a;
float* ptr_z = &a.z;

?

That's my reading as well; the argument for not allowing it is just to
make optimization easier. We don't allow addressing individual bits
either, and compilers obviously have to work around that for
bitfields.

AFAIK, both clang and gcc currently don't allow constructs like "&a.z".

Won't programmers just resort to:

float4 a;
float* ptr_z = (float*)(&a) + 3;

?

Maybe... although note that with gcc vector intrinsics, this violates
strict aliasing. gcc does allow you to use a slightly more elaborate
workaround with a union, though.

-Eli

Hi Mon Ping,

I think I misunderstood your comment about bitcasts.

It's valid to bitcast from any pointer type to any pointer type
(in LLVM IR). That including pointers to arrays or pointers to
vectors, so I think what you're talking about should be fine.

I was just pointing out that it's not valid to bitcast actual
array values (in registers) to other types, or other values
directly to array type.

Dan

Hum what's your take on this then:

/* The Intel API is flexible enough that we must allow aliasing with other
    vector types, and their scalar components. */
/* APPLE LOCAL 4505813 */
typedef long long __m64 __attribute__ ((__vector_size__ (8), __may_alias__));

:slight_smile:

In Joe programmer language (i.e. C :wink: ), are we basically talking
about disallowing:

float4 a;
float* ptr_z = &a.z;

?

That's my reading as well; the argument for not allowing it is just to
make optimization easier. We don't allow addressing individual bits
either, and compilers obviously have to work around that for
bitfields.

AFAIK, both clang and gcc currently don't allow constructs like "&a.z".

Yes, that is what we are talking about and Eli is right that it is to make optimization easier.

Won't programmers just resort to:

float4 a;
float* ptr_z = (float*)(&a) + 3;

?

Maybe... although note that with gcc vector intrinsics, this violates
strict aliasing. gcc does allow you to use a slightly more elaborate
workaround with a union, though.

Sure, someone can write that. Of course, they are making assumptions on how the float4 is stored in memory. From the point of view of the IR, that is fine. We are not generating a GEP into a vector type but GEP from a float pointer. Someone may get worse performance though if someone writes code like that.

   -- Mon Ping

This discussion is specifically about how to model this stuff in the LLVM IR, not whether a specific language accepts it. As Eli mentions, no current llvm front-end supports that. However, even if there was one that did, we could model this even if the change happens, so nothing is lost.

-Chris

This is actually completely different AFAIK, this allows things like:

((float*)&myvec4)[2]

which is exactly what the proposal wants to continue supporting in the IR.

-Chris

Maybe... although note that with gcc vector intrinsics, this violates
strict aliasing. gcc does allow you to use a slightly more elaborate
workaround with a union, though.

Hum what's your take on this then:

/* The Intel API is flexible enough that we must allow aliasing with
other
  vector types, and their scalar components. */
/* APPLE LOCAL 4505813 */
typedef long long __m64 __attribute__ ((__vector_size__ (8),
__may_alias__));

This is actually completely different AFAIK,

That statement was that:

float4 a;
float* ptr_z = (float*)(&a) + 3;

``violates strict aliasing``

That assertion is wrong. The docs says:

@item may_alias
Accesses to objects with types with this attribute are not subjected to
type-based alias analysis, but are instead assumed to be able to alias
any other type of objects, just like the @code{char} type.

clearly, is _m64 is to be treated as char, then, all stores before it are complete and no loads can be moved before it, unless you can't tell that this happened. As for what it allows, it allows many things, including the original snippet that was said to violate strict aliasing. I can't fathom any other reading, what am I missing?

The may_alias you're pointing to was added relatively recently, a bit
before gcc 4.2 was released; note that it only affects stuff using the
_m64 type as opposed to defining a similar vector type.

-Eli

Aha, gotcha. I misunderstood the connection.

In any case, the important distinction I wanted to raise was that we're talking about how the IR *models a very specific case*, not *eliminating a capability* of the IR.

-Chris

Just for reference, my C example was for my own clarification.

I dived into LLVM having to write a TargetMachine and I've been keeping busy without having to learn much IR (yet). I was really trying to use C as a pseudo-IR.

I get that the idea is allowing IR to directly express the address of part of a vector complicates (prevents?) certain optimizations.

However, due to my own ignorance, I don't understand why.

My first thought was that a GEP to part of a vector shouldn't really pose any more complications (restrictions?) than a GEP of the vector as a whole. That's what I was trying to get at with my example.

Although I could see how this might complicate, say, mem2reg. I would think that if you don't index with the result of a GEP of the whole vector (or pass it to a function, or...), then moving it into a register is a "clean" operation. However, a GEP of part of the vector would immediately put you into a more complicated situation.

Or something like that...

Dan

Hi Chris,

I'm happy about factoring the code better, but a vectortype isn't an
arraytype (isa<ArrayType>(V) should be false). Maybe a common base
class (like sequential type) would be better?

currently anything you can do with an array you can do with a vector.
So from this functional viewpoint it would make sense to have a vector
be an array with more stuff (i.e. a subclass of ArrayType). However I
appreciate that there's a difference: a multi-element vector can be held
in a machine register, while that's not the case for an array. But then
again, why not place first-class arrays of appropriate size in such
machine registers? Also, at the IR level is there any practical advantage
to having vectors not be a kind of array - what does disallowing GEP win
you? Is this entirely a codegen issue?

By the way, deriving from SequentialType doesn't make much sense to me
because SequentialType only exists to unify the types on which you can
do GEP (at least that's my understanding) - but you want to disallow GEP
on vectors!

> From this point of view you have to allow GEP into a
> vector; the conclusion I suppose is that codegen needs to replace
> GEP+load or GEP+store with an extract or insert operation.

With that logic, there is no difference at all between an array and
vector... I disagree very strongly about this.

There is a difference of course: vectors can do more than arrays. That's
why VectorType would derive from ArrayType :slight_smile: I understand that probably
in your head they feel different. I say: change your head! :slight_smile:

Ciao,

Duncan.