[LLVM, llc] TypeLegalization, DAGCombining, vectors loading

Hi all. I have a question about the 'load' instruction.
When we promote
v2i5 = load <addr> ; <MemoryVT = v2i5>
to
v2i64 = load <addr> ; <MemoryVT = v2i5>

should we insert a vector shuffle that moves the second v2i5 element into the second v2i64 element?

Or does it depend on the target?

Thanks.

-Stepan.

Hi Stepan, it was never really decided how to represent v2i5 in memory
(bitpacked?), and the code generators just don't support it right now.

> When we promote
> v2i5 = load <addr> ; <MemoryVT = v2i5>
> to
> v2i64 = load <addr> ; <MemoryVT = v2i5>
>
> should we insert a vector shuffle that moves the second v2i5 element
> into the second v2i64 element?

This question doesn't make any sense to me. The operation should result
in the first i5 being in the low 5 bits of the first i64, and the second
i5 being in the low 5 bits of the second i64. That's the definition of
this extending load operation. Talking about shuffling only makes sense
in terms of a particular implementation of this operation, and I'm not
sure what you have in mind.

Ciao, Duncan.
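A minimal sketch of the semantics Duncan describes. Because the in-memory layout of v2i5 was never decided, this assumes (hypothetically) a bit-packed layout in which element 0 occupies bits 0-4 and element 1 bits 5-9 of a little-endian byte buffer. It zero-fills the upper lane bits, which is one valid choice for an extending load; the function name is illustrative only.

```python
# Hypothetical model of: v2i64 = load <addr> ; <MemoryVT = v2i5>
# ASSUMPTION: v2i5 is bit-packed, element i occupying bits [5*i, 5*i + 5)
# of a little-endian byte buffer. LLVM never settled on this layout.

ELT_BITS = 5   # element width in memory (from MemoryVT = v2i5)
NUM_ELTS = 2   # number of vector elements

def ext_load_v2i5(mem):
    """Return the two 64-bit result lanes as Python ints."""
    assert len(mem) * 8 >= ELT_BITS * NUM_ELTS, "buffer too small"
    packed = int.from_bytes(mem, "little")
    mask = (1 << ELT_BITS) - 1
    # Element i of memory goes straight into the low 5 bits of lane i.
    # Here the upper 59 bits are zero-filled; an any-extending load
    # would leave them unspecified.
    return [(packed >> (i * ELT_BITS)) & mask for i in range(NUM_ELTS)]
```

For example, `ext_load_v2i5(bytes([0x23, 0x01]))` returns `[3, 9]`. No shuffle appears: each element is shifted and masked directly into its own lane. Whether a particular target needs shuffles is a property of how it implements the operation, not of the operation's definition.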

Perhaps I misunderstood the purpose of MemoryVT? Should it be a type equal to the original vector type (e.g. v2i5), or the type of the memory area for this vector (e.g. v2i8)?

-Stepan.

Stepan Dyatkovskiy wrote:

Yes. It doesn't work properly. I also read your discussion in bug 1784: http://llvm.org/bugs/show_bug.cgi?id=1784
I found that type legalization, vector legalization, and DAGCombining currently assume implicitly that the element size of MemoryVT is a multiple of 8 bits. That's the main reason why v2i5 works incorrectly with load/store. But I can't determine exactly what MemoryVT means...

-Stepan.
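A hypothetical sketch of why a multiple-of-8-bits assumption breaks sub-byte elements: if per-element byte offsets are computed as index times (element bits / 8) using integer division, every i5 element lands at byte offset 0. Neither function below is taken from LLVM; they only model the arithmetic a legalizer does when it splits a vector access per element.

```python
# Hypothetical illustration of the "element size is a multiple of 8 bits"
# assumption; these helpers are not LLVM code.

def naive_elt_byte_offset(index, elt_bits):
    # Integer-divides the element size first: correct only when elt_bits
    # is a multiple of 8. For i5, 5 // 8 == 0, so every element is
    # addressed at byte offset 0.
    return index * (elt_bits // 8)

def packed_elt_position(index, elt_bits):
    # For a bit-packed layout the position must be tracked in bits.
    # Returns (byte offset, bit offset within that byte); a sub-byte
    # element may start in the middle of a byte.
    return divmod(index * elt_bits, 8)
```

With i32 elements both agree (element 1 is at byte 4, bit 0), but for i5 `naive_elt_byte_offset(1, 5)` is 0 while `packed_elt_position(1, 5)` is `(0, 5)`.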

Hi Stepan,

Stepan Dyatkovskiy wrote:
> But I can't determine exactly what MemoryVT means...

do you understand what it means in the non-vector case?

Ciao, Duncan.
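For the non-vector case, a minimal sketch of a sign-extending load whose MemoryVT is i5. It assumes that an i5 occupies its store size rounded up to whole bytes (one byte) with the value in the low 5 bits. The lowering is modelled as a byte load followed by a sign extension in register from bit 4, roughly what a byte load plus SIGN_EXTEND_INREG does. The function names are hypothetical.

```python
# Hypothetical model of a scalar: i32 = sextload <addr> ; <MemoryVT = i5>
# ASSUMPTION: the i5 occupies one byte in memory (its store size,
# ceil(5 / 8) = 1) and only the low 5 bits are meaningful.

MEM_BITS = 5  # MemoryVT = i5

def store_size_bytes(bits):
    # Round a bit width up to whole bytes.
    return (bits + 7) // 8

def sextload_i5(mem):
    # Step 1: load the full store size (one byte here).
    raw = int.from_bytes(mem[:store_size_bytes(MEM_BITS)], "little")
    # Step 2: keep only the low MEM_BITS bits; bits above them are ignored.
    val = raw & ((1 << MEM_BITS) - 1)
    # Step 3: sign-extend in register from bit MEM_BITS - 1.
    sign = 1 << (MEM_BITS - 1)
    return (val ^ sign) - sign
```

Here MemoryVT (i5) describes what sits in memory, while the result type (i32 in this sketch) is whatever legalization chose for the register.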

Duncan Sands wrote:
> do you understand what it means in the non-vector case?

I'm beginning to understand it now. It means the type that the value has in abstract VM memory. So this type should always be the original one (as defined in the .ll file), shouldn't it?

Please ignore my concurrent post :) Let's continue in this thread.

> do you understand what it means in the non-vector case?

I'm beginning to understand it now. It means the type that the value has in
abstract VM memory, doesn't it? The main question about MemoryVT is: should it always be the original type (as defined in the .ll file) or not?

As for vectors with elements smaller than 8 bits: this topic interests me, and I would like to work on it. What is the best place to discuss it, llvmdev or bug #1784 ("vectors of i1 and vectors x86 long double don't work")?

I tried to fix PR1784 multiple times. I have since had
some insights which have changed my mind.

<4 x i32> on a machine with <8 x i32> vectors misses out on
50% of the theoretical performance. <8 x i32> on a machine
with only <4 x i32> takes on unneeded code bloat and register
pressure. No amount of heroism in LegalizeTypes can change
this basic situation.
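Dan's arithmetic can be checked with a back-of-the-envelope sketch. It assumes one machine instruction per full-width register and that a short vector is simply widened and a long one split into machine-width pieces, ignoring any extra shuffles or spills. `lane_usage` is a hypothetical helper, not an LLVM API.

```python
def lane_usage(logical_lanes, machine_lanes):
    """Return (machine ops needed, fraction of machine lanes doing useful
    work) for one logical vector operation under the simple
    widen-or-split model described above."""
    ops = (logical_lanes + machine_lanes - 1) // machine_lanes  # ceiling division
    utilization = logical_lanes / (ops * machine_lanes)
    return ops, utilization
```

`lane_usage(4, 8)` gives `(1, 0.5)`, the 50% Dan mentions, while `lane_usage(8, 4)` gives `(2, 1.0)`: full lane usage, but twice the instructions and registers.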

The further you go, either in the conceptual distance
between code and target machine, or in diversity of target
machines, the worse the problem gets.

Also, all of the proposed solutions for fixing exotic
vector types have substantial downsides.

So in addition to asking "why doesn't <2 x i5> work?", it's
also useful to ask "who is producing <2 x i5> values, and
what am I expecting to get out of letting them do that?"

Dan

> The further you go, either in the conceptual distance
> between code and target machine, or in diversity of target
> machines, the worse the problem gets.

Yes. SelectionDAG seems very complex to me. I spent all of last week learning it, and only now am I beginning to understand how all its parts work together.

> Also, all of the proposed solutions for fixing exotic
> vector types have substantial downsides.
>
> So in addition to asking "why doesn't <2 x i5> work?", it's
> also useful to ask "who is producing <2 x i5> values, and
> what am I expecting to get out of letting them do that?"

<2 x i5> is probably not used anywhere, except maybe on 29-bit CPUs... But what about <N x i1> or <N x i4>? Since LLVM assumes byte addressing, there is a reason to extend such vectors to <N x i8>. This is a complex problem, though, and I have probably missed something. Still, I have some draft patches that fix the load+store problem for ARM and X86 by rounding up the size of the vector element.

-Stepan.
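A minimal sketch of the "round up the element size" idea and its memory cost compared with a bit-packed layout. The rounding rule (smallest power-of-two width of at least 8 bits) is an assumption, since the draft patches may round differently, and both function names are hypothetical.

```python
def rounded_elt_bits(elt_bits):
    # ASSUMED rule: smallest power of two that is >= elt_bits and >= 8.
    bits = 8
    while bits < elt_bits:
        bits *= 2
    return bits

def vector_bytes(num_elts, elt_bits, packed):
    # Bytes an <num_elts x iN> vector occupies in memory, either
    # bit-packed or with each element widened by rounded_elt_bits.
    if packed:
        total_bits = num_elts * elt_bits
    else:
        total_bits = num_elts * rounded_elt_bits(elt_bits)
    return (total_bits + 7) // 8
```

For <2 x i5> both layouts take 2 bytes, but <8 x i1> grows from 1 byte packed to 8 bytes once widened to <8 x i8>, which is the trade-off behind extending such vectors to byte-sized elements.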

Dan,

I completely agree with you. The vectorizer (or whoever generates this vector code) should be aware of the target instruction set and decide on the vectorization factor accordingly. When our vectorizer[1] decides on the vectorization factor, it takes into account the available instruction set, as well as the operations used in the program.
For example, AVX1 focuses on floating-point operations, so vectorizing integer code with VF=8 would generate suboptimal code, because the op legalizer would have to unpack/pack operations around each 'hole' in the instruction set.

Thanks,
Nadav

[1] Intel's OpenCL SDK Vectorizer
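A toy model of choosing the vectorization factor from the target's instruction set. It assumes a target description that records the widest natively supported vector width separately for floating-point and integer operations. The AVX1 entry reflects that 256-bit integer arithmetic only arrived with AVX2. The table and function are hypothetical and are not the interface of the vectorizer cited in [1].

```python
# Hypothetical target descriptions: widest natively supported vector
# width in bits, per operation class.
TARGETS = {
    "sse2": {"float": 128, "int": 128},
    "avx1": {"float": 256, "int": 128},  # no 256-bit integer arithmetic
    "avx2": {"float": 256, "int": 256},
}

def choose_vf(target, elt_bits, op_classes):
    """Pick a vectorization factor so that every operation class used in
    the loop body fits in a natively supported register width."""
    widths = TARGETS[target]
    usable = min(widths[c] for c in op_classes)
    return max(1, usable // elt_bits)
```

On "avx1", 32-bit float-only code gets VF=8, but as soon as integer operations appear the sketch drops to VF=4. That avoids the pack/unpack around each 'hole', at the cost of under-using the 256-bit floating-point units in mixed code.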

Hi Nadav,

for what it's worth I agree too. While I think support for <2 x i5> and
friends should be added some day, that's only so we can bask in the quiet
enjoyment of knowing that the code generators are "complete", not because
it is actually useful for anything.

Ciao, Duncan.