Does it make sense to upstream some MVTs?

Hi,

Our backend for Pixel Visual Core uses some MVTs that aren’t upstream. Does it make sense to upstream them? I figure that as architectures get wider, we’ll eventually have “all” possible combinations of widths and types, but on the other hand, having code in tree that isn’t used by any current backend isn’t great.

These are the MVTs that we have added:

16x16 element (2D SIMD) 1-bit predicate registers:
v256i1

16x16 element (2D SIMD) 16-bit registers:
v256i16

20x20 element (2D SIMD) 16-bit registers (we round up to v512 instead of v400):
v512i16

32-bit versions of the above 16-bit registers (to represent 32-bit accumulators for MAD instructions, and also to dual-issue “wide” instructions to the dual non-MAD ALUs in each lane):
v256i32
v512i32

For those interested in more details about Pixel Visual Core, the 6th edition of Hennessy and Patterson’s “Computer Architecture: A Quantitative Approach” http://a.co/et2K1xk has a section about it (Section 7.7, pp. 579-592). I’ll bring my copy to the next Bay Area LLVM Social if folks want to take a look.

– Sean Silva

Hi Sean,

I had to add ‘v16f16’ to our out-of-tree target, and this was primarily to allow me to express lowering for all the OpenCL types (well, except for the ‘v3T’ types).

The trend does seem to be towards larger bit-width SIMD registers, and as you say this will increase over time; but perhaps instead of using a discrete enumeration combined with additional entries in several switch statements, it might be better to rethink MVTs using templates so that they can be instantiated automatically as needed by a target. That might be one way of avoiding the choice between a sparse population of MVTs covering only the sum of all in-tree targets and, on the other hand, the bloat of expressing all possible combinations.

How does LLVM handle 2D vectors/matrices? I haven’t moved on to v6.0.0 yet, but so far as I can tell v5.0.x only abstracts 1D vectors: N-elements of M-bits, and having types like ‘v256i16’ is not quite the same as having support for let’s say ‘v16x16i16’. Having a high-level abstraction for reasoning about NxN-elements of M-bits would be really useful/cool, especially for exotic instructions with special register allocation requirements, and for classic nested loops such as convolutions.

MartinO

Hi all,

Progress in machine learning is giving rise to many cores designed for this task. They tend to have wide registers; I know of a core that operates on vector types of up to 2K bytes. Off-the-shelf support for wide vector types would facilitate compiler development in such cases, because adding new types is not merely a matter of a few lines in ValueTypes.td.

> How does LLVM handle 2D vectors/matrices?

We just use sufficiently wide 1D types. LLVM doesn't need to know anything
about the 2D nature of the underlying SIMD. It is only visible via a
restricted set of operations that we have intrinsics for.

-- Sean Silva

To expand on what Serge said, the vector units on high-end NVIDIA cards now process 4x4x4 matrix operations per instruction. Handling that now (I haven’t been keeping up on the NVPTX backend, so I’m not sure what has been done) may be better than later. Similarly, Intel has a neural-net chip coming up that will likely need matrix types, and there’s no obvious reason they couldn’t perform matrix operations on AVX-512 registers in a future ISA.

Cheers,