LLVMdev Digest, Vol 80, Issue 13

Andrew,
your response highlights a naming problem in LLVM, which is that "array" and "vector" mean the same thing in normal computer language and compiler theory usage, so it is inconvenient and misleading within LLVM to give one a very specific meaning that is different from the other....

To the LLVM developers I would suggest using the term "packed" to refer to the type of data that the SPARC VIS, PPC AltiVec, and Intel MMX/SSE instruction sets (among others) support.

As far as I am aware, not one of the above instruction sets allows "subscripting" of packed data within a register. (The MasPar computer had hardware that allowed subscripting of sub-elements within a larger/wider register, but it was the exception, not the rule, and it did not support any of the saturating arithmetic that is part and parcel of the packed data types in the currently existing "multi-media instruction sets".)

sincerely,
Peter Lawrence.

Agreed, I too was wondering why we need both arrays and vectors. It goes against the grain, I think, of the structural typing system used by LLVM. For example, a vector of 4 floats and an array of 4 floats are structurally the same type. Would it be feasible in the future to consolidate the two types by allowing "vector" operations (add, multiply, etc.) on arrays where it makes sense, and doing away with the specialized vector types?
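
For concreteness, here is a minimal sketch in LLVM IR of the asymmetry I mean (the function names are hypothetical): fadd is defined on the vector type, while the array type has to be taken apart and reassembled element by element.

  ; hypothetical functions: <4 x float> supports whole-value arithmetic,
  ; [4 x float] does not.
  define <4 x float> @add_vec(<4 x float> %a, <4 x float> %b) {
    %sum = fadd <4 x float> %a, %b          ; one element-wise add
    ret <4 x float> %sum
  }

  define [4 x float] @add_arr([4 x float] %a, [4 x float] %b) {
    ; no fadd on arrays: each element is extracted, added, and re-inserted
    %a0 = extractvalue [4 x float] %a, 0
    %b0 = extractvalue [4 x float] %b, 0
    %s0 = fadd float %a0, %b0
    %a1 = extractvalue [4 x float] %a, 1
    %b1 = extractvalue [4 x float] %b, 1
    %s1 = fadd float %a1, %b1
    %a2 = extractvalue [4 x float] %a, 2
    %b2 = extractvalue [4 x float] %b, 2
    %s2 = fadd float %a2, %b2
    %a3 = extractvalue [4 x float] %a, 3
    %b3 = extractvalue [4 x float] %b, 3
    %s3 = fadd float %a3, %b3
    %r0 = insertvalue [4 x float] undef, float %s0, 0
    %r1 = insertvalue [4 x float] %r0, float %s1, 1
    %r2 = insertvalue [4 x float] %r1, float %s2, 2
    %r3 = insertvalue [4 x float] %r2, float %s3, 3
    ret [4 x float] %r3
  }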

Andrew

Andrew,
your response highlights a naming problem in LLVM, which is that "array" and "vector" mean the same thing in normal computer language and compiler theory usage, so it is inconvenient and misleading within LLVM to give one a very specific meaning that is different from the other....

The only place in CS I know of where "vector" is used to mean an arbitrarily long sequence of elements is the C++ STL vector template class. So far as I know, in the field of compilers "vectorization" always refers to lowering code down to use SIMD operations.

To the LLVM developers I would suggest using the term "packed" to refer to the type of data that the SPARC VIS, PPC AltiVec, and Intel MMX/SSE instruction sets (among others) support.

"packed" is already used to refer to "packed structs", ie structs
without any padding.
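
Roughly, as it appears in the IR (the type names here are made up):

  ; a normal struct leaves padding so each field gets its natural alignment;
  ; the <{ ... }> "packed" form removes that padding.
  %ordinary = type  { i8, i32 }   ; the i32 is padded out to offset 4 on most targets
  %packed   = type <{ i8, i32 }>  ; no padding: the i32 starts at byte offset 1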

Reid

Reid,
although K&R's "The C Programming Language" does not use the term "vector", note that Stroustrup's "The C++ Programming Language" does (way before STL was ever envisioned)!

Metcalf's "Effective Fortran-77" does not use the term "vector" either, except that it is frequently used as the name of an array in the usage examples; but then in "Effective Fortran-90", when an array is used as an index into another array, it is referred to as a "vector-valued subscript", not an "array-valued subscript"!

Guy Steele's "Common Lisp" says, quote: "one-dimensional arrays are called _vectors_ in Common Lisp...".

The equivalent and interchangeable usage of "array" and "vector" goes back to the beginning of computing....

When multiple items are placed within a single word of computer memory or a single register, it is "packed" data, regardless of whether it is of uniform type and size (array/vector-like) or of different types and sizes (struct/record-like).

best,
-Peter Lawrence.

Andrew,
This is one area of LLVM that maps very nicely to our GPU architecture. A vector is a native data type on these architectures. For example, on AMD's GPUs, the native type is a 4 x 32-bit vector with sub-components. Each of the individual 32-bit sub-components can be indexed separately, but not dynamically. This is a big difference from an array of 32-bit values. So there are cases where the meaning of the two are different on modern hardware.
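
Both indexing forms exist in the IR; the difference I'm describing shows up when they get lowered. A minimal sketch (hypothetical function names):

  define float @lane2(<4 x float> %v) {
    %e = extractelement <4 x float> %v, i32 2   ; constant lane: maps directly to a register sub-component
    ret float %e
  }

  define float @lane_n(<4 x float> %v, i32 %i) {
    %e = extractelement <4 x float> %v, i32 %i  ; variable lane: legal IR, but on hardware like ours it
    ret float %e                                ; has to be lowered through memory or a shuffle sequence
  }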

Micah

I'm just suggesting that from the perspective of the LLVM IR there doesn't seem to be a necessary semantic difference between arrays and vectors. Arrays provide a superset of the functionality available for vectors. I would be happy if the code generator used 4x32bit vectors for basic math on [4xfloat] arrays, and fell back to something less efficient if the user decided to dynamically index it. However, maybe this is more work for the code generator than is currently feasible.

I'm just suggesting that from the perspective of the LLVM IR there
doesn't seem to be a necessary semantic difference between arrays and
vectors. Arrays provide a superset of the functionality available for
vectors. I would be happy if the code generator used 4x32bit vectors
for basic math on [4xfloat] arrays, and fell back to something less
efficient if the user decided to dynamically index it. However, maybe
this is more work for the code generator than is currently feasible.

You could make the same argument for arrays and structs: {i32,i32,i32,i32} is (almost) semantically equivalent to [4 x i32]. Similarly, pointers and integers are semantically very similar. You could even say that an i32 and f32 should be the same type and just have different operations available on them. This would map naturally to SSE for example.
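
The "almost" is easy to see in a small sketch (hypothetical function names): extractvalue treats the struct and the array identically, yet they remain distinct types to every analysis.

  define i32 @third_of_struct({ i32, i32, i32, i32 } %s) {
    %x = extractvalue { i32, i32, i32, i32 } %s, 2   ; same operation, same index...
    ret i32 %x
  }

  define i32 @third_of_array([4 x i32] %a) {
    %x = extractvalue [4 x i32] %a, 2                ; ...but the types stay distinct
    ret i32 %x
  }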

The answer here is that having these differentiated in the type system makes analysis, optimization, and code generation easier. Having explicit vector types is definitely goodness.

-Chris

Peter Lawrence <peterl95124@sbcglobal.net> writes:

Andrew, your response highlights a naming problem in LLVM, which is
that "array" and "vector" mean the same thing in normal computer
language and compiler theory usage, so it is inconvenient and
misleading within LLVM to give one a very specific meaning that is
different from the other....

I think any sort of separation at all is counterproductive. The
existing array/vector split is artificial. It would be better to have
one array-like type and allow a reasonable set of operations on it.
Target lowering should take care of legality issues. For best
performance various transformation patterns will want to know about the
target but that's true independent of vector types. Scalar optimizers
want to know about targets too.

As far as I am aware, not one of the above instruction sets allows "subscripting" of packed data within a register

Given what we know of Larrabee and speculating that the "Knights" family
is likely a derivative of it, it's safe to assume that future Intel
architectures will be much more like traditional vector machines. That
means gather/scatter, element indexing, etc. The existing PINSR/PEXTR
and shuffle instructions already allow a degree of element indexing.
Note that the existing LLVM vector types already have insert/extract
operators.
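
For reference, shuffles also have a direct IR form; a minimal sketch with a hypothetical function name:

  ; shufflevector builds a new vector by picking lanes out of two sources
  ; (mask indices 0-3 select from %a, 4-7 select from %b):
  define <4 x float> @interleave_low(<4 x float> %a, <4 x float> %b) {
    %r = shufflevector <4 x float> %a, <4 x float> %b, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
    ret <4 x float> %r
  }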

Unifying array and vector and generalizing the result would open a lot
of optimization opportunities.

                              -Dave

Andrew Clinton <andrew@sidefx.com> writes:

Agreed, I too was wondering why we need both arrays and vectors. It
goes against the grain, I think, of the structural typing system used by
LLVM. For example, a vector of 4 floats and an array of 4 floats are
structurally the same type. Would it be feasible in the future to
consolidate the two types by allowing "vector" operations (add,
multiply, etc.) on arrays where it makes sense, and doing away with the
specialized vector types?

Actually, I do not believe an array of 4 floats and a vector of 4 floats
are the same. There are various bit packing and alignment semantics
that differ.
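
A small sketch of the alignment side of that (the numbers in the comments assume a typical x86-64 data layout; other targets differ):

  define void @alignment_demo() {
    %arr = alloca [4 x float]    ; element alignment only: 4-byte aligned, 16 bytes of storage
    %vec = alloca <4 x float>    ; natural vector alignment: 16-byte aligned
    ret void
  }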

                            -Dave

Reid Kleckner <reid.kleckner@gmail.com> writes:

So far as I know in the field of compilers "vectorization" always
refers to lowering code down to use SIMD operations.

Not true at all. "Vectorization" as mapping exclusively to fixed-length
SSE-style instructions is relatively new. Traditionally, vectorization
has meant targeting vector registers with variable-length operations and
masks. The SSE-style instruction sets are very limited in comparison.

The Allen/Kennedy book is a good resource.

http://www.amazon.com/Optimizing-Compilers-Modern-Architectures-Dependence-based/dp/1558602860

                           -Dave

Dave,

Unifying array and vector and generalizing the result would open a lot
of optimization opportunities.

You would be piling an incomplete optimization on top of a pile of already incomplete optimizations... Vectorization in Fortran is already a "hard" problem, requiring alias analysis (always an incomplete and inaccurate, i.e. conservative, analysis) and loop-carried array-subscript dependence analysis (which is equivalent to solving Diophantine equations, which is in general not solvable, so you again end up with an incomplete and inaccurate, conservative, analysis). Doing this in C, without first-class array/matrix types and with even more alias-analysis issues, makes it even more problematic.

The final straw with all these multimedia instruction sets is that they require large alignment on their "packed array" data types. Even Intel, which started out not requiring alignment with MMX (though unaligned data incurred huge performance penalties), evolved with SSE to require what everyone else requires.

Data alignment can (and should!) be analyzed within the same algorithms that do alias analysis, and the analysis has the same inherent limitations.

The problem is that in real-world applications it is typical for array data slices (i.e. sections of arrays that are passed to subroutines to be processed) to be unaligned: even if the array base address is aligned, the bounds of the section being processed are in general not.

You end up wanting to clone your algorithm kernel for the various incoming alignments (just as memcpy, memcmp, etc. are often cloned internally for the various relative alignments of their arguments), but with a kernel accessing N different arrays you end up needing 2**N clones, which in general is an impractical code explosion.
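
To make the combinatorics concrete, a kernel that reads from just two source arrays already wants four specializations (the kernel names below are hypothetical):

  ; 2**N specializations for N = 2 incoming arrays
  declare void @kernel_aligned_aligned(float*, float*)      ; both slices vector-aligned
  declare void @kernel_aligned_unaligned(float*, float*)    ; first aligned, second not
  declare void @kernel_unaligned_aligned(float*, float*)    ; second aligned, first not
  declare void @kernel_unaligned_unaligned(float*, float*)  ; neither aligned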

The reason I object to the use of "vector" and "SIMD" to describe these "packed data" multimedia instruction sets is that in practical reality the traditional vectorization optimization technology just does not apply. You can always come up with gee-whiz examples where it does, but you cannot make it work in the general case.

No matter what fancy data shuffling/permuting/inserting/extracting instructions
get added to MMX/SSE/SSE2, they will still not solve the data alignment problem,
so the instruction sets remain incompatible with "traditional vector machines"
where there was always one-data-item-per-HW-register and there was never
any alignment issue.

best,
Peter Lawrence.