Multi-dimensional array accesses in LLVM-IR

Hi Hal,

you are right. Preston's dependence analysis and Polly share the same
problem. As far as I know, none of us has implemented a solution yet.
Preston, what is your status on multi-dim arrays?

As stated earlier, the problem is that access functions to multi-dimensional
arrays are linearized at the LLVM-IR level. Unfortunately, linearized accesses
are a lot harder to reason about, so we want to recover the multi-dimensional
access functions.

In LLVM-IR terms: when we currently analyze an access to a multi-dimensional
variable-length array, we calculate a single SCEV expression that describes
the whole memory access. However, we would prefer a separate SCEV expression
for each array dimension. Those individual expressions are a lot easier to
analyze.
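
To make this concrete, here is a minimal C sketch (variable names are
illustrative) contrasting the linearized form the front-end emits today
with the per-dimension view we would like to recover:

   void foo(long n, long m, double A[n][m]) {
     for (long i = 0; i < n; i++)
       for (long j = 0; j < m; j++)
         /* Source level: A[i][j] = 1.0;
            Emitted today (conceptually): one flat address computation,
            i.e. a single SCEV such as {{0,+,%m},+,1} in element units.
            Desired: the subscripts i and j plus the size %m, one SCEV
            expression per dimension. */
         *(&A[0][0] + i * m + j) = 1.0;
   }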

I think there are two major directions we can go:

1) Represent multi-dimensional access functions in LLVM-IR

Instead of linearizing multi-dimensional array accesses to single
dimensional array accesses, we extend LLVM-IR such that it can express
multi-dimensional accesses and we ensure that front-ends directly
generate LLVM-IR multi-dimensional accesses.

2) Recover multi-dimensionality from linearized accesses

We take a linearized single-dimensional SCEV* expression, apply some
magic and recover both the number and size of the original array
dimensions as well as a set of SCEV* expressions describing the individual
access functions of the original dimensions.

Both approaches have advantages/disadvantages:

Approach 1)
   Pro:
   - Provides exact information about multi-dimensional arrays,
     if the multi-dimensionality is explicit in the source program

   Contra:
   - We need to explicitly express multi-dimensionality in LLVM-IR
   - Front-ends need to explicitly generate the multi-dimensional IR
   - Existing passes need to maintain and reason about the new IR
   - Only C99 supports variable-length arrays. Most existing C/C++
     programs implement variable-length arrays manually, and such code
     cannot automatically be translated to multi-dimensional accesses
     by the front-end.

Approach 2)
   Pro:
   - Does not distinguish between manually simulated multi-dimensional
     arrays and multi-dimensional arrays that have been explicitly
     expressed in the source code.
   - Changes are limited to an LLVM-IR analysis

   Contra:
   - The general problem is difficult
   - We may need to add some run-time checks, as statically proving
     the delinearization correct may not always be possible

I personally would first have a look at approach '2'. Access functions of
multi-dimensional arrays often follow a very regular structure, which appears
not only in accesses generated by compiler front-ends, but also in manual
implementations of multi-dimensional arrays. One common pattern is that a
single parameter represents the size of a dimension [1]. This means the SCEV
expressions describing accesses to an array "A[m][p]" will contain the sizes
'%p' and '%m * %p' more or less explicitly.
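
For illustration, such a manual implementation looks like this in C (a
sketch; names are illustrative); at the LLVM-IR level it is
indistinguishable from the front-end-lowered version of "double A[m][p]":

   void bar(long m, long p, double *A) {
     for (long i = 0; i < m; i++)
       for (long j = 0; j < p; j++)
         /* The row size %p appears as the stride of the i loop, and
            %m * %p bounds the accessed region. */
         A[i * p + j] = 1.0;
   }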

Here are some examples I committed in r163619:

multidim_only_ivs_2d.ll:
; void foo(long n, long m, double A[n][m]) {
;   for (long i = 0; i < n; i++)
;     for (long j = 0; j < m; j++)
;       A[i][j] = 1.0;
; }
; Access function: {{0,+,%m}<%for.i>,+,1}<nw><%for.j>

multidim_only_ivs_3d.ll:
; void foo(long n, long m, long o, double A[n][m][o]) {
;   for (long i = 0; i < n; i++)
;     for (long j = 0; j < m; j++)
;       for (long k = 0; k < o; k++)
;         A[i][j][k] = 1.0;
; }
; Access function: {{{0,+,(%m * %o)}<%for.i>,+,%o}<%for.j>,+,1}<nw><%for.k>

multidim_ivs_and_integer_offsets_3d.ll:
; void foo(long n, long m, long o, double A[n][m][o]) {
;   for (long i = 0; i < n; i++)
;     for (long j = 0; j < m; j++)
;       for (long k = 0; k < o; k++)
;         A[i+3][j-4][k+7] = 1.0;
; }
;
; Access function:
;   {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+,
;     (8 * %o)}<%for.j>,+,8}<%for.k>

multidim_only_ivs_3d_cast.ll:
; void foo(int n, int m, int o, double A[n][m][o]) {
;   for (int i = 0; i < n; i++)
;     for (int j = 0; j < m; j++)
;       for (int k = 0; k < o; k++)
;         A[i][j][k] = 1.0;
; }
;
; Access function:
;   {{{%A,+,(8 * (zext i32 %m to i64) * (zext i32 %o to i64))}<%for.i>,+,
;     (8 * (zext i32 %o to i64))}<%for.j>,+,8}<%for.k>

multidim_ivs_and_parameteric_offsets_3d.ll:
; void foo(long n, long m, long o, double A[n][m][o], long p, long q, long r) {
;   for (long i = 0; i < n; i++)
;     for (long j = 0; j < m; j++)
;       for (long k = 0; k < o; k++)
;         A[i+p][j+q][k+r] = 1.0;
; }
;
; Access function:
;   {{{((8 * ((((%m * %p) + %q) * %o) + %r)) + %A),+,(8 * %m * %o)}<%for.i>,+,
;     (8 * %o)}<%for.j>,+,8}<%for.k>

Those examples are obviously not exhaustive, but they already cover many
common cases, and analyzing them seems feasible. As you may notice, the sizes
of the array dimensions are all explicitly available in the 'step' of the
SCEVAddRec expressions. Even in the last example, which contains parameters
both for the sizes and the offsets, the parameters for the sizes are the only
ones that appear on the right hand sides ('8 * %m * %o' and '8 * %o').

I would guess that we could write a SCEVIterator that analyzes the SCEVs and
guesses the dimension sizes. After having written that, it should be possible
to write another SCEVIterator that, given the dimension sizes, can separate
the different dimensions of a SCEV.
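
To make the first step concrete, here is a toy, self-contained C sketch of
the divisibility-chain check over the AddRec steps. The representation (a
step coefficient is a constant times a product of parameters) and all names
are deliberately simplified illustrations; real SCEV expressions would need
genuine multivariate polynomial division:

   #include <stdio.h>
   #include <string.h>

   /* A step coefficient modeled as c * prod(params); enough for the
      monomial cases above (8, 8*%o, 8*%m*%o). */
   typedef struct {
     long c;
     const char *params[4];
     int nparams;
   } Coeff;

   /* Does a divide b? The constant must divide and a's parameter
      multiset must be contained in b's. */
   static int divides(const Coeff *a, const Coeff *b) {
     if (a->c == 0 || b->c % a->c != 0)
       return 0;
     int used[4] = {0};
     for (int i = 0; i < a->nparams; i++) {
       int found = 0;
       for (int j = 0; j < b->nparams && !found; j++)
         if (!used[j] && strcmp(a->params[i], b->params[j]) == 0)
           used[j] = found = 1;
       if (!found)
         return 0;
     }
     return 1;
   }

   /* Quotient b / a, assuming divides(a, b) holds; the quotient of two
      neighboring steps is the guessed size of one array dimension. */
   static Coeff quotient(const Coeff *a, const Coeff *b) {
     Coeff q = { b->c / a->c, { 0 }, 0 };
     int used[4] = {0};
     for (int i = 0; i < a->nparams; i++)
       for (int j = 0; j < b->nparams; j++)
         if (!used[j] && strcmp(a->params[i], b->params[j]) == 0) {
           used[j] = 1;
           break;
         }
     for (int j = 0; j < b->nparams; j++)
       if (!used[j])
         q.params[q.nparams++] = b->params[j];
     return q;
   }

   int main(void) {
     /* AddRec steps for A[i][j][k], innermost loop first:
        8, 8*%o, 8*%m*%o (from multidim_only_ivs_3d.ll). */
     Coeff steps[3] = {
       { 8, { 0 }, 0 },
       { 8, { "%o" }, 1 },
       { 8, { "%m", "%o" }, 2 },
     };
     for (int i = 0; i + 1 < 3; i++) {
       if (!divides(&steps[i], &steps[i + 1])) {
         puts("no divisibility chain: delinearization fails");
         return 1;
       }
       Coeff size = quotient(&steps[i], &steps[i + 1]);
       printf("size of dimension %d: %ld", i + 1, size.c);
       for (int j = 0; j < size.nparams; j++)
         printf(" * %s", size.params[j]);
       putchar('\n');
     }
     return 0;
   }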

I doubt we will always be able to prove that our guess is correct, but we
could add a run-time check to test the conditions that we cannot prove
statically. This is also the point where I think the front-end (or the user,
with attributes) could help. Accesses could be annotated with metadata
providing the sizes of the array dimensions, such that our analysis can start
from this metadata. This could allow our analysis to take some shortcuts and
to avoid some run-time checks.
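
A minimal sketch of the versioning idea in C (all names and the concrete
condition are illustrative): guard the variant that relies on the recovered
multi-dimensional view with the conditions we could not prove statically,
and fall back to the original code otherwise.

   void set_shifted(long n, long m, long off, double *A) {
     if (0 <= off && off <= m - 1) {
       /* The subscript 'j + off' provably stays inside the inner
          dimension here, so A can be treated as a true A[n][m] array
          and the loop nest becomes amenable to transformations. */
       double (*A2)[m] = (double (*)[m])A;
       for (long i = 0; i < n; i++)
         for (long j = 0; j + off < m; j++)
           A2[i][j + off] = 1.0;
     } else {
       /* Fall back to the original, linearized code. */
       for (long i = 0; i < n; i++)
         for (long j = 0; j + off < m; j++)
           A[i * m + j + off] = 1.0;
     }
   }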

Does this sound like a reasonable approach? Any ideas or suggestions on that topic?

Cheers
Tobi

[1] FORTRAN seems to express array size as "max(0, %dim_size)"

Hi,

I had a few thoughts about recovering multi-dimensional accesses in Polly and
maybe it's time to share them.

> [...] Even in the last example, which contains parameters both for the
> sizes and the offsets, the parameters for the sizes are the only ones
> that appear on the right hand sides ('8 * %m * %o' and '8 * %o').

I guess this is the key observation. When we have a multi-dimensional access,
the coefficients of the iterators have to form an increasing chain w.r.t.
divisibility. For the example

> multidim_ivs_and_integer_offsets_3d.ll:
> ; void foo(long n, long m, long o, double A[n][m][o]) {
> ;   for (long i = 0; i < n; i++)
> ;     for (long j = 0; j < m; j++)
> ;       for (long k = 0; k < o; k++)
> ;         A[i+3][j-4][k+7] = 1.0;
> ; }

with access function

> ; {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+,
> ;   (8 * %o)}<%for.j>,+,8}<%for.k>

we have (writing the SCEV in ordinary notation and in a suggestive order)

   8*%o*%m * i + 8*%o * j + 8 * k + 24*%m*%o - 32*%o + %A + 56

and we see that the coefficients of the iterators are 8, 8*%o, 8*%o*%m and
every coefficient in this chain divides its successor. By factoring these
coefficients out from all the terms they divide, we find the subscripts for
the dimensions:

  8*%o*%m * (i+3) + 8*%o * (j-4) + 8 * (k+7) + %A

So A[i+3][j-4][k+7] is our candidate for a multi-dimensional access. The math
tool we'd need to implement this is (multivariate) polynomial division or
something similar (coefficients could be arbitrary polynomials, e.g., %o+5*%m+7
if somebody declares something like double A[o+5*m+7], or even more complex
expressions).

Note that two iterators can have the same coefficient (e.g., for
A[i+j][.][.]), or one iterator can have two coefficients (e.g., A[k][k][.]).
Or a "coefficient" may actually be a constant (e.g., A[42][.][.]) or missing
in one access (e.g., A[0][.][.]), so determining the coefficients is
non-trivial (but not too difficult, I guess).

> I would guess that we could write a SCEVIterator that analyzes the
> SCEVs and guesses the dimension sizes.

I'm not sure if this can be done on the SCEV directly (instead of on
polynomials) for cases that are more complex than products of parameters, but
I haven't thought about it thoroughly. OTOH, add recurrences should be of the
right structure to check divisibility directly; I'm not sure without more
thinking...

> After having written that, it should be possible to write another
> SCEVIterator that, given the dimension sizes, can separate the
> different dimensions of a SCEV.

Here we have to be careful because when there are several accesses to the
same array (base pointer), not only the coefficients (8, 8*%o, 8*%o*%m above)
have to match but also the ranges of the subscripts must be right (globally
over all accesses). For example, with the two accesses

  A[i+3][j-4][k+7]
  A[...][i-k][3*j]

we'd need to check that

   max{j-4, i-k} - min{j-4, i-k} <= %m - 1
   max{k+7, 3*j} - min{k+7, 3*j} <= %o - 1

so there is no (inconsistent) overrun between the dimensions (%m and %o are the
factors "between" the dimensions). Note that this also ensures that the factors
between the dimensions are positive, i.e., no dimension is collapsed.

This check can, of course, be relaxed to checking

   0 <= j-4 <= %m - 1
   0 <= k+7 <= %o - 1

   0 <= i-k <= %m - 1
   0 <= 3*j <= %o - 1

(allowing for independent checks for each access) under the assumption that
subscripts start from 0, but I'm not sure how often this will hold on the IR
(after loop index and array base pointer shifts during optimization).
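
Expressed as C predicates over the extreme values a subscript takes across
the whole iteration space (a hedged sketch; the helper names are invented),
the two forms of the check look like this:

   /* Strict, pairwise form: two subscripts into the same dimension
      must not spread wider than the dimension size allows. */
   static int consistent(long a_min, long a_max,
                         long b_min, long b_max, long dim_size) {
     long lo = a_min < b_min ? a_min : b_min;
     long hi = a_max > b_max ? a_max : b_max;
     return hi - lo <= dim_size - 1;
   }

   /* Relaxed, per-access form, assuming subscripts start at 0. */
   static int in_bounds(long sub_min, long sub_max, long dim_size) {
     return 0 <= sub_min && sub_max <= dim_size - 1;
   }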

The problem here is that the factors between the dimensions (%m and %o here)
could be non-affine.

> I doubt we will always be able to prove that our guess is correct, but
> we could add a run-time check to test the conditions that we cannot
> prove statically. This is also the point where I think the front-end
> (or the user, with attributes) could help. Accesses could be annotated
> with metadata providing the sizes of the array dimensions, such that
> our analysis can start from this metadata. This could allow our
> analysis to take some shortcuts and to avoid some run-time checks.

Makes sense. I think that it should not be too difficult to implement the above
approach for affine factors between the dimensions (which should cover the
common cases) and then we should evaluate how often we need run-time checks and
whether we need more information from the front-ends because other cases occur
in practice.

Those are my thoughts for now.

Regards,
Armin

Hi,

> I had a few thoughts about recovering multi-dimensional accesses in
> Polly and maybe it's time to share them.
> [...]
> So A[i+3][j-4][k+7] is our candidate for a multi-dimensional access.
> The math tool we'd need to implement this is (multivariate)
> polynomial division or something similar.

Agreed. Furthermore, I already have code that does this (I've started
working on a transformation pass which refactors and optimizes integer
polynomial expressions, so I can repurpose code from there). It sounds
like we just need to extract the integer polynomials used by GEPs
containing induction variables and, order by order, factor out the
non-induction variables. That will give us the 'size' of each
dimension. As you've noted, we need to make sure that no index exceeds
its size, and that all accesses to an array within a given loop are
consistent. Does that sound reasonable?
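
For reference, 'order by order' on the integer-offsets example above means
grouping the expanded terms of the access polynomial by the parameter part
of their coefficients; each order isolates one stride and one subscript,
leaving only the base pointer %A:

   order 2 (%m*%o):  8*%m*%o*i + 24*%m*%o  =  8*%m*%o * (i + 3)
   order 1 (%o):     8*%o*j    - 32*%o     =  8*%o    * (j - 4)
   order 0:          8*k       + 56        =  8       * (k + 7)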

-Hal

I’m travelling right now, so can’t fully participate.
But I’d encourage people to read Maslov’s paper

Delinearization: an Efficient Way to Break Multiloop Dependence Equations

Vadim Maslov
PLDI '92

Wolfe's book shows the same approach. My impression was that he can do most,
but not all, of the delinearization we want. At one point, I thought that
he couldn't handle coupled subscripts that had been linearized,
but I'll have to re-create my examples.

To get the best results, I thought we should change the GEP
(or have a fancy version in addition to the current one).

Preston

While I normally argue to leave the IR as it is (since it's a compiler
IR, not a magical one), I can see some trends going on that should not
be ignored.

This is one example where the front-end bends its knees to generate
IR that LLVM understands, so that you can then revert it to what it
should have been in the first place (to analyse parallelism) and
convert it back again to "correct" IR. As you mentioned, there will be
cases where the analysis won't work, e.g. when receiving an array from
a function, it could look like an opaque pointer on some architectures.

Last month, in the Cambridge LLVM Social, David Chisnall asked me
about a builder that would validate procedure call standards depending
on the target, so that the front-end could use that to build the
horrible messes we end up doing in that scenario. Another idea is to
create a pass that will convert from high-level function declarations
to low-level target-dependent declarations during the validation
phase, so the front-end produces "good-looking" calls (with types
as-is) and the pass makes them target-valid.

This technique could also apply to your case. If the front-end
supports multi-dimensional access and produces code conforming to
that, your pass can analyse and vectorize. If not, no worries, you
just ignore those accesses. Which means not all front-ends must change
at the same time to accommodate your analysis.

Later on, a validation pass would transform all multi- to
single-dimensional array access and produce IR that the back-ends can
consume.

It might not be the easiest route, but would avoid some dead-ends down
the road...

> Here we have to be careful because when there are several accesses to
> the same array (base pointer), not only the coefficients (8, 8*%o,
> 8*%o*%m above) have to match but also the ranges of the subscripts
> must be right (globally over all accesses). For example, with the two
> accesses
>
>   A[i+3][j-4][k+7]
>   A[...][i-k][3*j]
>
> we'd need to check that
>
>    max{j-4, i-k} - min{j-4, i-k} <= %m - 1
>    max{k+7, 3*j} - min{k+7, 3*j} <= %o - 1

An interjection from the peanut gallery...

One thing that would be nice would be a way to indicate in LLVM instructions
that _by construction_ a given variable will be a valid index for a given
dimension of a given array. This isn't so much use in C-style languages
where arrays don't have size data canonically associated with them, but for
languages with constructs like (in the simplest case)

for i=0:dimension(A,0): acc+=A[i]

or other more complicated "index generator" expressions like generating
indices from stencils. At the moment (as I understand the LLVM IR) a
language can only specify that an entire linearised access expression is
within the linearised bounds. This would provide a way to hopefully boost
the "safe to apply" rate of LLVM optimisers like Polly for languages that
are amenable to such constructs. This presumably requires a way to link
array pointers, dimensions, bounds and indexes, so it's not obvious how to
do it within the LLVM context, of course... (There's also the issue of
exposing transitivity: the source language may know i is a by-construction
index into A, and B is the same shape as A, hence i is also guaranteed to
be a within-bounds index into B.)
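
As a hypothetical C rendering of the idea (the struct and names are
invented for illustration), the size travels with the array, so the index
is in bounds by construction:

   struct array1d { long n; double *data; };

   double sum(struct array1d a) {
     double acc = 0.0;
     for (long i = 0; i < a.n; i++)  /* i < dimension(a, 0) by construction */
       acc += a.data[i];
     return acc;
   }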

Just a vague thought,
Regards,
Dave

>> I personally would first have a look at approach '2'.
>
> While I normally argue to leave the IR as it is (since it's a compiler
> IR, not a magical one), I can see some trends going on that should not
> be ignored.
>
> This is one example where the front-end bends its knees to generate
> IR that LLVM understands, so that you can then revert it to what it
> should have been in the first place (to analyse parallelism) and
> convert it back again to "correct" IR. As you mentioned, there will be
> cases where the analysis won't work, e.g. when receiving an array from
> a function, it could look like an opaque pointer on some architectures.

I agree that this seems suboptimal: information known to the frontend
is lost, and then must be guessed by the backend. This might also be a
case where metadata is helpful.

I also think that it is important to keep in mind that this particular
case is one in which guessing is important. This is because there is a
lot of existing C/C++ code which does manual indexing of
multidimensional arrays, and we should try to support iteration-space
transformations involving those arrays as well.

> Last month, in the Cambridge LLVM Social, David Chisnall asked me
> about a builder that would validate procedure call standards depending
> on the target, so that the front-end could use that to build the
> horrible messes we end up doing in that scenario. Another idea is to
> create a pass that will convert from high-level function declarations
> to low-level target-dependent declarations during the validation
> phase, so the front-end produces "good-looking" calls (with types
> as-is) and the pass makes them target-valid.
>
> This technique could also apply to your case. If the front-end
> supports multi-dimensional access and produces code conforming to
> that,

Do you mean that there is some kind of canonical form that all of the
frontends will use, and that LLVM provides some kind of utility for
generating this canonical form?

-Hal

Hal Finkel <hfinkel@anl.gov> writes:

> I agree that this seems suboptimal: information known to the frontend
> is lost, and then must be guessed by the backend. This might also be a
> case where metadata is helpful.
> [...]
> Do you mean that there is some kind of canonical form that all of the
> frontends will use, and that LLVM provides some kind of utility for
> generating this canonical form?

This all looks like a common problem of a single IR; it will happen in
every compiler that uses a single centralised IR. It's usually easier to
describe a problem in terms of a higher-level representation that is even
tailored for the domain or a specific language. Metadata in the IR works
around many of these problems, but not all of them.

Some compilers do employ this strategy of gradual rewrites, but not
all of them. LLVM IR hits the ultimate sweet spot of being flexible
enough and high-level enough for most cases, but this discussion shows
that we want to grow it either up towards the mid-end or down towards
the machine.

There is an awful lot of different classical IRs: LLVM SSA IR, CPS, C--,
various RTLs, etc. Domain-specific optimisations can be performed on them,
and then these IRs can be lowered into something else. In the end, the
resulting RTL should be only abstract enough to decouple from the machine,
yet contain enough information for efficient instruction selection.

Sounds wonderful but only in theory!

two cents,