Hi Hal,

you are right. Preston's dependence-analysis and Polly are sharing the same

problem. As far as I know, non of us implemented a solution yet.

Preston, what is your status on multi-dim arrays?

As stated earlier the problem is that access functions to multi-dimensional arrays

are linearized at LLVM-IR level. Unfortunately it is a lot harder to reason about

linearized accesses. Hence, we want to recover the multi-dimensional access function.

In LLVM-IR words: When we currently analyze a multi-dimensional

variable length memory access, we calculate a single SCEV expression

which describes the memory access. However, we would prefer to have for

each array dimension a separate SCEV* expression. Those individual expressions

will be a lot easier to analyze.

I think there are two major directions we can go:

1) Represent multi-dimensional access functions in LLVM-IR

Instead of linearizing multi-dimensional array accesses to single

dimensional array accesses, we extend LLVM-IR such that it can express

multi-dimensional accesses and we ensure that front-ends directly

generate LLVM-IR multi-dimensional accesses.

2) Recover multi-dimensionality from linearized accesses

We take a linearized single-dimensional SCEV* expression, apply some

magic and recover both the number and size of the original array

dimensions as well as a set of SCEV* expressions describing the individual

access functions of the original dimensions.

Both approaches have advantages/disadvantages:

Approach 1)

Pro:

- Provides exact information about multi-dimensional arrays,

if the multi-dimensionality is explicit in the source program

Contra:

- We need to explicitly express multi-dimensionality in LLVM-IR

- Front-ends need to explicitly generate the multi-dimensional IR

- Existing passes need to maintain and reason about the new IR

- Only C99 supports variable length arrays. Most existing C/C++

programs have a manual implementation of variable length

arrays, which cannot be automatically translated by the front-end.

Approach 2)

Pro:

- Does not distinguish between manually simulated multi-dimensional

arrays and multi-dimensional arrays that have been explicitly

expressed in the source code.

- Changes are limited to an LLVM-IR analysis

Contra:

- The general problem is difficult

- We may need to add some run-time checks as statically proving

the delinearization may not be entirely possible

I personally would first have a look at approach '2'. Access functions of

multi-dimensional follow often a very regular structure, which appears not

only for accesses generated by compiler front-ends, but also for

manual implementations of multi-dimensional arrays. One common pattern

is that there is a single parameter that represents the size of a

dimension [1]. Meaning SCEV* expressions describing access to an array "A[m][p]" will contain the sizes '%p' and '%m * %p' more or less explicit.

Here some examples I committed in r163619:

multidim_only_ivs_2d.ll:

; void foo(long n, long m, double A[n][m]) {

; for (long i = 0; i < n; i++)

; for (long j = 0; j < m; j++)

; A[i][j] = 1.0;

; }

; Access function: {{0,+,%m}<%for.i>,+,1}<nw><%for.j>

multidim_only_ivs_3d.ll:

; void foo(long n, long m, long o, double A[n][m][o]) {

; for (long i = 0; i < n; i++)

; for (long j = 0; j < m; j++)

; for (long k = 0; k < o; k++)

; A[i][j][k] = 1.0;

; }

; Access function: {{{0,+,(%m * %o)}<%for.i>,+,%o}<%for.j>,+,1}<nw><%for.k>

multidim_ivs_and_integer_offsets_3d.ll:

; void foo(long n, long m, long o, double A[n][m][o]) {

; for (long i = 0; i < n; i++)

; for (long j = 0; j < m; j++)

; for (long k = 0; k < o; k++)

; A[i+3][j-4][k+7] = 1.0;

; }

;

; Access function:

; {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+,

; (8 * %o)}<%for.j>,+,8}<%for.k>

multidim_only_ivs_3d_cast.ll:

; void foo(int n, int m, int o, double A[n][m][o]) {

; for (int i = 0; i < n; i++)

; for (int j = 0; j < m; j++)

; for (int k = 0; k < o; k++)

; A[i][j][k] = 1.0;

; }

;

; Access function:

; {{{%A,+,(8 * (zext i32 %m to i64) * (zext i32 %o to i64))}<%for.i>,+,

; (8 * (zext i32 %o to i64))}<%for.j>,+,8}<%for.k>

multidim_ivs_and_parameteric_offsets_3d.ll:

; void foo(long n, long m, long o, double A[n][m][o], long p, long q, long r) {

;

; for (long i = 0; i < n; i++)

; for (long j = 0; j < m; j++)

; for (long k = 0; k < o; k++)

; A[i+p][j+q][k+r] = 1.0;

; }

;

; Access function:

; {{{((8 * ((((%m * %p) + %q) * %o) + %r)) + %A),+,(8 * %m * %o)}<%for.i>,+,

; (8 * %o)}<%for.j>,+,8}<%for.k>

Those examples are obviously not exhaustive, but they already cover many common

cases. Analyzing them seems to be possible. As you may realize the size of the

array dimensions are all explicitly available in the 'step' of the SCEVAddRec

repressions. Even in the last example which contains parameters both for the

sizes and the offsets, the parameters for the sizes are the only ones that

appear on the right hand sides ('8 * %m * %o' and '8 * %o').

I would guess that we could write a SCEVIterator, that analyzes the SCEVs and guesses

the dimension sizes. After having written that, it should be possible to write

another SCEVIterator that given the dimension sizes can separate the different

dimensions of a SCEV.

I doubt we will always be able to prove that our guess is correct, but we could

add a run-time check to test the conditions that we can not prove statically. This is

also the point, where I think the front-end (or the user with attributes) could help.

Accesses could be annotated with meta-data providing the size of the array dimensions,

such that our analysis can start from this meta-data. This could allow our analysis

to take some short cuts and to avoid some run-time checks.

Does this sound like a reasonable approach? Any ideas or suggestions on that topic?

Cheers

Tobi

[1] FORTRAN seems to express array size as "max(0, %dim_size)"