[RFC] A new multidimensional array indexing intrinsic

Hello,

We would like to begin discussions around a new set of intrinsics, to better express
multi-dimensional array indexing within LLVM. The motivations and a possible design
are sketched out below.

Rendered RFC link here

Raw markdown:

First, I agree that there is a need for better multidimensional support.

I (only) browsed the RFC (it is long) and I have a high-level question:

Why introduce a new intrinsic (family)? It seems that would require us
to support GEPs and GEP + "multi-dim" semantics in various places. What is
the benefit over a GEP extension?

(I also added a comment below regarding you C example.)

Thanks,
  Johannes

Hello,

We would like to begin discussions around a new set of intrinsics, to
better express
multi-dimensional array indexing within LLVM. The motivations and a
possible design
are sketched out below.

Rendered RFC link here
<https://github.com/bollu/llvm-multidim-array-indexing-proposal/blob/master/RFC.md>

Raw markdown:

# Introducing a new multidimensional array indexing intrinsic

## The request for change: A new intrinsic, `llvm.multidim.array.index.*`.

We propose the addition of a new family of intrinsics,
`llvm.multidim.array.index.*`.
This will allow us to represent array indexing into an array `A[d1][d2][..][dk]`
as `llvm.multidim.array.index.k A d1, d2, d3, ... dk` instead of flattening
the information into `gep A, d1 * n1 + d2 * n2 + ... + dk * nk`. The former
unflattened representation is advantageous for analysis and
optimisation. It also
allows us to represent array indexing semantics of languages such as Fortran
and Chapel, which differs from that of C based array indexing.
  
## Motivating the need for a multi-dimensional array indexing intrinsic

There are primarily one of two views of array indexing involving
multiple dimensions most languages take, which we discuss to motivate
the need for a multi-dimensional array index. This consideration impacts
the kinds of analysis we can perform on the program. In Polly, we care about
dependence analysis, so the examples here will focus on that particular problem.

Let us consider an array indexing operation of the form:

int ex1(int n, int m, B[n][m], int x1, int x2, int y1, int y2) {
	__builtin_assume(x1 != x2);
	__builtin_assume(y1 != y2);
	B[x1][y1] = 1;
	printf("%d", B[x2][y2]);
	exit(0);
}

One would like to infer that the array indices _interpreted as tuples_
`(x1, y1)` and `(x2, y2)` do not have the same value, due to the
guarding asserts
that `x1 != x2` and `y1 != y2`. As a result, the write `B[x1][y1] = 1` can
in no way interfere with the value of `B[x2][y2]`. Consquently,
we can optimise the program into:

int ex1_opt(int n, int m, B[n][m], int x1, int x2, int y1, int y2) {
    // B[x1][y1] = 1 is omitted because the result
    // cannot be used:
    //  It is not used in the print and then the program exits
	printf("%d", B[x2][y2]);
	exit(0);
}

However, alas, this is illegal, for the C language does not provide
semantics that allow the final inference above. It is conceivable that
`x1 != x2, y1 != y2`, but the indices do actually alias, since
according to C semantics, the two indices alias if the _flattened
representation of the indices alias_. Consider the parameter
values:

n = m = 3
x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3
x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3

Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and
so the transformation proposed in `ex1_opt` is unsound in general.

I'm unsure your example actually showcases the problem:

C standard, N1570 draft, Page 560, Appendix J.2 Undefined Behavior:

An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression a[1][7]
given the declaration inta[4][5]) (6.5.6).

Why introduce a new intrinsic (family)? It seems that would require us
to support GEPs and GEP + "multi-dim" semantics in various places. What is
the benefit over a GEP extension?

Adding an intrinsic is easier than adding or extending an existing
instruction, as suggested by
https://llvm.org/docs/ExtendingLLVM.html#introduction-and-warning

Extending GEP would require all passes to understand the added
semantics on day 1, while a new intrinsic allows us to gradually add
support.
We can still make the intrinsic into an instruction when existing
passes understand the new functionality.

> However, alas, this is illegal, for the C language does not provide
> semantics that allow the final inference above. It is conceivable that
> `x1 != x2, y1 != y2`, but the indices do actually alias, since
> according to C semantics, the two indices alias if the _flattened
> representation of the indices alias_. Consider the parameter
> values:
>
> ```
> n = m = 3
> x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3
> x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3
> ```
>
> Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and
> so the transformation proposed in `ex1_opt` is unsound in general.

I'm unsure your example actually showcases the problem:

C standard, N1570 draft, Page 560, Appendix J.2 Undefined Behavior:

An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression a[1][7]
given the declaration inta[4][5]) (6.5.6).

GEP requires an array type, which in LLVM has static size. For
dynamically-sized array (In C/C++ language family, this is only
supported by C99 VLAs), clang emits the the linearized index
expression followed by a single-dimensional GEP.

The multi-dimensional GEP instruction has this semantics with the
`inrange` modifier, which clang currently does not generate for
subscript expressions. Given that there might be code around relying
on this behavior, and the C++ standard not containing this paragraph,
I am not sure we can change it to undefined behavior, even if the C
standard would permit it. I think this is a separate discussion since
the main motivation of the RFC are languages that have broader use of
runtime-length arrays.

But for the showcase, you are right in that according to the C
standard, this aliasing might be undefined behavior.

Michael

> Why introduce a new intrinsic (family)? It seems that would require us
> to support GEPs and GEP + "multi-dim" semantics in various places. What is
> the benefit over a GEP extension?

Adding an intrinsic is easier than adding or extending an existing
instruction, as suggested by
https://llvm.org/docs/ExtendingLLVM.html#introduction-and-warning

They compare *new intrinsic* vs *new instruction*, not *new intrinsic*
vs *extending an instruction*. Also, this is motivated as a missing core
functionality, the discussion if this should have native support from
day one should take place even if it means more work.

Extending GEP would require all passes to understand the added
semantics on day 1, while a new intrinsic allows us to gradually add
support. We can still make the intrinsic into an instruction when
existing passes understand the new functionality.

Here, extending GEP means *restricting* the semantics. So, we could add
it in a way that would not require any pass to understand the semantics,
e.g., we drop "multi-dim" whenever a GEP is modified, cloned, etc.
Preferably, we would change the locations that modify GEPs (inplace),
which I believe is reasonable.

Arguably, for any real use case, you need to use extended GEPs or modify
quite a few passes to get the performance you want. Everything that
analyzes GEPs nowadays needs to be aware of the intrinsics because the
pointer is otherwise opaque, which makes it basically impossible to
transform.

If modifying the GEP is not viable, for any reason, we should
investigate ways that allows to use GEPs anyway. One method could be
to translate the following code into the IR shown below.

type A[n][m];
x = A[i][j];

%ptr = gep %A, %i, %j
%md_ptr = llvm.multidim.access(%A, %n, %m, %i, %j)
%cmp = icmp eq %ptr, %md_ptr
llvm.assume(%cmp)
%x = load %ptr

or maybe:

%ptr = gep %A, %i, %j
%md_ptr = llvm.multidim.access(%A, %n, %m, %i, %j)
%x = load %ptr, mutli-dim %md_ptr

The restrictions of this new intrinsic seem very similar to those of the inrange attribute on GEPs. It seems that the main advantage of your proposal is that it would allow for non-constant strides (i.e. variable length arrays) in dimensions other than the first one. Do these appear frequently enough in the programs that you’re interested in to be worth optimizing for?

Peter

We could also simply extend the existing inrange mechanism to non-constantexpr GEPs. It would remove an inconsistency in the semantics, be relatively straight forward, and solve the motivating example.

(I didn’t read the proposal in full, so there may be other examples it doesn’t solve.)

Philip

It seems that the main advantage of your proposal is that it would allow for non-constant strides (i.e. variable length arrays) in dimensions other than the first one. Do these appear frequently enough in the programs that you're interested in to be worth optimizing for?

Yes - at least in Chapel (which is one of the motivating languages)
these are very common.
In other words, typical Chapel programs use arrays with a size that is
not fixed at compile time.

Thanks,

-michael

Agreed. I'd very much prefer a GEP extension over an intrinsic solution.

Is it even possible to implement the original proposal using an intrinsic? I'm not sure how to interpret part of the description. It says the syntax for the intrinsic would be this:

<result> = llvm.multidim.array.index.* <ty> <ty>* <ptrval> {<stride>, <idx>}*

It isn't clear to me what that means. The later example is also in a somewhat generalized form:

  %arrayidx = llvm.multidim.array.index.* i64 i64* %A, %str_1, %idx_1, %str_2, %idx_2

Trying to expand this into something concrete it looks to me like the extra value-less type argument ('i64' immediately following the intrinsic name) won't work, and if I'm reading it correctly that's a necessary element. The GEP instruction accepts a raw type as a pseudo-argument. It does this to prepare for the coming transition to opaque pointers, and it can do it because it's a first class instruction. However, an intrinsic can't do this. And yet without it we won't know what to use for the GEP's source element type "argument" after pointer's lose their pointee type.

-Andy

Intrinsics can return `llvm_any_ty` (Intrinsics.td). In that case the
return type is added as a suffix to the intrinsic's name, i.e. the
syntax in the RFC is not 100% the syntax for intrinsics. Same for the
parameters which each must have their types explicitly mentioned.

Michael

I don't think the return type has anything to do with what I was asking about. Perhaps seeing a concrete, fully-legal IR representation of what's being proposed would help.

Here's a best guess, switching the indexed base type to a non-integer for illustration purposes:

%arrayidx = call i64 @llvm.multidim.array.index.i64.p0f64.i64.i64.i64.i64 double* %A, i64 %str_1, i64 %idx_1, i64 %str_2, i64 %idx_2

According to the RFC, that would get lowered to this:

  %mul1 = mul nsw i64 %str_1, %idx_1
  %mul2 = mul1 nsw i64 %str_2, %idx_2
  %total = add nsw i64 %mul2, %mul1
  %arrayidx = getelementptr inbounds double, double* %A, i64 %total, !multidim !1

The problem I'm having is that the source element type in the GEP instruction in the lowering can only be inferred from the pointer type, which is going away at some point. When pointers become typeless, the call will look something like this:

%arrayidx = call i64 @llvm.multidim.array.index.i64.p0.i64.i64.i64.i64 void* %A, i64 %str_1, i64 %idx_1, i64 %str_2, i64 %idx_2

Do you see what I'm saying? Have I misunderstood something?

-Andy

After having spoken to Johannes, I think we had a classic
misunderstanding on what "extending" means.

1.
The most obvious why for me was changing GEP to allow variable-sized
multi-dimensional arrays in the first argument, such as

    %1 = getelementptr double, double* %ptr, inrange i64 %i, inrange i64 %j

(normally GEP would only allow a single index argument for a
pointer-typed base pointer).
Since %A has variable size, there is not enough information to compute
the result, we need to pass at least the stride of the innermost
dimension, such as:

    %1 = getelementptr double, double* %A, inrange i64 %i, inrange i64
%j, i64 %n

It should be clear that this breaks a lot of assumptions then went
into writing code that handle GEPs, not only the number of arguments
the GEP should have. As a result, existing code may trigger
assertions, crash, or miscompile when encountering such a modified
GEP. I think it is unfeasible to change all code to properly handle
the new form at once.

2.
Johannes' interpretation is to add some kind of metadata to GEPs, in
addition to the naive computation, such as:

    %offset1= mul i64. %i, %n
    %offset2 = add i64, %j, %offset1
    %1 = getelementptr double, double* %A, inrange i64 %offset2 [
"multi-dim"(i64 %n) ]

This was not obvious to me. The code above uses operand bundle syntax.
During our discussing for this RFC we briefly discussed metadata,
which unfortunately do not allow referencing local SSA values.

3.
For completeness, here is Johannes other suggestion without modifying
GEP/Load/Store:

    %offset1= mul i64. %i, %n
    %offset2= add i64, %j, %offset1
    %1 = getelementptr double, double* %A, inrange i64 %offset2
    %2 = llvm.multidim.access.pdouble.pdouble.i64.i64.i64.i64(double*
%A, i64 %n, i64 %m, i64 %i, i64 %j)
    %cmp = icmp eq %1, %2
    call void @llvm.assume(i1 %cmp)

The main motivation is to avoid that unprepared analyses such as
BasicAA have to give up when encountering the new intrinsic. 2. and 3.
only add things around as they were before such multidimensional
accesses.

Subjectively, I don't think that having to give up is such a big
problem. In the beginning, there will be no frontend that generates
the new intrinsic. More important passes such as BasicAA and
ScalarEvolution can be adapted before even Chapel would emit such
intrinsics. In the long term I would indeed try to fuse it with GEP
when all there is to do is changing the code determining whether it is
a multi-dimensional intrinsic to determining whether it has
variable-length form.

Michael

Hi,

I think I understand what the problem is. The return type will also be
an untyped pointer, just like GEP. However, GEP needs to know what the
size of one element is. Like it is now with overloadable intrinsics,
we might derive it from the suffix. Another solution is that we pass
the element and array sizes in bytes, instead of multiples of the
element size. As a third options, we might change and auto-upgrade the
intrinsic to a proper instruction when the change comes.

Michael

Why introduce a new intrinsic (family)? It seems that would require us
to support GEPs and GEP + "multi-dim" semantics in various places. What is
the benefit over a GEP extension?

Adding an intrinsic is easier than adding or extending an existing
instruction, as suggested by
https://llvm.org/docs/ExtendingLLVM.html#introduction-and-warning

In this case, that's probably bad advice.

Extending GEP would require all passes to understand the added
semantics on day 1, while a new intrinsic allows us to gradually add
support.
We can still make the intrinsic into an instruction when existing
passes understand the new functionality.

However, alas, this is illegal, for the C language does not provide
semantics that allow the final inference above. It is conceivable that
`x1 != x2, y1 != y2`, but the indices do actually alias, since
according to C semantics, the two indices alias if the _flattened
representation of the indices alias_. Consider the parameter
values:

n = m = 3
x1 = 1, y1 = 0; B[x1][y1] = nx1+y1 = 3*1+0=3
x2 = 0, y2 = 3; B[x2][y2] = nx2+y2 = 3*0+3=3

Hence, the array elements `B[x1][y1]` and `B[x2][y2]` _can alias_, and
so the transformation proposed in `ex1_opt` is unsound in general.

I'm unsure your example actually showcases the problem:

C standard, N1570 draft, Page 560, Appendix J.2 Undefined Behavior:

An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression a[1][7]
given the declaration inta[4][5]) (6.5.6).

GEP requires an array type, which in LLVM has static size.

This is false. See the last paragraph in http://llvm.org/docs/LangRef.html#array-type

For
dynamically-sized array (In C/C++ language family, this is only
supported by C99 VLAs), clang emits the the linearized index
expression followed by a single-dimensional GEP.

This could be changed if desired.

Er, why not use our existing first class array types?

getelementptr double, [0 x [0 x double]]* %ptr, inrange i64 %i, inrange i64 %j

Philip

This is lowered to the equivalent of

      ((double*)ptr) + i*0 + j

where we want

      ((double*)ptr) + i*n + j

Michael

OK, I think we're on the same page now.

Do we really have code that uses the suffix of an intrinsic for semantic information? That seems like a bad idea. My understanding was that the suffix was just there to provide a unique name and the function signature took care of everything else.

Implementing an intrinsic that later got promoted to a new instruction might be a possibility. Does anyone have a sense for how far out typeless pointers are?

Can someone put together some concrete examples that fully describe how this would look in terms of both the intrinsic proposal and the GEP extension you worked out with Johannes?

Thanks,
Andy

Do we really have code that uses the suffix of an intrinsic for semantic information? That seems like a bad idea. My understanding was that the suffix was just there to provide a unique name and the function signature took care of everything else.

I agree that this is also not my preferred option, but more an
illustration that the information is not lost. If, for any reason, we
would still use an intrinsics for this purpose, we can pass the byte
of the element type as an argument. In any case, I don't think this is
a critical issue.

Implementing an intrinsic that later got promoted to a new instruction might be a possibility. Does anyone have a sense for how far out typeless pointers are?

I haven't seen any discussion about this for a long time and I would
not count on that this change will be made in the foreseeable future.
Similar to the idea of basic block arguments instead of PHI nodes.

Can someone put together some concrete examples that fully describe how this would look in terms of both the intrinsic proposal and the GEP extension you worked out with Johannes?

The closest we have is the RFC document
(https://github.com/bollu/llvm-multidim-array-indexing-proposal/blob/master/RFC.md).
We are in the RFC phase to explore the problem and design space. Do
you have any concrete questions you would like the be answered?

Michael

Mmmh, looks like Tim Northover is actively working on typeless/opaque
pointers, e.g. https://reviews.llvm.org/D64203

Michael

I just have ideas how we could preserve more information, no fully
described examples. While they are not thought through, they might help
to explore the space here:

Access-centric:
- new "operand/metadata" to encode the multi-dim coordinate:
     %coord_3D = llvm.coord.3d(%N, %M, %K, %i, %j, %p)
     %val = load %p, [coord] %coord_3D
   Benefits: pointer computation is not disturbed, even if it is
             transformed, e.g., part of the pointer computation is
             hoisted and coalesced. Annotated operations are known to
             be UB if the bounds are violated.
   Caveats: New load operand need to be hidden, e.g., as operand bundle
   operands for calls. Alternatively, coord can be passed as metadata.
- new "operand/metadata" to encode a shape:
     %shape_3D = llvm.shape.3d(%N, %M, %K)
     %val = load %p, [shape] %shape_3D
   Benefits: see above. When a shape and `inrange` is provided it
             applies to all dimensions of the shape.
   Caveats: see above. Reconstruction of coordinate needs to happen
            still. Though, statically complex parts, e.g., guessing the
            shape and computing when offsets would be "out-of-range",
            is much simpler.

Type-based:
- Have multi-dimensional variable sized arrays. This was discussed in
   various contexts and I cannot summarize all the pros and cons.

GEP-centric:
- an intrinsic-based solutions applied to GEPs
- add the multi-dim coordinate inputs directly to GEPs (no intrinsic)
- build the coordinate as above and assume it equal to the GEP:
     call @llvm.assume(icmp eq %gep, %coord_3D)

I think sth. type-based would the cleanest but also most complicated
solution. Coordinate-based solutions are probably the simplest. Neither
of the above solutions should be too intrusive wrt. unaware passes.