getelementptr and SCEVs

I'm experimenting a bit with the SCEV code and one of the problems I'm
facing is that there's no way to represent a getelementptr operation as a
SCEV. The obvious way to do this in the existing SCEV framework is to add
a new SCEV subclass. I started doing that, but then I decided that it would
be nice to be able to split multiple-index getelementptrs into separate
SCEV objects that each cover a single index, so that it would be easy to do
things like recognize common prefixes. That turns out to be tricky because
the first index doesn't behave like the others.

Given an instruction like this (abbreviated syntax..):

x = getelementptr Z, a, b, c, d

the rewritten form looks (effectively) like this:

m = getelementptr Z, a
n = getelementptr m, 0, b
o = getelementptr n, 0, c
x = getelementptr o, 0, d

and the first index is clearly special. It's not possible to represent each
of these as uniform two-operand operations without some additional way to
tell which one was a "first" index.

Does anyone have any suggestions on the best way to procede? It may be
easier at this point to go back to a multiple-index getelementptr SCEV
class, and just write the code to work with it.

Dan

I'm experimenting a bit with the SCEV code and one of the problems I'm
facing is that there's no way to represent a getelementptr operation as a
SCEV.

Right. SCEV is really designed to expression integer equations.

The obvious way to do this in the existing SCEV framework is to add
a new SCEV subclass. I started doing that, but then I decided that it would
be nice to be able to split multiple-index getelementptrs into separate
SCEV objects that each cover a single index, so that it would be easy to do
things like recognize common prefixes. That turns out to be tricky because
the first index doesn't behave like the others.

Given an instruction like this (abbreviated syntax..):

x = getelementptr Z, a, b, c, d

the rewritten form looks (effectively) like this:

m = getelementptr Z, a
n = getelementptr m, 0, b
o = getelementptr n, 0, c
x = getelementptr o, 0, d

and the first index is clearly special. It's not possible to represent each
of these as uniform two-operand operations without some additional way to
tell which one was a "first" index.

Does anyone have any suggestions on the best way to procede? It may be
easier at this point to go back to a multiple-index getelementptr SCEV
class, and just write the code to work with it.

It really depends on what you're trying to do. Can you say what optimization of code xform you're trying to accomplish?

As I mentioned before, SCEV is really designed for integer expressions, not pointer expressions. Fortunately, all pointers can be treated as integers, and GEP is really just doing some integer arithmetic. Given something like this:

P = getelementptr i32* %A, i32 %Idx

The result pointer is actually equivalent to (on a 32-bit machine):

   (i32*)((i32)%A + Idx*4)

Likewise, all other GEP expressions can be similiarly decomposed into a simple sum of scaled indexes and offsets. If this sort of representation is sufficient for your needs, I'd suggest:

1. Insert a ptrtoint cast into the code that converts the base pointer to
    intptr_t.
2. Build up your SCEV with this cast as a SCEVUnknown value and the
    indices etc as simple arithmetic ops.

This should allow you to do things like factoring common expressions across complex indices etc.

-Chris

> Does anyone have any suggestions on the best way to procede? It may be
> easier at this point to go back to a multiple-index getelementptr SCEV
> class, and just write the code to work with it.

It really depends on what you're trying to do. Can you say what
optimization of code xform you're trying to accomplish?

Well, primarily I'm just trying to learn a few things :-).
At the moment I'm working on a pass to do simple prefetching, using SCEVs
to look at how the addresses of load instructions evolve in loops. Beyond
that, I'm interested in looking at loop memory footprints in general.

As I mentioned before, SCEV is really designed for integer expressions,
not pointer expressions. Fortunately, all pointers can be treated as
integers, and GEP is really just doing some integer arithmetic. Given
something like this:

P = getelementptr i32* %A, i32 %Idx

The result pointer is actually equivalent to (on a 32-bit machine):

   (i32*)((i32)%A + Idx*4)

I guess that'll work, though it makes working with multidimensional arrays
trickier. However, this kind of thing may be unavoidable in the case of
multidimensional arrays with variable extents anyway, since LLVM's type
system doesn't represent that directly.

On a related note, I'm surprised there's no sign-extend SCEV. Is that
intentional or is it just not done yet?

Dan

It really depends on what you're trying to do. Can you say what
optimization of code xform you're trying to accomplish?

Well, primarily I'm just trying to learn a few things :-).

Ah, very good.

At the moment I'm working on a pass to do simple prefetching, using SCEVs
to look at how the addresses of load instructions evolve in loops. Beyond
that, I'm interested in looking at loop memory footprints in general.

Okay, there are two different ways to do this. If you want to do this in the context of the loop optimizer, retaining the getelementptr instructions is important. If you're doing this in preparation to run the code generator (like the loop strength reduction pass) this doesn't matter. In the later case, treating pointers like integers, and lowering addressing to explicit arithmetic makes sense.

As I mentioned before, SCEV is really designed for integer expressions,
not pointer expressions. Fortunately, all pointers can be treated as
integers, and GEP is really just doing some integer arithmetic. Given
something like this:

P = getelementptr i32* %A, i32 %Idx

The result pointer is actually equivalent to (on a 32-bit machine):

   (i32*)((i32)%A + Idx*4)

I guess that'll work, though it makes working with multidimensional arrays
trickier. However, this kind of thing may be unavoidable in the case of
multidimensional arrays with variable extents anyway, since LLVM's type
system doesn't represent that directly.

Right.

On a related note, I'm surprised there's no sign-extend SCEV. Is that
intentional or is it just not done yet?

There wasn't a need for it, so it wasn't done yet :slight_smile:

-Chris