IR optimizations and MemorySSA for special-purpose load/store instructions

Dear all,

our special-purpose target has load/store instructions that do not take classical pointers to the accessed memory as arguments. Instead, we use a configuration of structured data volumes, and our loads and stores are intrinsics that take indices into the different data volume dimensions as arguments.

For example, we might have the following loads and stores (IR roughly simplified for readability):

%1 = call float @spu.load(volume_0, 1, 2, 3)
; arithmetic...
call void @spu.store(%0, volume_1, 4, 5, 6)
; arithmetic...
%2 = call float @spu.load(volume_0, 0, 0, 0)
; arithmetic...
%3 = call float @spu.load(volume_0, 1, 2, 3)

This loads %1 from volume 0 at (x, y, z) coordinates (1, 2, 3), does some arithmetic, stores %0 to volume 1 at coordinates (4, 5, 6), and so on.

We have marked the intrinsics with IntrReadMem/IntrWriteMem so that in principle it is clear that they access memory.

Now, we want to do IR optimizations, e.g. eliminating redundant loads when the memory read has not changed. But since our intrinsics are not normal pointer-based accesses, I think the standard LLVM simplification passes cannot prove that the accesses are redundant: our intrinsics are seen as “black box” global memory accesses.

In the above example, it is clear from the semantics of our target that the store to volume 1 cannot change any data in volume 0. Hence, the value loaded into %1 could be reused for %3, since the second call float @spu.load(volume_0, 1, 2, 3) loads the same value again. However, %2 still has to be loaded, since it reads from different coordinates (0, 0, 0) instead of (1, 2, 3) in volume 0.

Would it be possible to use MemorySSA in some way, for example? If we could tell the underlying LLVM infrastructure how to decide whether two memory accesses refer to the same data, we could employ the existing LLVM passes, which would be much easier for us than developing this ourselves.

Any hints would be greatly appreciated.

Best regards,

Kai

Hi,


It sounds like you need to add support for your custom intrinsics to AliasAnalysis (BasicAliasAnalysis is probably a good start). Most passes dealing with memory operations rely on the AA infrastructure to figure out how memory instructions interact.

The AA interfaces mostly deal with memory locations (pointer + access size), so a challenge may be how to represent your custom locations. One possible alternative may be to separate address computation from the memory operations, which may make the handling in AA easier (e.g. have something like @llvm.addr(volume_0, 1, 2, 3), which then creates a pointer/reference that feeds your custom loads/stores).
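For illustration, a hypothetical version of that split might look like this. The names @llvm.spu.addr, @spu.load.p, and @spu.store.p are made up, and the volumes are assumed to be visible as pointer values:

```llvm
; Hypothetical IR: compute an address once, then feed it to pointer-based
; load/store intrinsics. AA can then reason about the pointer values.
%a0 = call ptr @llvm.spu.addr(ptr @volume_0, i32 1, i32 2, i32 3)
%v1 = call float @spu.load.p(ptr %a0)
; arithmetic...
%a1 = call ptr @llvm.spu.addr(ptr @volume_1, i32 4, i32 5, i32 6)
call void @spu.store.p(float %v0, ptr %a1)
; arithmetic...
%a2 = call ptr @llvm.spu.addr(ptr @volume_0, i32 1, i32 2, i32 3)
%v3 = call float @spu.load.p(ptr %a2)
```

With this shape, proving %v3 redundant reduces to proving %a0 and %a2 are the same pointer, which is a question the AA infrastructure is built to answer.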

Have a look at the argmemonly function attribute.
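As a sketch (the exact signatures are made up, and the volume handle is assumed to be passed as a pointer argument), the intrinsic declarations could be annotated like this; note that in current LLVM releases argmemonly is expressed via the memory(...) attribute instead:

```llvm
; Hypothetical declarations: the attributes promise that the intrinsics
; only touch memory reachable through their pointer arguments.
declare float @spu.load(ptr, i32, i32, i32) #0
declare void @spu.store(float, ptr, i32, i32, i32) #1

attributes #0 = { argmemonly readonly nounwind }
attributes #1 = { argmemonly writeonly nounwind }
```

This lets passes conclude that a call cannot touch memory that provably doesn't alias its pointer arguments.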
Then why don’t you represent your pointers as:

@volume_0 = global [42 x [42 x [42 x float]]] zeroinitializer  ;; some dummy array
%ptr = getelementptr [42 x [42 x [42 x float]]], ptr @volume_0, i32 0, i32 1, i32 2, i32 3

Like this, it's obvious when pointers alias and when they don't. Your backend can then skip creating those global arrays.
With that representation, you may not even need your spu.load functions: just use plain loads/stores. You can recover the information you need from the getelementptr.
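Under that scheme, the example from the first post might be rewritten roughly as follows (the 42×42×42 dummy shape and the float element type are arbitrary placeholders):

```llvm
@volume_0 = global [42 x [42 x [42 x float]]] zeroinitializer
@volume_1 = global [42 x [42 x [42 x float]]] zeroinitializer

%p1 = getelementptr [42 x [42 x [42 x float]]], ptr @volume_0, i32 0, i32 1, i32 2, i32 3
%v1 = load float, ptr %p1
; arithmetic...
%p2 = getelementptr [42 x [42 x [42 x float]]], ptr @volume_1, i32 0, i32 4, i32 5, i32 6
store float %v0, ptr %p2
; arithmetic...
%p3 = getelementptr [42 x [42 x [42 x float]]], ptr @volume_0, i32 0, i32 1, i32 2, i32 3
%v3 = load float, ptr %p3  ; same GEP as %p1, so GVN can reuse %v1
```

Here BasicAA can see that %p2 points into a different global than %p1/%p3, so the store is known not to clobber the first load.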

Thanks a lot for this hint, we will investigate this!

Thank you very much for your hint. We thought about using getelementptr this way, but from its documentation in the LLVM assembly language reference we thought it is only usable for data volumes whose size is fixed at compile time, since the docs say it is intended for aggregate data structures, not e.g. vectors. Since our data is allocated at runtime, its size can only be determined at runtime. Would it nevertheless be possible to use it?

Ah, sorry, I think I may have misunderstood you. So you are saying we should use a dummy array whose dimensions have nothing to do with the real dimensions?

Yep, just use a dummy array with a big size. Ideally, it should be at least as large as any dynamic allocation.
But if you don't know that, it just has to be larger than any dereference with constant indices, so LLVM doesn't decide that the access is out of bounds. If the indices are not constant, then LLVM can't conclude anything, so you're safe.

BTW, the dummy-array hack assumes your volumes are allocated at the beginning of the program, so that you don't have to pass those pointers around. If not (e.g. you call some malloc for this), then you should pass the pointers around with the noalias attribute; those are then guaranteed not to alias.
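A sketch of the noalias variant (the function name, parameter names, and the index are illustrative):

```llvm
; Hypothetical kernel: the noalias attributes tell AA that the two volume
; pointers never refer to overlapping memory.
define void @kernel(ptr noalias %volume_0, ptr noalias %volume_1) {
entry:
  %p = getelementptr float, ptr %volume_0, i64 123
  %v = load float, ptr %p
  store float %v, ptr %volume_1  ; cannot clobber %volume_0 (noalias)
  %w = load float, ptr %p        ; provably the same value as %v
  ret void
}
```

With the runtime-allocated volumes passed in as noalias parameters, the second load can be eliminated by GVN even though the sizes are unknown at compile time.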

Read a bit about LLVM's “based on” pointer aliasing rules. These slides may help, but I'm sure there's better stuff out there.

If you use an external global (or really any global without a definitive initializer), then LLVM will not make assumptions about the maximum size of the object. An external global [0 x i32] in one module can be linked against an arbitrarily large definition from a different module.
