Our frontend can guarantee that loads from globals are rematerializable and do not alias with any stores in any function in the given module. We'd like the optimization passes (and ideally the register allocator as well) to be able to use this fact. The globals are not constant "forever", but they are constant for the duration of any call to a function in the module.
There seem to be two major ways to expose this to the optimization passes and code gen:
- build a custom alias analysis pass that indicates that these loads never alias with any stores in a given function
- declare these globals as external constants within the module
The former should give optimizations like LICM the freedom to move these loads around, allow them to be CSE'd, etc.
The latter should technically allow the same freedom to these optimizations, but doesn't currently seem to. Furthermore, the latter should give the RA enough information to rematerialize these loads instead of spilling them if necessary.
Below is a simple example module that illustrates this. It's just a memcpy-style loop copying between two external arrays. With unmodified TOT, opt -basicaa -licm, for example, will not hoist the loop-invariant loads of @b and @a (into %tmp3 and %tmp5) out of the body of the for loop.
If I apply the patch found further down, LICM moves the loads out (as expected), but of course this is a fairly specific fix.
What's the right way to handle this? Should Basic AA handle this case? Will the RA be aware that it can remat these loads or do I need to do something else to allow it to know this? Will the scheduler be aware that it can reorder them?
Obviously I can also move the loads to the entry block of the function, but that does not address the RA/scheduling issues and is difficult to do in general due to some additional semantics in our frontend.
Thanks!
Stefanus
=== Example ===
@b = external constant float*
@a = external constant float*

define void @test(i32 %count) {
entry:
  br label %forcond

forcond:                ; preds = %forinc, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %forinc ]       ; <i32> [#uses=4]
  %cmp = icmp ult i32 %i.0, %count                      ; <i1> [#uses=1]
  br i1 %cmp, label %forbody, label %afterfor

forbody:                ; preds = %forcond
  %tmp3 = load float** @b                               ; <float*> [#uses=1]
  %arrayidx = getelementptr float* %tmp3, i32 %i.0      ; <float*> [#uses=1]
  %tmp5 = load float** @a                               ; <float*> [#uses=1]
  %arrayidx6 = getelementptr float* %tmp5, i32 %i.0     ; <float*> [#uses=1]
  %tmp7 = load float* %arrayidx6                        ; <float> [#uses=1]
  store float %tmp7, float* %arrayidx
  br label %forinc

forinc:                 ; preds = %forbody
  %inc = add i32 %i.0, 1                                ; <i32> [#uses=1]
  br label %forcond

afterfor:               ; preds = %forcond
  ret void
}
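
For readers who prefer source to IR, here is a rough C++ rendering of the module above, plus the hand-hoisted form I would like LICM to arrive at. This is only a sketch: the function names copy_reload and copy_hoisted are made up for illustration, the globals are defined locally rather than being external, and the two functions are equivalent only under our frontend's guarantee that no store in the function modifies a or b.

```cpp
// Hypothetical stand-ins for the external globals @a and @b.
// They are not const here because, as noted, they may change between calls.
float *a = nullptr;
float *b = nullptr;

// Mirrors the IR as written: both pointers are reloaded on every iteration.
void copy_reload(unsigned count) {
    for (unsigned i = 0; i < count; ++i) {
        float *dst = b;   // corresponds to %tmp3
        float *src = a;   // corresponds to %tmp5
        dst[i] = src[i];
    }
}

// The loads of a and b performed once, before the loop.
// Valid only if nothing in the loop can store to a or b themselves.
void copy_hoisted(unsigned count) {
    float *dst = b;
    float *src = a;
    for (unsigned i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}
```

Note that the hoisted version loads both globals even when count is 0, whereas the original loop does not load them at all in that case; that is harmless here because the globals are always safe to load.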
=== Patch ===
Index: lib/Transforms/Scalar/LICM.cpp