I’ve tried searching for uses of invariant.load and invariant.start/end, but they seem to appear mostly in “ignore”/propagate-style helper functions.
It looks like, in the simplest case, any call on the path that isn’t readonly prevents sinking for this example, even though it seems to me it should be allowed since the load is marked as invariant.
Ping. I know that there are SinkingPass (Sink.cpp) and GVNSink which don’t appear to be used anywhere (in GVNSink it’s disabled by default, IIRC due to soundness issues), but they don’t optimize the examples above either.
I see a test/Transforms/Sink/invariant-load.ll, which has a test case, although I’m not entirely sure how it’s different from your test cases. A quick find-references pulls up the AMDGPU backend, and the sink pass seems to be used there. Perhaps cc @nikic and @arsenm for more information?
Sinking common instructions in predecessors into the successors. In the default pipeline this is done by SimplifyCFG. Outside the default pipeline GVNSink does this. This is not the case you are interested in.
Sinking instructions to their use in a successor block. In the default pipeline this is done by InstCombine. Outside the default pipeline Sink does this.
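As a rough illustration of this second kind of sinking, here is a hypothetical reduction (not the test case from the original post; the names @f and @may_write are made up). The load is only used in one successor block, and a call that is not readonly sits between the load and the branch:

```llvm
declare void @may_write()          ; hypothetical callee, not marked readonly

define i32 @f(ptr %p, i1 %c) {
entry:
  ; The loaded value is only needed on the %use path.
  %v = load i32, ptr %p, !invariant.load !0
  call void @may_write()
  br i1 %c, label %use, label %exit

use:
  ; A sinking transform would move the load of %v down into this block.
  ret i32 %v

exit:
  ret i32 0
}

!0 = !{}
```

Sinking here would mean moving %v into %use so that it is not executed on the %exit path. Whether the intervening call should block that when the load carries !invariant.load is the question raised upthread.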
The short-term solution to sinking problems is to extend sinking in InstCombine – but there is a sharp limit to the amount of extension we’re going to accept.
Longer term, we should consider enabling the Sink pass or something like it. I’m not familiar with the history of why we ended up with sinking in InstCombine. If more sophisticated analysis (like AA) is needed to perform sinking, a separate pass is a better bet.
Regarding invariant loads, we have three mechanisms to encode these.
!invariant.load is a hard global requirement, and very easy to support.
invariant.start/invariant.end are architecturally very hard to support. It’s not really possible to efficiently query these. We have little to no optimizations for them – I think the only thing these are used for is an open-ended invariant.start when constructing global constants. I’d consider phasing these out (at least the invariant.end part).
!invariant.group and related intrinsics. I believe this is the preferred mechanism for non-global invariance. But I think the design is more targeted at CSE; I’m not sure it’s really suitable for sinking purposes.
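For reference, the three mechanisms look roughly like this in IR. This is a minimal sketch assuming opaque pointers; the function name and the three pointer arguments are made up, and the intrinsic declarations are written without attributes for brevity:

```llvm
declare ptr @llvm.invariant.start.p0(i64, ptr)
declare void @llvm.invariant.end.p0(ptr, i64, ptr)
declare ptr @llvm.launder.invariant.group.p0(ptr)

define void @mechanisms(ptr %p1, ptr %p2, ptr %p3) {
entry:
  ; 1. !invariant.load: metadata on the load itself, a global property
  ;    of the memory location.
  %a = load i32, ptr %p1, !invariant.load !0

  ; 2. invariant.start / invariant.end: the 4 bytes at %p2 are marked as
  ;    unchanging between the two intrinsic calls.
  %tok = call ptr @llvm.invariant.start.p0(i64 4, ptr %p2)
  %b = load i32, ptr %p2
  call void @llvm.invariant.end.p0(ptr %tok, i64 4, ptr %p2)

  ; 3. !invariant.group: invariance is tied to the pointer value; the
  ;    launder intrinsic returns a fresh pointer that ends the group.
  %c = load i32, ptr %p3, !invariant.group !0
  %q = call ptr @llvm.launder.invariant.group.p0(ptr %p3)
  %d = load i32, ptr %q, !invariant.group !0
  ret void
}

!0 = !{}
```

Note that only the first form can be checked by looking at the load alone; the other two require reasoning about surrounding calls or pointer provenance, which is part of why they are harder to use for sinking.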
It’s true that it shouldn’t be hard to support !invariant.load; however, for my use case I needed a finer-grained solution. I need to sink values that are “mostly invariant”, i.e. they can’t be changed by IR directly, but certain calls and intrinsics can “clobber” them. I didn’t find a way to encode this property in LLVM IR that was also supported by existing passes (as I mentioned in the first post, load sinking appears to be rather basic, and is either disabled or doesn’t really take advantage of the invariant properties), so I ended up going with a custom downstream pass for these specific purposes, plus adjustments to AA. It would be nice if this could be encoded upstream. I’d imagine something like C++ coroutines might take advantage of such info for TLS variables that behave similarly.