I’m trying to match instructions that can load / store 2 non-adjacent elements at small, constant offsets from a common base pointer. My first thought was to do this as a DAG combine, but it isn’t quite like any existing combine. Other combines on loads and stores look for adjacent loads on the same chain (findConsecutiveLoad in PPC seems to be the closest thing I can find to what I’m trying to do). Is it safe to merge loads that are on totally independent chains? Could I collect all the loads in a DAG, figure out which ones share a chain, and then safely merge any pair of loads that are on independent chains? Is there a better way to do this in the DAG?
My second thought is an IR pass that inserts intrinsics for the load2 instructions. I’m not sure about doing this with a MachineInstr pass, since the lack of AA there seems like it might be problematic.
As an example, these 2 loads at constant offsets can be combined into a single load, but they end up on separate chains:
@test_global_lds_offsets.foo = internal unnamed_addr addrspace(3) global [64 x i32] zeroinitializer, align 4
%2 = load i32 addrspace(3)* getelementptr inbounds ([64 x i32] addrspace(3)* @test_global_lds_offsets.foo, i32 0, i32 7), align 4, !tbaa !5
%3 = load i32 addrspace(3)* getelementptr inbounds ([64 x i32] addrspace(3)* @test_global_lds_offsets.foo, i32 0, i32 37), align 4, !tbaa !5
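To make the pairing step concrete, here is a rough standalone sketch of the candidate search I have in mind. It does not use any LLVM API: LoadInfo, findLoad2Candidates, MaxElemOffset and ElemSize are all made-up names, and the 255-element limit is only an assumed encoding range for illustration. It only groups loads by base pointer and offset range; it does not answer the chain / aliasing question above, which would still have to be checked before any real merge.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a load; this is NOT an LLVM type.
struct LoadInfo {
  int Id;             // stand-in for the IR value number (%2, %3, ...)
  std::string Base;   // stand-in for the common base pointer
  int64_t ByteOffset; // constant byte offset from Base
};

// Assumed limits, for illustration only.
constexpr int64_t ElemSize = 4;        // i32 elements
constexpr int64_t MaxElemOffset = 255; // assumed per-offset encoding range

// Greedily pairs loads that share a base and whose element offsets fit the
// assumed range. Performs no legality checks (intervening stores, chains, AA).
std::vector<std::pair<int, int>>
findLoad2Candidates(const std::vector<LoadInfo> &Loads) {
  std::map<std::string, std::vector<const LoadInfo *>> ByBase;
  for (const LoadInfo &L : Loads) {
    bool Aligned = L.ByteOffset >= 0 && L.ByteOffset % ElemSize == 0;
    if (Aligned && L.ByteOffset / ElemSize <= MaxElemOffset)
      ByBase[L.Base].push_back(&L);
  }

  std::vector<std::pair<int, int>> Pairs;
  for (const auto &Entry : ByBase) {
    const std::vector<const LoadInfo *> &Group = Entry.second;
    // Pair neighbours in program order; an odd load out stays unpaired.
    for (std::size_t I = 0; I + 1 < Group.size(); I += 2)
      Pairs.emplace_back(Group[I]->Id, Group[I + 1]->Id);
  }
  return Pairs;
}
```

For the two loads above (elements 7 and 37 of foo, i.e. byte offsets 28 and 148), this would report %2 and %3 as a single candidate pair.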