Thank you for your reply!
It is not that our target-specific passes simply unlock optimization opportunities. The problem is rather that we can only create some target-specific passes that work correctly on “unoptimized” IR.
An example and a bit more explanation:
Our accelerator cannot do arbitrary integer arithmetic. It does have hardware loops, though, and its load and store instructions are coupled to those loops and carry additional information about where to find the data. So the address calculation itself is done in hardware.
In this example:
for (int x = 0; x < 3; x += 2)
{
    field_out[x] = field_in[x + 1];
}
we would like to create a load intrinsic (within an IR pass) that looks schematically like load (base address from field_in, connected loop generator x, with an offset of 1). We can get the information that x is a loop variable from a loop analysis pass. After that we have to find the offset. Part of the IR for the above source code might look like:
%x = phi i32 [ 0, %entry ], [ %add3, %for.body ]
%add = add nsw i32 %x, 1
From the add we can conclude that there is an offset of 1. The scheme works because we allow only a narrow range of source code, and we learned from looking at the unoptimized IR how to map that code to the accelerator. Once we have gathered this information, created our load intrinsic, and erased the instructions that are no longer needed, we are fine with running some optimization passes (but only passes that we know do not interfere with the remaining work).
If, however, we optimized the above IR with the instcombine pass before introducing our load intrinsic, the new IR might look like:
%x = phi i32 [ 0, %entry ], [ %add3, %for.body ]
%add = or i32 %x, 1
This means that after this optimization we cannot just search for an add (or a sub) to get the offset; we also have to search for an or.
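To make the extra case concrete: the or form computes the same value as the add here because x only takes even values (it starts at 0 and steps by 2), so bit 0 of x is always clear and x | 1 equals x + 1. Below is a minimal sketch of an offset matcher that accepts add, sub, and this particular or form. It is a simplified stand-alone model, not the LLVM API: the names Opcode, BinOpOnLoopVar, matchOffset, and knownZeroBits are hypothetical and chosen only for illustration. In a real pass the known-zero information would have to come from an analysis rather than be passed in by hand.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, simplified model of an instruction `%r = <op> i32 %x, C`,
// where %x is the loop variable. This is NOT the LLVM API.
enum class Opcode { Add, Sub, Or };

struct BinOpOnLoopVar {
    Opcode op;
    std::int32_t constant;  // the constant right-hand operand C
};

// Returns the offset relative to the loop variable, or std::nullopt if the
// pattern is not understood. `knownZeroBits` is a mask of the bits of %x
// that are zero in every iteration (bit 0 for x = 0, 2, 4, ...).
std::optional<std::int64_t> matchOffset(const BinOpOnLoopVar &inst,
                                        std::uint32_t knownZeroBits) {
    switch (inst.op) {
    case Opcode::Add:
        return static_cast<std::int64_t>(inst.constant);
    case Opcode::Sub:
        // Widen before negating so that INT32_MIN does not overflow.
        return -static_cast<std::int64_t>(inst.constant);
    case Opcode::Or: {
        // `x | C` equals `x + C` only if C sets no bit that may be set in x,
        // i.e. every set bit of C lies inside the known-zero mask.
        const auto c = static_cast<std::uint32_t>(inst.constant);
        if ((c & ~knownZeroBits) == 0)
            return static_cast<std::int64_t>(inst.constant);
        return std::nullopt;
    }
    }
    return std::nullopt;
}
```

With knownZeroBits = 1 (bit 0 of x is always clear), matchOffset({Opcode::Or, 1}, 1u) yields an offset of 1, just like the add form. With knownZeroBits = 0 the same or is rejected, because nothing guarantees that it behaves like an addition. This covers only one rewritten pattern; other passes may produce further forms that such a matcher would not recognize.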
So if we ran an optimization pipeline first, we would need to be able to recognize an open-ended set of potential patterns. With the knowledge we have gathered since the beginning of the project we might now be able to change this approach (and there will be attempts to generate code from optimized IR), but changing the historically taken approach is a high barrier to overcome. In the meantime we have to keep things running.
Solution attempts:
So maybe we could take your approach and inject our passes (which are currently part of the codegen pipeline) into an otherwise empty optimization pipeline? Or maybe at the beginning of a “customized” optimization pipeline?
Another approach that came to mind is to simply try running our passes with the new pass manager. From the docs (llvm-project/llvm/docs/NewPassManager.rst at main · llvm/llvm-project · GitHub) I gather that the new PM cannot handle the whole codegen pipeline. Would a setup be possible in which the new PM runs the IR passes at the beginning of the codegen pipeline, and the legacy pass manager runs the machine passes?
Thank you.