Using optimization passes in the codegen pipeline

Hello,

We are working on a compiler for an accelerator that has a very limited instruction set and can handle only a very specific form of code. We currently compile C++ source code without optimizations (-O0). In our target machine we use the addIRPasses() hook to run IR passes that prepare the initial (unoptimized) IR for code generation. During this stage we do some analysis and run some optimization passes, but in a very “controlled” manner: for example, we have to ensure that certain analyses are done before certain optimizations can run (that is the reason why we use -O0).
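To make the setup concrete, here is a sketch of what our hook looks like; MyTargetPassConfig and the create*Pass names are placeholders for our target, while TargetPassConfig and addIRPasses() are the real LLVM hooks:

```cpp
// Sketch only: placeholder pass names, error handling omitted.
class MyTargetPassConfig : public llvm::TargetPassConfig {
public:
  void addIRPasses() override {
    // Our analyses must run before anything rewrites the IR patterns
    // we rely on (placeholder factory functions for our own passes):
    addPass(createMyLoopAnalysisPass());
    addPass(createMyLoadIntrinsicLoweringPass());
    // Afterwards we ran selected optimizations, e.g. (now removed):
    // addPass(createDeadStoreEliminationPass());
    llvm::TargetPassConfig::addIRPasses(); // default codegen IR passes
  }
};
```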

So far we have, for example, called addPass(createDeadStoreEliminationPass()) inside our addIRPasses function. But we realized that createDeadStoreEliminationPass was removed ([Passes] Remove some legacy passes · llvm/llvm-project@7c3c981 · GitHub) with the comment “These are part of the optimization pipeline, of which the legacy version is deprecated and being removed.” As far as I understand, addIRPasses is part of the codegen pipeline, which is still handled by the legacy pass manager.

We are now wondering: what is the correct way to use DeadStoreElimination during the codegen pipeline (i.e. inside the addIRPasses function)? Or is this construction of ours simply wrong?

Feel free to correct me if I misunderstood something. Thanks.

Ideally all DSE would already have happened in the optimization pipeline. Is there a reason you want to run it in the codegen pipeline? If you’re running target-specific passes that unlock more optimization opportunities, you can inject them into the optimization pipeline with TargetMachine::registerPassBuilderCallbacks().
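A minimal sketch of that hook, assuming the current new-pass-manager API; MyTargetMachine and MyLowerLoadsPass are placeholders:

```cpp
// Sketch only: inject target-specific IR passes at the start of the
// optimization pipeline, before generic passes rewrite the IR.
void MyTargetMachine::registerPassBuilderCallbacks(llvm::PassBuilder &PB) {
  PB.registerPipelineStartEPCallback(
      [](llvm::ModulePassManager &MPM, llvm::OptimizationLevel Level) {
        MPM.addPass(MyLowerLoadsPass()); // placeholder for your pass
      });
}
```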

Thank you for your reply!

It is not that our target-specific passes simply unlock optimization opportunities. The problem is rather that some of our target-specific passes only work correctly on “unoptimized” IR.

An example and a bit more explanation:

Our accelerator cannot do arbitrary integer arithmetic. But it has hardware loops, and its load/store instructions are coupled to those loops and carry additional information about where to find the data, so the address calculation itself is done in hardware.

In this example:

for (int x = 0; x < 3; x += 2)
{
    field_out[x] = field_in[x + 1];
}

we would like to create a load intrinsic instruction (within an IR pass) that looks schematically like load(base address of field_in, connected loop generator x, with an offset of 1). We can get the information that x is a loop variable from some loop analysis pass. After that we have to find the offset. Part of the IR of the above source code might look like:

%x = phi i32 [ 0, %entry ], [ %add3, %for.body ]
%add = add nsw i32 %x, 1

From the add we can conclude that there is an offset of 1. The scheme works because we allow only a narrow range of source code, and we learned from looking at the unoptimized IR how to map that code to the accelerator. After we have gathered this information, created our load intrinsic, and erased the no-longer-needed instructions, we are fine with running some optimization passes (but only those that we know do not interfere with the remaining work).
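Our offset matching can be sketched with LLVM's PatternMatch helpers; getOffset is a hypothetical helper of ours, and IndVarPhi is the loop-variable phi found by the earlier analysis:

```cpp
// Sketch only: recognizes exactly the plain add/sub forms we expect in
// unoptimized IR. m_c_Add could additionally cover swapped add operands.
static std::optional<int64_t> getOffset(llvm::Value *Idx,
                                        llvm::PHINode *IndVarPhi) {
  using namespace llvm::PatternMatch;
  llvm::ConstantInt *C = nullptr;
  if (match(Idx, m_Add(m_Specific(IndVarPhi), m_ConstantInt(C))))
    return C->getSExtValue();            // e.g. x + 1 -> offset 1
  if (match(Idx, m_Sub(m_Specific(IndVarPhi), m_ConstantInt(C))))
    return -C->getSExtValue();           // e.g. x - 2 -> offset -2
  return std::nullopt;                   // pattern not recognized
}
```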
If we were, however, to optimize the above IR with the instcombine pass before introducing our load intrinsic, the new IR might look like:

%x = phi i32 [ 0, %entry ], [ %add3, %for.body ]
%add = or i32 %x, 1

This means that after this optimization we cannot just search for an add (or a sub) to get the offset; we also have to search for an or. (instcombine can prove here that %x is always even, so its low bit is zero and add i32 %x, 1 is equivalent to or i32 %x, 1.)

So if we ran an optimization pipeline we would need to recognize an unclear set of potential patterns. With the knowledge we have gathered since the beginning of the project we might now be able to change this approach (and there will be attempts to generate code from optimized IR), but changing the historically grown approach is a high barrier to clear. In the meantime we have to keep things running.

Solution attempts:

So maybe we could take your approach and inject our passes (currently part of the codegen pipeline) into an otherwise empty optimization pipeline? Or at the beginning of a “customized” optimization pipeline?

Another approach that came to mind is to simply run our passes with the new pass manager. From llvm-project/llvm/docs/NewPassManager.rst at main · llvm/llvm-project · GitHub I gather that the new PM cannot handle the whole codegen pipeline. Would a construction be possible in which the new PM runs the IR passes at the beginning of the codegen pipeline, while the legacy pass manager runs the machine passes?
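Schematically, the mixed construction we have in mind might look like this; the analysis-manager boilerplate is the standard new-PM setup, DSEPass is the new-PM dead store elimination, and TM/M stand for our TargetMachine and Module:

```cpp
// Sketch only: run IR passes with the new PM, then hand the module to
// the legacy codegen pipeline unchanged. Error handling omitted.
llvm::LoopAnalysisManager LAM;
llvm::FunctionAnalysisManager FAM;
llvm::CGSCCAnalysisManager CGAM;
llvm::ModuleAnalysisManager MAM;
llvm::PassBuilder PB(TM); // TM: our TargetMachine
PB.registerModuleAnalyses(MAM);
PB.registerCGSCCAnalyses(CGAM);
PB.registerFunctionAnalyses(FAM);
PB.registerLoopAnalyses(LAM);
PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

llvm::ModulePassManager MPM;
MPM.addPass(llvm::createModuleToFunctionPassAdaptor(llvm::DSEPass()));
MPM.run(M, MAM); // M: our Module

// Then the legacy PM runs the (machine) codegen pipeline as before, e.g.:
// llvm::legacy::PassManager CodeGenPM;
// TM->addPassesToEmitFile(CodeGenPM, Out, nullptr, CodeGenFileType);
// CodeGenPM.run(M);
```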

Thank you.