[RFC] Automated Pipeline Reduction Tool for CodeGen (Backend)

The LLVM middle-end has a highly effective tool, llvm/utils/reduce_pipeline.py, which automates finding the minimal set of passes that triggers a bug. However, no equivalent exists for the CodeGen pipeline: backend developers currently have to rely on tools such as opt-bisect to debug crashes and miscompilations.

The limitation of opt-bisect becomes evident when dealing with bugs such as “silent corruption”—where an early pass violates an IR invariant (e.g., SSA dominance), but the crash only occurs much later in the MachineVerifier or Register Allocator. In such cases, opt-bisect can only identify the pass that finally caught the error, not the pass that introduced it. Identifying the true culprit often requires manually bisecting through a sequence of hundreds of machine passes.

I am proposing a tool to automate the reduction of the backend pass pipeline. This tool would take a reproducer script (an “interestingness test”) and iteratively remove non-essential passes until the minimal failing pipeline is identified.
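To make the idea concrete, here is a minimal sketch of the core reduction loop. Everything in it is hypothetical: `reduce_pipeline` and `is_interesting` are not an existing API, and how a candidate pass subset is actually fed to llc is exactly the open design question.

```python
def reduce_pipeline(passes, is_interesting):
    """Greedily drop passes while the reproducer stays 'interesting'.

    `passes` is the full pass list; `is_interesting(subset)` runs the
    user's reproducer script against a pipeline containing only that
    subset and returns True if the bug still triggers. Both names are
    placeholders for this sketch.
    """
    kept = list(passes)
    changed = True
    while changed:
        changed = False
        for p in list(kept):
            trial = [q for q in kept if q != p]
            if is_interesting(trial):
                kept = trial  # pass p was not needed to trigger the bug
                changed = True
    return kept
```

The loop keeps retrying until no single pass can be removed, so the result is 1-minimal with respect to the interestingness test (removing any one remaining pass makes the bug disappear).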

I believe this would significantly reduce the manual effort required to debug complex backend issues, such as the recent WebAssembly #55249 case, where a CFG transformation pass silently broke register liveness. Automating the isolation of this pass would have saved significant debugging time.

Automated pipeline reduction is great, no objections to that. There’s still ongoing work to migrate the codegen pipeline to the new pass manager, and once that’s done we might be able to share some of the pipeline reduction code between the optimization and codegen pipelines. And writing another tool to work with the legacy pass manager in the meantime is fine.

I do wonder if, in your case, it would be easier to add a flag that runs the verifier after every pass; then you can identify which pass run breaks the IR, dump the IR/MIR before that pass, and just run the culprit pass. There are probably ways to make this process easier (e.g. the new pass manager has -print-on-crash, which saves the IR before every pass and dumps it if LLVM crashes), but maybe this works for you?
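For reference, the dump-and-rerun part of this workflow is already possible with llc’s `-stop-before` and `-run-pass` options. A sketch (the pass name `machine-sink` and the tiny input are only illustrative, and the guard makes this a no-op without an LLVM build on PATH):

```shell
# Sketch only: skip gracefully when no LLVM build is available.
command -v llc >/dev/null 2>&1 || exit 0

# Stand-in input; a real repro.ll would come from the bug report.
cat > repro.ll <<'EOF'
define i32 @f(i32 %x) {
  ret i32 %x
}
EOF

# Catch the first structurally invalid MIR as early as possible.
llc -O2 -verify-machineinstrs repro.ll -o /dev/null

# Dump the MIR immediately before the suspected pass...
llc -O2 -stop-before=machine-sink repro.ll -o before.mir

# ...then rerun just that pass on the dump, with verification.
llc -run-pass=machine-sink -verify-machineinstrs before.mir -o after.mir
```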


I thought that’s -verify-machineinstrs. @Michael-Chen-NJU have you tried that?


@mshockwave You are absolutely right that -verify-machineinstrs is the first line of defense and essential for catching invalid MIR.

However, the motivation behind this proposal is to address scenarios where the verifier passes successfully, yet the code is semantically broken (e.g., miscompilations where valid but incorrect instructions are generated). My thought was that while the verifier ensures structural validity (register classes, liveness, etc.), the proposed tool could complement it by allowing us to define an “interestingness test” to automatically isolate the pass responsible for the logic error.
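As an illustration of what such an “interestingness test” could boil down to on the driver side, the tool could simply run a user-supplied reproducer command and interpret its exit status, mirroring llvm-reduce’s convention that exit status 0 means the bug is still present. The function name and protocol here are assumptions for this sketch, not an existing interface:

```python
import subprocess

def is_interesting(cmd):
    """Return True if the reproducer command still exhibits the bug.

    Following llvm-reduce's convention, the user's script exits 0 when
    the miscompile is still observable (e.g. after running the compiled
    output and diffing it against a known-good result). Everything about
    `cmd` is supplied by the user, so any semantic check can be plugged in.
    """
    result = subprocess.run(cmd, capture_output=True)
    return result.returncode == 0
```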

Thanks @aeubanks.

I agree regarding the New Pass Manager migration—sharing the reduction infrastructure once the CodeGen pipeline is fully migrated is definitely the right long-term direction.

Regarding the verifier suggestion: While the existing verifier is excellent for catching invalid MIR, the thought behind this proposal is to handle cases where the MIR structure remains valid, but the logic is broken.

For instance, a pass might violate an implicit dependency that the generic verifier doesn’t check, causing a crash much later in the pipeline (e.g., the verifier passes at step N, but the crash happens at N+10).

The proposed tool is intended to be complementary by allowing developers to plug in custom tests. This flexibility helps isolate semantic regressions and other complex scenarios—perhaps even edge cases we haven’t fully anticipated yet.

In case you’re not aware, opt-bisect has recently gained support for ranges, so you can bisect at a finer granularity than “run all passes up to this point”.

Though I think we currently lack support for driving that automatically (we have llvm/utils/bisect and llvm/utils/bisect-skip-count, but neither of them supports arbitrary subrange bisect).
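A driver for the classic (non-range) case could binary-search the `-opt-bisect-limit` value. In this sketch, `run_with_limit` is a hypothetical callback that reruns the compile with the given limit and reports whether the failure reproduces:

```python
def bisect_limit(run_with_limit, hi):
    """Find the smallest -opt-bisect-limit at which the failure appears;
    that index identifies the pass whose execution triggers it.

    `run_with_limit(n)` -> True if the bug reproduces when only the first
    n passes are allowed to run. Assumes the bug reproduces at `hi` and
    not at 0, i.e. monotonicity -- which silent-corruption bugs can
    violate, and that is exactly where plain bisection falls short.
    """
    lo = 0                      # known good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if run_with_limit(mid):
            hi = mid            # bug still present: culprit is <= mid
        else:
            lo = mid            # bug gone: culprit is > mid
    return hi
```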

The key difference between just opt-bisect and pipeline reduction is that the latter produces something more actionable when it comes to producing a test case or performing further manual reduction. (Though when it comes to the backend, I expect that this will be more fragile than opt-bisect, because of issues with MIR serialization.)


Thanks @nikic for such a detailed and encouraging response! To better understand the scope of the problem, I analyzed over 1,600 historical backend optimization issues. I found that while many fixes are quite small (the median is just ~17 lines), the effort required to isolate them from a complex pipeline is disproportionately high. Furthermore, nearly 60% of optimization-related bugs involve interactions between multiple components (such as TargetLowering and specific optimization passes), which makes simple point-of-failure identification less effective than pipeline reduction.

I’m very grateful for your warning about MIR serialization fragility; it’s something I hadn’t fully factored in yet. Given my current stage, I’m unsure of the best path forward: should I focus on building an automated driver that leverages the range-based bisection you mentioned, or would it be more helpful to the community if I kept it a standalone utility, similar to reduce_pipeline.py, focused on generating a minimal llc command?

I feel that combining the “interestingness test” with the automated bisection might be a good way to mitigate the fragility while getting that actionable output. I’d appreciate any guidance you could spare!

I think it’s important to start addressing these issues. A few concrete examples:

  1. Register use lists. We have an explicit representation of them in the IR, and need the equivalent handled in MIR.
  2. Some fields of MachineFunction and MachineBasicBlock are still not serialized.
  3. Target MachineFunctionInfos are more incomplete than not.
  4. Some critical information only exists in analyses without serialization, in particular exact SlotIndex numbers and the LiveRegMatrix.

To derail the discussion a bit: recently I’ve seen not just one but two issues (a correctness one and a performance one) that are strongly related to, if not caused by, slot indexes not currently being serialized. Since regalloc prioritizes long live intervals (and the length is calculated from slot indexes, IIUC), having inconsistent slot indexes between the analyses and the deserialized MIR can lead to completely different (regalloc) results. So personally I think this item is quite severe, especially in high-register-pressure scenarios.

How are you running into issues with slot index serialization in a production compiler? Are you serializing to/from MIR in the compiler that you ship?

There is no such thing as slot index serialization, which is part of the problem. They only exist in LiveIntervals, and you’ll get different numbers if you compute them fresh than if you retain them from previous passes.


Thanks @arsenm and @mshockwave for the deep dive into MIR serialization. The fact that SlotIndex isn’t serialized and relies on being “retained” in memory makes it clear why a simple port of the middle-end logic won’t work.

To move forward despite these gaps, I’m taking a more conservative approach for my prototype: skipping the bisection step that relies on intermediate MIR dumps (step #1 in reduce_pipeline.py) and keeping only step #2 (backward clipping) and step #3 (exhaustive sweep), rerunning the compiler from the original input on every iteration.

While this “from-scratch” strategy is slower, it avoids the “dump-and-load” cycle entirely and keeps analysis states like SlotIndex coherent within each llc execution.
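The two retained steps can be sketched as follows. Here `still_fails` is a hypothetical predicate that reruns llc from the original input with only the given pass list and reports whether the reproducer still fails; the function names are mine, not reduce_pipeline.py’s:

```python
def clip_backward(passes, still_fails):
    """Step #2: find the shortest prefix of the pipeline that still fails,
    rerunning from the original input each time (no intermediate MIR)."""
    n = len(passes)
    while n > 0 and still_fails(passes[: n - 1]):
        n -= 1
    return passes[:n]

def exhaustive_sweep(passes, still_fails):
    """Step #3: try deleting each remaining pass individually, restarting
    whenever a deletion succeeds, until no single pass can be removed."""
    kept = list(passes)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]
        if still_fails(trial):
            kept = trial        # drop pass i, retry from the start
            i = 0
        else:
            i += 1
    return kept
```

Since every probe recompiles from the original input, the cost per iteration is a full llc run, but no state ever has to survive a dump-and-load round trip.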

I’ll start building a demo with this strategy to see how it performs on some known issues. I’m still in the early stages of this, so if you have any suggestions on how to better handle the bisection logic without serialization, or if there are specific backend edge cases I should keep an eye on, I’d be very grateful for your advice!