I’m having difficulty deciding what to do with some invalid reductions llvm-reduce produces when unreachable blocks are involved. I’m currently looking into a reproducer that took over a day to reduce, and it spent most of that time producing invalid reductions during block reduction.
I have a patch to start deleting blocks that become unreachable as a result of deleting other blocks, which is easy enough. This avoids a lot of “def does not dominate use” errors. There are additional issues when unreachable blocks are already present in the incoming program.
In the past I have had some difficulty reducing failures that occurred in unreachable code, so I think there’s possibly some value in trying to keep these blocks around. I believe all the issues I’m running into disappear if the block reduction just implicitly deletes unreachable blocks. However, that runs into a different problem: if the interestingness check depends on the unreachable code, we’ll hit the “number of chunks changes when reducing” assertion.
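To make the assertion concrete, here is a toy model (plain Python, not llvm-reduce’s actual code; all names are illustrative) of chunk-based reduction over blocks. If the pass implicitly deletes blocks that became unreachable, the number of reducible targets changes mid-run and no longer matches the count computed up front:

```python
def reachable(entry, blocks, edges):
    """Blocks reachable from the entry via a simple DFS."""
    seen, stack = set(), [entry]
    while stack:
        b = stack.pop()
        if b not in seen:
            seen.add(b)
            stack.extend(s for s in edges.get(b, []) if s in blocks)
    return seen

def delete_blocks(blocks, edges, to_delete, implicit_cleanup):
    kept = [b for b in blocks if b not in to_delete]
    if implicit_cleanup:
        # Also drop blocks that became unreachable as a side effect.
        live = reachable(blocks[0], set(kept), edges)
        kept = [b for b in kept if b in live]
    return kept

# CFG: entry -> a, and both b and c are only reachable through a.
blocks = ["entry", "a", "b", "c"]
edges = {"entry": ["a"], "a": ["b", "c"]}

initial_chunks = len(blocks)  # computed once, up front: 4
kept = delete_blocks(blocks, edges, {"a"}, implicit_cleanup=True)

# Deleting one chunk ('a') implicitly removed 'b' and 'c' as well,
# so a recount disagrees with the up-front chunk bookkeeping.
assert kept == ["entry"]
assert len(kept) != initial_chunks - 1
```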
I see 3 options:
- Run unreachable block elimination before trying any reductions. This breaks the ability to reduce problems caused by unreachable code.
- Have the block reduction auto-delete incoming unreachable blocks. This breaks the assertion about changing the number of chunks, which would presumably need to be removed since it’s no longer a preserved property.
- Write convoluted code to maintain any unreachable blocks the oracle doesn’t say to delete, and update users of values defined in successors of the unreachable code that are also reachable through a different path.
Given that blocks can become unreachable throughout, I’d assume a combination of 1 and 2 would be reasonable, together with a flag to disable the elimination.
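Option 1 recast as a reduction step inside the loop (as suggested further down the thread) could look roughly like this sketch. It is plain Python over an adjacency dict, not LLVM’s API; `reduce_unreachable` and the stand-in oracle are hypothetical names. The pass proposes deleting every block not reachable from the entry, and keeps the change only if the interestingness test still passes:

```python
def reachable_from(entry, edges):
    seen, work = set(), [entry]
    while work:
        b = work.pop()
        if b not in seen:
            seen.add(b)
            work.extend(edges.get(b, []))
    return seen

def reduce_unreachable(blocks, edges, entry, is_interesting):
    live = reachable_from(entry, edges)
    candidate = [b for b in blocks if b in live]
    # Cost is a single run of the interestingness test.
    return candidate if is_interesting(candidate) else blocks

# Example: 'dead' is unreachable and the "bug" does not depend on it,
# so the cleanup is accepted.
blocks = ["entry", "mid", "dead"]
edges = {"entry": ["mid"]}
interesting = lambda bs: "mid" in bs  # stand-in oracle
result = reduce_unreachable(blocks, edges, "entry", interesting)
assert result == ["entry", "mid"]
```

If the interestingness check did depend on the unreachable code, `is_interesting` would reject the candidate and the original blocks would be kept, which is exactly what makes this safe to run inside the reduction loop.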
imo (1) is not acceptable; it’s not OK to do any transformation outside of the reduction loop. Rather, just have a one-trick reduction pass that tries to eliminate all unreachable blocks. The cost of this is low: one run of the interestingness test.
(2) seems fine
not sure about (3)
Currently leaning towards having block reduction just give up if there are unreachable blocks, and clean up any it introduces itself. Then have a separate reduction for unreachable blocks. If you did have an unreachable code problem, you could turn off that reduction and the other CFG simplifications should still be able to help trudge through it.
Attempting to keep the number of chunks consistent within a run makes sense; otherwise the chunks we feed to the oracle won’t match the actual IR. I actually want to revisit the chunk handling, because right now we calculate the number of chunks once at the beginning and never recompute it, which doesn’t make sense. We should recompute after a successful reduction. However, if we start over, we’ll probably waste time attempting to reduce everything again (which likely won’t be interesting, because we’ve already tried to reduce everything), so we’d need to measure the impact on reduction times.
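A generic toy delta loop (not llvm-reduce’s implementation) showing the recomputation being discussed, including the restart cost mentioned above:

```python
def reduce_list(items, is_interesting):
    """Greedy one-at-a-time reduction that recounts targets each pass."""
    changed = True
    while changed:
        changed = False
        n = len(items)  # recompute the chunk count for this pass
        for i in range(n):
            candidate = items[:i] + items[i + 1:]
            if is_interesting(candidate):
                items = candidate
                changed = True
                break  # start over with a fresh count; this retries
                       # earlier positions, which is the wasted work
                       # the impact measurement would need to quantify
    return items

# Stand-in oracle: interesting iff both 3 and 7 survive.
out = reduce_list(list(range(10)), lambda xs: 3 in xs and 7 in xs)
assert out == [3, 7]
```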
Having block reduction clean up unreachable blocks after the initial reduction sounds good, and running a separate unreachable block cleanup before that also sounds good.
> Attempting to keep the number of chunks consistent within a run makes sense, or else the chunks we feed to the oracle won’t match the actual IR. I actually want to revisit the chunk handling because right now we calculate the number of chunks once at the beginning and never recompute it which doesn’t make sense.
I’m battling this while trying to make the block reduction better. With my current patch to fix unreachable handling, you end up with a lot more noise coming out of block reduction. It effectively splits the CFG simplification into 3 separate reductions: unreachable blocks, basic blocks, and then simplify-cfg. basic-blocks would likely have fewer steps to perform if some CFG simplifications were performed as it goes along, instead of as a separate reduction after completion.
If I try to do this, such as by calling MergeBlockIntoPredecessor on any neighbors of deleted blocks, it ends up not trying the combinations I expect. I was also wondering whether picking the blocks in some kind of graph traversal order instead of source order would make more sense, but I expect even more chunk-counting difficulties with that.
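For reference, the traversal order in question would presumably be something like reverse post-order, where a block is visited before its (non-back-edge) successors. A minimal sketch over an adjacency dict, not tied to LLVM’s iterators:

```python
def reverse_post_order(entry, edges):
    """Reverse post-order over a CFG given as an adjacency dict."""
    order, seen = [], set()

    def dfs(b):
        seen.add(b)
        for s in edges.get(b, []):
            if s not in seen:
                dfs(s)
        order.append(b)  # record in post-order

    dfs(entry)
    return order[::-1]  # reversing yields predecessors-first order

# Diamond CFG: entry -> {a, b}, both -> c.
edges = {"entry": ["a", "b"], "a": ["c"], "b": ["c"]}
rpo = reverse_post_order("entry", edges)
assert rpo[0] == "entry"
assert rpo.index("c") > rpo.index("a") and rpo.index("c") > rpo.index("b")
```

One wrinkle worth noting for the chunk-counting concern: the order (and with it the mapping of chunk indices to blocks) depends on successor visitation order, so it would have to be recomputed deterministically from the same IR each time the oracle is consulted.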
D136461 (llvm-reduce: Fix block reduction with unreachable blocks) is what I came up with. Somehow bugpoint’s implementation is substantially simpler. llvm-reduce’s block reduction is structured as removing deleted blocks from the terminators of other blocks. Bugpoint just replaces the terminators of the blocks to delete with unreachable and lets the BasicBlockUtils functions take care of the rest. I haven’t figured out how it manages to keep interesting blocks attached to the remaining CFG.