[RFC] Consolidate "remove unreachable blocks" functions?

Hello all!

After exposing a new “eliminate unreachable basic blocks” function
last week in ⚙ D59069 [Utils] Extract EliminateUnreachableBlocks (NFC), I was asked in
post-commit review whether I could consolidate several existing
unreachable basic block elimination functions in LLVM.

With that goal in mind, I tried replacing the '-unreachableblockelim'
pass's use of 'llvm::ElminateUnreachableBlocks' (from
include/llvm/Transforms/Utils/BasicBlockUtils.h, added in
⚙ D59069 [Utils] Extract EliminateUnreachableBlocks (NFC) from last week) with the older
function 'llvm::removeUnreachableBlocks' (from
'include/llvm/Transforms/Utils/Local.h', added in
rG4fbc0d08bfe9 from 2012), but the change resulted
in many test failures. The older function 'removeUnreachableBlockshas'
uses much more aggressive elimination logic, and it results in changes
to the output IR that many tests rely on. I tested on x86 alone, and
found 81 tests implicitly run the '-unreachableblockelim' pass, and
making that pass more aggressive results in brittle FileCheck
assertions firing. To name a few of those tests:

1. LLVM :: CodeGen/X86/2007-10-12-SpillerUnfold1.ll
2. LLVM :: CodeGen/X86/2007-11-30-LoadFolding-Bug.ll
3. LLVM :: CodeGen/X86/2008-03-31-SpillerFoldingBug.ll
4. LLVM :: CodeGen/X86/2008-04-16-ReMatBug.ll
...
80. LLVM :: DebugInfo/X86/concrete_out_of_line.ll
81. LLVM :: MC/MachO/cstexpr-gotpcrel-64.ll

You may be wondering, "what about 'removeUnreachableBlocks' (the older
one) is more aggressive than EliminateUnreachableBlocks (the newer
one)?"

'EliminateUnreachableBlocks' does a simple depth-first traversal of
the basic blocks in a function, beginning with the entry basic block.
Therefore the only requirement for a basic block being "reachable," I
believe, is that there is some branch instruction that reaches the
block (this is based on my naive understanding of the
'depth_first_ext' iterator). There is no further effort expended in
order to determine whether those branch instructions themselves are
reachable, or whether conditional branches use a value known at
compile time (and therefore are statically known to take only one
branch).

On the other had, 'removeUnreachableBlocks' includes intricate logic,
implemented in the 'markAliveBlocks' static function. For example,
'markAliveBlocks' checks each instruction in each basic block, and
inserts an unreachable instruction after any call to a 'noreturn'
function.

My question is this: would it be worthwhile for me to consolidate
these two functions and fix all of these brittle tests? Or is there
room in LLVM for both a more- and a less-aggressive variant of
unreachable block elimination? If so, I will not attempt to
consolidate these two functions.

The reason I ask is that I think it will take me a while to adjust 81
tests (just on x86) to the more aggressive reduction, and I want to
make sure that this change is welcome before I spend time doing so. If
anyone has any thoughts, please let me know!

- Brian Gesiak

I might be wrong here as I haven't looked very closely in some time,
but, if I recall correctly, you can walk the basic blocks of a
function in O(BB), so the simple variant is linear in the number of
blocks. The function which implements the "intricate logic" scans
(potentially) all the instructions in the function, so it's linear on
the number of instructions. The latter might be much more expensive
than the former. I personally see room for both implementations.