Reasoning about TailDuplicator correctness heuristics

Hi,
I’m learning about the heuristics in TailDuplicator. The comments are quite educational, but I became very curious about how the initial implementation identified the erroneous scenarios and decided to add early returns.

It’s not intuitive to me why these early returns are needed (i.e., what errors would occur without them). My question is: how is this kind of knowledge typically gained, from a methodology perspective? Any suggestions for teaching myself to gain it? Are some correctness heuristics driven by failed assertions in the relevant LLVM libraries (and if so, are assertion-driven heuristics common)?

P.S. Performance heuristics, such as "Do not duplicate 'return' instructions if this is a pre-regalloc run" or "Avoid duplicating calls before register allocation." (https://github.com/llvm/llvm-project/blob/4875ff1dc90bba089a5a14023d5eec69490b0422/llvm/lib/CodeGen/TailDuplicator.cpp#L618-L628), are easier to understand.
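For contrast, the two performance checks quoted above can be sketched as plain early returns. This is a minimal, self-contained illustration, not LLVM code: Instr, InstrKind, and worthDuplicating are hypothetical stand-ins for MachineInstr queries, and the real function in TailDuplicator.cpp checks more conditions than these two.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for MachineInstr queries; not LLVM's API.
enum class InstrKind { Other, Return, Call };

struct Instr {
  InstrKind Kind = InstrKind::Other;
};

// Sketch of the two performance heuristics quoted above. These are
// profitability checks: duplicating anyway would still be correct, just
// less likely to pay off before register allocation.
bool worthDuplicating(const std::vector<Instr> &Block, bool PreRegAlloc) {
  for (const Instr &MI : Block) {
    // "Do not duplicate 'return' instructions if this is a pre-regalloc run."
    if (PreRegAlloc && MI.Kind == InstrKind::Return)
      return false;
    // "Avoid duplicating calls before register allocation."
    if (PreRegAlloc && MI.Kind == InstrKind::Call)
      return false;
  }
  return true;
}
```

Both checks only fire when PreRegAlloc is true; after register allocation the same block would be accepted, which matches the "pre-regalloc" wording of both comments.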

Examples of correctness heuristics
1)

// Non-duplicable things shouldn't be tail-duplicated.
// CFI instructions are marked as non-duplicable, because Darwin compact
// unwind info emission can't handle multiple prologue setups. In case of
// DWARF, allow them be duplicated, so that their existence doesn't prevent
// tail duplication of some basic blocks, that would be duplicated otherwise.

This is from https://github.com/llvm/llvm-project/blob/4875ff1dc90bba089a5a14023d5eec69490b0422/llvm/lib/CodeGen/TailDuplicator.cpp#L603-L607
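The CFI rule, by contrast, is a correctness check whose answer depends on the unwind-info format. A minimal sketch, assuming hypothetical NonDuplicable/IsCFI flags and an UnwindFormat enum in place of the real MachineInstr and MachineFunction queries (none of these names come from LLVM):

```cpp
#include <cassert>
#include <vector>

// Hypothetical instruction flags; not LLVM's MachineInstr API.
struct Instr {
  bool NonDuplicable = false; // marked "not duplicable"
  bool IsCFI = false;         // a CFI (call frame information) instruction
};

// Hypothetical: which unwind-info format the function will be emitted with.
enum class UnwindFormat { DarwinCompact, Dwarf };

// Sketch of the correctness check quoted above: non-duplicable instructions
// block tail duplication, except CFI instructions when emitting DWARF unwind
// info, where the comment allows them to be duplicated.
bool canDuplicate(const std::vector<Instr> &Block, UnwindFormat Format) {
  for (const Instr &MI : Block) {
    if (!MI.NonDuplicable)
      continue;
    bool DwarfCFI = MI.IsCFI && Format == UnwindFormat::Dwarf;
    if (!DwarfCFI)
      return false; // early return: duplicating would be incorrect
  }
  return true;
}
```

The asymmetry is the interesting part: the same CFI instruction is rejected for Darwin compact unwind, which per the comment cannot handle multiple prologue setups, but accepted for DWARF.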

2)

// Convergent instructions can be duplicated only if doing so doesn't add
// new control dependencies, which is what we're going to do here.

This is from https://github.com/llvm/llvm-project/blob/4875ff1dc90bba089a5a14023d5eec69490b0422/llvm/lib/CodeGen/TailDuplicator.cpp#L613
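Why convergence matters may be easier to see with a toy model. This sketch assumes a GPU-style setting in which threads run in lockstep and the convergent operation is a barrier that every thread must reach as the same dynamic instance; the function and block names are hypothetical, and nothing here is LLVM's implementation. Tail duplication copies a join block T into each predecessor, so each copy is only reached along one path:

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model (hypothetical, not LLVM code): each thread reaches a join block
// "T" through one of its predecessors, e.g. "P1" or "P2". T contains a
// convergent operation, such as a barrier that all threads must reach as
// the same dynamic instance.
//
// Returns the set of barrier copies the threads execute. Without tail
// duplication there is one copy (in T). With tail duplication, T's contents
// are copied into each predecessor, so each copy is reached only by threads
// that took that path: the barrier has gained a new control dependence on
// the branch that chose the predecessor.
std::set<std::string> barrierInstances(const std::vector<std::string> &PathPerThread,
                                       bool TailDuplicated) {
  std::set<std::string> Instances;
  for (const std::string &Pred : PathPerThread)
    Instances.insert(TailDuplicated ? Pred + ".copy_of_T" : "T");
  return Instances;
}
```

When threads take different paths, the one barrier becomes two, so threads from P1 and P2 no longer synchronize with each other; that is the new control dependence the comment refers to. The quoted comment suggests the pass simply declines to duplicate blocks containing convergent instructions rather than trying to prove that all threads take the same path.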

The initial implementation probably didn’t. In particular, noduplicate was added specifically to stop this transformation.
