[RFC] CFIFixup handling of prologues which span more than one basic block

The CFIFixup pass assumes a function prologue is contained in a single basic
block. The upcoming support for stack probing (-fstack-clash-protection) in
AArch64 [1] breaks this assumption: the emitted probing sequence in a prologue
may contain loops, i.e. more than one basic block. The generated CFG is not
arbitrary, though: in the end the prologue still forms a single
single-entry/single-exit region.

I’ve considered a few approaches to handle this in CFIFixup:

  • Find the block that (a) contains instructions with the flag FrameSetup and
    (b) post-dominates every other block that contains instructions with the
    flag FrameSetup.

    • pros: that is fairly universal, with changes confined to the CFIFixup pass alone
    • cons: a bit on the heavy side, requires construction of the Machine PostDominator tree.
  • Add a new MachineInstr flag (say MachineInstr::PrologueEnd) and mark the
    first instruction after the prologue

    • pros: super cheap, just one extra bit
    • cons: transformations (e.g. reversing condition on branch instructions) may not
      preserve MI flags
  • Add a new operation code (say OpPrologueEnd) to MCCFIInstruction and emit
    it at the end of the prologue

    • pros: nothing in particular
    • cons: does not seem entirely in line with the design and intent of MCCFIInstruction, since there won’t be a corresponding .cfi_... directive. It may also break users of MCCFIInstruction that expect to handle every opcode (e.g. in a switch statement) and assert on an unrecognised opcode.
  • Add a new target-independent pseudo/meta-instruction, say TargetOpcode::PROLOGUE_END.

    • pros: This is similar to what Windows for AArch64 does (AArch64::SEH_PrologEnd), except that for CodeGen/CFIFixup.cpp we need a target independent one.
    • cons: nothing in particular

    This is my preferred approach right now.
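
As a rough illustration of the first approach, the sketch below finds such a block on a toy CFG. Everything in it (ToyCFG, computePostDominators, findPrologueEndBlock) is a hypothetical stand-in written against the standard library only, not the LLVM API. A real implementation would presumably query the machine post-dominator tree and the FrameSetup flag instead of recomputing post-dominators with a naive fixed-point iteration, and the sketch assumes every block can reach an exit block.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a machine function's CFG (hypothetical, not LLVM's types).
// Block I has successors Succs[I]; HasFrameSetup[I] says whether the block
// contains at least one instruction carrying the FrameSetup flag.
struct ToyCFG {
  std::vector<std::vector<std::size_t>> Succs;
  std::vector<bool> HasFrameSetup;
};

// Returns PDom where PDom[B][A] == true means "A post-dominates B".
// Solves PDom(B) = {B} + intersection of PDom(S) over successors S of B,
// with PDom(B) = {B} for exit blocks, by iterating to a fixed point.
std::vector<std::vector<bool>> computePostDominators(const ToyCFG &G) {
  const std::size_t N = G.Succs.size();
  std::vector<std::vector<bool>> PDom(N, std::vector<bool>(N, true));
  for (std::size_t B = 0; B < N; ++B) {
    if (G.Succs[B].empty()) {
      PDom[B].assign(N, false);
      PDom[B][B] = true;
    }
  }
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t B = 0; B < N; ++B) {
      if (G.Succs[B].empty())
        continue; // exit blocks are already final
      std::vector<bool> New(N, true);
      for (std::size_t S : G.Succs[B])
        for (std::size_t I = 0; I < N; ++I)
          New[I] = New[I] && PDom[S][I];
      New[B] = true; // every block post-dominates itself
      if (New != PDom[B]) {
        PDom[B] = New;
        Changed = true;
      }
    }
  }
  return PDom;
}

// Finds a block that (a) has FrameSetup instructions and (b) post-dominates
// every other block that has FrameSetup instructions, if such a block exists.
std::optional<std::size_t> findPrologueEndBlock(const ToyCFG &G) {
  assert(G.Succs.size() == G.HasFrameSetup.size());
  const std::size_t N = G.Succs.size();
  const auto PDom = computePostDominators(G);
  for (std::size_t C = 0; C < N; ++C) {
    if (!G.HasFrameSetup[C])
      continue;
    bool PostDominatesAll = true;
    for (std::size_t B = 0; B < N && PostDominatesAll; ++B)
      if (G.HasFrameSetup[B] && !PDom[B][C])
        PostDominatesAll = false;
    if (PostDominatesAll)
      return C;
  }
  return std::nullopt;
}
```

On a CFG where block 0 falls through to a probing loop in block 1, which exits only to block 2 where the frame setup finishes, findPrologueEndBlock returns 2. If FrameSetup instructions sit only in two sibling blocks of a diamond, no candidate qualifies and the result is std::nullopt.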

Are there any other ideas?

[1] [AArch64] Stack probing for function prologues, by momchil-velikov, Pull Request #66524, llvm/llvm-project, GitHub

A fifth option: add a target hook to find and return the prologue block (or, more specifically, the block and the location where the prologue ends, so that we can issue .cfi_remember_state as needed). This should be the last resort, IMHO, for the case where we absolutely can’t do the right thing in a target-independent manner.

After an initial implementation of option 4 (TargetOpcode::PROLOGUE_END), I now tend to prefer option 3, adding a new opcode for CFI_INSTRUCTION. The reason is that whatever marker is used for the end of the prologue, most of the time it needs to behave like other CFI instructions, and there are a number of places where we treat CFI instructions in a specific manner (cf. mentions of isCFIInstruction).
Thus either:

  • we need to change all places where isCFIInstruction() is called to
    something like isCFIInstruction() || Opcode == TargetOpcode::PROLOGUE_END (and that needs to be future-proof against new uses of isCFIInstruction()), or
  • we make the prologue end marker a CFI instruction, i.e. option 3
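
The trade-off between these two alternatives can be shown on a toy model. The types and functions below (ToyOpcode, ToyCFIOp, ToyInstr, makeCFI, countTreatedAsCFI and the two predicates) are hypothetical and only mimic the shape of the real code; they are not LLVM's MachineInstr or MCCFIInstruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction opcodes. PrologueEnd models option 4, a separate
// target-independent pseudo-instruction.
enum class ToyOpcode { Add, Branch, CFIInstruction, PrologueEnd };

// Hypothetical CFI operation kinds. OpPrologueEnd models option 3, a CFI
// operation with no corresponding .cfi_* directive.
enum class ToyCFIOp { None, DefCfaOffset, Offset, OpPrologueEnd };

struct ToyInstr {
  ToyOpcode Opcode;
  ToyCFIOp CFIOp;
  bool isCFIInstruction() const { return Opcode == ToyOpcode::CFIInstruction; }
};

// Builds a CFI instruction; a CFI instruction must carry a CFI operation.
ToyInstr makeCFI(ToyCFIOp Op) {
  assert(Op != ToyCFIOp::None);
  return {ToyOpcode::CFIInstruction, Op};
}

// Stands in for the many existing call sites that special-case CFI
// instructions by asking only isCFIInstruction().
std::size_t countTreatedAsCFI(const std::vector<ToyInstr> &Block) {
  std::size_t Count = 0;
  for (const ToyInstr &MI : Block)
    if (MI.isCFIInstruction())
      ++Count;
  return Count;
}

// Option 4: every such call site would have to switch to a widened check.
bool isCFIOrPrologueEnd(const ToyInstr &MI) {
  return MI.isCFIInstruction() || MI.Opcode == ToyOpcode::PrologueEnd;
}

// Option 3: the marker already satisfies isCFIInstruction(); only code that
// cares about the prologue end needs to look at the CFI operation.
bool isPrologueEndMarker(const ToyInstr &MI) {
  return MI.isCFIInstruction() && MI.CFIOp == ToyCFIOp::OpPrologueEnd;
}
```

With a block consisting of one ordinary CFI instruction, the end-of-prologue marker, and an Add, the unchanged countTreatedAsCFI sees one instruction under option 4 (the marker is missed until the call site is updated) and two under option 3.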