Providing correct unwind info in function epilogue

Hi all,

We have been working on a solution for the problem of incorrect unwind info in function epilogue, reported in:
The problem is caused by missing CFI instructions that update the rule for calculating the CFA in the epilogue. We added instructions that update the CFA calculation rule in the X86FrameLowering::emitEpilogue() method:
  - for the case without FP, we added a cfi_def_cfa_offset instruction each time the stack pointer is changed
  - for the case when FP is used, we added a cfi_def_cfa instruction after the frame pointer is popped, so that the stack pointer register and the initial offset are used for CFA calculation from then on.
With these changes, correct unwind info is generated for the function epilogue when the epilogue is the last basic block in the function.
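
To illustrate the no-FP case, here is a minimal standalone sketch. It is a model, not the actual emitEpilogue() change: EpilogueInst and annotateEpilogue are hypothetical names, and the x86-64 convention that the CFA offset is 8 once only the return address remains on the stack is assumed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not LLVM code: one epilogue instruction and the number
// of bytes by which it raises the stack pointer. Use 0 for instructions whose
// SP effect is not tracked here, such as the final ret.
struct EpilogueInst {
  std::string Text;
  int SPAdjust;
};

// Returns the epilogue as assembly lines, with a .cfi_def_cfa_offset
// directive after every instruction that changes SP. CFAOffset is the CFA
// offset in effect on entry to the epilogue.
std::vector<std::string>
annotateEpilogue(const std::vector<EpilogueInst> &Insts, int CFAOffset) {
  std::vector<std::string> Out;
  for (const EpilogueInst &I : Insts) {
    Out.push_back(I.Text);
    if (I.SPAdjust == 0)
      continue;
    CFAOffset -= I.SPAdjust;
    // Assumption: 8 bytes (the return address) always stay below the CFA.
    assert(CFAOffset >= 8 && "epilogue pops past the return address");
    Out.push_back(".cfi_def_cfa_offset " + std::to_string(CFAOffset));
  }
  return Out;
}
```

For an epilogue consisting of "popq %rbx" (SPAdjust 8) and "retq" with an incoming offset of 16, the sketch places ".cfi_def_cfa_offset 8" between the two instructions.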

However, there are a few cases that cause problems with this solution. In all of them the epilogue is somewhere in the middle of the function, not in the last basic block:
   1. Problem when there are EH instructions below the epilogue.
     Example: a few tests in the test-suite have EH code that physically comes after the epilogue block. In the following example:

    0: 53                      push   %rbx
    1: 0f b6 05 00 00 00 00    movzbl 0x0(%rip),%eax        # 8 <_Z6throwsv+0x8>
    8: 83 e0 01                and    $0x1,%eax
    b: 83 f8 01                cmp    $0x1,%eax
    e: 74 07                   je     17 <_Z6throwsv+0x17>
   10: b8 7b 00 00 00          mov    $0x7b,%eax
   15: 5b                      pop    %rbx
   16: c3                      retq
   17: bf 04 00 00 00          mov    $0x4,%edi
   1c: e8 00 00 00 00          callq  21 <_Z6throwsv+0x21>
   21: c7 00 07 00 00 00       movl   $0x7,(%rax)
   27: be 00 00 00 00          mov    $0x0,%esi

     the code after the 'retq' instruction (location 16) inherits the CFA calculation rule set after the 'pop' instruction (location 15). However, it is reached by the 'je' at location e, while %rbx is still on the stack, so it should use the CFA calculation rule set by the prologue ('push' instruction at location 0), not the one set by the epilogue.
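
     The effect can be reproduced with a small standalone model of how CFI is interpreted in address order. This is only a sketch: CfaDirective and cfaOffsetAt are hypothetical names, and the offsets 8 and 16 assume the usual x86-64 values for this function (8 at entry, 16 after the 'push').

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: a cfi_def_cfa_offset that takes effect at address Addr.
struct CfaDirective {
  unsigned Addr;
  int Offset;
};

// Returns the CFA offset that applies at PC when the directives are read in
// address order, the way unwind tables are interpreted. Control flow plays no
// part: the last directive at or before PC wins.
int cfaOffsetAt(const std::vector<CfaDirective> &Directives, unsigned PC,
                int EntryOffset = 8) {
  int Offset = EntryOffset;
  unsigned PrevAddr = 0;
  for (const CfaDirective &D : Directives) {
    assert(D.Addr >= PrevAddr && "directives must be sorted by address");
    PrevAddr = D.Addr;
    if (D.Addr > PC)
      break;
    Offset = D.Offset;
  }
  return Offset;
}
```

     With directives {0x1, 16} (after the 'push') and {0x16, 8} (after the 'pop'), cfaOffsetAt returns 8 for location 0x17, although the stack there still holds %rbx and the offset should be 16. Adding a third directive {0x17, 16} restores the correct value.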
   2. Problems that the shrink wrapping pass can potentially cause:
      2.1. It can place the prologue and epilogue somewhere in the middle of the function (not in the first/last blocks).
      2.2. It can separate the 'return' instruction from the rest of the epilogue. Its physical position in the function can be above or below the epilogue.
      2.3. After the prologue and epilogue code is inserted, some of the later passes can reorder these blocks and merge them with other basic blocks.
   3. Problem when multiple epilogues exist in a function.

To solve the described problems and provide correct rules for calculating the CFA at each instruction in the function, we started implementing a new pass. This pass should run after all basic block merging and reordering is done (once basic blocks are in their final order).
   The pass goes through all basic blocks in the function and sets the correct cfi_def_cfa_offset at the beginning of each basic block. Because this introduces excess CFI instructions, the ones that are not necessary are eliminated (e.g. consecutive cfi_def_cfa_offset instructions with the same offset value).
   This is a work in progress, as we still need to add support for setting the correct register at the beginning of each basic block (this is needed because we use the cfi_def_cfa instruction for the case when FP is used).
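
   To make the offset-only part of the idea concrete, here is a minimal standalone sketch. It is a model under stated assumptions, not the patch itself: Block and placeBlockEntryCfi are hypothetical names, the incoming and outgoing offsets of each block are assumed to be already known, and the elimination of redundant directives is folded into the same walk instead of being a separate step.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a basic block in its final layout position.
struct Block {
  int IncomingOffset; // CFA offset that must hold on entry to the block
  int OutgoingOffset; // CFA offset in effect after its last instruction
};

// For each block in layout order, returns the offset to set with a
// cfi_def_cfa_offset at the start of the block, or -1 if the offset left
// behind by the physically preceding block is already correct.
std::vector<int> placeBlockEntryCfi(const std::vector<Block> &Layout,
                                    int EntryOffset) {
  std::vector<int> Directives;
  int Current = EntryOffset; // offset a linear reader of the CFI would assume
  for (const Block &B : Layout) {
    assert(B.IncomingOffset >= 0 && B.OutgoingOffset >= 0);
    if (B.IncomingOffset != Current)
      Directives.push_back(B.IncomingOffset); // directive is needed here
    else
      Directives.push_back(-1);               // would be redundant, skip it
    Current = B.OutgoingOffset;
  }
  return Directives;
}
```

   For the example function above, modelled as three blocks with (incoming, outgoing) offsets (8, 16), (16, 8) and (16, 16), only the third block gets a directive, with offset 16.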
   We plan to upload a patch for review once we have a working solution, but for now, we wanted to see if anyone has any ideas or comments. Do you agree that implementing this pass is the right way to solve this issue?

A late pass that adds CFI after block layout sounds like the right approach.