Advice on MachineMoves and SEH

Hi,

If you've been following llvm-commits, you might know that I've been
working on implementing support for SEH--specifically, the Win64 variant
of it--in LLVM.

I know a lot of you couldn't care less about this, but I'd really
appreciate some advice. I'm almost at the point where it's possible to
use GCC-style exceptions under Win64, but I've hit a small roadblock.

The problem is that I need information about what happens to the stack
in the prologue. I know that information is stored in MachineMove
objects in the MachineModuleInfo, but this information seems to be
specific to DWARF CFI.

Windows' scheme for storing information about the call frame differs
somewhat from DWARF CFI. For one thing, the distinction between an x86
PUSH and MOV onto the stack is very important. Windows (and compatibles)
will actually execute the prologue in reverse, instead of just restoring
the registers' state. As a result, Windows needs to know that a register
was PUSHed onto the stack and not simply MOVed.
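
To make the PUSH-versus-MOV distinction concrete, here is a minimal sketch of a toy unwinder that undoes a recorded prologue in reverse. Everything in it (the Op enum, PrologueOp, Machine, unwindPrologue, the register numbers) is hypothetical and invented for illustration; it is not the Windows unwinder or any LLVM API, and it assumes 8-byte stack slots.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical prologue operations, loosely modelled on the kinds of
// things a Win64-style unwinder has to tell apart.
enum class Op { PushNonVol, AllocStack, SaveNonVol };

struct PrologueOp {
  Op Kind;
  int Reg;        // hypothetical register number (unused for AllocStack)
  uint64_t Value; // AllocStack: size in bytes; SaveNonVol: offset from rsp
};

// Toy machine state: a stack pointer, registers, and 8-byte stack slots.
struct Machine {
  uint64_t Rsp;
  std::map<int, uint64_t> Regs;
  std::map<uint64_t, uint64_t> Stack; // address -> slot contents
};

// Undo the prologue by walking the recorded operations in reverse.
void unwindPrologue(Machine &M, const std::vector<PrologueOp> &Ops) {
  for (auto It = Ops.rbegin(); It != Ops.rend(); ++It) {
    switch (It->Kind) {
    case Op::PushNonVol:
      // Undoing a PUSH is a POP: reload the register AND adjust rsp.
      M.Regs[It->Reg] = M.Stack[M.Rsp];
      M.Rsp += 8;
      break;
    case Op::AllocStack:
      // Undoing a stack allocation only moves rsp back up.
      M.Rsp += It->Value;
      break;
    case Op::SaveNonVol:
      // Undoing a MOV-style save reloads the register; rsp is untouched.
      M.Regs[It->Reg] = M.Stack[M.Rsp + It->Value];
      break;
    }
  }
}
```

If the push were recorded as a plain save, the register would still be restored, but rsp would end up 8 bytes too low, which is why the two cases have to be kept apart.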

Another difference is in how offsets on the stack are recorded. In DWARF
CFI, they're offsets from the CFA. But in Win64, they are offsets from
the stack pointer (%rsp on x86). Even worse, if there's a frame pointer,
the offset is from %rsp *when the frame pointer was established*.
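
As a rough worked example of the offset difference, the sketch below converts a CFA-relative offset into an %rsp-relative one. The function names are hypothetical, and it assumes the usual x86-64 DWARF convention that the CFA is the value of %rsp just before the call instruction pushed the return address.

```cpp
#include <cstdint>

// Distance in bytes from the CFA down to %rsp at a given point in the
// prologue: the 8-byte return address, plus everything pushed, plus
// everything allocated with "sub rsp, N" up to that point.
int64_t cfaToRspDistance(int64_t BytesPushed, int64_t BytesAllocated) {
  return 8 + BytesPushed + BytesAllocated;
}

// A slot at (CFA + CfaOffset) sits at (rsp + result) when rsp is
// Distance bytes below the CFA.
int64_t cfaOffsetToRspOffset(int64_t CfaOffset, int64_t Distance) {
  return CfaOffset + Distance;
}
```

For a prologue of "push rbp; sub rsp, 48; mov [rsp+32], rbx", the distance is 8 + 8 + 48 = 64, so rbx at CFA-32 becomes rsp+32 and rbp at CFA-16 becomes rsp+48. With a frame pointer, the distance passed in would have to be the one measured at the moment the frame pointer was established, not the final one.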

Because of all this, it's hard to reconstruct the SEH information from
the MachineMove array. I have thought about adding a new array specific
to SEH information, but I'm not sure how you guys would feel about that.
Any ideas on how to solve this problem?

Chip

Hi Chip,

> Because of all this, it's hard to reconstruct the SEH information from
> the MachineMove array. I have thought about adding a new array specific
> to SEH information, but I'm not sure how you guys would feel about that.
> Any ideas on how to solve this problem?

Same problem with ARM-specific EH. I ended up with my own information
scheme, where instructions are marked as "frame related" during
prologue & epilogue emission and are later "recognized" during the MI
=> MCInst lowering. You might want to look into the ARM backend; maybe
we can somehow "generalize" this approach.
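
A minimal sketch of the "mark now, recognize at lowering" idea, under stated assumptions: the Opcode enum, the Inst struct, and lowerWithUnwindInfo are hypothetical stand-ins rather than the real MachineInstr or ARM backend code. The emitted strings follow the GNU assembler's .seh_* directive names; a real implementation would emit the directives next to the lowered instructions instead of collecting strings.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified instruction model.
enum class Opcode { Push, SubRsp, MovToStack, Other };

struct Inst {
  Opcode Opc;
  std::string Reg; // register operand, if any
  int Imm;         // allocation size or stack offset, if any
  bool FrameSetup; // set by prologue emission, checked at lowering
};

// During lowering, only instructions carrying the marker are
// "recognized" and translated into unwind directives.
std::vector<std::string> lowerWithUnwindInfo(const std::vector<Inst> &Insts) {
  std::vector<std::string> Out;
  for (const Inst &I : Insts) {
    if (!I.FrameSetup)
      continue; // ordinary code: no unwind information
    switch (I.Opc) {
    case Opcode::Push:
      Out.push_back(".seh_pushreg " + I.Reg);
      break;
    case Opcode::SubRsp:
      Out.push_back(".seh_stackalloc " + std::to_string(I.Imm));
      break;
    case Opcode::MovToStack:
      Out.push_back(".seh_savereg " + I.Reg + ", " + std::to_string(I.Imm));
      break;
    case Opcode::Other:
      break; // marked, but nothing the unwinder needs to know about
    }
  }
  return Out;
}
```

Because the opcode itself is inspected, a PUSH and a MOV onto the stack naturally produce different unwind records, which a side vector of moves cannot express.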

Yes, the current model of producing an on-the-side vector of machine moves doesn't look like the best thing to do now that we produce cfi directives.

I was thinking about just creating pseudo instructions that map 1:1 to the cfi directives (and ARM, and SEH). Codegen would create the appropriate one depending on the target. This would also avoid the silly labels that we still create when producing cfi.

Cheers,
Rafael

+1

Pseudo instructions are problematic. That means every function pass that iterates over the instructions has to explicitly check for them to make sure they don't affect optimization. We already have to do that for debug_value instructions, and it's really unpleasant and error prone.

-Jim

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Jim Grosbach
Sent: Thursday, June 02, 2011 9:11 AM
To: Rafael Ávila de Espíndola
Cc: llvmdev@cs.uiuc.edu
Subject: Re: [LLVMdev] Advice on MachineMoves and SEH

> Pseudo instructions are problematic. That means every function pass
> that iterates over the instructions has to explicitly check for them to
> make sure they don't affect optimization. We already have to do that
> for debug_value instructions, and it's really unpleasant and error
> prone.

[Villmow, Micah] What about adding a field to the Instruction class that specifies whether it is a pseudo instruction? That way it is the same check no matter how many pseudo-instructions are added. This might not be the best solution, but if the check already has to be done for debug_value instructions and no better solution is proposed, it might be helpful.
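
A minimal sketch of what such a single check could look like, assuming a hypothetical IsPseudo field on a simplified Inst struct (neither is an existing LLVM name):

```cpp
#include <vector>

// Hypothetical instruction record with one generic "pseudo" bit.
struct Inst {
  int Opcode;
  bool IsPseudo; // true for debug values, unwind markers, and the like
};

// Example pass-style loop: one check skips every kind of pseudo
// instruction, however many kinds are added later.
int countRealInstructions(const std::vector<Inst> &Block) {
  int N = 0;
  for (const Inst &I : Block)
    if (!I.IsPseudo)
      ++N;
  return N;
}
```

Each pass would still need the check, so this narrows the problem to one test per loop rather than removing it.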

How many passes are there this late in the chain? In any case, they all know how to deal with PROLOG_LABEL anyway. Personally, I'd like to make the DWARF writer free of MachineModuleInfo.

Hi Devang,

> How many passes are there this late in the chain? In any case, they all know how to deal with PROLOG_LABEL anyway. Personally, I'd like to make the DWARF writer free of MachineModuleInfo.

Quite a lot. Especially on ARM where we have all sorts of expansion
and transformation passes at MI level.

Well, I've mulled it over for a while, and I've decided to take the ARM
EH approach of marking frame instructions and recognizing them later
during MachineInstr -> MCInst lowering. Given the trouble of teaching
the various late passes about pseudo-instructions, it just seemed like
the best choice to me.

Thanks for all your help.

Chip

Chip,

> Well, I've mulled it over for a while, and I've decided to take the ARM
> EH approach of marking frame instructions and recognizing them later
> during MachineInstr -> MCInst lowering. Given the trouble of teaching
> the various late passes about pseudo-instructions, it just seemed like
> the best choice to me.

Note that you will still have to preserve the "frame-related" marker
during the various expansion phases. E.g., in the ARM case this was
needed when forming load/store multiple instructions, etc.
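
As a hedged illustration of that point, here is a toy merge step that keeps the marker intact. The Inst struct and tryMerge are hypothetical, and refusing to merge a marked instruction with an unmarked one is just one conservative policy, not necessarily what the ARM backend does.

```cpp
#include <string>

// Hypothetical instruction record carrying the frame-related marker.
struct Inst {
  std::string Text;
  bool FrameSetup;
};

// Combine two stores into one multi-store. The merged instruction
// inherits the marker; mixing marked and unmarked stores is refused so
// that unwind information is neither lost nor wrongly attached.
bool tryMerge(const Inst &A, const Inst &B, Inst &Out) {
  if (A.FrameSetup != B.FrameSetup)
    return false;
  Out = Inst{A.Text + "; " + B.Text, A.FrameSetup};
  return true;
}
```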

Just a comment on the main idea of SEH being in LLVM. It should be noted that Borland owns a patent on SEH:

  http://gcc.gnu.org/wiki/WindowsGCCImprovements

I think we should proceed with caution in this case.

-bw

They have a patent on frame-based exception handling. Windows 64-bit SEH
is table-based like DWARF so it is not an issue, as far as I understand.