RFC: How to represent SEH (__try / __except) in LLVM IR

Moving this month-old RFC to llvmdev. Not sure why I sent this to cfe-dev in the first place…

Hm, this idea won’t work. If we point to labels from landingpadinst, then passes like SimplifyCFG will consider the blocks to be unreachable. I realized this by looking at llvm-dis output after hacking in asmparser support for this syntax. :)

I’ll have to think longer.

Hi Reid,

I’ve been following your proposal, and I’d be interested in helping out if I can. My main interest right now is in enabling C++ exception handling in clang for native (i.e. not mingw/cygwin) Windows targets (both 32-bit and 64-bit), but if I understand things correctly that will be closely related to your SEH work under the hood.

I’m still trying to get up to speed on what is and is not implemented, but I think I’m starting to get a clear picture. My understanding is that LLVM has the necessary support to emit exception handling records that Windows will be able to work with (for Win64 EH) but some work may be required to get the IR properly wired up, and that there’s basically nothing in place to support Win32 EH and nothing in clang to generate the IR for either case. Is that more or less accurate?

I’ve been looking at the work Kai Nacke did in ldc to implement exception handling there, but it isn’t clear to me yet how relevant that is to clang.

Can you tell me more about what your plans are? Specifically, do you intend to support both 32 and 64 bit targets? And were you also planning to work toward C++ exception handling support in clang once you had the general SEH support in place?

Finally, and most importantly, what can I do to help?

Thanks,

Andy

Cool! Apologies for the following stream of consciousness brain dump...

> Hi Reid,

> I’ve been following your proposal, and I’d be interested in helping out if
I can. My main interest right now is in enabling C++ exception handling in
clang for native (i.e. not mingw/cygwin) Windows targets (both 32-bit and
64-bit), but if I understand things correctly that will be closely related
to your SEH work under the hood.

Great! I agree, any changes to LLVM IR made to support SEH will also be
needed to support C++ exceptions on Windows, in particular the outlining.

In the current LLVM model, all the exception handling code lives in the
landing pad. The Windows unwinder doesn't actually return control to the
landingpad until very late. Instead, it creates new stack frames to invoke
the cleanup, catch handler (C++ EH only), or filter function (SEH only).
This is why we need to have outlining somewhere. The question is, where
should we do it? Personally, I want to do this on LLVM IR during
CodeGenPrepare.

The major challenge that outlining anywhere presents is that now the
outlined code has to "know" something about the frame layout of the
function it was outlined from in order to access local variables. I think
we can add `i8* @llvm.eh.get_capture_block(i8* %function, i8* %parent_rbp)`
and `void @llvm.eh.set_capture_block(i8* %captures)` intrinsics to make
this work. Any SSA values or allocas captured by the outlined landing pad
code will be demoted to memory and stored in the capture block, and the
layout will be encoded in a struct used by the outlined handlers and the
parent function. However, once you do this, you cannot inline the IR
without some heroics. It probably isn't that important to be able to inline
functions with try/catch, but a good acid test for any new LLVM IR
construct is "will it inline?", and this construct fails. I think we can
live with this construct as long as we only introduce it after
CodeGenPrepare.
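
To make the capture block idea concrete, here is a minimal C++ model of it. Nothing below is an LLVM API: `set_capture_block` and `get_capture_block` are ordinary functions named after the proposed intrinsics, the runtime map stands in for whatever side table CodeGen would keep, and `ParentCaptureBlock`, `kParentTag`, `outlined_cleanup`, and `run_parent` are hypothetical names for illustration. The function and frame arguments are passed explicitly to the setter only because portable C++ cannot read its own frame pointer.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical layout a prepare pass might synthesize for one parent function.
struct ParentCaptureBlock {
    int counter;     // stands in for an SSA value demoted to memory
    int *out_param;  // stands in for a captured pointer argument
};

// Side table keyed by (function identity, parent frame pointer).
using Key = std::pair<const void *, const void *>;
static std::map<Key, void *> g_capture_blocks;

// Models the proposed set_capture_block (assumption: a runtime map instead of
// a compile-time frame offset).
void set_capture_block(const void *function, const void *frame, void *captures) {
    g_capture_blocks[Key(function, frame)] = captures;
}

// Models get_capture_block(i8* %function, i8* %parent_rbp).
void *get_capture_block(const void *function, const void *parent_frame) {
    auto it = g_capture_blocks.find(Key(function, parent_frame));
    return it == g_capture_blocks.end() ? nullptr : it->second;
}

static const char kParentTag = 0;  // identity token standing in for @parent

// Outlined cleanup: it only receives the parent's frame pointer.
void outlined_cleanup(const void *parent_frame) {
    auto *block = static_cast<ParentCaptureBlock *>(
        get_capture_block(&kParentTag, parent_frame));
    if (block) *block->out_param = block->counter;
}

// Parent: publish the block, then let the "unwinder" call the cleanup.
int run_parent() {
    int result = 0;
    ParentCaptureBlock block{42, &result};
    const void *frame = &block;  // stands in for rbp
    set_capture_block(&kParentTag, frame, &block);
    outlined_cleanup(frame);
    g_capture_blocks.erase(Key(&kParentTag, frame));
    return result;
}
```

Calling `run_parent()` shows the outlined code reaching the parent's locals purely through the block pointer, which is also why inlining the parent would need extra care: the block's identity is tied to one specific frame.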

The remaining wrinkle in the capture block scheme is stack realignment
prologues. In this case, we have three pointers to the stack: the SP, the
base pointer (esi/rbx), and the frame pointer (ebp/rbp). Is the capture
block stored at a known constant offset from ebp/rbp or esi/rbx? Or do we
load and store a dynamic offset saved somewhere near ebp/rbp? This needs
study.

> I’m still trying to get up to speed on what is and is not implemented, but
I think I’m starting to get a clear picture. My understanding is that LLVM
has the necessary support to emit exception handling records that Windows
will be able to work with (for Win64 EH) but some work may be required to
get the IR properly wired up, and that there’s basically nothing in place
to support Win32 EH and nothing in clang to generate the IR for either
case. Is that more or less accurate?

We can emit valid pdata and xdata sections on Win64, and this supports
basic stack unwinding. On top of that, we currently follow mingw64 and use
Itanium-style LSDA tables and the __gxx_personality_seh0 personality
function to run EH handlers. This means the standard exception handling IR
emitted by clang and other frontends "just works" on Windows, and I want to
keep it that way. I think most of the changes should be on the LLVM side to
lower the standard EH IR down to something that is more compatible with
MSVC EH.

> I’ve been looking at the work Kai Nacke did in ldc to implement exception
handling there, but it isn’t clear to me yet how relevant that is to clang.

> Can you tell me more about what your plans are? Specifically, do you
intend to support both 32 and 64 bit targets? And were you also planning
to work toward C++ exception handling support in clang once you had the
general SEH support in place?

I want to do Win64 first because it is easier and better documented, and
then look at 32-bit next. 32-bit SEH does things like "take the address of
a BB label from the middle of the parent function and 'call' it with a
special ebp value passed in", but that is basically equivalent to the Win64
way of doing things with a very special calling convention.

I know some people are also interested in ARM (WoA), which should be
similar to Win64, as it also uses pdata/xdata style unwind info.

> Finally, and most importantly, what can I do to help?

I think there are some separable tasks here.

The EH capture block intrinsics can probably be built in isolation from the
outlining. We can probably make `get_capture_block` work with the result of
`@llvm.frameaddress(i32 0)`. The inliner also has to be taught to avoid
inlining functions that set up a capture block.

Doing outlining will be similar to what `llvm::CloneAndPruneFunctionInto`
does, except it will start at the landing pad instead of the entry block.
Instead of mapping from parameters to arguments, the outliner would map the
selector to a constant and propagate that value forwards, pruning
conditional branches as it goes. The `resume` instruction would end
outlining and become a `ret`. Any cloned `ret` instructions are the result
of cloning something that is statically reachable but dynamically
unreachable. We can transform them to `unreachable` and run standard
cleanup passes to propagate that backwards.
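
The pruning walk described above can be sketched on a toy CFG. This is not `CloneAndPruneFunctionInto`. It is a small stand-alone C++ model in which `Term`, `Block`, `Outlined`, `outline_from`, and `example_function` are all hypothetical names, and the only instruction modeled is each block's terminator.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

enum class Term { Br, CondBrOnSelector, Resume, Ret };

struct Block {
    Term term;
    int typeid_value;      // CondBrOnSelector only: taken when selector == typeid_value
    std::string if_true;   // Br target, or the "matches" successor
    std::string if_false;  // CondBrOnSelector only
};

using Function = std::map<std::string, Block>;

struct Outlined {
    std::set<std::string> blocks;              // blocks cloned into the handler
    std::set<std::string> resume_to_ret;       // `resume` becomes `ret`
    std::set<std::string> ret_to_unreachable;  // cloned `ret` becomes `unreachable`
};

// Walk forward from the landing pad with the selector pinned to a constant.
Outlined outline_from(const Function &fn, const std::string &lpad, int selector) {
    Outlined out;
    std::vector<std::string> worklist{lpad};
    while (!worklist.empty()) {
        const std::string name = worklist.back();
        worklist.pop_back();
        if (!out.blocks.insert(name).second) continue;  // already cloned
        const Block &bb = fn.at(name);
        switch (bb.term) {
        case Term::Br:
            worklist.push_back(bb.if_true);
            break;
        case Term::CondBrOnSelector:
            // The selector is a known constant, so only one edge survives.
            worklist.push_back(selector == bb.typeid_value ? bb.if_true : bb.if_false);
            break;
        case Term::Resume:
            out.resume_to_ret.insert(name);
            break;
        case Term::Ret:
            out.ret_to_unreachable.insert(name);
            break;
        }
    }
    return out;
}

// A landing pad with one catch clause (typeid 1) and a cleanup path.
Function example_function() {
    Function fn;
    fn.emplace("lpad", Block{Term::Br, 0, "catch.dispatch", ""});
    fn.emplace("catch.dispatch", Block{Term::CondBrOnSelector, 1, "catch", "ehcleanup"});
    fn.emplace("catch", Block{Term::Br, 0, "try.cont", ""});
    fn.emplace("try.cont", Block{Term::Ret, 0, "", ""});
    fn.emplace("ehcleanup", Block{Term::Br, 0, "eh.resume", ""});
    fn.emplace("eh.resume", Block{Term::Resume, 0, "", ""});
    return fn;
}
```

With the selector pinned to 0, `outline_from(example_function(), "lpad", 0)` clones only the cleanup path and never visits `catch`. With the selector pinned to 1 the walk runs on into `try.cont`, which shows that the sketch by itself has no notion of where a catch handler ends.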

32-bit x86 EH will require installing an alloca onto the fs:00 chain of EH
handlers. I suppose this could be emitted during CodeGenPrepare as regular
LLVM IR instructions, since we have a way of writing `load/store fs:00`
with address space 257. This alloca should probably be the same as the
capture block, since it has to be at some known offset from ebp.
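
For readers who have not looked at the 32-bit scheme, the handler chain is a singly linked list of stack-allocated records whose head sits at fs:[0]. The sketch below models only that list discipline in portable C++. A `thread_local` pointer stands in for fs:[0], and `RegistrationRecord`, `ScopedRegistration`, `dispatch`, and the handlers are hypothetical names, not the Windows types or the code LLVM would emit.

```cpp
#include <cassert>

// Two-field record: link to the enclosing record, plus a handler.
struct RegistrationRecord {
    RegistrationRecord *next;
    int (*handler)(int code);
};

// Stand-in for the per-thread list head stored at fs:[0].
static thread_local RegistrationRecord *g_chain_head = nullptr;

// Links a stack-allocated record on entry and unlinks it on exit.
class ScopedRegistration {
public:
    explicit ScopedRegistration(int (*handler)(int))
        : record_{g_chain_head, handler} {
        g_chain_head = &record_;
    }
    ~ScopedRegistration() { g_chain_head = record_.next; }
    ScopedRegistration(const ScopedRegistration &) = delete;
    ScopedRegistration &operator=(const ScopedRegistration &) = delete;

private:
    RegistrationRecord record_;
};

// Walk innermost to outermost until a handler claims the code (nonzero result).
int dispatch(int code) {
    for (RegistrationRecord *r = g_chain_head; r != nullptr; r = r->next) {
        const int result = r->handler(code);
        if (result != 0) return result;
    }
    return 0;
}

int outer_handler(int code) { return code == 1 ? 100 : 0; }
int inner_handler(int code) { return code == 2 ? 200 : 0; }

int demo(int code) {
    ScopedRegistration outer(outer_handler);
    ScopedRegistration inner(inner_handler);
    return dispatch(code);
}
```

The record has to live in the registering function's own frame, which is the reason for suggesting that it share storage with the capture block.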

Thanks for the additional information.

Right now I’m experimenting with a mix of code compiled with MSVC and code compiled with clang, trying to get a C++ exception thrown and caught by the MSVC-compiled code across a function in the clang-compiled code. My goal here is to isolate a small part of what needs to be done in a way that lends itself to tinkering. I think this might lead me to the outlining of EH blocks that you describe below.

If the clang code doesn’t have an exception handler (and it can’t since clang won’t compile that right now) and doesn’t need to do any clean-up, this works fine. If the clang code does need to do cleanup, clang currently emits the same landingpad stuff that it would emit for mingw and since I’m trying to link with the MSVC environment I end up with unresolved externals. So I’m playing around with the clang-generated IR to see if I can turn it into something that will handle the cleanup and let the exception pass. I’ve got it calling my custom SEH-style personality function and it’s trivial to get that to let the exception pass without doing the cleanup. Now I just need to figure out how to get it to execute the cleanup code.

I haven’t spent a lot of time on this yet, so if this overlaps with what you’ve been doing I can step back and approach it from a different direction. Otherwise, I’ll proceed and see if I can make use of your suggestions below with regard to outlining, probably starting with manual changes to the IR that simulate the process.

-Andy

Focusing on cleanups is probably a good way to start. The trouble is that your personality function can’t just reset rsp and jump to the landing pad, or it will trash the state of the unwinder that’s still on the stack. Everything in the landing pad basically has to be outlined. If the outlining happens at the IR level, we need some way to represent that, and I don’t really have it nailed down.

Here’s an idea, just to brainstorm:

define void @parent() {
  invoke … unwind to %lpad

lpad:
  %eh_vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__C_specific_handler to i8*)
          cleanup
          catch i8* @typeid1
          catch i8* @typeid2
  %label = call i8* (...)* @llvm.eh.outlined_handlers(
          void (i8*, i8*)* @my_cleanup,
          i8* @typeid1, i8* (i8*, i8*)* @my_catch1,
          i8* @typeid2, i8* (i8*, i8*)* @my_catch2)
  indirectbr i8* %label

endcatch:

}

define void @my_cleanup(i8*, i8*) {

  ret void ; unwinder will keep going for cleanups
}

define i8* @my_catch1(i8*, i8*) {
  ret i8* blockaddress(@parent, %endcatch) ; merge back into normal flow at endcatch
}

define i8* @my_catch2(i8*, i8*) {
  ret i8* blockaddress(@parent, %endcatch) ; merge back into normal flow at endcatch
}

I guess @llvm.eh.outlined_handlers wouldn’t be valid outside a landing pad, and would only be introduced during CodeGenPrepare to allow the best optimization of the handlers in the context of the parent function.
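
The contract between the handlers and whoever calls them can be modeled in a few lines of C++. This is only a toy: `Label` stands in for `blockaddress` values, `dispatch` stands in for the personality routine, `Frame` replaces the raw `i8*` frame argument for type safety, and every name here is hypothetical. One simplifying assumption is that the cleanup runs only when no catch clause matches.

```cpp
#include <cassert>
#include <vector>

enum class Label { KeepUnwinding, EndCatch };  // stand-in for blockaddress values

struct Frame { bool cleaned_up; int caught_by; };

using CleanupFn = void (*)(void *exn, Frame *frame);
using CatchFn = Label (*)(void *exn, Frame *frame);

struct CatchEntry { int type_id; CatchFn handler; };
struct OutlinedHandlers { CleanupFn cleanup; std::vector<CatchEntry> catches; };

// Cleanup returns nothing: the unwinder keeps going afterwards.
void my_cleanup(void *, Frame *f) { f->cleaned_up = true; }
// Catch handlers return the label where normal flow resumes in the parent.
Label my_catch1(void *, Frame *f) { f->caught_by = 1; return Label::EndCatch; }
Label my_catch2(void *, Frame *f) { f->caught_by = 2; return Label::EndCatch; }

// Toy personality: first matching catch wins, otherwise run the cleanup.
Label dispatch(const OutlinedHandlers &h, int thrown_type, void *exn, Frame *frame) {
    for (const CatchEntry &entry : h.catches)
        if (entry.type_id == thrown_type) return entry.handler(exn, frame);
    if (h.cleanup) h.cleanup(exn, frame);
    return Label::KeepUnwinding;
}

struct Outcome { Label resume_at; Frame frame; };

Outcome throw_through_parent(int thrown_type) {
    Frame frame{false, 0};
    const OutlinedHandlers handlers{my_cleanup, {{1, my_catch1}, {2, my_catch2}}};
    const Label resume_at = dispatch(handlers, thrown_type, nullptr, &frame);
    return Outcome{resume_at, frame};
}
```

`throw_through_parent(2)` resumes at `EndCatch` via the second catch handler, while an unmatched type such as 7 runs the cleanup and reports that unwinding should continue.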

I don’t really have a good enough feeling for the landingpad syntax yet to comment on the most natural way to extend it, but creating a synthetic cleanup function to call from the personality function is what I was thinking.

With the current (trunk +/- a couple of weeks) clang, compiling for an “x86_64-pc-windows-msvc” target, I’m seeing a landingpad that looks like this:

lpad: ; preds = %if.end, %if.then
  %2 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
          cleanup
  %3 = extractvalue { i8*, i32 } %2, 0
  store i8* %3, i8** %exn.slot
  %4 = extractvalue { i8*, i32 } %2, 1
  store i32 %4, i32* %ehselector.slot
  call void @"\01??1Bob@@QEAA@XZ"(%class.Bob* %bob) #3 ; Calling the destructor for a class named “Bob”
  br label %eh.resume

Replacing __gxx_personality_v0 with the name of my custom personality function (which has the SEH signature) and scrubbing out the terminate and resume calls for the time being, I see my personality function being called twice – first for the C++ exception (Exception code == 0xe06d7363) and once for the unwind. So now I just need to figure out how to get a pointer to a cleanup function into the DispatcherContext->HandlerData, which must be where the extra stuff in the landingpad comes in, right?
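
The two calls correspond to the search pass and the unwind pass, and a personality routine tells them apart by a flag on the exception record. Below is a portable C++ model of that check. The structs are simplified stand-ins rather than the Windows definitions, the flag value is what I recall for `EXCEPTION_UNWINDING` and should be treated as an assumption, and `HandlerData`, `personality`, `destroy_bob`, and `simulate_exception` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Values as recalled from the Windows SDK headers; treat them as assumptions.
constexpr std::uint32_t kExceptionUnwinding = 0x2;       // EXCEPTION_UNWINDING
constexpr std::uint32_t kCxxExceptionCode = 0xE06D7363;  // code seen in the thread

struct ExceptionRecord { std::uint32_t code; std::uint32_t flags; };

struct Frame { int cleanups_run; };

// Stand-in for what DispatcherContext->HandlerData could point to.
struct HandlerData { void (*cleanup)(Frame *frame); };

enum class Disposition { ContinueSearch };

// Cleanup-only personality: do nothing while searching, clean up while unwinding.
Disposition personality(const ExceptionRecord &record, Frame *frame,
                        const HandlerData *data) {
    const bool unwinding = (record.flags & kExceptionUnwinding) != 0;
    if (unwinding && data && data->cleanup) data->cleanup(frame);
    return Disposition::ContinueSearch;
}

void destroy_bob(Frame *frame) { ++frame->cleanups_run; }

struct PassCounts { int after_search; int after_unwind; };

// Calls the personality twice, the way the unwinder does for one exception.
PassCounts simulate_exception() {
    Frame frame{0};
    const HandlerData data{destroy_bob};
    personality(ExceptionRecord{kCxxExceptionCode, 0}, &frame, &data);
    const int after_search = frame.cleanups_run;
    personality(ExceptionRecord{kCxxExceptionCode, kExceptionUnwinding}, &frame, &data);
    return PassCounts{after_search, frame.cleanups_run};
}
```

In this model the cleanup pointer travels in a per-frame data blob, which is the role the question above assigns to `HandlerData`.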

Anyway, I think I’m making progress. :)

-Andy

> I don’t really have a good enough feeling for the landingpad syntax yet
to comment on the most natural way to extend it, but creating a
synthetic cleanup function to call from the personality function is what I
was thinking.

Pretty much.

> Replacing __gxx_personality_v0 with the name of my custom personality
function (which has the SEH signature) and scrubbing out the terminate and
resume calls for the time being, I see my personality function being called
twice -- first for the C++ exception (Exception code == 0xe06d7363) and
once for the unwind. So now I just need to figure out how to get a pointer
to a cleanup function into the DispatcherContext->HandlerData, which must
be where the extra stuff in the landingpad comes in, right?

The landingpad instruction has some docs here:
http://llvm.org/docs/ExceptionHandling.html#overview

The two values are the exception pointer and the selector value. The
selector value is an artifact of the way we model the Itanium EH scheme,
and you can basically set it to zero if you only want to deal with cleanups
for the time being. The exception pointer is presumably pulled from the
arguments to the personality routine. Again, cleanups don't need it, so you
can probably zero it too.

> Anyway, I think I’m making progress. :)

Nice!


I don’t know much about SEH and haven’t had time to really dig into this, but the idea of outlining functions that need to know about the frame layout sounds a bit scary. Is it really necessary?

I’m wondering if you can treat the cleanups and filter functions as portions of the same function, instead of outlining them to separate functions. Can you arrange to set up the base pointer on entry to one of those segments of code to have the same value as when running the normal part of the function? If so, from the code-gen point of view, doesn’t it just behave as if there is a large dynamic alloca on the stack at that point (because the stack pointer is not where it was when the function was previously running)? Are there other constraints that prevent that from working?

The "big dynamic alloca" approach does work, at least conceptually. It's
more or less what MSVC does. They emit the normal code, then the epilogue,
then a special prologue that resets ebp/rbp, and then continue with normal
emission. Any local variables declared in the __except block are allocated
in the parent frame and are accessed via ebp. Any calls create new stack
adjustments to allocate new argument memory.

This approach sounds far scarier to me, personally, and will significantly
complicate a part of LLVM that is already poorly understood and hard to
hack on. I think adding a pair of intrinsics that can't be inlined will be
far less disruptive for the rest of LLVM. This is actually already the
status quo for SjLj exceptions, which introduce a number of uninlinable
intrinsic calls (although maybe SjLj is a bad precedent :).

The way I see it, it's just a question of how much frame layout information
you want to teach CodeGen to save. If we add the set_capture_block /
get_capture_block intrinsics, then we only need to save the frame offset of
*one* alloca. This is easy, we can throw it into a side table on
MachineModuleInfo. If we don't go this way, we need to save just the right
amount of CodeGen state to get stack offsets in some other function.

Having a single combined MachineFunction also means that MI passes will
have to learn more about SEH. For example, we need to preserve the ordering
of basic blocks so that we don't end up with discontiguous regions of code.

This is the only part that concerns me. Who keeps track of the layout of the data inside that capture block? How do you know what local variables need to be in the capture block? If the front-end needs to decide that, is that something that fits easily into how clang works?

For DWARF EH and SjLj, the backend is responsible for handling most of the EH work. It seems like it would be a more consistent design for SEH to do the same.

Yes, you would probably need to do that. It doesn’t seem like that would be fundamentally difficult, but I haven’t thought through the details and I can imagine that it would take a fair bit of work.

> This is the only part that concerns me. Who keeps track of the layout of
the data inside that capture block? How do you know what local variables
need to be in the capture block? If the front-end needs to decide that, is
that something that fits easily into how clang works?

The capture block would be a boring old LLVM struct with a type created
during CodeGenPrepare.

I'm imagining a pass similar to SjLjEHPrepare that:
- Identifies all bbs reachable from landing pads
- Identifies all SSA values live in those bbs
- Demotes all non-alloca SSA values to allocas (DemoteRegToMem, like sjlj)
- Combines all allocas used in landing pad bbs into a single LLVM alloca
with a new combined struct type
- Outlines code from landing pads into cleanup handlers, filters, catch
handlers, etc
- In the parent function entry block, calls @llvm.eh.seh.set_capture_block
on the combined alloca
- In the outlined entry blocks, calls
@llvm.eh.seh.get_capture_block(@parent_fn, i8* %rbp) to recover a pointer
to the capture block and casts it to a pointer to the right type
- Finally, RAUWs all alloca references with GEPs into the capture block

The downside is that this approach probably hurts register allocation and
stack coloring, but I think it's a reasonable tradeoff.
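
The first steps of the pass listed above (find the blocks reachable from landing pads, then decide which values belong in the capture block) can be sketched on a toy function. This is a stand-alone C++ model, not LLVM code: `Block`, `Function`, `blocks_reachable_from_landing_pads`, `capture_block_layout`, and `example_function` are hypothetical names. It collects values that are used in those blocks, which is a simplification of real liveness, and it assigns field indices in name order purely to keep the output deterministic.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Block {
    bool is_landing_pad;
    std::vector<std::string> uses;        // allocas or demoted values referenced here
    std::vector<std::string> successors;
};

using Function = std::map<std::string, Block>;

// Step 1: every block reachable from some landing pad.
std::set<std::string> blocks_reachable_from_landing_pads(const Function &fn) {
    std::set<std::string> seen;
    std::vector<std::string> worklist;
    for (const auto &entry : fn)
        if (entry.second.is_landing_pad) worklist.push_back(entry.first);
    while (!worklist.empty()) {
        const std::string name = worklist.back();
        worklist.pop_back();
        if (!seen.insert(name).second) continue;  // already visited
        for (const std::string &succ : fn.at(name).successors)
            worklist.push_back(succ);
    }
    return seen;
}

// Steps 2-4: values used in those blocks become capture block fields.
std::map<std::string, unsigned> capture_block_layout(const Function &fn) {
    std::map<std::string, unsigned> layout;
    for (const std::string &name : blocks_reachable_from_landing_pads(fn))
        for (const std::string &value : fn.at(name).uses)
            layout.emplace(value, 0u);
    unsigned index = 0;
    for (auto &field : layout) field.second = index++;  // sorted by name
    return layout;
}

Function example_function() {
    Function fn;
    fn.emplace("entry", Block{false, {"i.addr", "outer", "tmp"}, {"body"}});
    fn.emplace("body", Block{false, {"tmp"}, {}});
    fn.emplace("lpad", Block{true, {}, {"catch.dispatch"}});
    fn.emplace("catch.dispatch", Block{false, {}, {"catch", "ehcleanup"}});
    fn.emplace("catch", Block{false, {"i.addr"}, {}});
    fn.emplace("ehcleanup", Block{false, {"outer"}, {}});
    return fn;
}
```

For the example, only `i.addr` and `outer` land in the capture block, while `tmp` stays an ordinary local, which is the part of the frame that register allocation and stack coloring can still optimize freely.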

Thanks for prompting me on this, it helps to write things down like this. :)

> For DWARF EH and SjLj, the backend is responsible for handling most of the
EH work. It seems like it would be a more consistent design for SEH to do
the same.

Yep. I guess the question is, is CodeGenPrep the backend or not?

No problem. Now that I see the details of what you have in mind, I can’t think of any reason why that wouldn’t work, and I like the way it isolates most of the impact of SEH into one new pass. Also, if the performance impact turns out to be worse than expected, I don’t see anything here that would prevent moving to the “big dynamic alloca” approach later.

Yes, CGP is definitely backend. I thought you were going to say that the front-end needed to decide what goes in the capture block.

> For DWARF EH and SjLj, the backend is responsible for handling most of the EH work. It seems like it would be a more consistent design for SEH to do the same.

Looking beyond SEH to C++ exception handling for a moment, it seems to me that clang may be handling more than it should there. For instance, calls like “__cxa_allocate_exception” and “__cxa_throw” are baked into the clang IR output, which seems to assume that the backend is going to be using libc++abi for its implementation. Yet it has enough awareness that this won’t always be true that it coughs up an ErrorUnsupported failure for “isWindowsMSVCEnvironment” targets when asked to emit code for “try” or “throw”.

Should this be generalized with intrinsics?

Also, I’m starting to dig into the outlining implementation and there are some things there that worry me. I haven’t compared any existing code that might be doing similar things, so maybe these issues will become clear as I get further into it, but it seemed worth bringing it up now to smooth the progress. I’m trying to put together a general algorithm that starts at the landing pad instruction and groups the subsequent instructions as cleanup code or parts of catch handlers. This is easy enough to do as a human reading the code, but the way that I’m doing so seems to rely fairly heavily on the names of symbols and labels.

For instance, following the landingpad instruction I expect to find an extract and store of “exn.slot” and “ehselector.slot” then everything between that and wherever the catch dispatch begins must be (I think) cleanup code. The catch handlers I’m identifying as a sequence that starts with a load of “exn.slot” and a call to __cxa_begin_catch and continues until it reaches a call to __cxa_end_catch.

The calls to begin/end catch are pretty convenient bookends, but identifying the catch dispatch code and pairing catch handlers with the clauses they represent seems to depend on recognizing the pattern of loading the ehselector, getting a typeid then comparing and branching. I suppose that will work, but it feels a bit brittle. Then there’s the cleanup code, which I’m not yet convinced has a consistent location relative to the catch dispatching and I fear may be moved around by various optimizations before the outlining and will potentially be partially shared with cleanup for other landing pads.

Then there’s the matter of what all of this will look like with SEH, but I haven’t given that much thought yet.

For now I’ll just happily push ahead in the hopes that this will all either resolve itself or turn out not to be much of a problem, but it seemed worth talking about now at least.

-Andy

> For DWARF EH and SjLj, the backend is responsible for handling most of
the EH work. It seems like it would be a more consistent design for SEH to
do the same.

> Looking beyond SEH to C++ exception handling for a moment, it seems to me
that clang may be handling more than it should there. For instance, calls
like “__cxa_allocate_exception” and “__cxa_throw” are baked into
the clang IR output, which seems to assume that the backend is going to be
using libc++abi for its implementation. Yet it has enough awareness that
this won’t always be true that it coughs up an ErrorUnsupported failure for
“isWindowsMSVCEnvironment” targets when asked to emit code for “try” or
“throw”.

> Should this be generalized with intrinsics?

We should just teach Clang to emit calls to the appropriate runtime
functions. This isn't needed for SEH because you don't "throw", you just
crash.

> Also, I’m starting to dig into the outlining implementation and there are
some things there that worry me. I haven’t compared any existing code that
might be doing similar things, so maybe these issues will become clear as I
get further into it, but it seemed worth bringing it up now to smooth the
progress. I’m trying to put together a general algorithm that starts at
the landing pad instruction and groups the subsequent instructions as
cleanup code or parts of catch handlers. This is easy enough to do as a
human reading the code, but the way that I’m doing so seems to rely fairly
heavily on the names of symbols and labels.

Look at lib/Transforms/Utils/CloneFunction.cpp. Most of that code should be
factored appropriately and reused. It uses a ValueMapping that we should be
able to apply to the landing pad instruction to map the ehselector.slot to
a constant, and propagating that through.

> For instance, following the landingpad instruction I expect to find an
extract and store of “exn.slot” and “ehselector.slot” then everything
between that and wherever the catch dispatch begins must be (I think)
cleanup code. The catch handlers I’m identifying as a sequence that starts
with a load of “exn.slot” and a call to __cxa_begin_catch and continues
until it reaches a call to __cxa_end_catch.

I think we'll have to intrinsic-ify __cxa_end_catch when targeting
*-windows-msvc to get this right. If we don't, exception rethrows will
probably not work. We don't really need an equivalent of __cxa_begin_catch
because there's no thread-local EH state to update, it's already managed by
the caller of the catch handler.

> The calls to begin/end catch are pretty convenient bookends, but
identifying the catch dispatch code and pairing catch handlers with the
clauses they represent seems to depend on recognizing the pattern of
loading the ehselector, getting a typeid then comparing and branching. I
suppose that will work, but it feels a bit brittle. Then there’s the
cleanup code, which I’m not yet convinced has a consistent location
relative to the catch dispatching and I fear may be moved around by various
optimizations before the outlining and will potentially be partially shared
with cleanup for other landing pads.

We either have to pattern match the selector == typeid pattern in the EH
preparation pass, or come up with a new representation. I'm hesitant to add
a new EH representation that only MSVC compatible EH uses, because it will
probably trip up existing optimizations. I was hoping that something like
the pruning logic in "llvm::CloneAndPruneFunctionInto" would allow us to
prune the selector comparison branches reliably.

Hi Reid,

I've been working on the outlining code and have a prototype that produces what I want for a simple case.

Now I'm thinking about the heuristics for recognizing the various logical pieces for C++ exception handling code and removing them once they’ve been cloned. I've been working from various comments you've made earlier in this thread, and I'd like to run something by you to make sure we're on the same page.

Starting from a C++ function that looks like this:

void do_some_thing(int &i)
{
  Outer outer;
  try {
    Middle middle;
    if (i == 1) {
        do_thing_one();
    }
    else {
        Inner inner;
        do_thing_two();
    }
  }
  catch (int en) {
    i = -1;
  }
}

I'll have IR that looks more or less like this:

; Function Attrs: uwtable
define void @_Z13do_some_thingRi(i32* dereferenceable(4) %i) #0 {
entry:
  %i.addr = alloca i32*, align 8
  %outer = alloca %class.Outer, align 1
  %middle = alloca %class.Middle, align 1
  %exn.slot = alloca i8*
  %ehselector.slot = alloca i32
  %inner = alloca %class.Inner, align 1
  %en = alloca i32, align 4
  store i32* %i, i32** %i.addr, align 8
  call void @_ZN5OuterC1Ev(%class.Outer* %outer)
  invoke void @_ZN6MiddleC1Ev(%class.Middle* %middle)
          to label %invoke.cont unwind label %lpad

invoke.cont: ; preds = %entry
  %0 = load i32** %i.addr, align 8
  %1 = load i32* %0, align 4
  %cmp = icmp eq i32 %1, 1
  br i1 %cmp, label %if.then, label %if.else

if.then: ; preds = %invoke.cont
  invoke void @_Z12do_thing_onev()
          to label %invoke.cont2 unwind label %lpad1

invoke.cont2: ; preds = %if.then
  br label %if.end

; From 'entry' invoke of Middle constructor
; outer needs post-catch cleanup
lpad: ; preds = %if.end, %entry
  %2 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          cleanup
          catch i8* bitcast (i8** @_ZTIi to i8*)
  %3 = extractvalue { i8*, i32 } %2, 0
  store i8* %3, i8** %exn.slot
  %4 = extractvalue { i8*, i32 } %2, 1
  store i32 %4, i32* %ehselector.slot
  ; No pre-catch cleanup for this landingpad
  br label %catch.dispatch

; From 'if.then' invoke of do_thing_one()
; Or from 'if.else' invoke of Inner constructor
; Or from 'invoke.cont5' invoke of Inner destructor
; middle needs pre-catch cleanup
; outer needs post-catch cleanup
lpad1: ; preds = %invoke.cont5, %if.else, %if.then
  %5 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          cleanup
          catch i8* bitcast (i8** @_ZTIi to i8*)
  %6 = extractvalue { i8*, i32 } %5, 0
  store i8* %6, i8** %exn.slot
  %7 = extractvalue { i8*, i32 } %5, 1
  store i32 %7, i32* %ehselector.slot
  ; Branch to shared label to do pre-catch cleanup
  br label %ehcleanup

if.else: ; preds = %invoke.cont
  invoke void @_ZN5InnerC1Ev(%class.Inner* %inner)
          to label %invoke.cont3 unwind label %lpad1

invoke.cont3: ; preds = %if.else
  invoke void @_Z12do_thing_twov()
          to label %invoke.cont5 unwind label %lpad4

invoke.cont5: ; preds = %invoke.cont3
  invoke void @_ZN5InnerD1Ev(%class.Inner* %inner)
          to label %invoke.cont6 unwind label %lpad1

invoke.cont6: ; preds = %invoke.cont5
  br label %if.end

; From 'invoke.cont3' invoke of do_thing_two()
; middle and inner need pre-catch cleanup
; outer needs post-catch cleanup
lpad4: ; preds = %invoke.cont3
  %8 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          cleanup
          catch i8* bitcast (i8** @_ZTIi to i8*)
  %9 = extractvalue { i8*, i32 } %8, 0
  store i8* %9, i8** %exn.slot
  %10 = extractvalue { i8*, i32 } %8, 1
  store i32 %10, i32* %ehselector.slot
  ; Pre-catch cleanup begins here, but will continue at ehcleanup
  invoke void @_ZN5InnerD1Ev(%class.Inner* %inner)
          to label %invoke.cont7 unwind label %terminate.lpad

invoke.cont7: ; preds = %lpad4
  br label %ehcleanup

if.end: ; preds = %invoke.cont6, %invoke.cont2
  invoke void @_ZN6MiddleD1Ev(%class.Middle* %middle)
          to label %invoke.cont8 unwind label %lpad

invoke.cont8: ; preds = %if.end
  br label %try.cont

; Pre-catch cleanup for lpad1
; Continuation of pre-catch cleanup for lpad4
ehcleanup: ; preds = %invoke.cont7, %lpad1
  invoke void @_ZN6MiddleD1Ev(%class.Middle* %middle)
          to label %invoke.cont9 unwind label %terminate.lpad

invoke.cont9: ; preds = %ehcleanup
  br label %catch.dispatch

; Catch dispatch for lpad, lpad1 and lpad4
catch.dispatch: ; preds = %invoke.cont9, %lpad
  %sel = load i32* %ehselector.slot
  %11 = call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIi to i8*)) #4
  %matches = icmp eq i32 %sel, %11
  br i1 %matches, label %catch, label %ehcleanup10

catch: ; preds = %catch.dispatch
  %exn = load i8** %exn.slot
  %12 = call i8* @__cxa_begin_catch(i8* %exn) #4
  %13 = bitcast i8* %12 to i32*
  %14 = load i32* %13, align 4
  store i32 %14, i32* %en, align 4
  %15 = load i32** %i.addr, align 8
  store i32 -1, i32* %15, align 4
  call void @__cxa_end_catch() #4
  br label %try.cont

try.cont: ; preds = %catch, %invoke.cont8
  call void @_ZN5OuterD1Ev(%class.Outer* %outer)
  ret void

; Post-catch cleanup for lpad, lpad1 and lpad4
ehcleanup10: ; preds = %catch.dispatch
  invoke void @_ZN5OuterD1Ev(%class.Outer* %outer)
          to label %invoke.cont11 unwind label %terminate.lpad

invoke.cont11: ; preds = %ehcleanup10
  br label %eh.resume

eh.resume: ; preds = %invoke.cont11
  %exn12 = load i8** %exn.slot
  %sel13 = load i32* %ehselector.slot
  %lpad.val = insertvalue { i8*, i32 } undef, i8* %exn12, 0
  %lpad.val14 = insertvalue { i8*, i32 } %lpad.val, i32 %sel13, 1
  resume { i8*, i32 } %lpad.val14

terminate.lpad: ; preds = %ehcleanup10, %ehcleanup, %lpad4
  %16 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          catch i8* null
  %17 = extractvalue { i8*, i32 } %16, 0
  call void @__clang_call_terminate(i8* %17) #5
  unreachable
}

If I've understood your intentions correctly, we'll have an outlining pass that transforms the above IR to this:

%struct.do_some_thing.captureblock = type { %class.Outer, %class.Middle, %class.Inner, i32* }

; Uncaught exception cleanup for lpad, lpad1 and lpad4
define void @do_some_thing_cleanup0(i8* %eh_ptrs, i8* %rbp) #0 {
entry:
  %capture.block = call @llvm.eh.get_capture_block(@_Z13do_some_thingRi, %rbp)
  %outer = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 0
  invoke void @_ZN5OuterD1Ev(%class.Outer* %outer)
          to label %invoke.cont unwind label %terminate.lpad

invoke.cont: ; preds = %entry
  ret void

terminate.lpad: ; preds = %entry
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          catch i8* null
  %1 = extractvalue { i8*, i32 } %0, 0
  call void @__clang_call_terminate(i8* %1) #5
  unreachable
}

; Catch handler for _ZTIi
define i8* @do_some_thing_catch0(i8* %eh_ptrs, i8* %rbp) #0 {
entry:
  %capture.block = call @llvm.eh.get_capture_block(@_Z13do_some_thingRi, %rbp)
  %i.addr = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 3
  %0 = load i32** %i.addr, align 8
  store i32 -1, i32* %0, align 4
  ret i8* blockaddress(@_Z13do_some_thingRi, %try.cont)
}

; Outlined pre-catch cleanup handler for lpad1
define void @do_some_thing_cleanup1(i8* %eh_ptrs, i8* %rbp) #0 {
entry:
  %capture.block = call @llvm.eh.get_capture_block(@_Z13do_some_thingRi, %rbp)
  ; Outlined from 'ehcleanup'
  %middle = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 1
  invoke void @_ZN6MiddleD1Ev(%class.Middle* %middle)
          to label %invoke.cont unwind label %terminate.lpad

invoke.cont: ; preds = %entry
  ret void

terminate.lpad: ; preds = %entry
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          catch i8* null
  %1 = extractvalue { i8*, i32 } %0, 0
  call void @__clang_call_terminate(i8* %1) #5
  unreachable
}

; Outlined pre-catch cleanup handler for 'lpad4'
define void @do_some_thing_cleanup2(i8* %eh_ptrs, i8* %rbp) #0 {
entry:
  %capture.block = call @llvm.eh.get_capture_block(@_Z13do_some_thingRi, %rbp)
  ; Outlined from 'lpad4'
  %inner = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 2
  invoke void @_ZN5InnerD1Ev(%class.Inner* %inner)
          to label %invoke.cont unwind label %terminate.lpad

invoke.cont: ; preds = %entry
  ; Outlined from 'ehcleanup'
  %middle = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 1
  invoke void @_ZN6MiddleD1Ev(%class.Middle* %middle)
          to label %invoke.cont1 unwind label %terminate.lpad

invoke.cont1: ; preds = %invoke.cont
  ret void

terminate.lpad: ; preds = %invoke.cont, %entry
  %0 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          catch i8* null
  %1 = extractvalue { i8*, i32 } %0, 0
  call void @__clang_call_terminate(i8* %1) #5
  unreachable
}

; Function Attrs: uwtable
define void @_Z13do_some_thingRi(i32* dereferenceable(4) %i) #0 {
entry:
  %capture.block = alloca %struct.do_some_thing.captureblock, align 1
  %i.addr = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 3
  store i32* %i, i32** %i.addr, align 8
  llvm.eh.set_capture_block
  %eh.cont.label = alloca i8*
  %en = alloca i32, align 4
  %outer = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 0
  call void @_ZN5OuterC1Ev(%class.Outer* %outer)
  %middle = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 1
  invoke void @_ZN6MiddleC1Ev(%class.Middle* %middle)
          to label %invoke.cont unwind label %lpad

invoke.cont: ; preds = %entry
  %0 = load i32** %i.addr, align 8
  %1 = load i32* %0, align 4
  %cmp = icmp eq i32 %1, 1
  br i1 %cmp, label %if.then, label %if.else

if.then: ; preds = %invoke.cont
  invoke void @_Z12do_thing_onev()
          to label %invoke.cont2 unwind label %lpad1

invoke.cont2: ; preds = %if.then
  br label %if.end

; From 'entry' invoke of Middle constructor
; outer needs post-catch cleanup
lpad: ; preds = %if.end, %entry
  %2 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          cleanup
          catch i8* bitcast (i8** @_ZTIi to i8*)
  %eh.cont.label = call i8* (...)* @llvm.eh.outlined_handlers(
      i8* @_ZTIi, i8* (i8*, i8*)* @do_some_thing_catch0,
      void (i8*, i8*)* @do_some_thing_cleanup0)
  indirectbr i8* %eh.cont.label

; From 'if.then' invoke of do_thing_one()
; Or from 'if.else' invoke of Inner constructor
; Or from 'invoke.cont5' invoke of Inner destructor
; middle needs pre-catch cleanup
; outer needs post-catch cleanup
lpad1: ; preds = %invoke.cont5, %if.else, %if.then
  %5 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          cleanup
          catch i8* bitcast (i8** @_ZTIi to i8*)
  %eh.cont.label = call i8* (...)* @llvm.eh.outlined_handlers(
      void (i8*, i8*)* @do_some_thing_cleanup1,
      i8* @_ZTIi, i8* (i8*, i8*)* @do_some_thing_catch0,
      void (i8*, i8*)* @do_some_thing_cleanup0)
  indirectbr i8* %eh.cont.label

if.else: ; preds = %invoke.cont
  %inner = getelementptr inbounds %struct.do_some_thing.captureblock* %capture.block, i32 0, i32 2
  invoke void @_ZN5InnerC1Ev(%class.Inner* %inner)
          to label %invoke.cont3 unwind label %lpad1

invoke.cont3: ; preds = %if.else
  invoke void @_Z12do_thing_twov()
          to label %invoke.cont5 unwind label %lpad4

invoke.cont5: ; preds = %invoke.cont3
  invoke void @_ZN5InnerD1Ev(%class.Inner* %inner)
          to label %invoke.cont6 unwind label %lpad1

invoke.cont6: ; preds = %invoke.cont5
  br label %if.end

; From 'invoke.cont3' invoke of do_thing_two()
; middle and inner need pre-catch cleanup
; outer needs post-catch cleanup
lpad4: ; preds = %invoke.cont3
  %8 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
          cleanup
          catch i8* bitcast (i8** @_ZTIi to i8*)
  %eh.cont.label = call i8* (...)* @llvm.eh.outlined_handlers(
      void (i8*, i8*)* @do_some_thing_cleanup2,
      i8* @_ZTIi, i8* (i8*, i8*)* @do_some_thing_catch0,
      void (i8*, i8*)* @do_some_thing_cleanup0)
  indirectbr i8* %eh.cont.label

if.end: ; preds = %invoke.cont6, %invoke.cont2
  invoke void @_ZN6MiddleD1Ev(%class.Middle* %middle)
          to label %invoke.cont8 unwind label %lpad

invoke.cont8: ; preds = %if.end
  br label %try.cont

try.cont: ; preds = %catch, %invoke.cont8
  call void @_ZN5OuterD1Ev(%class.Outer* %outer)
  ret void
}

Does that look about like what you’d expect?

I just have a few questions.

I'm pretty much just guessing at how you intended the llvm.eh.set_capture_block intrinsic to work. It wasn't clear to me if I just needed to set it where the structure was created or if it would need to be set anywhere an exception might be thrown. The answer is probably related to my next question.

In the above example I created a single capture block for the entire function. That works reasonably well for a simple case like this and corresponds to the co-location of the allocas in the original IR, but for functions with more complex structures and multiple try blocks it could get ugly. Do you have ideas for how to handle that?

For C++ exception handling, we need cleanup code that executes before the catch handlers and cleanup code that executes in the case of uncaught exceptions. I think both of these need to be outlined for the MSVC environment. Do you think we need a stub handler to be inserted in cases where no actual cleanup is performed?

I didn't do that in the mock-up above, but it seems like it would simplify things. Basically, I'm imagining a final pattern that looks like this:

lpad:
  %eh_vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
      cleanup
      catch i8* @typeid1
      catch i8* @typeid2
      ...
  %label = call i8* (...)* @llvm.eh.outlined_handlers(
      void (i8*, i8*)* @<pre-catch cleanup function>,
      i8* @typeid1, i8* (i8*, i8*)* @<typeid1 catch function>,
      i8* @typeid2, i8* (i8*, i8*)* @<typeid2 catch function>,
      ...
      void (i8*, i8*)* @<uncaught exception cleanup function>)
  indirectbr i8* %label
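
One way to read the handler list in the pattern above is as a small dispatch table. The following is only a toy C++ model of that reading, not how the personality function or the proposed intrinsic would actually be implemented. The names `CatchEntry`, `HandlerList` and `dispatch` are hypothetical, type ids are modelled as strings, and each catch handler returns the label to resume at, mirroring the `blockaddress` returned by do_some_thing_catch0.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one (type id, catch function) pair in the list.
struct CatchEntry {
    std::string type_id;                   // stands in for i8* @typeidN
    std::function<std::string()> handler;  // returns the label to resume at
};

// Hypothetical model of the whole argument list: a pre-catch cleanup,
// the catch pairs, and a cleanup for exceptions nothing here catches.
struct HandlerList {
    std::function<void()> pre_catch_cleanup;
    std::vector<CatchEntry> catches;
    std::function<void()> uncaught_cleanup;
};

// Runs the pre-catch cleanup, then the first matching catch handler.
// Returns that handler's continuation label, or an empty string when no
// handler matches (after running the uncaught-exception cleanup).
std::string dispatch(const HandlerList &list, const std::string &thrown_type) {
    if (list.pre_catch_cleanup) list.pre_catch_cleanup();
    for (const CatchEntry &entry : list.catches) {
        if (entry.type_id == thrown_type) return entry.handler();
    }
    if (list.uncaught_cleanup) list.uncaught_cleanup();
    return std::string();
}
```

In this model a stub handler for the "no cleanup" case corresponds to leaving `pre_catch_cleanup` or `uncaught_cleanup` empty, which is one argument for always having a slot for them.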

Finally, how do you see this meshing with SEH? As I understand it, both the exception handlers and the cleanup code in that case execute in the original function context and only the filter handlers need to be outlined. I suppose the outlining pass can look at the personality function and change its behavior accordingly. Is that what you were thinking?

-Andy

Hi Reid,

I've been working on the outlining code and have a prototype that produces
what I want for a simple case.

Now I'm thinking about the heuristics for recognizing the various logical
pieces for C++ exception handling code and removing them once they’ve been
cloned. I've been working from various comments you've made earlier in
this thread, and I'd like to run something by you to make sure we're on the
same page.

Starting from a C++ function that looks like this:

...

I'll have IR that looks more or less like this:

...

If I've understood your intentions correctly, we'll have an outlining pass
that transforms the above IR to this:

...

Does that look about like what you’d expect?

Yep! That's basically what I had in mind, but I still have concerns with
this model, which I've listed below.

We should also think about how to call std::terminate when cleanup dtors
throw. The current representation for Itanium is inefficient. As a
strawman, I propose making @__clang_call_terminate an intrinsic:

  ...
  invoke void @dtor(i8* %this) to label %cont unwind label %terminate.lpad
cont:
  ret void
terminate.lpad:
  landingpad ... catch i8* null
  call void @llvm.eh.terminate()
  unreachable

This would be good for Itanium EH, as we can actually completely elide
table entries for landing pads that just catch-all and terminate.

I just have a few questions.

I'm pretty much just guessing at how you intended the
llvm.eh.set_capture_block intrinsic to work. It wasn't clear to me if I
just needed to set it where the structure was created or if it would need
to be set anywhere an exception might be thrown. The answer is probably
related to my next question.

I was imagining it would be called once in the entry block.

Chandler expressed strong concerns about this design, however, as
@llvm.eh.get_capture_block adds an ordering constraint on CodeGen. Once you
add this intrinsic, we *have* to do frame layout of @_Z13do_some_thingRi
*before* we can emit code for all the callers of
@llvm.eh.get_capture_block. Today, this is easy, because module order
defines emission order, but in the great glorious future, codegen will
hopefully be parallelized, and then we've inflicted this horrible
constraint on the innocent.

His suggestion to break the ordering dependence was to lock down the frame
offset of the capture block to always be some fixed offset known by the
target (ie ebp - 4 on x86, if we like that).
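
To make the fixed-offset idea concrete, here is a rough C++ model. It is a sketch under assumptions: `ParentFrame`, `CaptureBlock`, `kCaptureBlockOffset` and `get_capture_block` are hypothetical names, the destructible objects are reduced to plain ints, and a real target would fix the offset relative to the frame pointer rather than to a struct.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the locals that the outlined handlers touch.
struct CaptureBlock {
    int outer;
    int middle;
    int inner;
    int *i;
};

// Toy model of the parent frame: the capture block always sits at one
// position, and everything after it is free to move around.
struct ParentFrame {
    CaptureBlock captures;
    int other_locals[8];
};

// Assumed target-defined constant, known without laying out the parent.
constexpr std::size_t kCaptureBlockOffset = 0;
static_assert(offsetof(ParentFrame, captures) == kCaptureBlockOffset,
              "capture block must live at the agreed offset");

// An outlined handler only receives the parent's frame base (the %rbp
// parameter in the mock-up) and adds the constant to find the block.
CaptureBlock *get_capture_block(void *frame_base) {
    return reinterpret_cast<CaptureBlock *>(
        static_cast<char *>(frame_base) + kCaptureBlockOffset);
}

// Mirrors the 'store i32 -1' in do_some_thing_catch0.
void catch_handler(void *frame_base) {
    *get_capture_block(frame_base)->i = -1;
}
```

Because the handler depends only on the constant, it can be compiled before or after the parent function, which is the ordering freedom being asked for.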

In the above example I created a single capture block for the entire
function. That works reasonably well for a simple case like this and
corresponds to the co-location of the allocas in the original IR, but for
functions with more complex structures and multiple try blocks it could get
ugly. Do you have ideas for how to handle that?

Not really, it would just get ugly. All allocas used from landing pad code
would get mushed into one allocation. =/

For C++ exception handling, we need cleanup code that executes before the
catch handlers and cleanup code that executes in the case of uncaught
exceptions. I think both of these need to be outlined for the MSVC
environment. Do you think we need a stub handler to be inserted in cases
where no actual cleanup is performed?

I think it's actually harder than that, once you consider nested trys:
void f() {
  try {
    Outer outer;
    try {
      Inner inner;
      g();
    } catch (int) {
      // ~Inner gets run first
    }
  } catch (float) {
    // ~Inner gets run first
    // ~Outer gets run next
  }
  // uncaught exception? Run ~Inner then ~Outer.
}

It's easy to hit this case after inlining as well.
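
The ordering described in the comments above can be observed with a small standalone program. This is a minimal sketch: g() is replaced by a direct throw, and the `throw_float` flag and the `g_log` vector are additions for illustration only.

```cpp
#include <string>
#include <vector>

// Records the order in which destructors and handlers run.
static std::vector<std::string> g_log;

struct Outer { ~Outer() { g_log.push_back("~Outer"); } };
struct Inner { ~Inner() { g_log.push_back("~Inner"); } };

// Same shape as f() above; throw_float selects which handler matches.
void f(bool throw_float) {
    try {
        Outer outer;
        try {
            Inner inner;
            if (throw_float) throw 1.0f;  // only the outer handler matches
            throw 1;                      // the inner handler matches
        } catch (int) {
            g_log.push_back("catch int");
        }
    } catch (float) {
        g_log.push_back("catch float");
    }
}
```

With an int, only ~Inner runs before the handler, and ~Outer runs later on the normal path. With a float, both destructors run before the handler, so the set of cleanups that precede a catch depends on which try level catches the exception.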

We'd have to generalize @llvm.eh.outlined_handlers more to handle this
case. However, if we generalize further it starts to perfectly replicate
the landing pad structure, with cleanup, catch, and then we'd want to think
about how to represent filter. Termination on exception spec violation
seems to be unimplemented in MSVC, so we'd need our own personality
function to implement filters, but it'd be good to support them in the IR.

We also have to decide how much code duplication of cleanups we're willing
to tolerate, and whether we want to try to annotate the beginning and end
of cleanups like ~Inner and ~Outer.

I didn't do that in the mock-up above, but it seems like it would simplify
things. Basically, I'm imagining a final pattern that looks like this:

lpad:
  %eh_vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__CxxFrameHandler3 to i8*)
      cleanup
      catch i8* @typeid1
      catch i8* @typeid2
      ...
  %label = call i8* (...)* @llvm.eh.outlined_handlers(
      void (i8*, i8*)* @<pre-catch cleanup function>,
      i8* @typeid1, i8* (i8*, i8*)* @<typeid1 catch function>,
      i8* @typeid2, i8* (i8*, i8*)* @<typeid2 catch function>,
      ...
      void (i8*, i8*)* @<uncaught exception cleanup function>)
  indirectbr i8* %label

Finally, how do you see this meshing with SEH? As I understand it, both
the exception handlers and the cleanup code in that case execute in the
original function context and only the filter handlers need to be
outlined. I suppose the outlining pass can look at the personality
function and change its behavior accordingly. Is that what you were
thinking?

Pretty much. The outlining pass would behave differently based on the
personality function. SEH cleanups (__finally blocks) actually do need to
get outlined as well as filters, but catches (__except blocks) do not need
to be outlined. That's the main difference. I think it reflects the fact
that you can rethrow a C++ exception, but you can't faithfully "rethrow" a
trap caught by SEH.

We should also think about how to call std::terminate when cleanup dtors throw. The current representation for Itanium is inefficient. As a strawman, I propose making @__clang_call_terminate an intrinsic:

That sounds like a good starting point.

Chandler expressed strong concerns about this design, however, as @llvm.eh.get_capture_block adds an ordering constraint on CodeGen. Once you add this intrinsic, we have to do frame layout of @_Z13do_some_thingRi before we can emit code for all the callers of @llvm.eh.get_capture_block. Today, this is easy, because module order defines emission order, but in the great glorious future, codegen will hopefully be parallelized, and then we’ve inflicted this horrible constraint on the innocent.

His suggestion to break the ordering dependence was to lock down the frame offset of the capture block to always be some fixed offset known by the target (ie ebp - 4 on x86, if we like that).

Chandler probably has a better feel for this sort of thing than I do. I can’t think of a reason offhand why that wouldn’t work, but it makes me a little nervous.

What would that look like in the IR? Would we use the same intrinsics and just lower them to use the known location?

I’ll think about this, but for now I’m happy to just proceed with the belief that it’s a solvable problem either way.

For C++ exception handling, we need cleanup code that executes before the catch handlers and cleanup code that executes in the case of uncaught exceptions. I think both of these need to be outlined for the MSVC environment. Do you think we need a stub handler to be inserted in cases where no actual cleanup is performed?

I think it’s actually harder than that, once you consider nested trys:

void f() {
  try {
    Outer outer;
    try {
      Inner inner;
      g();
    } catch (int) {
      // ~Inner gets run first
    }
  } catch (float) {
    // ~Inner gets run first
    // ~Outer gets run next
  }
  // uncaught exception? Run ~Inner then ~Outer.
}

I took a look at the IR that’s generated for this example. I see what you mean. So there is potentially cleanup code before and after every catch handler, right?

Do you happen to know offhand what that looks like in the .xdata for the _CxxFrameHandler3 function?

-Andy

> We should also think about how to call std::terminate when cleanup
> dtors throw. The current representation for Itanium is inefficient. As a
> strawman, I propose making @__clang_call_terminate an intrinsic:

That sounds like a good starting point.

> Chandler expressed strong concerns about this design, however, as
> @llvm.eh.get_capture_block adds an ordering constraint on CodeGen. Once you
> add this intrinsic, we *have* to do frame layout of @_Z13do_some_thingRi
> *before* we can emit code for all the callers of
> @llvm.eh.get_capture_block. Today, this is easy, because module order
> defines emission order, but in the great glorious future, codegen will
> hopefully be parallelized, and then we've inflicted this horrible
> constraint on the innocent.

> His suggestion to break the ordering dependence was to lock down the
> frame offset of the capture block to always be some fixed offset known by
> the target (ie ebp - 4 on x86, if we like that).

Chandler probably has a better feel for this sort of thing than I do. I
can’t think of a reason offhand why that wouldn’t work, but it makes me a
little nervous.

What would that look like in the IR? Would we use the same intrinsics and
just lower them to use the known location?

Chandler seems to be OK with get/set capture block, as long as the codegen
ordering dependence can be removed. I think we can remove it by delaying
the resolution of the frame offset to assembly time using an MCSymbolRef.
It would look a lot like this kind of assembly:

my_handler:
  push %rbp
  mov %rsp, %rbp
  lea Lframe_offset0(%rdx), %rax ; This is now the parent capture block
  ...
  retq

parent_fn:
  push %rbp
  mov %rsp, %rbp
  push %rbx
  push %rdi
  subq $NN, %rsp
Lframe_offset0 = X + 2 * 8 ; Two CSRs plus some offset into the main stack allocation

I guess I'll try to make that work.
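
As a rough illustration of why a symbolic offset removes the ordering dependence, here is a toy two-phase emitter in C++. It is not the LLVM MC API: `ToyAssembler`, `PendingLea` and their methods are hypothetical, and the value standing in for X in the test of this sketch is made up.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an instruction such as 'lea Lframe_offset0(%rdx), %rax'
// whose displacement is a symbol rather than a number.
struct PendingLea {
    std::string symbol;
    long displacement;
    bool resolved;
};

// Hypothetical two-phase emitter.
class ToyAssembler {
public:
    // Phase 1: emitting the handler only needs the symbol's name.
    std::size_t emit_lea(const std::string &symbol) {
        PendingLea lea;
        lea.symbol = symbol;
        lea.displacement = 0;
        lea.resolved = false;
        instructions_.push_back(lea);
        return instructions_.size() - 1;
    }

    // Called whenever the parent's frame layout becomes known.
    void define_symbol(const std::string &symbol, long value) {
        symbols_[symbol] = value;
    }

    // Phase 2: patch every instruction whose symbol is defined.
    // Returns true only if nothing is left unresolved.
    bool resolve() {
        bool all_resolved = true;
        for (PendingLea &lea : instructions_) {
            std::map<std::string, long>::const_iterator it =
                symbols_.find(lea.symbol);
            if (it == symbols_.end()) {
                all_resolved = false;
                continue;
            }
            lea.displacement = it->second;
            lea.resolved = true;
        }
        return all_resolved;
    }

    const PendingLea &at(std::size_t index) const {
        return instructions_[index];
    }

private:
    std::vector<PendingLea> instructions_;
    std::map<std::string, long> symbols_;
};
```

The handler's instruction is emitted before the parent's layout exists, and the number is filled in afterwards, so neither function has to be code-generated first.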

I’ll think about this, but for now I’m happy to just proceed with the
belief that it’s a solvable problem either way.

>> For C++ exception handling, we need cleanup code that executes before
>> the catch handlers and cleanup code that executes in the case of uncaught
>> exceptions. I think both of these need to be outlined for the MSVC
>> environment. Do you think we need a stub handler to be inserted in cases
>> where no actual cleanup is performed?

> I think it's actually harder than that, once you consider nested trys:

> void f() {
>   try {
>     Outer outer;
>     try {
>       Inner inner;
>       g();
>     } catch (int) {
>       // ~Inner gets run first
>     }
>   } catch (float) {
>     // ~Inner gets run first
>     // ~Outer gets run next
>   }
>   // uncaught exception? Run ~Inner then ~Outer.
> }

I took a look at the IR that’s generated for this example. I see what you
mean. So there is potentially cleanup code before and after every catch
handler, right?

Do you happen to know offhand what that looks like in the .xdata for the
_CxxFrameHandler3 function?

I can't tell how the state tables arrange for the destructors to run in the
right order, but they can accomplish this without duplicating the cleanup
code into the outlined catch handler functions, which is nice.

I think we may be able to address this by emitting calls to start/stop
intrinsics around EH cleanups, but that may inhibit optimizations.

Hi Reid,

Is this design supposed to be able to cope with asynchronous exceptions? I am having trouble imagining how this would work without adding the ability to associate landing pads with scopes in LLVM IR.

Vadim