RFC: Introduce new llvm.dbg.coroframe_entry intrinsic

The Problem:

Instead of using a stack frame, Coroutines (we will use Swift async frames as the running example) store local variables in a heap data structure called the coroutine frame. To describe variables in a coroutine frame we would want the semantics of #dbg_declares instead of #dbg_values, because of the guarantee that there should only be one #dbg_declare for a variable in a function. However, because it is meant to refer to stack frames, the location of a #dbg_declare must be a pointer. That is typically true after the CoroSplitter pass runs, but not before. This causes a dilemma for the Swift frontend when it wants to describe local variables that are expected to get allocated in the coroutine frame in pre-corosplit LLVM IR.

Example:

// test.swift
public func foo(yyyy: Double) async
{
print(yyyy)
}

if I run the command:

swiftc -g -emit-irgen test.swift -O -o -

Which will produce the LLVM IR that the swift compiler generates just before handing it over to LLVM, we can see:

; Function Attrs: noinline
define swifttailcc void @"$s4test3foo4yyyyySd_tYaF"(ptr swiftasync %0, double %1) #1 !dbg !43
{
entry:

#dbg_declare(double %1, !50, !DIExpression(), !52)
...
}

Here, the #dbg_declare contains a location with the type double, however, if we look at the output of the CoroSplitterPass by using the -print-after-all output, we can see

; *** IR Dump After CoroSplitPass on ($s4test3foo4yyyyySd_tYaF) ***
; Function Attrs: noinline
define swifttailcc void @"$s4test3foo4yyyyySd_tYaF"(ptr swiftasync %0, double %1) #1 !dbg !35
{
#dbg_declare(ptr %0, !42, !DIExpression(DW_OP_plus_uconst, 16), !44)
%3 = getelementptr inbounds i8, ptr %0, i32 16
...
7: ; preds = %6
%8 = alloca double, align 8, !dbg !45
#dbg_declare(ptr %8, !42, !DIExpression(DW_OP_deref), !45)
}

We can see that the CoroSplitter pass fixes up the #dbg_declare and replaces the double with an alloca and a ptr

However, with commit 20507a9e95a08069863e9910a688a38370d58952 there is a new check in the verifier that ensures that #dbg_declare always have a location that is a pointer. This causes an immediate crash in the compiler when swift hands off the generated IR to LLVM.

The Solution:

To fix this, we propose a new intrinsic called llvm.dbg.coroframe_entry or #dbg_coroframe_entry. This intrinsic will behave exactly like a #dbg_declare, however, it will not have the restriction of a ptr only location as a #dbg_declare.

The CoroSplit pass will transform any #dbg_coroframe_entry intrinsics just like it did with #dbg_declares, however, crucially, after fixing up the location type, it will also re-emit them as #dbg_declares, so that we don’t have to teach subsequent passes, such as InstCombine, about what a #dbg_coroframe_entry is

Example:

// test.swift
public func foo(yyyy: Double) async
{
print(yyyy)
}

if I run the command:

swiftc -g -emit-irgen test.swift -O -o -

Which will produce the LLVM IR that the swift compiler generates just before handing it over to LLVM, we can see:

; Function Attrs: noinline
define swifttailcc void @"$s4test3foo4yyyyySd_tYaF"(ptr swiftasync %0, double %1) #1 !dbg !43
{
entry:

#dbg_coroframe_entry(double %1, !50, !DIExpression(), !52)
...
}

Here, the #dbg_coroframe_entry contains a location with the type double, but the Verifier does not care, and therefore we have no issues handing off this IR to LLVM

When we look at the -print-after-all output, we can see:

; *** IR Dump After CoroSplitPass on ($s4test3foo4yyyyySd_tYaF) ***
; Function Attrs: noinline
define swifttailcc void @"$s4test3foo4yyyyySd_tYaF"(ptr swiftasync %0, double %1) #1 !dbg !35
{
#dbg_declare(ptr %0, !42, !DIExpression(DW_OP_plus_uconst, 16), !44)
%3 = getelementptr inbounds i8, ptr %0, i32 16
...
7: ; preds = %6
%8 = alloca double, align 8, !dbg !45
#dbg_declare(ptr %8, !42, !DIExpression(DW_OP_deref), !45)
}

The CoroSplitter pass has transformed the #dbg_coroframe_entry into #dbg_declares and also fixed up the double location to a ptr.

TL;DR:

introduce a new dbg_coroframe_entry intrinsic to communicate local variable storage between a frontend and the CoroSplitter pass. The CoroSplitter spills local variables onto a heap data structure and lowers each dbg_coroframe_entry into a dbg_declare.

@adrian.prantl @jmorse @dblaikie

Being the one who encouraged this proposal in the first place I’m supportive of this, but I’m curious if there are any concerns about adding a new intrinsic to unambiguously describe Coroutine frame entries specifically!

Sounds plausible - but I’ll certainly leave it to @jmorse and the Sony folks who’ve been doing all the heavy lifting for variable locations lately to weigh in on this.

I’ll admit my coroutine knowledge is limited, but this certainly sounds like an area that’s poorly catered for by the existing facilities. To check my understanding, I’m right in thinking:

  • Swift will describe variable values with #dbg_declare,
  • The coroutine splitter will then spill the variable value on a new alloca, and attach the #dbg_declare to the alloca.

Which is certainly different to anything else in LLVM. A few questions would be:

  • Does the coroutine splitter create storage according to the structure of the debug-info – i.e., if there are multiple assignments to a source variable, will they always end up in the same alloca?
  • You mention the need for #dbg_declare semantics, i.e. that there’s one location per source variable: is this a Swift restriction, or an artifact of how LLVM manages debug-info? It seems to be the main reason why you can’t express this all in dbg.values.
  • Are there any variables that /shouldn’t/ be spilt to the stack by corosplitter?

My reflex (driven by parsimony) is to think that this could be described with @OCHyams ā€˜s assignment-tracking work, but I don’t know if it’s a good fit. Each dbg.assign is a dbg.value that has an extra, semi-optional alloca argument specifying the variables location, and as optimisations remove storage and stores we get to track whether variables should be on the stack or an SSA Value. The coroutine splitter could be consider that in reverse: attaching storage to dbg.assigns that don’t have any. (Implementing it could be a mess though!).

Assuming that’s not feasible, would a dbg.coroframe_entry need any special handling when it comes to salvaging, and are there any further limits on what it could refer to? In an ideal world we could treat it as ā€œa dbg.value that we expect to get shifted to the stackā€ and not have any further special handling.

I’m not much into debug info, nor into Swift coroutines, so I can’t comment on the overall approach:

Are there any variables that /shouldn’t/ be spilt to the stack by corosplitter?

I don’t know if this is any different for Swift, but for non-swift coroutines,
the CoroSplitter only spills variables that cross a suspend point. I.e. after splitting end up being written in one function call and read in another. Variables that are only defined and read afterwards in a single part (after splitting) are not spilled and stay SSA values.

Which brings me to ask: How can Swift know beforehand if a value will be spilled or not?
There can be any optimizations happening in between, hoisting/sinking/… that change the live-ranges of variables and change if they need to be spilled or not.
(I guess in the example it may always spill because it’s an argument? Introducing something just for Swift coroutine arguments sounds very special-cased.)

You are correct that CoroSplitter only spills variables that cross suspend points. In the scheme @rastogishubham described the frontend is not expected to know about which variables these are, it would just describe all variables in a coroutine with a dbg.coroframe_entry. This allows the CoroSplitter to lower all dbg_coroframe_entrys into either a dbg.declare(in case of a spill) or, e.g., a dbg.value (for variables with scope limited to one coroutine fragment).

2 Likes

Yes.

Assuming I understand the question, I think the answer is yes.

If we used dbg.values then the frontend would need to inject them into all basic blocks dominated by definition, which is a lot more work than just inserting a single dbg.declare/coroframe_entry per variable. But more importantly, the semantics are different: A dbg.value starts a new range and thus can’t necessarily be lowered into a dbg.declare which covers the entire function. The Swift frontend, may perform some optimizations at the SIL (Swift intermediate language) layer, so the code coming out of the frontend may already contain other dbg.value intrinsics, where this distinction matters.

Because the dbg.coroframe_entry intrinsics only exists between frontend and CoroSplitter (which is run very early in the pipeline, they don’t need any special handling.

Why is this a problem specifically for Swift, and not C++? Is the difference just that clang creates allocas for all arguments? Could Swift create allocas for arguments, and just use dbg_declare?

1 Like

I think clang doesn’t create dbg_declares with non-pointers, however Swift does, that is why it is a problem unique to Swift

It sounds like the root difference here is the need for dbg.declare semantics, while dealing with a non-pointer variable value. With that in mind, using a new record / intrinsic type makes sense to describe a gap that exists.

It’d be even better if we could further formalise what this record means beyond ā€œto be consumed by coro-splitā€. If we do ever come up with a better way of representing it in the future, I doubt it’ll be especially hard to convert to an even-newer record / intrinsic.

I guess the key difference between #dbg_declare and #dbg_value for Swift is that the declare variant says ā€œthe value of this variable is SSA value %x, always and forever, in all subsequent basic blocks, and perhaps the whole functionā€. Adrian put it this way:

Rather than inventing a specific dbg record kind for coroutines, I think there’s probably room for a #dbg_declare_value record or a value boolean stapled onto a #dbg_declare record. I think being explicit is valuable: it’s very helpful for the DWARF location emitter to know if it is dealing with an SSA variable value (like this double argument) or a memory location that can be updated (like an alloca, which is what we always use in Clang).

We could, for example, use this to improve Clang -O0 codegen, by skipping allocas for non-address-taken scalar types, and using the pre-allocated stack memory for register parameters.

(likely relevant to Carbon too, which at the moment uses direct SSA values for let (essentially local constant) declarations, skipping the allocas)

Thanks for the feedback @dblaikie @rnk and @jmorse

I think making it more generic as @rnk suggested sounds like a good idea! I can put up a patch that keeps the new intrinsic the same, but has a different, more generic name