[RFC] Debug info for coroutine suspension locations

Hi,

I am trying to improve the debugging experience for C++20 coroutines. In particular, I want to get the exact location where a coroutine is suspended by inspecting a std::coroutine_handle in the debugger. Currently, I can already get the id of the current suspension point by printing the coroutine frame (see the example in our docs). However, the suspension point id __coro_index is only displayed as an integer.

There is currently no good way to map this compiler-internal id back to a source location. One previously proposed approach is to keep track of the suspension point explicitly by using std::source_location. However, this approach requires changes to the coroutine types and will be cumbersome for library-defined coroutine types like std::generator.

My proposal

In my mental model, the suspension point id is not a plain integer but rather an enum: Only a subset of the integer values are valid, and each of those integer values has an associated meaning, representing a certain point at which the coroutine was suspended.

If I follow this line of thought and represent the suspension point id as the enum

enum suspension_point {
    line_32_col_4,
    line_45_col_8,
    line_50_col_4,
    final_suspend
};

lldb and gdb print the coroutine frame as

$1 = {
  __resume_fn = 0x555555555940 <test(int&)>,
  __destroy_fn = 0x555555555f10 <test(int&)>,
  __promise = {<No data fields>},
  __suspension_point = __suspension_point::line_45_col_8,
  ...

instead of

$1 = {
  __resume_fn = 0x555555555940 <test(int&)>,
  __destroy_fn = 0x555555555f10 <test(int&)>,
  __promise = {<No data fields>},
  __suspension_point = 1 '\001',
  ...

Note how the value of __suspension_point directly tells me where my coroutine was suspended.
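To illustrate how a debugger script could turn such an enumerator name back into numbers, here is a minimal sketch. It assumes the line_<L>_col_<C> naming from the enum above; SourcePos, consume_number, consume_prefix and decode_suspension_point are hypothetical names for illustration and are not part of the draft patch.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type: a source position recovered from an enumerator name.
struct SourcePos {
    unsigned line;
    unsigned column;
};

// Parse a decimal number at the front of `s` and strip it from `s`.
static bool consume_number(std::string_view& s, unsigned& out) {
    const char* first = s.data();
    const char* last = first + s.size();
    auto [ptr, ec] = std::from_chars(first, last, out);
    if (ec != std::errc{} || ptr == first) return false;
    s.remove_prefix(static_cast<std::size_t>(ptr - first));
    return true;
}

// Strip `prefix` from the front of `s` if it is there.
static bool consume_prefix(std::string_view& s, std::string_view prefix) {
    if (s.substr(0, prefix.size()) != prefix) return false;
    s.remove_prefix(prefix.size());
    return true;
}

// Decode names of the assumed form "line_<L>_col_<C>".
// Any other name (for example "final_suspend") yields std::nullopt.
std::optional<SourcePos> decode_suspension_point(std::string_view name) {
    SourcePos pos{};
    if (!consume_prefix(name, "line_")) return std::nullopt;
    if (!consume_number(name, pos.line)) return std::nullopt;
    if (!consume_prefix(name, "_col_")) return std::nullopt;
    if (!consume_number(name, pos.column)) return std::nullopt;
    if (!name.empty()) return std::nullopt;  // reject trailing characters
    return pos;
}
```

With this sketch, decode_suspension_point("line_45_col_8") yields line 45 and column 8, while "final_suspend" has no position and yields std::nullopt.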

A draft implementation can be found at D132240: [Coroutine][Debug] Add line and column number to suspension point id

One downside of this approach (already brought up by @ChuanqiXu in the review): Encoding the source code location as enum values will store them as strings in the debug info, and thereby lead to a larger debug info size compared to a normal integer encoding of line/column numbers.

What do you think about this approach? Is there a better way to associate the suspension point ids with source code locations? Are we fine with the size increase which the additional strings will cause?

I think the best way – though I have no idea how hard this would be to implement – would be to compute and store a mapping of suspension-point-id to instruction pointer (pointing to the instruction that stores the id into the __suspension_point field, perhaps?) into the debug info somewhere.

Once you get an instruction pointer, you can get a file/line using the normal line-mapping table.

compute and store a mapping of suspension-point-id to instruction pointer

Is there already a mechanism in DWARF and/or lldb/gdb which could decode such integer-to-instruction mappings? Or are you proposing that we come up with a new “DWARF extension” here? If so, how can I add custom data to debug info? (Sorry if that’s obvious - I didn’t work with DWARF/debug data before and couldn’t find anything useful in DIBuilder)

The .debug_line section is one large map of PC address → source location. If the suspension point has a valid PC address and it has source location, you could store the PC and then use a data formatter in the debugger to translate that back to a source location.
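As a rough illustration of the "PC address → source location" lookup, the sketch below models the line table as a sorted list of rows and finds the row covering a given PC. This is a simplification: a real .debug_line table is a state-machine program with sequences and end markers, and a debugger would do this lookup for you. LineRow, lookup_pc, example_rows and all addresses are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for one row of a line table (hypothetical type).
struct LineRow {
    std::uint64_t address;  // first PC covered by this row
    std::string file;
    unsigned line;
};

// Return the row covering `pc`, i.e. the last row whose address is <= pc.
// `rows` must be sorted by address.
std::optional<LineRow> lookup_pc(const std::vector<LineRow>& rows, std::uint64_t pc) {
    auto it = std::upper_bound(
        rows.begin(), rows.end(), pc,
        [](std::uint64_t value, const LineRow& row) { return value < row.address; });
    if (it == rows.begin()) return std::nullopt;  // pc lies before the first row
    return *std::prev(it);
}

// Made-up table for illustration only.
std::vector<LineRow> example_rows() {
    return {
        {0x1000, "coro.cpp", 32},
        {0x1010, "coro.cpp", 45},
        {0x1030, "coro.cpp", 50},
    };
}
```

Storing one PC per suspension point would be enough for a formatter to recover file and line through a lookup of this kind.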

“store the PC” here, I guess would be in the DWARF, though, yeah? I guess maybe you could add an extra attribute (DW_AT_low_pc, say) on to an artificial suspension_point enum debug info type?

But this would require a bunch of extensions to LLVM’s debug info metadata and debug info emission.

Not sure if there’s a tidy way to do this without some extension.

I’m not a DWARF expert, but it seems only right that coroutines will need some DWARF and debugger changes to enable inspecting the state of a suspended coroutine.

I don’t think the right place to start is actually “let’s make p __coro_frame prettier”. If we’re going to support introspection of coroutine frames in a suspended state, it should be able to integrate properly into the debugger – along the same lines as introspecting frames up your stack.

seems fair

“store the PC” here, I guess would be in the DWARF, though, yeah?

Do you mean using a DWARF expression to map the suspension point id to the corresponding instruction address? Probably through some clever combination of DW_OP_eq, DW_OP_bra and DW_OP_addrx?

If we’re going to support introspection of coroutine frames in a suspended state, it should be able to integrate properly into the debugger – along the same lines as introspecting frames up your stack.

I agree, in general. But most other debugging headaches for coroutines can be solved using debugger scripts (such as the script shared by @ChuanqiXu). Mapping the suspension point id back to a source location, however, cannot currently be done from a script.

Given that coroutines are still rather new and afaik the best practices around debugging coroutines are still unclear, I think for the time being, it makes most sense to provide the bare minimum inside the compiler/debugger itself. With this minimal information in place, people can experiment using debugging scripts. And as soon as best practices on coroutine debugging emerge, we can integrate those more deeply, directly into the debugger.

I wouldn’t think there would be novel DWARF expressions involved, though I guess it could be - rendering the __suspension_point as an actual address instead of an integer/enum - yeah, you could give that a go with some non-trivial expression generated in the frontend without the need for any changes to the IR metadata format, etc (/maybe/ - I’d guess we don’t have any/good support for freeform addresses in dwarf expressions, though & I guess there’s ). But what I was picturing was an enum with an extra attribute - numeric values for the different suspension points and a DW_AT_low_pc for each address of the suspension points.

Thanks for raising this topic again.

I understand how this path leads to something with less fallout (in terms of requirements towards e.g. gdb/lldb). I still maintain, though, that in the end debugging coroutines, i.e. making input arguments, locals and references to parent coroutines (e.g. PC at coroutine call time) visible, should be supported in some way by the debugger too. I just can’t see how the compiler can adapt coroutine code generation to such a degree that debuggers won’t need any changes.

In the simplest case it could be an agreement on what’s the name of the synthesized local which is the link to the parent coroutine / coroutine frame / etc. On the debugger side, potentially only something like a visualizer (similar to visualizing STL types) is needed to support showing this information. Eventually it could become a standard visualizer.

In my mental model, we could do this in two steps:

(1) Generate the debug information (a map from coro_index to PC addresses) in the LLVM part.

Once this is done, we can get what we want from debugging scripts. From a user’s perspective, debugging scripts are really useful and powerful for solving the actual problems, so we can afford to be more patient about moving this ability into the debuggers (gdb/lldb).

(2) Support the analysis in the debugger.

Then people can get the information without debugging scripts.


As for how to do it: since the coro_index values are zero-based compile-time constants, we could generate the map from coro_index to PC addresses as an array of PC addresses for each coroutine. For example, we could generate information like:

!0 = {.... !1} ; !0 is the debug information for a coroutine function.
!1 = {!2, !3, !4, !5, !6} ; !2 is the initial suspend, !6 is the final suspend, and !3, !4, !5 are the intermediate suspend points
!2 = ...
!3 = ...
!4 = ...
!5 = ...
!6 = ...

Then if we can get the line number from gdb commands, we could wrap it into the debugging scripts.
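A debugging script would then only need an indexed lookup. The sketch below mirrors the five entries (!2 to !6) from the example above; SuspendPointTable, pc_for_index, example_table and the addresses are hypothetical and only illustrate the shape of the data, not an existing LLVM or debugger interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-coroutine table: entry i holds the PC address that
// belongs to the suspension point with coro_index == i.
struct SuspendPointTable {
    std::vector<std::uint64_t> pcs;
};

// Map a coro_index read from the frame to a PC address.
// An out-of-range index yields std::nullopt instead of reading past the table.
std::optional<std::uint64_t> pc_for_index(const SuspendPointTable& table,
                                          std::size_t coro_index) {
    if (coro_index >= table.pcs.size()) return std::nullopt;
    return table.pcs[coro_index];
}

// Made-up addresses: initial suspend, three intermediate points, final suspend.
SuspendPointTable example_table() {
    return SuspendPointTable{{0x1000, 0x1010, 0x1030, 0x1050, 0x1070}};
}
```

The resulting address can then be handed to whatever command the debugger offers for resolving an address to a file and line.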

I’m not sure if there is any blocking issue. But I think it is worth a try.


@dmitryduka it looks like you’re talking about a bigger topic: generalized debugging support for coroutines. In my experience, we can get all the information we want by adjusting the library side. As for the suggested visualizer, I don’t think we can do that. C++20 coroutines are a pretty low-level abstraction, and what the user ends up using are the user-defined coroutine types/classes. Such coroutine types/classes are on the same level as other STL types. In other words, it makes sense to add a visualizer for std::generator, but it doesn’t make sense to add such a visualizer for arbitrary coroutines.

Thinking about the coro_index a bit more, I realized that it should actually serve two purposes:

  1. indicate where the coroutine was suspended. The debugger should be able to map the coro_index to a source location
  2. indicate how to interpret the coroutine frame. In optimized builds, the way to interpret the coroutine frame differs between the suspension points. The debugger should be able to still inspect the coroutine state correctly, though.

Interestingly, solving the 2nd problem also provides a nice solution for the 1st bullet point:

Under the hood, a coroutine frame is represented as

struct internal_coro_data {
    int suspension_point_id;
    union U1 {
        struct S1 {
            int a;
        } suspension_point1;
        struct S2 {
            double b;
        } suspension_point2;
    } data;
};

Currently we don’t encode this detailed per-suspension-point coroutine frame layout information. As soon as we encode this detailed layout, we should be able to inspect the coroutine frame more nicely. Probably, we would use a DWARF variant instead of a DWARF union here - not sure, yet. I will have to experiment with this a bit more.
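To make "how to interpret the coroutine frame" concrete, here is a small sketch of reading the struct above as a tagged union: the id selects which union member is meaningful. The mapping of id 0 to suspension_point1 and id 1 to suspension_point2 is an assumption for illustration, as are describe and the make_frame_* helpers.

```cpp
#include <cassert>
#include <string>

// Same shape as the struct shown above.
struct internal_coro_data {
    int suspension_point_id;
    union U1 {
        struct S1 {
            int a;
        } suspension_point1;
        struct S2 {
            double b;
        } suspension_point2;
    } data;
};

// Interpret the frame according to its suspension point id.
// Assumption: id 0 selects suspension_point1, id 1 selects suspension_point2.
std::string describe(const internal_coro_data& frame) {
    switch (frame.suspension_point_id) {
    case 0:
        return "suspension_point1: a = " + std::to_string(frame.data.suspension_point1.a);
    case 1:
        return "suspension_point2: b = " + std::to_string(frame.data.suspension_point2.b);
    default:
        return "unknown suspension point";
    }
}

// Helpers that build a frame as it would look at each suspension point.
internal_coro_data make_frame_at_point1(int a) {
    internal_coro_data frame{};
    frame.suspension_point_id = 0;
    frame.data.suspension_point1.a = a;
    return frame;
}

internal_coro_data make_frame_at_point2(double b) {
    internal_coro_data frame{};
    frame.suspension_point_id = 1;
    frame.data.suspension_point2.b = b;
    return frame;
}
```

A debugger that knows this layout would do the same dispatch when printing the frame, which is roughly what a DWARF variant part would express declaratively.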

With the detailed layout, we have a natural point where we can attach our suspension point locations: E.g., the struct S1 currently has the debug info (see Compiler Explorer):

!21 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "S1", scope: !18, file: !10, line: 4, size: 32, flags: DIFlagTypePassByValue, elements: !22, identifier: "_ZTSN18internal_coro_data2U12S1E")

Note how line points to the line where this struct was defined. For coroutines, we can simply point this line information to the co_await corresponding to this suspension point.


As for the suggested visualizer, I don’t think we can do that. C++20 coroutines are a pretty low-level abstraction, and what the user ends up using are the user-defined coroutine types/classes. Such coroutine types/classes are on the same level as other STL types. In other words, it makes sense to add a visualizer for std::generator, but it doesn’t make sense to add such a visualizer for arbitrary coroutines.

I think we can at least add a visualizer for std::coroutine_handle. This visualizer could, e.g., expose the __resume and __destroy functions as well as the contained promise. Given that most user-defined coroutine types store a std::coroutine_handle as a member, this would already allow me to drill into the coroutine type more deeply…
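As a rough picture of what such a visualizer would read, the sketch below assumes the frame starts with the resume and destroy function pointers, as the __resume_fn / __destroy_fn fields in the frame dump at the top of the thread suggest. FrameHeader, HandleSummary, summarize and the fake frame are hypothetical names. Locating the promise is left out because its offset depends on the ABI. A real formatter would run inside the debugger and read target memory rather than dereference a pointer.

```cpp
#include <cassert>

// Assumed frame prefix: two function pointers at the start of the frame.
struct FrameHeader {
    void (*resume_fn)(void*);
    void (*destroy_fn)(void*);
};

// What a formatter might surface for a handle (hypothetical type).
struct HandleSummary {
    void (*resume_fn)(void*);
    void (*destroy_fn)(void*);
};

// `frame_address` plays the role of the pointer stored in the handle.
HandleSummary summarize(const void* frame_address) {
    const auto* header = static_cast<const FrameHeader*>(frame_address);
    return {header->resume_fn, header->destroy_fn};
}

// A hand-built stand-in for a coroutine frame, used only to exercise the sketch.
void fake_resume(void*) {}
void fake_destroy(void*) {}

struct FakeFrame {
    FrameHeader header;       // must come first so the frame address is the header address
    int promise_placeholder;  // stands in for the promise and the remaining frame contents
};

const FakeFrame& example_frame() {
    static const FakeFrame frame{{&fake_resume, &fake_destroy}, 0};
    return frame;
}
```

Calling summarize(&example_frame()) returns the two function pointers, which is the information the visualizer would show as the resume and destroy functions.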

Thinking about the coro_index a bit more, I realized that it should actually serve two purposes:

  1. indicate where the coroutine was suspended. The debugger should be able to map the coro_index to a source location
  2. indicate how to interpret the coroutine frame. In optimized builds, the way to interpret the coroutine frame differs between the suspension points. The debugger should be able to still inspect the coroutine state correctly, though.

hmmm, I don’t get this point. Do you describe the status quo? Or do you state the proposal? Currently I don’t think we can judge whether a coroutine is suspended from the coro_index, and I don’t understand what it means to interpret the coroutine frame.


I think we can at least add a visualizer for std::coroutine_handle. This visualizer could, e.g., expose the __resume and __destroy functions as well as the contained promise.

It should be a good idea.

I don’t get this point. Do you describe the status quo? Or do you state the proposal?

The structure layout I am describing is the status quo. However this layout is currently not represented in the debug info. The proposal is to actually add a description of the coroutine frame layout to the debug info, and then piggy-back the information about the suspension point locations onto that struct info.

This struct layout is currently determined in FrameTypeBuilder::addFieldForAllocas (see https://github.com/llvm/llvm-project/blob/9f6cb3e9fdb4f5255f78d77c0a537dc3cb50dc9d/llvm/lib/Transforms/Coroutines/CoroFrame.cpp#L595, assuming OptimizeFrame is set to true). The same slot inside the coroutine frame can be reused to store different variables, depending on which suspension point we are at.

The proposal now is to:

  1. encode the actual information on how the coroutine frame was packed/which slots were reused for what. Afaict, the natural choice to expose such reuse of the same memory to hold different data is a DWARF union type (see the llvm::DIBuilder class reference).
  2. after we have the coroutine frame layout completely exposed to the debugger, we can reuse the line numbers which DWARF associates with the struct/union type definitions. We can point those line numbers to the location of our suspension points.

I think we can at least add a visualizer for std::coroutine_handle. This visualizer could, e.g., expose the __resume and __destroy functions as well as the contained promise.

It should be a good idea.

See D132415: [LLDB] Add data formatter for std::coroutine_handle

I see. So we still can’t observe whether the coroutine is suspended, can we?

And I still don’t understand how we could solve the line number problems by making the coroutine frame layout DWARF better.

Although the slot-reuse optimization is a problem for debugging, I can’t see the relationship between it and the line number problems. Maybe a draft would help me understand.

how we could solve the line number problems by making the coroutine frame layout DWARF better.

Note that in the struct

struct internal_coro_data {
    int suspension_point_id;
    union {
        struct {
            int a;
        } suspension_point1;
        struct {
            double b;
        } suspension_point2;
    } data;
};

each of the members has some associated line information (see Compiler Explorer):

!20 = !DIDerivedType(tag: DW_TAG_member, name: "suspension_point1", scope: !18, file: !10, line: 6, baseType: !21, size: 32)
!24 = !DIDerivedType(tag: DW_TAG_member, name: "suspension_point2", scope: !18, file: !10, line: 9, baseType: !25, size: 64)

For coroutines, it would be quite natural to set those line numbers to point to the corresponding co_await which introduced the suspension point

I see. So we still can’t observe whether the coroutine is suspended, can we?

Right, we can’t observe that. This would require additional information (e.g., resetting the coro_index to ~0 every time we resume a coroutine). I think this might be worthwhile for debug builds…
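In code, the idea would look roughly like the sketch below. It only models the bookkeeping on a stand-in struct. DebugFrame, kRunning, on_resume, on_suspend and is_suspended are hypothetical names, and nothing like this is emitted by the compiler today.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel meaning "currently running, not parked at any suspension point".
constexpr std::uint32_t kRunning = static_cast<std::uint32_t>(~0u);

// Stand-in for the part of the coroutine frame that holds the index.
struct DebugFrame {
    std::uint32_t coro_index = 0;  // a fresh frame is parked at the initial suspend
};

// What the resume path would additionally do in a debug build.
void on_resume(DebugFrame& frame) {
    frame.coro_index = kRunning;
}

// What every suspension already does: store the id of the suspension point.
void on_suspend(DebugFrame& frame, std::uint32_t suspension_point_id) {
    assert(suspension_point_id != kRunning);  // the sentinel is not a valid id
    frame.coro_index = suspension_point_id;
}

// With the sentinel in place, a debugger can tell the two states apart.
bool is_suspended(const DebugFrame& frame) {
    return frame.coro_index != kRunning;
}
```

The cost is one extra store per resumption, which is why it is framed here as a debug-build feature.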

Yeah, the type declaration has line information, but that is different from the line information of the co_await. I still don’t understand the new proposal. What do you mean by setting those line numbers to point to the corresponding co_await? Are you talking about creating new union types and making the location of each sub-field of the union type refer to a different suspend point? If so, I feel it is a little far from what we want: it looks helpful for debugging merged variables in coroutines, but we can solve the problem in a simpler way. Or do I misunderstand your idea?


Right, we can’t observe that. This would require additional information (e.g., resetting the coro_index to ~0 every time we resume a coroutine). I think this might be worthwhile for debug builds…

In my experience with coroutines, it is not that helpful to see whether a coroutine function is suspended. Every time we suspected that a coroutine hung, the root cause turned out to be the executor or the IO. Also, if we want to know whether a coroutine is running, we can find out by inspecting all threads and checking whether one of them is running that specific coroutine.

And it is bad if the behavior of debug builds differs from release builds (if we want that, we run sanitizers). So I would not want to pay an additional cost to observe whether a coroutine is suspended, although it is really attractive. (I have thought about this several times before, but found that it is not so useful.)

After some experience with LLDB, I am now also convinced that we need a way to map to an actual program address and not only a line number. I need the address, e.g., for the “step-through” function which should step from a coroutine_handle.resume() call directly into the coroutine, skipping all the internally generated dispatching logic.

I came across the undocumented llvm.dbg.label intrinsic, which emits DW_TAG_label debug info. One new alternative approach could be: for every coroutine resumption point, emit a label with a well-known name, e.g., __coro_resume_1, __coro_resume_2, and so on. What would you think about this approach? Afaict, this would not require any DWARF extensions; it would only require a naming convention, similar to the existing convention that the promise is exposed under the name __promise.
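A small sketch of what the naming convention could look like on both sides, assuming the spelling __coro_resume_<N>. The prefix constant and the two helper functions are hypothetical; the compiler side would emit the labels and a debugger script would look them up by name.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>

// Assumed well-known prefix for resumption point labels.
constexpr std::string_view kResumeLabelPrefix = "__coro_resume_";

// Producer side: the label name for resumption point `index`.
std::string resume_label_name(unsigned index) {
    return std::string(kResumeLabelPrefix) + std::to_string(index);
}

// Consumer side: recover the index from a label name.
// Names that do not follow the convention yield std::nullopt.
std::optional<unsigned> resume_label_index(std::string_view label) {
    if (label.substr(0, kResumeLabelPrefix.size()) != kResumeLabelPrefix) {
        return std::nullopt;
    }
    label.remove_prefix(kResumeLabelPrefix.size());
    unsigned index = 0;
    const char* first = label.data();
    const char* last = first + label.size();
    auto [ptr, ec] = std::from_chars(first, last, index);
    if (ec != std::errc{} || ptr != last) return std::nullopt;  // empty or trailing junk
    return index;
}
```

Given the resumption point id read from the frame, a script would build the label name, look up the matching DW_TAG_label in the coroutine's debug info, and use the label's address both for the source location and as a breakpoint target.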

For the time being, I will focus more on general coroutine debugging (improving the pretty printer for std::coroutine_handle in LLDB, exposing more of the internal state, …). If you have any other additional ideas around decoding the resumption point id in the meantime, please let me know :)