RFC: Add DWARF support for yaml2obj

Hi folks,

I am going to implement DWARF support for yaml2obj. I really
appreciate the many useful comments and suggestions you gave me in the
previous thread [1]. I've had some offline discussions with James and
made some updates to the previous proposal.

This proposal addresses the issue of properly describing the DWARF
sections in YAML both at a high level and at a low level. We add some
extra value types (Tag/Value/Address/String) in the "debug_info"
entry, which helps describe the DWARF sections at a high level by
providing the logical structure of the DIEs. "yaml2obj" will traverse
these DIEs and emit other debug sections properly. We are also able to
hand-craft debug sections at a low level by hardcoding the contents,
offsets, values, etc.

Here's the proposal [2] which gives a brief introduction to the DWARF
YAML. Any thoughts on that? Thanks in advance!

I think the example looks like it would be really useful for many categories of testcases!
Will it still be possible to manually specify the .debug_abbrev section when this is desired after you are done?

-- adrian

Yes, I think that will work. There are two ways to edit the .debug_abbrev section.

i) Edit the "Attr:" and "Form:" entries of a DIE in the ".debug_info"
section. This controls the generation of the ".debug_abbrev" section
at a high level, since "yaml2obj" generates the ".debug_abbrev" section
from the contents of the debug information entries.

ii) Edit the ".debug_abbrev" section directly. This controls the
generation of the ".debug_abbrev" section at a low level. We will have
to hardcode the tag and attributes for each DIE. Editing the section in
this way does not require a "debug_info" entry in the "DWARF" entry.

Additionally, if we have the "debug_info" entry and the "debug_abbrev"
entry at the same time, the latter overrides the tags and attributes
generated from the former.

Does that make sense?
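
The precedence between the two ways can be sketched in C++ as below. This
is only an illustrative model, not yaml2obj's actual code: the names
Abbrev, Die, generateAbbrevs and resolveAbbrevs are hypothetical, tags and
forms are kept as strings for brevity, and the sketch assumes that an
explicit "debug_abbrev" entry replaces the generated table as a whole.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: one abbreviation declaration (a tag plus attr/form pairs).
struct Abbrev {
  std::string Tag;
  std::vector<std::pair<std::string, std::string>> AttrForms;
  bool operator==(const Abbrev &O) const {
    return Tag == O.Tag && AttrForms == O.AttrForms;
  }
};

// Hypothetical model of a DIE as written in the "debug_info" entry.
struct Die {
  std::string Tag;
  std::vector<std::pair<std::string, std::string>> AttrForms;
};

// Way (i): derive the table from the DIEs, reusing an existing declaration
// when two DIEs have the same tag and the same attribute/form list.
std::vector<Abbrev> generateAbbrevs(const std::vector<Die> &Dies) {
  std::vector<Abbrev> Table;
  for (const Die &D : Dies) {
    Abbrev Candidate{D.Tag, D.AttrForms};
    bool Seen = false;
    for (const Abbrev &A : Table) {
      if (A == Candidate) {
        Seen = true;
        break;
      }
    }
    if (!Seen)
      Table.push_back(std::move(Candidate));
  }
  return Table;
}

// Way (ii) takes precedence: an explicit "debug_abbrev" entry is used as-is,
// and the table is generated from "debug_info" only when none is given.
std::vector<Abbrev>
resolveAbbrevs(const std::vector<Die> &Dies,
               const std::optional<std::vector<Abbrev>> &Explicit) {
  if (Explicit)
    return *Explicit;
  return generateAbbrevs(Dies);
}
```

With this model, two DIEs that share a tag and attribute list end up
sharing one abbreviation, while a hand-written table is never touched.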

Hello Xing,

I think the proposal looks very useful. I think it will be fairly tricky
to get all of the details right though. There is a lot of "inferring"
going on there, and getting that to work reliably and with predictable
results will need careful consideration.

Some ideas/questions I had while looking this over:
- It seems like it would be useful to be able to (symbolically) refer to
other parts of the debug information, much like one can refer to elf
symbols and sections from e.g. relocation descriptions. For example, if
each "abbrev" contribution had some kind of a name/identifier, then one
could easily express that two compile units share the same abbreviation
table. At the same time, one could control their relative order by using
those identifiers in the debug_abbrev section (while leaving some of
them to be auto-generated, and spelling out others, for instance).
- It's not clear to me whether having a "SourceLocation" entry as a
first class entity is really worth it. This
"""
  - SourceLocation:
    - File: foo
    - Line: 1
    - Column: 2
"""
is not that much shorter than
"""
  - Attr: DW_AT_decl_file
    Value: foo
  - Attr: DW_AT_decl_line
    Value: 1
  - Attr: DW_AT_decl_column
    Value: 2
"""
OTOH, it creates a lot of opportunities for ambiguity: What if a DIE has
both a SourceLocation entry and an explicit DW_AT_decl_file attribute?
Do the "SourceLocation" attributes come first or last in the final
attribute list? Can I change their forms? etc.

regards,
pavel

Hi Pavel,

Thanks for your comments!

> Hello Xing,
>
> I think the proposal looks very useful. I think it will be fairly tricky
> to get all of the details right though. There is a lot of "inferring"
> going on there, and getting that to work reliably and with predictable
> results will need careful consideration.

Yes, I agree. The proposal gives an ideal example of DWARF YAML. We
will try to make the final implementation close to that example.

> Some ideas/questions I had while looking this over:
> - It seems like it would be useful to be able to (symbolically) refer to
> other parts of the debug information, much like one can refer to elf
> symbols and sections from e.g. relocation descriptions. For example, if
> each "abbrev" contribution had some kind of a name/identifier, then one
> could easily express that two compile units share the same abbreviation
> table. At the same time, one could control their relative order by using
> those identifiers in the debug_abbrev section (while leaving some of
> them to be auto-generated, and spelling out others, for instance).

Nice catch! It's a useful feature, but I cannot give an answer right
now. I will come back to this thread once I have some ideas on this.

> - It's not clear to me whether having a "SourceLocation" entry as a
> first class entity is really worth it. This
> """
>   - SourceLocation:
>     - File: foo
>     - Line: 1
>     - Column: 2
> """
> is not that much shorter than
> """
>   - Attr: DW_AT_decl_file
>     Value: foo
>   - Attr: DW_AT_decl_line
>     Value: 1
>   - Attr: DW_AT_decl_column
>     Value: 2
> """
> OTOH, it creates a lot of opportunities for ambiguity: What if a DIE has
> both a SourceLocation entry and an explicit DW_AT_decl_file attribute?
> Do the "SourceLocation" attributes come first or last in the final
> attribute list?

Thanks, that makes sense. I will fix the proposal later. BTW, I think
the key for the value of "DW_AT_decl_file" should be "Str"? Can we map
values of different types into the same field?

"""
- Attr: DW_AT_decl_file
  Str: foo
"""

Yes, that is definitely possible. You just need to make the map calls
conditional on the values of other attributes. Maybe something like this:
IO.mapRequired("Attr", Attr);
IO.mapOptional("Form", Form,
               getDefaultForm(Attr, Ctx.isSplitDwarf() /*or whatever*/));
switch (getFormClass(Form)) {
/* The cases could correspond to DWARF5 form classes, but maybe not
   completely. */
case String:
  IO.mapRequired("Value", Value.String);
  break; /* without the breaks, every case would fall through */
case Constant:
  IO.mapRequired("Value", Value.Int);
  break;
case Block:
  IO.mapRequired("Value", Value.Bytes);
  break;
...
}

I think doing something like that would be reasonable, because I expect
a relatively high number of "classes", and I think it might be tricky to
remember which key to use for which value type.

cheers,
Pavel

I think we have to be careful here. We might want flexibility to say “I want to use a specific class” without having to specify the exact DW_FORM. Sometimes, we might even end up in an ambiguous situation and not get the result we want. For example, in DWARFv4, the DW_AT_high_pc attribute has either a Constant or an Address class, which use completely different forms, but if we have just “Value: 0x1234”, which is it? In DWARFv3, it is always an Address, if I remember correctly, so in that case, we might want our default to be “Address”. However, for DWARFv4 the compiler typically emits DW_AT_high_pc using a Constant form, and most people might expect that to be used instead.

I think having a different name for the tag might be a good thing to do, with the name matching the different classes (of which there are 15 in DWARFv5, possibly with some folded together like string/stroffsetsptr), but it could be a little confusing as Pavel mentions. Alternatively, maybe we could have Value and Form, where the Form can be a generic class name instead of a specific DW_FORM value. This has the potential for ambiguity still, but should be flexible enough at least.

James
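
James's alternative, a "Form" key that accepts either a specific DW_FORM
or a generic class name, could be modelled roughly as follows. This is a
standalone sketch rather than yaml2obj code: FormSpec and parseForm are
hypothetical names, the generic spellings ("address", "constant",
"string") are assumptions, and only a handful of forms are covered. The
numeric codes are the DW_FORM values from the DWARF specification.

```cpp
#include <cstdint>
#include <optional>
#include <string>

enum class FormClass { Address, Constant, String };

// Result of parsing a "Form:" value: always a class, plus an exact DW_FORM
// code only when the YAML named one.
struct FormSpec {
  FormClass Class;
  std::optional<std::uint8_t> ExactForm;
};

// Hypothetical parser. Returns std::nullopt for spellings that this sketch
// does not handle.
std::optional<FormSpec> parseForm(const std::string &Form) {
  // Generic class names: the emitter stays free to pick a concrete form.
  if (Form == "address")
    return FormSpec{FormClass::Address, std::nullopt};
  if (Form == "constant")
    return FormSpec{FormClass::Constant, std::nullopt};
  if (Form == "string")
    return FormSpec{FormClass::String, std::nullopt};
  // Specific forms: the class is implied and the exact code is recorded.
  if (Form == "DW_FORM_addr")
    return FormSpec{FormClass::Address, std::uint8_t{0x01}};
  if (Form == "DW_FORM_data4")
    return FormSpec{FormClass::Constant, std::uint8_t{0x06}};
  if (Form == "DW_FORM_string")
    return FormSpec{FormClass::String, std::uint8_t{0x08}};
  return std::nullopt;
}
```

Under this model, "Form: constant" on a DW_AT_high_pc attribute settles
the class without committing to a particular data size.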

Hello James,

The DW_AT_high_pc example is a good one. For this attribute, I guess I
would say that we shouldn't have any "default" form class.

Encoding the form class into the yaml key achieves this implicitly (for
all attributes). I sort of like that -- it results in a pretty concise
representation for the cases where one does not want to specify a form.
On the other hand, it feels a bit redundant in the cases where one does
specify a form.

The idea of making "generic" form values also sounds interesting. I'm
not sure how much they will be used though -- most of the files are made
by copy-pasting, and I'm guessing the obj2yaml path will not be
producing these. And people who know about form classes probably don't
have a problem with specifying an explicit form either. However, I don't
think it hurts having them anyway.

I don't really have a strong opinion on any of these options. I'm
guessing we'll need to play around with some actual examples to see how
they would look for real.

regards,
pavel
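
One possible policy for the DW_AT_high_pc case discussed above, written as
a small standalone C++ sketch. The helper name defaultFormClass and the
short attribute list are hypothetical, and the policy itself (a default
before DWARFv4, no default from DWARFv4 on) is just one way of combining
James's and Pavel's remarks, not a decision reached in this thread.

```cpp
#include <optional>
#include <string>

enum class FormClass { Address, Constant, String };

// Hypothetical helper: returns the form class to assume when the YAML gives
// only "Value:", or std::nullopt when the author has to spell it out.
std::optional<FormClass> defaultFormClass(const std::string &Attr,
                                          unsigned DwarfVersion) {
  if (Attr == "DW_AT_name")
    return FormClass::String;
  if (Attr == "DW_AT_low_pc")
    return FormClass::Address;
  if (Attr == "DW_AT_decl_line" || Attr == "DW_AT_decl_column")
    return FormClass::Constant;
  if (Attr == "DW_AT_high_pc") {
    // Before DWARFv4 the address class is the only option, so it is safe.
    if (DwarfVersion < 4)
      return FormClass::Address;
    // From DWARFv4 on, address and constant are both valid: refuse to guess.
    return std::nullopt;
  }
  return std::nullopt; // Unknown attribute: no default in this sketch.
}
```

A caller would report an error when the result is empty and neither a
form nor a class was given, so that "Value: 0x1234" is never silently
interpreted the wrong way.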

Hi there,

Thank you for your suggestions and comments on this topic.
I've almost finished porting the existing DWARF sections to yaml2elf.
Next, I'm going to implement .debug_rnglists/.debug_loclists and do some
refactoring according to the proposal. While porting the existing
DWARF sections to yaml2elf, I learned a lot of new things and found
some places that need improvement. I have drafted a report [1] to record
these things. Thank you all for participating in the discussion.

[1] My Journey with LLVM (GSoC’20 Phase 1)