yaml2obj support for COFF debug directories

Spoiler: the following only applies to Windows binary format handling.

The possibility of extending yaml2obj to support COFF debug directories recently came up during a code review. Currently, its COFF syntax allows specifying section data but not debug directories, which is why llvm-readobj tests that depend on debug directory contents use pre-built executable images instead of yaml2obj.

It is possible to extend the tool, but first I would like to gather feedback on how useful this would be, especially on potential uses of this change. It looks like porting the llvm-readobj tests for CodeView would depend on this, and D70606 introduces another possible use. But I am not sure how trivial the CodeView effort would be. Would it be worth it, or is it easier to leave things as they are for now?

In case this is interesting, the base YAML syntax for a COFF debug directory might look like this (enum values representing COFF debug types):

DebugDirectory:
  - Type: [ {type: str, enum: […]}, {type: int} ]
  - DebugDirectoryData: {type: str}

This may have to be further specialized for sub-categories, specifically CodeView.
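As a rough illustration of the "enum name or integer" idea for the Type field, here is a minimal Python sketch. The helper name `resolve_debug_type` and the table `DEBUG_TYPES` are hypothetical, and the table lists only a subset of the debug types from the PE/COFF specification. It is not how yaml2obj is implemented.

```python
# Subset of the IMAGE_DEBUG_TYPE_* values from the PE/COFF specification.
DEBUG_TYPES = {
    "IMAGE_DEBUG_TYPE_UNKNOWN": 0,
    "IMAGE_DEBUG_TYPE_COFF": 1,
    "IMAGE_DEBUG_TYPE_CODEVIEW": 2,
    "IMAGE_DEBUG_TYPE_FPO": 3,
    "IMAGE_DEBUG_TYPE_MISC": 4,
    "IMAGE_DEBUG_TYPE_EX_DLLCHARACTERISTICS": 20,
}


def resolve_debug_type(value):
    """Hypothetical helper: accept an enum name or a raw integer."""
    if isinstance(value, int):
        return value
    if value in DEBUG_TYPES:
        return DEBUG_TYPES[value]
    # Fall back to numeric strings such as "20" or "0x14";
    # anything else raises ValueError.
    return int(value, 0)
```

Accepting raw integers alongside names keeps it possible to write tests for types the tool does not know about yet.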

Best,

Petr

Hi Penzin,

From a practical standpoint, I think this is a matter of investment and reward. If we are going to use the feature only to write a test for lld, I guess it might not be worth it, and we can live with a binary test file, though it’s not ideal.

My feeling is that we will eventually have to implement the feature, as Microsoft seems to add a new bit to DLLCharacteristics every few years, so we’ll see more bits defined for ExtendedDLLCharacteristics in the future. For now, though, I don’t see an immediate need to implement it, as there’s only one bit defined for ExtendedDLLCharacteristics.

I’m not sure I know enough about COFF and debug directories to know how useful this feature will be, but I do have some thoughts on the syntax, based on my experience working with the ELF part of yaml2obj. From reading the spec you linked, I would think it might look something like the following:

DebugDirectory:
  - Characteristics: 1234 # Optional, defaults to 0. Contains value to write in Characteristics field.
    TimeDateStamp: 4321 # Optional, defaults to 0(?).
    MajorVersion: 1 # Optional, defaults to 0.
    MinorVersion: 2 # Optional, defaults to 0.
    Data: # Required
      - Type: 12 # Required, contains the value of the Type field, can be written as raw number or enum value (see how ELF works for various fields).
        Size: 1111 # Optional, derives size from data field, if not specified.
        Address: 2222 # Optional, defaults to 0(?)
        Pointer: 3333 # Optional, defaults to wherever yaml2obj chooses to place the data.
        RawData: '12345678abcdef0' # Optional byte string (see 'Content' fields for ELF sections). Defaults to empty if not specified.

The following fields are all defined based on the Type value (for unrecognised values, by default only RawData is allowed). They cannot be mixed with the RawData field. Only those actually required need to be implemented up front.

        ExtendedDLLCharacteristics: # Used for IMAGE_DEBUG_TYPE_EX_DLLCHARACTERISTICS
          - … # Fields related to DLL Characteristics
        FPOInfo: # Used for IMAGE_DEBUG_TYPE_FPO
          - … # FPO Information array
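To make the proposed defaults concrete, here is a small Python sketch of how one such entry could be turned into the 28-byte debug directory header from the PE/COFF specification. The function `build_entry`, the flattened dict, and the `chosen_offset` parameter are assumptions made for illustration only; the real tool would presumably do this in C++ and might nest the fields differently.

```python
import struct

# IMAGE_DEBUG_DIRECTORY layout (little-endian, 28 bytes):
# Characteristics, TimeDateStamp, MajorVersion, MinorVersion,
# Type, SizeOfData, AddressOfRawData, PointerToRawData.
ENTRY = struct.Struct("<IIHHIIII")


def build_entry(fields, chosen_offset):
    """Hypothetical: pack one entry from a dict mirroring the YAML keys.

    chosen_offset stands in for wherever the tool decides to place the data.
    """
    # RawData is a hex string; it needs an even number of digits here.
    raw = bytes.fromhex(fields.get("RawData", ""))
    header = ENTRY.pack(
        fields.get("Characteristics", 0),
        fields.get("TimeDateStamp", 0),
        fields.get("MajorVersion", 0),
        fields.get("MinorVersion", 0),
        fields["Type"],                        # required
        fields.get("Size", len(raw)),          # derived from the data if absent
        fields.get("Address", 0),
        fields.get("Pointer", chosen_offset),  # tool-chosen unless overridden
    )
    return header, raw


header, raw = build_entry({"Type": 20, "RawData": "01000000"}, 0x400)
```

Letting Size and Pointer be overridden explicitly would allow tests to describe deliberately inconsistent entries, which seems to be the main point of keeping them optional rather than always derived.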

Does this make sense? It’s somewhat similar to how Sections are defined in ELF yaml2obj.

James

I also don't know much about COFF, but I am always interested in using
yaml2obj to generate "interesting" test cases for lldb. So, if you're
looking for a use case, this sounds like it could be very useful there.

cheers,
pavel

I think it seems like an oversight, and improvements in this area would be welcome.

I think most of the effort in COFF <-> YAML translation has been for representing object files, and debug directories are a feature of fully linked PE images. With that in mind, it’s not too surprising that the feature is missing.

In general, it should be possible to roundtrip linked PE images via yaml just fine - their contents would just be part of the opaque section contents blob. Hard to inspect and tweak by hand, but so are lots of other things that are referenced via data directories (like base relocation tables) and stored in the plain section contents.

But debug directories have got one property which would break this - they have a PointerToRawData field, that should contain the raw byte offset within the linked PE image, to their content data. As roundtrip via yaml does rewrite the file structure (and the output layout of yaml2obj isn't supposed to be fixed), the exact value of this field would have to be updated. As far as I know, yaml2obj doesn't do this at the moment.
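A minimal Python sketch of the fix-up this implies, assuming the debug directory is a contiguous array of 28-byte headers as laid out in the PE/COFF specification. The function `patch_pointers` and its `old_to_new` callback are hypothetical, and this is not the llvm-objcopy code.

```python
import struct

ENTRY = struct.Struct("<IIHHIIII")  # IMAGE_DEBUG_DIRECTORY, 28 bytes
POINTER_TO_RAW_DATA = 7             # index of the last field in the tuple


def patch_pointers(directory, old_to_new):
    """Rewrite PointerToRawData in every entry of a debug directory.

    directory:  bytes holding the array of entry headers.
    old_to_new: hypothetical callback mapping an old file offset to the
                offset of the same data in the rewritten file.
    """
    out = bytearray(directory)
    for pos in range(0, len(out) - ENTRY.size + 1, ENTRY.size):
        fields = list(ENTRY.unpack_from(out, pos))
        if fields[POINTER_TO_RAW_DATA] != 0:  # skip entries without a pointer
            fields[POINTER_TO_RAW_DATA] = old_to_new(fields[POINTER_TO_RAW_DATA])
        ENTRY.pack_into(out, pos, *fields)
    return bytes(out)
```

The loop steps through the headers back to back; it does not assume any payload sits between consecutive headers.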

llvm-objcopy's COFF backend does try to do it (COFFWriter::patchDebugDirectory in llvm/tools/llvm-objcopy/COFF/Writer.cpp), but when I now reread the code there, I'm pretty sure I made some mistakes there. (I incorrectly assumed that the raw data is interleaved after each debug directory header.) With your lld patch for the CET compat flag, it should be easy to generate a testcase for that, with more than one debug directory.

One general design question regarding this in obj2yaml is: when the debug directories are synthesized, should they be appended onto one of the existing sections (with normal hex-dumped contents) or created as an entirely new section? Synthesizing them separately works fine for cases where a file is generated entirely from scratch with yaml, but is tricky for obj2yaml, where the original debug directories pretty much need to be left in place. In that case, each time a PE image is roundtripped via yaml, it would generate yet another set of debug directories, orphaning the old ones.

Finally, when reading the spec, it also seems like the payload of a debug directory doesn't even need to be in the mappable parts of sections, but could be in unmapped areas of the PE image file (by having AddressOfRawData set to zero, so it can only be found via PointerToRawData). This doesn't seem like something that e.g. llvm-readobj's --coff-debug-directory currently supports though (and llvm-objcopy expects the payload to be moved along as part of sections' contents).
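To illustrate the two ways of locating a payload, here is a Python sketch. The function `payload_file_offset` and the simplified section tuples are hypothetical stand-ins for the real section table, not the logic of any LLVM tool.

```python
def payload_file_offset(address_of_raw_data, pointer_to_raw_data, sections):
    """Return (file_offset, is_mapped) for one debug directory payload.

    sections: list of (virtual_address, raw_size, raw_pointer) tuples,
              a simplified, hypothetical view of the section table.
    """
    if address_of_raw_data != 0:
        for va, raw_size, raw_ptr in sections:
            if va <= address_of_raw_data < va + raw_size:
                # Mapped payload: translate the RVA through its section.
                return raw_ptr + (address_of_raw_data - va), True
    # Unmapped payload (or no containing section found): only the raw
    # file pointer from the entry can locate the data.
    return pointer_to_raw_data, False


sections = [(0x1000, 0x200, 0x400), (0x2000, 0x200, 0x600)]
mapped = payload_file_offset(0x2010, 0x610, sections)
unmapped = payload_file_offset(0, 0x900, sections)
```

A tool that only follows the first branch would miss payloads stored outside any section, which is the unsupported case described above.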

I'll make a note to try to fix llvm-objcopy's assumptions about the location of the payload sometime in the future.

// Martin