[RFC] Adding support for dynamic entries in yaml2obj

The goal of this proposal is to introduce a new type of YAML section for yaml2obj that allows the population of ELF .dynamic entries via a list of tag and value pairs. These entries are interpreted (and potentially validated) before being written to the .dynamic section. The simplest way to satisfy this requirement is for all dynamic entry values to be numeric values. Unfortunately, this inherently prevents entries like DT_SONAME, DT_NEEDED, DT_RPATH, and DT_RUNPATH from being specified alongside dynamic symbols due to the design of yaml2obj.

This proposal introduces three ways to input a value for a dynamic entry. For a given dynamic tag, one or more of these methods of setting a value may be permitted. All of these cases are illustrated later with an example.

  1. For dynamic entry strings that belong in .dynstr, the string itself can be used as the value for an entry. (ex. DT_SONAME, DT_NEEDED, DT_RPATH, and DT_RUNPATH)

  2. A section name can be used in place of an address. In this case, the value of the dynamic entry is the sh_addr of the specified section. (ex. DT_STRTAB, DT_SYMTAB, DT_HASH, DT_RELA, and others)

  3. A value can be specified using hexadecimal or decimal (or other bases supported by llvm::to_integer()). (ex. DT_STRSZ, DT_SYMENT, DT_RELAENT, and others)

Here’s an example to illustrate this design:

    !ELF
    FileHeader:
      Class: ELFCLASS64
      Type: ET_DYN
      Machine: EM_X86_64
    Sections:
      - Name: .dynsym
        Type: SHT_DYNSYM
        Address: 0x1000
      - Name: .data
        Type: SHT_PROGBITS
        Flags: [ SHF_ALLOC, SHF_WRITE ]
      - Name: .dynamic
        Type: SHT_DYNAMIC
        Entries:
          - Tag: DT_SONAME
            Value: libsomething.so
          - Tag: DT_SYMTAB
            Value: .dynsym
          - Tag: DT_SYMENT
            Value: 0x18
    DynamicSymbols:
      Global:
        - Name: foo
          Type: STT_FUNC
          Section: .data
        - Name: bar
          Type: STT_OBJECT
          Section: .data

The final section is of type SHT_DYNAMIC, and the “Entries” key illustrates the proposed addition. Walking through the three dynamic entries:

  1. DT_SONAME: The value of this entry is a string that will be inserted into the dynamic string table (.dynstr) alongside the symbol names specified in DynamicSymbols. This is possible because yaml2obj represents .dynstr as a StringTableBuilder, and because .dynamic is linked to .dynstr by default. If the .dynamic section had been linked to a section other than .dynstr, the value of this entry would have to be a number (the offset of the string in the linked string table) rather than a string.

  2. DT_SYMTAB: This tag may either be a numeric address or a valid section name, and this example illustrates the option of using the name of a section rather than the address. This resolves to 0x1000 since .dynsym is declared with an address of 0x1000. It would have been equally valid to make this entry have a value of 0x1000, but doing so would mean that changes to .dynsym’s address would need to be manually updated in the dynamic entry. It’s also worth noting that in the case of DT_SYMTAB it wouldn’t be too difficult to infer this.

  3. DT_SYMENT: This tag is restricted to only having numeric values. This entry could easily be inferred as well.
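To make the walk-through concrete, here is a minimal Python sketch of how the three example entries might resolve to numbers. It is only an illustration, not yaml2obj code: the `StringTable` class and `resolve` function are hypothetical names, and the string table is a naive NUL-separated layout (LLVM's StringTableBuilder may lay strings out differently). The tag numbers come from the ELF gABI.

```python
# Tag numbers as defined by the ELF gABI.
DT_SYMTAB, DT_SYMENT, DT_SONAME = 6, 11, 14


class StringTable:
    """Hypothetical, naive string table: a leading NUL, then NUL-terminated strings."""

    def __init__(self):
        self.data = bytearray(b"\0")
        self.offsets = {}

    def add(self, s):
        # Reuse the offset if the string is already present.
        if s not in self.offsets:
            self.offsets[s] = len(self.data)
            self.data += s.encode() + b"\0"
        return self.offsets[s]


def resolve(entries, section_addrs, dynstr):
    """Turn (tag, textual value) pairs into (tag, number) pairs."""
    out = []
    for tag, value in entries:
        if tag == DT_SONAME:
            # Method 1: the string goes into .dynstr, the entry stores its offset.
            out.append((tag, dynstr.add(value)))
        elif value in section_addrs:
            # Method 2: a section name resolves to that section's address.
            out.append((tag, section_addrs[value]))
        else:
            # Method 3: a plain number; base 0 accepts prefixes such as 0x.
            out.append((tag, int(value, 0)))
    return out


dynstr = StringTable()
for name in ("foo", "bar"):  # dynamic symbol names share the same table
    dynstr.add(name)

entries = [
    (DT_SONAME, "libsomething.so"),
    (DT_SYMTAB, ".dynsym"),
    (DT_SYMENT, "0x18"),
]
resolved = resolve(entries, {".dynsym": 0x1000}, dynstr)
```

With this naive layout, `resolved` holds the offset of "libsomething.so" within the table for DT_SONAME, 0x1000 for DT_SYMTAB, and 0x18 for DT_SYMENT.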

Note that it doesn’t make sense for DT_SYMENT to be any sort of string, so it is restricted to only being populated with a numeric value. Similarly, it doesn’t make sense for the value of DT_SONAME to ever be interpreted as the name of a section. Though at least one input method is required for a given dynamic tag, it’s typically the case that not all three are valid. It should also be possible to specialize upon certain tags for convenience. For example, DT_PLTREL could be specialized to allow “REL” and “RELA” to be used as values rather than requiring the values be entered in hexadecimal. Evaluating the needs for every dynamic tag isn’t within the scope of this proposal, so any tag without a specialization defaults to permitting numeric values or the name of a valid section (that is later converted to an address).
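The per-tag rules described above could be sketched roughly as follows. This is a hypothetical illustration in Python, not the patch itself: the tag sets, the `classify` function, and the order in which the input methods are tried are all assumptions. The only external facts used are the gABI values DT_REL = 17 and DT_RELA = 7.

```python
# Assumed classification of a few tags; a real implementation would cover more.
STRING_TAGS = {"DT_SONAME", "DT_NEEDED", "DT_RPATH", "DT_RUNPATH"}
NUMERIC_ONLY_TAGS = {"DT_SYMENT", "DT_STRSZ", "DT_RELAENT"}
# DT_PLTREL holds either DT_REL (17) or DT_RELA (7), per the ELF gABI.
PLTREL_NAMES = {"REL": 17, "RELA": 7}


def classify(tag, value, sections):
    """Return (kind, payload) for a textual value, or raise ValueError."""
    try:
        # Numbers are accepted for every tag; base 0 honours 0x and similar prefixes.
        return ("number", int(value, 0))
    except ValueError:
        pass
    if tag == "DT_PLTREL" and value in PLTREL_NAMES:
        # Convenience specialization for DT_PLTREL.
        return ("number", PLTREL_NAMES[value])
    if tag in STRING_TAGS:
        # Never interpreted as a section name.
        return ("string", value)
    if tag in NUMERIC_ONLY_TAGS:
        raise ValueError(f"{tag} only accepts a numeric value, got {value!r}")
    if value in sections:
        # Default for unspecialized tags: a section name, later turned into an address.
        return ("section", value)
    raise ValueError(f"{value!r} is neither a number nor a known section")
```

For example, `classify("DT_PLTREL", "RELA", set())` yields a numeric 7, while `classify("DT_SYMENT", ".dynsym", {".dynsym"})` is rejected because DT_SYMENT is numeric-only.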

Some dynamic tags have strict enough constraints that they can be inferred. This limited set of dynamic tags could treat “Value” as an optional field, since the value can be inferred from other parts of the ELF file. This isn’t a requirement for me, though it’s something I’d certainly like to have.
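A rough Python sketch of what inferring a missing “Value” might look like is below. Everything here is an assumption for illustration: the function names, the shape of the `sections` mapping, and the choice of which tags are inferable. The symbol entry sizes (16 bytes for ELF32, 24 bytes for ELF64) are the sizes of Elf32_Sym and Elf64_Sym.

```python
# Size of one symbol table entry (Elf32_Sym / Elf64_Sym) per ELF class.
SYM_ENTRY_SIZE = {"ELFCLASS32": 16, "ELFCLASS64": 24}


def infer_value(tag, elf_class, sections):
    """Infer a value from the rest of the file.

    `sections` is a hypothetical mapping: name -> {"addr": int, "size": int}.
    """
    if tag == "DT_SYMENT":
        return SYM_ENTRY_SIZE[elf_class]
    if tag == "DT_SYMTAB":
        return sections[".dynsym"]["addr"]
    if tag == "DT_STRTAB":
        return sections[".dynstr"]["addr"]
    if tag == "DT_STRSZ":
        return sections[".dynstr"]["size"]
    raise ValueError(f"{tag} cannot be inferred and needs an explicit Value")


def fill_missing(entries, elf_class, sections):
    """Keep explicit values as-is and infer only the ones left out."""
    return [
        {
            "Tag": e["Tag"],
            "Value": e["Value"] if "Value" in e else infer_value(e["Tag"], elf_class, sections),
        }
        for e in entries
    ]


sections = {
    ".dynsym": {"addr": 0x1000, "size": 0x48},
    ".dynstr": {"addr": 0x2000, "size": 0x19},
}
filled = fill_missing(
    [{"Tag": "DT_SYMTAB"}, {"Tag": "DT_SYMENT"}, {"Tag": "DT_STRSZ", "Value": 0x40}],
    "ELFCLASS64",
    sections,
)
```

Here DT_SYMTAB and DT_SYMENT are filled in from the file, while the explicit DT_STRSZ value is left untouched even though it disagrees with the section size, which keeps deliberately malformed inputs possible.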

I began working on a patch here, and it will later be updated to reflect the RFC:

https://reviews.llvm.org/D56569

Best,

Armando

Thanks for bringing this up. Since you posted on the review, I’ve been thinking more about the different options and overall design of yaml2obj’s dynamic sections (including dynstr and dynsym) and how they work from a user’s perspective, not least motivated by https://reviews.llvm.org/D56791, where I had to hand-craft a large .dynamic section, and ended up fighting with .dynsym and .dynstr too (see also https://bugs.llvm.org/show_bug.cgi?id=40339, for example).

As for the proposal, it sounds reasonable to be able to specify numeric and string arguments as you propose. However, I do have some questions/thoughts/points:

  1. What should happen if an explicit .dynstr content is specified (or more specifically, explicit content is specified for the linked section, see point 2)? My suggestion would be that it should be an error to specify string values in .dynamic and explicit content in .dynstr at the same time.
  2. I’d like to avoid the same issue that is present in https://bugs.llvm.org/show_bug.cgi?id=40337, namely that the .dynamic section auto-populates strings in .dynstr, even though it is linked against some other section. Note that this other section could also be a string table, so using strings in that instance would still be valid.
  3. It would be nice to have some mechanism to optionally auto-populate the .dynamic section with DT_STRTAB, DT_STRSZ, DT_SYMTAB, etc.
  4. It should be possible to omit the “Link:” field of the header, to get a value of 0 there, or to specify a section index, in place of a section name.

Related to point 3), I foresee several different use-cases:

  a) Users want a regular .dynamic section, with DT_STRTAB etc. in it, but want the values auto-populated for everything. They don’t want to specify any tags at all. yaml2obj does the hard work of fetching the addresses and sizes automatically.
  b) Similar to a), but a user wants to be able to extend a .dynamic section with other tags (e.g. DT_SONAME).
  c) Again similar to a), but a user wants to be able to override some of the auto-generated tags (but not necessarily all of them).
  d) A user wants to completely control the content with normal-looking tags, i.e. no auto-generation at all (e.g. they want to create a dynamic section without DT_STRTAB).
  e) A user wants to be able to hand-craft the content completely (e.g. to create truncated tags etc.). This could probably be best done via having an explicit content.

I think all except e) could be accomplished by having an extra attribute for dynamic sections, namely something like “DeriveTags: true/false” or similar. A value of true would mean that all mandatory tags are automatically generated, unless an explicit tag with the same DT_* value is specified. In order to have multiple default-generated tags with the same DT_* value (or none of them), this would need to be false.
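One possible reading of the “DeriveTags” semantics is sketched below in Python. The function name and the ordering of the output (derived tags first, explicit tags after) are assumptions made only for this illustration; the thread does not specify either.

```python
def build_entries(explicit, derived, derive_tags=True):
    """Combine user-written tags with auto-generated ones.

    explicit / derived: lists of (tag, value) pairs.
    derive_tags=False: only the user's tags are emitted, duplicates included.
    derive_tags=True: derived tags are emitted unless the user wrote the same tag.
    """
    if not derive_tags:
        return list(explicit)
    overridden = {tag for tag, _ in explicit}
    kept = [(tag, value) for tag, value in derived if tag not in overridden]
    return kept + list(explicit)


derived = [("DT_STRTAB", 0x2000), ("DT_STRSZ", 0x19), ("DT_SYMTAB", 0x1000)]

# a) nothing specified: everything is derived.
case_a = build_entries([], derived)
# b) extend the derived set with an extra tag.
case_b = build_entries([("DT_SONAME", "libsomething.so")], derived)
# c) override one derived tag.
case_c = build_entries([("DT_STRSZ", 0x1)], derived)
# d) full manual control, e.g. two DT_SYMTAB entries and no DT_STRTAB.
case_d = build_entries(
    [("DT_SYMTAB", 0x1000), ("DT_SYMTAB", 0x3000)], derived, derive_tags=False
)
```

Use-case e) is not covered, since hand-crafted or truncated content would bypass the tag list entirely.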

I like your idea of a missing Value being inferred. This would work well with the ability to turn off auto-generated tags, and would minimise the pain of having to write each normally-defaulted tag by hand in this case.

Note: I have no idea how well my above proposal works in relation to the current design of yaml2obj. My personal ideal is that a user can use yaml2obj to create basically any ELF object they want, primarily for testing, so being able to test corner cases (e.g. missing tags, malformed sections etc) is a significant requirement.

James

  1. Producing an error in this situation is reasonable. Dynamic symbols don’t produce an error when .dynstr has explicit content specified, and that has been a source of confusion for me before.
  2. For the sake of simplicity, strings won’t be added to the linked section unless the linked section is the default .dynstr. In all other cases, a numeric offset value must be used to specify the location of a string. I’ve provided more details on this at the end of this reply.
  3. I was initially considering this as well, and the original patch already does this for a few tags. If that’s generally desirable, I’m game to keep this in and improve control over automatic generation of tags. The current behavior is that user defined tags have precedence over the auto-generated tags.
  4. This is already implemented in yaml2obj for all section types (though the symbol tables forcibly override the linked section if symbols are specified). As my patch stands right now, .dynamic links to .dynstr by default, but only if dynamic entries have been specified. I personally feel that is appropriate behavior. Manually specifying “Link: 0” should properly override the default in the case that dynamic entries are specified. Conversely, if no dynamic entries are specified, the linked section defaults to 0.

Pretty much every use case you’ve brought up is either already implemented in my patch, or wouldn’t be too difficult to add. The only major exception would be allowing strings (for tags like DT_SONAME) to be added to string tables other than .dynstr. The existing design of yaml2obj doesn’t make that very approachable. It’s more reasonable to attempt to search for the specified string in the linked section, but that would require some significant structural changes as well.
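The linking and string rules discussed in this exchange could be summarised by a small Python sketch like the one below. It is a hypothetical model of the agreed behavior, not code from the patch: the function names and error messages are invented for illustration.

```python
def effective_link(explicit_link, has_entries):
    """An explicit Link always wins; otherwise default to .dynstr only when
    dynamic entries are present, and to 0 when there are none."""
    if explicit_link is not None:
        return explicit_link
    return ".dynstr" if has_entries else 0


def check_string_values(link, string_tags, dynstr_has_explicit_content):
    """Return a list of error messages for entries that use string values."""
    errors = []
    if not string_tags:
        return errors
    if link != ".dynstr":
        # Strings are only auto-added to the default .dynstr.
        errors.append(
            f"{', '.join(string_tags)}: linked section is {link!r}, "
            "so a numeric string-table offset is required"
        )
    elif dynstr_has_explicit_content:
        # Point 1: string values and explicit .dynstr content conflict.
        errors.append(
            f"{', '.join(string_tags)}: string values cannot be combined "
            "with explicit .dynstr content"
        )
    return errors
```

For instance, `effective_link(0, True)` stays 0 because “Link: 0” was written explicitly, and any string-valued DT_SONAME under that link would then be reported as an error.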

Okay, that all sounds reasonable to me. If auto-adding strings to a non-default linked section is hard, then that’s fine, I think, at least for now. We can improve it at a later date, if necessary.