Need to create symbols only once

Hi Nick,

We have a few symbols, such as __bss_start and __bss_end, that are undefined symbols in the code.

I want a way in the Reader to create specific atoms before the linker bootstraps.

I didn't find a way to do that with the existing interfaces.

The way it needs to work is as follows:

1) ReaderELF creates Absolute symbols (for __bss_start, __bss_end etc)
2) ReaderELF reads each file and adds Atoms to the list
3) If the atoms the linker defined are Global, the atoms that the Reader created should be overridden by the linker-created ones.

This may also be needed to pull in specific symbols from archive libraries.

I was thinking of adding an interface to ReaderELF that would be called by the driver, but the problem is that each DefinedAtom/AbsoluteAtom records which file owns it.

I discussed this with Michael, and he proposed adding a Pre-Read file.

Do you have any other opinions?

Thanks

Shankar Easwaran

We have a similar requirement in darwin's ld64 linker, but even more general. Any binary can do the following to introspect itself:

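struct stuff { int a; int b; };

// The magic names label the section boundaries themselves, so declare them
// as objects and take their addresses instead of declaring them as pointers.
extern struct stuff stuff_start __asm("section$start$__DATA$__my");
extern struct stuff stuff_end __asm("section$end$__DATA$__my");

void examineSection() {
  const struct stuff* p;
  for (p = &stuff_start; p < &stuff_end; ++p) {
    // do stuff with p
  }
}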

That is, there are magic symbol names which reference the beginning or ending of any particular section. To support this, the linker lazily creates atoms when references to these magic symbols are discovered during resolving.
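The lazy creation described above can be sketched roughly as follows. This is a toy model, not ld64 or lld code: ToySectionBoundaryAtom and ToyLazyBoundaryFile are hypothetical names, and the only thing taken from the thread is the section$start$SEGMENT$SECTION / section$end$SEGMENT$SECTION naming scheme. The file provides nothing up front and only synthesizes an atom when asked about a matching name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the atom a linker might synthesize on demand.
struct ToySectionBoundaryAtom {
  std::string name;
  std::string segment;
  std::string section;
  bool isStart; // true for section$start$, false for section$end$
};

// Hypothetical "last resort" file: it holds no atoms initially and only
// produces one when the resolver asks about a still-undefined magic name.
class ToyLazyBoundaryFile {
public:
  // Returns true and fills `out` if `name` is a well-formed magic symbol.
  bool find(const std::string &name, ToySectionBoundaryAtom &out) const {
    const std::string startPrefix = "section$start$";
    const std::string endPrefix = "section$end$";
    const bool isStart = name.compare(0, startPrefix.size(), startPrefix) == 0;
    const bool isEnd = name.compare(0, endPrefix.size(), endPrefix) == 0;
    if (!isStart && !isEnd)
      return false;

    // What remains should look like SEGMENT$SECTION.
    const std::string rest =
        name.substr(isStart ? startPrefix.size() : endPrefix.size());
    const std::size_t sep = rest.find('$');
    if (sep == std::string::npos || sep == 0 || sep + 1 == rest.size())
      return false; // malformed: both a segment and a section are required

    out.name = name;
    out.segment = rest.substr(0, sep);
    out.section = rest.substr(sep + 1);
    out.isStart = isStart;
    assert(!out.segment.empty() && !out.section.empty());
    return true;
  }
};
```

A resolver would consult such a file only for names that no regular input defined, so an ordinary definition always wins over the synthesized one.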

I have some hooks for this already in place in lld:

1) There is Writer::addFiles(). This method gives any writer a chance to add files/atoms to the set of atoms the Resolver works on. The Writer::addFiles() method is called after all input files are added. If you want to add something lazily (like the darwin linker does for section$start$ symbols), the writer returns a File object akin to a static library. That is, it provides no initial atoms, but can provide atoms as a last resort (so a .o file would override it). The WriterMachO already uses the addFiles() method to add CRuntime symbols.

2) DefinedAtom::ContentType already has typeFirstInSection and typeLastInSection. These are intended to be used as the content type of the atoms which represent the magic symbols for the start and end of a section. The key here is that the Pass which sorts atoms (not written yet) knows to sort these atoms to the start or end of their respective sections.

If you don't want this fully general, lazy approach, you could have your WriterELF::addFiles() return a regular object file that has atoms named __bss_start and __bss_end, but marked mergeAsWeak so that any user-defined atoms will override them.
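To illustrate the mergeAsWeak idea, here is a minimal sketch of how a weak, linker-supplied definition can lose to a user definition of the same name. ToyMerge, ToyDefinedAtom and ToySymbolTable are hypothetical simplifications for illustration only; lld's real Resolver and merge attributes are more involved.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical two-valued merge attribute (an assumption for this sketch).
enum ToyMerge { toyMergeNo, toyMergeAsWeak };

struct ToyDefinedAtom {
  std::string name;
  ToyMerge merge;
  std::string owner; // which file supplied the atom
};

class ToySymbolTable {
public:
  // Returns false only when two non-weak definitions collide.
  bool add(const ToyDefinedAtom &atom) {
    std::map<std::string, ToyDefinedAtom>::iterator it = table.find(atom.name);
    if (it == table.end()) {
      table.insert(std::make_pair(atom.name, atom));
      return true;
    }
    ToyDefinedAtom &existing = it->second;
    if (atom.merge == toyMergeAsWeak)
      return true; // a weak newcomer never displaces what is already there
    if (existing.merge == toyMergeAsWeak) {
      existing = atom; // a regular definition replaces the weak one
      return true;
    }
    return false; // duplicate regular definitions
  }

  // Returns the winning definition, or a null pointer if there is none.
  const ToyDefinedAtom *lookup(const std::string &name) const {
    std::map<std::string, ToyDefinedAtom>::const_iterator it = table.find(name);
    return it == table.end() ? nullptr : &it->second;
  }

private:
  std::map<std::string, ToyDefinedAtom> table;
};
```

Because the outcome does not depend on insertion order, it does not matter that the writer's file is added after all the regular input files.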

-Nick

Thanks for the reply Nick.

I will use the Writer::addFiles functionality. Do you want to move the SimpleFile class to lld/Core?

It might be useful for other types of object files too (like ELF here).

How does typeFirstInSection/typeLastInSection know that the addresses to be used for those symbols are the section start and section end?

I didn't see references to typeFirstInSection/typeLastInSection in the MachO part of lld either; any pointers to how you are doing that would be helpful.

If not, I need to duplicate that piece of code, which does not make sense.

Thanks

Shankar Easwaran

I will use the Writer::addFiles functionality. Do you want to move the SimpleFile class to lld/Core?

If others have a use for it, we can move it out of lib/ReaderWriter/MachO. I'm not sure it makes sense in include/lld/Core, because those are interfaces external clients would use. The SimpleFile stuff is really just something that other ReaderWriters might use. So, perhaps it could go in lib/ReaderWriter (Thoughts, Michael?)

It might be useful for other types of object files too (like ELF here).

How does typeFirstInSection/typeLastInSection know that the addresses to be used for those symbols are the section start and section end?

The typeFirstInSection content type comes from the darwin linker. The model is that the Order Pass (which does not exist yet in lld) sorts atoms, and it will sort typeFirstInSection atoms to the start of their section. The problem is that in lld, "sections" don't exist in core linking. Real named sections are only assigned when you get to the Writer. So, a Pass could not do this sorting. The general idea still holds; we just need to adjust it for lld. Here is my proposal:

We add a new attribute to DefinedAtoms:

   enum SectionPosition {
     sectionPositionLowest,
     sectionPositionLow,
     sectionPositionAny,
     sectionPositionHigh,
     sectionPositionHighest
   };

   virtual SectionPosition sectionPosition();

For most atoms, the value returned by sectionPosition() will be sectionPositionAny. Atoms that must be pinned to the start of a section return sectionPositionLowest. We should also assert that any sectionPositionLowest atom is zero length; that allows multiple names for the start of the section. The sectionPositionLow and sectionPositionHigh values are a way to mark atoms that prefer to be towards the start or end of their section.

With this in place, we can add an OrderPass which sorts atoms by contentType. Within a group of atoms of the same type, they are sorted by sectionPosition, and, of course, within a sectionPosition they are sorted by .o file order and by order within the .o file.
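The ordering proposed above could look something like the sketch below. ToyAtom, its fields, orderBefore and orderAtoms are assumptions made up for illustration (lld atoms expose these attributes through accessor methods, and no OrderPass exists yet); only the SectionPosition enum and the sort keys come from the proposal itself.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum SectionPosition {
  sectionPositionLowest,
  sectionPositionLow,
  sectionPositionAny,
  sectionPositionHigh,
  sectionPositionHighest
};

// Hypothetical flattened view of the attributes the sort needs.
struct ToyAtom {
  std::string name;
  int contentType;          // stand-in for DefinedAtom::ContentType
  SectionPosition position;
  int fileOrdinal;          // order of the .o file on the command line
  int ordinalInFile;        // order of the atom within its .o file
  unsigned size;
};

// Sort keys in priority order: contentType, sectionPosition, file order,
// then order within the file.
static bool orderBefore(const ToyAtom &a, const ToyAtom &b) {
  if (a.contentType != b.contentType)
    return a.contentType < b.contentType;
  if (a.position != b.position)
    return a.position < b.position;
  if (a.fileOrdinal != b.fileOrdinal)
    return a.fileOrdinal < b.fileOrdinal;
  return a.ordinalInFile < b.ordinalInFile;
}

void orderAtoms(std::vector<ToyAtom> &atoms) {
  for (const ToyAtom &atom : atoms) {
    // Atoms pinned to the section start must be zero length, so several
    // names can share the start address.
    if (atom.position == sectionPositionLowest) {
      assert(atom.size == 0);
    }
  }
  std::stable_sort(atoms.begin(), atoms.end(), orderBefore);
}
```

Declaring the enumerators from lowest to highest lets the comparison use the enum values directly, with no lookup table.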

-Nick

Hi Nick,

The case I have is a bit different now. I added the symbols __bss_start/__bss_end/_end using WriterELF::addFiles(). The symbols get overridden appropriately, but the values of the symbols are known only after the sections have been merged and virtual addresses have been assigned to those symbols.

So when I am trying to write these atoms to the output file, I want to set the value of these symbols to the values computed by the ELF Writer.

These atoms are NativeAtoms and I don't see a function to set the value of an atom. How do I go about accomplishing this?

Thanks

Shankar Easwaran

The same way any atom gets an address. When the Writer gets the set of atoms to write out, the Writer is the one that assigns them addresses. And by "assign" I mean the Writer maintains some extra information for each atom, such as its assigned section, segment, and address. So, your writer just needs to assign the value of the section start to the __bss_start atom.

Note: this is why it does not make sense for a Reader and Writer to share common Atom classes. When the Writer finally gets the atoms, they may not be of that class. The Writer can only depend on the standard attributes of an atom - not something special it can do when the atom's class is known.
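As a rough sketch of what "the Writer maintains some extra information for each atom" might look like, the toy writer below keeps addresses in a side table keyed by atom pointer and never modifies the atoms themselves. ToyAtom, ToyWriter and all of their methods are hypothetical; a real writer would also track section, segment and alignment, which this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical read-only atom: the writer cannot store anything in it.
struct ToyAtom {
  std::string name;
  std::uint64_t size;
};

class ToyWriter {
public:
  // Places atoms back to back starting at `base` (no alignment handling)
  // and returns the address just past the last atom.
  std::uint64_t layoutSection(const std::vector<const ToyAtom *> &atoms,
                              std::uint64_t base) {
    std::uint64_t address = base;
    for (const ToyAtom *atom : atoms) {
      addresses[atom] = address;
      address += atom->size;
    }
    return address;
  }

  // Binds a linker-defined symbol (such as __bss_start) to a value the
  // writer computed, again without touching the atom.
  void assign(const ToyAtom *atom, std::uint64_t address) {
    addresses[atom] = address;
  }

  std::uint64_t addressOf(const ToyAtom *atom) const {
    std::map<const ToyAtom *, std::uint64_t>::const_iterator it =
        addresses.find(atom);
    assert(it != addresses.end() && "atom was never given an address");
    return it->second;
  }

private:
  // The writer's private bookkeeping, keyed by atom identity.
  std::map<const ToyAtom *, std::uint64_t> addresses;
};
```

Since the table is keyed only by pointer, this works whatever concrete class the atoms turn out to have by the time they reach the writer.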

-Nick