[lld] Handling non SHF_ALLOC sections.

Debug info linking is currently broken due to how we handle reading and laying out non SHF_ALLOC sections. I posted a patch that partially fixes this, but it’s both the wrong approach and doesn’t handle multiple input files with debug info (wrong relocation values).

The first issue is representing non SHF_ALLOC atoms in the Atom model. We currently don’t have a type for this, and DefinedAtom.cpp makes assumptions about the permissions of an Atom based on their type, so it’s hard to use an existing type.

The next problem is in the ELF writer. It currently cannot handle AtomSections that are not in a segment, because their file offsets and addresses are never set. This means that assignOffsets is not called, and that the atoms within are never added to the _atomToAddressMap. However, we can’t just add them to that map with their virtual address, as they don’t have a virtual address. We need to use the symbol value, which is the offset into the section. My current hack to fix this is to call assignVirtualAddresses(-fileoffset) and then explicitly add them to the _atomToAddressMap.
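To make the symbol-value rule above concrete, here is a minimal sketch. The Atom and Section structs and buildAtomToAddressMap are simplified, hypothetical stand-ins, not lld's actual classes, and alignment padding is ignored. It only shows the intended result: atoms in SHF_ALLOC sections map to a virtual address, while atoms in non SHF_ALLOC sections map to their offset within the section.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins for lld's atom and section types.
struct Atom {
  std::string name;
  uint64_t size;
};

struct Section {
  bool isAlloc;            // true if SHF_ALLOC is set on the section
  uint64_t virtualAddress; // only meaningful when isAlloc is true
  std::vector<Atom> atoms;
};

// Record a value for every atom: virtual address for SHF_ALLOC sections,
// plain offset within the section otherwise. Alignment is ignored here.
std::unordered_map<std::string, uint64_t>
buildAtomToAddressMap(const std::vector<Section> &sections) {
  std::unordered_map<std::string, uint64_t> map;
  for (const Section &s : sections) {
    const uint64_t base = s.isAlloc ? s.virtualAddress : 0;
    uint64_t offset = 0;
    for (const Atom &a : s.atoms) {
      bool inserted = map.emplace(a.name, base + offset).second;
      assert(inserted && "atom names are assumed unique in this sketch");
      (void)inserted;
      offset += a.size;
    }
  }
  return map;
}
```

With this shape, no negative-offset trick is needed: the non SHF_ALLOC case simply uses a base of zero.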

Any ideas for the proper fix here?

  • Michael Spencer

Hi Michael,

> Debug info linking is currently broken due to how we handle reading and
> laying out non SHF_ALLOC sections. I posted a patch that partially fixes
> this, but it's both the wrong approach and doesn't handle multiple input
> files with debug info (wrong relocation values).
>
> The first issue is representing non SHF_ALLOC atoms in the Atom model. We
> currently don't have a type for this, and DefinedAtom.cpp makes assumptions
> about the permissions of an Atom based on their type, so it's hard to use
> an existing type.

Can we parse the debug sections into atoms too? This way we could associate debug information with DefinedAtoms (through separate reference types, probably).

The advantage of this approach would be that garbage collection would remove all the unneeded references automatically when the DefinedAtom is removed.

I think Nick also mentioned a similar approach a while back.

> The next problem is in the ELF writer. It currently cannot handle
> AtomSections that are not in a segment as file offsets and addresses are
> never set. This means that assignOffsets is not called, and that the atoms
> within are never added to the _atomToAddressMap. However, we can't just add
> them to that map with their virtual address, as they don't have a virtual
> address. We need to use the symbol value, which is the offset into the
> section. My current hack to fix this is to call
> assignVirtualAddresses(-fileoffset) and then explicitly add them to the
> _atomToAddressMap.
>
> Any ideas for the proper fix here?

There is a way we can handle this without a lot of tweaks.

a) Make the debug sections part of a linker-internal segment (the segment would not appear in the output file); hasOutputSegment will then return true for a debug section.

b) Around line 623 in DefaultLayout, where we find out whether the section is associated with the special debug section, we add this special segment to the list of segments.

c) Around line 731 in DefaultLayout.h, we compare the segment type against the linker-internal segment types and assign offsets for the debug section. Let's not set the virtual addresses for these sections. If there is a need for assigning virtual addresses, you could change the second loop that assigns virtual addresses to deal with that too.
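Steps (a) to (c) could be sketched roughly as follows. All names here (SegmentKind, Section, Segment, buildSegments, assignOffsetsAndAddresses, countProgramHeaders) are hypothetical and only illustrate the idea; they are not the DefaultLayout code, and alignment is ignored. The only real constant is the SHF_ALLOC flag value, 0x2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

constexpr uint64_t kShfAlloc = 0x2; // value of SHF_ALLOC in ELF

enum class SegmentKind { Load, LinkerInternal };

struct Section {
  Section(std::string n, uint64_t f, uint64_t s)
      : name(std::move(n)), flags(f), size(s) {}
  std::string name;
  uint64_t flags;
  uint64_t size;
  uint64_t fileOffset = 0;
  uint64_t virtualAddress = 0;
  bool hasVirtualAddress = false;
};

struct Segment {
  SegmentKind kind;
  std::vector<Section *> sections;
};

// (a)/(b): every section lands in some segment; sections without
// SHF_ALLOC go into the linker-internal one.
std::vector<Segment> buildSegments(std::vector<Section> &sections) {
  std::vector<Segment> segments = {{SegmentKind::Load, {}},
                                   {SegmentKind::LinkerInternal, {}}};
  for (Section &s : sections) {
    bool alloc = (s.flags & kShfAlloc) != 0;
    segments[alloc ? 0 : 1].sections.push_back(&s);
  }
  return segments;
}

// (c): file offsets are assigned in every segment, virtual addresses
// only in loadable ones.
void assignOffsetsAndAddresses(std::vector<Segment> &segments,
                               uint64_t fileStart, uint64_t baseAddress) {
  uint64_t offset = fileStart;
  uint64_t address = baseAddress;
  for (Segment &seg : segments) {
    for (Section *s : seg.sections) {
      assert(s && "segment holds a null section");
      s->fileOffset = offset;
      offset += s->size;
      if (seg.kind == SegmentKind::Load) {
        s->virtualAddress = address;
        s->hasVirtualAddress = true;
        address += s->size;
      }
    }
  }
}

// The internal segment never gets a program header in the output.
std::size_t countProgramHeaders(const std::vector<Segment> &segments) {
  std::size_t n = 0;
  for (const Segment &seg : segments)
    if (seg.kind == SegmentKind::Load)
      ++n;
  return n;
}
```

The point of the internal segment is that the existing per-segment offset loop then runs for these sections without a special case.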

Thanks

Shankar Easwaran

Hi Michael,

> Debug info linking is currently broken due to how we handle reading and
> laying out non SHF_ALLOC sections. I posted a patch that partially fixes
> this, but it's both the wrong approach and doesn't handle multiple input
> files with debug info (wrong relocation values).
>
> The first issue is representing non SHF_ALLOC atoms in the Atom model. We
> currently don't have a type for this, and DefinedAtom.cpp makes assumptions
> about the permissions of an Atom based on their type, so it's hard to use
> an existing type.

There are a couple of ways to model this:
1) SHF_ALLOC=0 sections do not occupy space during execution, so they do not need addresses, so they are not atoms. Instead that section information is modeled as some kind of “attribute” (like a name or content type) of DefinedAtoms or the whole File.
2) If the dwarf can be parsed into chunks that can be associated with DefinedAtoms, then
2a) those chunks could be new attributes of an atom, or
2b) those chunks could be atoms themselves (with some defined way to set the name, permissions, etc of those new atoms)

I have looked at breaking up the source line table information in DWARF. Conceptually, it is a table of pc ranges to source file ranges. The problem is that it is a compressed table, which means you have to decompress it to figure out which rows belong to which atoms.
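As a rough illustration of the "which rows belong to which atoms" step, here is a minimal sketch that assumes the line table has already been decompressed into plain rows. LineRow, AtomRange, and partitionRowsByAtom are hypothetical names for this example only; real DWARF rows carry more state (file, column, flags), and decoding the line-number program itself is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical decoded line-table row: one pc mapped to one source line.
struct LineRow {
  uint64_t address;
  uint32_t line;
};

// Hypothetical address range occupied by one atom in the input section.
struct AtomRange {
  std::string name;
  uint64_t start;
  uint64_t size;
};

// Group already-decoded rows by the atom whose range contains the row's
// address. Rows that fall outside every atom are dropped.
std::map<std::string, std::vector<LineRow>>
partitionRowsByAtom(const std::vector<LineRow> &rows,
                    const std::vector<AtomRange> &atoms) {
  std::map<std::string, std::vector<LineRow>> result;
  for (const LineRow &row : rows) {
    for (const AtomRange &atom : atoms) {
      assert(atom.size > 0 && "atom ranges must be non-empty");
      if (row.address >= atom.start && row.address - atom.start < atom.size) {
        result[atom.name].push_back(row);
        break;
      }
    }
  }
  return result;
}
```

Once rows are grouped per atom, dropping an atom also drops its rows, but the table then has to be re-encoded for the output.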

So the big question is: should lld parse DWARF into some internal representation (like it does for sections into atoms), or should it basically just pass through and concatenate the DWARF?

For what it is worth, Apple has purposefully sidestepped this issue with our workflow. The Darwin linker always ignores all DWARF debug info in .o files. Instead, it records the code ranges along with the paths to the .o files in “debug notes” that it puts in the linker output file. Our debugger, when it needs debug info for a range, looks at the notes, finds the original .o file, and uses the DWARF from it.
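The debug-notes idea can be pictured with a small sketch. DebugNote and DebugNoteTable below are hypothetical and do not reflect the actual on-disk format the Darwin linker emits; they only show the lookup a debugger would do: map an address to the .o file whose DWARF should be consulted.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical note: a code range in the linked image plus the path of
// the object file that the range came from.
struct DebugNote {
  uint64_t start;
  uint64_t size;
  std::string objectPath;
};

class DebugNoteTable {
public:
  void add(uint64_t start, uint64_t size, std::string objectPath) {
    assert(size > 0 && !objectPath.empty());
    notes_.push_back({start, size, std::move(objectPath)});
  }

  // Return the note covering `address`, or nullptr if none does.
  const DebugNote *find(uint64_t address) const {
    for (const DebugNote &n : notes_)
      if (address >= n.start && address - n.start < n.size)
        return &n;
    return nullptr;
  }

private:
  std::vector<DebugNote> notes_;
};
```

The linker only has to write the table; all DWARF parsing is deferred to the debugger.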

-Nick

> Hi Michael,
>
>> Debug info linking is currently broken due to how we handle reading and
>> laying out non SHF_ALLOC sections. I posted a patch that partially fixes
>> this, but it's both the wrong approach and doesn't handle multiple input
>> files with debug info (wrong relocation values).
>>
>> The first issue is representing non SHF_ALLOC atoms in the Atom model. We
>> currently don't have a type for this, and DefinedAtom.cpp makes assumptions
>> about the permissions of an Atom based on their type, so it's hard to use
>> an existing type.
>
> There are a couple of ways to model this:
> 1) SHF_ALLOC=0 sections do not occupy space during execution, so they do not need addresses, so they are not atoms. Instead that section information is modeled as some kind of “attribute” (like a name or content type) of DefinedAtoms or the whole File.
> 2) If the dwarf can be parsed into chunks that can be associated with DefinedAtoms, then
> 2a) those chunks could be new attributes of an atom, or
> 2b) those chunks could be atoms themselves (with some defined way to set the name, permissions, etc of those new atoms)
>
> I have looked at breaking up the source line table information in DWARF. Conceptually, it is a table of pc ranges to source file ranges. The problem is that it is a compressed table, which means you have to decompress it to figure out which rows belong to which atoms.

I think LLVM's DebugInfo library has APIs to deal with debug information, doesn't it?

> So the big question is: should lld parse DWARF into some internal representation (like it does for sections into atoms), or should it basically just pass through and concatenate the DWARF?

With pass-through, the only penalty arises when garbage collection is done: the linker might keep debug information for atoms that were removed. This could be a corner case, though.

> For what it is worth, Apple has purposefully sidestepped this issue with our workflow. The Darwin linker always ignores all DWARF debug info in .o files. Instead, it records the code ranges along with the paths to the .o files in “debug notes” that it puts in the linker output file. Our debugger, when it needs debug info for a range, looks at the notes, finds the original .o file, and uses the DWARF from it.

For ELF, until DWARF5 is adopted, I assume we can't ignore debug info. Also, certain customer apps might still want to keep debug information in the executables (we have a couple of such assumptions).

Thanks

Shankar Easwaran

> Hi Michael,
>
>> Debug info linking is currently broken due to how we handle reading and
>> laying out non SHF_ALLOC sections. I posted a patch that partially fixes
>> this, but it's both the wrong approach and doesn't handle multiple input
>> files with debug info (wrong relocation values).
>>
>> The first issue is representing non SHF_ALLOC atoms in the Atom model. We
>> currently don't have a type for this, and DefinedAtom.cpp makes assumptions
>> about the permissions of an Atom based on their type, so it's hard to use
>> an existing type.

> Can we parse the debug sections into atoms too? This way we could
> associate debug information with DefinedAtoms (through separate
> reference types, probably).
>
> The advantage of this approach would be that garbage collection would
> remove all the unneeded references automatically when the DefinedAtom is
> removed.
>
> I think Nick also mentioned a similar approach a while back.

You can't just parse it by byte ranges and pull some out. You would
actually need to parse the DWARF and rewrite it.

>> The next problem is in the ELF writer. It currently cannot handle
>> AtomSections that are not in a segment as file offsets and addresses are
>> never set. This means that assignOffsets is not called, and that the atoms
>> within are never added to the _atomToAddressMap. However, we can't just add
>> them to that map with their virtual address, as they don't have a virtual
>> address. We need to use the symbol value, which is the offset into the
>> section. My current hack to fix this is to call
>> assignVirtualAddresses(-fileoffset) and then explicitly add them to the
>> _atomToAddressMap.
>>
>> Any ideas for the proper fix here?

> There is a way we can handle this without a lot of tweaks.
>
> a) Make the debug sections part of a linker-internal segment (the segment
> would not appear in the output file); hasOutputSegment will then return
> true for a debug section.
>
> b) Around line 623 in DefaultLayout, where we find out whether the section
> is associated with the special debug section, we add this special segment
> to the list of segments.

The problem with this is that it's not just debug sections. We need to
properly implement ELF semantics (OK, really GNU ld semantics, as the ELF
spec doesn't say what should happen here, or anywhere really). The semantics
seem to be that the value of symbols in non SHF_ALLOC sections is their
offset within the section.

> c) Around line 731 in DefaultLayout.h, we compare the segment type
> against the linker-internal segment types and assign offsets for the debug
> section. Let's not set the virtual addresses for these sections. If there is
> a need for assigning virtual addresses, you could change the second loop
> that assigns virtual addresses to deal with that too.
>
> Thanks
>
> Shankar Easwaran
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by the Linux Foundation

- Michael Spencer

> Hi Michael,
>> Debug info linking is currently broken due to how we handle reading and
>> laying out non SHF_ALLOC sections. I posted a patch that partially fixes
>> this, but it's both the wrong approach and doesn't handle multiple input
>> files with debug info (wrong relocation values).
>>
>> The first issue is representing non SHF_ALLOC atoms in the Atom model. We
>> currently don't have a type for this, and DefinedAtom.cpp makes assumptions
>> about the permissions of an Atom based on their type, so it's hard to use
>> an existing type.
>
> There are a couple of ways to model this:
> 1) SHF_ALLOC=0 sections do not occupy space during execution, so they do
> not need addresses, so they are not atoms. Instead that section information
> is modeled as some kind of “attribute” (like a name or content type) of
> DefinedAtoms or the whole File.

It's weird to not treat them as atoms, as they still have relocations and a
lot of other things atoms have. They still need to be laid out within the
file, and need to be able to refer to other atoms.

> 2) If the dwarf can be parsed into chunks that can be associated with
> DefinedAtoms, then
> 2a) those chunks could be new attributes of an atom, or

Again, weird for the above reasons.

> 2b) those chunks could be atoms themselves (with some defined way to set
> the name, permissions, etc of those new atoms)

This makes sense for the cases where we want to parse DWARF, but it doesn't
handle other non SHF_ALLOC sections.

> I have looked at breaking up the source line table information in DWARF.
> Conceptually, it is a table of pc ranges to source file ranges. The
> problem is that it is a compressed table, which means you have to
> decompress it to figure out which rows belong to which atoms.
>
> So the big question is: should lld parse DWARF into some internal
> representation (like it does for sections into atoms), or should it
> basically just pass through and concatenate the DWARF?

There seem to be major size savings to be had from compressing DWARF. I think
we should look at it on a case-by-case basis to see how much size reduction
we get vs. how much it slows down the link. But we definitely need a
passthrough mode.

- Michael Spencer

That makes it much simpler: put all sections that do not have the SHF_ALLOC flag in an internal segment, which does not appear in the output file.

Thanks

Shankar Easwaran

This seems like a problem that will be solved along the way of implementing
`ld -r`, so the "proper" solution will probably be something in the general
direction of `ld -r`.

-- Sean Silva

You are right, Sean. That was the general idea behind the comment in my previous mail: for ld -r, no section would be part of a loadable segment, which will make the current code work.

Thanks

Shankar Easwaran