Handling ELF groups.

So I was looking into handling ELF groups today in the Atom model. It
appears that we will need to add the concept of a group to the atom
model directly, as modeling it with references fails to capture some
semantics.

http://www.sco.com/developers/gabi/latest/ch4.sheader.html

Groups in ELF are collections of sections that must be either included
or excluded as a unit. They are also used to implement COMDAT. Each
group has an "identifying symbol entry", or "group signature". The
signature is only used for COMDAT groups (which are marked with the
GRP_COMDAT flag). When two COMDAT groups have the same group signature,
the linker must select one (the spec does not say how to pick) and
discard _all_ members of the other group.

Correctly implementing this requires knowing the group name for each
group and having the resolver remove the correct set of atoms on
collision. We also need to be able to explicitly track the identifying
symbol entry for the relocatable case.

An idea for implementing this would be to add a list of Groups to each
File. I don't believe a Group should be an atom as it has different
semantics and would have to be treated specially everywhere.

A group would have a name, merge attribute, and a list of atoms it contains.
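
To make the shape of that proposal concrete, here is a minimal sketch, not lld code. The class names, the `comdat` flag standing in for the "merge attribute", the mangled example names, and the first-one-wins policy are all assumptions for illustration (the spec does not say which duplicate to keep).

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    name: str
    live: bool = True  # cleared when the resolver discards the atom


@dataclass
class Group:
    name: str      # the group signature
    comdat: bool   # hypothetical stand-in for the "merge attribute"
    atoms: list = field(default_factory=list)


@dataclass
class File:
    path: str
    groups: list = field(default_factory=list)


def resolve_groups(files):
    """Keep the first COMDAT group seen per signature; discard later ones whole."""
    kept = {}
    for f in files:
        for g in f.groups:
            if not g.comdat:
                continue  # plain groups never collide on signature
            if g.name in kept:
                for atom in g.atoms:
                    atom.live = False  # discard every member, not just one
            else:
                kept[g.name] = g
    return kept


def make_file(path):
    # Illustrative inline function foo() and its static local variable.
    group = Group("_Z3foov", comdat=True,
                  atoms=[Atom("_Z3foov"), Atom("_ZZ3foovE1x")])
    return File(path, [group])


a, b = make_file("a.o"), make_file("b.o")
kept = resolve_groups([a, b])
```

After this runs, both atoms from a.o are still live and both atoms from b.o are discarded together.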

YAML mockup:

I thought groups were collections of symbols, not sections. Is this a case
where there is one symbol per section?

In the darwin linker this is solved using references. The "signature" atom in
a group has a "group-subordinate" reference to each atom in the group.
When an atom is coalesced away, its references are scanned and the
target of any group-subordinate reference is also coalesced.

Conceptually, a group is just a circle around some set of atoms. That same
information can be represented as a connected graph: introduce a zero-size
"master" atom with a reference to each atom in the group. In the special
case of a COMDAT group, the signature atom can be used as the master.

In other words, I'm not convinced of the need to introduce a new top-level class
(Group) to go along with Atom and Reference. I believe we can encode
the same information using references.
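
A minimal sketch of the reference-based encoding described above, assuming a simplified Atom/Reference model. The class layout, the `coalesce_away` helper, and the "call" reference kind are hypothetical; only the "group-subordinate" kind comes from the description.

```python
from dataclasses import dataclass, field

GROUP_SUBORDINATE = "group-subordinate"  # reference kind named in the text above


@dataclass
class Reference:
    kind: str
    target: "Atom"


@dataclass
class Atom:
    name: str
    size: int = 0
    references: list = field(default_factory=list)
    coalesced: bool = False


def make_group(master, members):
    """Encode a group as master -> member group-subordinate references."""
    for member in members:
        master.references.append(Reference(GROUP_SUBORDINATE, member))
    return master


def coalesce_away(atom):
    """Coalesce an atom, then follow only its group-subordinate references."""
    if atom.coalesced:
        return
    atom.coalesced = True
    for ref in atom.references:
        if ref.kind == GROUP_SUBORDINATE:
            coalesce_away(ref.target)


func = Atom("func", size=16)      # signature atom, doubles as the master
data = Atom("data", size=4)       # group member
helper = Atom("helper", size=8)   # not in the group, merely called by func
func.references.append(Reference("call", helper))
make_group(func, [data])
coalesce_away(func)
```

Coalescing `func` takes `data` with it, while `helper` is untouched because an ordinary reference does not imply group membership.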

-Nick

Groups are collections of sections, not symbols. There is no restriction
on the symbols in a group's sections.

Ok, I kinda see how this can work. The only thing I'm still confused
about is conforming to this part of the ELF spec:

"This is a COMDAT group. It may duplicate another COMDAT group in
another object file, where duplication is defined as having the same
group signature. In such cases, only one of the duplicate groups may
be retained by the linker, and the members of the remaining groups
must be discarded."

How do we know that a group master is a COMDAT group master as opposed
to a normal group master?

- Michael Spencer

A COMDAT group has a real, named atom as its master. Other groups
will have a zero-size master atom with some special content type
(e.g. typeGroupMaster).

For COMDAT groups, the "group signature" is the name of the signature
(master) atom. If two .o files each have a COMDAT group with the
same signature, that means they each have a master atom with the same
name.
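
As a rough sketch of that distinction: the content-type strings, the `Atom` fields, and both helper functions below are hypothetical, with `typeGroupMaster` taken from the example name above.

```python
from dataclasses import dataclass

TYPE_CODE = "typeCode"                 # hypothetical content type for ordinary atoms
TYPE_GROUP_MASTER = "typeGroupMaster"  # example name suggested above


@dataclass
class Atom:
    name: str
    content_type: str
    size: int


def is_comdat_master(master):
    """A master that is a real, named atom marks a COMDAT group."""
    return master.content_type != TYPE_GROUP_MASTER and bool(master.name)


def find_duplicate_signatures(masters):
    """Return signatures that appear on more than one COMDAT master."""
    seen, dups = set(), []
    for m in masters:
        if not is_comdat_master(m):
            continue  # synthetic zero-size masters have no signature to match
        if m.name in seen and m.name not in dups:
            dups.append(m.name)
        seen.add(m.name)
    return dups


masters = [
    Atom("_Z3foov", TYPE_CODE, 16),   # COMDAT master from a.o
    Atom("", TYPE_GROUP_MASTER, 0),   # plain group master from a.o
    Atom("_Z3foov", TYPE_CODE, 16),   # COMDAT master from b.o
    Atom("", TYPE_GROUP_MASTER, 0),   # plain group master from b.o
]
dups = find_duplicate_signatures(masters)
```

Only the named masters collide. The two synthetic masters are never compared, so plain groups from different files are both retained.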

-Nick

I support Nick’s option too. I think handling groups is another example of using follow-on references.

One question: how does an atom outside the group refer to the main atom here? Won't garbage collection clean up the main atom/signature atom if there are no references to it?

Thanks

Well, if there are no references, it should be dead stripped, right?

A typical use of group COMDAT is a function with an inline definition that has a static local variable. You have two atoms: the function atom and an atom for the data variable. They are bound together in a group, meaning either both are used or neither is. The “signature” of the group is the (mangled) name of the function. If nothing uses that function and the resolver is told to dead strip, it will remove both the function and the variable. If another object file also defines the function (and variable), one copy of the function will be coalesced away (mergeAsWeak), which, because of the group reference, also coalesces away its copy of the variable atom.
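
The dead-strip half of that example can be sketched as a plain reachability walk. This is an illustration under assumed names (`Atom`, `live_names`, the "call" reference kind, and the mangled symbols), not the resolver's actual code.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based hashing so atoms can live in a set
class Atom:
    name: str
    refs: list = field(default_factory=list)  # (kind, target) pairs


def live_names(roots):
    """Mark everything reachable from the roots; all reference kinds keep targets alive."""
    seen, stack = set(), list(roots)
    while stack:
        atom = stack.pop()
        if atom in seen:
            continue
        seen.add(atom)
        stack.extend(target for _kind, target in atom.refs)
    return {a.name for a in seen}


counter = Atom("_ZZ3foovE1x")                              # static local variable
foo = Atom("_Z3foov", refs=[("group-subordinate", counter)])  # inline function
main = Atom("main")

unused = live_names([main])      # nothing references foo yet
main.refs.append(("call", foo))
used = live_names([main])        # now main calls foo
```

With no caller, neither the function nor its variable is reachable, so both are stripped. Once `main` calls the function, the group reference keeps the variable alive too.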

-Nick