[lld][ELF] How to transfer st_other field value from input to output file

Hi,

On MIPS, the st_other field in the ELF symbol table may contain
MIPS-specific flags in addition to the visibility bits. These flags
should be copied to the linked output file. If the YAML => Native
conversion is switched off, there is no problem. But when the
conversion is enabled, we lose the st_other field values.

So I need advice on how to keep this information. Is it a good idea to
extend the YAML and Native formats to store this data? Are there any
alternative solutions?

Thanks.

One way to do that is to add new visibility / contentTypes values (whichever is relevant) for each of the values st_other can take?

What other values can st_other take on MIPS?

Shankar Easwaran

The STO_MIPS16 and STO_MICROMIPS flags denote that the symbol uses a
different "compressed" instruction encoding. Both of these flags can be
combined with the usual "visibility" flags.

It looks like adding a new flag to the contentTypes set might solve
the problem. Thanks for the idea. I will try to implement it.
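
As a rough illustration of the layout described above, the sketch below splits an st_other byte into its generic visibility bits and the remaining target-specific bits, then recombines them. The constant names and numeric values are assumptions based on commonly used MIPS ELF headers and should be checked against the toolchain's own definitions; the helper names are hypothetical and are not lld code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values; verify against your toolchain's MIPS ELF header.
constexpr uint8_t kVisibilityMask = 0x03; // low two bits: STV_* visibility
constexpr uint8_t kStoMipsPic = 0x20;     // assumed value of STO_MIPS_PIC
constexpr uint8_t kStoMicroMips = 0x80;   // assumed value of STO_MICROMIPS

// Hypothetical helper type: the two halves of an st_other byte.
struct DecodedStOther {
  uint8_t visibility;  // generic ELF visibility
  uint8_t targetFlags; // everything else, interpreted per target
};

// Split st_other into visibility and target-specific bits.
inline DecodedStOther decodeStOther(uint8_t stOther) {
  return {static_cast<uint8_t>(stOther & kVisibilityMask),
          static_cast<uint8_t>(stOther & ~kVisibilityMask)};
}

// Recombine the two halves when writing the output symbol table.
inline uint8_t encodeStOther(const DecodedStOther &d) {
  return static_cast<uint8_t>((d.visibility & kVisibilityMask) |
                              (d.targetFlags & ~kVisibilityMask));
}
```

If the target-specific half is dropped anywhere between reading and writing, only the visibility survives, which is the loss described in the first message.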

I was too optimistic. It is possible to use the contentTypes field for
handling STO_MICROMIPS, and I have a working solution, but the solution
is really ugly. This approach has at least the following two
shortcomings:

1. A MIPS ELF symbol can hold multiple STO_xxx flags in the
st_other field (STO_MIPS_PIC, STO_MIPS_MICROMIPS, STO_MIPS_MIPS16
...). Some of these flags can even be combined. If we use the
contentTypes field, we have to define a separate ContentType value for
each such combination, so we get a combinatorial explosion.

2. If we handle MIPS-specific ContentType flags together with the other
flags, it pollutes the common ELF code. If we factor out the
processing of MIPS-specific flags, we have to duplicate code, because a
symbol with, say, the STO_MICROMIPS flag should be processed (size
setup, permissions, etc.) the same way as a regular
DefinedAtom::typeCode symbol.

I considered creating a map from symbol name to symbol flags, filling
it while reading object files and using it while writing the linked
file. But I need to handle both local and global symbols, and it is
possible to get different symbols with the same name.
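
The name-collision problem can be sketched as follows. The table type and helper are hypothetical and only illustrate why a map keyed by symbol name alone cannot distinguish two local symbols that share a name across object files.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical side table keyed only by symbol name.
using FlagsByName = std::map<std::string, uint8_t>;

// Records the flags for a symbol. Returns false if the name is already
// present, in which case the existing entry is kept and the new symbol's
// flags are lost.
inline bool recordFlags(FlagsByName &table, const std::string &name,
                        uint8_t flags) {
  return table.emplace(name, flags).second;
}
```

Two object files that each define a local symbol named "helper", one microMIPS and one not, end up sharing a single entry, so one of them is written with the wrong flags.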

It looks like the only solution (unless I am missing something) is to
add one more field to the DefinedAtom class to hold a
target/architecture-specific set of flags and to modify the Native and
YAML formats correspondingly. Interpretation of this field would be
completely target/architecture dependent.

Any opinions?

I had a similar issue with arm vs thumb in mach-o. Each function’s thumbness is marked in its symbol table entry.

But it is even worse, a function could change encoding in the middle (only hand coded assembly could do this).

My solution was to add a new Reference Kind for mach-o which is the current instruction encoding. The offsetInAtom() is the offset where the encoding kind changes. Usually there is just one at offset zero that sets the encoding for the whole function. So determining the thumbness requires scanning the References. But it turns out in practice the scan is rarely done because the result can be cached by whatever algorithm needs that info.

-Nick
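
A minimal sketch of the scan described above, assuming a simplified stand-in for a Reference whose kind means "the instruction encoding changes at this offset". The type and function names are hypothetical and do not reflect the actual mach-o ArchHandler code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Encoding { Arm, Thumb };

// Hypothetical stand-in for a Reference that marks an encoding change.
struct EncodingRef {
  uint32_t offsetInAtom; // where the new encoding starts
  Encoding encoding;     // encoding in effect from that offset on
};

// Returns the encoding in effect at `offset`. `refs` is assumed to be
// sorted by offsetInAtom; `fallback` applies before the first marker.
inline Encoding encodingAt(const std::vector<EncodingRef> &refs,
                           uint32_t offset,
                           Encoding fallback = Encoding::Arm) {
  Encoding current = fallback;
  for (const EncodingRef &ref : refs) {
    if (ref.offsetInAtom > offset)
      break; // later markers do not affect this offset
    current = ref.encoding;
  }
  return current;
}
```

In the common case there is a single marker at offset zero, so the loop ends after one step and the result can be cached per atom.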

lld needs to have some way to encode flavor specific attributes/target specific attributes. This is becoming more important IMHO.

Shankar Easwaran

I used the same trick to mark certain kinds of MIPS GOT entries. And
sure, I can do the same thing in the current case. But I think using a
Reference as a flag is not a good solution.

Anyway, if I do not find a better solution, I will have to use this
workaround once again.

If you really have multiple different code types in one atom, then you must have a list of transition points. The Reference list is a great match for that, whereas an atom level attribute would not work.

-Nick

I agree. I think we cannot fit all our supported architectures and
targets into the common format without any arch/tgt specific
information.

This falls into the usual topic of whether or not we should have a generic map attached to an atom. You used a reference as an alternative to the map in this case, but the basic idea is the same.

Although using a reference would be practical, it still feels like a hack to me. It’s awkward at least. Why don’t you add an accessor for the attribute you want to DefinedAtom? We’ll have a few or maybe ten more member functions in DefinedAtom, but that’s not bad – architectures that don’t need them can simply not use them. And the number of attributes we want is limited because the number of architectures we want to support in LLD is not that large.

If there are architecture/platform specific atom attributes, I’m fine with adding more accessors to DefinedAtom. We just need to review them to see if there are similar needs on multiple flavors, and design names and values that are clear.

Regarding References, the ELF flavor puts the raw ELF relocation type as the Reference Kind. Mach-o does not do that. The mach-o relocation type is only 4 bits. You need to process lots of other information (including other bits in the reloc record, the instruction content, and perhaps a “paired” relocation) to determine the “kind”. So, Mach-O Reference Kind values are abstract and internal to the mach-o ArchHandler. Given that, using a Reference Kind to track thumbness (which only ArchHandler_arm cares about) works well.

That said, the ability to handle thumb and arm within a function is probably over engineering. I’d be fine with adding to DefinedAtom something like:

  enum CodeModel {
    // Note: all these values need word smithing
    codeNA,
    codeMIPS_PIC,
    codeMIPS_micro,
    codeMIPS_16,
    codeARM_16,
    codeARM_32,
  };

  virtual CodeModel codeModel() { return codeNA; }

-Nick
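
To make the proposal concrete, here is a hedged sketch of how a MIPS atom might override such an accessor. AtomSketch and MipsAtomSketch are hypothetical stand-ins rather than lld's real classes, the accessor is made const for the sketch, and the st_other bit tests use assumed values that should be checked against the MIPS ELF headers.

```cpp
#include <cassert>
#include <cstdint>

enum CodeModel {
  codeNA,
  codeMIPS_PIC,
  codeMIPS_micro,
  codeMIPS_16,
  codeARM_16,
  codeARM_32,
};

// Simplified stand-in for DefinedAtom: the default says "not applicable".
class AtomSketch {
public:
  virtual ~AtomSketch() = default;
  virtual CodeModel codeModel() const { return codeNA; }
};

// Hypothetical MIPS atom that derives the code model from st_other.
class MipsAtomSketch : public AtomSketch {
public:
  explicit MipsAtomSketch(uint8_t stOther) : stOther_(stOther) {}

  CodeModel codeModel() const override {
    // Bit patterns below are assumptions, not taken from the thread.
    if ((stOther_ & 0xf0) == 0xf0)
      return codeMIPS_16;
    if ((stOther_ & 0xc0) == 0x80)
      return codeMIPS_micro;
    if (stOther_ & 0x20)
      return codeMIPS_PIC;
    return codeNA;
  }

private:
  uint8_t stOther_;
};
```

A single enum value cannot express a combination such as microMIPS together with PIC, so this sketch lets the instruction encoding take priority; whether that is acceptable is a design question the enum names would need to settle.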

Yup, that looks good. That would reduce the amount of code and the
complexity, I guess. We may want to add some prefix (like "machOCodeModel")
for that kind of stuff to make it easy to identify it's used for MachO.

You made a good point that two or more architectures may have a similar or
the same need and want to share the accessors. They have to be designed
carefully and named accordingly. We can coordinate that by sending a patch
to review if it touches DefinedAtom.

Looks good. Let's go this way. I will try to implement the ELF/MIPS
side, keeping the Mach-O requirements in mind, and send the patch for
review.

How would this handle getting the code model right for things like x86 boot code that starts in “real” mode then switches to protected mode, typically within the same Atom? Very analogous to your example of a function that has some portions in thumb mode and some in ARM mode.

Turns out the linker does not need to know about real mode vs protected mode. The relocations are all the same. Arm/thumb is special in that the linker needs to know how to switch BL<->BLX depending on the mode of the target. None of that is needed for intel modes (that I’ve seen). The one thing I remember having to add to ld64 to support the boot-in-16-bit-mode code for Intel is support for CALL.W, which uses a 16-bit pc-rel fixup.

-Nick
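
The BL<->BLX decision mentioned above can be sketched abstractly as follows. The enum and function names are hypothetical, and the sketch only models the choice of instruction, not the bit-level patching a linker would actually perform.

```cpp
#include <cassert>

enum class Mode { Arm, Thumb };
enum class CallInsn { BL, BLX };

// An immediate BL stays in the caller's instruction set, while an
// immediate BLX switches to the other one. So the linker needs BLX
// exactly when caller and target use different encodings.
inline CallInsn callInsnFor(Mode caller, Mode target) {
  return caller == target ? CallInsn::BL : CallInsn::BLX;
}
```

This is why the target's encoding has to be recorded somewhere on the atom or its references, whereas real mode vs protected mode on x86 needs no such bookkeeping.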

Ah, right. Makes sense. Thanks!

Sounds good.

Please review the patch: http://reviews.llvm.org/D6236