[lld] ARM/Thumb atom forming

Hi guys,

I'm working on ARM architecture support for lld.
I've run into a problem with ARM/Thumb symbols, described below.

The ARM ELF reference specifies that symbols addressing Thumb instructions
have bit 0 of the st_value field set (see 4.5.3).
The generic ELF reference says that st_value holds an offset from the
beginning of the section in relocatable files, and a virtual address in
executable files and shared objects (see Chapter 4 - Symbol Values).
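
To make the encoding concrete, here is a minimal sketch of splitting such a
st_value into a real offset and a Thumb flag. The SymbolLocation type and
decodeSymbolValue helper are hypothetical names used for illustration only;
they are not part of lld.

```cpp
#include <cstdint>

// Hypothetical helper type (not an lld class): a symbol's real location
// plus the instruction-set state that was encoded in bit 0 of st_value.
struct SymbolLocation {
  uint64_t offset; // st_value with bit 0 cleared
  bool isThumb;    // true if bit 0 of st_value was set
};

// Assumes stValue belongs to a function symbol from an ARM ELF object;
// for other symbol types bit 0 is an ordinary address bit.
constexpr SymbolLocation decodeSymbolValue(uint64_t stValue) {
  return SymbolLocation{stValue & ~uint64_t{1}, (stValue & 1) != 0};
}
```

For example, a st_value of 0x101 would decode to offset 0x100 with the Thumb
flag set, while 0x200 would decode to offset 0x200 with the flag clear.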

When atoms are created in ELFFile::createAtoms, their content size,
content data, and addresses are all derived from st_value.
Since st_value has bit 0 set for symbols addressing Thumb
instructions, the corresponding atoms' addresses are always
one byte past the real values.
Content size and, therefore, content data may also be wrong for both ARM
and Thumb symbols, depending on their order (see ELFFile::symbolContentSize):
the size is calculated as the difference between the offsets of two
adjacent symbols, so if one of them is Thumb and the other is not,
the result is one byte smaller or one byte larger than expected.
The atom's content data is then malformed as well, since it is sliced
using the miscalculated size.
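
The off-by-one effect can be reproduced with a small sketch. The function
names below are hypothetical and only mimic the kind of subtraction described
above; this is not the actual ELFFile::symbolContentSize code.

```cpp
#include <cassert>
#include <cstdint>

// Clears bit 0, which marks a Thumb function symbol in ARM ELF.
constexpr uint64_t stripThumbBit(uint64_t stValue) {
  return stValue & ~uint64_t{1};
}

// Naive size: difference of the raw st_value fields of adjacent symbols.
constexpr uint64_t rawContentSize(uint64_t cur, uint64_t next) {
  return next - cur;
}

// Corrected size: strip the Thumb bit from both values before subtracting.
inline uint64_t contentSize(uint64_t cur, uint64_t next) {
  const uint64_t begin = stripThumbBit(cur);
  const uint64_t end = stripThumbBit(next);
  assert(end >= begin && "symbols must be sorted by address");
  return end - begin;
}
```

With an ARM symbol at 0x0 followed by a Thumb symbol at 0x8 (st_value 0x9),
the raw difference is 9 bytes instead of 8. With the Thumb symbol first
(st_value 0x9) followed by an ARM symbol at 0x10, the raw difference is 7
bytes instead of 8. The corrected version yields 8 in both cases.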

This incorrect behavior results in:
- situations where the very first instruction of an atom has its first
  byte set to zero
  (if there's a gap between the previous atom and the current one, the
  first byte of the initial instruction is skipped)
- situations where the very first instruction is split between two atoms
  (the right atom, which should hold the whole instruction, and the
  previous one, which "stole" the first byte of that instruction)

Is there a way to override this behavior so that both ARM and Thumb atoms
are formed correctly, and so that I can distinguish between them in later
stages for proper relocation calculations?

Regards!

Thanks, Shankar.

I had to override every place where st_value was used, and it worked.

But then another problem appeared: after correcting all the atoms, I can no longer distinguish between ARM and Thumb symbols in the later stages, when applying relocation fixups.

I used to check targetVAddress (in terms of the relocation handler), since its least significant bit was 1 when addressing Thumb symbols. Now the least significant bit of targetVAddress is always 0, because the atoms are properly aligned and have proper contents.
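
To illustrate why that bit matters: the ARM ELF specification defines the R_ARM_ABS32 relocation as (S + A) | T, where T is 1 if the target is a function symbol addressing a Thumb instruction. A minimal sketch, assuming the Thumb state is now supplied separately from the cleaned address (relocAbs32 is a hypothetical name, not an lld function):

```cpp
#include <cstdint>

// R_ARM_ABS32 per the ARM ELF specification: (S + A) | T.
// S is the target symbol address (bit 0 already cleared), A is the addend,
// and T is 1 when the target is a Thumb function.
// targetIsThumb is an assumed extra input; once atom addresses are clean,
// it can no longer be recovered from S itself.
constexpr uint32_t relocAbs32(uint32_t S, uint32_t A, bool targetIsThumb) {
  return (S + A) | (targetIsThumb ? 1u : 0u);
}
```

For a Thumb target at 0x8000 with a zero addend this gives 0x8001; for an ARM target at 0x8000 with addend 4 it gives 0x8004.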

I tried a workaround: using dyn_cast to retrieve the information from overridden ARMELFDefinedAtoms, but DefinedAtom’s subclasses do not support dyn_cast.

In general, I would describe the issue as an inability to pass extra information between linking stages (passes).
Is there a way to do that?

The solution I see is to add some kind of custom context with an abstract interface that is passed along the different stages and cast directly to a specific implementation where needed. That’s a lot of changes though, so I’d like to hear more thoughts.

Regards,
Denis.

You could use the atom's code model to mark Thumb code as Thumb.
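
A rough sketch of that idea follows. The CodeModel enum, Atom struct, and helper functions here are simplified stand-ins invented for illustration; they are not lld's actual DefinedAtom interface.

```cpp
#include <cstdint>

// Hypothetical, simplified code model tag carried by each atom.
enum class CodeModel { NotApplicable, ARM, Thumb };

// Hypothetical stand-in for an atom: a clean address plus its code model.
struct Atom {
  uint64_t address;
  CodeModel codeModel;
};

// When the atom is created, move the Thumb marker out of bit 0 of st_value
// and into the code model, so the address itself stays correct.
inline Atom makeFunctionAtom(uint64_t stValue) {
  const bool thumb = (stValue & 1) != 0;
  return Atom{stValue & ~uint64_t{1},
              thumb ? CodeModel::Thumb : CodeModel::ARM};
}

// Later, the relocation handler can rebuild the interworking address
// from the code model instead of relying on the raw st_value.
inline uint64_t targetAddressForReloc(const Atom &target) {
  return target.address | (target.codeModel == CodeModel::Thumb ? 1u : 0u);
}
```

In this sketch the Thumb information travels with the atom, so no cast to a target-specific atom class and no extra context object is needed in the relocation stage.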

Shankar, thank you again.
Now I see how this trick is done in MIPS, so I’ll use it as a reference.

Denis.