[lld] R_MIPS_HI16 / R_MIPS_LO16 calculation

Hi,

I am working on supporting R_MIPS_HI16 / R_MIPS_LO16 in the new LLD and
have a couple of questions.

== Q1
In the MIPS O32 ABI, we have to find the matching R_MIPS_LO16
relocation in order to calculate an R_MIPS_HI16 one, because R_MIPS_HI16
uses the combined addend (AHI << 16) + (short)ALO, where AHI is the
original R_MIPS_HI16 addend and ALO is the addend of the matching
R_MIPS_LO16 relocation [1]. There are two ways to do the matching and
the R_MIPS_HI16 calculation.

Method A:
1. Postpone R_MIPS_HI16 relocation calculation and record its arguments.
2. When R_MIPS_LO16 is found, iterate over recorded R_MIPS_HI16
relocations, calculate combined addend and apply relocations.
3. At the end check orphaned (without R_MIPS_LO16 pair) R_MIPS_HI16
relocations, show warnings and apply them with zero addend.

Method B:
1. Each time we find an R_MIPS_HI16 relocation, iterate over the
remaining relocations to find the matching R_MIPS_LO16.
2. Calculate the combined addend and apply the relocation, or show a
warning if no R_MIPS_LO16 is found.

Method A requires some sort of container to keep the postponed HI16
relocations. If we add the container to the `MipsTargetInfo` class, we
can hide this unusual scheme inside MIPS-specific code and do not need
to perform LO16 lookups. But `MipsTargetInfo` then becomes stateful.
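
For illustration, Method A could look roughly like the sketch below. It
is not LLD code: `Hi16Tracker`, `PendingHi16`, and the matching by symbol
index are hypothetical, and instruction words are assumed to be already
in host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of a postponed R_MIPS_HI16 relocation.
struct PendingHi16 {
  uint32_t *Insn;    // the lui instruction word (host byte order assumed)
  uint32_t SymIndex; // symbol the relocation refers to
  uint32_t S;        // resolved value of that symbol
};

class Hi16Tracker {
  std::vector<PendingHi16> Pending;

  // Sign-extend the low 16 bits of V to 32 bits.
  static uint32_t signExtend16(uint32_t V) {
    return static_cast<uint32_t>(
        static_cast<int32_t>(static_cast<int16_t>(V & 0xffff)));
  }

public:
  // Step 1: postpone the HI16 calculation and remember its arguments.
  void recordHi16(uint32_t *Insn, uint32_t SymIndex, uint32_t S) {
    Pending.push_back({Insn, SymIndex, S});
  }

  // Step 2: a LO16 completes every recorded HI16 against the same symbol.
  void applyLo16(uint32_t *LoInsn, uint32_t SymIndex, uint32_t S) {
    uint32_t ALO = signExtend16(*LoInsn);
    std::vector<PendingHi16> Rest;
    for (const PendingHi16 &H : Pending) {
      if (H.SymIndex != SymIndex) {
        Rest.push_back(H); // still waiting for its own LO16
        continue;
      }
      uint32_t AHL = ((*H.Insn & 0xffff) << 16) + ALO;
      uint32_t V = H.S + AHL;
      uint32_t Hi = (V - signExtend16(V)) >> 16;
      *H.Insn = (*H.Insn & 0xffff0000) | (Hi & 0xffff);
    }
    Pending.swap(Rest);
    *LoInsn = (*LoInsn & 0xffff0000) | ((S + ALO) & 0xffff);
  }

  // Step 3: anything left has no LO16 pair and would need a warning.
  std::size_t orphanCount() const { return Pending.size(); }
};
```

The `Pending` vector is exactly the state that makes the target info
stateful in this method.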

Method B keeps `MipsTargetInfo` stateless but requires a forward LO16
lookup for each HI16 relocation, and `MipsTargetInfo` needs an
interface for that lookup.
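
A minimal sketch of the forward lookup in Method B, assuming a
simplified relocation record. `Rel` and `findMatchingLo16` are made-up
names, and matching on the symbol index is my assumption; only the two
relocation type values come from the MIPS ELF definitions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t R_MIPS_HI16 = 5;
constexpr uint32_t R_MIPS_LO16 = 6;

// Hypothetical, simplified relocation record.
struct Rel {
  uint32_t Type;
  uint32_t SymIndex;
};

// Returns the index of the first R_MIPS_LO16 after position I that refers
// to the same symbol as Rels[I], or Rels.size() if there is none.
std::size_t findMatchingLo16(const std::vector<Rel> &Rels, std::size_t I) {
  for (std::size_t J = I + 1; J < Rels.size(); ++J)
    if (Rels[J].Type == R_MIPS_LO16 && Rels[J].SymIndex == Rels[I].SymIndex)
      return J;
  return Rels.size();
}
```

A result equal to `Rels.size()` corresponds to the "show warning" case
in step 2.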

Of course, we could instead implement either method somewhere in the
`InputSectionBase` class under `if (MIPS)` statements.

Any opinions about the best method / approach?

== Q2

R_MIPS_HI16 and R_MIPS_LO16 relocations perform a special calculation
if the target symbol's name is `_gp_disp` [2]. AFAIK, the target
`relocateOne` method currently has no way to get the target symbol
name. Is it okay to pass the target symbol index and give
`MipsTargetInfo` access to the symbol table of the input file being
processed?

[1] ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf page 4-18
[2] ftp://www.linux-mips.org/pub/linux/mips/doc/ABI/mipsabi.pdf page 4-19

== Q1

If I understand that spec correctly, an R_MIPS_HI16 should immediately be
followed by an R_MIPS_LO16. Can't you use that property? It doesn't seem to
me that you really have to search and pair up HI16 and LO16 relocations.

== Q2

One way is to add a SymbolBody* field to the Out<ELFT> struct that
points to the _gp_disp symbol, and then do a pointer comparison to check
whether a relocation target is _gp_disp or not.
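
Roughly, the idea would be something like the sketch below. The types
here are stand-ins I made up to keep the example self-contained
(`OutState` plays the role of Out<ELFT>, and `SymbolBody` is reduced to
a name); the real classes look different.

```cpp
// Stand-in for the linker's symbol class.
struct SymbolBody {
  const char *Name;
};

// Stand-in for the proposed extra field in Out<ELFT>.
struct OutState {
  SymbolBody *GpDisp = nullptr; // set once when _gp_disp is resolved
};

// No string comparison and no symbol table access is needed at
// relocation time, only a pointer comparison.
bool isGpDisp(const OutState &Out, const SymbolBody *Target) {
  return Out.GpDisp != nullptr && Target == Out.GpDisp;
}
```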

It is an open question what the ABI authors meant by the phrase
"R_MIPS_HI16 must have an associated R_MIPS_LO16 entry immediately
following it". In fact, a compiler can emit code like this:

lui $t0,%hi(sym1+4) # R_MIPS_HI16
lui $t0,%hi(sym1+8) # R_MIPS_HI16
lui $t0,%hi(sym1+12) # R_MIPS_HI16
addi $t0,$t0,%lo(sym1+16) # R_MIPS_LO16

and even code like this:

lui $t0,%hi(sym1) # R_MIPS_HI16 on sym1
lui $t0,%hi(sym2) # R_MIPS_HI16 on sym2
addi $t0,$t0,%lo(sym1) # R_MIPS_LO16 on sym1
addi $t0,$t0,%lo(sym2) # R_MIPS_LO16 on sym2

Fortunately, I have never seen code like this:

lui $t0,%hi(sym1+12) # R_MIPS_HI16
... relocations of other types (not HI16 / LO16) here
addi $t0,$t0,%lo(sym1+16) # R_MIPS_LO16

> It is an open question what the ABI authors meant by the phrase
> "R_MIPS_HI16 must have an associated R_MIPS_LO16 entry immediately
> following it". In fact, a compiler can emit code like this:
>
> lui $t0,%hi(sym1+4) # R_MIPS_HI16
> lui $t0,%hi(sym1+8) # R_MIPS_HI16
> lui $t0,%hi(sym1+12) # R_MIPS_HI16
> addi $t0,$t0,%lo(sym1+16) # R_MIPS_LO16

The first two relocations don't conform to the standard because there are
no corresponding LO16 relocations, no?

> and even code like this:
>
> lui $t0,%hi(sym1) # R_MIPS_HI16 on sym1
> lui $t0,%hi(sym2) # R_MIPS_HI16 on sym2
> addi $t0,$t0,%lo(sym1) # R_MIPS_LO16 on sym1
> addi $t0,$t0,%lo(sym2) # R_MIPS_LO16 on sym2

Hmm, isn't this a violation of the ABI? As I read it, "[e]ach
relocation type of R_MIPS_HI16 must have an associated R_MIPS_LO16 entry
immediately following it in the list of relocations" is not ambiguous
enough to allow them. Is there any chance to fix the compiler? (I guess
there isn't, though.)

Strictly speaking, yes, it is a violation. But it is not a bug in a
single compiler. You can find such code everywhere, from various
versions of libc compiled by different versions of gcc to code
produced by Clang.

Moreover, I scanned through the libc code and found some places where
R_MIPS_HI16 / R_MIPS_LO16 pairs are interleaved with other types of
relocations.

I’m not sure if I understand the semantics of HI16 and LO16 relocations.
If my understanding is correct, a pair of HI16 and LO16 represents an
addend AHL. AHL is computed by (AHI<<16) | (ALO&0xFFFF). Can’t we apply
HI16 and LO16 relocations separately and produce the same relocation
result? Do we have to pair them up before applying relocations?

The correct formula for the combined addend is (AHI << 16) +
(short)ALO. So the combined addend depends on both the AHI and ALO
addends, and therefore ALO affects the result of the R_MIPS_HI16
relocation.
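
To make the difference concrete, here is a small sketch comparing the
two formulas. The function names are mine and only for illustration;
AHI and ALO are the raw 16-bit immediates read from the two
instructions.

```cpp
#include <cstdint>

// Combined addend per the O32 ABI: AHL = (AHI << 16) + (short)ALO.
// ALO is sign-extended before the addition.
int32_t combinedAddend(uint16_t AHI, uint16_t ALO) {
  uint32_t Hi = static_cast<uint32_t>(AHI) << 16;
  uint32_t Lo =
      static_cast<uint32_t>(static_cast<int32_t>(static_cast<int16_t>(ALO)));
  return static_cast<int32_t>(Hi + Lo);
}

// The bitwise-OR variant from the question, shown only for comparison.
int32_t orAddend(uint16_t AHI, uint16_t ALO) {
  return static_cast<int32_t>((static_cast<uint32_t>(AHI) << 16) | ALO);
}
```

With AHI = 0x0001 and ALO = 0x8000, `combinedAddend` gives 0x8000
while `orAddend` gives 0x18000. The two only agree while ALO is below
0x8000, that is, while the sign-extended ALO is non-negative.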

The current version of the GNU BFD linker looks up the R_MIPS_LO16
relocation each time it needs to calculate an R_MIPS_HI16 relocation.
It uses the `mips_elf_add_lo16_rel_addend` function for that
(sourceware.org Git - binutils-gdb.git/summary).

Does that mean it is impossible to process HI16 and LO16 separately?

If you apply only HI16 relocation, the relocation target will have a value
AHI<<16. Next time when you apply LO16 relocation to the same address,
you'll be able to see the previous result of HI16 relocation. So you don't
have to combine them before applying relocations, no?

HI16 and LO16 are applied to different places in the code. Take a look
at the typical example below. So you have to apply the relocations
separately but calculate them together.

00000000 <main>:
   0: 3c1c0000 lui gp,0x0
                        0: R_MIPS_HI16 _gp_disp
   4: 279c0000 addiu gp,gp,0
                        4: R_MIPS_LO16 _gp_disp

HI16 / LO16 relocations use a combined addend AHL. R_MIPS_LO16 uses
the low 16 bits of the (S + AHL) expression, so the HI16 addend does
not affect its result. But the LO16 addend might affect the result of
the HI16 relocation.
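
As a sketch of the calculation for a regular (non-_gp_disp) symbol,
using the formulas from the ABI document [1]. `HiLoValues` and
`calcHiLo` are names I made up for this example.

```cpp
#include <cstdint>

// Values written into the 16-bit immediate fields:
//   HI16: ((AHL + S) - (short)(AHL + S)) >> 16
//   LO16: (AHL + S) & 0xffff
struct HiLoValues {
  uint16_t Hi;
  uint16_t Lo;
};

HiLoValues calcHiLo(uint32_t S, int32_t AHL) {
  uint32_t V = S + static_cast<uint32_t>(AHL);
  // Subtracting the sign-extended low half compensates for the sign
  // extension the add-immediate instruction applies at run time.
  uint32_t SignExtLo = static_cast<uint32_t>(
      static_cast<int32_t>(static_cast<int16_t>(V & 0xffff)));
  return {static_cast<uint16_t>((V - SignExtLo) >> 16),
          static_cast<uint16_t>(V & 0xffff)};
}
```

For S = 0x00400000, an AHL of 0 gives Hi = 0x0040, while an AHL of
0x8000 (which can come entirely from the LO16 addend) gives
Hi = 0x0041 and Lo = 0x8000.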

One way to handle that would be to use a stack to save previous locations
of HI16 relocations. When you see a LO16 relocation, pop an address from
the stack, and fix the address at that location.

That should work, but I can see a problem: that makes Target stateful.
That's probably something we should avoid, since we want to apply
relocations in parallel in the future.

So maybe you want to search for a corresponding LO16 relocation when you
see a HI16 relocation at least for now? I have no good idea about how to
handle them.