Do I need to modify the AddrLoc of LLD for ARC target?

Hello Leslie,

I think we are going to need to know a bit more about the ELF ABI for
what looks like the ArcCompact before we can help you.

LLD's calculation of P (the place to be relocated) is as it is in the
generic ELF specification. The Rel.Offset corresponds to the ELF
r_offset field. This is covered by: "For a relocatable file, the value
is the byte offset from the beginning of the section to the storage
unit affected by the relocation."

In LLD we calculate the virtual address (VA) of P; as I understand it,
this is equivalent to the vma used in BFD. Assuming that the
relocation applies to a regular InputSection from the basic-arc.o
object, the LLD calculation
  P = getOutputSection()->Addr + getOffset(Rel.Offset);
translates to:
  (VA of OutputSection) + (Offset of InputSection within OutputSection) +
  (Offset within InputSection given by r_offset)
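
As a rough illustration of the sum described above, here is a small standalone C++ sketch. The helper name placeVA and the numeric values are assumptions made for this example; they are not LLD code.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLD API): the generic ELF place P of a
// relocation that applies to a regular input section.
constexpr uint64_t placeVA(uint64_t OutSecAddr,  // VA of the OutputSection
                           uint64_t InSecOff,    // InputSection offset within it
                           uint64_t ROffset) {   // r_offset within InputSection
  return OutSecAddr + InSecOff + ROffset;
}

// Illustrative values: OutputSection at VA 6, InputSection at offset 0,
// relocation 4 bytes into the InputSection.
static_assert(placeVA(6, 0, 4) == 10, "P = 6 + 0 + 4");
```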

The BFD linker seems to be doing the equivalent calculation with an
extra modification of the (Offset within InputSection given by
r_offset) and is rounding down the result to the nearest 4-byte
boundary. This looks unfamiliar to me, and could well be specific to
ArcCompact. I think that you will need to refer to the ELF ABI
documentation as this should tell you if there are any processor
specific modifications to generic ELF that you have to follow.

The other thing that you should do is try and work out why the VA
(vma) is 6 in LD and 8 in LLD and whether this is actually a problem.
The VA of the OutputSection is not guaranteed to be the same between
different linkers so it may have just been that differences in order
of InputSections or alignment has caused a different VA. I would check
the output of the linker map file to see where it placed the Output
and Input Sections to see what the answer should be.

In summary:
It looks like there are some Arc-specific things that might need to be
done. Unfortunately I don't have any experience with Arc, and I'm not
sure the other people that work on LLD do either. I suggest looking at
the public ABI documentation and basing any arguments for changes on
it. It is worth assuming that we know nothing about Arc, don't have
the documentation to hand and don't know where to find it!

Hope that is of some help. With a bit more context I might be able to
help a bit more, but unfortunately I can't spend a lot of time
learning about Arc.

Peter

Hi Peter,

Thanks for your kind response!

The ARC ABI manual is here:
https://github.com/foss-for-synopsys-dwc-arc-processors/arc-ABI-manual

However, I prefer to read the BFD linker's ARC source code instead:
1. ARC-specific e_flags: https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/blob/arc-2017.09/include/elf/arc.h
2. Relocation definitions: https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/blob/arc-2017.09/include/elf/arc-reloc.def
3. Relocation replacement functions: https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/blob/arc-2017.09/include/opcode/arc-func.h
4. Calculation of S, A, P, PDATA, GOT, etc.: https://github.com/foss-for-synopsys-dwc-arc-processors/binutils-gdb/blob/arc-2017.09/bfd/elf32-arc.c#L1156

I implemented the modified P for ARC:

static void modifyARCAddrLoc(uint64_t &AddrLoc, const uint16_t EMachine,
                             RelExpr Expr, uint32_t Type, uint64_t VMA,
                             uint64_t OutSecOff, uint64_t RelOff) {
  // Only PC-relative expressions on ARC targets are adjusted. The inner
  // tests must use &&: (X != A || X != B) is true for every X, which
  // would make the function return unconditionally.
  if ((EMachine != EM_ARC_COMPACT && EMachine != EM_ARC_COMPACT2) ||
      (Expr != R_PC && Expr != R_GOT_PC))
    return;

  uint64_t M = 0;
  if (Type == R_ARC_32_PCREL || Type == R_ARC_PC32 || Type == R_ARC_GOTPC32 ||
      Type == R_ARC_GOTPC)
    M = 4; // bitsize >= 32 ? 4 : 0
  // Round the adjusted place down to a 4-byte boundary.
  AddrLoc = (VMA + OutSecOff + RelOff - M) & ~0x3;
}

modifyARCAddrLoc(AddrLoc, Config->EMachine, Expr, Type,
                 getOutputSection()->Addr, // VMA is important!
                 cast<InputSection>(this)->OutSecOff, Rel.Offset);

LLD computes getOutputSection()->Addr here: https://github.com/llvm-mirror/lld/blob/master/ELF/LinkerScript.cpp#L530

The VMA in LD and LLD is the same for the AVR target (https://reviews.llvm.org/D37615):

$ avr-gcc -mmcu=atmega328p -o basic-avr.o -c basic-avr.s
$ avr-ld -o basic-avr basic-avr.o -Ttext=6  # with a different high address

DEBUG: avr-ld: R_AVR_CALL: VMA: 6 Output Offset: 0 Reloc Offset: 0
DEBUG: avr-ld: R_AVR_16_PM: VMA: 6 Output Offset: 0 Reloc Offset: 4
DEBUG: avr-ld: R_AVR_8: VMA: 6 Output Offset: 0 Reloc Offset: 6
DEBUG: avr-ld: R_AVR_8_LO8: VMA: 6 Output Offset: 0 Reloc Offset: 7
DEBUG: avr-ld: R_AVR_8_HI8: VMA: 6 Output Offset: 0 Reloc Offset: 8
DEBUG: avr-ld: R_AVR_8_HLO8: VMA: 6 Output Offset: 0 Reloc Offset: 9
DEBUG: avr-ld: R_AVR_LO8_LDI: VMA: 6 Output Offset: 0 Reloc Offset: 10
DEBUG: avr-ld: R_AVR_HI8_LDI: VMA: 6 Output Offset: 0 Reloc Offset: 12
DEBUG: avr-ld: R_AVR_HH8_LDI: VMA: 6 Output Offset: 0 Reloc Offset: 14
DEBUG: avr-ld: R_AVR_MS8_LDI: VMA: 6 Output Offset: 0 Reloc Offset: 16
DEBUG: avr-ld: R_AVR_LDI: VMA: 6 Output Offset: 0 Reloc Offset: 18
DEBUG: avr-ld: R_AVR_LO8_LDI_NEG: VMA: 6 Output Offset: 0 Reloc Offset: 20
DEBUG: avr-ld: R_AVR_HI8_LDI_NEG: VMA: 6 Output Offset: 0 Reloc Offset: 22
DEBUG: avr-ld: R_AVR_HH8_LDI_NEG: VMA: 6 Output Offset: 0 Reloc Offset: 24
DEBUG: avr-ld: R_AVR_MS8_LDI_NEG: VMA: 6 Output Offset: 0 Reloc Offset: 26
DEBUG: avr-ld: R_AVR_LO8_LDI_PM: VMA: 6 Output Offset: 0 Reloc Offset: 28
DEBUG: avr-ld: R_AVR_HI8_LDI_PM: VMA: 6 Output Offset: 0 Reloc Offset: 30
DEBUG: avr-ld: R_AVR_HH8_LDI_PM: VMA: 6 Output Offset: 0 Reloc Offset: 32
DEBUG: avr-ld: R_AVR_LO8_LDI_PM_NEG: VMA: 6 Output Offset: 0 Reloc Offset: 34
DEBUG: avr-ld: R_AVR_HI8_LDI_PM_NEG: VMA: 6 Output Offset: 0 Reloc Offset: 36
DEBUG: avr-ld: R_AVR_HH8_LDI_PM_NEG: VMA: 6 Output Offset: 0 Reloc Offset: 38
DEBUG: avr-ld: R_AVR_7_PCREL: VMA: 6 Output Offset: 0 Reloc Offset: 40
DEBUG: avr-ld: R_AVR_13_PCREL: VMA: 6 Output Offset: 0 Reloc Offset: 42
DEBUG: avr-ld: R_AVR_6: VMA: 6 Output Offset: 0 Reloc Offset: 52
DEBUG: avr-ld: R_AVR_16: VMA: 6 Output Offset: 0 Reloc Offset: 56
DEBUG: avr-ld: R_AVR_LO8_LDI_GS: VMA: 6 Output Offset: 0 Reloc Offset: 58
DEBUG: avr-ld: R_AVR_HI8_LDI_GS: VMA: 6 Output Offset: 0 Reloc Offset: 60
DEBUG: avr-ld: R_AVR_PORT6: VMA: 6 Output Offset: 0 Reloc Offset: 62
DEBUG: avr-ld: R_AVR_PORT5: VMA: 6 Output Offset: 0 Reloc Offset: 64
DEBUG: avr-ld: R_AVR_6_ADIW: VMA: 6 Output Offset: 0 Reloc Offset: 66

$ llvm/build/bin/ld.lld -o basic-avr-lld basic-avr.o -Ttext=6  # with a different high address

DEBUG: lld: R_AVR_CALL TargetVA: 50 A: 44 P: 6 VMA: 6 Output Offset: 0 Reloc Offset: 0
DEBUG: lld: R_AVR_16_PM TargetVA: 50 A: 44 P: 10 VMA: 6 Output Offset: 0 Reloc Offset: 4
DEBUG: lld: R_AVR_8 TargetVA: 50 A: 44 P: 12 VMA: 6 Output Offset: 0 Reloc Offset: 6
DEBUG: lld: R_AVR_8_LO8 TargetVA: 50 A: 44 P: 13 VMA: 6 Output Offset: 0 Reloc Offset: 7
DEBUG: lld: R_AVR_8_HI8 TargetVA: 50 A: 44 P: 14 VMA: 6 Output Offset: 0 Reloc Offset: 8
DEBUG: lld: R_AVR_8_HLO8 TargetVA: 50 A: 44 P: 15 VMA: 6 Output Offset: 0 Reloc Offset: 9
DEBUG: lld: R_AVR_LO8_LDI TargetVA: 50 A: 44 P: 16 VMA: 6 Output Offset: 0 Reloc Offset: 10
DEBUG: lld: R_AVR_HI8_LDI TargetVA: 50 A: 44 P: 18 VMA: 6 Output Offset: 0 Reloc Offset: 12
DEBUG: lld: R_AVR_HH8_LDI TargetVA: 50 A: 44 P: 20 VMA: 6 Output Offset: 0 Reloc Offset: 14
DEBUG: lld: R_AVR_MS8_LDI TargetVA: 50 A: 44 P: 22 VMA: 6 Output Offset: 0 Reloc Offset: 16
DEBUG: lld: R_AVR_LDI TargetVA: 50 A: 44 P: 24 VMA: 6 Output Offset: 0 Reloc Offset: 18
DEBUG: lld: R_AVR_LO8_LDI_NEG TargetVA: 50 A: 44 P: 26 VMA: 6 Output Offset: 0 Reloc Offset: 20
DEBUG: lld: R_AVR_HI8_LDI_NEG TargetVA: 50 A: 44 P: 28 VMA: 6 Output Offset: 0 Reloc Offset: 22
DEBUG: lld: R_AVR_HH8_LDI_NEG TargetVA: 50 A: 44 P: 30 VMA: 6 Output Offset: 0 Reloc Offset: 24
DEBUG: lld: R_AVR_MS8_LDI_NEG TargetVA: 50 A: 44 P: 32 VMA: 6 Output Offset: 0 Reloc Offset: 26
DEBUG: lld: R_AVR_LO8_LDI_PM TargetVA: 50 A: 44 P: 34 VMA: 6 Output Offset: 0 Reloc Offset: 28
DEBUG: lld: R_AVR_HI8_LDI_PM TargetVA: 50 A: 44 P: 36 VMA: 6 Output Offset: 0 Reloc Offset: 30
DEBUG: lld: R_AVR_HH8_LDI_PM TargetVA: 50 A: 44 P: 38 VMA: 6 Output Offset: 0 Reloc Offset: 32
DEBUG: lld: R_AVR_LO8_LDI_PM_NEG TargetVA: 50 A: 44 P: 40 VMA: 6 Output Offset: 0 Reloc Offset: 34
DEBUG: lld: R_AVR_HI8_LDI_PM_NEG TargetVA: 50 A: 44 P: 42 VMA: 6 Output Offset: 0 Reloc Offset: 36
DEBUG: lld: R_AVR_HH8_LDI_PM_NEG TargetVA: 50 A: 44 P: 44 VMA: 6 Output Offset: 0 Reloc Offset: 38
DEBUG: lld: R_AVR_7_PCREL TargetVA: 4 A: 44 P: 46 VMA: 6 Output Offset: 0 Reloc Offset: 40
DEBUG: lld: R_AVR_13_PCREL TargetVA: 2 A: 44 P: 48 VMA: 6 Output Offset: 0 Reloc Offset: 42
DEBUG: lld: R_AVR_6 TargetVA: 50 A: 44 P: 58 VMA: 6 Output Offset: 0 Reloc Offset: 52
DEBUG: lld: R_AVR_16 TargetVA: 50 A: 44 P: 62 VMA: 6 Output Offset: 0 Reloc Offset: 56
DEBUG: lld: R_AVR_LO8_LDI_GS TargetVA: 50 A: 44 P: 64 VMA: 6 Output Offset: 0 Reloc Offset: 58
DEBUG: lld: R_AVR_HI8_LDI_GS TargetVA: 50 A: 44 P: 66 VMA: 6 Output Offset: 0 Reloc Offset: 60
DEBUG: lld: R_AVR_PORT6 TargetVA: 51 A: 45 P: 68 VMA: 6 Output Offset: 0 Reloc Offset: 62
DEBUG: lld: R_AVR_PORT5 TargetVA: 6 A: 0 P: 70 VMA: 6 Output Offset: 0 Reloc Offset: 64
DEBUG: lld: R_AVR_6_ADIW TargetVA: 50 A: 44 P: 72 VMA: 6 Output Offset: 0 Reloc Offset: 66

Interesting :)

Hi Peter,

Thanks for your hint! It is an alignment issue :)

avr-ld and LLD both use the *same* alignment of 1 for input_section->output_section->vma, so the VMA is the same:

DEBUG: avr-ld: R_AVR_CALL: VMA: 5 Output Offset: 1 Reloc Offset: 0

DEBUG: avr-ld: R_AVR_8_LO8: VMA: 6 Output Offset: 0 Reloc Offset: 7

...

DEBUG: lld: R_AVR_CALL TargetVA: 49 A: 44 P: 5 Align: 1 VMA: 5 Output Offset: 0 Reloc Offset: 0

DEBUG: lld: R_AVR_8_LO8 TargetVA: 50 A: 44 P: 13 Align: 1 VMA: 6 Output Offset: 0 Reloc Offset: 7

...

But arc-ld uses alignment 1 while LLD uses a *different* alignment of 4:

alignTo https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Support/MathExtras.h#L657

alignTo(0x5, 1) = 5

alignTo(0x5, 4) = 8

alignTo(0x9, 1) = 9

alignTo(0x9, 4) = 12

alignTo(0x11, 1) = 17

alignTo(0x11, 4) = 20

...
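
The values listed above are what a round-up-to-multiple helper produces. A minimal sketch, assuming a non-zero alignment; alignUp is a hypothetical stand-in written for this example, not the LLVM implementation of alignTo:

```cpp
#include <cstdint>

// Hypothetical stand-in for a round-up helper: returns the smallest
// multiple of Align that is >= Value. Align must be non-zero.
constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// With alignment 1 the address is unchanged; with alignment 4 it moves
// up to the next multiple of 4.
static_assert(alignUp(0x5, 1) == 5, "alignment 1 leaves 0x5 alone");
static_assert(alignUp(0x5, 4) == 8, "0x5 rounds up to 8");
static_assert(alignUp(0x9, 4) == 12, "0x9 rounds up to 12");
static_assert(alignUp(0x11, 4) == 20, "0x11 (17) rounds up to 20");
```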

I can *not* use updateAlignment to lower the alignment to a smaller value such as 1: that is a monkey patch, not a fix for the root cause. So how do I correctly modify the alignment in LLD? Please give me a hint, thanks a lot!

PS: arc-ld's sym_section->output_section->vma is equal to LLD's SymbolBody::getVA, so the value of (S + A) is the same.

Hello Leslie,

If I understand you correctly, ld is giving an Output or Input
Section alignment 1, lld is giving it alignment 4, and you are trying
to reduce the alignment in lld?

What I would do is try to work out why the Output or Input Section
has a higher alignment in lld. It is not a good idea to try to lower
the alignment of an OutputSection, as it must be high enough that
every InputSection is aligned according to its requirements. This
means that at a minimum it must be set to the maximum alignment
requested by any InputSection.

The first thing I would do is look at the linker map (-M, or
--Map=map.txt to put it in a file) and look at the Output and
InputSections. If there is at least one InputSection that has
alignment 4 then I would expect the OutputSection to have alignment 4.
If all the InputSections in an OutputSection have alignment 1, I would
expect the OutputSection to have alignment 1 unless there is some
other bit of code enforcing a minimum alignment of 4.

In general, overaligning an OutputSection is safe, although the number
of padding bytes added to align the sections can become significant in
a very small embedded system.

Hope this is of some use.

Peter

Just a thought I had about the calculation of P. I think that
following the ld approach too closely may be a mistake.

I'm speculating that the reason for this change in the value of P is
similar to the situation in Arm for a Thumb BLX immediate instruction
(Branch Link and Exchange with the immediate an offset from the PC).
When calculating the target address the immediate is added to
Align(PC, 4) where Align rounds down to nearest 4-byte boundary. The
linker needs to account for this when resolving the relocation
R_ARM_THM_CALL.
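
To make the Thumb BLX case concrete, here is a simplified C++ model. It assumes that the PC reads as the instruction address plus 4 in Thumb state and it ignores the instruction encoding entirely; the function names are invented for this sketch and are not lld code.

```cpp
#include <cstdint>

// Round down to a 4-byte boundary, i.e. Align(PC, 4) in the text above.
constexpr uint64_t alignDown4(uint64_t V) { return V & ~uint64_t(3); }

// Base address the processor adds the immediate to, for a BLX located at P.
constexpr uint64_t blxBase(uint64_t P) { return alignDown4(P + 4); }

// Immediate a linker would have to encode so that the branch reaches S.
constexpr int64_t blxImm(uint64_t S, uint64_t P) {
  return int64_t(S) - int64_t(blxBase(P));
}

// Address the processor branches to for a BLX at P with immediate Imm.
constexpr uint64_t blxTarget(uint64_t P, int64_t Imm) {
  return blxBase(P) + uint64_t(Imm);
}

// A BLX at a 2-byte-aligned (but not 4-byte-aligned) address reaches the
// target only if the immediate accounts for the rounding.
static_assert(blxTarget(0x1002, blxImm(0x2000, 0x1002)) == 0x2000,
              "adjusted immediate reaches the target");
static_assert(blxTarget(0x1002, int64_t(0x2000) - int64_t(0x1002 + 4)) == 0x1ffe,
              "naive S - (P + 4) lands 2 bytes short");
```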

To handle the alignment difference for this one special case in lld,
I accounted for it in relocateOne. You may be able to use a similar
method for Arc rather than writing modifyARCAddrLoc. Again, I know
nothing about Arc, so you'll need to look at the Architecture
reference manual to understand how the instruction that the
relocation applies to works.

Peter

Hi Peter,

Map file about LD for ARC target https://drive.google.com/open?id=0ByE8c-y74l_uRWpQdUh2c0VXZ1k

LLD for ARC https://drive.google.com/open?id=0ByE8c-y74l_ueGVuYkR0a3RSWjQ

arm-thumb-undefined-weak.s https://github.com/llvm-mirror/lld/blob/master/test/ELF/arm-thumb-undefined-weak.s

$ llvm/build/bin/llvm-mc -filetype=obj -triple=thumbv7a-none-linux-gnueabi arm-thumb-undefined-weak.s -o arm-thumb-undefined-weak.o
$ llvm/build/bin/ld.lld -o arm-thumb-undefined-weak-lld arm-thumb-undefined-weak.o -Ttext=11006
$ arm-linux-gnu-ld -o arm-thumb-undefined-weak-ld arm-thumb-undefined-weak.o -Ttext=11006

$ arm-linux-gnu-readelf -r arm-thumb-undefined-weak.o

Relocation section '.rel.text' at offset 0x8c contains 6 entries:
 Offset    Info     Type               Sym.Value  Sym. Name
00000000  00000333  R_ARM_THM_JUMP19   00000000   target
00000004  0000031e  R_ARM_THM_JUMP24   00000000   target
00000008  0000030a  R_ARM_THM_CALL     00000000   target
0000000c  0000030a  R_ARM_THM_CALL     00000000   target
00000010  00000332  R_ARM_THM_MOVT_PR  00000000   target
00000014  00000331  R_ARM_THM_MOVW_PR  00000000   target

DEBUG: lld: R_ARM_THM_JUMP19 TargetVA: 0 A: -4 P: 69640 Align: 4 VMA: 69640 Output Offset: 0 Reloc Offset: 0
DEBUG: lld: R_ARM_THM_JUMP24 TargetVA: 0 A: -4 P: 69644 Align: 4 VMA: 69640 Output Offset: 0 Reloc Offset: 4
DEBUG: lld: R_ARM_THM_CALL TargetVA: 1 A: -4 P: 69648 Align: 4 VMA: 69640 Output Offset: 0 Reloc Offset: 8
DEBUG: lld: R_ARM_THM_CALL TargetVA: 1 A: -4 P: 69652 Align: 4 VMA: 69640 Output Offset: 0 Reloc Offset: 12
DEBUG: lld: R_ARM_THM_MOVT_PREL TargetVA: 0 A: 0 P: 69656 Align: 4 VMA: 69640 Output Offset: 0 Reloc Offset: 16
DEBUG: lld: R_ARM_THM_MOVW_PREL_NC TargetVA: 0 A: 0 P: 69660 Align: 4 VMA: 69640 Output Offset: 0 Reloc Offset: 20

DEBUG: arm-linux-gnu-ld: R_ARM_THM_JUMP19: VMA: 69638 Output Offset: 2 Reloc Offset: 0
DEBUG: arm-linux-gnu-ld: R_ARM_THM_JUMP24: VMA: 69638 Output Offset: 2 Reloc Offset: 4
DEBUG: arm-linux-gnu-ld: R_ARM_THM_CALL: VMA: 69638 Output Offset: 2 Reloc Offset: 8
DEBUG: arm-linux-gnu-ld: R_ARM_THM_CALL: VMA: 69638 Output Offset: 2 Reloc Offset: 12
DEBUG: arm-linux-gnu-ld: R_ARM_THM_MOVT_PREL: VMA: 69638 Output Offset: 2 Reloc Offset: 16
DEBUG: arm-linux-gnu-ld: R_ARM_THM_MOVW_PREL_NC: VMA: 69638 Output Offset: 2 Reloc Offset: 20

$ llvm/build/bin/llvm-objdump -triple=thumbv7a-none-linux-gnueabi -d arm-thumb-undefined-weak-lld

arm-thumb-undefined-weak-lld: file format ELF32-arm-little

Disassembly of section .text:
_start:
11008: 00 f0 00 80 beq.w #0 <_start+0x4>
1100c: 00 f0 00 b8 b.w #0 <_start+0x8>
11010: 00 f0 00 f8 bl #0
11014: 00 f0 00 f8 bl #0
11018: c0 f2 00 00 movt r0, #0
1101c: 40 f2 00 00 movw r0, #0

$ llvm/build/bin/llvm-objdump -triple=thumbv7a-none-linux-gnueabi -d arm-thumb-undefined-weak-ld

arm-thumb-undefined-weak-ld: file format ELF32-arm-little

Disassembly of section .text:
.text:
11006: 00 00 movs r0, r0

_start:
11008: 2e f4 fa af beq.w #-69644
1100c: 00 e0 b #0 <_start+0x8>
1100e: 00 bf nop
11010: 00 e0 b #0 <_start+0xC>
11012: 00 bf nop
11014: 00 e0 b #0 <_start+0x10>
11016: 00 bf nop
11018: cf f6 fe 70 movt r0, #65534
1101c: 4e f6 e4 70 movw r0, #61412

Hello Leslie,

I don't quite know what to say, as I don't know precisely what your
question is. If I am not being precise enough, please can you ask
some explicit questions? From what I can see in the output, here are
some comments.

From your arc mapfiles it looks like both linkers have given the
.text output section the correct base address given the alignment
restrictions. The alignment requirement of .text from
lib_a-memset-bs.o is 4, therefore the alignment requirement of the
OutputSection .text should be 4:
LLD:
Address  Size     Alignment
00000000 00000080 4 .text
00000000 00000004 1   basic-arc.o:(.text)
00000000 00000000 0     main
00000004 0000007c 4   ... (lib_a-memset-bs.o):(.text)

LD:
.text           0x0000000000000000       0x80
 *(.text .stub .text.* .gnu.linkonce.t.*)
 .text          0x0000000000000000        0x4 basic-arc.o
 .text          0x0000000000000004       0x7c ... libc.a(lib_a-memset-bs.o)
                0x0000000000000004                memset
                0x0000000000000060                __strncpy_bzero
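
The map excerpts above are consistent with a simple layout rule: the OutputSection takes the largest alignment of its InputSections, and each InputSection is placed at the next offset that satisfies its own alignment. A minimal sketch of that rule follows; the struct and function names are invented for this example and are not lld data structures.

```cpp
#include <cstdint>

struct InSec {
  uint64_t Size;
  uint64_t Align; // must be non-zero in this sketch
};

constexpr uint64_t maxU(uint64_t A, uint64_t B) { return A > B ? A : B; }

constexpr uint64_t alignUp(uint64_t V, uint64_t A) {
  return (V + A - 1) / A * A;
}

// OutputSection alignment: the maximum alignment of its InputSections.
constexpr uint64_t outputAlign(const InSec *S, unsigned N) {
  return N == 0 ? 1 : maxU(S[0].Align, outputAlign(S + 1, N - 1));
}

// OutputSection size: place each InputSection at the next suitably
// aligned offset and return the offset just past the last one.
constexpr uint64_t outputSize(const InSec *S, unsigned N, uint64_t Off = 0) {
  return N == 0 ? Off
                : outputSize(S + 1, N - 1, alignUp(Off, S[0].Align) + S[0].Size);
}

// Sizes and alignments taken from the LLD map above:
// basic-arc.o:(.text) and lib_a-memset-bs.o:(.text).
constexpr InSec TextSecs[] = {{0x4, 1}, {0x7c, 4}};
static_assert(outputAlign(TextSecs, 2) == 4, "max(1, 4)");
static_assert(outputSize(TextSecs, 2) == 0x80, "0x4 + 0x7c, no padding needed");
```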

I'm not entirely sure where the Arm example has come from, but it does
show an interesting difference. It looks like the linkers handle the
-Ttext=<address> option slightly differently when the <address> of
the OutputSection is not 0 modulo the OutputSection alignment.

From the map file we can see that lld aligns the OutputSection up to
the next 4-byte boundary, whereas GNU ld places the OutputSection at
the requested address but adds padding before the .text InputSection
to make sure that it is aligned in the final executable.

LLD
Address  Size     Align Out     In      Symbol
00011008 00000018     4 .text
00011008 00000018     4         arm-thumb-undefined-weak.o:(.text)
00011008 00000000     0                 $t.0
00011008 00000000     0                 _start

LD
.text           0x0000000000011006       0x1a
 ...
 *fill*         0x0000000000011006        0x2
 .text          0x0000000000011008       0x18 arm-thumb-undefined-weak.o

The *fill* is visible in the disassembly of the LD-produced image as the two zero bytes at 11006, which are decoded as movs r0, r0.

Strictly speaking, I think LD is producing a file that doesn't
conform to ELF here, as the sh_addr of the .text OutputSection
(0x11006) is not 0 modulo sh_addralign (4). In practice it probably
wouldn't make much difference. My preference is for LLD's behaviour here.
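
The ELF rule referred to here is that sh_addr must be congruent to 0 modulo sh_addralign, with the values 0 and 1 meaning no constraint. A small sketch of that check, applied to the two .text addresses from the map files above (the function name is an assumption made for this example):

```cpp
#include <cstdint>

// Hypothetical checker for the ELF section address constraint:
// sh_addr % sh_addralign == 0, where 0 and 1 mean "no constraint".
constexpr bool addrConforms(uint64_t ShAddr, uint64_t ShAddrAlign) {
  return ShAddrAlign <= 1 || ShAddr % ShAddrAlign == 0;
}

static_assert(!addrConforms(0x11006, 4), "LD: .text at 0x11006, align 4");
static_assert(addrConforms(0x11008, 4), "LLD: .text at 0x11008, align 4");
```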

Peter

Hi Peter,

Thanks for your kind response!

Reloc type=R_ARC_S25W_PCREL, should_relocate = true
offset = 0x0, addend = 0x0
Symbol:
value = 0x00000000
Symbol Section:
section name = .text, output_offset 0x00000006, output_section->vma = 0x00000006
file: lib_a-memset-bs.o
Input_section:
section name = .text, output_offset 0x00000000, output_section->vma = 0x00000006
changed_address = 0x00000006
file: basic-arc.o
RELOC_TYPE = ARC_S25W_PCREL
FORMULA = ( ME ( ( ( ( S + A ) - P ) >> 2 ) ) )
S = 0xc
A = 0
L = c
symbol_section->vma = 0xc
symbol_section->vma = 0x6
PCL = 0x4
P = 0x4
G = 0
SDA_OFFSET = 0x2188
SDA_SET = 1
GOT_OFFSET = 0
relocation = 0x000002
before = 0x000802
data = 00000002 (2) (2)
after = 0x0000080a
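
The arithmetic in the dump above can be checked by hand. The sketch below evaluates only the ((S + A) - P) >> 2 part of FORMULA; the ME step (as I understand it, a middle-endian conversion of the encoded value) is deliberately left out, and the function name is invented for this example.

```cpp
#include <cstdint>

// ((S + A) - P) >> 2, without the ME conversion. Only non-negative
// displacements are exercised here; right-shifting a negative value is
// implementation-defined before C++20.
constexpr int64_t s25wPcrelValue(uint64_t S, int64_t A, uint64_t P) {
  return (int64_t(S) + A - int64_t(P)) >> 2;
}

// Values from the dump: S = 0xc, A = 0, P = 0x4, relocation = 0x000002.
static_assert(s25wPcrelValue(0xc, 0, 0x4) == 2, "(0xc - 0x4) >> 2");
```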

Then I need to investigate how LD calculates reloc_data.input_section->output_section->vma; it might differ from LLD's even with the same alignment: https://github.com/llvm-mirror/lld/blob/master/ELF/LinkerScript.cpp#L485

The Arm difference might be an issue with the arm-linux-gnu toolchain:

$ arm-linux-gnu-gcc -o arm-thumb-undefined-weak-ld.o -c arm-thumb-undefined-weak.s

arm-thumb-undefined-weak.s: Assembler messages:
arm-thumb-undefined-weak.s:18: Error: width suffixes are invalid in ARM mode -- `beq.w target'
arm-thumb-undefined-weak.s:20: Error: width suffixes are invalid in ARM mode -- `b.w target'

So arm-linux-gnu-ld might have wrongly relocated R_ARM_THM_CALL for the arm-thumb-undefined-weak.o generated by llvm-mc.

Hi Peter,

Map file about LD for ARC target
https://drive.google.com/open?id=0ByE8c-y74l_uRWpQdUh2c0VXZ1k

LLD for ARC https://drive.google.com/open?id=0ByE8c-y74l_ueGVuYkR0a3RSWjQ

arm-thumb-undefined-weak.s
https://github.com/llvm-mirror/lld/blob/master/test/ELF/arm-thumb-undefined-weak.s

My question: why is LD's relocation different from LLD's? Thanks for your explanation :)

Hello Leslie,

The errors coming from the gnu assembler are due to the file being
assembled in Arm state, to get rid of the errors you'll either need to
put a .thumb directive in the file, or pass -mthumb to the assembler
via arm-linux-gnu-gcc -Wa,-mthumb (I think).

I'm not able to explain what you are seeing in your print out as it
doesn't quite match the map file. Looking at your source diff I think
I may have found a bug:
     uint64_t AddrLoc = getOutputSection()->Addr + Offset;
     RelExpr Expr = Rel.Expr;
+    if ((Expr == R_PC || Expr == R_GOT_PC) &&
+        (Config->EMachine == EM_ARC_COMPACT ||
+         Config->EMachine == EM_ARC_COMPACT2)) {
+      uint64_t M = 0;
+      if (Type == R_ARC_32_PCREL || Type == R_ARC_PC32 ||
+          Type == R_ARC_GOTPC32 || Type == R_ARC_GOTPC)
+        M = 4; // bitsize >= 32 ? 4 : 0
+      AddrLoc = (getOutputSection()->Addr /* output_section->vma */ +
+                 cast<InputSection>(this)->OutSecOff /* output_offset */ +
+                 Offset /* reloc_offset */ - M) & ~0x3;
+    }
     uint64_t TargetVA = SignExtend64(
         getRelocTargetVA(Type, Rel.Addend, AddrLoc, *Rel.Sym, Expr), Bits);

Looking at your calculation for AddrLoc, it doesn't seem to match the
original. In trunk lld (your diff is against an old version, but I
think the line hasn't changed semantically) Offset is defined as
  uint64_t Offset = getOffset(Rel.r_offset);
which for a regular InputSection expands to
  uint64_t Offset = this->OutSecOff + Rel.r_offset;

Original:
AddrLoc = getOutputSection()->Addr + this->OutSecOff + Rel.r_offset;

Yours:
AddrLoc = (getOutputSection()->Addr /* output_section->vma */ +
           cast<InputSection>(this)->OutSecOff /* output_offset */ +
           Offset /* reloc_offset */ - M) & ~0x3;
uses Offset and not Rel.r_offset, so expanding Offset gives me:
AddrLoc = (getOutputSection()->Addr + this->OutSecOff +
           (this->OutSecOff + Rel.r_offset) - M) & ~0x3;

This looks like you are adding this->OutSecOff twice.
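
A small standalone sketch of the double count follows. The function names and the non-zero OutSecOff value are assumptions made for illustration; this is not the lld source. It also shows why the problem can stay hidden: when the InputSection sits at offset 0 in its OutputSection, both variants produce the same address.

```cpp
#include <cstdint>

// Variant from the diff: Offset already contains OutSecOff, so OutSecOff
// ends up in the sum twice.
constexpr uint64_t addrLocDoubleCounted(uint64_t Addr, uint64_t OutSecOff,
                                        uint64_t ROffset, uint64_t M) {
  return (Addr + OutSecOff + (OutSecOff + ROffset) - M) & ~uint64_t(3);
}

// Variant that adds OutSecOff once, matching the original expression.
constexpr uint64_t addrLocOnce(uint64_t Addr, uint64_t OutSecOff,
                               uint64_t ROffset, uint64_t M) {
  return (Addr + OutSecOff + ROffset - M) & ~uint64_t(3);
}

// InputSection at offset 0: the extra term is 0, so the bug is invisible.
static_assert(addrLocDoubleCounted(8, 0, 49, 0) == addrLocOnce(8, 0, 49, 0),
              "identical when OutSecOff == 0");

// Hypothetical InputSection at offset 0x10: the results differ by 0x10.
static_assert(addrLocOnce(8, 0x10, 49, 0) == 72, "(8 + 16 + 49) & ~3");
static_assert(addrLocDoubleCounted(8, 0x10, 49, 0) == 88, "(8 + 16 + 16 + 49) & ~3");
```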

No idea whether this is the cause of the problem or whether you have
fixed this up in the meantime. I recommend that you take a closer look
at your changes to the generic parts of lld first to see if you have
inadvertently changed something.

Peter

Hi Peter,

Thanks for your kind response!

I found it :)

llvm/build/bin/ld.lld: warning: cannot find entry symbol _start; defaulting to 0x8

LLD prints this warning when using High Address 0x5, so I just use the *same* High Address aligned to 4, for example 0x8 = alignTo(0x5, 4), and then the VMA is the *same*:

Reloc type=R_ARC_S25W_PCREL, should_relocate = true
offset = 0x31, addend = 0x0
Symbol:
value = 0x00000000
Symbol Section:
section name = .text, output_offset 0x00000038, output_section->vma = 0x00000008
file: lib_a-memset-archs.o
Input_section:
section name = .text, output_offset 0x00000000, output_section->vma = 0x00000008
changed_address = 0x00000039
file: basic-arc.o
RELOC_TYPE = ARC_S25W_PCREL
FORMULA = ( ME ( ( ( ( S + A ) - P ) >> 2 ) ) )
S = 0x40
A = 0
L = 40
symbol_section->vma = 0x40
input_section->output_section->vma = 0x8
PCL = 0x38
P = 0x38
G = 0
SDA_OFFSET = 0x2220
SDA_SET = 1
GOT_OFFSET = 0
relocation = 0x000002
before = 0x000802
data = 00000002 (2) (2)
after = 0x0000080a

DEBUG: lld: R_ARC_S25W_PCREL TargetVA: 8 A: 0 P: 56 Align: 4 VMA: 8 Output Offset: 0 Reloc Offset: 49

Thanks for your hint! I fixed it: https://github.com/xiangzhai/lld/blob/arc/ELF/InputSection.cpp#L767