Merge relocation sections with linker script in lld

Hello, I’ve run into a problem merging relocation sections with a linker script in lld.

I want to link C++ relocatable object files into one giant object file with partial linking (-r) and use a linker script to reorder sections such as .text.hot, .text.cold, .text.whatever, and so on. The output looked wrong because there were a huge number of independent .rela.text.xxx sections, so I changed my linker script to merge the .rela.text.xxx sections into a single .rela.text section, but that still failed.

Snippets

It seems that lld cannot merge .rela.text sections? The linker script looks like this.

SECTIONS
{
    /* Output sections : { Input sections }  // Simplified like this.  */
    .text : { *(.text.*Z*) }
    .rela.text : { *(.rela.text.*Z*) }
    /DISCARD/ : { *(.group) }
}

Here is the demo code.

// test.cpp
#include "edge.h"

Vertex g_v(11);
Edge g_e(2, 1, 10);

int main()
{
    printf("main()~\n");
    Edge e = g_e;
    return 0;
}

// edge.cpp
#include "edge.h"

int Edge::cnt = 0;
Edge::Edge(int v_, int w_, double weight_) : v(v_), w(w_), weight(weight_) {
    cnt++;
    printf("Edge cnt=%d, default ctor.\n", cnt);
}
Edge::Edge(const Edge& e) {
    cnt++;
    printf("Edge cnt=%d, copy ctor.\n", cnt);
    v = e.v; w = e.w;
    weight = e.weight;
}

// edge.h
#include <stdio.h>

class Vertex {
public:
    Vertex() = default;
    Vertex(int i) : v(i) {}
    ~Vertex() { printf("Vertex dtor.\n"); }
private:
    int v;
};

class Edge {
public:
    Edge(int v_, int w_, double weight_);
    Edge(const Edge& e);
    ~Edge() {
        printf("Edge cnt=%d, dtor.\n", cnt);
        cnt--;
    }
private:
    Vertex v, w;
    double weight;
    static int cnt;
};

Here are my compile and link commands. I have tried clang15.04 and clang17.0 on an x86-64 machine.

clang++ test.cpp -c -O2
clang++ edge.cpp -c  -O2

ld.lld edge.o test.o -T myscript.ld  -r -o lld-T.o
ld.lld edge.o test.o -r -o lld.o

Here is the readelf output. Merging worked for the .text sections, but not for the .rela.text sections.

# lld.o 
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0 16
  [ 2] .rela.text        RELA            0000000000000000 000320 0001b0 18   I 21   1  8
...
  [10] .text._ZN6VertexD2Ev PROGBITS        0000000000000000 000260 00000c 00 AXG  0   0 16
  [11] .rela.text._ZN6VertexD2Ev RELA            0000000000000000 0005e0 000030 18  IG 21  10  8
  [12] .group            GROUP           0000000000000000 000610 00000c 04     21  31  4
  [13] .text._ZN4EdgeD2Ev PROGBITS        0000000000000000 000270 000032 00 AXG  0   0 16
  [14] .rela.text._ZN4EdgeD2Ev RELA            0000000000000000 000620 000090 18  IG 21  13  8
..

# lld-T.o
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000128 00 AXG  0   0 16
  [ 2] .rela.text        RELA            0000000000000000 000330 0001b0 18   I 17   1  8
...
  [ 9] .rela.text._ZN6VertexD2Ev RELA            0000000000000000 0005e8 000030 18  IG 17   1  8
  [10] .rela.text._ZN4EdgeD2Ev RELA            0000000000000000 000618 000090 18  IG 17   1  8

My Investigation

I found the code below in ELF/LinkerScript.cpp. It seems that this is where the merging of relocation sections gets skipped. The comment says such entries have to be ignored for --emit-relocs, but I am not using that option. Is this a bug?

      // For --emit-relocs we have to ignore entries like
      //   .rela.dyn : { *(.rela.data) }
      // which are common because they are in the default bfd script.
      // We do not ignore SHT_REL[A] linker-synthesized sections here because
      // want to support scripts that do custom layout for them.
      if (isa<InputSection>(sec) &&
          cast<InputSection>(sec)->getRelocatedSection())
        continue;
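
As a toy model of what that check appears to do (a sketch only, not lld’s actual code; `ToySection`, `hasPrefix`, and `claimSections` are hypothetical names, and plain prefix matching stands in for the script’s glob patterns), relocation input sections are skipped before any pattern gets a chance to claim them:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a linker input section.
struct ToySection {
    std::string name;
    bool isRelocationSection;  // true for SHT_REL / SHT_RELA input sections
};

// Prefix matching stands in for linker script glob patterns.
bool hasPrefix(const std::string& name, const std::string& prefix) {
    return name.compare(0, prefix.size(), prefix) == 0;
}

// Returns the names of the input sections a pattern would claim.
// Relocation sections are skipped first, mirroring the `continue`
// in the quoted snippet, so no pattern can ever select them.
std::vector<std::string> claimSections(const std::vector<ToySection>& inputs,
                                       const std::string& prefix) {
    assert(!prefix.empty() && "an empty prefix would match every section");
    std::vector<std::string> claimed;
    for (const ToySection& sec : inputs) {
        if (sec.isRelocationSection)
            continue;
        if (hasPrefix(sec.name, prefix))
            claimed.push_back(sec.name);
    }
    return claimed;
}
```

In this model a `.rela.text.` pattern claims nothing, which would leave each relocation section to follow the section it relocates instead of the script.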

You are correct that ld.lld does not merge relocation sections as specified in the linker script for relocatable object output.

I’ve brought up this exact issue in the past. To my knowledge it remains unresolved.

I still feel this should be supported. Maybe we can come to a solution this time around?

cc @MaskRay @smithp35


Apologies that I can’t give you much of a detailed answer; I’m a bit short of time right now.

ld -r is a relocatable link, which produces a relocatable object. As I understand it there are two use cases that are, in my view, contradictory. The first is something like a kernel module, which uses its own program loader to resolve the relocations, using the relocatable object as a pseudo shared object. As I understand it there are limitations on what can be put in a kernel module so that the loader program for the kernel module doesn’t need the complexity of a static linker. I note that C++ is not supported for the Linux kernel, which cuts out a lot of the complexity in relocatable objects, such as COMDAT groups.

The second use for the output of a relocatable link is as an input to a future static link step. In this case we need to make sure that information required for the future static link step isn’t lost. For example, consider COMDAT groups: the static linker needs to preserve these in a relocatable link so that, if they are fed back into a subsequent link step and combined with other relocatable objects, the COMDAT groups can be matched up and eliminated. Similar examples are linker-constructed tables like .eh_frame and .ARM.exidx (C++ exceptions), which are per program and need to be constructed by the final non-relocatable link.
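
To make the COMDAT point concrete, here is a minimal sketch of the matching step a later link performs, assuming the usual rule that the first group seen with a given signature is kept and later duplicates are dropped. `ToyGroup`, `keepFirstGroupPerSignature`, and the file names are hypothetical; the signature strings are taken from the readelf output above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a COMDAT group in an input object.
struct ToyGroup {
    std::string signature;  // group signature symbol, e.g. "_ZN4EdgeD2Ev"
    std::string fromFile;   // object file the group came from
};

// Keeps the first group seen for each signature and drops later duplicates.
// This only works if the groups and their signatures survive the earlier
// relocatable link.
std::vector<ToyGroup> keepFirstGroupPerSignature(const std::vector<ToyGroup>& groups) {
    std::set<std::string> seen;
    std::vector<ToyGroup> kept;
    for (const ToyGroup& g : groups) {
        assert(!g.signature.empty() && "every COMDAT group has a signature symbol");
        if (seen.insert(g.signature).second)
            kept.push_back(g);
    }
    return kept;
}
```

If a relocatable link discards the .group sections, the later link has no signatures left to compare, so duplicate definitions of the same inline function can no longer be matched up this way.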

To get back to relocations: there are ELF restrictions on how these are represented that could easily get broken if this were under linker script control. For example, COMDAT groups need to maintain the relocation sections associated with the non-relocation sections within the COMDAT group. I’m not aware of any maintainable way of representing that in a linker script. There’s also the fact that each relocation section needs an sh_info field pointing to the non-relocation section it applies to (and an sh_link field pointing to its symbol table); there isn’t a way to make sure that this mapping is maintained by an arbitrary linker script.
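
In the ELF specification, a relocation section records the section it applies to in sh_info and its symbol table in sh_link; these are the Inf and Lk columns in the readelf output above. A rough sketch of the kind of consistency check a relocatable output has to pass follows. `ToyShdr` and `relocationHeadersAreConsistent` are hypothetical names, and the check is deliberately simplified to the relocatable-object case:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Section type values from the ELF specification.
constexpr std::uint32_t kShtSymtab = 2;
constexpr std::uint32_t kShtRela = 4;
constexpr std::uint32_t kShtRel = 9;

// Hypothetical, trimmed-down section header: only the fields checked here.
struct ToyShdr {
    std::uint32_t type;  // sh_type
    std::uint32_t link;  // sh_link: for relocation sections, the symbol table index
    std::uint32_t info;  // sh_info: for relocation sections, the relocated section index
};

// Returns true if every relocation section links to a symbol table and
// targets an existing section that is not itself a relocation section.
bool relocationHeadersAreConsistent(const std::vector<ToyShdr>& sections) {
    assert(!sections.empty() && "index 0 is the reserved null section");
    for (const ToyShdr& s : sections) {
        if (s.type != kShtRela && s.type != kShtRel)
            continue;
        if (s.link >= sections.size() || s.info >= sections.size())
            return false;
        if (sections[s.link].type != kShtSymtab)
            return false;
        const std::uint32_t targetType = sections[s.info].type;
        if (targetType == kShtRela || targetType == kShtRel)
            return false;
    }
    return true;
}
```

A script that moved or merged relocation sections independently of the sections they relocate would have to keep both indices valid for every output relocation section, which is the mapping described above.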

In short, it is quite easy to break the output in subtle and some not so subtle ways if relocation sections can be arbitrarily coalesced. I can see why lld does not support it well.

I’m not sure what the original use case for the relocatable link with C++ was. Reordering sections was mentioned. In that case I guess it is some kind of kernel module like thing using C++? My intuition is that C++ is not really suitable for kernel modules as there is significant extra ELF complexity. My suggestion would be to use shared objects for this purpose.

Yes, your intuition is right!
The module behind the demo case is like a kernel module (linked to resemble .ko files) and uses C++. We have our own kernel, modeled on the Linux kernel, for embedded systems.
Thanks for your advice on using LLD.