Merge relocation sections with linker script in lld

Hello, I’ve run into a problem merging relocation sections with a linker script in lld.

I want to link C++ relocatable object files into one large object file using partial linking (-r), and use a linker script to reorder some sections such as .text.hot, .text.cold, .text.whatever, and so on. The output seemed wrong because it contained a huge number of independent .rela.text.xxx sections, so I changed my linker script to merge the .rela.text.xxx sections into a single .rela.text section, but that still failed.

Snippets

It seems that lld cannot merge .rela.text sections? The linker script looks like this.

SECTIONS
{
    /* Output sections : { Input sections }  // Simplified like this.  */
    .text : { *(.text.*Z*) }
    .rela.text : { *(.rela.text.*Z*) }
    /DISCARD/ : { *(.group) }
}

Here is the demo code.

// test.cpp
#include "edge.h"

Vertex g_v(11);
Edge g_e(2, 1, 10);

int main()
{
    printf("main()~\n");
    Edge e = g_e;
    return 0;
}

// edge.cpp
#include "edge.h"

int Edge::cnt = 0;
Edge::Edge(int v_, int w_, double weight_) : v(v_), w(w_), weight(weight_) {
    cnt++;
    printf("Edge cnt=%d, default ctor.\n", cnt);
}
Edge::Edge(const Edge& e) {
    cnt++;
    printf("Edge cnt=%d, copy ctor.\n", cnt);
    v = e.v; w = e.w;
    weight = e.weight;
}

// edge.h
#include <stdio.h>

class Vertex {
public:
    Vertex() = default;
    Vertex(int i) : v(i) {}
    ~Vertex() { printf("Vertex dtor.\n"); }
private:
    int v;
};

class Edge {
public:
    Edge(int v_, int w_, double weight_);
    Edge(const Edge& e);
    ~Edge() {
        printf("Edge cnt=%d, dtor.\n", cnt);
        cnt--;
    }
private:
    Vertex v, w;
    double weight;
    static int cnt;
};

Here are my compile and link commands. I have tried this with clang 15.04 and clang 17.0 on an x86-64 machine.

clang++ test.cpp -c -O2
clang++ edge.cpp -c  -O2

ld.lld edge.o test.o -T myscript.ld  -r -o lld-T.o
ld.lld edge.o test.o -r -o lld.o

Here is the readelf output. Merging worked for the .text sections, but not for the .rela.text sections.

# lld.o 
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0 16
  [ 2] .rela.text        RELA            0000000000000000 000320 0001b0 18   I 21   1  8
...
  [10] .text._ZN6VertexD2Ev PROGBITS        0000000000000000 000260 00000c 00 AXG  0   0 16
  [11] .rela.text._ZN6VertexD2Ev RELA            0000000000000000 0005e0 000030 18  IG 21  10  8
  [12] .group            GROUP           0000000000000000 000610 00000c 04     21  31  4
  [13] .text._ZN4EdgeD2Ev PROGBITS        0000000000000000 000270 000032 00 AXG  0   0 16
  [14] .rela.text._ZN4EdgeD2Ev RELA            0000000000000000 000620 000090 18  IG 21  13  8
..

# lld-T.o
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000128 00 AXG  0   0 16
  [ 2] .rela.text        RELA            0000000000000000 000330 0001b0 18   I 17   1  8
...
  [ 9] .rela.text._ZN6VertexD2Ev RELA            0000000000000000 0005e8 000030 18  IG 17   1  8
  [10] .rela.text._ZN4EdgeD2Ev RELA            0000000000000000 000618 000090 18  IG 17   1  8

My Investigation

I found the code below in ELF/LinkerScript.cpp; it appears to be what skips the merging of relocation sections. The comment says such entries should be ignored for --emit-relocs, but I am not using that option. Is this a bug?

      // For --emit-relocs we have to ignore entries like
      //   .rela.dyn : { *(.rela.data) }
      // which are common because they are in the default bfd script.
      // We do not ignore SHT_REL[A] linker-synthesized sections here because
      // want to support scripts that do custom layout for them.
      if (isa<InputSection>(sec) &&
          cast<InputSection>(sec)->getRelocatedSection())
        continue;

You are correct that ld.lld does not merge relocation sections as specified in the linker script for relocatable object output.

I’ve brought up this exact issue in the past. To my knowledge it remains unresolved.

I still feel this should be supported. Maybe we can come to a solution this time around?

cc @MaskRay @smithp35


Apologies, I can’t give you much of a detailed answer; I’m a bit short of time right now.

ld -r performs a relocatable link, which produces a relocatable object. As I understand it, there are two use cases that are, in my view, contradictory. The first is something like a kernel module, which uses its own program loader to resolve the relocations, treating the relocatable object as a pseudo shared object. As I understand it, there are limitations on what can be put in a kernel module so that the module loader doesn’t need the complexity of a static linker. I note that the Linux kernel does not support C++, which cuts out a lot of the complexity in relocatable objects, such as COMDAT groups.

The second use for the output of a relocatable link is as an input to a future static link step. In this case we need to make sure that information required by the future static link step isn’t lost. For example, consider COMDAT groups: the static linker needs to preserve these in a relocatable link so that, if they are fed back into a subsequent link step and combined with other relocatable objects, the COMDAT groups can be matched up and the duplicates eliminated. Similar examples are linker-constructed tables like .eh_frame and .ARM.exidx (C++ exceptions), which are per program and need to be constructed by the final non-relocatable link.

To get back to relocations: there are ELF restrictions on how these are represented that could easily be broken if this were under linker script control. For example, COMDAT groups need to keep the relocation sections associated with the non-relocation sections within the same COMDAT group. I’m not aware of any maintainable way of representing that in a linker script. There’s also the requirement that a relocation section’s sh_info field refers to the non-relocation section it applies to (sh_link refers to the symbol table); there isn’t a way to make sure that an arbitrary linker script maintains this mapping.

In short, it is quite easy to break the output in subtle and some not so subtle ways if relocation sections can be arbitrarily coalesced. I can see why lld does not support it well.
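To make the constraint concrete: in the readelf listings earlier in the thread, the Lk column is sh_link (the symbol table) and the Inf column is sh_info (the section being relocated). The following is a minimal sketch, not lld code, of the check a linker would need before coalescing relocation sections. The struct and the `canCoalesce` helper are hypothetical names; only the two section type values come from the ELF specification.

```cpp
#include <cstdint>
#include <vector>

// Section type values from the ELF specification.
constexpr std::uint32_t kShtRela = 4; // SHT_RELA
constexpr std::uint32_t kShtRel = 9;  // SHT_REL

// Hypothetical, reduced view of a relocation section header.
struct RelocSectionHeader {
  std::uint32_t type; // SHT_REL or SHT_RELA
  std::uint32_t link; // sh_link: index of the associated symbol table
  std::uint32_t info; // sh_info: index of the section the relocations apply to
};

// Relocation sections can only share one output section if they agree on
// type, symbol table, and relocated section.
bool canCoalesce(const std::vector<RelocSectionHeader> &secs) {
  if (secs.empty())
    return false;
  for (const RelocSectionHeader &s : secs) {
    if (s.type != kShtRel && s.type != kShtRela)
      return false;
    if (s.type != secs[0].type || s.link != secs[0].link ||
        s.info != secs[0].info)
      return false;
  }
  return true;
}
```

With the values from lld-T.o above (all three .rela.text* sections have Lk 17 and Inf 1) this check passes, while for lld.o (Inf 1 versus Inf 10) it does not. Passing the check is necessary but not sufficient: the relocation offsets would also have to be rebased onto the merged section, and group membership would still need to be handled.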

I’m not sure what the original use case for the relocatable link with C++ was. Reordering sections was mentioned. In that case I guess it is some kind of kernel module like thing using C++? My intuition is that C++ is not really suitable for kernel modules as there is significant extra ELF complexity. My suggestion would be to use shared objects for this purpose.

Yes, your intuition is right!
The module behind the demo case is like a kernel module (linked to resemble .ko files) and uses C++. We have our own kernel, modeled on the Linux kernel, for an embedded system.
Thanks for your advice on using LLD.


Thanks for your reply, @smithp35. Apologies for the delay in responding on my part.

In some cases, for example FreeBSD specifically on arm64, kernel modules are shared objects. So this is possible, but there are complications to doing the same thing for x86_64. It was tried a few years ago:

Unlike other CPU architectures supported by FreeBSD, amd64 kernel modules are linked as relocatable object files, i.e., .o files. (On other architectures, they are dynamically shared objects (DSOs, or .so files), as one might naively expect.) The use of .o files means that amd64 kernel modules contain more efficient code than they would if linked as DSOs, since DSOs inherently make use of certain types of indirection which allow shared libraries to be loaded at arbitrary addresses, and this indirection is useless in the kernel. As part of this project an attempt was made to switch amd64 to using DSOs as well, since the cost of this indirection can largely be mitigated with modern toolchains, but it was found that the use of DSOs would also force a change to the code model used when compiling amd64 kernel code, resulting in a further performance penalty.

I brought up some of these same concerns in a separate thread.

I’d have to dig up some old testcases so we can have a more fruitful discussion, but IIRC:

  • Linking kernel modules as shared objects from input object files built with -fPIC results in GOT and PLT indirection that negatively impacts performance.
  • Linking kernel modules as shared objects without -fPIC results in linking errors such as relocation R_X86_64_32S cannot be used against local symbol even though such relocations are actually safe in kernel space (to my understanding).
  • The use of --Bsymbolic does not resolve any of these issues, as was suspected.

I’ll try to come up with an actual reproducer we can discuss, but it would be fantastic if we could devise a way to make shared object, x86_64 kmods possible without negative performance impact.

I dropped the ball on this thread. I was reminded of it after seeing @davidchisnall and @MaskRay discuss C++ kernel modules here.

Which reminded me of @davidchisnall’s remarks on C++ kernel modules here:

On x86-64 (possibly elsewhere), FreeBSD kernel modules are not shared libraries, they’re basically .o files. The kernel’s loader is actually an in-memory static linker.

Why did Linux choose Rust but not C++? | Lobsters

As I said above, it would be great to devise a solution for shared object kmods. But that’s going to require heavier work on, I suspect, both ld.lld and FreeBSD kernel loader fronts.

Is it possible to resolve some of the inconveniences of C++ and relocatable kernel modules mentioned in this thread? Can ld.lld provide a mechanism, via linker script or otherwise, to combine relocation sections as their corresponding .text (etc.) sections are combined? (Ditto .group sections.)

As I’ve stated in one of the long-ago threads, there is precedent in ld.bfd for a “final relocatable object”, that is, an object in relocatable format that cannot be further linked.

Thoughts, @davidchisnall @MaskRay @smithp35?

In the CHERIoT branch of LLD, we have a -compartment flag that we use for linking compartments. This is mostly similar to -r, but merges COMDATs and is used with a linker script that makes everything except compartment entry points private symbols. This change is quite small and pulling it out as a --merge-comdats flag would be easy (happy to do that if it’s useful beyond our use case).

For FreeBSD kernel modules, I don’t actually want to merge COMDATs in each module, I want to have a tool that consumes a load of kernel modules and:

  • For any COMDATs that are used in only one module, turns them into local symbols and merges them into the relevant sections.
  • For COMDATs that are used in multiple modules, pulls them out into separate modules and adds them as dependencies of the others, unless one of the modules is already an explicit dependency of the others (in which case, it just erases the definition in the depending module).
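The classification step behind those two bullets can be sketched as follows. This is an illustration under simplifying assumptions, not an existing tool: `ModuleComdats`, `ComdatPlan`, and `planComdats` are hypothetical names, each module is reduced to the set of COMDAT group signatures it contains, and the explicit-dependency exception from the second bullet is left out.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical input: module name -> COMDAT group signatures it contains.
using ModuleComdats = std::map<std::string, std::set<std::string>>;

struct ComdatPlan {
  // Signature -> the only module using it; candidates for localization.
  std::map<std::string, std::string> localize;
  // Signatures used by several modules; candidates for a separate module.
  std::set<std::string> hoist;
};

ComdatPlan planComdats(const ModuleComdats &modules) {
  // Invert the mapping: signature -> set of modules that contain it.
  std::map<std::string, std::set<std::string>> users;
  for (const auto &entry : modules)
    for (const std::string &sig : entry.second)
      users[sig].insert(entry.first);

  ComdatPlan plan;
  for (const auto &entry : users) {
    if (entry.second.size() == 1)
      plan.localize[entry.first] = *entry.second.begin();
    else
      plan.hoist.insert(entry.first);
  }
  return plan;
}
```

For two modules where only one contains `_ZN6VertexD2Ev` and both contain `_ZN4EdgeD2Ev`, the first signature ends up in `localize` and the second in `hoist`.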

This should let the kernel’s loader ignore them entirely; they’re just normal sections by the time we get to loading. Unfortunately, the structure of lld’s code is not very amenable to anything that does not look like a traditional linking operation. It would be nice if LLD could follow the philosophy of the rest of LLVM.


@davidchisnall Would the proposed --merge-comdats flag provide the resolution of .group sections and the combining of .rela* sections from these two examples?

Group sections

The .group sections would be resolved and eliminated in a shared object.

$ cat ldscript.amd64
SECTIONS
{
    .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) }
    .text1 : { *(.text .stub .text.* .gnu.linkonce.t.*) }
}
$ cat sections.sh
#!/bin/bash
for i in {0..65280}
do
    echo -e "inline int inline$i() { return $i; }" >> a.cc
    if [[ "$i" -gt "0" ]]; then
        echo -e "int use$i() { return inline$((i-1))() + inline$i(); }" >> a.cc
    fi
done
clang -c a.cc
ld.lld -r -T ldscript.amd64 a.o -o lld.ro
$ ./sections.sh
$ llvm-readelf -h lld.ro
[...]
  Number of section headers: 0 (65293)
  Section header string table index: 65535 (65291)
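The `0 (65293)` and `65535 (65291)` values are the ELF escape for objects with too many sections: once a count or index reaches 0xff00, the 16-bit header field cannot hold it, so the real value is stored in section header 0. The sketch below shows how a reader would decode the two fields. The helper names and parameters are hypothetical; the escape rules themselves come from the ELF specification.

```cpp
#include <cstdint>

// SHN_XINDEX from the ELF specification.
constexpr std::uint16_t kShnXindex = 0xffff;

// eShnum is e_shnum from the ELF header; section0Size is sh_size of
// section header 0. When e_shnum is 0, the real count lives in sh_size.
std::uint64_t sectionCount(std::uint16_t eShnum, std::uint64_t section0Size) {
  return eShnum == 0 ? section0Size : eShnum;
}

// eShstrndx is e_shstrndx from the ELF header; section0Link is sh_link of
// section header 0. SHN_XINDEX means the real index lives in sh_link.
std::uint32_t stringTableIndex(std::uint16_t eShstrndx,
                               std::uint32_t section0Link) {
  return eShstrndx == kShnXindex ? section0Link : eShstrndx;
}
```

Applied to the output above, `sectionCount(0, 65293)` yields 65293 and `stringTableIndex(0xffff, 65291)` yields 65291, which is exactly what llvm-readelf prints in parentheses.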

Relocation sections

Using the same linker script as above. Note the .rela.text.* sections are not combined.

$ cat b.h
#pragma once
struct A {
    A();
    virtual ~A();
    virtual int foo();
};
struct B : public A {
    int foo() override;
};
$ cat b.cc
#include "b.h"
int A::foo() { return 42; }
int B::foo() { return 84; }
$ cat rela.sh
#!/bin/bash
clang -c b.cc
ld.lld -r -T ldscript.amd64 b.o -o rela.ro
$ ./rela.sh
$ llvm-readelf -S rela.ro | egrep "RELA|PROGBITS"
  [ 1] .text1            PROGBITS        0000000000000000 000040 000068 00 AXG  0   0 16
  [ 3] .rela.text._ZN1BD2Ev RELA         0000000000000000 000198 000018 18  IG 14   1  8
  [ 5] .rela.text._ZN1BD0Ev RELA         0000000000000000 0001c0 000030 18  IG 14   1  8
  [ 6] .data.rel.ro      PROGBITS        0000000000000000 0000a8 000040 00  WA  0   0  8
  [ 7] .rela.data.rel.ro RELA            0000000000000000 0001f0 0000a8 18   I 14   6  8
  [ 8] .rodata           PROGBITS        0000000000000000 0000e8 000003 00   A  0   0  1
  [ 9] .comment          PROGBITS        0000000000000000 000298 000016 01  MS  0   0  1
  [11] .rela.eh_frame    RELA            0000000000000000 0002b0 000060 18   I 14  10  8
  [13] .note.GNU-stack   PROGBITS        0000000000000000 000315 000000 00      0   0  1

I just retested these examples with LLVM 17.0.6. The results are unchanged from LLVM 10 (the version in the original thread).

(I’d be happy to help test out a patch resolving either of these issues.)

I believe our existing code does:

If you want to try, the --compartment flag also does some other things, but we can easily pull out the behaviour and upstream it. Look for config->compartment to see what we changed to make that work - it was very small.

I tested by applying this patch to LLVM 17.0.6, and am thrilled to report both issues I mentioned are resolved by it:

  • .group sections are processed and not present in the relocatable object
  • .rela.text.* sections are combined, notably without any specification of .rela.* sections in the linker script. With a linker script combining .text.*, the associated relocation sections are combined (this is how I expected things would work by default back in 2021 :slight_smile:).

@davidchisnall Upstreaming this functionality to ld.lld in some way would be incredibly useful for me, and I suspect for others working with kernel modules too. How can we collaborate to make that upstreaming happen?

cc @MaskRay

@MaskRay, that patch has some special cases for our compartment exports section, and the --compartment flag name is not what you want here, but I can pull the comdat-merging bits into a separate PR and raise it if you would be happy with it, with a --comdats-merge or similar (bikesheds welcome). Happy to pull the feature out and reduce our diff!

@MaskRay Ping. :slight_smile:


@MaskRay Do you have any issues with @davidchisnall 's suggested approach of raising the PR for this patch?

I think there is a desire to agree on general direction before it is raised.

@MaskRay Could you please let us know your thoughts on the above? Would you prefer the PR to be sent and the discussion to occur there?

Thanks!

Apologies for the delay. I’ve been busy over the past few weeks with a lot of other things. I will take a look.

@MaskRay Apologies for the ping. :slight_smile: I’m really eager to see this land.

Have you had a chance to review the discussion?

@MaskRay Have you had a chance to take a look at this proposal?

@davidchisnall If it seems this discussion would be more fruitful on a PR I’m happy to discuss there, but I did spend some time bikeshedding potential enablement methods:

  • --merge-comdats
  • --kernel-relocatable / -Kr
  • --final-relocatable / -Ur
    • Maybe this is too confusing because the semantics are not identical, but GNU ld’s -Ur does carry the broad concept of “regular relocatable” and “final relocatable”.
  • What if --merge-comdats was implied for -r when the clang driver received -mcmodel=kernel? Too much magic/surprise?

I really am indifferent as to how this ultimately gets spelled; I’m just eager for the functionality (and thus trying to keep the discussion moving :slight_smile: ).

Thanks!

@MaskRay Apologies, it’s me again with a ping here. :slight_smile:

Sorry that I dropped the ball.

tl;dr ld -r + /DISCARD/ : { *(.group) } (or objcopy -R .group) + objcopy --keep-global-symbol=xxx seems to emulate the proposed option quite well.

Trying to recap prior discussions.

I suggested /DISCARD/ : { *(.group) }

cat > ./a.cc <<eof
inline int inline0() { return 0; }
inline int inline1() { return 1; }

int use() { return inline0() + inline1(); }
eof
cat > ./a.sh <<eof
clang -c a.cc
ld.lld -r -T ldscript.amd64 a.o -o lld.ro
ld.bfd -r -T ldscript.amd64 a.o -o bfd.ro
readelf -g lld.ro
eof
cat > ./ldscript.amd64 <<eof
SECTIONS {
   .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } /* this is ignored */
   .text1 : { *(.text .stub .text.* .gnu.linkonce.t.*) }
   /DISCARD/ : { *(.group) }
}
eof

The output contains combined .text1 and .rela.text1, and .group sections are discarded.
The output looks as expected?

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text1            PROGBITS        0000000000000000 000040 00004b 00 AXG  0   0 16
  [ 2] .rela.text1       RELA            0000000000000000 000108 000030 18   I  8   1  8
  [ 3] .comment          PROGBITS        0000000000000000 000138 000022 01  MS  0   0  1
  [ 4] .eh_frame         X86_64_UNWIND   0000000000000000 000090 000078 00   A  0   0  8
  [ 5] .rela.eh_frame    RELA            0000000000000000 000160 000048 18   I  8   4  8
  [ 6] .llvm_addrsig     LOOS+0xfff4c03  0000000000000000 0001a8 000002 00   E  0   0  1
  [ 7] .note.GNU-stack   PROGBITS        0000000000000000 0001aa 000000 00      0   0  1
  [ 8] .symtab           SYMTAB          0000000000000000 0001b0 0000d8 18     10   6  8
  [ 9] .shstrtab         STRTAB          0000000000000000 000288 00006e 00      0   0  1
  [10] .strtab           STRTAB          0000000000000000 0002f6 000026 00      0   0  1

The output .text1 contains SHF_GROUP, which technically violates the ELF spec:

The section must be referenced by a section of type SHT_GROUP.

But all of GNU ld, readelf, lld, and llvm-readelf happily accept this section.
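The rule being violated can be expressed as a small consistency check. The following is a sketch over a simplified in-memory model rather than a real ELF parser: `SectionInfo` and `orphanedGroupMembers` are hypothetical names, and only the SHT_GROUP and SHF_GROUP values are taken from the ELF specification.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

constexpr std::uint32_t kShtGroup = 17;    // SHT_GROUP
constexpr std::uint64_t kShfGroup = 0x200; // SHF_GROUP

// Hypothetical, reduced view of a section.
struct SectionInfo {
  std::uint32_t type;
  std::uint64_t flags;
  std::vector<std::uint32_t> groupMembers; // only meaningful for SHT_GROUP
};

// Returns the indices of sections that carry SHF_GROUP but are not listed
// as a member of any SHT_GROUP section.
std::vector<std::size_t>
orphanedGroupMembers(const std::vector<SectionInfo> &secs) {
  std::set<std::uint32_t> referenced;
  for (const SectionInfo &s : secs)
    if (s.type == kShtGroup)
      referenced.insert(s.groupMembers.begin(), s.groupMembers.end());

  std::vector<std::size_t> orphans;
  for (std::size_t i = 0; i < secs.size(); ++i)
    if ((secs[i].flags & kShfGroup) != 0 &&
        referenced.count(static_cast<std::uint32_t>(i)) == 0)
      orphans.push_back(i);
  return orphans;
}
```

For the section table above, where .text1 (index 1) has the G flag and no .group section remains, this check would report index 1.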


--compartment in https://github.com/CHERIoT-Platform/llvm-project/ does a bunch of things.
The tasks related to relocatable linking are:

  • Remove SHF_GROUP from input sections so that the output will not have SHF_GROUP. As explained, this is cosmetic.
  • Remove .group sections.
  • Mark every symbol not in the section .compartment_exports as local.

These tasks can be emulated with ld -r + /DISCARD/. If you need the localization behavior, you can drop /DISCARD/ and use llvm-objcopy -R .group --keep-global-symbol=xxx.
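The localization task is simple enough to model in a few lines. The sketch below is a standalone illustration of the rule, not lld code: `SymbolInfo`, `Binding`, and `localizeNonExports` are hypothetical names, and each defined symbol is assumed to be described by the name of the section that contains it.

```cpp
#include <string>
#include <vector>

enum class Binding { Local, Global, Weak };

// Hypothetical, reduced view of a symbol table entry.
struct SymbolInfo {
  std::string name;
  std::string section; // empty models a defined symbol with no input section
  bool defined;
  Binding binding;
};

// Every defined symbol outside the export section becomes local; undefined
// symbols keep their binding so they can still be resolved later.
void localizeNonExports(std::vector<SymbolInfo> &symbols,
                        const std::string &exportSection) {
  for (SymbolInfo &sym : symbols)
    if (sym.defined && sym.section != exportSection)
      sym.binding = Binding::Local;
}
```

Calling `localizeNonExports(symbols, ".compartment_exports")` would leave only the entry points and the undefined references visible to a later link or load step.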


Subject: [PATCH] compartment from
 https://github.com/CHERIoT-Platform/llvm-project/

---
 lld/ELF/Config.h              |  1 +
 lld/ELF/Driver.cpp            |  3 ++-
 lld/ELF/InputFiles.cpp        |  2 +-
 lld/ELF/InputSection.cpp      |  2 +-
 lld/ELF/Options.td            |  2 ++
 lld/ELF/SyntheticSections.cpp | 12 ++++++++++++
 6 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/lld/ELF/Config.h b/lld/ELF/Config.h
index 883c4a2f8429..04d32fbf403c 100644
--- a/lld/ELF/Config.h
+++ b/lld/ELF/Config.h
@@ -287,4 +287,5 @@ struct Config {
   bool relaxGP;
   bool relocatable;
+  bool compartment = false;
   bool relrGlibc = false;
   bool relrPackDynRelocs = false;
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index ddc574a11314..7fe2509a5f26 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -1384,5 +1384,6 @@ static void readConfigs(opt::InputArgList &args) {
   config->relaxGP = args.hasFlag(OPT_relax_gp, OPT_no_relax_gp, false);
   config->rpath = getRpath(args);
-  config->relocatable = args.hasArg(OPT_relocatable);
+  config->relocatable = args.hasArg(OPT_relocatable) || args.hasArg(OPT_compartment);
+  config->compartment = args.hasArg(OPT_compartment);

   if (args.hasArg(OPT_save_temps)) {
diff --git a/lld/ELF/InputFiles.cpp b/lld/ELF/InputFiles.cpp
index d760dddcf5ec..204ee5fc42ae 100644
--- a/lld/ELF/InputFiles.cpp
+++ b/lld/ELF/InputFiles.cpp
@@ -677,5 +677,5 @@ template <class ELFT> void ObjFile<ELFT>::parse(bool ignoreComdats) {
             .second;
     if (keepGroup) {
-      if (config->relocatable)
+      if (config->relocatable && !config->compartment)
         this->sections[i] = createInputSection(
             i, sec, check(obj.getSectionName(sec, shstrtab)));
diff --git a/lld/ELF/InputSection.cpp b/lld/ELF/InputSection.cpp
index e6c5996c0b39..4696806816e0 100644
--- a/lld/ELF/InputSection.cpp
+++ b/lld/ELF/InputSection.cpp
@@ -82,5 +82,5 @@ InputSectionBase::InputSectionBase(InputFile *file, uint64_t flags,
 static uint64_t getFlags(uint64_t flags) {
   flags &= ~(uint64_t)SHF_INFO_LINK;
-  if (!config->relocatable)
+  if (!config->relocatable || config->compartment)
     flags &= ~(uint64_t)SHF_GROUP;
   return flags;
diff --git a/lld/ELF/Options.td b/lld/ELF/Options.td
index ff61a566f52f..7af7bc9b24d6 100644
--- a/lld/ELF/Options.td
+++ b/lld/ELF/Options.td
@@ -421,4 +421,6 @@ defm rpath: Eq<"rpath", "Add a DT_RUNPATH to the output">;
 def relocatable: F<"relocatable">, HelpText<"Create relocatable object file">;

+def compartment: F<"compartment">, HelpText<"Create object file for a compartment">;
+
 defm retain_symbols_file:
   Eq<"retain-symbols-file", "Retain only the symbols listed in the file">,
diff --git a/lld/ELF/SyntheticSections.cpp b/lld/ELF/SyntheticSections.cpp
index ad280289cebf..1fd6cece1b98 100644
--- a/lld/ELF/SyntheticSections.cpp
+++ b/lld/ELF/SyntheticSections.cpp
@@ -2188,4 +2188,16 @@ void SymbolTableBaseSection::addSymbol(Symbol *b) {
   // Adding a local symbol to a .dynsym is a bug.
   assert(this->type != SHT_DYNSYM || !b->isLocal());
+
+  // If we're linking a compartment, mark every symbol as local except for
+  // ones in the export table.
+  if (config->compartment) {
+    if (auto *def = dyn_cast<Defined>(b)) {
+      auto *section = dyn_cast_or_null<InputSection>(def->section);
+      if (!section || (section->name != ".compartment_exports")) {
+        b->binding = STB_LOCAL;
+      }
+    }
+  }
+
   symbols.push_back({b, strTabSec.addString(b->getName(), false)});
 }

Thanks for the response, @MaskRay. There are two separate examples covering the two issues. I recapped both earlier in this thread.

Example 1: Group Sections are not discarded

The first example produces a ridiculous amount of .group sections. Your suggestion of /DISCARD/ : { *(.group) } does eliminate them, but…

The output .text1 contains SHF_GROUP, which technically violates the ELF spec:

…I disagree that this is only cosmetic. Using DISCARD like this has a broad impact across many tools, for example GDB:

$ bin/gdb lld.ro
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
[...]
Type "apropos word" to search for commands related to "word"...
BFD: /test/lld.ro: no group info for section '.text1'
BFD: /test/lld.ro: no group info for section '.text1'
"/test/lld.ro": not in executable format: file format not recognized
(gdb)

I regularly deal with such issues because I am currently using DISCARD, as suggested in our discussion years ago.

Rather than depending on every tool that interacts with ELF to handle a spec violation, I think it’s strongly desirable to find a solution that’s ELF compliant. The --compartment flag achieves this (the first two bullets in your task list).

Example 2: Relocation sections are not combined

The second reproducer is different than the first, though they share the same linker script.

The linker script fragment .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } does not combine the .rela.text.* sections as expected. Once again, the --compartment flag does. This is another aspect of --compartment related to relocatable linking.

Thus my understanding remains that the first two relocatable-linking tasks of --compartment that you listed, plus .rela section combining, cannot be emulated today.