lld suspect behavior .group/rela/text input sections

Hi

Can you help me understand whether I'm doing something wrong or whether the problem is with lld?

I need to use a linker script (linked below) to link FreeBSD kernel modules, and I pass -r to ld.lld (version 10) to make these modules relocatable.

Some of the modules that I build are very big and also have .group sections covering .rela.text.<mangledname> and .text.<mangledname>.

Per the ELF specification, these sections should be treated as one unit: either all of them are merged or none. But I think lld does not respect that and merges all .text.<mangledname> sections while leaving the .group and .rela.text.<mangledname> sections in the output, which results in a broken ELF file (when I try to load it into gdb, I get a Bad Value error).

By contrast, when I link with ld.bfd, the .text.<mangledname> sections remain in the output and I'm able to load the object into gdb without error.

Linker-script:

https://github.com/freebsd/freebsd-src/blob/098dbd7ff7f3da9dda03802cdb2d8755f816eada/sys/conf/ldscript.amd64

Hi, do you have a minimal reproducible example
(How to create a Minimal, Reproducible Example - Help Center - Stack Overflow)?

Freebsd is not build-friendly when the user is not using FreeBSD :wink:

It is not easy to produce a minimal example, but I will try.

I was hoping that you may have solved it before.

I will come back with an example

Thanks

A

sending it again to the list

I am still waiting for a proper reproducer. Your commands reference
file1.cpp, but your attachment contains only file1.h.

Hi

The file contents and the commands to build and test are below. The commands at the end of this email assume that you have clang++, ld.lld, ld, sed, curl and gdb installed.

Just copy and paste the lines below each file name into that file on your computer, then run the commands at the end. You'll see that the output linked with ld loads into gdb without any error, but the output linked with ld.lld produces errors when loaded into gdb.

A

******** Minimal example :

------------------ file.h

const char * get_char_value (void);
const int * get_int_value (void);

inline const char * first_inline_function (void) {
    return get_char_value ();
}

inline const int * second_inline_function (void) {
    return get_int_value ();
}

-------------------- file1.cpp

#include "file.h"

static int data = 5;

const int *get_int_value (void) {
    return &data;
}

-------------------- file2.cpp

#include "file.h"

static const char *data = "something meaningless";

const char *get_char_value (void) {
    return data;
}

-------------------- file3.cpp

#include "file.h"

const char * get_it (void) {
    const char *c = first_inline_function();
    const int *i = second_inline_function();
    return &c[*i];
}

-------------------- commands to build and load into gdb

curl -o ldscript.amd64 https://raw.githubusercontent.com/freebsd/freebsd-src/master/sys/conf/ldscript.amd64
sed -i 's/.kern.//g' ldscript.amd64
sed -i 's/.freebsd.//g' ldscript.amd64

clang++ -c file1.cpp -o file1.cpp.o
clang++ -c file2.cpp -o file2.cpp.o
clang++ -c file3.cpp -o file3.cpp.o

ld.lld -r -T ldscript.amd64 *.o -o output.ld.lld
ld -r -T ldscript.amd64 *.o -o output.ld.bfd
echo "============== done building"

gdb -batch -ex 'add-symbol-file output.ld.lld 0x1234'
echo "============== done loading lld (Notice the errors above)"

gdb -batch -ex 'add-symbol-file output.ld.bfd 0x1234'
echo "============== done loading bfd (Notice that no error is produced here)"

The gdb command does not give me a warning/error. That said, I have simplified your example to the following:

cat > ./a.cc <<eof
inline int inline0() { return 0; }
inline int inline1() { return 1; }

int use() {
   return inline0() + inline1();
}
eof
cat > ./a.sh <<eof
clang -c a.cc
ld.lld -r -T ldscript.amd64 a.o -o lld.ro
ld.bfd -r -T ldscript.amd64 a.o -o bfd.ro
readelf -g lld.ro
eof
cat > ./ldscript.amd64 <<eof
SECTIONS
{
   .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) }
   .text1 : { *(.text .stub .text.* .gnu.linkonce.t.*) }
}
eof

Running zsh a.sh makes readelf report an error:

COMDAT group section [ 3] `.group' [_Z7inline0v] contains 1 sections:
    [Index] Name
    [ 2] .text1

COMDAT group section [ 4] `.group' [_Z7inline1v] contains 1 sections:
    [Index] Name
readelf: Error: section [ 2] in group section [ 4] already in group section [ 3]

This is a case about what to do when the output would have two section groups
containing the same section. ELF specification says "A section cannot be a
member of more than one group." but relocatable output is not covered by the
specification.
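
To make the readelf complaint concrete, here is a minimal C++ sketch of the check it implies: given each group's member list, report the sections claimed by more than one group. The `GroupTable` type and the `sharedMembers` name are hypothetical and are not taken from readelf or lld; this is only a model of the rule, not how either tool is implemented.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model: group section index -> indices of its member sections.
using GroupTable = std::map<std::uint32_t, std::vector<std::uint32_t>>;

// Returns member section index -> every group that lists it, keeping only
// sections that appear in more than one group (which the gABI forbids).
std::map<std::uint32_t, std::vector<std::uint32_t>>
sharedMembers(const GroupTable &groups) {
  std::map<std::uint32_t, std::vector<std::uint32_t>> owners;
  for (const auto &entry : groups)
    for (std::uint32_t member : entry.second)
      owners[member].push_back(entry.first);

  // Drop sections that belong to at most one group.
  for (auto it = owners.begin(); it != owners.end();) {
    if (it->second.size() < 2)
      it = owners.erase(it);
    else
      ++it;
  }
  return owners;
}
```

With the table from lld.ro above, groups 3 and 4 both list section 2, so `sharedMembers` returns a single entry mapping section 2 to groups 3 and 4.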

It seems that GNU ld does not respect the *(... .text.*) input section
description for SHF_GROUP sections in this case. (It is unclear what exact
conditions it uses to decide whether to obey an input section description.) It
simply doesn't place .text.* in .text, which breaks the expectation set by the
input section description. I agree with Peter that we are in the
implementation-defined realm. I currently don't see anything that should be
improved on LLD's side. My advice is to avoid such usage.

Thank you for your feedback.

Although I agree that the specification is not clear, I think ld is doing the right thing by not merging the .text.* sections that are part of two different groups. LLD, however, silently creates unspecified output by following the user's instruction to merge all .text.* sections. That is why I'm not sure we can file this under “implementation-defined” behavior: the output is not usable by many systems, and LLD should at least generate a warning.

Thanks

A

I think the linker script needs some syntax to exclude these SHF_GROUP
sections from being matched. It is difficult to justify GNU ld's
behavior either.
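
As a rough illustration of what such an exclusion would mean, the C++ sketch below matches input section names against the patterns of an input section description and optionally skips SHF_GROUP members. It rests on several assumptions: `selectSections` and its `skipGroupMembers` switch are hypothetical, POSIX `fnmatch` only approximates linker script globbing, and nothing in this thread shows that either linker is implemented this way.

```cpp
#include <cstdint>
#include <fnmatch.h> // POSIX glob matching; only approximates linker script globs
#include <string>
#include <vector>

constexpr std::uint64_t kShfGroup = 0x200; // SHF_GROUP in the ELF gABI

// Just enough of an input section for this sketch.
struct InputSection {
  std::string name;
  std::uint64_t flags; // sh_flags
};

// True if `name` matches any pattern of an input section description,
// e.g. the list inside *(.text .stub .text.* .gnu.linkonce.t.*).
bool matchesAny(const std::vector<std::string> &patterns,
                const std::string &name) {
  for (const std::string &pattern : patterns)
    if (fnmatch(pattern.c_str(), name.c_str(), 0) == 0)
      return true;
  return false;
}

// Names of the input sections the description would select.
// `skipGroupMembers` is a hypothetical switch standing in for the proposed
// "exclude SHF_GROUP sections" syntax; no linker exposes it under this name.
std::vector<std::string>
selectSections(const std::vector<InputSection> &inputs,
               const std::vector<std::string> &patterns,
               bool skipGroupMembers) {
  std::vector<std::string> selected;
  for (const InputSection &sec : inputs) {
    if (skipGroupMembers && (sec.flags & kShfGroup))
      continue; // leave group members out of the output section
    if (matchesAny(patterns, sec.name))
      selected.push_back(sec.name);
  }
  return selected;
}
```

With `skipGroupMembers` set to false, `.text._Z7inline0v` and `.text._Z7inline1v` are selected along with `.text`, which corresponds to what ld.lld -r does in the example. With it set to true, only `.text` is selected, which is roughly the GNU ld -r behavior described above.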

I wonder whether there is a case where the SHF_GROUP sections need to be matched.

If so, what are the circumstances?

Otherwise, I think it would be more confusing to add linker script syntax for something that always needs to be set.

A

Hi,

I apologize for reviving an old thread, but this issue remains unresolved. To quickly recap: when linking a relocatable object with a linker script that combines .text.* into one output section, the associated .group sections are not discarded and the associated .rela.text.* sections are not combined, contrary to what one would expect. For large relocatable objects, this dramatically inflates the section count. That causes real problems (beyond confusion) because code that reads ELF files often does not handle section counts at or above SHN_LORESERVE (0xFF00) the way the spec requires.

So, we need a resolution. What can we do?

MaskRay suggested:

> I think the linker script needs some syntax to exclude these SHF_GROUP sections from being matched.

And earlier in the thread, Peter suggested:

> It may be that a relocatable link needs an additional “kernel module” or “relocatable object” choice to guide the linker as to whether certain types of section should be combined.

I am open to any solution. To me Peter's suggestion has some GNU ld precedent:

> Use ‘-Ur’ only for the last partial link, and ‘-r’ for the others.

https://sourceware.org/binutils/docs/ld/Options.html#Options

Please ignore -Ur's specifics...I'm only noting there is a flag to indicate the output is a "final" relocatable object. Perhaps something similar could be the guide to discard .group and combine .rela sections appropriately.

What does everyone else think?

Thanks,

Justin

A detailed example is probably better than written narrative.
I am still using my old:

cat > ./a.cc <<eof
inline int inline0() { return 0; }
inline int inline1() { return 1; }

int use() { return inline0() + inline1(); }
eof

cat > ./ldscript.amd64 <<eof
SECTIONS
{
    .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) }
    .text1 : { *(.text .stub .text.* .gnu.linkonce.t.*) }
}
eof

clang -c a.cc
ld.lld -r -T ldscript.amd64 a.o -o lld.ro
ld.bfd -r -T ldscript.amd64 a.o -o bfd.ro
ld.bfd -T ldscript.amd64 a.o -o bfd

Given bfd's non-relocatable behavior (.text.* are merged), I think ld.lld and ld.lld -r's behaviors are reasonable.
As I said previously, I don't think ld.lld -r should move to GNU ld's -r behavior.

If you need a way not to match SHF_GROUP .text.* sections, you can make a proposal on the Binutils mailing list.

My apologies once again, I was just trying to recap the prior discussion. There are several things going on here.

# Issue 1: .group sections are not discarded

I think your example is appropriate for the .group issue. This scaled-up version is an absurd example...but you asked for detail and it simulates what can happen with a larger codebase. :slight_smile:

$ cat ldscript.amd64
SECTIONS
{
    .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) }
    .text1 : { *(.text .stub .text.* .gnu.linkonce.t.*) }
}
$ cat sections.sh
#!/bin/bash
for i in {0..65280}
do
    echo -e "inline int inline$i() { return $i; }" >> a.cc
    if [[ "$i" -gt "0" ]]; then
        echo -e "int use$i() { return inline$((i-1))() + inline$i(); }" >> a.cc
    fi
done
clang -c a.cc
ld.lld -r -T ldscript.amd64 a.o -o lld.ro
$ ./sections.sh
$ readelf -h lld.ro
[...]
  Number of section headers: 0 (65293)
  Section header string table index: 65535 (65291)

Because of the incredible number of .group sections, the object becomes a problem for code that does not correctly handle a section count > 0xFF00:

> If the number of sections is greater than or equal to SHN_LORESERVE (0xff00), e_shnum has the value zero. The actual number of section header table entries is contained in the sh_size field of the section header at index 0. Otherwise, the sh_size member of the initial section header entry contains the value zero.
> https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter6-43405/index.html
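
For anyone maintaining such code, here is a minimal C++ sketch of the lookup the quoted passage describes. The struct and function names are hypothetical and hold only the fields involved. The string table rule (e_shstrndx set to SHN_XINDEX, with the real index in sh_link of section 0) comes from the same ELF header description rather than from the text quoted here.

```cpp
#include <cstdint>

constexpr std::uint16_t kShnLoreserve = 0xff00; // SHN_LORESERVE
constexpr std::uint16_t kShnXindex = 0xffff;    // SHN_XINDEX

// Hypothetical stand-ins holding only the fields this lookup needs.
struct HeaderFields {
  std::uint16_t e_shnum;
  std::uint16_t e_shstrndx;
};
struct Section0Fields {
  std::uint64_t sh_size; // real section count when e_shnum is 0
  std::uint32_t sh_link; // real string table index when e_shstrndx is SHN_XINDEX
};

// True when a file with `count` sections cannot store the count in e_shnum.
bool needsExtendedNumbering(std::uint64_t count) {
  return count >= kShnLoreserve;
}

// Actual number of section headers.
std::uint64_t sectionCount(const HeaderFields &h, const Section0Fields &s0) {
  return h.e_shnum != 0 ? h.e_shnum : s0.sh_size;
}

// Actual index of the section header string table.
std::uint32_t stringTableIndex(const HeaderFields &h,
                               const Section0Fields &s0) {
  return h.e_shstrndx != kShnXindex ? h.e_shstrndx : s0.sh_link;
}
```

With the values readelf printed for lld.ro (e_shnum 0, e_shstrndx 65535, and 65293 and 65291 stored in section 0), `sectionCount` yields 65293 and `stringTableIndex` yields 65291.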

Yes, one can wish all of that code was updated to ELF spec. But some of that is outside our control, and the problem here is artificially introduced: there aren't that many sections generated from either the code or the linker script. It's just the detritus created by the assumption of further linking of the output, which is (I believe) why Peter proposed an indicator that the output is intended to be final (and presumably the .group sections would be processed as they are for shared objects).

# Issue 2: relocation section non-compliance with the linker script

This is easily reproducible and others have provided examples in this thread. Here's a basic one:

$ cat b.h
#pragma once
struct A {
    A();
    virtual ~A();
    virtual int foo();
};
struct B : public A {
    int foo() override;
};
$ cat b.cc
#include "b.h"
A::A() {}
A::~A() {}
int A::foo() { return 42; }
int B::foo() { return 84; }
$ cat rela.sh
#!/bin/bash
clang -c b.cc
ld.lld -r -T ldscript.amd64 b.o -o rela.ro
$ ./rela.sh
$ readelf -SW rela.ro | egrep "RELA|PROGBITS"
  [ 1] .rela.text RELA 0000000000000000 000040 000048 18 I 14 2 8
  [ 2] .text1 PROGBITS 0000000000000000 000090 0000de 00 AXG 0 0 16
  [ 4] .rela.text._ZN1BD2Ev RELA 0000000000000000 000180 000018 18 IG 14 2 8
  [ 6] .rela.text._ZN1BD0Ev RELA 0000000000000000 0001a8 000030 18 IG 14 2 8
  [ 7] .rodata PROGBITS 0000000000000000 0001d8 000088 00 A 0 0 8
  [ 8] .rela.rodata RELA 0000000000000000 000260 000138 18 I 14 7 8
  [ 9] .comment PROGBITS 0000000000000000 000398 000088 01 MS 0 0 1
  [11] .rela.eh_frame RELA 0000000000000000 000518 0000a8 18 I 14 10 8
  [13] .note.GNU-stack PROGBITS 0000000000000000 0005c7 000000 00 0 0 1

The .text1 directive of the linker script is followed, but the .rela.text directive is not. When a linker script is provided, my expectation is the linker will not be selective in enforcing the user's instructions. It seems to me this behavior only occurs with RELA sections, but maybe I have missed other cases.

Beyond that basic premise, this causes the same section count problem mentioned above at scale. It also leads to confusion because the .rela.text._Z* sections above are named for sections that don't exist in the output.
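
The Inf column in the readelf output above is sh_info, which for a relocation section is the index of the section its relocations apply to. Below is a small C++ sketch, with hypothetical names, that groups relocation sections by that index and keeps only the targets served by more than one relocation section.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One relocation section: its name and sh_info (index of the section the
// relocations apply to). Hypothetical type for this sketch.
struct RelocSection {
  std::string name;
  std::uint32_t info;
};

// Returns target section index -> relocation section names, keeping only
// targets that have more than one relocation section.
std::map<std::uint32_t, std::vector<std::string>>
splitRelocTargets(const std::vector<RelocSection> &relocs) {
  std::map<std::uint32_t, std::vector<std::string>> byTarget;
  for (const RelocSection &r : relocs)
    byTarget[r.info].push_back(r.name);

  // Drop targets with a single relocation section.
  for (auto it = byTarget.begin(); it != byTarget.end();) {
    if (it->second.size() < 2)
      it = byTarget.erase(it);
    else
      ++it;
  }
  return byTarget;
}
```

Fed the five RELA rows above, it reports only target 2 (.text1), with .rela.text, .rela.text._ZN1BD2Ev and .rela.text._ZN1BD0Ev all pointing at it.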

# Potential Solutions

As I mentioned earlier:

1. Linker script syntax to exclude SHT_GROUP (solves Issue 1)
2. Linker script compliance for relocatable objects (solves Issue 2)
3. "--relocatable-final" flag that instructs the linker to discard and combine sections as it does in other cases, e.g. shared objects (solves Issues 1 and 2)

I hope that clarifies the situation a bit. Thank you for your time, and please let me know your thoughts.

Thanks,

Justin

# Issue 1: .group sections are not discarded

In this case, ld.bfd -r creates more sections than ld.lld -r does (see below).

% readelf -h bfd.ro
ELF Header:
...
  Number of section headers: 0 (130574)
  Section header string table index: 65535 (130573)

It's true that some tools don't support SHT_SYMTAB_SHNDX, but those tools are
broken. For such an uncommon case (-r + a linker script operating on SHT_GROUP +
numerous sections), the existence of broken tools is not a strong argument,
especially since ld.bfd -r output will confuse them as well.

We can consider whether ld.lld output can be improved in this case, i.e.
whether ld.lld should discard empty section groups.
I think it can be argued either way.

# Issue 2: relocation section non-compliance with the linker script

In such a relocatable link, GNU ld appears to just skip SHF_GROUP
sections when matching input sections.
Arguably that behavior is dubious as well, but it avoids a problem
with section groups:
it doesn't need to discard empty section groups (your "Issue 1").

The generic ABI says "A section cannot be a member of more than one group."
When SHF_GROUP .text.* are combined, it's unclear how their .group
sections should behave.

# Potential Solutions

> As I mentioned earlier:
>
> 1. Linker script syntax to exclude SHT_GROUP (solves Issue 1)

To discard all SHT_GROUP sections, you can use /DISCARD/ : { *(.group) }

There is no way to discard only a selected subset of .group sections.

> 2. Linker script compliance for relocatable objects (solves Issue 2)

For not matching SHF_GROUP sections in an input section description, I
think explicit syntax is better than GNU ld's current implicit -r
behavior.
This probably needs a Binutils discussion.
It is complicated by the fact that the behavior difference between -r and
--emit-relocs isn't entirely clear. In GNU ld, --emit-relocs input
sections are not matched.

> 3. "--relocatable-final" flag that instructs the linker to discard and combine sections as it does in other cases, e.g. shared objects (solves Issues 1 and 2)

I'd be concerned about adding a linker option.
First, I think this is something that should be discussed with Binutils.
Second, I suspect there may be some soundness issues that cannot be
perfectly handled.
I'd recommend that you migrate away from relying on particular .group or
.rela.* behavior in -r mode.

> In such a relocatable link, GNU ld appears to just skip SHF_GROUP
> sections when matching input sections.
> Arguably it is a dubious behavior as well, but it can avoid a problem
> using section groups. [...]
>
> The generic ABI says "A section cannot be a member of more than one group."
> When SHF_GROUP .text.* are combined, it's unclear how their .group
> sections should behave.

I agree the ABI requirements are unclear. But when the linker script explicitly specifies what to do with those sections (e.g. .rela.text.*), it is surprising to me that the directive is ignored.

> To not match SHF_GROUP sections in an input section description, I
> think explicit syntax is better than GNU ld's current implicit -r
> behavior.

Are you proposing a way to specify "match .rela.text.* even if it is a member of a .group" in the linker script? Is that what the binutils discussion should entail?

Justin