[ELF] String literals don't obey -fdata-sections

Hi there,

When I compile my code with -fdata-sections and -ffunction-sections, I still see some unused strings in my shared library (Android). The strings appear together inside a single .rodata.str1.1 section instead of each getting its own section. It seems that C string literals are treated differently from other constants and that -fdata-sections is not respected in https://github.com/llvm/llvm-project/blob/master/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp#L799. I came across the following GCC bug, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192, where the issue was fixed back in 2015. Is there any reason not to do the same in LLVM?

My code example:

  • static library 1 : expose functions api1() and api3()

#include <stdio.h>
#include "lib1.h"

/* Mutable pointers, each initialized with the address of a string literal. */
static char *test = "Test";
static char *unused = "Unused";

void api1(){
    printf(test);
}

void api3(){
    printf(unused);
}

  • shared library : use only function api1() from static library 1

#include "lib1.h"

void test(){
    api1();
}

Both compiled with "-fdata-sections -ffunction-sections -fvisibility=hidden" and linked with "--gc-sections".

While the api3() function is correctly gone, the result for the C-string is the following (in Hopper):

; Section .rodata.str1.1
; Range: [0x63; 0x6f[ (12 bytes)
; File offset : [151; 163[ (12 bytes)
; Flags: 0x32
; SHT_PROGBITS
; SHF_ALLOC

.L.str:

00000063 db "Test", 0

.L.str.1:

00000068 db "Unused", 0

Usually it is because nobody has noticed the problem or nobody is
motivated enough to fix it, not that a problem is intentionally left
open :) I took some time to look at the problem and concluded
that clang should do nothing about this. Actually, with the clang behavior,
you can discard "Unused" if you use LLD. Read on.

In GCC, -O turns on -fmerge-constants. Clang does not implement this
option, but it does implement the stronger (level 2) -fmerge-all-constants,
which is non-conforming ("Languages like C or C++
require each variable, including multiple instances of the same variable
in recursive calls, to have distinct locations, so using this option
results in non-conforming behavior.").

With (-fmerge-constants or -fmerge-all-constants) and -fdata-sections, GCC places string literals in .rodata.xxx.str1.1 sections:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192#c16
This is, however, suboptimal because the cost of a section header
(sizeof(Elf64_Shdr)=64) plus a section name (".rodata.xxx.str1.1") is quite large for a short string.
I have replied on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=192#c19 and
created a GNU ld feature request
(https://sourceware.org/bugzilla/show_bug.cgi?id=26622, "Support --gc-sections for SHF_MERGE sections").
Sorry if I misspoke. I was not suggesting that the bug was known and deliberately left unfixed out of laziness ;-). I was sure there was a valid reason and wanted to know what it was. As you explained, it appears that LLVM relies on LLD to do this instead of enforcing it in the middle-end, which is a different approach from GCC's.

By non-conforming, do you mean with respect to the C/C++ standard? How is that related to the -fdata-sections implementation?

In my example, LLVM/Clang already puts the pointers "test" and "unused" in separate data sections because of "-fdata-sections", as seen below.

; Segment unnamed segment
; Range: [0x5c; 0x64[ (8 bytes)
; File offset : [144; 152[ (8 bytes)
; Permissions: -

; Section .data.test
; Range: [0x5c; 0x60[ (4 bytes)
; File offset : [144; 148[ (4 bytes)
; Flags: 0x3
; SHT_PROGBITS
; SHF_WRITE
; SHF_ALLOC

test:

0000005c dd 0x00000063

; Section .data.unused
; Range: [0x60; 0x64[ (4 bytes)
; File offset : [148; 152[ (4 bytes)
; Flags: 0x3
; SHT_PROGBITS
; SHF_WRITE
; SHF_ALLOC

unused:

00000060 dd 0x00000068

So I am not sure I understand the point about sub-optimality here, since the same is already true for the .data sections, where each variable incurs the cost of its own section header. How is C-string data different? I mean, the concept of -fdata-sections/-ffunction-sections ("one section for each data item/function") should be the same for every kind of data, no?

Your example uses mutable global variables "test" and "unused", which
is why they are in the .data.* sections. They are initialized to the
addresses of string literals in .rodata.*. The .rodata.* sections are
what we care about, not .data.* (.data.* can always be correctly garbage
collected by GNU ld/gold/LLD).

Of course, the issue here is .rodata.*. I used the .data.* sections as a counterexample,
but it could be any section. I compared those two because they both contain
small data, so the ratio of section header size to data size is poor in both.

But my point is: why should the implementation of -fdata-sections differ between .data.*
and .rodata.* sections? Or why should .rodata.* be treated differently?

If the only reason is that it is suboptimal due to the additional section header,
that is definitely not a valid reason. Having everything in its own section is the
purpose of the -f*-sections options and allows the linker to easily strip them. I really
don't get the exception made for .rodata.* here.

I’ve not looked at the GNU behaviour, but the purpose of -fdata-sections in my experience is to provide the ability to do section-level operations like --gc-sections, --icf, symbol ordering etc. SHF_MERGE sections, such as .rodata string and integer literal sections, are internally fragmentable by the linker using rules defined by the ABI, so a linker can already easily strip them without -fdata-sections splitting them up. LLD does this, and I don’t see why other linkers couldn’t either (obviously this might require the functionality to be implemented properly in those linkers).

Input section overhead is a big deal, based on experiments I've recently been doing, both because of the additional I/O incurred and the time it takes the linker to process the additional sections. If an object has many literals, this will drastically bloat the input size, which imposes a cost on people who use LLD and want to be able to do proper --gc-sections (--icf etc.), for no gain.