Creating SHF_MERGE|SHF_STRINGS section

shuffle2 · May 18, 2025, 5:30pm

I’m using clang/lld to compile existing code that was developed against gcc. The code has a construct similar to:
__attribute__((section(".my_const,\"MSa\",@progbits,1 #")))
which is applied to every data symbol which should reside in this section, the goal being that the toolchain performs string deduplication and merging to save space.
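For readers unfamiliar with what SHF_MERGE|SHF_STRINGS with sh_entsize=1 is meant to achieve, here is a minimal sketch of the deduplication step. It is only an illustration: the function name merge_strings is hypothetical, and real linkers work differently internally and may do more (for example, lld can also merge string tails at higher optimization levels).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: copy each distinct NUL-terminated string found
 * in in[0..in_len) to out exactly once. Returns the number of bytes written.
 * out must have room for at least in_len bytes. */
size_t merge_strings(const char *in, size_t in_len, char *out)
{
    assert(in != NULL && out != NULL);
    size_t out_len = 0;
    for (size_t i = 0; i < in_len; ) {
        const char *nul = memchr(in + i, '\0', in_len - i);
        if (nul == NULL)
            break;                      /* unterminated tail: ignored in this sketch */
        size_t n = (size_t)(nul - (in + i)) + 1;   /* entry length including NUL */

        int seen = 0;
        for (size_t j = 0; j < out_len; ) {
            size_t m = strlen(out + j) + 1;
            if (m == n && memcmp(out + j, in + i, n) == 0) {
                seen = 1;               /* identical string already emitted */
                break;
            }
            j += m;
        }
        if (!seen) {
            memcpy(out + out_len, in + i, n);
            out_len += n;
        }
        i += n;
    }
    return out_len;
}
```

With three input strings "hello", "world", "hello", the output keeps only the first two, which is the space saving the section flags are supposed to enable at link time.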

On gcc, this works fine. On clang, it’s not possible to “inject” assembler input into the section attribute like this.

As an alternative, I’ve tried adding __asm__(".section \".my_const\",\"MSa\",@progbits,1\n"), hoping that the data symbols marked with section((".my_const")) will then fall into that pre-defined section. Instead, I see that the output file has two “.my_const” sections: one with my given SHF_MERGE|SHF_STRINGS|SHF_ALLOC, sh_entsize=1 settings, and another that is created by the toolchain. All symbols actually wind up getting allocated into the toolchain-defined section, instead of the mergeable one. e.g.

[Nr] Name         Type            Address          Off    Size   ES Flg Lk Inf Al
[ 3] .my_const    PROGBITS        0000000000000000 000040 000000 01 AMS  0   0  1
[ 9] .my_const    PROGBITS        0000000000000000 0004a0 000124 00   A  0   0  1

There is no space savings with this approach. It’s confusing to me as a user that the section name is not unique.

Another thing I’ve tried is to run llvm-objcopy --set-section-flags=.my_const=merge,strings,alloc as a post-build step (in this build, the objects are compiled and then linked in another step - I’ve added the objcopy step before the final link). While this does set the flags correctly on the intermediate object files, there is still no space savings/merging occurring when doing the final link.

Is there a way to get similar behavior on llvm as gcc in this case?

A completely different approach I’m considering is to compress the sections with -Wl,--compress-sections=.my_const=zlib; however, being compatible with the existing construct would be better.

MaskRay · May 21, 2025, 5:18am

Is there a way to get similar behavior on llvm as gcc in this case?

No. As you have noticed, __attribute__((section(".my_const,\"MSa\",@progbits,1 #"))) style injection doesn’t work.
This looks like a hack even in GCC, where assembly generation (cc1) and object file generation (gas) are separate processes.

As an alternative, I’ve tried adding __asm__(".section \".my_const\",\"MSa\",@progbits,1\n"), hoping that the data symbols marked with section((".my_const")) will then fall into that pre-defined section. Instead, I see that the output file has two “.my_const” sections: one with my given SHF_MERGE|SHF_STRINGS|SHF_ALLOC, sh_entsize=1 settings, and another that is created by the toolchain. All symbols actually wind up getting allocated into the toolchain-defined section, instead of the mergeable one. e.g.

% cat y.c
__asm__(".section \".my_const\",\"MSa\",@progbits,1\n");
__attribute__((section(".my_const")))
char g[] = "hello";
% clang -S y.c -o -
        .file   "y.c"
                                        # Start of file scope inline assembly
        .section        .my_const,"aMS",@progbits,1

                                        # End of file scope inline assembly
        .type   g,@object                       # @g
        .section        .my_const,"aw",@progbits,unique,1
        .globl  g
...

There are two different .my_const sections because the second one uses ,unique,1. See D100944, “[MC][ELF] Emit separate unique sections for different flags”.

Without the ,unique,1 there will be an assembler error (in both LLVM and gas):

% /tmp/Debug/bin/clang -c y.s
y.s:7:2: error: changed section flags for .my_const, expected: 0x32
        .section        .my_const,"aw",@progbits
        ^
y.s:7:2: error: changed section entsize for .my_const, expected: 1
        .section        .my_const,"aw",@progbits
        ^

(There is another minor issue. clang -S y.c && clang -c y.s output is different from clang -c y.c because LLVM’s .section parser doesn’t unescape the quoted string.)
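A possible workaround, not proposed in the thread and not compatible with the existing section-attribute construct, is to emit both the section directive and the data from file-scope assembly, so that the compiler never creates a .my_const section of its own. The sketch below reuses the symbol g from the example above and assumes an ELF target with GNU-style assembler syntax; the use of .pushsection/.popsection and the helper g_length are illustrative choices. The definition is no longer checked by the C compiler, so this is a trade-off rather than a drop-in replacement.

```c
#include <stddef.h>
#include <string.h>

/* Define g directly inside a mergeable string section (flags "aMS",
 * entsize 1). The compiler only sees the extern declaration below, so it
 * emits no conflicting .section directive for .my_const. */
__asm__(".pushsection .my_const,\"aMS\",@progbits,1\n"
        ".globl g\n"
        ".type g,@object\n"
        "g: .asciz \"hello\"\n"
        ".size g, .-g\n"
        ".popsection\n");

extern const char g[];

/* Hypothetical helper showing that g is usable from C like any other string. */
size_t g_length(void)
{
    return strlen(g);
}
```

Running readelf -S on the resulting object should show whether only a single .my_const section with the intended flags and entsize remains.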

Another thing I’ve tried is to run llvm-objcopy --set-section-flags=.my_const=merge,strings,alloc as a post-build step (in this build, the objects are compiled and then linked in another step - I’ve added the objcopy step before the final link). While this does set the flags correctly on the intermediate object files, there is still no space savings/merging occurring when doing the final link.

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  1
  [ 2] .data             PROGBITS        0000000000000000 000040 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          0000000000000000 000040 000000 00  WA  0   0  1
  [ 4] .my_const         PROGBITS        0000000000000000 000040 000000 01 WAMS  0   0  1
  [ 5] .my_const         PROGBITS        0000000000000000 000040 000006 00 WAMS  0   0  1

The non-empty .my_const section is not merged because its sh_entsize member (ES) is 0.

shuffle2 · May 21, 2025, 5:37pm

Thanks for the explanation.

The non-empty .my_const section is not merged because its sh_entsize member (ES) is 0.

As far as I understand, there’s no supported way to control sh_entsize here, is that correct? I could manually change it with some script during build (I guess that’s no more gross than having to use objcopy to set the flags).

It would be nice to somehow have the ability to mark arbitrary sections with MERGE|STRINGS,entsize=1, or otherwise induce the string merging. I guess it’s debatable if this should be specialized to the string merging use case, or implemented in a more generic way which allows the user to freely modify section properties (as gcc does, albeit maybe not via the extended-section-attribute hack).

I wonder if another approach could be (TYPE=SHT_STRTAB) in a linker script, to coerce this behavior.

MaskRay · May 22, 2025, 5:20am

You are right. There is no way to control sh_entsize with the GNU Assembler, the LLVM integrated assembler, or llvm-objcopy. You’ll have to write a custom binary manipulation tool (probably no existing one…). The GCC section attribute lacks a way to customize section type, flags, and sh_entsize. Ideally we should agree with them on the exact syntax.

I wanted to set SHT_INIT_ARRAY properly in the past.
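The custom binary manipulation tool mentioned above could start from something like the following sketch, which patches section headers in an in-memory ELF64 image. Everything here is an assumption made for illustration: the function name patch_merge_strings is hypothetical, it relies on the Linux <elf.h> header, it assumes the file's byte order matches the host, and it ignores extended section numbering. It leaves all other flags untouched and does not verify that the section contents really are NUL-terminated strings, so whether the linker then merges the section still depends on the linker and on the data.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: for every section called `name` in the ELF64 image
 * buf[0..len), set SHF_MERGE|SHF_STRINGS and overwrite sh_entsize.
 * Returns the number of sections patched, or -1 if the image does not look
 * like a well-formed ELF64 file. */
int patch_merge_strings(unsigned char *buf, size_t len,
                        const char *name, uint64_t entsize)
{
    assert(buf != NULL && name != NULL);

    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr))
        return -1;
    /* The whole section header table must lie inside the image. */
    if (eh.e_shoff > len ||
        (len - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum ||
        eh.e_shstrndx >= eh.e_shnum)
        return -1;

    /* Locate the section name string table. */
    Elf64_Shdr strtab;
    memcpy(&strtab, buf + eh.e_shoff + eh.e_shstrndx * sizeof strtab,
           sizeof strtab);
    if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset)
        return -1;
    const char *names = (const char *)buf + strtab.sh_offset;

    size_t name_len = strlen(name) + 1;     /* compare including the NUL */
    int patched = 0;
    for (unsigned i = 0; i < eh.e_shnum; i++) {
        unsigned char *p = buf + eh.e_shoff + (size_t)i * sizeof(Elf64_Shdr);
        Elf64_Shdr sh;
        memcpy(&sh, p, sizeof sh);
        if (sh.sh_name >= strtab.sh_size ||
            strtab.sh_size - sh.sh_name < name_len)
            continue;                       /* name would run past the table */
        if (memcmp(names + sh.sh_name, name, name_len) != 0)
            continue;
        sh.sh_flags |= SHF_MERGE | SHF_STRINGS;
        sh.sh_entsize = entsize;
        memcpy(p, &sh, sizeof sh);          /* write the header back */
        patched++;
    }
    return patched;
}
```

A small driver would read the object file into memory, call patch_merge_strings(buf, len, ".my_const", 1), and write the buffer back before the final link, in the same place in the build where the objcopy step was tried.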
