Creating SHF_MERGE|SHF_STRINGS section

I’m compiling existing code, developed against gcc, with clang/lld. The code has a construct similar to:
__attribute__((section(".my_const,\"MSa\",@progbits,1 #")))
which is applied to every data symbol which should reside in this section, the goal being that the toolchain performs string deduplication and merging to save space.
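For readers unfamiliar with what SHF_MERGE|SHF_STRINGS with sh_entsize=1 buys you, here is a minimal conceptual sketch in C of the deduplication a linker is allowed to perform on such a section. The function name `merge_strings` is made up for illustration, and this is not how gcc/ld or lld actually implement it (real linkers use hash tables and may also share common suffixes); it only shows why duplicate strings stop costing space.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy each distinct NUL-terminated string found in
 * in[0..in_len) to `out` exactly once, keeping first-occurrence order.
 * `out` must have room for in_len bytes. Returns the number of bytes written.
 * A trailing piece without a terminating NUL is ignored. */
size_t merge_strings(const char *in, size_t in_len, char *out)
{
    size_t out_len = 0;
    size_t i = 0;

    while (i < in_len) {
        const char *end = memchr(in + i, '\0', in_len - i);
        if (end == NULL)
            break;                       /* unterminated tail: stop */
        size_t n = (size_t)(end - (in + i)) + 1;   /* length including NUL */

        /* Linear search of what has been emitted so far (O(n^2), fine for a sketch). */
        int seen = 0;
        for (size_t j = 0; j < out_len; j += strlen(out + j) + 1) {
            if (strcmp(out + j, in + i) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            memcpy(out + out_len, in + i, n);
            out_len += n;
        }
        i += n;
    }
    return out_len;
}
```

With three entries where the first and last are identical, only two survive, which is the space saving the original construct is after.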

On gcc, this works fine. On clang, it’s not possible to “inject” assembler input into the section attribute like this.

As an alternative, I’ve tried adding __asm__(".section \".my_const\",\"MSa\",@progbits,1\n"), hoping that the data symbols marked with section((".my_const")) will then fall into that pre-defined section. Instead, I see that the output file has two “.my_const” sections: one with my specified SHF_MERGE|SHF_STRINGS|SHF_ALLOC, sh_entsize=1 settings, and another created by the toolchain. All symbols end up allocated in the toolchain-defined section instead of the mergeable one, e.g.:

[Nr] Name         Type            Address          Off    Size   ES Flg Lk Inf Al
[ 3] .my_const    PROGBITS        0000000000000000 000040 000000 01 AMS  0   0  1
[ 9] .my_const    PROGBITS        0000000000000000 0004a0 000124 00   A  0   0  1

There are no space savings with this approach. It’s also confusing to me as a user that the section name is not unique.

Another thing I’ve tried is to run llvm-objcopy --set-section-flags=.my_const=merge,strings,alloc as a post-build step (in this build, the objects are compiled and then linked in another step - I’ve added the objcopy step before the final link). While this does set the flags correctly on the intermediate object files, there is still no space savings/merging occurring when doing the final link.

Is there a way to get similar behavior on llvm as gcc in this case?

A completely different approach I’m considering is to compress the sections with -Wl,--compress-sections=.my_const=zlib; however, staying compatible with the existing construct would be better.

Is there a way to get similar behavior on llvm as gcc in this case?

No, unfortunately. As you have noticed, __attribute__((section(".my_const,\"MSa\",@progbits,1 #"))) style injection doesn’t work.
This looks like a hack even in GCC, where assembly generation (cc1) and object file generation (gas) are separate processes.

As an alternative, I’ve tried adding __asm__(".section \".my_const\",\"MSa\",@progbits,1\n"), hoping that the data symbols marked with section((".my_const")) will then fall into that pre-defined section. Instead, I see that the output file has two “.my_const” sections: one with my specified SHF_MERGE|SHF_STRINGS|SHF_ALLOC, sh_entsize=1 settings, and another created by the toolchain. All symbols end up allocated in the toolchain-defined section instead of the mergeable one, e.g.:

% cat y.c
__asm__(".section \".my_const\",\"MSa\",@progbits,1\n");
__attribute__((section(".my_const")))
char g[] = "hello";
% clang -S y.c -o -
        .file   "y.c"
                                        # Start of file scope inline assembly
        .section        .my_const,"aMS",@progbits,1

                                        # End of file scope inline assembly
        .type   g,@object                       # @g
        .section        .my_const,"aw",@progbits,unique,1
        .globl  g
...

There are two different .my_const sections because the second one uses ,unique,1. See D100944, “[MC][ELF] Emit separate unique sections for different flags”.

Without the ,unique,1 there will be an assembler error (with both LLVM and gas):

% /tmp/Debug/bin/clang -c y.s
y.s:7:2: error: changed section flags for .my_const, expected: 0x32
        .section        .my_const,"aw",@progbits
        ^
y.s:7:2: error: changed section entsize for .my_const, expected: 1
        .section        .my_const,"aw",@progbits
        ^
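The 0x32 in the first error is simply the OR of the ELF flag bits requested by "aMS": SHF_ALLOC (0x2), SHF_MERGE (0x10) and SHF_STRINGS (0x20). As a small sketch, the C snippet below maps the flag letters used in this thread to their sh_flags bits. The flag values are the standard ELF ones; the function name `section_flags` is only illustrative, and letters not relevant here are ignored.

```c
#include <stdint.h>

/* Standard ELF section flag bits (defined here to keep the sketch self-contained). */
#define SHF_WRITE     0x1
#define SHF_ALLOC     0x2
#define SHF_EXECINSTR 0x4
#define SHF_MERGE     0x10
#define SHF_STRINGS   0x20

/* Illustrative helper: translate the flag letters of a `.section` directive
 * into sh_flags bits. Only a, w, x, M and S are handled. */
uint64_t section_flags(const char *letters)
{
    uint64_t flags = 0;
    for (; *letters != '\0'; letters++) {
        switch (*letters) {
        case 'a': flags |= SHF_ALLOC;     break;
        case 'w': flags |= SHF_WRITE;     break;
        case 'x': flags |= SHF_EXECINSTR; break;
        case 'M': flags |= SHF_MERGE;     break;
        case 'S': flags |= SHF_STRINGS;   break;
        default:  break;                  /* not modelled in this sketch */
        }
    }
    return flags;
}
```

So "aMS" yields 0x32 while the compiler-generated "aw" yields 0x3, which is the mismatch the assembler complains about.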

(There is another minor issue. clang -S y.c && clang -c y.s output is different from clang -c y.c because LLVM’s .section parser doesn’t unescape the quoted string.)

Another thing I’ve tried is to run llvm-objcopy --set-section-flags=.my_const=merge,strings,alloc as a post-build step (in this build, the objects are compiled and then linked in another step - I’ve added the objcopy step before the final link). While this does set the flags correctly on the intermediate object files, there is still no space savings/merging occurring when doing the final link.

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  1
  [ 2] .data             PROGBITS        0000000000000000 000040 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          0000000000000000 000040 000000 00  WA  0   0  1
  [ 4] .my_const         PROGBITS        0000000000000000 000040 000000 01 WAMS  0   0  1
  [ 5] .my_const         PROGBITS        0000000000000000 000040 000006 00 WAMS  0   0  1

The non-empty .my_const section is not merged because its sh_entsize member (ES) is 0.
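To make the table above easier to read: section [4] has ES=1 but no data, and section [5] has the data but ES=0. A rough model of the kind of eligibility test a linker applies could look like the C sketch below. This is an assumption-laden simplification, not lld's source: the name `may_merge` is made up, and the final check on SHF_WRITE reflects my understanding that writable mergeable sections are not accepted for merging, which would also matter for the WAMS flags shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHF_WRITE 0x1
#define SHF_MERGE 0x10

/* Simplified, hypothetical model of "can this input section be merged?". */
bool may_merge(uint64_t sh_flags, uint64_t sh_size, uint64_t sh_entsize)
{
    if (!(sh_flags & SHF_MERGE))
        return false;               /* not marked mergeable at all */
    if (sh_size == 0)
        return false;               /* nothing to merge (section [4]) */
    if (sh_entsize == 0)
        return false;               /* entry size unknown (section [5]) */
    if (sh_size % sh_entsize != 0)
        return false;               /* size is not a whole number of entries */
    if (sh_flags & SHF_WRITE)
        return false;               /* assumption: writable data is not merged */
    return true;
}
```

Under this model neither of the two .my_const sections qualifies, while a read-only section with flags 0x32, a non-zero size and sh_entsize=1 would.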


Thanks for the explanation.

The non-empty .my_const section is not merged because its sh_entsize member (ES) is 0.

As far as I understand, there’s no supported way to control sh_entsize here, is that correct? I could manually change it with some script during build (I guess that’s no more gross than having to use objcopy to set the flags).

It would be nice to somehow have the ability to mark arbitrary sections with MERGE|STRINGS,entsize=1, or otherwise induce the string merging. I guess it’s debatable if this should be specialized to the string merging use case, or implemented in a more generic way which allows the user to freely modify section properties (as gcc does, albeit maybe not via the extended-section-attribute hack).

I wonder if another approach could be to use (TYPE=SHT_STRTAB) in a linker script to coerce this behavior.

You are right. There is no way to control sh_entsize with the GNU Assembler, the LLVM integrated assembler, or llvm-objcopy. You’ll have to write a custom binary manipulation tool (probably no existing one…). The GCC section attribute lacks a way to customize section type, flags, and sh_entsize. Ideally we should agree with them on the exact syntax.
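As a starting point for such a tool, here is a hedged sketch in C that rewrites sh_entsize for every section with a given name in an in-memory ELF64 little-endian relocatable object. Everything here is illustrative: `patch_entsize` is a hypothetical name, the sketch ignores extended section numbering and 32-bit or big-endian files, and it has not been validated against real toolchain output. The field offsets are those of the ELF64 header and section header layouts. Setting sh_entsize alone may not be enough for the linker to merge the section (the flags still have to be acceptable too).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF64 header / section header field offsets and sizes. */
enum {
    EH_SHOFF = 0x28, EH_SHENTSIZE = 0x3A, EH_SHNUM = 0x3C, EH_SHSTRNDX = 0x3E,
    SH_NAME = 0, SH_OFFSET = 24, SH_SIZE = 32, SH_ENTSIZE = 56, SHDR_SIZE = 64
};

/* Read/write an n-byte little-endian integer regardless of host endianness. */
static uint64_t rd_le(const unsigned char *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

static void wr_le(unsigned char *p, size_t n, uint64_t v)
{
    for (size_t i = 0; i < n; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Hypothetical helper: set sh_entsize on every section called `name`.
 * Returns the number of headers patched, or -1 if `img` does not look like
 * a well-formed ELF64 little-endian file. */
int patch_entsize(unsigned char *img, size_t len, const char *name, uint64_t entsize)
{
    if (len < 64 || memcmp(img, "\177ELF", 4) != 0 || img[4] != 2 || img[5] != 1)
        return -1;                          /* need ELFCLASS64 + ELFDATA2LSB */

    uint64_t shoff = rd_le(img + EH_SHOFF, 8);
    uint64_t shentsize = rd_le(img + EH_SHENTSIZE, 2);
    uint64_t shnum = rd_le(img + EH_SHNUM, 2);
    uint64_t shstrndx = rd_le(img + EH_SHSTRNDX, 2);
    if (shentsize != SHDR_SIZE || shstrndx >= shnum ||
        shoff > len || shnum * SHDR_SIZE > len - shoff)
        return -1;

    unsigned char *sh = img + shoff;
    uint64_t stroff = rd_le(sh + shstrndx * SHDR_SIZE + SH_OFFSET, 8);
    uint64_t strsize = rd_le(sh + shstrndx * SHDR_SIZE + SH_SIZE, 8);
    if (stroff > len || strsize > len - stroff)
        return -1;                          /* section name table out of bounds */

    size_t nlen = strlen(name);
    int patched = 0;
    for (uint64_t i = 0; i < shnum; i++) {
        unsigned char *h = sh + i * SHDR_SIZE;
        uint64_t n = rd_le(h + SH_NAME, 4);
        if (n >= strsize || strsize - n < nlen + 1)
            continue;                       /* name (with NUL) cannot fit here */
        if (memcmp(img + stroff + n, name, nlen + 1) != 0)
            continue;
        wr_le(h + SH_ENTSIZE, 8, entsize);
        patched++;
    }
    return patched;
}
```

A build step would read each object file into memory, call something like patch_entsize(buf, size, ".my_const", 1), and write the buffer back before the final link.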

I wanted to set SHT_INIT_ARRAY properly in the past.