LLD output section flag assignment behavior

Hi folks, I have some comments, questions, and observations about how LLD assigns flags to output sections. Maybe @smithp35 has some ideas from the embedded perspective.

Consider the following simple scenario, which is valid for all targets (results shown for x86):

Asm file:

.globl _start, "ar"
_start:
nop
nop
nop

Linker script:

MEMORY { int_main (rx): org = 0x10000000, len = 0x00100000 }
 
HEAP_SIZE = 1K;
 
SECTIONS {
        .code : {  *(.text*) } > int_main
}
 
SECTIONS
{
.CPU0.heap (NOLOAD) : ALIGN(64) { . = ALIGN(64); . += HEAP_SIZE; } > int_main
.CPU1.heap (NOLOAD) : ALIGN(64) { . = ALIGN(64); . += HEAP_SIZE; } > int_main
.CPU2.heap (NOLOAD) : ALIGN(64) { . = ALIGN(64); . += HEAP_SIZE; } > int_main

}

Result using LLD:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .code             PROGBITS        0000000010000000 001000 000003 00  AX  0   0  1
  [ 2] .CPU0.heap        NOBITS          0000000010000040 001003 000400 00  AX  0   0 64
  [ 3] .CPU1.heap        NOBITS          0000000010000440 001003 000400 00  AX  0   0 64
  [ 4] .CPU2.heap        NOBITS          0000000010000840 001003 000400 00  AX  0   0 64
  ...
 
text    data     bss     dec     hex filename
3075       0       0    3075     c03 a-lld.elf

Result: the heap sections (just an example) get AX flags, so llvm-size later counts them as text.

Result using GNU ld:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .code             PROGBITS        0000000010000000 001000 000003 00  AX  0   0  1
  [ 2] .note.gnu.property NOTE           0000000010000008 001008 000030 00   A  0   0  8
  [ 3] .CPU0.heap        NOBITS          0000000010000040 001038 000400 00  WA  0   0 64
  [ 4] .CPU1.heap        NOBITS          0000000010000440 001038 000400 00  WA  0   0 64
  [ 5] .CPU2.heap        NOBITS          0000000010000840 001038 000400 00  WA  0   0 64
 
text    data     bss     dec     hex filename
  51       0    3072    3123     c33 a-ld.elf

If we swap the assignments:

MEMORY { int_main (rx): org = 0x10000000, len = 0x00100000 }

HEAP_SIZE = 1K;
SECTIONS
{
   .CPU0.heap (NOLOAD) : ALIGN(64) { . = ALIGN(64); . +=  HEAP_SIZE; } > int_main
   .CPU1.heap (NOLOAD) : ALIGN(64) { . = ALIGN(64); . +=  HEAP_SIZE; } > int_main
   .CPU2.heap (NOLOAD) : ALIGN(64) { . = ALIGN(64); . +=  HEAP_SIZE; } > int_main
}

SECTIONS {
       .code : { *(.text*) } > int_main
}

We also get different flags, this time without X, but the heap sections are still reported as text (not a problem, actually):

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .CPU0.heap        NOBITS          0000000010000000 001000 000400 00   A  0   0 64
  [ 2] .CPU1.heap        NOBITS          0000000010000400 001000 000400 00   A  0   0 64
  [ 3] .CPU2.heap        NOBITS          0000000010000800 001000 000400 00   A  0   0 64
  [ 4] .code             PROGBITS        0000000010000c00 001c00 000003 00  AX  0   0  1
  ...
 
text    data     bss     dec     hex filename
3075       0       0    3075     c03 a-lld.elf

Result using GNU ld (binutils):

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .CPU0.heap        NOBITS          0000000010000000 001000 000400 00  WA  0   0 64
  [ 2] .CPU1.heap        NOBITS          0000000010000400 001000 000400 00  WA  0   0 64
  [ 3] .CPU2.heap        NOBITS          0000000010000800 001000 000400 00  WA  0   0 64
  [ 4] .code             PROGBITS        0000000010000c00 001c00 000003 00  AX  0   0  1
 
text    data     bss     dec     hex filename
  51       0    3072    3123     c33 a-ld.elf
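To make the size numbers above easier to follow, here is a rough Python sketch of a Berkeley-style bucketing rule. The rule itself is an assumption inferred from the results in this post (allocated sections that are executable or not writable count as text, writable NOBITS sections count as bss), not a copy of llvm-size's source, and the names `berkeley_bucket` and `berkeley_totals` are made up for illustration.

```python
# ELF constants (values from the ELF specification).
SHT_PROGBITS, SHT_NOTE, SHT_NOBITS = 1, 7, 8
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4
W, A, X = SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR


def berkeley_bucket(sh_type, sh_flags):
    """Return 'text', 'data', 'bss' or None for one section (assumed rule)."""
    if not (sh_flags & SHF_ALLOC):
        return None  # non-allocated sections are not counted at all
    if (sh_flags & SHF_EXECINSTR) or not (sh_flags & SHF_WRITE):
        return "text"  # executable, or allocated but read-only
    return "bss" if sh_type == SHT_NOBITS else "data"


def berkeley_totals(sections):
    """Sum section sizes per bucket; sections are (type, flags, size) tuples."""
    totals = {"text": 0, "data": 0, "bss": 0}
    for sh_type, sh_flags, size in sections:
        bucket = berkeley_bucket(sh_type, sh_flags)
        if bucket is not None:
            totals[bucket] += size
    return totals


# Section tables transcribed from the readelf output in this post.
lld_code_first = [(SHT_PROGBITS, A | X, 0x3)] + [(SHT_NOBITS, A | X, 0x400)] * 3
lld_heap_first = [(SHT_NOBITS, A, 0x400)] * 3 + [(SHT_PROGBITS, A | X, 0x3)]
gnu_code_first = [(SHT_PROGBITS, A | X, 0x3), (SHT_NOTE, A, 0x30)] + [
    (SHT_NOBITS, W | A, 0x400)
] * 3

for name, table in [
    ("lld, .code first", lld_code_first),
    ("lld, heaps first", lld_heap_first),
    ("GNU ld, .code first", gnu_code_first),
]:
    print(name, berkeley_totals(table))
```

Under this assumed rule, both LLD layouts give text = 3075 and the GNU ld layout gives text = 51 and bss = 3072, which matches the size output shown above. It also shows why dropping X alone does not move the heaps out of text: without W they are still allocated and read-only.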

My question is: is this behavior intentional? It can affect embedded development, because llvm-size is a handy tool for evaluating code size, but its results can sometimes be a bit strange.

Thank you very much.

Andreu

I can tell you what I know, as I’ve seen some similar questions before. I think this area falls into the cracks of unspecified behaviour: the best available specification, the GNU ld documentation, makes no mention of what the behaviour should be.

If you have an output section that contains no input sections, such as:

.CPU0.heap (NOLOAD) : ALIGN(64) { . = ALIGN(64); . +=  HEAP_SIZE; }

then the linker does not have any information with which to set the type and flags for the section.

As I understand it, GNU ld (without any TYPE=<type> or NOLOAD) will use SHT_PROGBITS for the type and “SHF_WRITE, SHF_ALLOC” for the flags.

LLD will inherit the type and flags from the previous output section, which in this case would normally be SHT_PROGBITS with “SHF_ALLOC”, “SHF_EXECINSTR”; however, the NOLOAD has changed the type to SHT_NOBITS.
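The inheritance can be sketched roughly as follows. This is a simplified Python model of the behaviour described here, not LLD's actual code: the function name `assign_flags`, the input format, and the starting value of SHF_ALLOC (chosen because it reproduces the plain A flag seen when the heap sections come first) are all assumptions for illustration.

```python
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4


def assign_flags(output_sections):
    """Model flag assignment for a list of (name, input_flags) pairs.

    input_flags is None when the output section has no input sections;
    such a section takes the flags of the closest earlier section that did.
    """
    inherited = SHF_ALLOC  # assumed default before any input section is seen
    result = {}
    for name, input_flags in output_sections:
        if input_flags is not None:
            inherited = input_flags  # remember flags for later empty sections
            result[name] = input_flags
        else:
            # Only the basic permission flags are carried over.
            result[name] = inherited & (SHF_ALLOC | SHF_WRITE | SHF_EXECINSTR)
    return result


def show(flags):
    """Render flags the way readelf does (W, A, X)."""
    letters = (("W", SHF_WRITE), ("A", SHF_ALLOC), ("X", SHF_EXECINSTR))
    return "".join(c for c, bit in letters if flags & bit)


heaps = [(".CPU0.heap", None), (".CPU1.heap", None), (".CPU2.heap", None)]
code = [(".code", SHF_ALLOC | SHF_EXECINSTR)]

code_first = assign_flags(code + heaps)
heaps_first = assign_flags(heaps + code)
print({name: show(f) for name, f in code_first.items()})
print({name: show(f) for name, f in heaps_first.items()})
```

With .code first, the model gives the heap sections AX; with the heap sections first, it gives them A. Both match the readelf output in the original post.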

I think LLD’s inheritance of type and flags is intentional, however I think the combination of NOLOAD and SHF_EXECINSTR is nonsensical and the linker should probably clear SHF_EXECINSTR when NOLOAD is given. Feel free to raise a github issue for that.

One small modification you could make to your linker script would be to add a zero sized section before .CPU0.heap with the type and flags that you want so that the heap sections have the flags that you want.

SECTIONS
{
.wa (NOLOAD) : { *(.wa) } > int_main
.CPU0.heap (TYPE=SHT_NOBITS) : ALIGN(64) { . = ALIGN(64); . += HEAP_SIZE; } > int_main
.CPU1.heap (TYPE=SHT_NOBITS) : ALIGN(64) { . = ALIGN(64); . += HEAP_SIZE; } > int_main
.CPU2.heap (TYPE=SHT_NOBITS) : ALIGN(64) { . = ALIGN(64); . += HEAP_SIZE; } > int_main

}

Where .wa is just:

.section .wa, "wa", %nobits

Hope that is of some use.

Hi @smithp35, thank you very much for your view. This workaround is quite interesting for a system written from scratch with the LLD behavior in mind. However, in some cases we have just built really complex embedded systems originally written for GCC/ld, and we spot this artificial increase in code size, which a normal user cannot obviously relate to section flags.

Feel free to raise a github issue for that.

I can create a PR for sure. Just one more question: for a NOLOAD section with no input sections, in addition to clearing SHF_EXECINSTR, do you think it is a good idea to set SHF_WRITE? That would make the reported text size more consistent.

Thank you very much.

Andreu

For NOLOAD with no input sections I agree that SHF_ALLOC, SHF_WRITE (WA) is the best default as this is usually associated with a stack or heap location which will be writeable.

I will consider this scenario in the PR.

Thanks.

An output section annotated with (NOLOAD) that contains only data commands used to get SHF_WRITE in GNU ld, but GNU ld removed the forced SHF_WRITE in 2021: 26378 – sections initialised only by linker scripts are always read/write

It seems that . += HEAP_SIZE; causes GNU ld to set the SHF_WRITE|SHF_ALLOC flags, which is different from data commands. But I am unsure we want to follow the inconsistent special rule.

I feel that LLD’s current behavior is sensible and consistent, and we probably should not special case SHT_NOBITS and add complexity to the code.

Yes, this seems appropriate. An alternative is to have a zero-size .CPU0.heap input section with the desired flags.

Hi @MaskRay, thank you for your reply and for all this context. I think you are right about this case; it is inconsistent. I can also filter these cases (nobits ( TYPE=SHT_NOBITS) : { BYTE(8) }) in the PR if you think we can have this compatibility.

I have one more question: what do you think about the X flag in the previous example? I think this is the real problem for estimating the code size.

Best regards.

I agree that copying SHF_EXECINSTR for such an output section with no input section is undesired.
After investigating the history of the behaviors, I decided to drop SHF_EXECINSTR but keep SHF_WRITE: [ELF] adjustOutputSections: don't copy SHF_EXECINSTR when an output does not contain input sections by MaskRay · Pull Request #70911 · llvm/llvm-project · GitHub

edit: landed as [ELF] adjustOutputSections: don't copy SHF_EXECINSTR when an output d… · llvm/llvm-project@a40f651 · GitHub
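The effect of that change on the copied flags can be sketched like this. It is only a Python illustration of "drop SHF_EXECINSTR but keep SHF_WRITE" as stated above, not the code from the commit; `inherited_flags` and its `copy_execinstr` parameter are hypothetical names.

```python
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4


def inherited_flags(previous_flags, copy_execinstr):
    """Flags an output section without input sections takes from its predecessor."""
    keep = SHF_ALLOC | SHF_WRITE
    if copy_execinstr:  # old behaviour: the X bit was carried over as well
        keep |= SHF_EXECINSTR
    return previous_flags & keep


after_code_old = inherited_flags(SHF_ALLOC | SHF_EXECINSTR, copy_execinstr=True)
after_code_new = inherited_flags(SHF_ALLOC | SHF_EXECINSTR, copy_execinstr=False)
print(hex(after_code_old), hex(after_code_new))
```

In this sketch a heap section placed after .code goes from AX to A, while a predecessor with W set would still pass W on.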
