RFC: Enhancing function alignment attributes

FYI @efriedma-quic @MaskRay

This RFC describes the proposed changes to ELF and the IR to support a notion of preferred alignment for GlobalObjects which is separate from the minimum alignment.

The initial use case is a recently proposed enhancement for CFI jump tables known as jump table relaxation. CFI jump table relaxation reduces the overhead of a CFI-protected indirect call by inlining the function body into the jump table itself, as long as it is small enough. In order to achieve this, we must know the function’s minimum alignment as well as its preferred alignment. The purpose of using a preferred alignment larger than the minimum alignment is generally to enhance performance, but in the case of jump table relaxation we can expect it to be better for performance to inline the jump table entry (8 bytes on x86) than to obey the function’s preferred alignment (16 bytes on x86). Additionally, we must ensure that relaxing the jump table entry will not cause misbehavior at runtime, so we must obey the function’s minimum alignment and refrain from inlining the function body if it requires too much alignment.
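As a rough illustration of the eligibility check described above, here is a minimal sketch. The function name, the use of the entry size as the alignment threshold, and the default of 8 bytes (the x86 entry size quoted above) are assumptions for illustration, not the actual implementation.

```python
def can_relax_jump_table_entry(body_size, min_align, entry_size=8):
    """Hypothetical check: may the function body replace its jump table entry?

    Assumption: an entry slot only guarantees alignment up to entry_size,
    so a function whose minimum alignment exceeds that cannot be inlined.
    The preferred alignment is deliberately ignored here.
    """
    fits = body_size <= entry_size
    alignment_ok = min_align <= entry_size
    return fits and alignment_ok
```

Under these assumptions a 6-byte function with minimum alignment 1 and preferred alignment 16 is eligible, while the same function with minimum alignment 16 is not.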

IR

There is currently a single align field on GlobalObject. Prior to #149444 we have the following logic:

  1. If the align field is unset, a function’s alignment is the backend-determined minimum alignment if -Os or -Oz, otherwise the backend-determined preferred alignment.
  2. If the align field is set, the function’s alignment is the max of the align attribute and the alignment computed by (1).

The alignment via -falign-functions was ignored if it was less than the preferred alignment. This behavior was observed to be inconsistent with GCC. With #149444, step 2 was changed to “If the align field is set, the function’s alignment is the max of the align attribute and the minimum alignment”, bringing the behavior in line with GCC.
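To make the difference concrete, the two rules can be modelled roughly as follows. This is only a sketch: the function and parameter names are made up, and `None` stands for an unset align field.

```python
def alignment_before_149444(align_attr, backend_min, backend_pref, opt_for_size):
    """Steps (1) and (2) as they were prior to #149444."""
    default = backend_min if opt_for_size else backend_pref
    if align_attr is None:
        return default
    return max(align_attr, default)


def alignment_after_149444(align_attr, backend_min, backend_pref, opt_for_size):
    """Step (2) changed: an explicit align is only raised to the backend minimum."""
    if align_attr is None:
        return backend_min if opt_for_size else backend_pref
    return max(align_attr, backend_min)
```

Assuming a backend minimum of 1 and a backend preferred alignment of 16, a function with align 2 is aligned to 16 under the old rule and to 2 under the new one.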

A new minalign field will be added to GlobalObject. This shall have type Align (instead of MaybeAlign) and will default to 1. For brevity (and to avoid churn), minalign will only be printed if it is not equal to 1.

As part of splitting up the attributes, the following is proposed as the logic for deciding a function’s alignment:

  1. A function’s minimum alignment is the max of the minalign attribute and backend-determined minimum alignment.
  2. A function’s preferred alignment is the align attribute if set, otherwise the backend-determined minimum alignment if -Os or -Oz, otherwise the backend-determined preferred alignment. Additionally, the preferred alignment shall be at least the minimum alignment.
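The two rules proposed above could be sketched like this. The names are hypothetical, `None` stands for an unset align attribute, and minalign defaults to 1 as described earlier.

```python
def proposed_alignments(minalign_attr, align_attr, backend_min, backend_pref,
                        opt_for_size):
    """Return (minimum, preferred) alignment under the proposed logic."""
    # Rule 1: the minimum is the larger of minalign and the backend minimum.
    minimum = max(minalign_attr, backend_min)

    # Rule 2: explicit align wins, otherwise fall back to the backend values.
    if align_attr is not None:
        preferred = align_attr
    elif opt_for_size:
        preferred = backend_min
    else:
        preferred = backend_pref

    # The preferred alignment is never below the minimum.
    return minimum, max(preferred, minimum)
```

For example, with an assumed backend minimum of 1 and preferred alignment of 16, a function carrying only minalign 2 would get a minimum of 2 and a preferred alignment of 16.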

The existing accessor names will be updated for clarity: GlobalObject::{getAlign,setAlignment} shall be renamed GlobalObject::{get,set}PreferredAlignment. The new accessors shall be named GlobalObject::{get,set}MinAlignment().

Clang will be updated to set minalign 2 on member functions instead of align 2. As a result, member functions will usually receive the preferred alignment, fixing the regression from #149444.

Object file (ELF)

To represent the preferred and minimum alignments in ELF, it is proposed to introduce a new SHT_LLVM_MIN_ADDRALIGN section, which is used to specify the minimum alignment of a section where that differs from its preferred alignment. Its sh_link field identifies the section whose alignment is being specified, its sh_addralign field specifies the linked section’s minimum alignment and the sh_addralign field of the linked section’s section header specifies its preferred alignment. This section has the SHF_EXCLUDE flag so that it is stripped from the final executable or shared library, and the SHF_LINK_ORDER flag so that the sh_link field is updated by tools such as ld -r and objcopy. The contents of the section must be empty.

The new asm directive:

.prefalign n

specifies that the preferred alignment of the current section is determined by taking the maximum of n and the section’s minimum alignment, and causes an SHT_LLVM_MIN_ADDRALIGN section to be emitted if necessary.
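A minimal model of what the assembler would do on seeing the directive, under my reading that "if necessary" means "when the resulting preferred alignment differs from the minimum". The function name and the returned tuple are illustrative only.

```python
def apply_prefalign(n, section_min_align):
    """Model of `.prefalign n` for the current section.

    Returns (preferred_alignment, needs_min_addralign_section).
    Assumption: the extra section is only needed when the two values differ.
    """
    preferred = max(n, section_min_align)
    needs_min_section = preferred != section_min_align
    return preferred, needs_min_section
```

So `.prefalign 16` in a section with minimum alignment 1 would give a preferred alignment of 16 plus an SHT_LLVM_MIN_ADDRALIGN section, while `.prefalign 4` in a section whose minimum alignment is already 8 would change nothing.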

The preferred alignment section is an opt-in feature. Because the initial anticipated use case (specifically CFI jump tables) requires LTO, it is expected that LTO clients (linkers) with support for the minimum alignment section will opt in via the API. For the same reason, there will not be a user facing (clang driver) flag for opting in for the time being. If preferred alignment is disabled (or unrepresentable, in the case of non-ELF object formats), the preferred alignment shall be stored as the only alignment in the object file, and CodeGen will emit .balign or .p2align instead of .prefalign.

The proposed ELF extension is backwards compatible with linkers that do not recognize the new section type. Linkers that do not support the section type will read the section’s sh_addralign field containing the preferred alignment and treat it as the minimum alignment, which will result in conservatively correct behavior, as the preferred alignment will always be at least as large as the minimum alignment.
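The encoding and the fallback behavior can be sketched from the linker's side as below. Section headers are modelled as plain dicts, sh_link is treated as an index into the list, and the section type is a string placeholder rather than the real numeric value; all of that is simplification for illustration.

```python
# Placeholder only: not the real numeric section type.
SHT_LLVM_MIN_ADDRALIGN = "SHT_LLVM_MIN_ADDRALIGN"


def resolve_alignments(sections, understands_extension=True):
    """Return {section index: (minimum, preferred)} for ordinary sections."""
    result = {}
    for index, sec in enumerate(sections):
        if sec["sh_type"] != SHT_LLVM_MIN_ADDRALIGN:
            # Without further information, sh_addralign is both values.
            result[index] = (sec["sh_addralign"], sec["sh_addralign"])

    if understands_extension:
        for sec in sections:
            if sec["sh_type"] == SHT_LLVM_MIN_ADDRALIGN:
                target = sec["sh_link"]
                _, preferred = result[target]
                # The extension section carries the linked section's minimum.
                result[target] = (sec["sh_addralign"], preferred)
    return result


sections = [
    {"sh_type": "SHT_PROGBITS", "sh_link": 0, "sh_addralign": 16},
    {"sh_type": SHT_LLVM_MIN_ADDRALIGN, "sh_link": 0, "sh_addralign": 1},
]
new_linker = resolve_alignments(sections)
old_linker = resolve_alignments(sections, understands_extension=False)
```

Here `new_linker` sees a minimum of 1 and a preferred alignment of 16 for section 0, whereas `old_linker` treats 16 as the minimum, which is stricter than required but still correct.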

The initial change to support the ELF extension is #150151. If this RFC is accepted, further changes will be developed to teach CodeGen to emit the new directive, and reimplement part of #147424 to read the new section.

The effect on -falign-functions

-falign-functions shall set both the preferred alignment and minimum alignment attributes, to maintain consistency with GCC.

To set the preferred function alignment on its own, a new flag is proposed, which shall be named -fpreferred-function-alignment.

For reference, I already did some cleanups in #143188 (“Remove GlobalObject::getAlign/setAlignment”), so we can modify the Function alignment APIs without impacting other GlobalObjects.

Changing the existing “Alignment” to be the minimum alignment seems obvious: it matches the ways we naturally query alignment. And allowing the frontend to specify a preferred alignment on a per-function basis, which can be overridden by later optimizations if necessary, also seems like a good idea. For example, you can specify your preferred alignment, and let PGO override that preference for cold functions, or something along those lines.

The need for the ELF extension seems predicated on the assumption that we request 16-byte alignment for functions that have size 8 bytes or less. But that seems like something we could fix: there’s not much point to requesting 16-byte alignment for a function of size 8 bytes or less. The primary reason for aligning a function entry is to ensure the beginning of the function doesn’t cross icache boundaries, so we only really need 8-byte alignment for an 8-byte function. We could extend the assembler so it requests less alignment for such functions. (I don’t think there’s any way to write that right now in GNU-style assembly, but I can’t think of any obstacle to implementing an assembler directive.)

If we have that, I don’t think you need the ELF extension for CFI jump tables? And I’m not sure how helpful the ELF extension is outside of that.

I don’t think that changing the meaning of align to no longer mean “minimum alignment”, as done in #149444 (“CodeGen: Respect function align attribute if less than preferred alignment.”), is appropriate. The meaning of align (on allocations/objects) everywhere in LLVM is a minimum alignment, which can be increased if considered profitable.

If you want to introduce a separate preferred alignment property, the way to do it is to leave align alone and add a separate prefalign. Not to change the meaning of align and add minalign.

Thanks for the feedback. I reverted #149444 while we figure out what to do. Keeping the existing attribute as minimum alignment and adding a preferred alignment attribute sounds reasonable to me.

This might work. We currently pass -falign-functions=32 in our internal builds in order to reduce the measurement bias effect of functions changing size. Without the ELF extension, this flag would also affect sh_addralign and would therefore end up preventing jump table relaxation. But we could also consider setting sh_addralign to the lowest power of 2 >= the function size if the function’s size is between the minimum and preferred alignment. Then instead of passing -falign-functions=32 we could start passing the new flag -fpreferred-function-alignment=32. This may also lead to a general performance improvement due to lower TLB/icache pressure. Let me experiment with that and see how well it works.
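Roughly, the size-based choice of sh_addralign I have in mind looks like the sketch below. The helper names are made up, and clamping to the minimum and preferred alignment outside the "between" range is my assumption about the intended behavior.

```python
def next_pow2(n):
    """Lowest power of 2 that is >= n (1 for n <= 1)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def size_based_addralign(func_size, min_align, pref_align):
    """Pick sh_addralign from the function size, clamped to [min, preferred].

    Assumes min_align <= pref_align and that both are powers of 2.
    """
    return min(pref_align, max(min_align, next_pow2(func_size)))
```

With -fpreferred-function-alignment=32 and a minimum alignment of 1, a 6-byte function would get sh_addralign 8, which leaves it eligible for an 8-byte jump table entry, while a 20-byte or larger function would still get 32.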

I agree that having the function attribute align indicate the minimum alignment (the original behavior before the reverted #149444) is useful. I jotted down some notes in the "Aligning code for performance" chapter of this post: https://maskray.me/blog/2025-08-24-understanding-alignment-from-source-to-object-file

Implementing this as an assembler directive with a complex expression (label difference) operand is likely impractical.
Ideally LLVMCodeGen should estimate the function size and emit a suitable alignment directive:

// rejected as intended today
.p2align 4, , b-a
a:
  nop
b:

The first draft of GCC’s -flimit-function-alignment actually computed the lowest power of 2 >= the function size, but that was considered not useful.

Aligning small functions can be inefficient and may not be worth the overhead. To address this, GCC introduced -flimit-function-alignment in 2016. The option sets the .p2align directive’s max-skip operand to the estimated function size minus one.

% echo 'int add1(int a){return a+1;}' | gcc -O2 -S -fcf-protection=none -xc - -o - -falign-functions=16 | grep p2align
        .p2align 4
% echo 'int add1(int a){return a+1;}' | gcc -O2 -S -fcf-protection=none -xc - -o - -falign-functions=16 -flimit-function-alignment | grep p2align
        .p2align 4,,3
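For readers less familiar with the max-skip operand, here is a small model of how the padding is decided. The function name is made up; the rule it encodes is that the alignment is skipped entirely when it would need more padding bytes than max-skip.

```python
def p2align_padding(offset, p2, max_skip=None):
    """Padding bytes inserted by `.p2align p2,,max_skip` at the given offset."""
    align = 1 << p2
    pad = (-offset) % align  # bytes needed to reach the next multiple of align
    if max_skip is not None and pad > max_skip:
        return 0  # too much padding required: the alignment is not done at all
    return pad
```

With `.p2align 4,,3` a function starting at offset 13 is padded by 3 bytes to offset 16, but one starting at offset 10 is left where it is. Under the size-minus-one rule, the max-skip of 3 in the output above corresponds to an estimated function size of 4 bytes.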

In LLVM, the x86 backend does not implement TargetInstrInfo::getInstSizeInBytes, making it challenging to implement -flimit-function-alignment.

For anything other than x86, I’d say sure, let the backend estimate it; we have accurate codesize estimates anyway for branch relaxation. But x86 does branch relaxation in the assembler, so from what I recall we don’t have good size estimates, so we might need some cooperation from the assembler to estimate the size.

For our heuristics, especially with functions around 32 bytes (as we consider -falign-functions=32), we can make a simple assumption: all JMP and JCC instructions will fit within 2 bytes (if the 5-byte variant is needed, the function would be larger than 128 bytes). A very rough estimate should suffice, as reaching 80% accuracy is likely achievable and provides enough mitigation for the initial jump table issue.

Extending the .p2align directive to support complex expressions, such as label differences, is a tricky path. It could lead to layout non-convergence (.align and .org should avoid utilizing information from a subsequent fragment), and would require using that complex attemptToFoldSymbolOffsetDifference code. I think we should reconsider before we get too deep into it.

Instead of using the CodeGen size estimation, the simplest solution would seem to be to have the assembler increase sh_addralign based on a supplied (via the .prefalign directive) preferred alignment which is tracked separately from the minimum alignment.

The downside vs a size estimation based approach is that this won’t be compatible with -fno-function-sections and external assemblers, but maybe that’s fine; the initial use case for preferred alignment (jump tables) depends on function sections and the integrated assembler anyway.

I ran some experiments internally and found that there was no statistically significant performance difference between a fixed alignment of 32 and an alignment based on the function size.

Sent patches implementing this approach:

The revised approach makes sense to me.

Since I posted the message above some patches were requested to be split up, so for clarity here is the full list of patches in order. This list also includes the CFI jump table relaxation series (last 4 patches).

Just a quick ping on all of the patches, which are still awaiting review.