RFC: Enhancing function alignment attributes

pcc · August 21, 2025, 11:29pm

This RFC describes the proposed changes to ELF and the IR to support a notion of preferred alignment for GlobalObjects which is separate from the minimum alignment.

The initial use case is a recently proposed enhancement for CFI jump tables known as jump table relaxation. CFI jump table relaxation reduces the overhead of a CFI protected indirect call by inlining the function body into the jump table itself, as long as it is small enough. In order to achieve this, we must know the function’s minimum alignment as well as its preferred alignment. The purpose of using a preferred alignment larger than the minimum alignment is generally to enhance performance, but in the case of jump table relaxation we can expect it to be better for performance to inline the jump table entry (8 bytes on x86) than to obey the function’s preferred alignment (16 bytes on x86). Additionally, we must ensure that relaxing the jump table entry will not cause misbehavior at runtime, so we must obey the function’s minimum alignment and refrain from inlining the function body if it requires too much alignment.

IR

There is currently a single align field on GlobalObject. Prior to #149444 we have the following logic:

1. If the align field is unset, a function’s alignment is the backend-determined minimum alignment if -Os or -Oz, otherwise the backend-determined preferred alignment.
2. If the align field is set, the function’s alignment is the max of the align attribute and the alignment computed by (1).

The alignment via -falign-functions was ignored if it was less than the preferred alignment. This behavior was observed to be inconsistent with GCC. With #149444, step 2 was changed to “If the align field is set, the function’s alignment is the max of the align attribute and the minimum alignment”, bringing the behavior in line with GCC.
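As a rough sketch of the two rules and of what #149444 changed (this is not the actual CodeGen code; the function name, parameter names and the x86-like values 1 and 16 are illustrative assumptions):

```python
from typing import Optional


def function_alignment(align_attr: Optional[int], backend_min: int,
                       backend_pref: int, opt_for_size: bool,
                       after_149444: bool) -> int:
    """Model of the rules described above; all names are hypothetical."""
    # Step 1: alignment used when no align attribute is present.
    default = backend_min if opt_for_size else backend_pref
    if align_attr is None:
        return default
    # Step 2: before #149444 the attribute is combined with the result of
    # step 1; after it, with the backend-determined minimum alignment.
    floor = backend_min if after_149444 else default
    return max(align_attr, floor)


# Assumed values: backend minimum 1, backend preferred 16, and an
# "align 2" function (the value Clang sets on member functions).
before = function_alignment(2, 1, 16, opt_for_size=False, after_149444=False)
after = function_alignment(2, 1, 16, opt_for_size=False, after_149444=True)
print(before, after)
```

Under these assumed values the same align 2 function goes from 16-byte to 2-byte alignment, which is why a small align value stops being rounded up to the preferred alignment after the change.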

A new minalign field will be added to GlobalObject. This shall have type Align (instead of MaybeAlign) and will default to 1. For brevity (and to avoid churn), minalign will only be printed if it is not equal to 1.

As part of splitting up the attributes, the following is proposed as the logic for deciding a function’s alignment:

1. A function’s minimum alignment is the max of the minalign attribute and backend-determined minimum alignment.
2. A function’s preferred alignment is the align attribute if set, otherwise the backend-determined minimum alignment if -Os or -Oz, otherwise the backend-determined preferred alignment. Additionally, the preferred alignment shall be at least the minimum alignment.
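The proposed split can be sketched the same way. This is only a model of the two rules as stated, with hypothetical function and parameter names, not the proposed implementation:

```python
from typing import Optional


def min_alignment(minalign_attr: int, backend_min: int) -> int:
    # minalign defaults to 1, so it is always a concrete value.
    return max(minalign_attr, backend_min)


def preferred_alignment(align_attr: Optional[int], minalign_attr: int,
                        backend_min: int, backend_pref: int,
                        opt_for_size: bool) -> int:
    if align_attr is not None:
        pref = align_attr
    elif opt_for_size:          # -Os or -Oz
        pref = backend_min
    else:
        pref = backend_pref
    # The preferred alignment shall be at least the minimum alignment.
    return max(pref, min_alignment(minalign_attr, backend_min))


# Assumed values: backend minimum 1, backend preferred 16, minalign 2.
print(min_alignment(2, 1), preferred_alignment(None, 2, 1, 16, False))
```

With minalign 2 and no align attribute, the minimum alignment is 2 while the preferred alignment stays at the backend's preferred value (16 under the assumed numbers).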

The existing accessor names will be updated for clarity: GlobalObject::{getAlign,setAlignment} shall be renamed GlobalObject::{get,set}PreferredAlignment. The new accessors shall be named GlobalObject::{get,set}MinAlignment().

Clang will be updated to set minalign 2 on member functions instead of align 2. As a result, member functions will usually receive the preferred alignment, fixing the regression from #149444.

Object file (ELF)

To represent the preferred and minimum alignments in ELF, it is proposed to introduce a new SHT_LLVM_MIN_ADDRALIGN section, which is used to specify the minimum alignment of a section where that differs from its preferred alignment. Its sh_link field identifies the section whose alignment is being specified, its sh_addralign field specifies the linked section’s minimum alignment, and the sh_addralign field of the linked section’s section header specifies its preferred alignment. This section has the SHF_EXCLUDE flag so that it is stripped from the final executable or shared library, and the SHF_LINK_ORDER flag so that the sh_link field is updated by tools such as ld -r and objcopy. The contents of the section must be empty.

The new asm directive:

.prefalign n

specifies that the preferred alignment of the current section is determined by taking the maximum of n and the section’s minimum alignment, and causes an SHT_LLVM_MIN_ADDRALIGN section to be emitted if necessary.

The preferred alignment section is an opt-in feature. Because the initial anticipated use case (specifically CFI jump tables) requires LTO, it is expected that LTO clients (linkers) with support for the minimum alignment section will opt in via the API. For the same reason, there will not be a user facing (clang driver) flag for opting in for the time being. If preferred alignment is disabled (or unrepresentable, in the case of non-ELF object formats), the preferred alignment shall be stored as the only alignment in the object file, and CodeGen will emit .balign or .p2align instead of .prefalign.

The proposed ELF extension is backwards compatible with linkers that do not recognize the new section type. Linkers that do not support the section type will read the section’s sh_addralign field containing the preferred alignment and treat it as the minimum alignment, which will result in conservatively correct behavior, as the preferred alignment will always be at least as large as the minimum alignment.
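To make the compatibility argument concrete, here is a small model of how a linker could resolve the two alignments for one section. The SectionHeader class, the section names and the resolve_alignments helper are hypothetical; only the field names and the SHT_LLVM_MIN_ADDRALIGN type come from the text above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SectionHeader:
    name: str             # hypothetical names, for readability only
    sh_type: str          # symbolic type name; numeric values omitted
    sh_addralign: int
    sh_link: int = 0      # index of the linked section, if any


def resolve_alignments(sections: List[SectionHeader], index: int,
                       understands_min_addralign: bool) -> Tuple[int, int]:
    """Return (minimum, preferred) alignment for sections[index]."""
    preferred = sections[index].sh_addralign
    minimum = preferred   # what a linker unaware of the new type assumes
    if understands_min_addralign:
        for sec in sections:
            if (sec.sh_type == "SHT_LLVM_MIN_ADDRALIGN"
                    and sec.sh_link == index):
                minimum = sec.sh_addralign
    return minimum, preferred


sections = [
    SectionHeader("", "SHT_NULL", 0),
    SectionHeader(".text.f", "SHT_PROGBITS", 16),
    SectionHeader("min-align-of-.text.f", "SHT_LLVM_MIN_ADDRALIGN", 2,
                  sh_link=1),
]
print(resolve_alignments(sections, 1, understands_min_addralign=True))
print(resolve_alignments(sections, 1, understands_min_addralign=False))
```

A linker that does not know the new section type ends up treating 16 as the minimum, which over-aligns but never under-aligns.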

The initial change to support the ELF extension is #150151. If this RFC is accepted, further changes will be developed to teach CodeGen to emit the new directive, and reimplement part of #147424 to read the new section.

The effect on -falign-functions

-falign-functions shall set both the preferred alignment and minimum alignment attributes, to maintain consistency with GCC.

To set the preferred function alignment on its own, a new flag is proposed, which shall be named -fpreferred-function-alignment.
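A minimal sketch of how the two flags would map onto the two alignments. The helper name and dictionary keys are hypothetical, and the behaviour when both flags are given (preferred is clamped to at least the minimum) is an assumption derived from the rule that the preferred alignment shall be at least the minimum alignment:

```python
from typing import Dict, Optional


def alignment_from_flags(falign_functions: Optional[int] = None,
                         fpreferred_function_alignment: Optional[int] = None
                         ) -> Dict[str, int]:
    attrs: Dict[str, int] = {}
    if falign_functions is not None:
        # -falign-functions sets both alignments, as GCC users expect.
        attrs["minimum"] = falign_functions
        attrs["preferred"] = falign_functions
    if fpreferred_function_alignment is not None:
        # Assumption: the preferred alignment never drops below the minimum.
        attrs["preferred"] = max(fpreferred_function_alignment,
                                 attrs.get("minimum", 1))
    return attrs


print(alignment_from_flags(falign_functions=32))
print(alignment_from_flags(fpreferred_function_alignment=32))
```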

efriedma-quic · August 22, 2025, 12:23am

For reference, I already did some cleanups in #143188 (Remove GlobalObject::getAlign/setAlignment), so we can modify the Function alignment APIs without impacting other GlobalObjects.

Changing the existing “Alignment” to be the minimum alignment seems obvious: it matches the ways we naturally query alignment. And allowing the frontend to specify a preferred alignment on a per-function basis, which can be overridden by later optimizations if necessary, also seems like a good idea. For example, you can specify your preferred alignment, and let PGO override that preference for cold functions, or something along those lines.

The need for the ELF extension seems predicated on the assumption that we request 16-byte alignment for functions that have size 8 bytes or less. But that seems like something we could fix: there’s not much point to requesting 16-byte alignment for a function of size 8 bytes or less. The primary reason for aligning a function entry is to ensure the beginning of the function doesn’t cross icache boundaries, so we only really need 8-byte alignment for an 8-byte function. We could extend the assembler so it requests less alignment for such functions. (I don’t think there’s any way to write that right now in GNU-style assembly, but I can’t think of any obstacle to implementing an assembler directive.)

If we have that, I don’t think you need the ELF extension for CFI jump tables? And I’m not sure how helpful the ELF extension is outside of that.
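The icache argument above can be checked with a little arithmetic. The sketch below brute-forces it; the helper names are hypothetical and the 16- and 64-byte boundaries are assumed values, since the post does not name a specific boundary size.

```python
def crosses_boundary(start: int, size: int, boundary: int) -> bool:
    """True if the bytes [start, start + size) span a multiple of boundary."""
    return start // boundary != (start + size - 1) // boundary


def small_function_never_crosses(align: int, max_size: int,
                                 boundary: int) -> bool:
    # Try every aligned start address in a window and every size up to
    # max_size; report whether any placement crosses a boundary.
    return not any(crosses_boundary(start, size, boundary)
                   for start in range(0, 4 * boundary, align)
                   for size in range(1, max_size + 1))


print(small_function_never_crosses(align=8, max_size=8, boundary=16))
print(small_function_never_crosses(align=8, max_size=8, boundary=64))
print(crosses_boundary(start=12, size=8, boundary=16))
```

An 8-byte-aligned function of at most 8 bytes never straddles a 16- or 64-byte boundary, whereas the same function placed at offset 12 does cross a 16-byte boundary.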

nikic · August 22, 2025, 8:14am

I don’t think that changing the meaning of align to no longer mean “minimum alignment”, as done in #149444 (CodeGen: Respect function align attribute if less than preferred alignment), is appropriate. The meaning of align (on allocations/objects) everywhere in LLVM is a minimum alignment, which can be increased if considered profitable.

If you want to introduce a separate preferred alignment property, the way to do it is to leave align alone and add a separate prefalign. Not to change the meaning of align and add minalign.

pcc · August 22, 2025, 8:20pm

Thanks for the feedback. I reverted #149444 while we figure out what to do. Keeping the existing attribute as minimum alignment and adding a preferred alignment attribute sounds reasonable to me.

This might work. We currently pass -falign-functions=32 in our internal builds in order to reduce the measurement bias effect of functions changing size. Without the ELF extension, this flag would also affect sh_addralign and would therefore end up preventing jump table relaxation. But we could also consider setting sh_addralign to the lowest power of 2 >= the function size if the function’s size is between the minimum and preferred alignment. Then instead of passing -falign-functions=32 we could start passing the new flag -fpreferred-function-alignment=32. This may also lead to a general performance improvement due to lower TLB/icache pressure. Let me experiment with that and see how well it works.
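A sketch of the size-based choice of sh_addralign described above. The function names are hypothetical; the rule itself (minimum for tiny sections, lowest power of 2 >= the size in between, preferred alignment once the size exceeds it) is the one stated in this thread.

```python
def next_power_of_two(n: int) -> int:
    """Lowest power of 2 that is >= n (returns 1 for n <= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def section_alignment(size: int, min_align: int, pref_align: int) -> int:
    # The preferred alignment is never below the minimum alignment.
    pref_align = max(pref_align, min_align)
    # Clamp the size-derived alignment into [min_align, pref_align].
    return min(max(next_power_of_two(size), min_align), pref_align)


for size in (3, 6, 8, 20, 100):
    print(size, section_alignment(size, min_align=1, pref_align=32))
```

With a preferred alignment of 32 and a minimum of 1, a 6-byte or 8-byte function ends up with sh_addralign 8, so it can still fit an 8-byte jump table entry, while a 100-byte function keeps the full 32.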

MaskRay · August 25, 2025, 4:25am

I agree that having the function attribute align indicate the minimum alignment (the original behavior before the reverted #149444) is useful. I jotted down some notes in the "Aligning code for performance" chapter of this post: https://maskray.me/blog/2025-08-24-understanding-alignment-from-source-to-object-file

Implementing this as an assembler directive with a complex expression (label difference) operand is likely impractical.
Ideally LLVMCodeGen should estimate the function size and emit a suitable alignment directive:

// rejected as intended today
.p2align 4, , b-a
a:
  nop
b:

The first draft of GCC’s -flimit-function-alignment actually found a lowest power of 2 >= the function size, but it was considered not useful.

Aligning small functions can be inefficient and may not be worth the overhead. To address this, GCC introduced -flimit-function-alignment in 2016. The option sets .p2align directive’s max-skip operand to the estimated function size minus one.

% echo 'int add1(int a){return a+1;}' | gcc -O2 -S -fcf-protection=none -xc - -o - -falign-functions=16 | grep p2align
        .p2align 4
% echo 'int add1(int a){return a+1;}' | gcc -O2 -S -fcf-protection=none -xc - -o - -falign-functions=16 -flimit-function-alignment | grep p2align
        .p2align 4,,3
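For readers unfamiliar with the max-skip operand, here is a small model of what `.p2align 4,,3` does to the location counter. The padding helper is hypothetical; it follows the GNU as rule that the alignment is not done at all if it would need more than max-skip bytes.

```python
from typing import Optional


def padding(offset: int, p2align: int, max_skip: Optional[int] = None) -> int:
    """Bytes of padding inserted at `offset` by .p2align p2align,,max_skip."""
    align = 1 << p2align
    pad = -offset % align          # bytes needed to reach the next boundary
    if max_skip is not None and pad > max_skip:
        return 0                   # too much padding: skip the alignment
    return pad


# A max-skip of 3 corresponds to an estimated function size of 4 bytes.
print(padding(13, 4, max_skip=3))  # 3 bytes are enough to reach offset 16
print(padding(12, 4, max_skip=3))  # would need 4 bytes, so no padding
print(padding(12, 4))              # plain .p2align 4 always pads
```

Because max-skip is the estimated size minus one, the padding is skipped exactly when the distance to the next boundary is at least the function size, i.e. when the function fits before the boundary anyway.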

In LLVM, the x86 backend does not implement TargetInstrInfo::getInstSizeInBytes, making it challenging to implement -flimit-function-alignment.

efriedma-quic · August 25, 2025, 6:43am

For anything other than x86, I’d say sure, let the backend estimate it; we have accurate codesize estimates anyway for branch relaxation. But x86 does branch relaxation in the assembler, so from what I recall we don’t have good size estimates, so we might need some cooperation from the assembler to estimate the size.

MaskRay · August 26, 2025, 5:21am

For our heuristics, especially with functions around 32 bytes (as we consider -falign-functions=32), we can make a simple assumption: all JMP and JCC instructions will fit within 2 bytes (If the 5-byte variant is needed, the function would be larger than 128 bytes). A very rough estimate should suffice, as reaching 80% accuracy is likely achievable and provides enough mitigation for the initial jump table issue.

Extending the .p2align directive to support complex expressions, like label differences, is a tricky path. It could lead to layout non-convergence (.align and .org should avoid utilizing information from a subsequent fragment), and would require using that complex attemptToFoldSymbolOffsetDifference code. I think we should reconsider before we get too deep into it.

pcc · August 27, 2025, 12:15am

Instead of using the CodeGen size estimation, the simplest solution would seem to be to have the assembler increase sh_addralign based on a supplied (via the .prefalign directive) preferred alignment which is tracked separately from the minimum alignment.

The downside vs a size estimation based approach is that this won’t be compatible with -fno-function-sections and external assemblers, but maybe that’s fine; the initial use case for preferred alignment (jump tables) depends on function sections and the integrated assembler anyway.

I ran some experiments internally and found that there was no statistically significant performance difference between a fixed alignment of 32 and an alignment based on the function size.

Sent patches implementing this approach:

IR: Add prefalign attribute for function definitions.
github.com/llvm/llvm-project · main ← users/pcc/spr/ir-add-prefalign-attribute-for-function-definitions · opened by pcc, 12:12AM - 27 Aug 25 UTC · +73 -21

The prefalign attribute determines the function's preferred alignment. By default, the function's preferred alignment is set in a target-specific way, but it may be overridden with this attribute. The backend logic will be added in followup patches. Part of this RFC: https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019

CodeGen, Driver: Introduce -fpreferred-function-alignment option.
github.com/llvm/llvm-project · users/pcc/spr/main.codegen-driver-introduce-fpreferred-function-alignment-option ← users/pcc/spr/codegen-driver-introduce-fpreferred-function-alignment-option · opened by pcc, 12:12AM - 27 Aug 25 UTC · +44 -10

This option may be used to specify a function's preferred alignment. The -falign-functions option and the aligned attribute now control both the minimum alignment and the preferred alignment for consistency with gcc. In contrast to the previous approach implemented in #149444 the preferred alignment is retained for member functions. Part of this RFC: https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019

MC: Add directive for specifying a section's preferred alignment.
github.com/llvm/llvm-project · users/pcc/spr/main.mc-add-elf-section-and-directive-for-specifying-a-sections-preferred-alignment ← users/pcc/spr/mc-add-elf-section-and-directive-for-specifying-a-sections-preferred-alignment · opened by pcc, 01:30AM - 23 Jul 25 UTC · +191 -4

The new asm directive: .prefalign n specifies that the preferred alignment of the current section is determined by taking the maximum of n and the section's minimum alignment. Sections whose size is larger than the preferred alignment are aligned to the preferred alignment. If the size of a section with a preferred alignment is between the minimum alignment and the preferred alignment, the section alignment is the smallest power of 2 >= the section size. Part of this RFC: https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019

CodeGen: Emit .prefalign directives based on the prefalign attribute.
github.com/llvm/llvm-project · users/pcc/spr/main.codegen-emit-prefalign-directives-based-on-the-prefalign-attribute ← users/pcc/spr/codegen-emit-prefalign-directives-based-on-the-prefalign-attribute · opened by pcc, 12:12AM - 27 Aug 25 UTC · +61 -13

MachineFunction can now be queried for the preferred alignment which comes from the function attributes (optsize, minsize, prefalign) and TargetLowering. The result of this query is emitted as a .prefalign directive if supported, otherwise it gets combined into the minimum alignment. Part of this RFC: https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019

efriedma-quic · September 12, 2025, 8:04pm

The revised approach makes sense to me.

pcc · October 6, 2025, 10:25pm

Since I posted the message above some patches were requested to be split up, so for clarity here is the full list of patches in order. This list also includes the CFI jump table relaxation series (last 4 patches).

IR: Add prefalign attribute for function definitions.
github.com/llvm/llvm-project · main ← users/pcc/spr/ir-add-prefalign-attribute-for-function-definitions · opened by pcc, 12:12AM - 27 Aug 25 UTC · +69 -21

The prefalign attribute determines the function's preferred alignment. By default, the function's preferred alignment is set in a target-specific way, but it may be overridden with this attribute. The backend logic will be added in followup patches. Part of this RFC: https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019

CodeGen, Driver: Introduce -fpreferred-function-alignment option.
github.com/llvm/llvm-project · users/pcc/spr/main.codegen-driver-introduce-fpreferred-function-alignment-option ← users/pcc/spr/codegen-driver-introduce-fpreferred-function-alignment-option · opened by pcc, 12:12AM - 27 Aug 25 UTC · +44 -10

This option may be used to specify a function's preferred alignment. The -falign-functions option and the aligned attribute now control both the minimum alignment and the preferred alignment for consistency with gcc. In contrast to the previous approach implemented in #149444 the preferred alignment is retained for member functions. Part of this RFC: https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019

MC: Add directive for specifying a section's preferred alignment.
github.com/llvm/llvm-project · users/pcc/spr/main.mc-add-elf-section-and-directive-for-specifying-a-sections-preferred-alignment ← users/pcc/spr/mc-add-elf-section-and-directive-for-specifying-a-sections-preferred-alignment · opened by pcc, 01:30AM - 23 Jul 25 UTC · +191 -4

The new asm directive: .prefalign n specifies that the preferred alignment of the current section is determined by taking the maximum of n and the section's minimum alignment. Sections whose size is larger than the preferred alignment are aligned to the preferred alignment. If the size of a section with a preferred alignment is between the minimum alignment and the preferred alignment, the section alignment is the smallest power of 2 >= the section size. Part of this RFC: https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019

CodeGen: Introduce MachineFunction::getPreferredAlignment().
github.com/llvm/llvm-project · users/pcc/spr/main.codegen-introduce-machinefunctiongetpreferredalignment ← users/pcc/spr/codegen-introduce-machinefunctiongetpreferredalignment · opened by pcc, 09:34PM - 12 Sep 25 UTC · +20 -10

MachineFunction can now be queried for the preferred alignment which comes from the function attributes (optsize, minsize, prefalign) and TargetLowering. Part of this RFC: https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019

CodeGen: Emit .prefalign directives based on the prefalign attribute.
github.com/llvm/llvm-project · users/pcc/spr/main.codegen-emit-prefalign-directives-based-on-the-prefalign-attribute ← users/pcc/spr/codegen-emit-prefalign-directives-based-on-the-prefalign-attribute · opened by pcc, 12:12AM - 27 Aug 25 UTC · +44 -5

The result of the MachineFunction preferred alignment query is emitted as a .prefalign directive if supported, otherwise it gets combined into the minimum alignment. Part of this RFC: https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019

ELF: CFI jump table relaxation.
github.com/llvm/llvm-project · users/pcc/spr/main.elf-cfi-jump-table-relaxation ← users/pcc/spr/elf-cfi-jump-table-relaxation · opened by pcc, 11:41PM - 07 Jul 25 UTC · +416 -2

Indirection via the jump table increases the icache and TLB miss rate associated with indirect calls, and according to internal benchmarking was identified as one of the main runtime costs of CFI, contributing around 30% of the total overhead. #145579 addressed the problem for direct calls to jump table entries, but the indirect call overhead is still present.

This patch implements jump table relaxation, which is a technique for opportunistically reducing the indirect call overhead. The basic idea is to eliminate the indirection by moving function bodies into the jump table wherever possible. This is possible in two circumstances:

- When the body size is at most the size of a jump table entry.
- When the function is the last function in the jump table.

In both cases, we may move the function body into the jump table by splitting the jump table in two, with enough space in the middle for the function body, and placing the function there. We leave the last function in the jump table at its original location and place the rest of the jump table behind it. The goal of this is to decrease the TLB miss rate, on the assumption that it is more likely for functions with the same type (and their callees) to be in the same page as each other than for them to be in the same page as the original location of the jump table (typically clustered together near the end of the binary).

A complete implementation of jump table relaxation was found to reduce the overhead of CFI in a large realistic internal Google benchmark by between 0.2 and 0.5 percentage points, or 10-25%, depending on the microarchitecture.
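The two circumstances, combined with the minimum-alignment requirement from the RFC text, could be sketched as the predicate below. The function and parameter names are hypothetical, and treating "too much alignment" as "minimum alignment larger than the jump table entry size" is an assumption; the actual linker patch may use a different condition.

```python
def can_move_into_jump_table(body_size: int, entry_size: int,
                             is_last_entry: bool, min_align: int) -> bool:
    # Circumstance 1: the body is no larger than a jump table entry.
    # Circumstance 2: the function is the last one in the jump table.
    fits = body_size <= entry_size or is_last_entry
    # Assumption: a function whose minimum alignment exceeds the entry
    # size cannot be placed in the table without misaligning it.
    alignment_ok = min_align <= entry_size
    return fits and alignment_ok


# 8-byte entries, as on x86 in the RFC text.
print(can_move_into_jump_table(6, 8, is_last_entry=False, min_align=1))
print(can_move_into_jump_table(40, 8, is_last_entry=False, min_align=1))
print(can_move_into_jump_table(40, 8, is_last_entry=True, min_align=1))
print(can_move_into_jump_table(6, 8, is_last_entry=False, min_align=16))
```

The last call shows why the minimum alignment has to be tracked separately: if -falign-functions=32 were recorded as the only alignment, small functions would fail the alignment check even though they would fit in an entry.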

LowerTypeTests: Mark CFI jump table sections as eligible for relaxation.
github.com/llvm/llvm-project · users/pcc/spr/main.wip-lowertypetests-start-using-elf_section_properties-metadata-to-mark-cfi-jump-table-sections ← users/pcc/spr/wip-lowertypetests-start-using-elf_section_properties-metadata-to-mark-cfi-jump-table-sections · opened by pcc, 06:10AM - 17 Jul 25 UTC · +44 -25

Use !elf_section_properties metadata to set the type and entry size to the correct values, and set the preferred alignment to the entry size to enable last jump table entry placement.

pcc · October 22, 2025, 1:43am

Just a quick ping on all of the patches, which are still awaiting review.
