RFC: Code Prefetch Insertion

WenleiHe · November 5, 2025, 8:27am

Thanks for sharing the data. Any insights on the L1 icache miss increase?

This optimization would also add more instructions. What is the total instruction count increase? What is the overall effect on an application-level metric of the increased instruction count combined with the improved IPC?

rlavaee · November 8, 2025, 12:51am

Correct. The prefetchi* instructions target the L2 cache. This isn’t documented, but it’s easy to verify with microbenchmarks. The increase in L1 icache misses is not caused by those instructions per se, since it does not reproduce when they are replaced by NOPs of the same size. We believe it comes from evictions in the L2, which in turn cause evictions from the L1 icache (due to cache inclusivity).
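To make the hypothesized mechanism concrete, here is a toy Python model, not a model of any real microarchitecture. It assumes fully associative LRU caches, an L2 that is inclusive of the L1 icache, and code prefetches that fill only the L2, as described above. Because L1 hits are never seen by the L2, hot lines age out of the L2 and get back-invalidated from the L1 when prefetched lines arrive. The cache sizes, the line names, and the `inclusive` switch are illustrative assumptions.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that tracks line names only."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest (LRU) first, newest (MRU) last

    def __contains__(self, line):
        return line in self.lines

    def touch(self, line):
        self.lines.move_to_end(line)  # mark as most recently used

    def insert(self, line):
        """Insert a line and return the evicted victim, or None."""
        if line in self.lines:
            self.touch(line)
            return None
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)  # evict the LRU line
        self.lines[line] = True
        return victim

    def invalidate(self, line):
        self.lines.pop(line, None)


class ToyHierarchy:
    """Toy L1i + L2; code prefetches are assumed to fill only the L2."""

    def __init__(self, l1_lines, l2_lines, inclusive):
        self.l1 = LRUCache(l1_lines)
        self.l2 = LRUCache(l2_lines)
        self.inclusive = inclusive
        self.l1_misses = 0

    def _fill_l2(self, line):
        victim = self.l2.insert(line)
        if victim is not None and self.inclusive:
            # Inclusion: a line leaving the L2 is back-invalidated from the L1.
            self.l1.invalidate(victim)

    def fetch(self, line):
        if line in self.l1:
            # L1 hit: the L2 never sees it, so the line keeps aging in the L2.
            self.l1.touch(line)
            return
        self.l1_misses += 1
        if line in self.l2:
            self.l2.touch(line)
        else:
            self._fill_l2(line)
        self.l1.insert(line)

    def prefetch_l2(self, line):
        self._fill_l2(line)


def run(iterations, prefetched_lines, inclusive=True):
    """Fetch hot lines A and B in a loop, prefetching other lines into the L2."""
    h = ToyHierarchy(l1_lines=2, l2_lines=3, inclusive=inclusive)
    for _ in range(iterations):
        h.fetch("A")
        h.fetch("B")
        for line in prefetched_lines:
            h.prefetch_l2(line)
    return h.l1_misses


print(run(10, []))                           # 2  (no prefetches)
print(run(10, ["X", "Y"]))                   # 20 (prefetches, inclusive L2)
print(run(10, ["X", "Y"], inclusive=False))  # 2  (prefetches, non-inclusive L2)
```

In this toy setup, removing the prefetches (roughly the NOP-replacement experiment) or making the L2 non-inclusive both keep L1 misses at the cold-start count, while inclusive back-invalidation turns every iteration into two L1 misses.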

rlavaee · November 8, 2025, 7:54am

Yes. The intended pass could work for CSPGO as well. An interface can be defined to match against block IDs, and the callsite index would be the same.

The full paths in LBR profiles are used to guide prefetchit placement, so doing this based solely on the compiler’s edge and block profiles is infeasible.

There will be separate efforts to open-source/upstream that part as well.
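One way to see why full paths matter: an edge profile cannot tell which block will run a few blocks after a candidate insertion point, which is what a prefetch placed well ahead of its target needs to know. The Python sketch below uses a made-up CFG and made-up counts (`profile_1`, `profile_2`, and the helper names are hypothetical); it is not Propeller’s placement algorithm, only an illustration that two different path profiles can collapse to identical edge counts.

```python
from collections import Counter


def edge_profile(path_profile):
    """Collapse a weighted path profile into per-edge execution counts."""
    edges = Counter()
    for path, count in path_profile.items():
        for src, dst in zip(path, path[1:]):
            edges[(src, dst)] += count
    return edges


def blocks_reached(path_profile, start, distance):
    """Count which block executes `distance` blocks after each visit to `start`."""
    reached = Counter()
    for path, count in path_profile.items():
        for i, block in enumerate(path):
            if block == start and i + distance < len(path):
                reached[path[i + distance]] += count
    return reached


# Two hypothetical path profiles over the same CFG: A and C both jump to B,
# and B branches to D or E.
profile_1 = {("A", "B", "D"): 50, ("C", "B", "E"): 50}
profile_2 = {("A", "B", "E"): 50, ("C", "B", "D"): 50}

print(edge_profile(profile_1) == edge_profile(profile_2))  # True
print(blocks_reached(profile_1, "A", 2))  # Counter({'D': 50})
print(blocks_reached(profile_2, "A", 2))  # Counter({'E': 50})
```

From the edge counts alone, a prefetch inserted in A could target either D or E with equal justification; the path profile resolves the ambiguity.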

nikic · December 17, 2025, 4:28pm

I’m trying to understand how this RFC relates to upstream LLVM. Propeller is an independent project. Does this RFC require changes only to Propeller, or also to upstream LLVM?

rlavaee · December 17, 2025, 6:15pm

The upstream changes needed are:

Generating symbols for prefetch targets: https://github.com/llvm/llvm-project/pull/168439
Inserting prefetch instructions at requested positions: "X86: Add prefetch insertion based on Propeller profile" (llvm/llvm-project pull request #166324)

Both of these need the mapping data from SHT_LLVM_BB_ADDR_MAP. We plan to extend the mapping capability to AFDO next year.
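As a rough illustration of what this mapping provides, the sketch below translates a sampled code address into a (function, basic block ID) pair, the kind of stable key that lets a profile name both an insertion point and a prefetch target. It assumes the section has already been decoded into absolute per-block address ranges; it does not parse the real SHT_LLVM_BB_ADDR_MAP encoding, and `BlockRange`, `BlockAddressMap`, and all addresses are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRange:
    function: str
    bb_id: int   # block ID recorded by the compiler (need not follow layout order)
    start: int   # absolute address of the block's first byte
    size: int    # block size in bytes


class BlockAddressMap:
    def __init__(self, blocks):
        self.blocks = sorted(blocks, key=lambda b: b.start)
        self.starts = [b.start for b in self.blocks]

    def lookup(self, address):
        """Return (function, bb_id) for the block containing address, or None."""
        i = bisect.bisect_right(self.starts, address) - 1
        if i < 0:
            return None
        block = self.blocks[i]
        if address < block.start + block.size:
            return (block.function, block.bb_id)
        return None  # address falls in a gap between blocks


bb_map = BlockAddressMap([
    BlockRange("foo", 0, 0x1000, 0x10),
    BlockRange("foo", 1, 0x1010, 0x08),
    BlockRange("foo", 3, 0x1018, 0x20),
    BlockRange("bar", 0, 0x2000, 0x30),
])

print(bb_map.lookup(0x1014))  # ('foo', 1)
print(bb_map.lookup(0x1030))  # ('foo', 3)
print(bb_map.lookup(0x1040))  # None
```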

tmsriram · December 17, 2025, 6:50pm

Further to what @rlavaee said, we are working on porting the Propeller profile conversion tool from GitHub into LLVM. @jinhuang1102 is working on a proposal for this.

nikic · December 18, 2025, 11:20am

Thanks for the references; that clarifies things. The original RFC also mentioned that linker changes may be needed. Is there a patch for that?

cc @MaskRay as this proposal seems to be doing some unusual things with symbols.

rlavaee · January 5, 2026, 5:34pm

Thanks for the reminder. Here is the linker change: "Resolve undefined prefetch targets to zero to effectively prefetch the next instruction" (llvm/llvm-project pull request #174448).

MaskRay · January 19, 2026, 5:14am

Thanks for notifying me.

Hardcoding symbol names for symbol resolution and relocation processing is definitely not right.

The PC-relative prefetchit1 uses an R_X86_64_PC32 relocation. When the referenced symbol is undefined, the linker reports an error in -shared and -pie links.

Instead, handle this in the compiler. Before emitting the prefetch, if the target symbol isn’t defined in the current module, emit a weak fallback:

prefetchit1 __llvm_prefetch_target_foo(%rip)
.weak __llvm_prefetch_target_foo
__llvm_prefetch_target_foo:

When __llvm_prefetch_target_foo is defined elsewhere, emit that definition as STB_GLOBAL; the strong definition overrides any weak ones.
This way, stale profiles gracefully degrade to prefetching the next instruction without requiring linker changes.

If you need semantics beyond what weak definitions provide, the path forward would be proposing a new relocation type on the x86-64 ABI list.
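The arithmetic behind the weak fallback can be checked with a small Python model. It assumes the usual encoding of a RIP-relative operand with no trailing immediate, where the assembler emits R_X86_64_PC32 with addend -4 against the 4-byte displacement field at place P, so the CPU’s target (next instruction address plus displacement) works out to the symbol address S. The resolution rule is a simplification of ELF static linking (one STB_GLOBAL definition overrides STB_WEAK ones; among weak definitions the first is kept) and ignores dynamic symbol preemption and visibility. All addresses, object file names, and helper names are hypothetical.

```python
from dataclasses import dataclass

PC32_ADDEND = -4  # assumed addend for a trailing RIP-relative disp32


@dataclass(frozen=True)
class Definition:
    object_file: str
    binding: str  # "STB_GLOBAL" or "STB_WEAK"
    address: int


def resolve(definitions):
    """Simplified static-link rule: a global definition overrides weak ones."""
    strong = [d for d in definitions if d.binding == "STB_GLOBAL"]
    if len(strong) > 1:
        raise ValueError("duplicate symbol definition")
    if strong:
        return strong[0]
    if definitions:
        return definitions[0]  # only weak definitions: keep the first one seen
    raise ValueError("undefined symbol")


def pc32(symbol_address, addend, place):
    """R_X86_64_PC32 computes S + A - P and must fit in a signed 32-bit field."""
    value = symbol_address + addend - place
    if not -(1 << 31) <= value < (1 << 31):
        raise OverflowError("R_X86_64_PC32 relocation out of range")
    return value


def prefetch_target(definitions, disp_place):
    """Return (displacement, address the prefetch actually targets)."""
    symbol = resolve(definitions)
    disp = pc32(symbol.address, PC32_ADDEND, disp_place)
    next_ip = disp_place + 4  # disp32 is assumed to be the instruction's last field
    return disp, next_ip + disp


# Hypothetical layout: prefetchit1 in a.o with its disp32 at 0x401003, so the
# next instruction (and the weak fallback label) is at 0x401007.
weak_only = [Definition("a.o", "STB_WEAK", 0x401007)]
with_strong = weak_only + [Definition("b.o", "STB_GLOBAL", 0x402340)]

disp, target = prefetch_target(weak_only, 0x401003)
print(disp, hex(target))  # 0 0x401007
disp, target = prefetch_target(with_strong, 0x401003)
print(disp, hex(target))  # 4921 0x402340
```

With only the weak fallback, whose label sits immediately after the prefetch, the displacement is 0 and the prefetch targets the next instruction; once a strong definition exists in another object, it wins and the same instruction prefetches that address instead.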

rlavaee · January 23, 2026, 6:42pm

Thanks @MaskRay for the great idea. I will implement this soon.
