We are considering creating an optimization pass to improve memory throughput using the write-allocate evasion technique.
Background
Write-allocate evasion is a technique in which a store that overwrites an entire cache line skips reading that line from memory first. A write-allocate cache would otherwise perform that read on a store miss. [1] demonstrates that Grace does this automatically, while Sapphire Rapids and Genoa achieve it when non-temporal store instructions are used.
It would therefore be useful to have a feature that recognizes store instructions to which write-allocate evasion can be applied and attaches non-temporal (!nontemporal) metadata to them.
The Intel compiler offers an equivalent feature through the -qopt-streaming-stores option.
Design
Candidates for write-allocate evasion are stores inside loops that write to contiguous addresses of an array that does not alias any other memory accessed in the loop.
The ideal performance improvement rate can be estimated as follows. Let $N$ be the number of streams of the target stores, $M$ the number of streams of other stores, and $L$ the number of streams of loads.

Without write-allocate evasion, every store stream is first read into the cache and later written back, so the memory traffic is $2(N+M)+L$. With write-allocate evasion applied to the target stores, the traffic is $N+2M+L$. Assuming the loop is limited by memory bandwidth, the ideal performance improvement rate is therefore

$$
\frac{2(N+M)+L}{N+2M+L}
$$

Non-temporal metadata should be added when this value exceeds a threshold.
Write-allocate evasion can also degrade performance, for example because of the inability to prefetch. We therefore consider it necessary to make this feature opt-in, enabled via a command-line option or a pragma.
Alternatives
The following interfaces already exist for generating non-temporal stores:
- The OpenMP directive #pragma omp simd nontemporal()
- The Clang builtin __builtin_nontemporal_store()
Both require users to mark the target stores explicitly, which demands detailed knowledge of write-allocate evasion. We therefore believe there is value in a feature that automatically detects stores to which write-allocate evasion can be applied.
In addition, the nontemporal clause currently does not actually generate non-temporal stores (see llvm/llvm-project issue #55757, "nontemporal instructions not generated for #pragma omp simd nontemporal").
References
We would appreciate any comments or feedback. Thank you.