I'm hailing from the Rust community, where there is a discussion about adding facilities for aligning data on an L1 cache line boundary. One situation where this is useful is building thread synchronization primitives, where avoiding false sharing can be a critical concern.
Now, when it comes to implementation, I have this gut feeling that we probably do not need to hardcode every target's cache line size in rustc ourselves, because there is probably a way to get this information directly from the LLVM toolchain that we are using. Is my gut right on this one? And can you provide us with some details on how it should be done?
Thanks in advance,
There’s no way to know, until you run on real hardware. It could be different every time the binary is run. You have to ask the OS or hardware, and that’s system dependent.
The cache line size can even change in the middle of the program running, for example if your program is moved between a “big” and “LITTLE” core on ARM. In this case the OS is supposed to lie to you and tell you the smallest of the cache line sizes (but that can only work if cache line operations are non-destructive! No “zero cache line” or “throw away local changes in cache line” like on PowerPC). It also means that you might not place things far enough apart to be on different cache lines on the bigger core, and so not achieve the optimal result you wanted. It’s a mess!
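To make "ask the OS" concrete, here is a minimal Rust sketch for Linux only. It assumes the kernel exposes the sysfs file `/sys/devices/system/cpu/cpuN/cache/index0/coherency_line_size`, and that `index0` is the L1 data cache, which is common but not guaranteed. The function names are made up for illustration. Because the goal is spacing data apart, the sketch takes the largest value seen across CPUs rather than the smallest.

```rust
use std::fs;

/// Parses the contents of a sysfs `coherency_line_size` file, e.g. "64\n".
fn parse_line_size(text: &str) -> Option<usize> {
    let n: usize = text.trim().parse().ok()?;
    // A plausible line size is a nonzero power of two; reject anything else.
    if n.is_power_of_two() {
        Some(n)
    } else {
        None
    }
}

/// Line size that Linux reports for cache `index0` of one CPU, if readable.
/// Assumption: `index0` is the L1 data cache on this machine.
fn sysfs_line_size(cpu: usize) -> Option<usize> {
    let path = format!(
        "/sys/devices/system/cpu/cpu{cpu}/cache/index0/coherency_line_size"
    );
    parse_line_size(&fs::read_to_string(path).ok()?)
}

/// Largest line size reported for CPUs `0..cpu_count`.
/// Returns `None` when no CPU in that range exposes the sysfs file
/// (non-Linux systems, restricted containers, or `cpu_count == 0`).
fn largest_line_size(cpu_count: usize) -> Option<usize> {
    (0..cpu_count).filter_map(sysfs_line_size).max()
}
```

A caller would still need a fallback for the `None` case, and the value is only known at run time, so it cannot drive a compile-time alignment attribute.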
I guess that in this case, what I would like to know is a reasonable upper bound of the cache line size on the target architecture. Something that I can align my data structures on at compile time so as to minimize the odds of false sharing. Think std::hardware_destructive_interference_size in C++17.
PowerPC G5 (970) and all recent IBM Power have 128-byte cache lines. I believe Itanium is also 128.
Intel has stuck with 64 recently with x86, at least at L1. I believe multiple adjacent lines may be combined into a “block” (with a single tag) at L2 or higher in some of them.
ARM can be 32 or 64.
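For the compile-time use case, a rough Rust sketch of a padded wrapper built from the numbers above might look like the following. The per-architecture values are assumptions taken from this thread, not measurements: 128 for `powerpc64`, 64 for everything else, plus 128 for `aarch64` as a conservative guess, since some 64-bit ARM parts use lines larger than 64 bytes. `CachePadded`, `Counters`, `ASSUMED_LINE` and `field_distance` are illustrative names, not an existing API.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicUsize;

/// Wrapper that aligns (and therefore pads) its contents to an assumed
/// upper bound on the cache line size of the target architecture.
/// `repr(align)` needs a literal, so the choice is made with `cfg_attr`.
#[cfg_attr(
    any(target_arch = "powerpc64", target_arch = "aarch64"),
    repr(align(128))
)]
#[cfg_attr(
    not(any(target_arch = "powerpc64", target_arch = "aarch64")),
    repr(align(64))
)]
#[derive(Default)]
struct CachePadded<T> {
    value: T,
}

/// The alignment actually selected for this target (64 or 128 here).
const ASSUMED_LINE: usize = align_of::<CachePadded<u8>>();

// Compile-time check: a padded byte occupies exactly one assumed line.
const _: () = assert!(size_of::<CachePadded<u8>>() == ASSUMED_LINE);

/// Two counters meant to be written by different threads; each one sits
/// at the start of its own assumed cache line.
#[derive(Default)]
struct Counters {
    produced: CachePadded<AtomicUsize>,
    consumed: CachePadded<AtomicUsize>,
}

/// Distance in bytes between the two counters, whatever their field order.
fn field_distance(c: &Counters) -> usize {
    let a = &c.produced as *const CachePadded<AtomicUsize> as usize;
    let b = &c.consumed as *const CachePadded<AtomicUsize> as usize;
    a.abs_diff(b)
}
```

This only lowers the odds of false sharing. If the real line is larger than the assumed bound, two fields can still share a line, and if it is smaller, the extra padding is wasted memory.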
Thank you! Is this information available programmatically through some LLVM API, so that next time some hardware manufacturer does some crazy experiment, my code can be automatically compatible with it as soon as LLVM is?
Cavium ThunderX has 128-byte cache lines.
Yes, using TargetTransformInfo, you can call TTI->getCacheLineSize(). Not all targets provide this information, however, and as Bruce pointed out, there are environments where this does not make sense (caveat emptor). -Hal
Well, thank you all for this information!