[PTX Backend] Current optimizations for Memory Load Instructions?


The PTX ISA offers load instructions with different cache operators (ld.ca, ld.cg, ld.cs, etc.) that control how the loaded data is cached at the different levels of the memory hierarchy. For example, if some data is not going to be used again, it makes sense not to keep it in cache.
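
To make the question concrete, here is a minimal sketch of how a programmer can pick the cache operator by hand today. It assumes the toolchain provides CUDA's `__ldcs` and `__ldcg` load intrinsics, which are expected to lower to ld.global.cs and ld.global.cg respectively. The kernel names and the `run_copy_demo` helper are hypothetical and exist only for illustration; this shows manual selection, not anything the backend does automatically.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each input element is read exactly once, so a streaming load
// (expected to lower to ld.global.cs) hints that the line is unlikely to be reused.
__global__ void copy_streaming(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __ldcs(in + i);
}

// Same copy, but with a load cached in L2 only
// (expected to lower to ld.global.cg), bypassing L1.
__global__ void copy_l2_only(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = __ldcg(in + i);
}

// Hypothetical host helper: runs both kernels and checks that the data
// arrives unchanged. The cache operator only affects caching, not the values.
bool run_copy_demo(int n) {
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    bool ok = cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // First pass: streaming loads.
    copy_streaming<<<blocks, threads>>>(d_in, d_out, n);
    ok = ok && cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == h_in[i]);

    // Second pass: L2-only loads.
    copy_l2_only<<<blocks, threads>>>(d_in, d_out, n);
    ok = ok && cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    for (int i = 0; ok && i < n; ++i) ok = (h_out[i] == h_in[i]);

    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

Calling `run_copy_demo(1024)` from host code should return true on a machine with a CUDA-capable GPU. Inspecting the generated PTX is the way to confirm which load variant was actually emitted.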

However, it is hard to determine at compile time which instruction is best in each case. I would like to know whether LLVM (the backend for PTX) currently performs any kind of analysis to decide which instruction should be used, or otherwise what criteria are followed to select among these instructions.

Best regards,