Hey guys!
I have been diving into LLVM IR for GPU targets, aiming to port some compute-heavy algorithms to the GPU. I have made some headway, but optimizing memory access and taking full advantage of GPU-specific optimizations are proving difficult.
I am hoping the awesome LLVM community can lend a hand:
- Does anyone have best practices for optimizing LLVM IR to get the most performance out of GPUs?
- Specifically, I am struggling with optimizing data transfers between global memory and device registers (a sketch of the kind of pattern I mean is below this list). Any tips or tricks?
- Are there specific LLVM IR constructs or optimization passes that are especially effective for GPU performance?
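
To make the second bullet concrete, here is an untested sketch of the kind of IR pattern I have in mind. The `nvptx64-nvidia-cuda` triple and the `@copy4` function are just placeholders I made up for illustration (the same question applies to AMDGPU, which also uses `addrspace(1)` for global memory):

```llvm
; Four adjacent scalar loads/stores on global memory (addrspace(1)) that I
; would hope get merged into a single vector load/store, e.g. by the
; LoadStoreVectorizer, so the kernel does fewer, wider memory transactions.
target triple = "nvptx64-nvidia-cuda"

define void @copy4(ptr addrspace(1) %in, ptr addrspace(1) %out) {
entry:
  %in1  = getelementptr inbounds float, ptr addrspace(1) %in, i64 1
  %in2  = getelementptr inbounds float, ptr addrspace(1) %in, i64 2
  %in3  = getelementptr inbounds float, ptr addrspace(1) %in, i64 3
  %v0 = load float, ptr addrspace(1) %in,  align 16
  %v1 = load float, ptr addrspace(1) %in1, align 4
  %v2 = load float, ptr addrspace(1) %in2, align 8
  %v3 = load float, ptr addrspace(1) %in3, align 4
  %out1 = getelementptr inbounds float, ptr addrspace(1) %out, i64 1
  %out2 = getelementptr inbounds float, ptr addrspace(1) %out, i64 2
  %out3 = getelementptr inbounds float, ptr addrspace(1) %out, i64 3
  store float %v0, ptr addrspace(1) %out,  align 16
  store float %v1, ptr addrspace(1) %out1, align 4
  store float %v2, ptr addrspace(1) %out2, align 8
  store float %v3, ptr addrspace(1) %out3, align 4
  ret void
}
```

Is running something like `opt -passes=load-store-vectorizer -S` on IR like this (with a build that includes the GPU backend) the right lever for the global-memory/register transfer question, or are there other passes I should be looking at?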
I have also checked this thread: https://discourse.llvm.org/t/how-to-get-the-passed-argument-of-every-gpu-kernel-callinst-in-llvm-ir-producedbyhipalteryx but I have not found a solution there. Could anyone offer guidance on this as well?
I am looking forward to any tips or resources you can share on GPU optimization with LLVM IR.
Thanks in advance