You mentioned at http://reviews.llvm.org/D9360 that the optimization pipeline set up in PassManagerBuilder has not worked well for GPUs in your experience. So I’d like to try out an alternative to PassManagerBuilder for CUDA. Do you have a suggestion for what I might try instead of PassManagerBuilder? If you happen to have a replacement for it that I could try, that would be great.
Without being an expert on the details of architectures targeted by CUDA, here are some high-level observations regarding pass pipelines for GPUs:
- The set of CFG optimizations you want is likely to be quite different. JumpThreading in particular is typically not desirable.
- Most modern GPUs are going to want specialized passes (scalarization, speculative execution) inserted at various points in the pipeline.
- A lot of GPUs are very sensitive to loop unrolling, because unrolling is often what eliminates dynamic accesses (accesses whose index is not known at compile time).
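
To make the point about unrolling and dynamic accesses a bit more concrete, here is a minimal sketch in plain C++. It is not taken from any real CUDA code: `sum_scaled` and `kN` are made-up names, and the comments describe what I would expect a GPU backend to do rather than anything I have measured.

```cpp
#include <cstddef>

// Hypothetical trip count, small enough to unroll fully.
constexpr std::size_t kN = 4;

// Hypothetical per-thread helper: scales kN inputs into a scratch
// array and then sums them.
float sum_scaled(const float* in, float scale) {
    float tmp[kN];

    // While this loop is rolled, tmp[i] is a dynamic access: the index
    // is only known at run time. GPU registers are generally not
    // indexable, so tmp would likely have to live in (slow) memory.
    for (std::size_t i = 0; i < kN; ++i) {
        tmp[i] = in[i] * scale;
    }

    // After full unrolling the indices become the constants 0..3, and
    // tmp can be split into four scalars that fit in registers.
    float acc = 0.0f;
    for (std::size_t i = 0; i < kN; ++i) {
        acc += tmp[i];
    }
    return acc;
}
```

On a CPU the rolled and unrolled forms cost about the same, which is part of why a pipeline tuned for CPUs can pick the wrong unrolling thresholds for a GPU.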
From there, your mileage will vary based on whether you’re doing online or offline compilation. I’m guessing the latter for CUDA.