Pragma for Loop Unrolling

We are considering supporting a loop-unrolling pragma in Clang for CUDA, and want to scope what needs to be done. I came across a 3-year-old post discussing this. Is there any follow-up work or discussion on this topic? For instance, does Clang translate per-loop pragmas to metadata? How stable is such metadata against LLVM's standard optimizations?



In my limited experience playing with some simple examples, the metadata is pretty stable; some work has recently been done in this area to clean up some of the loose ends (just the other day, a patch was committed to prevent loop rotation from breaking it).

The work to be done is pretty minimal -- you can take advantage of the infrastructure used to generate the metadata to support OpenMP loop pragmas. Only some of this is upstream (for the rest, see -- lib/CodeGen/CGLoopInfo.cpp in that repository is especially relevant).
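
To make the pragma-to-metadata question concrete, here is a minimal sketch of the idea. It assumes the `#pragma clang loop unroll_count(N)` spelling and the `llvm.loop.unroll.count` metadata name that later Clang/LLVM releases document; the spelling available at the time of this thread may differ. The function `scaled_sum` is a hypothetical example, not something from the repository mentioned above.

```cpp
#include <cstddef>

// Hypothetical example (not from the thread): scale each element and sum.
float scaled_sum(const float *data, std::size_t n, float scale) {
  float total = 0.0f;
  // Ask for an unroll factor of 4. This is only a hint: a compiler that does
  // not recognize the pragma ignores it (possibly with a warning), and the
  // computed result is the same either way.
#pragma clang loop unroll_count(4)
  for (std::size_t i = 0; i < n; ++i)
    total += data[i] * scale;
  return total;
}

// Roughly how the hint appears in LLVM IR (assumed modern form), attached to
// the loop's back-edge branch:
//   br i1 %cond, label %body, label %exit, !llvm.loop !0
//   !0 = distinct !{!0, !1}
//   !1 = !{!"llvm.loop.unroll.count", i32 4}
```

Because the hint lives on the back-edge branch, any pass that rewrites that branch has to carry the `!llvm.loop` attachment over, which is why transformations such as loop rotation are the ones to watch when checking stability.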