Question about unrolling loops with convergent instructions

I have a loop containing convergent instructions, with a trip count of 1024. I use a pragma to set the unroll count to 32. However, the loop is unrolled by 512 instead, which results in a very long compilation time.
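For concreteness, the loop has roughly the following shape. This is only a sketch: convergent_op and run_loop are placeholder names rather than the actual code, and the attribute spelling assumes Clang (other compilers may ignore both the attribute and the pragma with a warning).

```cpp
// Placeholder for a convergent operation such as a barrier; the body here
// just accumulates so that the sketch is runnable on a host compiler.
__attribute__((convergent)) void convergent_op(int *acc, int i) { *acc += i; }

int run_loop() {
  int acc = 0;
  // Trip count is the constant 1024; the pragma requests an unroll count of 32.
#pragma unroll 32
  for (int i = 0; i < 1024; ++i)
    convergent_op(&acc, i);
  return acc;
}
```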

In tryToUnrollLoop, there is:

  // If the loop contains a convergent operation, the prelude we'd add
  // to do the first few instructions before we hit the unrolled loop
  // is unsafe -- it adds a control-flow dependency to the convergent
  // operation. Therefore restrict remainder loop (try unrolling without).
  //
  // TODO: This is quite conservative. In practice, convergent_op()
  // is likely to be called unconditionally in the loop. In this
  // case, the program would be ill-formed (on most architectures)
  // unless n were the same on all threads in a thread group.
  // Assuming n is the same on all threads, any kind of unrolling is
  // safe. But currently llvm's notion of convergence isn't powerful
  // enough to express this.
  if (Convergent)
    UP.AllowRemainder = false;

Later, in computeUnrollCount, there is:

  // 2nd priority is unroll count set by pragma.
  unsigned PragmaCount = UnrollCountPragmaValue(L);
  if (PragmaCount > 0) {
    UP.Count = PragmaCount;
    UP.Runtime = true;
    UP.AllowExpensiveTripCount = true;
    UP.Force = true;
    if (UP.AllowRemainder &&
        getUnrolledLoopSize(LoopSize, UP) < PragmaUnrollThreshold)
      return true;
  }

Because UP.AllowRemainder is false, the early return is not taken, so the unroll count specified by the pragma is effectively ignored. Later on, computeUnrollCount arrives at an unroll count of 512.

Is this a bug? In effect, this disables the pragma-specified unroll count for any loop containing convergent operations, even when the unroll count divides the trip count and no remainder loop would be needed.
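To make the question concrete, here is a small standalone model of the pragma check and of one possible relaxation. It is a sketch under assumptions, not LLVM code: the function names are hypothetical, unrolledSize is a simplified stand-in for getUnrolledLoopSize, and the relaxed condition is only what I would have expected, not something the source currently does.

```cpp
#include <cstdint>

// Simplified size estimate (assumption): loop body size times unroll count.
constexpr std::uint64_t unrolledSize(unsigned LoopSize, unsigned Count) {
  return static_cast<std::uint64_t>(LoopSize) * Count;
}

// Models the check quoted above: the pragma count is accepted only if a
// remainder loop is allowed and the unrolled size is below the threshold.
constexpr bool honorsPragmaCount(bool AllowRemainder, unsigned PragmaCount,
                                 unsigned LoopSize, unsigned Threshold) {
  return PragmaCount > 0 && AllowRemainder &&
         unrolledSize(LoopSize, PragmaCount) < Threshold;
}

// Hypothetical relaxation: also accept the pragma count when it evenly
// divides a known trip count, since no remainder loop is needed then.
constexpr bool honorsPragmaCountRelaxed(bool AllowRemainder,
                                        unsigned PragmaCount,
                                        unsigned TripCount, unsigned LoopSize,
                                        unsigned Threshold) {
  return PragmaCount > 0 &&
         (AllowRemainder ||
          (TripCount != 0 && TripCount % PragmaCount == 0)) &&
         unrolledSize(LoopSize, PragmaCount) < Threshold;
}
```

With AllowRemainder set to false, a pragma count of 32 and a trip count of 1024, the first function rejects the pragma count while the relaxed one accepts it, provided the size estimate stays under the threshold. A trip count that 32 does not divide, such as 1000, is still rejected by both.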

Thanks.

Sam