Optimization for divisibility tests, and bitwise rotate

The pattern x % constant == 0 (where the constant is odd) can be optimized
more than it currently is, using some modular arithmetic trick, to:

    x * inv(constant) <=u MaxValue / constant

This patch sort of does that (hopefully, I did test it and it seemed to work,
and please give me feedback on it, it's only my first try at an llvm patch),
however I ran into a problem. This trick can in theory be easily generalized
to even `constant` (let's ignore powers of two though, they can be done better
the obvious way), but to do so a bitwise rotation is needed.
But when I build it from two shifts:

        Value *Mul = Builder->CreateMul(A, Builder->getInt(Multiplier));
        Value *RShift = Builder->CreateLShr(Mul, Builder->getInt32(Ctz));
        Value *LShift = Builder->CreateShl(Mul, Builder->getInt32(W - Ctz));
        Rotated = Builder->CreateOr(RShift, LShift);

Somewhere along the line, the left shift gets merged with the multiplication,
resulting in two multiplications (and some extra stuff), and no rotation.
Having two multiplications mostly defeats this optimization, though it's still
a bit less code than what would be generated currently.
It would be really nice if it could reliably generate something like this on x86:

    imul eax, edi, foo
    ror eax, bar
    cmp eax, baz
    ; use flags

How should this be handled?

(the attached patch is limited to odd divisors to avoid this problem)

Regards,
Harold

divisibilitypatch (1.66 KB)

Hi Harold -

I think it would be better to implement this as a DAG combine rather than an IR combine. That way, you can create an “ISD::ROTR” node directly. You can also bail out on the transform if a target doesn’t support ROTR by checking something like:
bool HasROTR = TLI.isOperationLegalOrCustom(ISD::ROTR, VT);

That’s from DAGCombiner::MatchRotate() -

http://reviews.llvm.org/diffusion/L/browse/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp;256564$3977