     Could you please tell me why LLVM (built from the SVN code on Apr 25, 2018) does not recognize the LLVM IR intrinsic ctpop (described at https://llvm.org/docs/LangRef.html#llvm-ctpop-intrinsic) in the following program:
       int PopCnt_Simple(int x) {
           int numBits = 0;
           int i;

           //for (i = 0; i < 32; i++) {
           for (i = 0; x != 0; i++) {
               if (x & 1)
                   numBits++;
               x >>= 1;
           }

           return numBits;
       }

     I also checked the following code, inspired by the discussion at http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20121119/156300.html:
       int popcount_Yang_i32(int a) {
             int c = 0;
             while (a) {
                 c++;
                 //... // both a & c would be used multiple times in or out of the loop
                 a &= a - 1;
             }

             return c;
       }

     Is there any good paper discussing this type of loop idiom recognition? I found only a vaguely related paper: "Automatic Recognition of Performance Idioms in Scientific Applications", IPDPS 2011 (http://www.sdsc.edu/~allans/ipdps11.pdf).

LLVM doesn’t have any support for transforming the first loop into ctpop. We only recognize the second form in your popcount_Yang_i32 function.
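To make the difference between the two loop shapes concrete, here is a minimal sketch that puts them side by side and counts iterations. The names popcount_shift and popcount_clear_lowest and the iters out-parameter are hypothetical, added only for illustration. The sketch uses uint32_t rather than int so that the right shift is well-defined for every input, which is a deviation from the functions quoted above.

```c
#include <stdint.h>

/* Form 1 (shape of PopCnt_Simple): test the low bit, then shift right.
   The loop runs once per bit position up to the highest set bit. */
uint32_t popcount_shift(uint32_t x, unsigned *iters) {
    uint32_t n = 0;
    *iters = 0;
    while (x != 0) {
        if (x & 1u)
            n++;
        x >>= 1;
        (*iters)++;
    }
    return n;
}

/* Form 2 (shape of popcount_Yang_i32): clear the lowest set bit.
   The loop runs once per set bit. */
uint32_t popcount_clear_lowest(uint32_t x, unsigned *iters) {
    uint32_t n = 0;
    *iters = 0;
    while (x != 0) {
        n++;
        x &= x - 1;  /* x is nonzero here, so x - 1 does not wrap */
        (*iters)++;
    }
    return n;
}
```

Both functions return the same count for any input. Only the trip count differs, so the two loops have different shapes in the IR even though they compute the same value.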

I don’t know of any papers myself; I’ve just spent some time looking at the popcount recognition code recently.


     I managed to make LLVM recognize ctpop. For LLVM to apply the loop idiom, the arguments to opt have to be chosen with care: -Os is needed, and above all the target must be an architecture that supports ctpop in hardware (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20121119/156300.html mentions x86, ARM, and PowerPC as having a population count instruction). For example:
       clang -O3 -mllvm -disable-llvm-optzns -S -emit-llvm test.c
       # -Os is required (-O3 doesn't work). -mcpu is also required, but a CPU other than corei7 (e.g. an ARM or PPC one) could be given instead.
       opt -Os -debug -S -force-vector-width=1 -mcpu=corei7 test.ll -o test_opt.ll -loop-idiom
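A possible test.c for the commands above might look like the sketch below. It is an assumption, not the file actually used in this thread. The names popcount_u32 and popcount_matches_builtin are hypothetical, the argument is unsigned rather than int to keep a - 1 free of signed overflow, and __builtin_popcount is a GCC/Clang builtin used only as a reference. Whether the pass fires on this exact file has not been verified here.

```c
/* test.c: hypothetical input for the clang/opt commands above. */

/* Clear-lowest-set-bit loop, the form the reply says is recognized. */
unsigned popcount_u32(unsigned a) {
    unsigned c = 0;
    while (a) {
        c++;
        a &= a - 1;  /* drops the lowest set bit of a */
    }
    return c;
}

/* Returns 1 when the loop agrees with the compiler builtin for v. */
int popcount_matches_builtin(unsigned v) {
    return popcount_u32(v) == (unsigned)__builtin_popcount(v);
}
```

After running opt, searching test_opt.ll for llvm.ctpop shows whether the loop was replaced by the intrinsic.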

