Does anyone know a workaround for this branching bug I hit?

I believe you can all access the test file and the diff in my bug report.

In short, I’m experimenting with optimization and wrote branchless code, but at -O2 the compiler generates a jump instruction, which makes my loop twice as slow. When I changed the jne+movl to cmove by hand, the loop seemed to take half as long.

I’m a little worried that this will interfere with other optimizations I make in the same function (the bug report contains a small reproducer). Is there a temporary workaround I can use? I’m thinking no, and that I’ll have to suck it up, but I’ve seen stranger things so I thought I’d ask. I guess for now I’ll experiment with the slow jumps left in and change them by hand once I’m done optimizing that function.

You can try “-mllvm -x86-cmov-converter=false”. That should keep it as a cmov, but might give you performance issues elsewhere.
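
If it helps, here is a rough sketch of how you could check the effect of the flag. This is not your test file from the bug report. The function min_index and the file name select.c are made up for illustration, and I have not verified that this exact loop triggers the cmov-to-branch conversion on your compiler version, so treat it as a template for comparing the two assembly outputs.

```c
/* Hypothetical sketch, NOT the reproducer from the bug report.
 * Assumed build commands (the file name select.c is made up):
 *   clang -O2 -S select.c -o default.s
 *   clang -O2 -mllvm -x86-cmov-converter=false -S select.c -o cmov.s
 * Then compare the loop bodies in default.s and cmov.s and look for
 * jne+movl in one versus cmov in the other.
 */
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the smallest element of v[0..n-1].
 * On ties the earliest index wins. Returns 0 when n is 0 or 1. */
size_t min_index(const uint32_t *v, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        /* Written as a select, which the author would like to see
         * lowered to a conditional move rather than a branch. */
        best = (v[i] < v[best]) ? i : best;
    }
    return best;
}
```

The flag applies to the whole translation unit, so if only one function needs it, moving that function into its own source file and passing the flag just for that file should limit the impact on the rest of your code.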