How to eliminate the unnecessary subtraction in a loop

Hello All,

I have recently been comparing the code quality of riscv-llvm against riscv-gcc, and there are several cases where LLVM generates inferior code that is already visible in the IR. Perhaps these issues can be solved simply by enabling a specific compiler flag.

One such case is shown below, modified from embench-iot/aha-mont64:

#include <stdint.h>

uint64_t
modul64 (uint64_t x, uint64_t y, uint64_t z)
{

  int64_t i;

  for (i = 1; i <= 64; i++)
    {				// Do 64 times.
      y = y << 1;		// Shift left one bit.
      if (y >= z)
	{
	  x = x - z;
	}
    }
  return x;
}

The programmer intends to subtract z from x when the condition y >= z holds, and to skip the subtraction when it does not. However, the assembly that LLVM generates at -O2 is as follows:

modul64:                                # @modul64
        addi    a3, zero, 64
        j       .LBB0_2
.LBB0_1:                                #   in Loop: Header=BB0_2 Depth=1
        addi    a3, a3, -1
        sub     a0, a0, a4
        beqz    a3, .LBB0_4
.LBB0_2:                                # =>This Inner Loop Header: Depth=1
        slli    a1, a1, 1
        mv      a4, zero
        bltu    a1, a2, .LBB0_1
        mv      a4, a2
        j       .LBB0_1
.LBB0_4:
        ret

The subtrahend (a4) is set to zero when the if-condition does not hold, so the subtraction always executes. I believe this is not the desired behavior.

What is the purpose of this optimization, and how can I eliminate this behavior?
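In C terms, the -O2 output above corresponds to selecting the subtrahend (z or 0) and then subtracting unconditionally. The sketch below puts the loop as written next to that rewritten shape so the two can be compared; the names modul64_branchy and modul64_select are hypothetical, and the second function is only an approximation of what the compiler produced, not LLVM's actual IR.

```c
#include <stdint.h>

/* Hypothetical name: the loop as written, with a real branch
   around the subtraction. */
uint64_t modul64_branchy(uint64_t x, uint64_t y, uint64_t z)
{
  for (int i = 1; i <= 64; i++) {
    y <<= 1;              /* shift left one bit */
    if (y >= z)
      x -= z;             /* subtract only when the condition holds */
  }
  return x;
}

/* Hypothetical name: approximates the shape of the -O2 output.
   The subtrahend is chosen first (z or 0, like "mv a4, zero" /
   "mv a4, a2"), then the subtraction runs on every iteration. */
uint64_t modul64_select(uint64_t x, uint64_t y, uint64_t z)
{
  for (int i = 1; i <= 64; i++) {
    y <<= 1;
    uint64_t sub = (y >= z) ? z : 0;  /* the "select" */
    x -= sub;                         /* always executed */
  }
  return x;
}
```

Both functions return the same value for every input, so the transformation is semantically valid; the complaint is purely about the extra work it causes on a target without a conditional move.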

Best Regards
zhaozhaozhao

In general, the purpose of this kind of optimization is to replace control flow with straight-line code that uses a conditional-move instruction. In this case, since the base RISC-V ISA doesn't have a conditional-move instruction, it didn't really work out as intended: the backend emitted a branch to implement the "conditional move" anyway.

You didn't say which version of Clang you're using, but while I do see this output with the Clang 13 release, it appears to have already been fixed at head. Here's the new output:
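To illustrate what "straight-line execution" can look like without a conditional-move instruction, the selection can also be done arithmetically: a comparison yields 0 or 1, and negating that as an unsigned value gives an all-zeros or all-ones mask. The sketch below uses a hypothetical name, modul64_mask, and is only an illustration of the idea; it is not necessarily what any LLVM version emits, and whether it beats a well-predicted branch depends on the target.

```c
#include <stdint.h>

/* Hypothetical name: branch-free form of the loop body that
   needs no conditional-move instruction. */
uint64_t modul64_mask(uint64_t x, uint64_t y, uint64_t z)
{
  for (int i = 1; i <= 64; i++) {
    y <<= 1;                              /* shift left one bit */
    uint64_t mask = -(uint64_t)(y >= z);  /* 0 or all ones */
    x -= z & mask;                        /* subtracts z or 0 */
  }
  return x;
}
```

Everything here is ordinary ALU work (compare, negate, AND, subtract), so no branch is needed for the selection itself.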

modul64:                                # @modul64
# %bb.0:                                # %entry
	li	a3, 64
	j	.LBB0_2
.LBB0_1:                                # %for.body
                                        #   in Loop: Header=BB0_2 Depth=1
	addi	a3, a3, -1
	beqz	a3, .LBB0_4
.LBB0_2:                                # %for.body
                                        # =>This Inner Loop Header: Depth=1
	slli	a1, a1, 1
	bltu	a1, a2, .LBB0_1
# %bb.3:                                # %for.body
                                        #   in Loop: Header=BB0_2 Depth=1
	sub	a0, a0, a2
	j	.LBB0_1
.LBB0_4:                                # %for.end
	ret

Thanks jyknight,

I'm using Clang 13.0.1. Good to know that the issue has been fixed; my next step will be to update Clang and read the relevant patches.

Do you happen to know which patch fixed this issue?

My guess would be https://github.com/llvm/llvm-project/commit/af57a71d1871ec4a108ca1b4478114770b6588bd