Missed optimization due to overflow check

kevina · October 26, 2025, 5:07am

Hi,

The function reciprocal below fails to optimize as expected. In particular is still includes the check if num is 1 or -1 and then a call to the reduce_in_place method, when that should be optimized out. GCC optimization this code correctly. Godbolt link: Compiler Explorer

#include <cstdlib>
#include <limits>

class Frac {
protected:
  int num, den;
public:
  Frac(int n, int d) {
    if (d < 0) {
      if (d == 0) abort();
      if (n == std::numeric_limits<int>::lowest()) abort();
      num = -n;
      if (d == std::numeric_limits<int>::lowest()) abort();
      den = -d;
    } else {
      num = n;
      den = d;
    }
  }
};

class Reduced : public Frac {
  void reduce_in_place();
public:
  Reduced(int n, int d) : Frac(n,d) {
    if (den == 1 || num == 1 || num == -1) return;
    reduce_in_place();
  }
};

Reduced int_value(int x) {
  return Reduced(x,1);
}

Reduced reciprocal(int x) {
  return Reduced(1,x);
}

LLVM IR: (from godbolt)

define dso_local i64 @reciprocal(int)(i32 noundef %x) local_unnamed_addr {
entry:
  %retval = alloca %class.Reduced, align 8
  %cmp.i.i = icmp slt i32 %x, 0
  br i1 %cmp.i.i, label %if.end.i.i, label %if.else.i.i

if.end.i.i:
  store i32 -1, ptr %retval, align 8
  %cmp8.i.i = icmp eq i32 %x, -2147483648
  br i1 %cmp8.i.i, label %if.then9.i.i, label %if.end10.i.i

if.then9.i.i:
  tail call void @abort() #4
  unreachable

if.end10.i.i:
  %sub11.i.i = sub nsw i32 0, %x
  br label %_ZN4FracC2Eii.exit.i

if.else.i.i:
  store i32 1, ptr %retval, align 8
  br label %_ZN4FracC2Eii.exit.i

_ZN4FracC2Eii.exit.i:
  %0 = phi i32 [ 1, %if.else.i.i ], [ -1, %if.end10.i.i ]
  %d.sink.i.i = phi i32 [ %x, %if.else.i.i ], [ %sub11.i.i, %if.end10.i.i ]
  %den13.i.i = getelementptr inbounds nuw i8, ptr %retval, i64 4
  store i32 %d.sink.i.i, ptr %den13.i.i, align 4
  %cmp.i = icmp eq i32 %d.sink.i.i, 1
  br i1 %cmp.i, label %_ZN7ReducedC2Eii.exit, label %lor.lhs.false.i

lor.lhs.false.i:
  switch i32 %0, label %if.end.i [
    i32 1, label %_ZN7ReducedC2Eii.exit
    i32 -1, label %_ZN7ReducedC2Eii.exit
  ]

if.end.i:
  call void @Reduced::reduce_in_place()(ptr noundef nonnull align 4 dereferenceable(8) %retval)
  br label %_ZN7ReducedC2Eii.exit

_ZN7ReducedC2Eii.exit:
  %1 = load i64, ptr %retval, align 8
  ret i64 %1
}

I am only vaguely familiar with clangs IR, but it seams obvious from %0 = phi i32 [ 1, %if.else.i.i ], [ -1, %if.end10.i.i ] that %0 can only be 1 or -1, and yet the check is still included. If I remove the overflow checks, reciprocal optimizes as expected. Also if I instead test if n (the parameter to the constructor) is 1 or -1 instead of num the function optimized as expected.

Is this a known issue?

Thanks,
Kevin

dtcxzyw · October 26, 2025, 7:15am

We remove the dead edges of switches in SCCP/CVP with the range information of the condition.

github.com/llvm/llvm-project

llvm/lib/Transforms/Utils/SCCPSolver.cpp

792c65c39


      
          if (auto *SI = dyn_cast<SwitchInst>(&TI)) {
            if (!SI->getNumCases()) {
              Succs[0] = true;
              return;
            }
            const ValueLatticeElement &SCValue = getValueState(SI->getCondition());
            if (ConstantInt *CI =
                    getConstantInt(SCValue, SI->getCondition()->getType())) {
              Succs[SI->findCaseValue(CI)->getSuccessorIndex()] = true;
              return;
            }
          
            // TODO: Switch on undef is UB. Stop passing false once the rest of LLVM
            // is ready.
            if (SCValue.isConstantRange(/*UndefAllowed=*/false)) {
              const ConstantRange &Range = SCValue.getConstantRange();
              unsigned ReachableCaseCount = 0;
              for (const auto &Case : SI->cases()) {
                const APInt &CaseValue = Case.getCaseValue()->getValue();
                if (Range.contains(CaseValue)) {

This file has been truncated. show original

github.com/llvm/llvm-project

llvm/lib/Transforms/Scalar/CorrelatedValuePropagation.cpp

792c65c39


      
          { // Scope for SwitchInstProfUpdateWrapper. It must not live during
            // ConstantFoldTerminator() as the underlying SwitchInst can be changed.
            SwitchInstProfUpdateWrapper SI(*I);
            ConstantRange CR =
                LVI->getConstantRangeAtUse(I->getOperandUse(0), /*UndefAllowed=*/false);
            unsigned ReachableCaseCount = 0;
          
            for (auto CI = SI->case_begin(), CE = SI->case_end(); CI != CE;) {
              ConstantInt *Case = CI->getCaseValue();
              std::optional<bool> Predicate = std::nullopt;
              if (!CR.contains(Case->getValue()))
                Predicate = false;

However, ConstantRange only represents a contiguous range. It means that %0 is treated as [-1, 1]. So LLVM fails to remove the dead edge.

I think adding a new value lattice kind that represents the union of several non-contiguous values would be helpful. This case should be common as there are many selects/phis with constant operands.

Can you file an issue to track this missed optimization? Thanks!

kevina · October 26, 2025, 11:02pm

I just did Missed optimization: failure to remove dead edge · Issue #165179 · llvm/llvm-project · GitHub.

Feel free to adjust the subject line to something more descriptive.

Topic		Replies	Views
Should llvm optimize 1.0 / x ? LLVM Dev List Archives	4	93	September 1, 2020
A simple case about SDiv LLVM Dev List Archives	5	148	September 6, 2008
`llvm.$op.with.overflow`, InstCombine and ScalarEvolution LLVM Dev List Archives	8	157	March 27, 2015
14 missing floating point folds IR & Optimizations	12	1829	March 6, 2024
RFC: optimizing integer overflow checks LLVM Dev List Archives	1	99	August 25, 2012

Missed optimization due to overflow check

Related topics