Missed (Seemingly) Simple Loop Optimization

The following is a pared-down version of real production code that illustrates the problem.

int foo_optimized(int num)
{
  const auto arr = new int[num];
  int        ret = 0;

  for (int i = 0; i < num; ++i) {
    arr[i] = i;
    ret += arr[i];
  }
  delete[] arr;
  return ret;
}

int foo_missed(int num)
{
  const auto arr = new int[num];
  int        ret = 0;

  for (int i = 0; i < num; ++i) arr[i] = i;
  for (int i = 0; i < num; ++i) ret += arr[i];
  delete[] arr;
  return ret;
}

foo_optimized() and foo_missed() perform the same work, yet for foo_missed() clang fails to deduce that the allocation is pointless and to remove it.
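For reference, here is a sketch of what one would hope both functions reduce to once the array is recognized as dead. Each arr[i] is written and then read back unchanged, so the array only forwards i from the store to the sum. The names foo_ideal() and foo_closed_form() are hypothetical and not from the original code. The closed form is the identity 0 + 1 + ... + (num - 1) = num * (num - 1) / 2, computed in 64-bit here so the intermediate product does not overflow.

```cpp
#include <cassert>

// Hypothetical: foo_missed() with the dead array removed. The loop body is
// just the value that used to round-trip through arr[i].
int foo_ideal(int num) {
  int ret = 0;
  for (int i = 0; i < num; ++i) ret += i;
  return ret;
}

// Hypothetical: the same sum without a loop. Valid only for inputs whose
// result fits in an int; num <= 0 means the loop never runs, so the sum is 0.
int foo_closed_form(int num) {
  if (num <= 0) return 0;
  const long long n = num;
  return static_cast<int>(n * (n - 1) / 2);
}
```

Both return 10 for num = 5 (0 + 1 + 2 + 3 + 4) and 0 for any num <= 0.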

In fact, the optimizer seems to give up on even basic optimizations as soon as the work is split across separate loops. If we change the functions to

int foo_optimized(int num)
{
  const auto arr = new int[num];
  int        ret = 0;

  for (int i = 0; i < num; ++i) arr[i] = 0;
  for (int i = 0; i < num; ++i) {
    arr[i] = i;
    ret += arr[i];
  }
  delete[] arr;
  return ret;
}

int foo_missed(int num)
{
  const auto arr = new int[num];
  int        ret = 0;

  for (int i = 0; i < num; ++i) arr[i] = 0;
  for (int i = 0; i < num; ++i) arr[i] = i;
  for (int i = 0; i < num; ++i) ret += arr[i];
  delete[] arr;
  return ret;
}

clang fails to delete the for (int i = 0; i < num; ++i) arr[i] = 0; loop in foo_missed(), even though it evidently understands the pattern well enough to replace it with a call to memset()! Note that it correctly removes the loop in foo_optimized().
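To make the memset() point concrete, the sketch below approximates, in C++, what foo_missed() looks like after the zeroing loop has been turned into a memset() call. This is an illustration under that assumption, not actual compiler output, and foo_missed_after_idiom() is a hypothetical name. The memset() is a dead store: the next loop overwrites every element before any element is read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical: foo_missed() with its zeroing loop replaced by memset().
int foo_missed_after_idiom(int num) {
  const auto arr = new int[num];
  int ret = 0;
  if (num > 0)  // dead store: every element is overwritten by the loop below
    std::memset(arr, 0, static_cast<std::size_t>(num) * sizeof(int));
  for (int i = 0; i < num; ++i) arr[i] = i;     // overwrites all num elements
  for (int i = 0; i < num; ++i) ret += arr[i];  // first and only reads
  delete[] arr;
  return ret;
}
```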

I was surprised that such a seemingly simple transformation was enough to defeat the optimizer.
Is there a compiler flag that users can enable to allow clang to optimize both forms equally, even at the cost of additional compile time?

FTR, this is about llvm/llvm-project issue #62845 (Missed Simple Loop Optimization) on GitHub, right?

At the moment, I don’t think the optimizer can handle this. To optimize this, we could do one of the following:

FTR, this is about llvm/llvm-project issue #62845 (Missed Simple Loop Optimization) on GitHub, right?

Yes. Apologies for the double posting. I figured the issue would get lost among all the other GitHub issues (as some of my previous ones have), and that posting here would get a (faster) response :).

There’s a loop-fusion pass, but it’s not added to the pipeline.
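As an illustration of what loop fusion would buy here, this is the three-loop foo_missed() fused by hand into a single loop. Once fused, the store of 0 is overwritten in the same iteration before any read, and the load of arr[i] can be forwarded from the store of i, leaving a loop of the same shape as foo_optimized(). foo_missed_fused() is a hypothetical name, and this sketch does not show whether LLVM's loop-fusion pass would actually fuse these particular loops.

```cpp
#include <cassert>

// Hypothetical: the three loops of foo_missed() merged into one by hand.
int foo_missed_fused(int num) {
  const auto arr = new int[num];
  int ret = 0;
  for (int i = 0; i < num; ++i) {
    arr[i] = 0;     // dead: overwritten on the next line before any read
    arr[i] = i;
    ret += arr[i];  // this load can be forwarded from the store of i
  }
  delete[] arr;
  return ret;
}
```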

Would you mind elaborating on why? Is it misbehaving, or was it deemed not cost-effective enough? If it is the latter, is there a way for users to enable it (perhaps some -mllvm option)? I would be curious to see its effects.

Loop optimizations are currently run as a pass pipeline, so if I change the order of the passes, the result will most likely change.

Are there any plans for VPlan to optimize loop nests?

Mostly because there are a number of long-standing open issues, and it is unclear how much testing it has received. An incomplete list from the issue tracker: