Why LLVM cannot optimize this?

test.c :

if you replace “zero *= a;” by “zero += a;”, you get:

; Function Attrs: norecurse nounwind readnone uwtable
define i32 @foo(i32 %a) #0 {
%0 = mul i32 %a, 10000
ret i32 %0

I think the problem is ScalarEvolution only have “SCEVAddRecExpr” for zero += a, but no corresponding expression to represent and optimize zero *= a;


Yes SCEV is pretty limited on this aspect.
This kind of code can trigger LLVM to explode in time/memory: https://llvm.org/bugs/show_bug.cgi?id=18606

See also this llvm-dev thread: SCEV implementation and limitations, do we need “pow”? : http://lists.llvm.org/pipermail/llvm-dev/2014-February/070062.html

[+CC Andy]

A similar issue also comes up in cases like this:

int clz(unsigned long val) {
  unsigned result = sizeof(unsigned long);
  if (val) {
    do {
      val >>= 1;
    } while (val);
  return result;

where we'd like to be able to compute the tripcount of the loop as
"sizeof(long) - clz(val) - 1" and get rid of the loop entirely via

The reason why the above issue is "similar" to wanting a "SCEVPowExpr"
is that both are examples of SCEV node kinds that will be useful in
some very specific cases; but probably is not seen widely enough that
teaching whole of ScalarEvolution how to simplify these is a good

To get around this, I've been thinking (and only thinking, no code yet
:slight_smile: ) about adding a possible "SCEVIntrinsicExpr", parameterized by an
"OpCode", like SCEVIntrinsicExpr::Pow or
SCEVIntrinsicExpr::CountLeadingZeroes. The idea is that these would
be a low-overhead way of adding new kinds of expressions to the SCEV
hierarchy, with the only constraints being:

- They're a pure, mathematically function of their input values
- SCEVExpander knows how to expand them
- It is always safe to treat them as SCEVUnknowns

-- Sanjoy

While SCEV provides a generic framework to calculate the loop exit values, may be what zet asking is just an instcomb that replace the loop exit value of zero by 0, when zero is initialized to 0 before the loop?

A special rule would help this specific case (and if getting this
specific case is important for some reason, we should consider
it), but ideally we should also be able to, say, fold

unsigned foo(int a) {
  unsigned x = 1;
  for (int i = 0; i < a; i++)
    x *= 2;
  return x;

into a compare and a shift, and things like that.

So my question to zet is: is there a specific case you're looking at
where optimizing
the specific pattern you pointed out is important?

-- Sanjoy