Hello,
I was doing some experiments with SCEV and especially the loop trip count.
Sorry for the dumb question, what is the easiest way to dump SCEV analysis results on a .bc file?
On a side note, I wanted to see if we could optimize this function:
unsigned long kernel(long w, long h, long d) {
unsigned long count = 0;
for(int i = 0; i < w; ++i)
for(int j = i; j < h; ++j)
for(int k = j; k < d; ++k)
++count;
return count;
}
into this (it might not take into account overflows, it was generated using the barvinok library):
unsigned long kernel2(long w, long h, long d) {
return
(-1 + w >= 0 && -1 - w + h >= 0 && -1 - h + d >= 0)
? (((((2 * w - 3 * ww + www) + 3 * w * h + -3 * w * hh) + ((3 * w - 3 * ww) + 6 * w * h) * d))/6)
: (-1 + w >= 0 && -1 - w + d >= 0 && h - d >= 0)
? ((((2 * w - 3 * ww + www) + (6 * w - 3 * ww) * d + 3 * w * dd))/6)
: (-1 + h >= 0 && w - h >= 0 && -1 - h + d >= 0)
? ((((2 * h - 2 * hhh) + (3 * h + 3 * hh) * d))/6)
: (-1 + d >= 0 && h - d >= 0 && w - d >= 0)
? (((2 * d + 3 * dd + ddd))/6)
: 0;
}
I am not sure how advanced are SCEV-based trip counts.
Best regards.
Hi,
Hello,
I was doing some experiments with SCEV and especially the loop trip count.
Sorry for the dumb question, what is the easiest way to dump SCEV analysis results on a .bc file?
opt -analyze -scalar-evolution < file.bc
On a side note, I wanted to see if we could optimize this function:
unsigned long kernel(long w, long h, long d) {
unsigned long count = 0;
for(int i = 0; i < w; ++i)
for(int j = i; j < h; ++j)
for(int k = j; k < d; ++k)
++count;
return count;
}
into this (it might not take into account overflows, it was generated using the barvinok library):
Looks like we can not, but I haven’t examined why. Overflows might indeed get into the way.
Michael
(Hit reply-all this time
)
I'm doing this on a old clang/llvm build (from June 29), but if I
generate IR by
clang -mllvm -unroll-threshold=0 -O3 -S -emit-llvm -fno-vectorize x.c -o x.ll
(loop unrolling and vectorization both tend to obscure trip count
computation)
then we are able to compute the trip counts in whatever loops that
remain:
Loop %for.cond8.preheader: backedge-taken count is {(-1 + %h),+,-1}<nw><%for.cond2.preheader>
Loop %for.cond8.preheader: max backedge-taken count is -1
Loop %for.cond2.preheader: backedge-taken count is (-1 + %w)
Loop %for.cond2.preheader: max backedge-taken count is -1
It isn't clear to me why indvars does not fold way all the loops out
of existence (it is "clearly profitable" in this case). I'll try to
take a closer look tonight.
-- Sanjoy
Thank you for the fast answer.
Sorry for the email crossing…