Thank you for all the feedback. I will try to answer your
questions below. But if you think this might not be a good GSoC
project, do not hesitate to tell me; I can try to write a different
proposal. Nevertheless, I would like to hear from you what you think
is important to have in a range analysis. From your e-mails, I
see that there are still many things that we do not do that would
be useful to the community:
> Can you speed up program runtime significantly using this analysis?
No, not yet. We have not used the analysis to implement any
optimization. So far we can only flag variables as dead code;
I have not tried to remove that code. And for the C
programs that we have analyzed, the number of dead variables
remaining after LLVM optimizes the program is small. We found
about 200 dead variables in the whole of gcc, for instance.
Notice, however, that the other optimizations had not caught those.
> I took a brief look at your report. I’m not sure what others in the
> LLVM community think (and I’d like their input), but I suspect that
> 10-15 seconds for analyzing GCC means that users would want your
> analysis enabled at optimization levels higher than -O2.
It is a whole-program (inter-procedural) analysis, with function
inlining enabled to give us a limited form of context-sensitivity.
If I run it intra-procedurally, it takes negligible time per function.
I think its runtime is then similar to other LLVM passes.
> When I asked for memory usage, I meant the amount of RAM used by the
> analysis. As far as I can tell, your paper only reports the number of
> constraints, so it doesn’t really tell me how much RAM the analysis uses.
I just measured it: for SPEC 2006 gcc, the analysis used 455MB.
> One other thing I’ve been wondering about is whether your analysis is
> able to find value ranges for the results of load instructions.
No. We do not use pointer analysis.
> Finally, do you think it would be possible in the future to use
> parallelism to improve the performance of your analysis?
We had not thought about it. I think the algorithm, as it is
now, is pretty sequential. We could parallelize the processing of
strongly connected components that do not depend on each other, but I
speculate that there are not many of these.
> I recommend that your proposal mention that you’ll try your analysis on
> larger and more diverse codes.
Yes, we want to find larger benchmarks too.
> One of the big problems with Douglas’s original range analysis was that
> it couldn’t handle modulo arithmetic.
We still have this problem. Jorge Navas, who is using our analysis, is
working on an implementation that deals with wrapping arithmetic, but
we do not have access to his code yet.
> An analysis that is not sound with respect to the original C/C++ semantics
> would be of little use to anyone.
You can use our analysis to remove overflow tests. If we output that a
variable is bound to the range [c1, c2], where c1 > -inf and c2 < +inf,
then you are guaranteed to be inside these bounds. The imprecision
happens when we output that something is bound to, say, [c1, +inf],
c1 > -inf, because in this case the variable's upper bound could wrap
around and then violate the lower limit c1. In our experiments, about
75% of the non-trivial limits that we output are of the form [c1, c2],
c1 > -inf and c2 < +inf, and could be used to eliminate overflow tests.
For instance, in gcc we output 40,043 such good limits, 22,369
limits [c, +inf], 785 limits [-inf, c], and 116,637 limits [-inf, +inf].
> Also, the tricky bit of range analysis is handling loops. I suggested
> to Douglas that he exploit LLVM’s SCEV infrastructure to discover and
> exploit how induction variables are evolving rather than trying to
> create his own poor man’s version using interval arithmetic. How are
> loops handled now?
Yes, Douglas told me about this. He actually wrote a small pass that
catches induction variables using SCEV. We still obtain tighter
intervals with our range analysis than with SCEV, but we miss some
induction variables that SCEV handles. Integrating SCEV into our
analysis is high on our list of priorities.