Dear llvm devs,
tl;dr: What prevents llvm from switching to a fancier pointer analysis?
Currently, there exists a variety of general-purpose alias analyses in the LLVM codebase: basic-aa, globalsmodref-aa, tbaa, scev-aa, and cfl-aa. However, only the first three are actually turned on when invoking clang with -O2 or -O3 (please correct me if I’m wrong about this).
If one looks at the existing research literature, there are even more algorithms to consider for pointer analysis. Some are field-sensitive, some are field-based, some are flow-sensitive, some are context-sensitive. Even among the flow-insensitive ones, there is a split between inclusion-style (-andersen-aa) and equality-style (-steens-aa and -ds-aa) analyses. These algorithms are often backed by rich theoretical frameworks as well as preliminary evaluations that demonstrate their superior precision and/or performance.
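To make the inclusion-vs-equality distinction concrete, here is a toy C++ fragment (my own illustration, not taken from any particular paper): an Andersen-style analysis keeps the points-to sets of p and q separate, while a Steensgaard-style analysis unifies them through r and therefore has to report that *p and *q may alias.

```cpp
void toy() {
  int a = 0, b = 0;
  int *p = &a;   // pts(p) = {a}
  int *q = &b;   // pts(q) = {b}
  int *r = p;    // inclusion-style: pts(r) includes pts(p)
  r = q;         // inclusion-style: pts(r) = {a, b}, but p and q stay distinct
                 // equality-style: the nodes reachable from p, q, and r are
                 // unified into one class, so p and q appear to alias
  (void)r;
}
```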
Given such an abundance of choices of pointer analyses that seem to be much better in the research world, why do real-world compiler infrastructures like LLVM still rely on those three simple (and ad-hoc) ones to perform IR optimization? Based on my understanding (and again, please correct me if I am wrong):
(1) The minor reason: those “better” algorithms are very hard to implement in a robust way and nobody seems to be interested in trying to write and maintain them.
(2) The major reason: it’s not clear whether those “better” algorithms are actually better for LLVM. More precise pointer analyses tend to increase compile time considerably while contributing too little to the optimization passes that consume them. The benefit one gets from a more precise analysis may not justify the compile-time or the maintenance cost.
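For context on what “contributing to the optimization passes” means at the API level, here is a rough C++ sketch of how a transform pass typically consumes alias-analysis results (the helper is hypothetical, and exact class/method names vary across LLVM versions, so treat this as an approximation):

```cpp
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical helper a pass might use: is it safe to move this load
// across this store?  The pass only ever sees the final NoAlias /
// MayAlias / MustAlias answer, so a fancier analysis pays off exactly
// when it turns MayAlias answers into NoAlias ones.
static bool canReorder(AAResults &AA, LoadInst &Load, StoreInst &Store) {
  return AA.isNoAlias(MemoryLocation::get(&Load), MemoryLocation::get(&Store));
}
```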
So my question here is: what kind(s) of precision really justify the cost, and what kinds do not? Has anybody done a study in the past to evaluate which features of a pointer analysis benefit which optimization passes? Could there be further improvement in pointer-analysis precision without adding too much compile-time/maintenance cost? Have the precision/performance tradeoffs been fully explored before?
Any pointers will be much appreciated. No pun intended.
PS1: To be more concrete, what I am looking for is not black-box information like “we switched from basic-aa to cfl-aa and observed a 1% improvement at runtime”. I believe white-box studies such as “the licm pass failed to hoist x instructions because -tbaa is not flow-sensitive” are much more interesting for understanding the problem here.
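As a toy illustration of the kind of per-pass, per-analysis question I have in mind (the function and names below are made up, and whether hoisting actually happens depends on the exact pipeline): whether licm can hoist the load of *scale out of the loop depends entirely on whether the alias analysis in use can prove that the load does not overlap the store to dst[i].

```cpp
// With -tbaa enabled (and strict aliasing), a float store and an int load
// get NoAlias from the type-based rules, so *scale can be hoisted and kept
// in a register across the loop.  With only a weaker AA (or under
// -fno-strict-aliasing), the store may clobber *scale for all the compiler
// knows, so the load is repeated on every iteration.
void scaleDown(float *dst, const int *scale, int n) {
  for (int i = 0; i < n; ++i)
    dst[i] = dst[i] / *scale;
}
```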
PS2: If no such evaluation exists, I’d be happy to do it myself and report back my findings, if anyone here is interested.