Comparison of Alias Analysis in LLVM

Hi,

Chapter 4 in http://llvm.org/pubs/2005-05-04-LattnerPHDThesis.html
compares the precision of alias analysis in LLVM at that time. Does
the latest LLVM still follow the similar results? I was also wondering
how the globalmodred-aa and scev-aa that were not discussed in the PhD
thesis are compared with others? Thanks and Happy New Year!

Hi Jianzhou,

That study is over 7 years old now. So much has changed that the results would surely have to be rerun. Among other things, SRoA is more aggressive than in those days, so many more of the "easy" alias pairs are already resolved.

-Chris

I see. I asked the question because LLVM provides several alias
analysis, and I was wondering how to decide which one should be used
for compiling most programs.

I think the basicaa is the default one, but by looking into its code,
it is not inter-procedural or context-sensitive (I am not 100% sure),
so does not provide interesting mod/ref information. Then, GVN(PRE)
and LICM may not have a lot of opportunities for optimizing calls. So,
I looked into the thesis for understanding how those inter-procedural
aa and other aa contribute. I guess basicaa has a better speed-up per
compiling-cost ratio for most common programs than others. Is this
correct?

We do also use globalsmodref, and have a pass for inferring and propagating the readnone/readonly markers. We also now have Type Based Alias Analysis for C languages.

No one has really been pushing forward aliasing much lately.

-Chris

I see. I asked the question because LLVM provides several alias
analysis, and I was wondering how to decide which one should be used
for compiling most programs.

I think the basicaa is the default one, but by looking into its code,
it is not inter-procedural or context-sensitive (I am not 100% sure),
so does not provide interesting mod/ref information. Then, GVN(PRE)
and LICM may not have a lot of opportunities for optimizing calls. So,
I looked into the thesis for understanding how those inter-procedural
aa and other aa contribute. I guess basicaa has a better speed-up per
compiling-cost ratio for most common programs than others. Is this
correct?

We do also use globalsmodref, and have a pass for inferring and propagating the readnone/readonly markers. We also now have Type Based Alias Analysis for C languages.

I do not understand how the chaining works here. If I do opt
-globalsmodref-aa ..., does the globalsmodref pass call the analysis
implemented in AliasAnalysis or BasicAliasAnalysis class when
globalsmodref cannot decide Must or No Alias? I think it should be
AliasAnalysis, because globalsmodref subclasses AliasAnalysis.

The documents say that all the aa analysis are chained, and give an
example like opt -basicaa -ds-aa -licm. In this case, does ds-aa
automatically call basicaa for the case when ds-aa can only return
MayAlias? This looks magic to me. Is this handled by AnalysisGroup
magically?

Jianzhou Zhao <jianzhou <at> seas.upenn.edu> writes:

The documents say that all the aa analysis are chained, and give an
example like opt -basicaa -ds-aa -licm. In this case, does ds-aa
automatically call basicaa for the case when ds-aa can only return
MayAlias? This looks magic to me. Is this handled by AnalysisGroup
magically?

As I understand it, the simplest AA pass which can determine reliable
information is the one which is used, which then chains on to more
complex (and presumably slower) passes. In this way something like
basicaa can determine a few things as definite and anything it cannot
determine it chains onto another pass (e.g. ds- aa) to have a go at.
This will continue until it finds a definite answer or runs out of AA
passes.

The chaining is used with masking of the results to ensure that the
result only becomes more accurate, even if a chained-to pass doesn't
have any idea what to do (at least for mod/ref info, so I presume it
is similar for alias() calls).

Jianzhou Zhao <jianzhou <at> seas.upenn.edu> writes:

The documents say that all the aa analysis are chained, and give an
example like opt -basicaa -ds-aa -licm. In this case, does ds-aa
automatically call basicaa for the case when ds-aa can only return
MayAlias? This looks magic to me. Is this handled by AnalysisGroup
magically?

As I understand it, the simplest AA pass which can determine reliable
information is the one which is used, which then chains on to more
complex (and presumably slower) passes. In this way something like
basicaa can determine a few things as definite and anything it cannot
determine it chains onto another pass (e.g. ds- aa) to have a go at.
This will continue until it finds a definite answer or runs out of AA
passes.

Hi David,

At this level, I can understand how it works. I was confused because I
have been looking at the source code for implementing them. All the
globalmodref, scev-aa, steenaa and ds-aa are only subclasses of the
AliasAnalysis class, so I cannot see how ds-aa can automatically call
basicaa.

What I guess is that the order of the flags matters. This means if I
want to chain a simple AA (say, -basicaa) and a fancier one (ds-aa), I
should add the -basicaa before -ds-aa to define the chain, which the
AnalysisGroup can understand. If I add ds-aa before basicaa, does that
mean basicaa will chain with ds-aa backward?

Jianzhou Zhao <jianzhou <at> seas.upenn.edu> writes:

At this level, I can understand how it works. I was confused because I
have been looking at the source code for implementing them. All the
globalmodref, scev-aa, steenaa and ds-aa are only subclasses of the
AliasAnalysis class, so I cannot see how ds-aa can automatically call
basicaa.

There's some magic in the pass registration which adds them with a
`previous' link between AA passes, so the base AliasAnalysis class
ends up calling the previous one via the "AliasAnalysis *AA" member.

What I guess is that the order of the flags matters. This means if I
want to chain a simple AA (say, -basicaa) and a fancier one (ds-aa), I
should add the -basicaa before -ds-aa to define the chain, which the
AnalysisGroup can understand. If I add ds-aa before basicaa, does that
mean basicaa will chain with ds-aa backward?

It will affect the order of the usage of the particular alias analysis
passes, but the last one specified is called first (so -basicaa -ds-aa
will cause ds-aa to be used first, then chain to basicaa as the previous
AA pass). However, I believe that it should not affect the accuracy of
the alias information as the chaining will stop when a definite answer
is reached, whatever the order of the chained passes.

Jianzhou Zhao <jianzhou <at> seas.upenn.edu> writes:

At this level, I can understand how it works. I was confused because I
have been looking at the source code for implementing them. All the
globalmodref, scev-aa, steenaa and ds-aa are only subclasses of the
AliasAnalysis class, so I cannot see how ds-aa can automatically call
basicaa.

There's some magic in the pass registration which adds them with a
`previous' link between AA passes, so the base AliasAnalysis class
ends up calling the previous one via the "AliasAnalysis *AA" member.

What I guess is that the order of the flags matters. This means if I
want to chain a simple AA (say, -basicaa) and a fancier one (ds-aa), I
should add the -basicaa before -ds-aa to define the chain, which the
AnalysisGroup can understand. If I add ds-aa before basicaa, does that
mean basicaa will chain with ds-aa backward?

It will affect the order of the usage of the particular alias analysis
passes, but the last one specified is called first (so -basicaa -ds-aa
will cause ds-aa to be used first, then chain to basicaa as the previous
AA pass). However, I believe that it should not affect the accuracy of
the alias information as the chaining will stop when a definite answer
is reached, whatever the order of the chained passes.

Thanks. This makes a lot of sense.