flow-sensitive alias analysis

Hi,

I am looking for some flow-sensitive (context-insensitive) alias
analysis algorithm implemented in LLVM. (I use LLVM 3.9, hope to switch
to 4.0 soon.)
As far as I know, none of the built-in analysis (basicAA,
globals-modref, andersAA, etc.) is intended to be flow-sensitive. So I
searched and came across these two

1. https://github.com/unsw-corg/SVF by Yulei Sui (for LLVM 3.8)
2. http://www.cs.ucsb.edu/~benh/research/downloads.html by Ben Hardekopf
(for LLVM 2.5)

Are there other implementations which use the AA-Interface?

Giving a little context, I need some functionality in LLVM that answers
the following question.
For a given argument of a call instruction in the cfg: Where does the
value of the argument come from at the call site?

So I guess I need a flow-sensitive alias analysis, right?
Could someone please guide me a little?

Thank you,

Oliver

For a given argument of a call instruction in the cfg: Where does the
value of the argument come from at the call site?

GVN may tell you the values. I’m not sure why you would need flow sensitive alias analysis for this.

-Kevin

Perhaps by “value” you mean points-to set?

Either way, flow-sensitivity can only give you more precise – but still not necessarily exact – answers.

Yours,

Andrey

Perhaps by "value" you mean points-to set?

Thats right! I meant the points-to set. Sorry I didn't mention that.
I want to track back the value of the parameter to its definition -- an
"assignment" which could be indirect through a pointer to the value used
as the function argument.

Either way, flow-sensitivity can only give you more precise -- but still
not necessarily exact -- answers.

I am aware of the inexactness of (flow-sensitive) alias analysis. Is
there an exact way to accomplish this?

Thanks for your replies!
Oliver

No, even the non-flow-sensitive version of the problem is statically
undecidable for may-alias, and not even recursively enumerable for
must-alias.
This is true even when all paths through the program are executable.

If you leave out recursive data structures, it is simply NP complete, and
thus, you could compute an exact answer, very expensively.

The TL;DR is that for most languages, there can be no exact aliasing
computation, only approximations.
See, e.g, http://dl.acm.org/citation.cfm?doid=161494.161501 and
http://dl.acm.org/citation.cfm?id=186025.186041

(btw, it gets worse. If you disallow recursive data structures, and allow
multiple levels of pointers, precise flow-sensitive may-alias is NP-hard
even when restricted to intraprocedural and *no* dynamic memory allocation)

Determining the exact behavior of a program (which would be necessary to know the exact definition of a value) is undecidable (see halting problem etc.). So all you can hope for are approximations.

- Matthias

    > Perhaps by "value" you mean points-to set?

    Thats right! I meant the points-to set. Sorry I didn't mention that.
    I want to track back the value of the parameter to its definition -- an
    "assignment" which could be indirect through a pointer to the value used
    as the function argument.

    > Either way, flow-sensitivity can only give you more precise -- but still
    > not necessarily exact -- answers.

    I am aware of the inexactness of (flow-sensitive) alias analysis. Is
    there an exact way to accomplish this?

No, even the non-flow-sensitive version of the problem is statically
undecidable for may-alias, and not even recursively enumerable for
must-alias.
This is true even when all paths through the program are executable.

If you leave out recursive data structures, it is simply NP complete,
and thus, you could compute an exact answer, very expensively.

The TL;DR is that for most languages, there can be no exact aliasing
computation, only approximations.
See, e.g, http://dl.acm.org/citation.cfm?doid=161494.161501
and http://dl.acm.org/citation.cfm?id=186025.186041

(btw, it gets worse. If you disallow recursive data structures, and
allow multiple levels of pointers, precise flow-sensitive may-alias is
NP-hard even when restricted to intraprocedural and *no* dynamic memory
allocation)

Okay, so thats is what I expected...no exact static alias analysis in
polynomial time.
We can only get (over-)approxmations for the may-alias problem by the
algorithms implemented in LLVM. Thats fine for me. But all those
implementations are flow-insensitive.
So my original question was: Are there implementations of flow-sensitive
algorithms that are usable in LLVM (through AA-Interface,
MemorySSA/MemoryDependenceAnalysis)?
Because my assumption is: (sparse) flow-sensitive algorithms can have
nearly the same time requirements as flow-insensitive ones but are a
little more precise.

-Oliver

>
>
> > Perhaps by "value" you mean points-to set?
>
> Thats right! I meant the points-to set. Sorry I didn't mention that.
> I want to track back the value of the parameter to its definition --
an
> "assignment" which could be indirect through a pointer to the value
used
> as the function argument.
>
> > Either way, flow-sensitivity can only give you more precise -- but
still
> > not necessarily exact -- answers.
>
> I am aware of the inexactness of (flow-sensitive) alias analysis. Is
> there an exact way to accomplish this?
>
>
> No, even the non-flow-sensitive version of the problem is statically
> undecidable for may-alias, and not even recursively enumerable for
> must-alias.
> This is true even when all paths through the program are executable.
>
> If you leave out recursive data structures, it is simply NP complete,
> and thus, you could compute an exact answer, very expensively.
>
> The TL;DR is that for most languages, there can be no exact aliasing
> computation, only approximations.
> See, e.g, http://dl.acm.org/citation.cfm?doid=161494.161501
> and http://dl.acm.org/citation.cfm?id=186025.186041
>
> (btw, it gets worse. If you disallow recursive data structures, and
> allow multiple levels of pointers, precise flow-sensitive may-alias is
> NP-hard even when restricted to intraprocedural and *no* dynamic memory
> allocation)
>

Okay, so thats is what I expected...no exact static alias analysis in
polynomial time.
We can only get (over-)approxmations for the may-alias problem by the
algorithms implemented in LLVM. Thats fine for me. But all those
implementations are flow-insensitive.

Yes, because flow-sensitivity doesn't really buy much for the cost.

So my original question was: Are there implementations of flow-sensitive
algorithms that are usable in LLVM (through AA-Interface,
MemorySSA/MemoryDependenceAnalysis)?

Nope.

Because my assumption is: (sparse) flow-sensitive algorithms can have
nearly the same time requirements as flow-insensitive ones but are a
little more precise.

This assumption is very wrong, AFAIK :slight_smile:
The best known precise flow-insensitive algorithms easily handle pretty
much any scale these days.

GCC's points-to is field-sensitive precise flow-insensitive, and it pretty
much never shows up on profiles.
By contrast, the best sparse known sparse flow-sensitive analysis varies
(quite wildly, in fact), is usually between 10x and 100x slower.
This is true even when, for example, parallelized on GPU's (see, e.g.,
http://dl.acm.org/citation.cfm?id=2555296).

If you only care about certain pointers, you would be better off with a
demand-driven technique.

See, e.g., "Demand-Driven pointer analysis with Strong updates using
value-flow refinement"

This assumption is very wrong, AFAIK :slight_smile:
The best known precise flow-insensitive algorithms easily handle pretty
much any scale these days.

GCC's points-to is field-sensitive precise flow-insensitive, and it
pretty much never shows up on profiles.
By contrast, the best sparse known sparse flow-sensitive analysis varies
(quite wildly, in fact), is usually between 10x and 100x slower.
This is true even when, for example, parallelized on GPU's (see,
e.g., http://dl.acm.org/citation.cfm?id=2555296).

If you only care about certain pointers, you would be better off with a
demand-driven technique.

See, e.g., "Demand-Driven pointer analysis with Strong updates using
value-flow refinement"

Seems like I have to read much more :smiley:
Thanks for your advice! I will probably come back with some questions
regarding demand-driven analysis.

-Oliver