Flow Sensitive Alias Analysis in LLVM

Is there any flow-sensitive Alias Analysis implementation in LLVM?

I actually want to determine at each program point, whether two PointerType Values in LLVM-IR point to some overlapping memory regions. I am aware that the analysis would be conservative and may not provide exact results.

I have used basic-aa before but since it is flow-insensitive, it can’t be used here. If there aren’t any such analyses present in the LLVM core, are there any other options that I can use to solve the aforementioned issue?

Thanks

2 Likes

It’s been a while since I looked into our Alias Analysis implementations, so take the rest of this with a grain of salt, but I don’t recall much in the way of flow-sensitivity in our implementation of BasicAA or the related analysis passes. I have a vague memory that some of the other analysis bits that compute mod-ref info and Memory SSA have some support for those types of queries, but it’s been long enough that I can’t be sure. I’ll defer to people like @jdoerfert who are much more in touch with the current state of affairs an can provide better guidance on what is in-tree today.

I do still remember some history, though. A long time ago, we had the DSA implementation in GitHub - llvm-mirror/poolalloc: Mirror kept for legacy. Moved to https://github.com/llvm/llvm-project, which I believe also could support flow-sensitive analysis, but that hasn’t been supported in a long time. It’s been quite a while since I read the original paper, so I may be incorrect about exactly what it supported.

A bit more recently, we had some (limited) implementations of Anderson and Steensgaard style analysis, but those were also removed (⚙ D139703 [AA] Remove CFL AA passes).

As for external projects, SVF(GitHub - SVF-tools/SVF: Static Value-Flow Analysis Framework for Source Code), Phasar(GitHub - secure-software-engineering/phasar: A LLVM-based static analysis framework.), and SeaDSA(GitHub - seahorn/sea-dsa: A new context, field, and array-sensitive heap analysis for LLVM bitcode based on DSA.) are referenced quite frequently in academic literature.I’ve had mixed results with these, but they’re under active development, and it’s been several years since I last tried to do anything serious with them, so you may not have the same issues I did.

Each framework is pretty different in its approach, but it should be possible to generate some kind of points-to sets using any of them. Phasar is probably the hardest to do so with, since its an IFDS/IDE framework, and I’m not sure they were ever able to port Boomerang (https://www.bodden.de/pubs/sna+16boomerang.pdf) to work with LLVM-IR. If you don’t actually need the points-to set, but can reduce your query to something expressible in IFDS, then it will probably work well. I found that in many cases, I didn’t actually need a points-to set, but could just use dataflow facts to get to the right answer. That completely depends on what you were trying to do w/ the analysis though. Also, IIRC the phasar maintainers were working on overhauling the points-to analysis, so its probably worth taking a look.

If you only need intra-procedural queries, SVF is likely the simplest and most straightforward, though I think it may not model all IR operations, like Insert/ExtractValue sufficiently for all usage. Depending on what you’re doing, that may not matter. Since I was analyzing IR from Rust, which lowers fat-pointer operations to Insert/ExtractValue instructions it was a limitation I couldn’t easily work around. And again, the status there may have changed, since I know the authors are actively improving the library.

The big limitation I found w/ just about every framework for doing inter-procedural points-to analysis was scalability, primarily w.r.t. memory usage. I think this is still an open problem, and one I hope gets focus from the research community.

I know that was a lot of information you didn’t ask for, but hopefully some of it will be useful.

2 Likes

I’m not aware of a (fully) flow sensitive AA in tree. There might be some research works out there. Flow sensitivity is expensive, and depending on what codes you have, AA is not the problem, see Table 4, among other things, in https://dl.acm.org/doi/abs/10.1145/3605573.3605644, or as pdf:

oraql_icpp2023.pdf (562.4 KB)

Will try to circle back, just tagging more ppl:
@dobbelaj @alina @nikic @fhahn

It’s not accessible.

Since relatively recently, the AA API supports flow-sensitive queries (i.e. it accepts a context instruction, and ModRef APIs will use the passed instruction as context), but the only thing they are used for are separate.storage assumes. It’s very unlikely that I will accept extensions to this for compile-time reasons.

What is used to a larger degree are flow-sensitive facts at the definition points of the involved instructions (rather than the point of access). For example, if BatchAA is used with EarliestEscapeInfo instead of SimpleCaptureInfo, it will ignore escapes that cannot reach the involved instructions. Queries on value ranges and other value properties can also make use of assumes.

Following Johannes statement and MLs group work, the compiler becomes more adaptive.

How about offering two AAs: AAA and AAB? AAA is the standard one and AAB is a more aggressive/precise one with a compile time cost. There are optimizations that would benefit from AAB: MemorySSA and dead store elimination. For other optimizations AAA is a great compromise. Offering a larger portfolio of analysisis ought to be a win in optimisation and compile time.

1 Like