List of alias analyses available in LLVM, and how to use them from llc

Hi all,

I have two questions about alias analysis; see below for the background on why I am asking these questions. First, the questions:

  1. Is there an up-to-date list somewhere of all the alias analyses in LLVM, and what their capabilities are? I’ve been told that the AliasAnalysis page in the official documentation is severely out of date. I see in particular that it makes no mention of e.g. TBAA, which shows up in “opt -help”.
  2. Is it possible to override the default alias analysis implementation when running the “llc” tool (or from clang)? I am working from within a MachineFunctionPass, so (unfortunately) I cannot use “opt”, which would provide easy command-line options to do this. If there is no command-line ability to specify a different alias analysis, how could I do so in code? Presently, I am simply using getAnalysis() to get the default alias analysis, which is BasicAA.

Now for the background:

I am writing an analysis pass at the machine-IR level (MachineFunctionPass) for security research. (Before anyone asks, yes, we’ve considered doing this in regular IR, and it isn’t sufficient for our needs. :) We are studying code-reuse attacks and need to examine the actual machine instructions available to an attacker in the final compiled binary.)

I’m currently exploring the use of IR-level alias analysis information in my MachineFunctionPass. Specifically, I’m using AliasSetTracker on the Value* pointers returned by MachineMemOperand::getValue() (similar to what MachineInstr::mayAlias() does, though we’re not using that interface because we’re interested in alias sets).

So far, I’ve been using BasicAliasAnalysis’s results, since they’re the default that I get when simply using getAnalysis(). However, I’d like to try some more advanced alias analyses. Specifically, I’m looking for something that provides better field-sensitivity within structs on the stack. BasicAA is field-sensitive in general, but I’ve observed that it has trouble distinguishing between (for instance) scalar and array fields within the same struct when the array is accessed through a variable index. A simple motivating example of this is:

struct mystruct {
    short s;
    int i;
    int arr[5];
};

struct mystruct str;
str.s = (short) rand();
str.i = rand();
for (int idx = 0; idx < 5; ++idx) {
    str.arr[idx] = rand();
}

printf("%hd %d %d %d %d %d %d", str.s, str.i, str.arr[0], str.arr[1], str.arr[2], str.arr[3], str.arr[4]);

For the above code, BasicAA knows that str.s, str.i, and str.arr[0] through str.arr[4] are all pairwise NoAlias, but it says that they are all PartialAlias with str.arr[idx]. Clearly, this is because idx is a variable with unknown value, and it doesn’t fall into one of the “special cases” that BasicAA can recognize as NoAlias. (For instance, if I write str.arr[(unsigned)idx] instead, it can figure out that it is NoAlias with str.s, since the explicit zero-extension lets BasicAA know that the index must be non-negative, and they are far enough apart that they can’t alias. Oddly, it still thinks str.i is PartialAlias…I haven’t figured this one out but would guess it somehow slips through the cracks of BasicAA’s special cases.)
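
To make the offset reasoning above concrete, here is a minimal sketch of the byte-range argument in plain C. It is not BasicAA's actual implementation; the helper names (byte_range, ranges_overlap, arr_access_range, field_s, field_i) are made up for illustration, and the index is assumed to lie in a known bounded range rather than being fully unknown. It only shows why a lower bound of zero on the index is enough to separate str.arr[idx] from the scalar fields, while a possibly negative index is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mystruct {
    short s;
    int i;
    int arr[5];
};

/* Half-open byte range [begin, end), relative to the start of the struct. */
struct byte_range {
    long begin;
    long end;
};

/* Two non-empty half-open ranges overlap iff each starts before the other ends. */
bool ranges_overlap(struct byte_range a, struct byte_range b)
{
    assert(a.begin < a.end && b.begin < b.end);
    return a.begin < b.end && b.begin < a.end;
}

/* Bytes that str.arr[idx] may touch when all we know is idx_min <= idx <= idx_max
 * (hypothetical helper; a real analysis works on the IR, not on C types). */
struct byte_range arr_access_range(long idx_min, long idx_max)
{
    assert(idx_min <= idx_max);
    long base = (long)offsetof(struct mystruct, arr);
    long elem = (long)sizeof(int);
    struct byte_range r = { base + idx_min * elem, base + (idx_max + 1) * elem };
    return r;
}

/* Bytes occupied by the scalar field s. */
struct byte_range field_s(void)
{
    long off = (long)offsetof(struct mystruct, s);
    struct byte_range r = { off, off + (long)sizeof(short) };
    return r;
}

/* Bytes occupied by the scalar field i. */
struct byte_range field_i(void)
{
    long off = (long)offsetof(struct mystruct, i);
    struct byte_range r = { off, off + (long)sizeof(int) };
    return r;
}
```

With a non-negative index, arr_access_range(0, 4) starts at the offset of arr, so it overlaps neither field_s() nor field_i(). If the index may be negative, the range extends backwards into the scalar fields, which matches the conservative PartialAlias answer described above.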

Having an array alongside scalars in a struct is fairly common, and I’d like to not have to sacrifice field sensitivity when it occurs. My thinking, therefore, is that I need a smarter AA that knows things about types, e.g., that it’s undefined behavior to access a scalar struct field through an out-of-bounds pointer to an array field in the same struct. This sounds like the sort of thing TBAA (“Type-Based Alias Analysis”) might be able to do - hence my questions above.

Thanks in advance for your time and assistance!

Ethan Johnson

Hi Ethan,

I’m not aware of a complete, up-to-date document listing the available alias analyses. It might make sense to look at the source, though.

You can find a list of available alias analyses in the PassRegistry[1]. Just grep for ALIAS. If you’re building on top of the legacy pass manager, there’s a similar file for it, but the actual analyses should be the same. I guess how these analyses work is in most cases obvious from their names. Otherwise, another look at their implementation might help. For instance, the implementation of TBAA[2] is comparatively well documented.

Hope that helps a bit!