A question about pointer aliasing rules in LLVM


I have the following IR code

%prev = getelementptr inbounds %struct.myStruct* %node, i32 0, i32 1
%1 = load %struct.myStruct** %prev, align 4, !tbaa !0
%next1 = getelementptr inbounds %struct.myStruct* %1, i32 0, i32 0
store %struct.myStruct* %0, %struct.myStruct** %next1, align 4, !tbaa !0
%2 = load %struct.myStruct** %prev, align 4, !tbaa !

myStruct is defined as

struct myStruct {
   struct myStruct *next;
   struct myStruct *prev;
};

In the snippet above, %1 could be reused instead of performing the load
for %2. The problem is that, according to BasicAliasAnalysis, %prev may
alias %next1, so the store to %next1 clobbers %1.

The point is that %next1 is (struct myStruct * + 0) while %prev is (struct
myStruct * + 4), so they should not alias. The problem is that they do not
have the same base (%node and %1), although the type of the base is the
same. BasicAliasAnalysis is able to distinguish between them if they have
the same base, but not if they merely have the same base type. Would it be
wrong to make BasicAliasAnalysis look at the type of the base pointer and
take a more aggressive approach if the base pointer type is the same? I got
this impression on reading
http://llvm.org/docs/LangRef.html#pointeraliasing as it seems to suggest
that types are not associated with memory.

Can this situation be helped?


First, yes, it is wrong for AliasAnalysis implementations to trust LLVM IR types, for the most part. There's nothing in LLVM IR which would prevent you from having two myStruct instances which overlap here, sharing 4 bytes. Because of that, the address of one instance's next field really could be equal to the address of another instance's prev field, which means %next1 could be equal to %prev.
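
To make the overlap concrete, here is a minimal C sketch. It only computes addresses inside a raw byte buffer and never reads or writes through a struct pointer. The helper name next_can_alias_prev and the storage buffer are made up for illustration, and the sketch uses offsetof rather than a hard-coded 4 so that it does not depend on the 32-bit pointer size from the original post.

```c
#include <stddef.h>

struct myStruct {
    struct myStruct *next;
    struct myStruct *prev;
};

/* Hypothetical helper: returns 1 if a second instance that starts at the
   first instance's prev field has its next field at that same address. */
int next_can_alias_prev(void)
{
    /* Raw storage for two instances that overlap by one pointer. */
    static unsigned char storage[sizeof(struct myStruct) +
                                 offsetof(struct myStruct, prev)];

    unsigned char *node  = storage;                                   /* first instance  */
    unsigned char *other = storage + offsetof(struct myStruct, prev); /* second instance */

    /* Field addresses, computed from the byte offsets only. */
    unsigned char *node_prev  = node  + offsetof(struct myStruct, prev);
    unsigned char *other_next = other + offsetof(struct myStruct, next);

    return node_prev == other_next;
}
```

Calling next_can_alias_prev() returns 1. This is the layout in which the store through %next1 in the original IR would overwrite the location that %prev points to.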

In theory, you could help this situation by using TBAA; you could give next and prev fields different TBAA tags to say that a store to a next field never stores to a prev field.


In practice this is impossible to guarantee. The only safe way of disambiguating the two objects is to prove that the pointers are different.


It may be impossible to guarantee if the input language is C (depending on how you interpret it).

However, the point of LLVM’s TBAA system is to let front-ends make statements about aliasing that are difficult or impossible to prove from the IR alone. Front-ends for more restrictive source languages may actually be able to make guarantees about how linked list pointers are used. LLVM’s TBAA tags are a way for them to describe at least some of those guarantees to optimizers.

Unless you’re objecting to the fact that while LLVM says that metadata may be stripped at any time, there are no rules for when it must be stripped, and that arbitrary valid transformations can theoretically cause valid TBAA and other metadata to become invalid. This is true. This is an area where LLVM’s preference for practicality over absolute correctness may be observed.


I was writing with the C/C++ language in mind (since that was the language from the original post).

I didn't think of the last point you're making, but it actually does seem like a concern. If metadata is there to provide extra information allowing the compiler to perform otherwise unsafe optimizations, the possibility that it may become outdated poses a real risk.