There aren't any real papers I'm aware of on TBAA, because it is
entirely dictated by language-specific standards, and thus the rules
differ from language to language.
The only even mildly useful *paper* would be
Honestly, there are no real fundamentals unless you want to talk about
specific languages.
Otherwise, everything could be described as:
1. Each language has some rules about type compatibility. Violation of
these rules usually results in undefined behavior.
2. The compiler takes advantage of this to make assumptions about
aliasing of variables of two different types, on the assumption that
your program does not exhibit undefined behavior.
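To make the two points above concrete, here is a minimal sketch using the
C/C++ rules as the example language. The function names are made up for
illustration. Whether a given compiler actually folds the load depends on
the compiler and the optimization level.

```cpp
#include <cassert>

// Under the C and C++ type compatibility rules, an int object may not be
// accessed through a float lvalue (or vice versa). The compiler is therefore
// allowed to assume that *ip and *fp refer to different objects.
int store_both(int *ip, float *fp) {
    *ip = 1;
    *fp = 2.0f;   // assumed not to modify *ip
    return *ip;   // may be folded to "return 1" without reloading *ip
}

// Well-defined use: the two pointers really do refer to distinct objects,
// so the result is 1 with or without the optimization.
int demo_store_both() {
    int i = 0;
    float f = 0.0f;
    int r = store_both(&i, &f);
    assert(r == 1 && f == 2.0f);
    return r;
}
```

If a caller instead passed the same storage through both pointers (say, by
casting an int* to a float*), the program would have undefined behavior, and
the folded and unfolded versions of store_both could give different answers.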
LLVM's implementation is really separable into two pieces:
1. Type based aliasing
2. Access-path aliasing
Both exist in TypeBasedAliasAnalysis.cpp, but are somewhat unrelated
(in practice, #2 gets almost all of its power from being able to say
things about various offsets, not about types).
#1 is what is traditionally referred to when people talk about TBAA.
LLVM's implementation theoretically consists of a TBAA DAG that is
written out by the frontend, and has various properties (every child
is an alias subset of the parent, for example), that correspond to
various language rules.
The loads/stores are annotated with the set ids. The TBAA alias
implementation compares the set ids and their positions in the TBAA
DAG to see whether two things alias.
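As a rough illustration of that comparison, here is a toy model, not LLVM's
actual code. It assumes the simplest case, where every node has a single
parent, and it treats two accesses as possibly aliasing only when one node is
the other or an ancestor of it. TypeNode, mayAlias, and the example tree are
all hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model only: this is not how LLVM stores TBAA metadata.
struct TypeNode {
    std::string name;
    int parent;  // index of the parent node, or -1 for the root
};

// Returns true if `anc` is `node` itself or one of its ancestors.
bool isAncestorOrSelf(const std::vector<TypeNode> &nodes, int anc, int node) {
    for (int cur = node; cur != -1; cur = nodes[cur].parent)
        if (cur == anc)
            return true;
    return false;
}

// Two accesses may alias if either tag is an ancestor of (or equal to) the
// other; tags on unrelated branches are assumed not to alias.
bool mayAlias(const std::vector<TypeNode> &nodes, int a, int b) {
    return isAncestorOrSelf(nodes, a, b) || isAncestorOrSelf(nodes, b, a);
}

// A small C-like hierarchy: char sits above int and float because a char
// access is allowed to touch any object.
std::vector<TypeNode> makeExampleTree() {
    return {
        {"root", -1},  // 0
        {"char", 0},   // 1
        {"int", 1},    // 2
        {"float", 1},  // 3
    };
}

void checkExampleTree() {
    const std::vector<TypeNode> t = makeExampleTree();
    assert(mayAlias(t, 1, 2));   // char vs int: ancestor, may alias
    assert(!mayAlias(t, 2, 3));  // int vs float: siblings, no alias
}
```

In this model a load tagged "int" and a store tagged "float" are reported as
not aliasing, while anything tagged "char" may alias both, which mirrors the
C rule that character types can access any object.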
That is the theory.
In practice, you can encode all kinds of interesting analysis results
as TBAA metadata, whether it's type-based or not.
It happens to be a useful data structure for doing so.
So there is actually no guarantee that any of the metadata you see is
actually related to types in the source language or language rules.