PROPOSAL: struct-access-path aware TBAA (new version)

Hello,

After discussions with Daniel, Dan and others, here is an updated proposal for struct-access-path aware TBAA.

Given an example
struct A {
  int x;
  int y;
};
struct B {
  A a;
  int z;
};
struct C {
  B b1;
  B b2;
  int *p;
};
struct D {
  C c;
};

The purpose of struct-path-aware TBAA is to say
"C::b1.a" will alias with "B::a.x", "C::b1.a" will alias with "B",
"C::b1.a" does not alias with "D::c.b2.a.x".
"A::x" does not alias with "A::y"
We can potentially disambiguate aggregate accesses and scalar accesses.

The current scalar TBAA will say
"C::b1.a.x" will alias with "D::c.b2.a.x" since both are accesses to int.
"A::x" will alias with "A::y"

This proposal is about the format of metadata and how to implement alias queries.
--------- Option A

metadata for the type DAG

   1> the existing scalar nodes
      !0 = metadata !{metadata !"any pointer", metadata !1}
      !1 = metadata !{metadata !"omnipotent char", metadata !2}
      !2 = metadata !{metadata !"Simple C/C++ TBAA"}
      !4 = metadata !{metadata !"int", metadata !1}
   2> the struct nodes
      A struct node has a unique name plus a list of field types
      For example, struct node for "C" should look like
      !5 = metadata !{"C", metadata !3, metadata !3, metadata !0}
      where !3 is the struct node for "B"

   The type DAG for the example:
   int <------------- A <- B <-- C <-- D
   int <-------------------- B
   any pointer <--------------- C

We call metadata attached to scalar accesses scalar tags and metadata

   attached to aggregate accesses aggregate tags.
   Each tag can be a sequence of field accesses and nodes in the type DAG.
   For example, the tag for "C::b2.a.x" is
   !7 := [ "tbaa.path", !C, "C::b2", !B, "B::a", !A, "A::x", !int ] ;C::b2.a.x
   where !C, !B, !A are metadata nodes in the type DAG, "C::b2" "B::a" "A::x" are strings.

alias(x, y) = alias_rule(x, y) or alias_rule(y, x), where alias_rule(x, y) is

   if the first element of 'x' does not enclose the first element of 'y'
     return NoAlias
   if the last element of 'x' encloses the first element of 'y' via type DAG,
     return Alias
   Check the first element of 'y', if it is not in 'x', return NoAlias
   Repeat until we reach the end of either 'x' or 'y'
     Check the next element of 'y' with the next element of 'x'
     if not the same, return false.

-------------- Option B
Option B is to include offset of each field in the type DAG and let tag
be offset based.

metadata for the type DAG

   1> the existing scalar nodes
   2> the struct nodes
      A struct node has a unique name plus a list of pairs (offset, field type)
      For example, struct node for "C" should look like
      !5 = metadata !{"C", 0, metadata !3, 12, metadata !3, 24, metadata !0}
      where !3 is the struct node for "B"
   The type DAG for the example:
   int <- 0 <- A
   int <- 4 <- A <- 0 <- B <- 0 <- C <- 0 <- D
   int <-------------- 8 <- B <- 12 <- C
   any pointer <--------------- 24 <- C

The tag will be (outermost struct type node, offset, size, the last type node)

   For example, the tag for C::b2.a.x will be
   !7 = [ "tbaa.offset", !C, 12, 4, !int] ;C::b2.a.x
   The last field of the tag is to handle alias(D::c, A::x), the tags will be
   [!D, 0, 32, !C] and [!A, 0, 4, !int], we only need to check whether !C encloses
   !A.

Consider access pair "C::b1.a.x" and "D::c.b2.a.x", the tags are [!C, 0, 4, !int]

   and [!D, 12, 4, !int].
   To check if they alias, we first check if !D encloses !C. If no, we return
   NoAlias. Since !D does enclose !C in our example, we start tracing from !D
   and follow the edge with the correct offset by
   adjusting the offset to be relative to the next node. We will reach C in the
   type DAG, now we have a common type node and can compare the (offset, size)
   directly. [!D, 12, 4] will be adjusted to [!C, 12, 4], since it does not
   overlap with [!C, 0, 4], we say they do not alias.

   alias_rule(x, y) is
   if the outermost type of 'x' does not enclose the outermost type of 'y',
     return NoAlias
   if the last type of 'x' encloses the outermost type of 'y'
     return Alias
   Start from the outermost type of 'x'
   Repeat until we reach the outermost type of 'y'
     follow the edge with the correct offset in the type DAG
     adjust the offset to be relative to the next node
   Compare the adjusted offset, size pair for 'x' against 'y'
   if they overlap, return Alias
   Otherwise, return NoAlias

------------------ Comparison
The complexity of alias queries: similar
The space for metadata: option A should require more space because the tag is a
  sequence of nodes and strings.
I will work on Option B.

------------------ Holes
We use BasicAA to perform sanity check. If BasicAA returns MayAlias, TBAA will
kick off with full speed.

Watch out for cases where the incoming arguments have different types but they
do alias. Scalar TBAA has the same issue but we don't see bugs coming up. If
we see bugs coming up with struct-path aware TBAA, we can do less aggressive
TBAA or apply similar tricks as GCC.

Scalar TBAA will say NoAlias for the placement new example under
test/Analysis/TypeBasedAliasAnalysis/placement-tbaa.ll.
If we optimize placement-tbaa.ll with -O1 -gvn, BaiscAA will start saying
MustAlias. This may cause problem.

Nice proposal.

A possibly simplified way of doing TBAA is:

  1. decompose an access into a 3-tuples: <BaseType, AccessType, BitOffset> – there is no need to remember the full access path.
  2. If the access types are incompatible, return noAlias (scalar TBAA rule)
  3. if there is no containment relationship between two BaseTypes, return noAlias
  4. Otherwise, align the the enclosed base type into the enclosing type (for each enclosed instance), and adjust the offset. If there is no overlap, return noAlias, otherwise return mayAlias.

For instance, “C::b1.a.y” AND "D::c.b2.a.x

C::b1.a.y → <C, int, 32>
D::c.b2.a.x → <D, int, 96>

Base Type C is enclosed in D, so the first access is normalized to <D, int, 32> which does not overlap with <D, int, 96>.

For more complicated cases, such as p->a[i].x.b[j].y, TBAA can choose to normalize the access via projection before creating the tuple → p->a[0].x.b[0].y. Make sure MustAlias is not wrongly returned for this case.

Cheers,

David