RFC: Representing unions in TBAA

I certainly agree. We should pick terminology that makes sense independent of C/C++. -Hal

Hello all,

I started implementing Daniel’s idea, and I noticed a couple things that should be addressed.

  1. The root node: Clang currently generates a root node named “Simple C++ TBAA” (or something similar). From the comments it looks like this is used to determine whether or not two TBAA DAGs from different translation units can be merged into a single on during LTO. I plan on keeping that node, but making it the universal “sink” node as before. This means that, in the new interpretation of the graph, it will be a child of everything instead of a parent of everything. However, since it is still the sink, it can still work in LTO as before. This is just a guess, because I have not looked at the code that uses this.

  2. char: Daniel suggested omitting the aliasing for because we know it aliases everything. The problem with this is that llvm does not alias the current “omnipotent char” to everything. As is pointed out in the comments:

// Define the root of the tree for user-accessible memory. C and C++
// give special powers to char and certain similar types. However,
// these special powers only cover user-accessible memory, and doesn’t
// include things like vtables.

Is there another option on how to handle vtables? If not, we may need to add a node for “omnipotent char”, which will have lots of children. This will make precomputing the transitive closure particularly important.

Later,
Steven Perron

Hello all,

I started implementing Daniel's idea, and I noticed a couple things that
should be addressed.

1) The root node: Clang currently generates a root node named "Simple C++
TBAA" (or something similar). From the comments it looks like this is used
to determine whether or not two TBAA DAGs from different translation units
can be merged into a single on during LTO. I plan on keeping that node,
but making it the universal "sink" node as before. This means that, in the
new interpretation of the graph, it will be a child of everything instead
of a parent of everything. However, since it is still the sink, it can
still work in LTO as before. This is just a guess, because I have not
looked at the code that uses this.

2) char: Daniel suggested omitting the aliasing for because we know it
aliases everything.

Not quite. I suggested not explicitly representing it as a node.

The problem with this is that llvm does not alias the current "omnipotent
char" to everything. As is pointed out in the comments:

  // Define the root of the tree for user-accessible memory. C and
C++
  // give special powers to char and certain similar types.
However,
  // these special powers only cover user-accessible memory, and
doesn't
  // include things like vtables.

Sure.

Is there another option on how to handle vtables? If not, we may need to
add a node for "omnipotent char", which will have lots of children. This
will make precomputing the transitive closure particularly important.

You don't need an actual node, just a single bit saying whether it is has
"aliases anything" as a child.
If so, it is equivalent to aliases anything, and you do not need to store
any more children.

See, e.g., what gcc does for "has_zero_child"

Can we turn off TBAA info for union member accesses in clang before this gets fixed?

-Krzysztof

Not familiar with clang enough to know.
Staring at it, it looks a bit annoying to do.
You’d basically just want to stop decorating loads/stores with tbaa info if it’s a union.

I'm asking if people are ok with it. :slight_smile:
We've had customer reports that can be tracked down to this issue, so this is something we'd really like to working (at least in terms of correctness).

I can come up with a patch.

-Krzysztof

Ah.
IMHO, yes we should disable it until it’s correct.

I would like to see it disabled as well until fixed, we have problem
compiling HHVM with clang due to this union+TBAA problem.

-Xin

I posted a patch at https://reviews.llvm.org/D31885.

-Krzysztof

Now that I’ve spent a bunch of time looking at this, I’d like to voice support for Steven’s original proposal. In the context of what we have, it makes sense, seems sound, and should fix the representational mapping problem we currently have. Something completely different (e.g. closer to what GCC uses) can work too, but this seems unnecessary (the proposed extension to the current semantics seem equivalently expressive). What you call “special casing of union-type nodes” does not actually seem all that special. The rule seems quite simple: if you some across duplicate offsets, then search all of them. This should not be difficult to implement or use, and seems no less efficient than any other way of encoding the concept of disjunction in the hierarchy. -Hal

So, https://bugs.llvm.org/show_bug.cgi?id=32056 is an example showing our
current TBAA tree for union generation is definitely irretrievably broken.
I'll be honest here. I'm pretty sure your proposal doesn't go far enough.
But truthfully, I would rather see us come closer to a representation we
know works, which is GCC's.
Let me try to simplify what you are suggesting, and what we have.
Our current representation is basically inverted from GCC, but we don't
allow things that would enable it to work.

Given
union {int a, short b};

GCC's will be:

union
  / \
short int

Everything is implicitly a subset of alias set 0 (which for C/C++ is char)
just to avoid representing it.

Our metadata has no child links, and only a single parent link.

You can't represent the above form because you can't make a single short
node a child of every union/struct it needs to be (lack of multiple
parents means you can't just frob them all to offset zero).

Your suggestion is to invert this as a struct

short int
   > /
union

We don't allow multiple parents, however.
Because of that, you suggest we follow all nodes for unions, special
casing union-type nodes somehow

Now that I've spent a bunch of time looking at this, I'd like to voice
support for Steven's original proposal. In the context of what we have, it
makes sense, seems sound, and should fix the representational mapping
problem we currently have.

Except you can't easily differentiate it from the current one, and if we
are going to have to upgrade/break compatibility, why not just do it once
right, a way we know works, instead of risk screwing it up again, and
playing with a representation we aren't actually sure we can make efficient
for this case?

Something completely different (e.g. closer to what GCC uses) can work
too, but this seems unnecessary (the proposed extension to the current
semantics seem equivalently expressive).

Yes, they are equivalently expressive.

What you call "special casing of union-type nodes" does not actually seem
all that special. The rule seems quite simple: if you some across
duplicate offsets, then search all of them.

Which means you have to know they exist, for starters, which means keeping
it in some sorted order and checking, or some other mechanism.

This should not be difficult to implement or use, and seems no less

efficient than any other way of encoding the concept of disjunction in the
hierarchy.

It is, in actuality, less efficient.

For starters, in the inverted representation, you don't have to explicitly
represent the things that are children of alias set zero (char in C++'s
case), which is quite common.

It's also trivial to generate transitive closures of parts of the graph in
the inverted representation, and be able to use them, if you need to.
It turns out to be much trickier the other way.

Those are just trivial examples.

Now, how often this matters, don't know.

But i'm suggesting what i believe to be the most practical route:
Given a situation where our representation has been broken for years, take
an approach that is battle tested and we know works and is efficient,
instead of trying to patch our representation and hope we've thought it
through well enough :slight_smile:

Does this mean the original proposal won't work?
Nope. It may in fact work. But i'd still do the thing i knew already
worked well. Because like i said, you are going to break compatibility
anyway, so ...

I don’t see why need to break backward compatibility. Do you have something specific in mind? Once we extend the TBAA implementation to treat repeated offsets as disjunction, then we’ll extend Clang to emit metadata for unions in this form. Old IR will work exactly as it does now. We already keep the offsets in order (this is a current requirement), so finding duplicate offsets can be done with no added complexity. So, I think this is an orthogonal concern. It is not clear to me that overhead here is significant, and even if it is, it seems like an even-more efficient choice would be for the frontend just not to emit TBAA at all for universally-aliasing accesses. We can also reduce the overhead by merging the universally-aliasing-type nodes with the root node of the type system. We don’t do that now, however, because we treat “omnipotent char” and “vtable pointer” as peer children to the root of the type hierarchy. That seems desirable (i.e. even char* does not alias with vtable pointers). If we wish to keep this, then I think we’re doing about as well as we can. I don’t see how this differs significantly. We can still cache the types in the transitive closure up to the root for any type and use that as an O(1) filter on queries. As you point out, TBAA hierarchies tend to be shallow, so I’m not sure how much this matters. If it does, it is just a matter of walking from the type to the root (covering all paths if there are disjunctions). I’m obviously all for learning from what others have done. Also, obviously, for everything we do it might turn out we haven’t thought it through well enough. However, the fundamental problem here seems easy to express: we currently lack a concept of disjunction. Disjunction is the underlying concept required to represent unions. As I said above, I don’t think we need to break backward compatibility. If we do, that may change my opinion. :wink: Thanks again, Hal

Except the Old IR is irretrievably broken. and in this method, you can't
tell whether you are dealing with correct TBAA or not.

Earlier, the discussion was basically "detect old IR, disable TBAA since
it's broken, make new IR work".

If we do that, we need a way to distinguish new vs old IR.

Ah, okay. I don’t think that’s desirable in general. There are frontends emitting perfectly self-consistent TBAA metadata, and there’s no reason they should change. Clang’s metadata is broken because the mapping between language rules and TBAA is broken. If we’d like, we can blacklist TBAA metadata coming from root nodes named “Simple C++ TBAA” and “Simple C/C++ TBAA” (or provide an option to do that because I’m not convinced it is worth trying to retroactively “fix” old IR by default given the associated performance penalties). After the fix, Clang can emit root nodes with different names (they’re arbitrary). Changing the root-node names will also give the conservatively-correct behavior when linking old IR with new IR. Thanks again, Hal

I still continue to believe trying to patch the existing mechanism this way
is not a great plan, and likely to end us in a place where we still have
either correctness or performance issues.
But i'm not doing the work, so if that's what you guys want to do, go for
it.

--Dan

I don’t agree, but this is because I fail to see how the two representations (the GCC-like one you’ve outlined and the current one with the proposed extension) aren’t completely isomorphic. Your proposal is: the evolutionary scheme can be described in similar terms: aggregate-type node → { name, array of { offset, member-type } } For a root node, the array is empty (i.e. it is only a name). For a scalar type, the array only has one entry: the base/parent type. For struct types, the array has all of the members. For union types, the array has all of the members with duplicate offsets (likely zero, although there may be cases, such as anonymous unions in structs, where the duplicate offsets are not zero). Paths in the current scheme, as in your proposal, are separate from the TBAA DAG itself, and are just: path node → {access-type node, aggregate-type node, offset} For scalar accesses, access type == aggregate type and offset == 0. This could be more representationally efficient, but it is done this way to provide a uniform way to handle both scalars and aggregates and provide a uniform way to differentiate these node from the old-style scalar-only TBAA (to be fair, however, likely more the latter than the former). In any case, aside from the fact that we explicitly include the access type, this is the same. In both cases the edges of the hierarchy run between structs and their members, so regardless of whether you draw the graph going up or down (i.e. call the edges parents or children) that aspect of the representation seems equivalent. The differences seem to be: 1. Our current scheme represents scalar types like it represents structs with a single member (i.e. a single base type at offset 0). In your proposed scheme we have separate scalar nodes. 2. The proposed evolved scheme we represent unions as aggregates with duplicated offsets (or, more generally, duplicate offsets within some aggregate). In your proposed scheme we represent unions as multiple children of scalar nodes. Transforming between the schemes seems only to require taking all consecutive members with duplicate offsets in the proposed evolved scheme and placing them into a union (scalar) node in your scheme. Given that the union representation seems equivalent by rewriting, the question is whether the representation of scalars is equivalent - that is, is representing a scalar as a single-member struct (or union, as the case may be) equivalent to the representation you’ve proposed? I don’t see why this wouldn’t be equivalent as well (i.e. when traversing the tree in your representation we’d not adjust the offset when visiting scalar nodes, and then we’d visit the child, which is what we currently do as well). Thanks again, Hal

I don't agree, but this is because I fail to see how the two
representations (the GCC-like one you've outlined and the current one with
the proposed extension) aren't completely isomorphic. Your proposal is:

Lots of data structures are completely isomorphic in the same way, and in
plenty of those cases, one is completely unusable, and the other functions
quite well.

Your below is basically "one is drawn in one direction, and the other is
drawn the other".
This is entirely true.

You want to argue that this means they will be exactly as efficient or easy
to understand as each other.
This is where you and I disagree.
I disagree because i rarely see parent-linked graph structures (outside of
union-find ) that are implemented or used well.
People often screw up the algorithms, they tend to be harder to reason
about, etc.
This is why, for example, despite having both parent and child links,
roughly all of our dominator tree graph algorithms walk children instead of
parents, even though you could equivalently do them using only the parent
links (even the depth first algorithms).

Past that I'm not sure what you want me to say here.
If y'all come up with an implementation that works as well as the ones i'm
familiar with, i'll be as happy as anyone else?

Fair point. I suppose I was saying something stronger… I don’t think that we’re on the same page here. AFAIKT, in both schemes, the links go in the same direction: from struct nodes to member-type nodes. We call these parent links in our scheme and you call them child links in your scheme, but the sense in the DAG is the same. Maybe an example will help, you said: Let’s expand this, and also explicitly represent char, does the graph look like in the GCC-like scheme? union1 {int a; short b;}; union2 {int a; float f;}; union1 union2 / \ / \ short int int float \ | | / \ | | / ( char ) Thanks again, Hal

I don't agree, but this is because I fail to see how the two
representations (the GCC-like one you've outlined and the current one with
the proposed extension) aren't completely isomorphic. Your proposal is:

Lots of data structures are completely isomorphic in the same way, and in
plenty of those cases, one is completely unusable, and the other functions
quite well.

Fair point. I suppose I was saying something stronger...

Your below is basically "one is drawn in one direction, and the other is
drawn the other".
This is entirely true.

You want to argue that this means they will be exactly as efficient or
easy to understand as each other.
This is where you and I disagree.
I disagree because i rarely see parent-linked graph structures (outside of
union-find ) that are implemented or used well

I don't think that we're on the same page here. AFAIKT, in both schemes,
the links go in the same direction:

from struct nodes to member-type nodes. We call these parent links in our
scheme and you call them child links in your scheme, but the sense in the
DAG is the same.

Except not because of what you've chosen explicitly as the root :slight_smile:

we have

root (char)
/ \
int short

     /

struct A

gcc has

<forest>
struct A
/ \
int short

(note the lack of char here too)

So you would be right if it wasn't rooted.
But we've explicitly chosen to make it rooted in what GCC would consider a
leaf:)

It is true you can rewrite one into the other, but ...

Hello Steven, Hal and Daniel,

Thanks a lot for your discussion; it really helps with summarizing current TBAA issues and ways to resolve them.

Do you guys know anything of the current status of the proposed change? Steven, will you please let us know if the work is in progress and if there is any ETA you can share?

I'm asking because we are working on an alternative approach that not only supports accesses to union members, bit fields, fields of aggregate and union types, but also allows to represent accesses to aggregates and unions the same way we do it for scalars so that !tbaa.struct is replaced with plain !tbaa, meaning TBAA information can be propagated uniformly regardless of types of accessed objects. As a consequence, it supports identification of user types defined in different translation units, even if some of them are written in C and others are in C++. It also defines a set of language-neutral formal rules that LLVM codegen follows to determine whether a given pair of accesses are allowed to overlap by rules of the input language. As of today, we know this implementation covers all currently supported TBAA functionality reflected in the test suites and to test the new functionality we have SROA improved to preserve TBAA information.

The point is, our approach does not try to describe accesses as (type, offset) pairs and instead represents access sequences explicitly beginning from the base type followed by field descriptors, which is what makes the approach so flexible. TypeBasedAAResult::Aliases() and MDNode::getMostGenericTBAA() are a bit more complex than they used to be (they actually use the same internal function), but rely exclusively on linear scans of access sequences unless we have a situation when have to check if one of the accessed types is the type of a member of the other one, in which case it seems we just have to traverse through fields recursively no matter what.

So, I wonder if this or similar approaches have ever been considered before and what are the cons, if there are any sounded. Do you think it is worth to consider it now?

Thanks again,