Request for comments: TBAA on calls

Hi all,

TBAA on loads and stores is super useful for conveying high-level type knowledge to LLVM transformations. But often translating a high-level language into LLVM IR will result in runtime calls that are known at the high level to only affect (either by reading or by writing) some restricted subset of the heap. These aren't read-only calls; that is often too strong of a guarantee. A great example would be a GC barrier or GC allocation that may arise if LLVM is used as a backend for a compiler for a higher-level language. There are probably other examples that arise in high-level languages, but let's use allocation as an example for now. I can envision two ways of representing allocation in LLVM IR:

Method #1:

      %object = call @runtime_allocate(... /* pass size and/or other data */)

Method #2:

  SomeBlock:
      ... ; code preceding the allocation
      %objectFast = ... ; IR for the inlined allocation fast path, which produces %objectFast but may yield null if the fast path fails and we need to take the slow path
      %didFail = icmp eq %objectFast, null
      br i1 %didFail, label %AllocationSlowPath, label %Continue
  AllocationSlowPath:
      %objectSlow = call @runtime_allocate_slow_path(... /* pass size and/or other data */)
      br label %Continue
  Continue:
      %object = phi [ %objectFast, %SomeBlock ], [ %objectSlow, %AllocationSlowPath ]

In both of these cases, the call instruction will appear to clobber the world, leading to more pessimistic optimization. Moreover, it's not always correct to mark the allocation as being readonly; it may read or write observable state. But if you don't like the allocation example then you can imagine that a GC store barrier would have an almost identical pattern to #2 and it would definitely not be readonly. Anyway, such a pattern would lead to fewer loads being eliminated or hoisted, fewer stores being eliminated or sunk, etc - all due to those pesky calls. But the high-level code generator that produces this IR will know for sure that @runtime_allocate or @runtime_allocate_slow_path can only affect a very narrow subset of the heap: namely, it will only read and write the GC's data structures. So most of the loads and stores surrounding the allocation, which arose from source-level loads and stores, would definitely not interfere with the call, and this would be easy to convey using TBAA.

Hence if we could decorate the calls to these runtime functions as only reading or writing certain TBAA types, then LLVM would be able to perform more CSE/GVN/LICM and probably other goodness as well.

Perhaps there are alternate ways of representing allocation or write barriers - but in my experience when LLVM is integrated into a larger system the above IR appears to be optimal for other reasons; and anyway, this issue arises in a bunch of other situations. I'm currently working on an LLVM-based compiler backend for a JavaScript VM, and in this project the following cases have already arisen:

- GC allocation slow paths
- GC write barrier slow paths
- Object reshaping slow paths (JS VMs make objects fast by sometimes changing their shape, which isn't observable to the user and only affects certain well-defined fields)
- Array access slow paths (if JS type inference tells me that an array really is an array but I cannot prove that the access is in-bounds, then I still have some work to do for the out-of-bounds case, but I can prove that it will only read/write some things rather than all of the things)

There are likely to be more such cases. We out-of-line a lot of gnarly code, and that code is often compiled by a compiler other than LLVM, so we won't necessarily be able to feed the IR of those runtime functions to LLVM. Sometimes that code is hand-written assembly, so having LLVM analyze it is pretty much a non-starter. Much of that out-of-lined code has a narrow set of effects. We can certainly summarize what those effects are using TBAA, so that seems like the most pragmatic way of doing it.

So here's a strawman for how to represent TBAA on calls.

Ideally a call instruction would list what it reads and what it writes. In most cases this will be overkill, so if we see:

  call @foo(...) !tbaa !42

Then this ought to be interpreted as both reading and writing. We could then also have new kinds of meta-data references:

  !tbaa.read
  !tbaa.write

So that saying:

  call @foo(...) !tbaa.read !42 !tbaa.write !84

Indicates that this call to @foo reads whatever !42 is and writes whatever !84 is.

Calls that are not decorated with !tbaa, !tbaa.read, or !tbaa.write must be interpreted as if they may read or write anything, subject of course to function attributes (like readonly). If you ignore the TBAA metadata on a call then you should likewise assume that the call may read or write anything, again subject to function attributes. If a call has both a readonly attribute and some TBAA metadata, then it should be safe to use either the attribute or the metadata for the purpose of dependency analysis. If you use both sources of information then the set of possible memory locations that are written is the intersection of the set allowed by the attributes and the set allowed by the TBAA metadata; for example, if @foo (in the above example) were marked readonly then the !tbaa.write !84 could be ignored.
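
To make the strawman concrete, here is a hypothetical fragment showing how a frontend might use it. The tree shape follows the usual scalar TBAA convention of name-plus-parent metadata nodes; all of the names and numbers are illustrative, not a definitive design:

  ; a tiny TBAA tree: a root with two disjoint children
  !0 = !{!"root"}
  !1 = !{!"gc", !0}     ; the GC's own data structures
  !2 = !{!"user", !0}   ; heap locations arising from source-level accesses

  %a = load %p, !tbaa !2
  call @runtime_allocate_slow_path(...) !tbaa.read !1 !tbaa.write !1
  %b = load %p, !tbaa !2

  ; "gc" and "user" are disjoint siblings, so the call cannot clobber %p's
  ; location and GVN could forward %a to %b across the call; if the call
  ; were also marked readonly, the !tbaa.write !1 would simply be ignored.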

I'm not entirely convinced that this representation really is totally general; for example you can imagine wanting to list the set of things read or written, and I gather that this would be hard using my suggested syntax. But it's just a strawman - the bigger question is: can anyone see any show-stopper issues that would prevent TBAA on calls from ever being a thing?

Also this would be a sizable change - a bunch of dependency analysis stuff would have to be taught to look at calls in addition to loads and stores. But I think it could be broadly useful, maybe even beyond the high-level language compilers that I'm familiar with.

Thoughts?

-Filip

The general idea seems right to me. I can’t think of a better way to express the side effects of type checks and special helpers provided by the runtime. I don’t have much opinion on the meta-data syntax other than that it should have room for extensions.

I was expecting a little more discussion on this one. But FWIW I think it’s a good idea.

-Andy

Out of curiosity, why is this tied to TBAA types, rather than variables or
something else?
IE what analysis are you performing that gives you type answers instead of
answers in terms of local/global variables?
The information you are talking about here is traditional mod/ref info that
is completely independent of TBAA.

The more specific the info you can get from the metadata, the better off
you are eliminating loads/stores

Otherwise, yes, it's certainly incredibly helpful to the other
optimizations.

However, unless there is a good reason, I'd suggest you just make it
generic call read/write metadata that talks about values (be they incoming
arguments, alloca'd memory, or whatever else).

It's actually a lot easier to reason about individual heap values than
entire types, unless, again, you only have an analysis that is providing
answers about types (or your plan was simply to use the existing TBAA info
and do a bottom-up closure over the entire callgraph, which would provide
some value, but lose a lot of info due to external function calls).

TBAA’s utility for non-C languages (like JavaScript, in my case) isn’t for reasoning about user-facing types, but for creating what you could alternatively think of as “abstract heaps”. TBAA is perfect for this. For example I might know that a certain function clobbers a field called ‘foo’. In my frontend I create an “abstract heap” for each field name, and this gets lowered to LLVM TBAA. I already use this for telling LLVM about dependencies between loads and stores. If I emit a store as a result of lowering a JS “o.foo = …” operation then I will ascribe a TBAA node for the “foo” abstract heap. This is just one example of the many different abstract heaps (i.e. “TBAA nodes”) that my compiler uses.
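
As a sketch of what that lowering might look like (all of the names here are invented for illustration), each field name gets its own node under a common root:

  !0 = !{!"js-abstract-heap"}   ; root: all abstract heaps
  !1 = !{!"field_foo", !0}      ; the abstract heap for property "foo"
  !2 = !{!"field_bar", !0}      ; the abstract heap for property "bar"

  ; lowering "o.foo = ..." yields a store tagged with the "foo" heap:
  store %v, %o_foo_slot, !tbaa !1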

Note that most of the loads and stores that I’m concerned with do not involve alloca’s (i.e. locals) or “global variables” in any kind of way that LLVM could reason about. A mechanism that only works for locals and globals would be of zero use to me since all of my alloca’s are trivially killed by mem2reg and I don’t have any globals.

You lost me. I’m writing a compiler that generates LLVM IR. I have high-level knowledge that allows me to prove the set of abstract heap locations that may be read or written by any call, load, or store.

What do you suggest as a mechanism by which I can communicate that information to LLVM? Are you saying that such a mechanism already exists and that I should use that instead?

As far as I can tell, TBAA is the cleanest such mechanism that I have found thus far and its only shortcoming is that I cannot use it to annotate calls.

You lost me. How does this help a high-level compiler convey to LLVM that a particular call and a particular load or store don’t have any dependency? My proposal achieves this, albeit at a coarse-grained level: you have to express your knowledge of dependencies using a hierarchy of abstract sets of heap locations (i.e. exactly what TBAA gives you).

The goal is to allow a front-end that already has knowledge about some high-level types to convey that knowledge to LLVM when generating LLVM IR. In my front-end, I actually don’t really do much analysis to produce the TBAA information - it mostly just falls out naturally from my own IR.

I’m not sure what you mean about “individual heap values”. If you mean global variables, then that’s of no use to me. I don’t have any global variables.

It’s interesting that using TBAA for this completely subsumes any approach based on globals: you could ascribe a distinct TBAA node to each global and get the same effect. But TBAA is more powerful because you can annotate something that you’ve already proven about a memory location, and this information can be orthogonal to the IR used to compute that memory location. This makes TBAA ideal for conveying dependency information from a high-level compiler to LLVM at the time of LLVM IR generation.

-Filip

[Filip's original RFC quoted in full; snip]

Hence if we could decorate the calls to these runtime functions as only reading or writing certain TBAA types,

Out of curiosity, why is this tied to TBAA types, rather than variables or something else?
IE what analysis are you performing that gives you type answers instead of answers in terms of local/global variables?

TBAA’s utility for non-C languages (like JavaScript, in my case) isn’t for reasoning about user-facing types, but for creating what you could alternatively think of as “abstract heaps”. TBAA is perfect for this.

No. A nested hierarchy that describes conflicts among possible
abstract heap locations is perfect for this. TBAA is not quite the
same. See below.

For example I might know that a certain function clobbers a field called ‘foo’. In my frontend I create an "abstract heap" for each field name, and this gets lowered to LLVM TBAA.

So let's start simple.

You are representing an abstract set of heap locations. Are they
actually in any way related to the *existing type tree* that is being
created for TBAA? Or are you adding new values to it that are not
really part of a type hierarchy at all, but represent some other
relation among abstract heaps?

From your description (you create new things that represent abstract heap locations and lower that into a nested tree), it sounds like the answer is "you are adding new locations that represent a different tree than the existing TBAA".

If that is not true, what is the actual relation to the existing information?
TBAA in the LLVM compiler, at least as currently produced, while
theoretically compatible with this scheme, is not quite the same.
The langref says "metadata is added to the IR to describe a type
system of a higher level language."

If you are adding a new tree just to represent new abstract heap
location info, you are essentially reusing it simply because it
represents a nested set structure. I don't see a lot of value in
that sort of conflation, hence the question.

If not, can you describe in more detail how you are generating this info?
Because that is what I get from your description.

The information you are talking about here is traditional mod/ref info that is completely independent of TBAA.

You lost me. I’m writing a compiler that generates LLVM IR. I have high-level knowledge that allows me to prove the set of abstract heap locations that may be read or written by any call, load, or store.

What do you suggest as a mechanism by which I can communicate that information to LLVM?

I'm suggesting you create an attribute that deals with abstract heap
locations that you are describing, instead of overloading and tying it
into an existing set of information that has a different purpose
(TBAA, which at least in the current world, represents a nested type
tree that describes access rules of a language).

Are you saying that such a mechanism already exists and that I should use that instead?

As far as I can tell, TBAA is the cleanest such mechanism that I have found thus far and its only shortcoming is that I cannot use it to annotate calls.

The fact that some other mechanism is kinda-sorta-like-what-you-want
does not necessarily mean it should be tied into it :)
Hence, I am trying to understand the actual relation between the TBAA
tree as it exists now, and what you are suggesting creating.
For example, I could perform pointer analysis in clang and generate a
TBAA tree from that. It would work. It would be easy. It doesn't
mean it is necessarily a great approach.

The more specific the info you can get from the metadata, the better off you are eliminating loads/stores

Otherwise, yes, it's certainly incredibly helpful to the other optimizations.

However, unless there is a good reason, I'd suggest you just make it generic call read/write metadata that talks about values (be they incoming arguments, alloca'd memory, or whatever else).

You lost me. How does this help a high-level compiler convey to LLVM that a particular call and a particular load or store don’t have any dependency?

You add the modref info to the calls. You add attributes to the
loads/stores. It becomes another disambiguation method to tell
whether the load/store and the call have a dependency.

My proposal achieves this, albeit at a coarse-grained level: you have to express your knowledge of dependencies using a hierarchy of abstract sets of heap locations (i.e. exactly what TBAA gives you).

TBAA does not actually give you that. It gives you a tree that
describes a type hierarchy according to the memory access rules of
some language. This is transformable into sets of possible heap
locations, but it is not actually the same.
I get why you may want to reuse it.
The structure is close to what you will want. The attributes already
exist on load/stores. However, much like the tbaa.struct info, unless
this actually represents the same tree that TBAA does, it should
probably not be part of the same tree.

I think we’re talking cross-purposes at this point. Let me try to step back and concisely restate what I’m trying to do:

  • I am writing a frontend for a high-level language and I emit LLVM IR.
  • I control the TBAA tree. I don’t use it to express the high-level language’s types, since that would make absolutely no sense. JavaScript doesn’t have types. But by the time my compiler is generating LLVM IR, it will have come up with meaningful constraints on aliasing and those constraints are naturally expressed using TBAA.
  • Some calls that I emit in LLVM IR are neither readnone/readonly nor do they clobber the world - instead they clobber a known set of memory locations, and that set of memory locations is already expressible in the TBAA representation that I control.

Which part of the above do you disagree with?

[Filip's original RFC quoted in full; snip]

Hence if we could decorate the calls to these runtime functions as only reading or writing certain TBAA types,

Out of curiosity, why is this tied to TBAA types, rather than variables or something else?
IE what analysis are you performing that gives you type answers instead of answers in terms of local/global variables?

TBAA’s utility for non-C languages (like JavaScript, in my case) isn’t for reasoning about user-facing types, but for creating what you could alternatively think of as “abstract heaps”. TBAA is perfect for this.

No. A nested hierarchy that describes conflicts among possible
abstract heap locations is perfect for this. TBAA is not quite the
same. See below.

For example I might know that a certain function clobbers a field called ‘foo’. In my frontend I create an “abstract heap” for each field name, and this gets lowered to LLVM TBAA.

So let’s start simple.

You are representing an abstract set of heap locations. Are they
actually in any way related to the existing type tree that is being
created for TBAA?

Yes. Because I am also creating the TBAA type tree. I control it entirely.

Or are you adding new values to it that are not
really part of a type hierarchy at all, but represent some other
relation among abstract heaps?
From your description (you create new things that represent abstract
heap locations and lower that into a nested tree), it sounds like the
answer is "you are adding new locations that represent a different
tree than the existing TBAA”.

What do you mean by “the existing TBAA”? To my understanding, LLVM itself never creates TBAA nodes; they arise entirely out of the frontend that emits LLVM IR. I control the frontend in this case.

If that is not true, what is the actual relation to the existing information?
TBAA in the LLVM compiler, at least as currently produced, while
theoretically compatible with this scheme, is not quite the same.
The langref says "metadata is added to the IR to describe a type
system of a higher level language.”

My understanding is that LLVM doesn’t produce TBAA. It consumes TBAA.

Hence it’s more meaningful to reason about TBAA in terms of its semantics rather than hypothesizing about how and why someone would produce it. We agree that TBAA is a nested set structure that describes heap locations. I’m observing that it would be useful to express bounds on what heap locations calls read or write. Therefore, TBAA can be used to convey those sets.

If you are adding a new tree just to represent new abstract heap
location info, you are essentially reusing it simply because it
represents a nested set structure. I don’t see a lot of value in
that sort of conflation, hence the question.

If we agree that it would be good to have a system for bounding what a call reads or writes using nested sets, then we have two choices:

  • Reuse TBAA, which can already be used to express nested sets of memory locations.

  • Invent a whole new type system that is structurally identical to TBAA but that goes by a different name.

Are you seriously proposing the latter?

If not, can you describe in more detail how you are generating this info?
Because that is what I get from your description.

The information you are talking about here is traditional mod/ref info that is completely independent of TBAA.

You lost me. I’m writing a compiler that generates LLVM IR. I have high-level knowledge that allows me to prove the set of abstract heap locations that may be read or written by any call, load, or store.

What do you suggest as a mechanism by which I can communicate that information to LLVM?

I’m suggesting you create an attribute that deals with abstract heap
locations that you are describing, instead of overloading and tying it
into an existing set of information that has a different purpose
(TBAA, which at least in the current world, represents a nested type
tree that describes access rules of a language).

So, you’re suggesting inventing a new representation for type hierarchies that would have the following properties:

  • The same rules as TBAA for load/store.

  • The same rules as what I propose for call.

  • Identical to TBAA in all other respects, except that it would have a different name.

Am I understanding your proposal correctly?

Are you saying that such a mechanism already exists and that I should use that instead?

As far as I can tell, TBAA is the cleanest such mechanism that I have found thus far and its only shortcoming is that I cannot use it to annotate calls.

The fact that some other mechanism is kinda-sorta-like-what-you-want
does not necessarily mean it should be tied into it :)

Hence, I am trying to understand the actual relation between the TBAA
tree as it exists now

What TBAA tree? Are you referring to the TBAA format as described in http://llvm.org/docs/LangRef.html#tbaa-metadata, or some specific TBAA tree that exists for some specific frontend for some specific language?

, and what you are suggesting creating.
For example, I could perform pointer analysis in clang and generate a
TBAA tree from that. It would work. It would be easy. It doesn’t
mean it is necessarily a great approach.

Would your pointer analysis generate disjoint sets, or nested sets? For example, if you did a Steensgaard-style pointer analysis then TBAA would be the most ideal way of representing the results of such an analysis.

One way to view my front end’s aliasing knowledge is that I’m doing something like a unification-based analysis. I actually end up with a three-level TBAA hierarchy (the world, with subtypes for my disjoint sets of heap locations; some of those sets then have further subtypes).
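
Schematically (with invented names), that three-level hierarchy lowers to something like:

  !0 = !{!"world"}          ; level 1: everything
  !1 = !{!"heapA", !0}      ; level 2: disjoint sets of heap locations
  !2 = !{!"heapB", !0}
  !3 = !{!"heapA_x", !1}    ; level 3: finer subdivisions of some of the sets
  !4 = !{!"heapA_y", !1}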

The more specific the info you can get from the metadata, the better off you are eliminating loads/stores

Otherwise, yes, it’s certainly incredibly helpful to the other optimizations.

However, unless there is a good reason, I’d suggest you just make it generic call read/write metadata that talks about values (be they incoming arguments, alloca’d memory, or whatever else).

You lost me. How does this help a high-level compiler convey to LLVM that a particular call and a particular load or store don’t have any dependency?

You add the modref info to the calls. You add attributes to the
loads/stores. It becomes another disambiguation method to tell
whether the load/store and the call have a dependency.

Are you suggesting that I override AliasAnalysis?

I’m trying to make it possible for a frontend to convey this information using LLVM IR, rather than having to override internal LLVM classes in order to provide this information. Hence there is a question of what format to use. I’m arguing that TBAA is that format.

My proposal achieves this, albeit at a coarse-grained level: you have to express your knowledge of dependencies using a hierarchy of abstract sets of heap locations (i.e. exactly what TBAA gives you).

TBAA does not actually give you that. It gives you a tree that
describes a type hierarchy according to the memory access rules of
some language.

I disagree.

It’s true that this is one use-case for TBAA, but that’s not how I use it. I don’t believe that I’m under any obligation to use it that way. TBAA is just a format that LLVM consumes, that can be used by a producer of LLVM IR to convey aliasing information.

I think we’re talking cross-purposes at this point. Let me try to step
back and concisely restate what I’m trying to do:

- I am writing a frontend for a high-level language and I emit LLVM IR.
- I control the TBAA tree. I don’t use it to express the high-level
language’s types, since that would make absolutely no sense. JavaScript
doesn’t have types. But by the time my compiler is generating LLVM IR, it
will have come up with meaningful constraints on aliasing and those
constraints are naturally expressed using TBAA.
- Some calls that I emit in LLVM IR are neither readnone/readonly nor do
they clobber the world - instead they clobber a known set of memory
locations, and that set of memory locations is already expressible in the
TBAA representation that I control.

Which part of the above do you disagree with?

Nothing.
I disagree that just because you control the TBAA tree generation, you should shoehorn into it something that is neither what the langref claims it should be, nor what other producers produce, just because the datastructure may kinda-sorta work for your purpose.

[Filip's original RFC quoted in full; snip]

Hence if we could decorate the calls to these runtime functions as only
reading or writing certain TBAA types,

Out of curiosity, why is this tied to TBAA types, rather than variables or
something else?
IE what analysis are you performing that gives you type answers instead of
answers in terms of local/global variables?

TBAA’s utility for non-C languages (like JavaScript, in my case) isn’t for reasoning about user-facing types, but for creating what you could alternatively think of as “abstract heaps”. TBAA is perfect for this.

No. A nested hierarchy that describes conflicts among possible
abstract heap locations is perfect for this. TBAA is not quite the
same. See below.

For example I might know that a certain function clobbers a field called
‘foo’. In my frontend I create an "abstract heap" for each field name, and
this gets lowered to LLVM TBAA.

So let's start simple.

You are representing an abstract set of heap locations. Are they
actually in any way related to the *existing type tree* that is being
created for TBAA?

Yes. Because I am also creating the TBAA type tree. I control it
entirely.

So then the answer is really "no", in the sense that what you want to
produce is unlike any of the current producers or what is actually
described :)

Or are you adding new values to it that are not
really part of a type hierarchy at all, but represent some other
relation among abstract heaps?
From your description (you create new things that represent abstract
heap locations and lower that into a nested tree), it sounds like the
answer is "you are adding new locations that represent a different
tree than the existing TBAA”.

What do you mean by “the existing TBAA”? To my understanding, LLVM itself
never creates TBAA nodes; they arise entirely out of the frontend that
emits LLVM IR. I control the frontend in this case.

If that is not true, what is the actual relation to the existing
information?
TBAA in the LLVM compiler, at least as currently produced, while
theoretically compatible with this scheme, is not quite the same.
The langref says "metadata is added to the IR to describe a type
system of a higher level language.”

My understanding is that LLVM doesn’t produce TBAA. It consumes TBAA.

True.

Hence it’s more meaningful to reason about TBAA in terms of its semantics
rather than hypothesizing about how and why someone would produce it.

That would be great, but it's not what the langref says, nor does it match
up with the name of the thing you are creating, nor does it necessarily
have optimal semantics, nor would it be great for future producers or the
ability to do other analyses in those producers.

We agree that TBAA is a nested set structure that describes heap
locations.

Though note that it is bidirectional, which is actually probably non-optimal for describing heap locations that are unrelated to types.

I’m observing that it would be useful to express bounds on what heap locations calls read or write. Therefore, TBAA can be used to convey
those sets.

I do not disagree that it can, I disagree that it *should*.

If you are adding a new tree just to represent new abstract heap
location info, you are essentially reusing it simply because it
represents a nested set structure. I don't see a lot of value in
that sort of conflation, hence the question.

If we agree that it would be good to have a system for bounding what a
call reads or writes using nested sets, then we have two choices:

- Reuse TBAA, which can already be used to express nested sets of memory
locations.

Depending upon the properties of those sets, yes, I agree.

- Invent a whole new type system that is structurally identical to TBAA
but that goes by a different name.

Why would it necessarily be structurally identical?
TBAA is a good approximation for your particular analysis, but its nested set representation will not necessarily be optimal in the end for what else people produce for call-based info.
By shoving one into the other, you force this representation on everyone, essentially until someone else comes along and untangles them again.
You also force all producers that want to represent the output of some kind of analysis around calls, *and* actual type based alias info, to combine both into a single tree.

Are you seriously proposing the latter?

Yes, I am seriously proposing that you do not shoehorn something completely different than what the langref describes, and the code expects, into an existing structure.

If you want to first change the langref, all the tests, the name, etc., I would still object, but that would certainly be less ugly to do in addition to your proposal.

If not, can you describe in more detail how you are generating this info?
Because that is what I get from your description.

The information you are talking about here is traditional mod/ref info
that is completely independent of TBAA.

You lost me. I’m writing a compiler that generates LLVM IR. I have
high-level knowledge that allows me to prove the set of abstract heap
locations that may be read or written by any call, load, or store.

What do you suggest as a mechanism by which I can communicate that
information to LLVM?

I'm suggesting you create an attribute that deals with abstract heap
locations that you are describing, instead of overloading and tying it
into an existing set of information that has a different purpose
(TBAA, which at least in the current world, represents a nested type
tree that describes access rules of a language).

So, you’re suggesting inventing a new representation for type hierarchies,

I assume you mean abstract heap locations.

that would have the following properties:

- The same rules as TBAA for load/store.

For starters, yes.
Note that the bidirectional (both up and down the tree) check that TBAA does is not necessarily needed, depending on how you formulate it.

- The same rules as what I propose for call.

- Identical to TBAA in all other respects, except that it would have a
different name.

Am I understanding your proposal correctly?

You are understanding where I suggest you start, yes. I expect it will
slowly grow to be quite different than the existing TBAA tree.

Are you saying that such a mechanism already exists and that I should use
that instead?

As far as I can tell, TBAA is the cleanest such mechanism that I have
found thus far and its only shortcoming is that I cannot use it to annotate
calls.

The fact that some other mechanism is kinda-sorta-like-what-you-want
does not necessarily mean it should be tied into it :)

Hence, I am trying to understand the actual relation between the TBAA
tree as it exists now

What TBAA tree?

The one produced by the only current producers.
The one whose purpose is described by the langref, and even by the comments in various places.

Are you referring to the TBAA format as described in
http://llvm.org/docs/LangRef.html#tbaa-metadata, or some specific TBAA
tree that exists for some specific frontend for some specific language?

Let's say "both".
Since the TBAA tree produced by other producers all fall into the category
of "actual high level type hierarchy info", as the langref describes.

, and what you are suggesting creating.
For example, I could perform pointer analysis in clang and generate a
TBAA tree from that. It would work. It would be easy. It doesn't
mean it is necessarily a great approach.

Would your pointer analysis generate disjoint sets, or nested sets?

This is entirely possible; for example, all of the Andersen's-style analyses will end up directional, and all of the context-sensitive ones will end up another way.
Is your suggestion that they then drop all this info because it can't be shoehorned into the existing TBAA format?
And that someone else later disentangle this from the actual *type based info* being produced by other frontends?
What if someone wants to produce both kinds of info?
Right now your description is essentially the output of an analysis that can be run just as well in clang on C/C++.

For producers that *also* want to convey type based alias info and abstract
heap location info for calls, it now requires combining multiple analysis
outputs into a single tree called, confusingly, a "TBAA tree".

Can you explain what, other than saving you the same amount of work that
was recently done for struct-tbaa, your proposal buys?

For example, if you did a Steensgaard-style pointer analysis then TBAA
*would* be the most ideal way of representing the results of such an
analysis.

Actually, no.
For a consumer, if it's static, the ideal representation for a *TBAA tree* would be a simple list of nodes with DFS in/out numbers attached to those nodes, produced from walking the tree as it exists in your frontend. There is no reason to explicitly represent the actual child/parent relationships among the nodes.

You can then check ancestry by doing what we do elsewhere (for example, dominates(a, b) checking) and use the DFS numbers: a is an ancestor of b iff in(a) <= in(b) and out(b) <= out(a). There is no need to actually represent the tree, or walk it, as LLVM does now. This method will give you constant-time queries and less space usage than what you have now.

If the TBAA tree does not really represent anything in particular, giving it explicit structure when its structure means nothing is entirely pointless.

In the case of Steensgaard in particular, if you are just asking aliases(x, y) questions, you actually don't need the points-to relationships as represented by ECRs, only something much simpler: whether two things are in the same disjoint set. You don't need the union-find structure to do this once you've finished the algorithm; you could just number the equivalence classes and go from there.

Of course, for more complex queries about the relationships between the ECRs themselves (IE the points-to graph), you'd need parent-child relationships.

But this is all beside the point, except to point out that it's not actually optimal for what you are suggesting representing :)

One way to view my front end’s aliasing knowledge is that I’m doing
something like a unification-based analysis. I actually end up with a
three-level TBAA hierarchy (the world, with subtypes for my disjoint sets
of heap locations; some of those sets then have further subtypes).

Yes.

The more specific the info you can get from the metadata, the better off
you are eliminating loads/stores

Otherwise, yes, it's certainly incredibly helpful to the other
optimizations.

However, unless there is a good reason, I'd suggest you just make it
generic call read/write metadata that talks about values (be they incoming
arguments, alloca'd memory, or whatever else).

You lost me. How does this help a high-level compiler convey to LLVM that
a particular call and a particular load or store don’t have any dependency?

You add the modref info to the calls. You add attributes to the
loads/stores. It becomes another disambiguation method to tell
whether the load/store and the call have a dependency.

Are you suggesting that I override AliasAnalysis?

I’m trying to make it possible for a frontend to convey this information
using LLVM IR, rather than having to override internal LLVM classes in
order to provide this information.

Can you explain what the benefit of doing so is, past saving some amount of work (that was even done recently in a similar context, without much fanfare, in a not-horrible amount of time)?
I think I've described the downsides adequately:

- It's not an ideal representation, depending on the analysis.
- For producers that *also* want to convey type based alias info in addition to this type of info, it requires combining multiple analysis outputs into a single tree. This gets messy, quickly.
- It overloads something with a pretty well-defined meaning right now.
- It's not actually optimal as a representation for the output of most kinds of analysis.
- It confuses a certain type of mod/ref information with actual type-based aliasing info.

Hence there is a question of what format to use. I’m arguing that TBAA is
that format.

My proposal achieves this, albeit at a coarse-grained level: you have to express your knowledge of dependencies using a hierarchy of abstract sets of heap locations (i.e. exactly what TBAA gives you).

TBAA does not actually give you that. It gives you a tree that
describes a type hierarchy according to the memory access rules of
some language.

I disagree.

It’s true that this is one use-case for TBAA, but that’s not how I use it.
I don’t believe that I’m under any obligation to use it that way.

Then feel free to update langref, the tests, all the comments, and the
name.
I would strongly suggest you not choose this path. It sucks for the future
for all the reasons I mentioned.

TBAA is just a format that LLVM consumes, that can be used by a producer
of LLVM IR to convey aliasing information.

Again, I will repeat: Just because something can be done, doesn't mean it
should.

TBAA MDNodes should describe the type hierarchy (with struct-path aware TBAA, it is a type DAG).
It sounds okay to me to add !tbaa.call to function calls to describe which fields of the type system the call is touching.
One example can be !tbaa.call {read a list of tag nodes, write a list of tag nodes}.
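
For illustration, one hypothetical encoding of that payload (the syntax is invented, and the tag nodes !10, !11, !12 stand in for existing struct-path tags):

  call @foo(...) !tbaa.call !0

  !0 = !{!1, !2}     ; a pair of lists: reads, then writes
  !1 = !{!10, !11}   ; tag nodes for the fields the call may read
  !2 = !{!12}        ; tag nodes for the fields the call may write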

Thanks,
Manman

TBAA MDNodes should describe the type hierarchy (with struct-path aware
TBAA, it is a type DAG).
It sounds okay to me to add !tbaa.call to function calls to describe which
fields of the type system the call is touching.

I agree wholeheartedly :). However, that's not what is being suggested.
He's suggesting using the tbaa tree to represent the output of an analysis,
not a type hierarchy, because it's convenient and can represent the output
okay for his language.
My suggestion was that if he wants to do that, he add a different attribute that works much the same way.

One example can be !tbaa.call {read a list of tag nodes, write a list of tag nodes}.

TBAA MDNodes should describe the type hierarchy (with struct-path aware TBAA, it is a type DAG).
It sounds okay to me to add !tbaa.call to function calls to describe which fields of the type system the call is touching.

I agree wholeheartedly :).

OK, great.

However, that’s not what is being suggested.
He’s suggesting using the tbaa tree to represent the output of an analysis, not a type hierarchy, because it’s convenient and can represent the output okay for his language.

WebKit will continue to use TBAA this way.

My suggestion was that if he wants to do that, he add a different attribute that works much the same way.

Attributes should be defined in terms of their semantics in LLVM, not by their philosophical intent. If a static analysis produces results whose semantics can be mapped soundly onto a type system that involves a type hierarchy, then this is obviously the right thing to do.

-Filip

Hey Daniel,

Can you be more specific about your concerns here? It’s true that we describe the TBAA nodes in terms of expressing C-like type constraints, but the intention of the design has always been for it to be more general.

Specifically, partitioning the heap for use-cases like what Phil is doing with Javascript was factored into the original design. We have even talked about adding type tags to represent C++ vtables (for example) since language loads and stores can’t touch them (not even through char*).

The datastructures and algorithms we have are powerful enough to express these sorts of things, and so long as a frontend abides by the rules, there shouldn’t be a problem.

-Chris

My concerns are simply that whether designed this way or not, it ends up
fairly inconsistent.

For example, what would the plan be when a frontend does something like
clang does now for C/C++ (generate type based TBAA), and also wants to do
something like Filip is suggesting (which is also doable on C/C++ with
simple frontend analysis)?

Generate a split tree of some sort, one side of which represents TBAA info and the other side of which represents partitioned abstract heaps?[1] It seems like that would be awfully confusing.

However, it would now be necessary in order to use the new tbaa.read/tbaa.write metadata, since they will only reference tbaa tags. But they only make a lot of sense on tbaa tags that point to partitioned heaps like Filip's, so if you did want to actually make use of them, you now have to put both the type info and the heap info in the same tree.

You also run into issues with the existing metadata on loads/stores in this situation. It's, again, entirely possible for a load to conflict with both a tbaa type and a partitioned heap. In Filip's scheme, there is no way to represent this.

Because of this, the only thing I essentially asked Filip to do was not
place it in the same exact tree as we are currently putting type info into.

Then your heap.read/heap.write metadata works with the heap tree (and you
annotate loads/stores with your heap attributes), and your tbaa attributes
work on the tbaa tree. You can tag a load/store with both a heap tag and a
tbaa tag, and disambiguate based on both of them.

Now, if the consensus is this is still a good idea, great. My suggestion
would then be to update langref, rename the attributes, etc, to make this
all more clear.

--Dan
[1] The other option of trying to generate some fake set of heaps that accurately represent the conflicts in both is, well, difficult :)

Are you worried about clang doing this, or are you worried about WebKit doing this? Or are you worried about some other front end doing it?

WebKit won’t do it because we only have abstract heaps. We have no notion of types that originate from the source language.

I’ve also considered - as a thought experiment - front ends for other languages. For example, in Java you would probably use TBAA just to express the space of fields (i.e. field name within class within ClassLoader).

Broadly I believe that using TBAA to literally express the types of the source language is more of a C-ism than a general use case. I think that higher level type safe languages have a more-or-less obvious mapping to abstract heaps, and they tend to be mostly just disjoint sets.

This mapping usually doesn’t involve describing the source language’s type hierarchy, as much as it involves describing the proofs about aliasing that arise from that language’s type system (and whatever analyses the frontend is able to perform).

TBAA’s ability to express a hierarchy, as opposed to just disjoint sets, is kind of an escape mechanism for the cases where disjoint sets are too constraining.

Let’s take the Java example, since that’s sort of a great example of a practical sound type system. Heck, JS VMs at best try to infer types that are Java-esque.

I’ve given an example above of abstract heaps that I would construct using TBAA. In your split tree world, what would the other TBAA data be? Why would you want to use TBAA for anything other than field names?

(And yes I understand you’d use TBAA differently for array element types - but those would be disjoint from field types anyway. And having a hierarchy there is slightly useful.)

I think it would be useful to get an example of what you’re worried about and how it would manifest in IR and attributes generated from some concrete frontend.

There are two possible answers here, depending on the constraints:

  1. The frontend author could unify them into one grand tree, like struct field TBAA does.
  2. The frontend author could model them as two separate TBAA trees, which the TBAA machinery in LLVM handles conservatively.

The conservative handling of different TBAA trees is critical, because you can LTO (for example) Javascript into C++ code in principle, each using their own TBAA structure. LLVM is already well set up for this, and it isn’t an accident :)

I’m not sure what you mean. The compiler will handle this conservatively. If you have two different schemas existing in the same application (e.g. due to LTO or due to a language implementing two different non-unified models for some weird reason) then the compiler just doesn’t draw any aliasing implications from references using two different schemas.

It is possible in principle to allow a load (for example) to have an arbitrary number of TBAA tags on it, which would solve this (allowing a single load to participate in multiple non-overlapping schemas) but I don’t think it is worth the complexity at all.

-Chris

JSC is able to make better use of the original TBAA design by encoding the access path within the type itself. Roughly:

JSCType->Class->Field/Index

This is what we’re doing with C++ struct-path TBAA now, but simpler because only one access path is legal.

JSC does need a hierarchical representation, which makes TBAA a perfect fit.

The only decision point to make now is whether to allow current TBAA to apply to calls. That seems entirely consistent with LLVM’s approach to memory disambiguation.

I do think this discussion is interesting and useful with respect to adding different styles of alias analysis in the future, but we are way beyond the scope of the RFC.

-Andy

The datastructures and algorithms we have are powerful enough to express these sorts of things, and so long as a frontend abides by the rules, there shouldn't be a problem.

My concerns are simply that whether designed this way or not, it ends up
fairly inconsistent.

For example, what would the plan be when a frontend does something like
clang does now for C/C++ (generate type based TBAA), and also wants to do
something like Filip is suggesting (which is also doable on C/C++ with
simple frontend analysis)?

There are two possible answers here, depending on the constraints:

1. The frontend author could unify them into one grand tree, like struct
field TBAA does.

I would be impressed if you could unify the output of something like Andersen's, and something like the current nested TBAA structure, without massive loss of precision.

2. The frontend author could model them as two separate TBAA trees, which
the TBAA machinery in LLVM handles conservatively.

The conservative handling of different TBAA trees is critical, because you
can LTO (for example) Javascript into C++ code in principle, each using
their own TBAA structure. LLVM is already well set up for this, and it isn't an accident :)

You also run into issues with the existing metadata on loads/stores in
this situation. It's again, entirely possible for a load to conflict with
both a tbaa type, and a partitioned heap. In Filip's scheme, there is no
way to represent this.

I'm not sure what you mean.

I mean it's not possible, in this scheme, to specify both to be used for
disambiguation.

Right now, you get *one* tbaa tag on the load.
So you have to choose which tree you use if you choose option #2 above.

You always have to choose one or the other, or else you lose the disambiguation.

The compiler will handle this conservatively.

Only if you drop the tags/disambiguation capability from most things, or I'm seriously confused.

Let's take the following not-quite llvm-ir, but close enough.

!tbaa tree

0 -> everything

1 (parent 0) -> int
2 (parent 0) -> heap a

load <whatever>, !tbaa 1

call !tbaa.read 2

Note: You can't specify tbaa.read 2, 1 (at least as proposed).
The current machinery will say the call and the load never conflict.

It would seem you have to either make heap tags also children of appropriate type tags, or only ever use one type of tree.
What design for a split tree would work here?

I also can't think of a combined tree that would work without massive
precision loss.
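
(To illustrate the first of those options in the same not-quite-IR notation: making the heap tag a child of the type tag does restore the conflict, but only by entangling the two kinds of information in one tree. This is a sketch of the trade-off, not a proposal.)

0 -> everything
1 (parent 0) -> int
2 (parent 1) -> heap a   ; heap a is now a child of int

load <whatever>, !tbaa 1

call !tbaa.read 2   ; now conflicts with the load, since 2 descends from 1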

If you have two different schemas existing in the same application (e.g.
due to LTO or due to a language implementing two different non-unified
models for some weird reason) then the compiler just doesn't draw any
aliasing implications from references using two different schemas.

Which is, of course, bad.

You should, in fact, be able to draw implications from both. If the design essentially only allows one to draw from one, that seems like a pretty big flaw.

It is possible in principle to allow a load (for example) to have an arbitrary number of TBAA tags on it, which would solve this (allowing a single load to participate in multiple non-overlapping schemas) but I don't think it is worth the complexity at all.

Okay then, we'll agree to disagree. :)

This actually happens a lot of the time. Restricting LLVM to essentially choosing one tree per load to use for disambiguation, as this scheme will do, when it's not much more work to do better, seems shortsighted to me.

[earlier messages in the thread quoted in full; snip]

The only decision point to make now is whether to allow current TBAA to
apply to calls. That seems entirely consistent with LLVM's approach to
memory disambiguation.

I do think this discussion is interesting and useful with respect to
adding different styles of alias analysis in the future, but we are way
beyond the scope of the RFC.

Again, I respectfully disagree. What I am suggesting is not a massive
change. He already has to add attributes to various instructions. Renaming
those adds no additional work.
The only additional work is adding a new tag for loads/stores. Work was
just recently completed to do the same, and it was, AFAIK, not a huge
undertaking.

BTW, in any case, can we at least agree that the langref and the comments all over the codebase that tie the tbaa tree strictly to type hierarchy rules should be in scope to be updated to talk about an abstract heap hierarchy (with traditional type-based as an example) instead?


Of course. But that concern applies to WebKit’s existing use of TBAA, not what’s being proposed now. Would you like to file a PR to track it?

I think I follow and understand your points in this thread. I just don’t understand the implication.

Are you opposed to supporting the !tbaa tag on calls? On what grounds?

Are you opposed to WebKit’s current use of TBAA to model an abstract hierarchy of types? This is only shortsighted if their decision limits future possibilities. WebKit can use LLVM TBAA as long as it’s a good fit, following the rules spelled out in LangRef, with no burden on LLVM. When the time comes that it fails to sufficiently disambiguate memory references, it will be just as easy to introduce another metadata tag later. They have no bytecode compatibility requirements. If another frontend wants to support two domains of memory disambiguation, they can just as easily add a new tag. Your suggestion of separating the domains makes sense, but we can’t introduce a new feature without a client.

-Andy

This.

It’s a C-ism, and you’d only have a TBAA tree for it if you lacked any other information.

Hence if you had abstract heaps (such as ones you get from field names in Java-esque type systems, or tuple entry IDs in ML-esque type systems, etc) you wouldn’t be sweating about reconciling that with a tree that talked about types.

(As an aside, I’m sort of bothered by the implication that abstract heaps aren’t types. A type is nothing more than a set of values. For a pointer that means an abstract set of heap locations. So, my use of TBAA corresponds exactly to “types” in some language.)

I agree that TBAA would be more powerful if you could say something like:

load blah !tbaa.list !42

!42 = (!tbaa !1, !tbaa !2, …

But it would probably also be more expensive to process such richer information, particularly if you had it on each load. I’d be inclined to bet that the current TBAA, even with its limitations, is a sensible compromise.

OTOH, being able to list types that are affected by a call would be great.

If someone hacked up a !tbaa.list, I would definitely try it out!