RFC: Add "operand bundles" to calls and invokes

We'd like to propose a scheme to attach "operand bundles" to call and
invoke instructions. This is based on the offline discussion
mentioned in
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088748.html.

# Motivation & Definition

Our motivation behind this is to track the state required for
deoptimization (described briefly later) through the LLVM pipeline as
a first-class IR citizen. We want to do this in a way that is
generally useful.

An "operand bundle" is a set of SSA values (called "bundle operands")
tagged with a string (called the "bundle tag"). One or more such
bundles may be attached to a call or an invoke. The intended use of
these values is to support "frame introspection"-like functionality
for managed languages.

# Abstract Syntax

The syntax of a call instruction will be changed to look like this:

<result> = [tail | musttail] call [cconv] [ret attrs] <ty> [<fnty>*]
    <fnptrval>(<function args>) [operand_bundle*] [fn attrs]

where operand_bundle = tag '(' [ value (',' value )* ] ')'
      value = a normal SSA value
      tag = "< some name >"

In other words, after the function arguments we now have an optional
list of operand bundles of the form `"< bundle tag >"(bundle
attributes, values...)`. There can be more than one operand bundle in
a call. Two operand bundles in the same call instruction cannot have
the same tag.

We'd do something similar for invokes. I'll omit the invoke syntax
from this RFC to keep things brief.

An example:

    define i32 @f(i32 %x) {
     entry:
      %t = add i32 %x, 1
      ret i32 %t
    }

    define void @g(i16 %val, i8* %ptr) {
     entry:
      call i32 @f(i32 10) "some-bundle"(i32 42) "debug"(i32 100)
      call i32 @f(i32 20) "some-bundle"(i16 %val, i8* %ptr)
      ret void
    }

Note 1: Operand bundles are *not* part of a function's signature, and
a given function may be called from multiple places with different
kinds of operand bundles. This reflects the fact that the operand
bundles are conceptually a part of the *call*, not the callee being
dispatched to.

Note 2: There may be tag-specific requirements not mentioned here.
E.g. we may add a rule in the future that says operand bundles with
the tag `"integer-id"` may only contain exactly one constant integer.
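
For instance (the `"integer-id"` tag is purely hypothetical, and this
only illustrates what such a rule would mean):

```llvm
call void @f() "integer-id"(i32 42)    ; ok: exactly one constant integer
call void @f() "integer-id"(i32 %x)    ; would be rejected under such a rule
```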

# IR Semantics

Bundle operands (SSA values part of some operand bundle) are normal
SSA values. They need to dominate the call or invoke instruction
they're being passed into and can be optimized as usual. For
instance, LLVM is allowed (and strongly encouraged!) to PRE / LICM a
load feeding into an operand bundle if legal.
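
For instance (a sketch: `@poll` and the `"state"` tag are made up for
illustration), LICM may hoist the load feeding a bundle out of a loop
when that is otherwise legal:

```llvm
;; before: the load runs on every iteration
loop:
  %s = load i32, i32* %p
  call void @poll() "state"(i32 %s)
  br i1 %c, label %loop, label %exit

;; after LICM (assuming the load is loop invariant and safe to hoist)
preheader:
  %s = load i32, i32* %p
  br label %loop
loop:
  call void @poll() "state"(i32 %s)
  br i1 %c, label %loop, label %exit
```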

Operand bundles are characterized by the `"< bundle tag >"` string
associated with them.

The overall strategy is:

1. The semantics are as conservative as is reasonable for operand
    bundles with tags that LLVM does not have a special understanding
    of. This way LLVM does not miscompile code by default.

2. LLVM understands the semantics of operand bundles with certain
    specific tags more precisely, and can optimize them better.

This RFC talks mainly about (1). We will discuss (2) as we add smarts
to LLVM about specific kinds of operand bundles.

The IR-level semantics of an operand bundle with an arbitrary tag are:

1. The bundle operands passed in to a call escape in unknown ways
    before transferring control to the callee. For instance:

      declare void @opaque_runtime_fn()

      define void @f(i32* %v) { ret void }

      define i32 @g() {
        %t = call i32* @malloc(...)
        ;; "unknown" is a tag LLVM does not have any special knowledge of
        call void @f(i32* %t) "unknown"(i32* %t)

        store i32 42, i32* %t
        call void @opaque_runtime_fn()
        %v = load i32, i32* %t
        ret i32 %v
      }

    Normally (without the `"unknown"` bundle) it would be okay to
    optimize `@g` to return `42`. But the `"unknown"` operand bundle
    escapes `%t`, and the call to `@opaque_runtime_fn` can therefore
    modify the location pointed to by `%t`.

2. Calls and invokes with operand bundles have unknown read / write
    effect on the heap on entry and exit (even if the call target is
    `readnone` or `readonly`). For instance:

      define void @f(i32* %v) { ret void }

      define i32 @g() {
        %t = call i32* @malloc(...)
        %t.unescaped = call i32* @malloc(...)
        ;; "unknown" is a tag LLVM does not have any special knowledge of
        call void @f(i32* %t) "unknown"(i32* %t)
        %v = load i32, i32* %t
        ret i32 %v
      }

    Normally it would be okay to optimize `@g` to return `undef`, but
    the `"unknown"` bundle potentially clobbers `%t`. Note that it
    clobbers `%t` only because it was *also escaped* by the
    `"unknown"` operand bundle -- it does not clobber `%t.unescaped`
    because it isn't reachable from the heap yet.

    However, it is okay to optimize

      define void @f(i32* %v) {
        store i32 10, i32* %v
        print(load i32, i32* %v)
      }

      define void @g() {
        %t = ...
        ;; "unknown" is a tag LLVM does not have any special knowledge of
        call void @f(i32* %t) "unknown"()
      }

    to

      define void @f(i32* %v) {
        store i32 10, i32* %v
        print(10)
      }

      define void @g() {
        %t = ...
        call void @f(i32* %t) "unknown"()
      }

    The arbitrary heap clobbering only happens on the boundaries of
    the call operation, and therefore we can still do store-load
    forwarding *within* `@f`.
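
Tying back to the first example in (2): a load through `%t.unescaped`
could still be folded, because the bundle never escaped that pointer
and hence cannot clobber it. A sketch:

```llvm
%t.unescaped = call i32* @malloc(...)
store i32 7, i32* %t.unescaped
call void @f(i32* %t) "unknown"(i32* %t)
;; this load can be forwarded to 7: %t.unescaped was never
;; reachable from the heap, so the bundle cannot clobber it
%v = load i32, i32* %t.unescaped
```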

Since we haven't specified any "pure" LLVM way of accessing the
contents of operand bundles, the client is required to model such
accesses as calls to opaque functions (or inline assembly). This
ensures that things like IPSCCP work as intended. E.g. it is legal to
optimize

   define i32 @f(i32* %v) { ret i32 10 }

   define void @g() {
     %t = call i32* @malloc(...)
     %v = call i32 @f(i32* %t) "unknown"(i32* %t)
     print(%v)
   }

to

   define i32 @f(i32* %v) { ret i32 10 }

   define void @g() {
     %t = call i32* @malloc(...)
     %v = call i32 @f(i32* %t) "unknown"(i32* %t)
     print(10)
   }

LLVM won't generally be able to inline through calls and invokes with
operand bundles -- the inliner does not know what to replace the
arbitrary heap accesses implied on function entry and exit with.
However, we intend to teach the inliner to inline through calls /
invokes with some specific kinds of operand bundles.

# Lowering

The lowering strategy will be special cased for each bundle tag.
There won't be any "generic" lowering strategy -- `llc` is expected to
abort if it sees an operand bundle that it does not understand.

There is no requirement that the operand bundles actually make it to
the backend. Rewriting the operand bundles into "vanilla" LLVM IR at
some point in the pipeline (instead of teaching codegen to lower them)
is a perfectly reasonable lowering strategy.
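
As a sketch of the "rewrite into vanilla IR" strategy (the
`"frame-state"` tag and the `@frame_state_slot` global are
hypothetical), a pre-codegen pass could lower a bundle into an
explicit side-table store:

```llvm
;; before lowering
call void @foo() "frame-state"(i32 %x)

;; after lowering: the bundle is gone before codegen sees it
store i32 %x, i32* @frame_state_slot
call void @foo()
```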

# Example use cases

A couple of usage scenarios are very briefly described below:

## Deoptimization

This is our motivating use case. Some managed environments expect to
be able to discover the state of the abstract virtual machine at specific call
sites. LLVM will be able to support this requirement by attaching a
`"deopt"` operand bundle containing the state of the abstract virtual
machine (as a vector of SSA values) at the appropriate call sites.
There is a straightforward way to extend the inliner to work with
`"deopt"` operand bundles.
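
One plausible scheme (a sketch, not a settled design): when inlining
through a call site that has a `"deopt"` bundle, prepend the caller's
deopt state to the deopt state of every call site inlined from the
callee. `@runtime_poll` and the states below are made up:

```llvm
;; before inlining
define void @callee() {
  call void @runtime_poll() "deopt"(i32 1)
  ret void
}

define void @caller() {
  call void @callee() "deopt"(i32 2)
  ret void
}

;; after inlining @callee into @caller
define void @caller() {
  call void @runtime_poll() "deopt"(i32 2, i32 1)
  ret void
}
```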

`"deopt"` operand bundles will not have to be as pessimistic about
heap effects as the general "unknown operand bundle" case -- they only
imply a read from the entire heap on function entry or function exit,
depending on what kind of deoptimization state we're interested in.
They also don't imply escaping semantics.

## Value injection

By passing in one or more `alloca`s to an `"injectable-value"` tagged
operand bundle, languages can allow the runtime to overwrite the
values of specific variables, while still preserving a significant
amount of optimization potential.
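
A sketch of what this could look like (the tag and `@runtime_point`
are hypothetical):

```llvm
define i32 @h() {
 entry:
  %x = alloca i32
  store i32 5, i32* %x
  call void @runtime_point() "injectable-value"(i32* %x)
  ;; the runtime may have overwritten the slot, so this load
  ;; cannot be forwarded from the store above
  %v = load i32, i32* %x
  ret i32 %v
}
```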

Thoughts?
-- Sanjoy

This seems like a pretty useful, generic call-site annotation
mechanism. I believe it has immediate application outside the context
of GC.

Our exception handling personality routine has a desire to know whether
some code is inside a specific try or catch. We can feed the value coming
out of our EH pad back into the call-site, making it very clear which EH
pad the call-site is associated with.
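
Something along these lines, say (a sketch: the `"eh-pad"` tag is
hypothetical, and this assumes the new EH instructions that produce a
token value):

```llvm
catch.dispatch:
  %cs = catchswitch within none [label %catch] unwind to caller

catch:
  %cp = catchpad within %cs [i8* null]
  ;; the bundle ties this call site to the enclosing EH pad
  call void @may_throw() "eh-pad"(token %cp)
```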

As supporting evidence, let me say that we're not using this for GC
either :). We will use it to support deoptimization [1] [2] [3]. We will
continue to support precise relocating garbage collection using
statepoints.

I can go into some detail on how we plan to use this for
deoptimization if you're interested; I left out most of deopt specific
bits to avoid cluttering up the main proposal.

[1]: http://www.philipreames.com/Blog/2015/05/20/deoptimization-terminology/
[2]: http://www.oracle.com/technetwork/java/whitepaper-135217.html#dynamic
[3]: https://blog.indutny.com/a.deoptimize-me-not

-- Sanjoy

I'm (obviously) in support of the overall proposal. :) A few details below.

# Motivation & Definition

Our motivation behind this is to track the state required for
deoptimization (described briefly later) through the LLVM pipeline as
a first-class IR citizen. We want to do this in a way that is
generally useful.

An "operand bundle" is a set of SSA values (called "bundle operands")
tagged with a string (called the "bundle tag"). One or more such
bundles may be attached to a call or an invoke. The intended use of
these values is to support "frame introspection"-like functionality
for managed languages.

# Abstract Syntax

The syntax of a call instruction will be changed to look like this:

<result> = [tail | musttail] call [cconv] [ret attrs] <ty> [<fnty>*]
     <fnptrval>(<function args>) [operand_bundle*] [fn attrs]

where operand_bundle = tag '(' [ value (',' value )* ] ')'
       value = a normal SSA value
       tag = "< some name >"

tag needs to be "some string name" or <future keyword>. We also need to be clear about what the compatibility guarantees are. If I remember correctly, we discussed something along the following lines:
- string bundle names are entirely version locked to a particular revision of LLVM. They are for experimentation and incremental development. There is no attempt to forward serialize them. In particular, using a string name which is out of sync with the version of LLVM can result in miscompiles.
- keyword bundle names become first-class parts of the IR; they are forward serialized and fully supported. Obviously, getting an experimental string bundle name promoted to a first-class keyword bundle will require broad discussion and buy-in.

We were deliberately trying to parallel the de facto policy around attributes vs string-attributes.
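
Concretely, the two forms might look like this (assuming,
hypothetically, that `deopt` were promoted to a keyword bundle):

```llvm
call void @f() "my-experimental-state"(i32 %x)  ;; string tag: version locked
call void @f() deopt(i32 %x)                    ;; keyword tag: forward serialized
```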

In other words, after the function arguments we now have an optional
list of operand bundles of the form `"< bundle tag >"(bundle
attributes, values...)`. There can be more than one operand bundle in
a call. Two operand bundles in the same call instruction cannot have
the same tag.

I don't think we need that last sentence. It should be up to the bundle implementation whether that's legal. I don't have a strong preference here and we could easily relax this later.

We'd do something similar for invokes. I'll omit the invoke syntax
from this RFC to keep things brief.

An example:

     define i32 @f(i32 %x) {
      entry:
       %t = add i32 %x, 1
       ret i32 %t
     }

     define void @g(i16 %val, i8* %ptr) {
      entry:
       call void @f(i32 10) "some-bundle"(i32 42) "debug"(i32 100)
       call void @f(i32 20) "some-bundle"(i16 %val, i8* %ptr)
     }

Note 1: Operand bundles are *not* part of a function's signature, and
a given function may be called from multiple places with different
kinds of operand bundles. This reflects the fact that the operand
bundles are conceptually a part of the *call*, not the callee being
dispatched to.

Note 2: There may be tag specific requirements not mentioned here.
E.g. we may add a rule in the future that says operand bundles with
the tag `"integer-id"` may only contain exactly one constant integer.

# IR Semantics

Bundle operands (SSA values part of some operand bundle) are normal
SSA values. They need to dominate the call or invoke instruction
they're being passed into and can be optimized as usual. For
instance, LLVM is allowed (and strongly encouraged!) to PRE / LICM a
load feeding into an operand bundle if legal.

Operand bundles are characterized by the `"< bundle tag >"` string
associated with them.

The overall strategy is:

  1. The semantics are as conservative as is reasonable for operand
     bundles with tags that LLVM does not have a special understanding
     of. This way LLVM does not miscompile code by default.

  2. LLVM understands the semantics of operand bundles with certain
     specific tags more precisely, and can optimize them better.

This RFC talks mainly about (1). We will discuss (2) as we add smarts
to LLVM about specific kinds of operand bundles.

The IR-level semantics of an operand bundle with an arbitrary tag are:

  1. The bundle operands passed in to a call escape in unknown ways
     before transferring control to the callee. For instance:

       declare void @opaque_runtime_fn()

       define void @f(i32* %v) { }

       define i32 @g() {
         %t = i32* @malloc(...)
         ;; "unknown" is a tag LLVM does not have any special knowledge of
         call void @f(i32* %t) "unknown"(i32* %t)

         store i32 42, i32* %t
         call void @opaque_runtime_fn();
         ret (load i32, i32* %t)
       }

     Normally (without the `"unknown"` bundle) it would be okay to
     optimize `@g` to return `42`. But the `"unknown"` operand bundle
     escapes `%t`, and the call to `@opaque_runtime_fn` can therefore
     modify the location pointed to by `%t`.

  2. Calls and invokes with operand bundles have unknown read / write
     effect on the heap on entry and exit (even if the call target is
     `readnone` or `readonly`). For instance:

I don't think we actually need this. I think it would be perfectly fine to require that the frontend ensure the called function is not readonly if it being readonly would be problematic for the call site. I'm not really opposed to this generalization - I could see it being useful - but I'm worried about the amount of work involved. A *lot* of the optimizer assumes that attributes on a call site are strictly less conservative than the underlying function. Changing that could have a long bug tail. I'd rather defer that work until someone defines an operand bundle type which requires it. The motivating example (deoptimization) doesn't seem to require this.

       define void @f(i32* %v) { }

       define i32 @g() {
         %t = i32* @malloc(...)
         %t.unescaped = i32* @malloc(...)
         ;; "unknown" is a tag LLVM does not have any special knowledge of
         call void @f(i32* %t) "unknown"(i32* %t)
         ret (load i32, i32* %t)
       }

     Normally it would be okay to optimize `@g` to return `undef`, but
     the `"unknown"` bundle potentially clobbers `%t`. Note that it
     clobbers `%t` only because it was *also escaped* by the
     `"unknown"` operand bundle -- it does not clobber `%t.unescaped`
     because it isn't reachable from the heap yet.

     However, it is okay to optimize

       define void @f(i32* %v) {
         store i32 10, i32* %v
         print(load i32, i32* %v)
       }

       define void @g() {
         %t = ...
         ;; "unknown" is a tag LLVM does not have any special knowledge of
         call void @f(i32* %t) "unknown"()
       }

     to

       define void @f(i32* %v) {
         store i32 10, i32* %v
         print(10)
       }

       define void @g() {
         %t = ...
         call void @f(i32* %t) "unknown"()
       }

     The arbitrary heap clobbering only happens on the boundaries of
     the call operation, and therefore we can still do store-load
     forwarding *within* `@f`.

Since we haven't specified any "pure" LLVM way of accessing the
contents of operand bundles, the client is required to model such
accesses as calls to opaque functions (or inline assembly).

I'm a bit confused by this section. By "client" do you mean frontend? And what are you trying to allow in the second sentence? The first sentence seems sufficient.

This
ensures that things like IPSCCP work as intended. E.g. it is legal to
optimize

    define i32 @f(i32* %v) { ret i32 10 }

    define void @g() {
      %t = i32* @malloc(...)
      %v = call i32 @f(i32* %t) "unknown"(i32* %t)
      print(%v)
    }

to

    define i32 @f(i32* %v) { ret i32 10 }

    define void @g() {
      %t = i32* @malloc(...)
      %v = call i32 @f(i32* %t) "unknown"(i32* %t)
      print(10)
    }

To say this differently, an operand bundle at a call site cannot change the implementation of the called function. This is not a mechanism for function interposition.

LLVM won't generally be able to inline through calls and invokes with
operand bundles -- the inliner does not know what to replace the
arbitrary heap accesses implied on function entry and exit with.
However, we intend to teach the inliner to inline through calls /
invokes with some specific kinds of operand bundles.

# Lowering

The lowering strategy will be special cased for each bundle tag.
There won't be any "generic" lowering strategy -- `llc` is expected to
abort if it sees an operand bundle that it does not understand.

There is no requirement that the operand bundles actually make it to
the backend. Rewriting the operand bundles into "vanilla" LLVM IR at
some point in the pipeline (instead of teaching codegen to lower them)
is a perfectly reasonable lowering strategy.

# Example use cases

A couple of usage scenarios are very briefly described below:

## Deoptimization

This is our motivating use case. Some managed environments expect to
be able to discover the state of the abstract virtual machine at specific call
sites. LLVM will be able to support this requirement by attaching a
`"deopt"` operand bundle containing the state of the abstract virtual
machine (as a vector of SSA values) at the appropriate call sites.
There is a straightforward way to extend the inliner to work with
`"deopt"` operand bundles.

`"deopt"` operand bundles will not have to be as pessimistic about
heap effects as the general "unknown operand bundle" case -- they only
imply a read from the entire heap on function entry or function exit,
depending on what kind of deoptimization state we're interested in.
They also don't imply escaping semantics.

An alternate framing here which would remove the attribute case I was worried about would be to separate the memory and abstract state semantics of deoptimization. If the deopt bundle only described the abstract state and it was up to the frontend to ensure the callee was at least readonly, we wouldn't need to model memory in the deopt bundle. I think that's a much better starting place.

## Value injection

By passing in one or more `alloca`s to an `"injectable-value"` tagged
operand bundle, languages can allow the runtime to overwrite the
values of specific variables, while still preserving a significant
amount of optimization potential.

To be clear, this was intended to model use cases like Python's ability to inject values into caller frames.

tag needs to be "some string name" or <future keyword>. We also need to be
clear about what the compatibility guarantees are. If I remember correctly,
we discussed something along the following:
- string bundle names are entirely version locked to a particular revision of
LLVM. They are for experimentation and incremental development. There is
no attempt to forward serialize them. In particular, using a string name
which is out of sync with the version of LLVM can result in miscompiles.
- keyword bundle names become first class parts of the IR, they are forward
serialized, and fully supported. Obviously, getting an experimental string
bundle name promoted to a first class keyword bundle will require broad
discussion and buy in.

We were deliberately trying to parallel the de facto policy around attributes
vs string-attributes.

Agreed.

In other words, after the function arguments we now have an optional
list of operand bundles of the form `"< bundle tag >"(bundle
attributes, values...)`. There can be more than one operand bundle in
a call. Two operand bundles in the same call instruction cannot have
the same tag.

I don't think we need that last sentence. It should be up to the bundle
implementation if that's legal or not. I don't have a strong preference
here and we could easily relax this later.

I'll remove the restriction. I think it is reasonable to have this
decided per bundle type, as you suggested.

  2. Calls and invokes with operand bundles have unknown read / write
     effect on the heap on entry and exit (even if the call target is
     `readnone` or `readonly`). For instance:

I don't think we actually need this. I think it would be perfectly fine to
require the frontend ensure that the called function is not readonly if it
being readonly would be problematic for the call site. I'm not really
opposed to this generalization - I could see it being useful - but I'm
worried about the amount of work involved. A *lot* of the optimizer assumes
that attributes on a call site are strictly less conservative than the
underlying function. Changing that could have a long bug tail. I'd rather
defer that work until someone defines an operand bundle type which requires
it. The motivating example (deoptimization) doesn't seem to require this.

If we're doing late poll placement and if certain functions are
"frameless" in the abstract machine, then we will need this for
deoptimization.

The case I'm thinking of is:

  define void @foo() {
   ;; Can be just about any kind of uncounted loop that is readnone
   entry:
    br label %inf_loop

   inf_loop:
    br label %inf_loop
  }

  define void @caller() {
   entry:
    store i32 42, i32* @global
    call void @foo() "deopt"(i32 100)
    store i32 46, i32* @global
    ret void
  }

Right now `@foo` is `readnone`, so the first store of `i32 42` can be
DSE'ed. However, if we insert a poll inside `@foo` later, that will
have to be given a JVM state, which we cannot do anymore since a store
that would have been done by the abstract machine has been elided.
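
To illustrate (a sketch; `@safepoint_poll` and the recorded state are
made up), after late poll insertion `@foo` would become something like:

```llvm
define void @foo() {
 entry:
  br label %inf_loop

 inf_loop:
  ;; this poll needs a JVM state that reflects @global == 42;
  ;; if the first store in @caller has been DSE'ed, no such state exists
  call void @safepoint_poll() "deopt"( ... JVM state ... )
  br label %inf_loop
}
```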

[ moved here, because this is related ]

`"deopt"` operand bundles will not have to be as pessimistic about
heap effects as the general "unknown operand bundle" case -- they only
imply a read from the entire heap on function entry or function exit,
depending on what kind of deoptimization state we're interested in.
They also don't imply escaping semantics.

An alternate framing here which would remove the attribute case I was
worried about would be to separate the memory and abstract state
semantics of deoptimization. If the deopt bundle only described the
abstract state and it was up to the frontend to ensure the callee was at
least readonly, we wouldn't need to model memory in the deopt bundle. I
think that's a much better starting place.

Semantically, I think we need the state of the heap to be consistent
at method call boundaries, not within a method boundary. For
instance, consider this:

  ;; @global is 0 to start with

  define void @f() readonly {
    ;; do whatever
    call void @read_only_safepoint_poll() "deopt"( ... deopt state local to @f ...) readonly
  }

  define void @g() {
    call void @f() "deopt"( ... deopt state local to @g ...)
    if (*@global == 42) { side_effect(); }
    store i32 42, i32* @global
  }

If we do not have the reads-everything-on-exit property, then this is
a valid transform:

  define void @f() readonly {
    ;; do whatever
    call read_only_safepoint_poll() readonly
        "deopt"( ... deopt state local to @f ...)
    if (*@global == 42) { side_effect(); }
    store i32 42, i32* @global
  }

  define void @g() {
    call void @f() "deopt"( ... deopt state local to @g ...)
  }

If we *don't* inline `@f` into `@g`, and `@f` wants to deoptimize `@g`
(and only `@g`) after halting the thread at
`read_only_safepoint_poll`, we're in trouble. `@f` will execute the
store to `@global` before returning, and the deoptimized `@g` will
call `side_effect` when it shouldn't have. (Note: I put the `if
(*@global == 42)` to make the problem more obvious, but in practice I
think doing the same store twice is also problematic). Another way to
state this is that even though the state of the heap was consistent at
the call to `read_only_safepoint_poll`, it will not be consistent when
`@f` returns. Therefore we cannot use a "deopt `@g` on return with
vmstate xyz" scheme, unless we model the operand bundle as reading the
entire heap on return of `@f` (this would force the state of the heap
to be consistent at the point where we actually use the vmstate).

There is an analogous case where we have to model the deopt operand
bundle as reads-everything-on-entry: if we have cases where we
deoptimize on entry. IOW, something like this:

  ; @global starts off as 0

  define void @side_exit() readonly {
    call void @deoptimize_my_caller()
    return
  }

  define void @store_field(ref) {
   (*@global)++;
lbl:
   if (ref == nullptr) {
     call void @side_exit() ;; vm_state = at label lbl
     unreachable
   } else {
     ref->field = 42;
   }
  }

could be transformed to

  define void @side_exit() readonly {
    (*@global)++;
    call void @deoptimize_my_caller()
    return
  }

  define void @store_field(ref) {
lbl:
   if (ref == nullptr) {
     call void @side_exit() ;; vm_state = at label lbl
     unreachable
   } else {
     (*@global)++;
     ref->field = 42;
   }
  }

Now if `ref` is null and we do not inline `@side_exit` then we will
end up incrementing `@global` twice.

In practice I think we can work around these issues by marking
`@side_exit` and `@f` as external, so that inter-procedural code
motion does not happen, but:

a. That would be a workaround; the semantic issues would still exist.
b. LLVM is still free to specialize external functions.

As a meta point, I think the right way to view operand bundles is as
something that *happens* before and after a call / invoke, not as a
set of values being passed around. For that reason, do you think they
should be renamed to something else?

Since we haven't specified any "pure" LLVM way of accessing the
contents of operand bundles, the client is required to model such
accesses as calls to opaque functions (or inline assembly).

I'm a bit confused by this section. By "client" do you mean frontend? And
what are you trying to allow in the second sentence? The first sentence
seems sufficient.

This
ensures that things like IPSCCP work as intended. E.g. it is legal to
optimize

To say this differently, an operand bundle at a call site can not change the
implementation of the called function. This is not a mechanism for function
interposition.

I was really trying to say "whatever the optimizer directly
understands about the IR is correct", so you're right, this is about
disallowing arbitrary function interposition.
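To illustrate the "model accesses as opaque calls" rule with a sketch
(the `@read_bundle_slot` declaration is a hypothetical name, not part
of the proposal): any runtime code that inspects the bundle contents
would be expressed to the optimizer as an opaque call, never as a
direct IR-level use of the bundled values:

    declare i32 @read_bundle_slot(i32)  ;; opaque to the optimizer

    define void @g(i32* %t) {
      call void @f(i32* %t) "unknown"(i32* %t)
      ;; the runtime's view of the bundle contents is modeled as an
      ;; opaque call, so IPSCCP can keep trusting whatever it proves
      ;; about @f itself
      %v = call i32 @read_bundle_slot(i32 0)
      ret void
    }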

-- Sanjoy

   2. Calls and invokes with operand bundles have unknown read / write
      effect on the heap on entry and exit (even if the call target is
      `readnone` or `readonly`). For instance:

I don't think we actually need this. I think it would be perfectly fine to
require the frontend to ensure that the called function is not readonly if
its being readonly would be problematic for the call site. I'm not really
opposed to this generalization - I could see it being useful - but I'm
worried about the amount of work involved. A *lot* of the optimizer assumes
that attributes on a call site are strictly less conservative than those on
the underlying function. Changing that could have a long bug tail. I'd rather
defer that work until someone defines an operand bundle type which requires
it. The motivating example (deoptimization) doesn't seem to require this.

If we're doing late poll placement and if certain functions are
"frameless" in the abstract machine, then we will need this for
deoptimization.

The case I'm thinking of is:

   define void @foo() {
    ;; Can be just about any kind of uncounted loop that is readnone
    entry:
     br label %inf_loop

    inf_loop:
     br label %inf_loop
   }

   define void @caller() {
    entry:
     store i32 42, i32* @global
     call void @foo() "deopt"(i32 100)
     store i32 46, i32* @global
     ret void
   }

Right now `@foo` is `readnone`, so the first store of `i32 42` can be
DSE'ed. However, if we insert a poll inside `@foo` later, that will
have to be given a JVM state, which we cannot do anymore since a store
that would have been done by the abstract machine has been elided.

As we discussed offline, this example is invalid. Specifically, while late insertion of safepoints works just fine for garbage collection, there is no way to rematerialize the abstract state for a virtual frame. As a result, late insertion of deoptimization points is an unsolved problem. Instead, this example would have had to include at least one deopt point (which reads the entire heap) in @foo. Consequently, @foo must be at least readonly and the DSE cannot happen.

[ moved here, because this is related ]

`"deopt"` operand bundles will not have to be as pessimistic about
heap effects as the general "unknown operand bundle" case -- they only
imply a read from the entire heap on function entry or function exit,
depending on what kind of deoptimization state we're interested in.
They also don't imply escaping semantics.

An alternate framing here which would remove the attribute case I was
worried about would be to separate the memory and abstract state
semantics of deoptimization. If the deopt bundle only described the
abstract state and it was up to the frontend to ensure the callee was at
least readonly, we wouldn't need to model memory in the deopt bundle. I
think that's a much better starting place.

Semantically, I think we need the state of the heap to be consistent
at method call boundaries, not within a method boundary. For
instance, consider this:

   ;; @global is 0 to start with

   define void @f() readonly {
     ;; do whatever
     call read_only_safepoint_poll() readonly
         "deopt"( ... deopt state local to @f ...)
   }

   define void @g() {
     call void @f() "deopt"( ... deopt state local to @g ...)
     if (*@global == 42) { side_effect(); }
     store i32 42, i32* @global
   }

If we do not have the reads-everything-on-exit property, then this is
a valid transform:

   define void @f() readonly {
     ;; do whatever
     call read_only_safepoint_poll() readonly
         "deopt"( ... deopt state local to @f ...)
     if (*@global == 42) { side_effect(); }
     store i32 42, i32* @global
   }

   define void @g() {
     call void @f() "deopt"( ... deopt state local to @g ...)
   }

If we *don't* inline `@f` into `@g`, and `@f` wants to deoptimize `@g`
(and only `@g`) after halting the thread at
`read_only_safepoint_poll`, we're in trouble. `@f` will execute the
store to `@global` before returning, and the deoptimized `@g` will
call `side_effect` when it shouldn't have. (Note: I put the `if
(*@global == 42)` to make the problem more obvious, but in practice I
think doing the same store twice is also problematic). Another way to
state this is that even though the state of the heap was consistent at
the call to `read_only_safepoint_poll`, it will not be consistent when
`@f` returns. Therefore we cannot use a "deopt `@g` on return with
vmstate xyz" scheme, unless we model the operand bundle as reading the
entire heap on return of `@f` (this would force the state of the heap
to be consistent at the point where we actually use the vmstate).

Sanjoy had to explain this example to me offline, so let me try to summarize for other readers. The key issue here is that deoptimization is a very restricted form of function interposition. Specifically, if we have physical frames for both @f and @g created by compiled versions of each function, we can replace the frame for @g with a new frame @g_int which resumes execution in the interpreter at the specified abstract VM state. Essentially, it's not safe to assume that the version of the code seen by the optimizer is the version which will execute after return. This means that it is not safe to perform an interprocedural optimization which moves a side effect past the return from @f without adjusting the deoptimization state to reflect that the movement has happened. Today, we have no mechanism to do that adjustment, so instead, we must disallow the movement.

It's worth pointing out a couple of things here:
1) This is only problematic for a class of optimizations LLVM does not currently implement.
2) The code motion involved would only be legal if the optimizer saw *all* callers of @f. For a generic external function, this could never be true.

There is an analogous case where we have to model the deopt operand
bundle as reads-everything-on-entry: if we have cases where we
deoptimize on entry. IOW, something like this:

   ; @global starts off as 0

   define void @side_exit() readonly {
     call void @deoptimize_my_caller()
     return
   }

   define void @store_field(ref) {
    (*@global)++;
  lbl:
    if (ref == nullptr) {
      call void @side_exit() ;; vm_state = at label lbl
      unreachable
    } else {
      ref->field = 42;
    }
   }

could be transformed to

   define void @side_exit() readonly {
     (*@global)++;
     call void @deoptimize_my_caller()
     return
   }

   define void @store_field(ref) {
  lbl:
    if (ref == nullptr) {
      call void @side_exit() ;; vm_state = at label lbl
      unreachable
    } else {
      (*@global)++;
      ref->field = 42;
    }
   }

Now if `ref` is null and we do not inline `@side_exit` then we will
end up incrementing `@global` twice.

(This is just the inverse example to the above. Same issues apply.)

In practice I think we can work around these issues by marking
`@side_exit` and `@f` as external, so that inter-procedural code
motion does not happen, but:

  a. That would be a workaround; the semantic issues would still exist.
  b. LLVM is still free to specialize external functions.

Your last point is a good one. I hadn't considered that in my response above.

My suggestion would be that we frame this in one of three ways:
1) We split the function interposition requirements (the readonly on call and return mentioned above) into a separate function attribute. We do not necessarily need to tie that to the bundle description.
2) We could define the bundle to imply an unknown caller of the declared callee. This would get around the problem above by essentially preventing the callee from ever being non-external.
3) We could define the bundle to imply readonly/readwrite semantics at the beginning and end of the call. This is a much more restricted interpretation of the memory effects and doesn't seem to require any changes to the optimizer today. If we add an IPO pass like the one above, we might need some changes, but even there, they shouldn't be huge.

I think I'm leaning towards (1) with the function attribute having the semantics described in (3) but separated into its own attribute. Thoughts?

(By function attribute, I might mean "linkage". This seems to be interestingly similar to available_externally and some of the odd properties wanted by the LTO folks. Just thinking aloud.)

In all cases, we should require that a call with deopt bundle arguments be to a function that is at least readonly. As we discussed above, nothing else really makes sense.
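In IR terms, a sketch of that requirement (the function names here are hypothetical, purely for illustration):

    ;; allowed: the callee is at least readonly, so the "deopt" bundle
    ;; only adds abstract state (plus a read of the heap)
    call void @safepoint_poll() readonly "deopt"(i32 %state)

    ;; disallowed under this rule: a "deopt" bundle on a callee that
    ;; may write the heap arbitrarily
    call void @may_write_heap() "deopt"(i32 %state)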

As a meta point, I think the right way to view operand bundles is as
something that *happens* before and after a call / invoke, not as a
set of values being passed around. For that reason, do you think they
should be renamed to something else?

Not really. In fact, I'm really concerned about the "happens" interpretation. I think that is likely to snowball with additional semantics and be hard to reason about. I'd much rather have the bundles be fairly restricted and introduce additional attributes for any weird requirements a runtime might have.

Philip

From: "David Majnemer" <david.majnemer@gmail.com>
To: "Sanjoy Das" <sanjoy@playingwithpointers.com>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>, "Philip Reames"
<listmail@philipreames.com>, "Chandler Carruth"
<chandlerc@gmail.com>, "Nick Lewycky" <nlewycky@google.com>, "Hal
Finkel" <hfinkel@anl.gov>, "Chen Li" <meloli87@gmail.com>, "Russell
Hadley" <rhadley@microsoft.com>, "Kevin Modzelewski"
<kmod@dropbox.com>, "Swaroop Sridhar"
<Swaroop.Sridhar@microsoft.com>, rudi@dropbox.com, "Pat Gavlin"
<pagavlin@microsoft.com>, "Joseph Tremoulet" <jotrem@microsoft.com>,
"Reid Kleckner" <rnk@google.com>
Sent: Monday, August 10, 2015 11:38:32 PM
Subject: Re: RFC: Add "operand bundles" to calls and invokes

> We'd like to propose a scheme to attach "operand bundles" to call and
> invoke instructions. This is based on the offline discussion
> mentioned in
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/088748.html.
>
> # Motivation & Definition
>
> Our motivation behind this is to track the state required for
> deoptimization (described briefly later) through the LLVM pipeline as
> a first-class IR citizen. We want to do this in a way that is
> generally useful.
>
> An "operand bundle" is a set of SSA values (called "bundle operands")
> tagged with a string (called the "bundle tag"). One or more of such
> bundles may be attached to a call or an invoke. The intended use of
> these values is to support "frame introspection"-like functionality
> for managed languages.
>
> # Abstract Syntax
>
> The syntax of a call instruction will be changed to look like this:
>
>     <result> = [tail | musttail] call [cconv] [ret attrs] <ty> [<fnty>*]
>         <fnptrval>(<function args>) [operand_bundle*] [fn attrs]
>
>     where operand_bundle = tag '('[ value ] (',' value )* ')'
>           value = normal SSA values
>           tag = "< some name >"
>
> In other words, after the function arguments we now have an optional
> list of operand bundles of the form `"< bundle tag >"(bundle
> attributes, values...)`. There can be more than one operand bundle in
> a call. Two operand bundles in the same call instruction cannot have
> the same tag.
>
> We'd do something similar for invokes. I'll omit the invoke syntax
> from this RFC to keep things brief.
>
> An example:
>
>     define i32 @f(i32 %x) {
>      entry:
>       %t = add i32 %x, 1
>       ret i32 %t
>     }
>
>     define void @g(i16 %val, i8* %ptr) {
>      entry:
>       call void @f(i32 10) "some-bundle"(i32 42) "debug"(i32 100)
>       call void @f(i32 20) "some-bundle"(i16 %val, i8* %ptr)
>     }
>
> Note 1: Operand bundles are *not* part of a function's signature, and
> a given function may be called from multiple places with different
> kinds of operand bundles. This reflects the fact that the operand
> bundles are conceptually a part of the *call*, not the callee being
> dispatched to.
>
> Note 2: There may be tag specific requirements not mentioned here.
> E.g. we may add a rule in the future that says operand bundles with
> the tag `"integer-id"` may only contain exactly one constant integer.
>
> # IR Semantics
>
> Bundle operands (SSA values part of some operand bundle) are normal
> SSA values. They need to dominate the call or invoke instruction
> they're being passed into and can be optimized as usual. For
> instance, LLVM is allowed (and strongly encouraged!) to PRE / LICM a
> load feeding into an operand bundle if legal.
>
> Operand bundles are characterized by the `"< bundle tag >"` string
> associated with them.
>
> The overall strategy is:
>
>   1. The semantics are as conservative as is reasonable for operand
>      bundles with tags that LLVM does not have a special understanding
>      of. This way LLVM does not miscompile code by default.
>
>   2. LLVM understands the semantics of operand bundles with certain
>      specific tags more precisely, and can optimize them better.
>
> This RFC talks mainly about (1). We will discuss (2) as we add smarts
> to LLVM about specific kinds of operand bundles.
>
> The IR-level semantics of an operand bundle with an arbitrary tag are:
>
>   1. The bundle operands passed in to a call escape in unknown ways
>      before transferring control to the callee. For instance:
>
>        declare void @opaque_runtime_fn()
>
>        define void @f(i32* %v) { }
>
>        define i32 @g() {
>          %t = i32* @malloc(...)
>          ;; "unknown" is a tag LLVM does not have any special knowledge of
>          call void @f(i32* %t) "unknown"(i32* %t)
>          store i32 42, i32* %t
>          call void @opaque_runtime_fn();
>          ret (load i32, i32* %t)
>        }
>
>      Normally (without the `"unknown"` bundle) it would be okay to
>      optimize `@g` to return `42`. But the `"unknown"` operand bundle
>      escapes `%t`, and the call to `@opaque_runtime_fn` can therefore
>      modify the location pointed to by `%t`.
>
>   2. Calls and invokes with operand bundles have unknown read / write
>      effect on the heap on entry and exit (even if the call target is
>      `readnone` or `readonly`). For instance:
>
>        define void @f(i32* %v) { }
>
>        define i32 @g() {
>          %t = i32* @malloc(...)
>          %t.unescaped = i32* @malloc(...)
>          ;; "unknown" is a tag LLVM does not have any special knowledge of
>          call void @f(i32* %t) "unknown"(i32* %t)
>          ret (load i32, i32* %t)
>        }
>
>      Normally it would be okay to optimize `@g` to return `undef`, but
>      the `"unknown"` bundle potentially clobbers `%t`. Note that it
>      clobbers `%t` only because it was *also escaped* by the
>      `"unknown"` operand bundle -- it does not clobber `%t.unescaped`
>      because it isn't reachable from the heap yet.
>
>      However, it is okay to optimize
>
>        define void @f(i32* %v) {
>          store i32 10, i32* %v
>          print(load i32, i32* %v)
>        }
>
>        define void @g() {
>          %t = ...
>          ;; "unknown" is a tag LLVM does not have any special knowledge of
>          call void @f(i32* %t) "unknown"()
>        }
>
>      to
>
>        define void @f(i32* %v) {
>          store i32 10, i32* %v
>          print(10)
>        }
>
>        define void @g() {
>          %t = ...
>          call void @f(i32* %t) "unknown"()
>        }
>
>      The arbitrary heap clobbering only happens on the boundaries of
>      the call operation, and therefore we can still do store-load
>      forwarding *within* `@f`.
>
> Since we haven't specified any "pure" LLVM way of accessing the
> contents of operand bundles, the client is required to model such
> accesses as calls to opaque functions (or inline assembly). This
> ensures that things like IPSCCP work as intended. E.g. it is legal to
> optimize
>
>     define i32 @f(i32* %v) { ret i32 10 }
>
>     define void @g() {
>       %t = i32* @malloc(...)
>       %v = call i32 @f(i32* %t) "unknown"(i32* %t)
>       print(%v)
>     }
>
> to
>
>     define i32 @f(i32* %v) { ret i32 10 }
>
>     define void @g() {
>       %t = i32* @malloc(...)
>       %v = call i32 @f(i32* %t) "unknown"(i32* %t)
>       print(10)
>     }
>
> LLVM won't generally be able to inline through calls and invokes with
> operand bundles -- the inliner does not know what to replace the
> arbitrary heap accesses implied on function entry and exit with.
> However, we intend to teach the inliner to inline through calls /
> invokes with some specific kinds of operand bundles.
>
> # Lowering
>
> The lowering strategy will be special cased for each bundle tag.
> There won't be any "generic" lowering strategy -- `llc` is expected to
> abort if it sees an operand bundle that it does not understand.
>
> There is no requirement that the operand bundles actually make it to
> the backend. Rewriting the operand bundles into "vanilla" LLVM IR at
> some point in the pipeline (instead of teaching codegen to lower them)
> is a perfectly reasonable lowering strategy.
>
> # Example use cases
>
> A couple of usage scenarios are very briefly described below:
>
> ## Deoptimization
>
> This is our motivating use case. Some managed environments expect to
> be able to discover the state of the abstract virtual machine at
> specific call sites. LLVM will be able to support this requirement by
> attaching a `"deopt"` operand bundle containing the state of the
> abstract virtual machine (as a vector of SSA values) at the
> appropriate call sites. There is a straightforward way to extend the
> inliner to work with `"deopt"` operand bundles.
>
> `"deopt"` operand bundles will not have to be as pessimistic about
> heap effects as the general "unknown operand bundle" case -- they only
> imply a read from the entire heap on function entry or function exit,
> depending on what kind of deoptimization state we're interested in.
> They also don't imply escaping semantics.
>
> ## Value injection
>
> By passing in one or more `alloca`s to an `"injectable-value"` tagged
> operand bundle, languages can allow the runtime to overwrite the
> values of specific variables, while still preserving a significant
> amount of optimization potential.
>
> Thoughts?

This seems like a pretty useful, generic call-site annotation mechanism.

Agreed. It seems like these would be useful for our existing patchpoints too (to record the live values for the associated stack map, instead of using extra intrinsic arguments for them).

-Hal

That’s specifically the intent. This mechanism will allow us to work towards replacing (or at least greatly simplifying) both patchpoint and statepoints.

A high level summary of the proposal as it stands right now (from my
perspective), after
incorporating Philip's suggestions:

1. Operand bundles are a way to associate a set of SSA values with a
    call or invoke.

2. Operand bundles are lowered in some arbitrary bundle-tag specific
    manner.

3. The optimizer can optimize around operand bundles with (roughly)
    the assumption that they're just extra arguments to the call /
    invoke. In particular, the optimizer does not have to assume that
    operand bundles imply any memory / IO effects beyond what is
    apparent from the call.

4. Through the discussion we came up with a re-ordering
    restriction we'll have to place on function calls / invokes that
    may deoptimize their caller. This is orthogonal to the operand
    bundles discussion, and will be implemented as a separate call
    attribute.

Is everyone on the thread comfortable enough with the general idea
that I can start writing patches and sending them in for review?

-- Sanjoy

I am.

Looks good to me too. Thanks.

Swaroop.

Initial set of patches are up for review at:

http://reviews.llvm.org/D12455
http://reviews.llvm.org/D12456
http://reviews.llvm.org/D12457

Thanks,
-- Sanjoy

Just wanted to confirm that I too like where this is going. =] I think Philip and others have really handled the bulk of the review, and I’m very comfortable with them finishing the patch review.

One issue where I wanted to chime in, hopefully just to add some clarity, is the “readonly” vs operand bundle set of (interrelated) issues.

First, as I think Philip already said, I think it is important that a readonly or a readnone attribute on a call is absolute. Optimizations shouldn’t have to go look for an operand bundle. Instead, we should prevent the call-side attributes from being added.

I think there may be a separate way of specifying all of this that makes things clearer. Operand bundles imply that when lowering, the call may be wrapped with a call to an external function before and/or after the called function, with the bundled operands escaped into those external functions which may capture, etc.

This both gives you the escape semantics, and it gives you something else; the runtime function might not return! That should (I think) exactly capture the semantic issue you were worried about with deopt. Because control may never reach the called function, or may never return to the caller even if the callee returns, code motion of side-effects would be clearly prohibited.
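As a sketch of this interpretation (with hypothetical `@bundle_enter`
and `@bundle_exit` stand-ins for the unknown runtime functions), a
call like

    call void @f(i32* %t) "unknown"(i32* %t)

would be treated by the optimizer as if it were

    ;; may capture %t, read / write the heap, or never return
    call void @bundle_enter(i32* %t)
    call void @f(i32* %t)
    ;; likewise may read / write the heap, or never return to the caller
    call void @bundle_exit(i32* %t)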

Does this make sense as an approach to specifying things? (Or worse, are you already there, and I’m just arriving late to the party?)

-Chandler

Hi Chandler,

Thanks for replying!

First, as I think Philip already said, I think it is important that a
readonly or a readnone attribute on a call is absolute. Optimizations
shouldn't have to go look for an operand bundle. Instead, we should prevent
the call-side attributes from being added.

I think Philip's concern was more about the *difference* between the
call side attributes and attributes on the function.

Say you have

define i32 @f() {
  ret i32 42
}

define void @g() {
  call void @f() [ "foo"(i32 100) ]
  ret void
}

Now I think we all agree that the call to `@f` cannot be marked as
`readnone` to have deopt semantics. We can (I suspect without too
much churn) make sure LLVM does not take such a `call` and mark it as
`readnone`.

However, `-functionattrs` (and related passes) are still allowed to
mark the *function* (`@f`) as `readnone`, and I think it would be very
weird if we disallowed that (since we'll have to iterate through all
of `@f`'s uses).

This brings us to the weird situation where we can have a
not-`readnone` call to a function that's marked `readnone`. This was
Philip's concern -- the semantics of the call is no longer the most
precise that can be deduced by looking at both the call and function
attributes. We'd possibly have issues with passes that looked at the
`CS.getCalledFunction()`'s attributes and decided to do an illegal
reordering because the function was marked `readnone`.

I think there may be a separate way of specifying all of this that makes
things clearer. Operand bundles imply that when lowering, the call may be
wrapped with a call to an external function before and/or after the called
function, with the bundled operands escaped into those external functions
which may capture, etc.

This both gives you the escape semantics, and it gives you something else;
the runtime function might not return! That should (I think) exactly capture
the semantic issue you were worried about with deopt. Because control may
never reach the called function, or may never return to the caller even if
the callee returns, code motion of side-effects would be clearly prohibited.

This is sort of what I was getting at when I said

"As a meta point, I think the right way to view operand bundles is as
something that *happens* before and after a call / invoke, not as a
set of values being passed around."

But with this scheme, the issue with a function's attributes being out
of sync with its actual semantics at a call site still exists.

I think a reasonable specification is to add a function attribute
`may_deopt_caller`[1]. Only functions that are marked
`may_deopt_caller` can actually access the operand bundles that was
passed to the function at a call site, and `may_deopt_caller` implies
all of the reordering restrictions we are interested in.
`-functionattrs` is not allowed to mark a `may_deopt_caller` function
as `readnone` (say) because they're not. If we wanted to be really
clever, we could even DCE deopt operand bundles in calls to functions
that are not marked `may_deopt_caller`.
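A sketch of what this could look like (the attribute name and its
exact placement are provisional, as noted):

    ;; may call into the runtime and deoptimize its caller;
    ;; -functionattrs must not infer readnone / readonly here
    define void @side_exit() may_deopt_caller {
      call void @deoptimize_my_caller()
      ret void
    }

    define void @store_field(i64* %ref) {
      ;; the "deopt" bundle is meaningful only because @side_exit
      ;; is marked may_deopt_caller
      call void @side_exit() "deopt"(i64* %ref)
      ret void
    }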

This does bring up the semantic issue of whether `may_deopt_caller` is
truly a property of the callee, or am I just trying to come up with
arbitrary conservative attributes to sweep a complex issue under the
carpet. I'll have to spend some time thinking about this, but at this
time I think it is the former (otherwise I wouldn't be writing this
:)) -- typically a callee has to *do* something to deopt its caller,
and that's usually a call to the runtime. `may_deopt_caller` in this
case is a conservative attribute stating that the callee may execute
such a deopting call. The most similar existing attribute I can find
is `returns_twice`.

It is (conservatively) okay to mark any function with
`may_deopt_caller`; and if LLVM's only concern was compiling for
managed, deopting environments, I'd consider making `may_deopt_caller`
the default and having an attribute `does_not_deopt` to indicate its
negation. `does_not_deopt` would then be closer to more common
attributes like `readonly` and `argmemonly` -- its presence makes
optimization more effective, and its absence is conservatively
correct.

[1]: We can call it something more generic too, like
`inspects_stack_state` etc. That bike shed will be painted later.

-- Sanjoy

Hi Chandler,

Thanks for replying!

First, as I think Philip already said, I think it is important that a
readonly or a readnone attribute on a call is absolute. Optimizations
shouldn’t have to go look for an operand bundle. Instead, we should prevent
the call-side attributes from being added.

I think Philip’s concern was more about the difference between the
call side attributes and attributes on the function.

Say you have

define i32 @f() {
  ret i32 42
}

define void @g() {
  call void @f() [ "foo"(i32 100) ]
  ret void
}

Now I think we all agree that the call to @f cannot be marked as
readnone to have deopt semantics. We can (I suspect without too
much churn) make sure LLVM does not take such a call and mark it as
readnone.

However, -functionattrs (and related passes) are still allowed to
mark the function (@f) as readnone, and I think it would be very
weird if we disallowed that (since we’ll have to iterate through all
of @f's uses).

This brings us to the weird situation where we can have a
not-readnone call to a function that’s marked readnone. This was
Philip’s concern – the semantics of the call is no longer the most
precise that can be deduced by looking at both the call and function
attributes. We’d possibly have issues with passes that looked at the
CS.getCalledFunction()'s attributes and decided to do an illegal
reordering because the function was marked readnone.

While I’m still mulling it over, I think that if we want something like operand bundles, we really need to move to the point where the only valid set of attributes to query is the call attributes when trying to understand the semantics of a call instruction. I actually like this model better. It clearly separates the idea that a particular call instruction’s semantics are modeled by a particular call instruction attribute set. A particular function’s semantics are modeled by its attribute set. Depending on the nature of the query, you should look at different ones.

Historically, getting this wrong only manifested in missed optimizations. With the ability to add extra functionality to call instructions (outside of the called function) we inherently introduce the concept of this being a correctness issue. I think we’ll have to carefully audit the optimizer here, but I’m not (yet) too worried about the ramifications.

I think there may be a separate way of specifying all of this that makes
things clearer. Operand bundles imply that when lowering, the call may be
wrapped with a call to an external function before and/or after the called
function, with the bundled operands escaped into those external functions
which may capture, etc.

This both gives you the escape semantics, and it gives you something else;
the runtime function might not return! That should (I think) exactly capture
the semantic issue you were worried about with deopt. Because control may
never reach the called function, or may never return to the caller even if
the callee returns, code motion of side-effects would be clearly prohibited.
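A sketch of that lowering model, using hypothetical runtime entry
points @runtime_before and @runtime_after (neither is a real LLVM
symbol):

```llvm
; Conceptual lowering of:  %r = call i32 @f() [ "foo"(i32* %p) ]
declare void @runtime_before(i32*)
declare void @runtime_after(i32*)
declare i32 @f()

define i32 @g.lowered(i32* %p) {
  ; The bundle operands escape into opaque runtime calls, which may
  ; capture them, read or write memory, or never return at all.
  call void @runtime_before(i32* %p)
  %r = call i32 @f()
  call void @runtime_after(i32* %p)
  ret i32 %r
}
```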

This is sort of what I was getting at when I said

“As a meta point, I think the right way to view operand bundles is as
something that happens before and after a call / invoke, not as a
set of values being passed around.”

But with this scheme, the issue with a function’s attributes being out
of sync with its actual semantics at a call site still exists.

I think a reasonable specification is to add a function attribute
may_deopt_caller[1]. Only functions that are marked
may_deopt_caller can actually access the operand bundles that were
passed to them at a call site, and may_deopt_caller implies
all of the reordering restrictions we are interested in.
-functionattrs is not allowed to mark a may_deopt_caller function
as readnone (say) because it isn’t. If we wanted to be really
clever, we could even DCE deopt operand bundles in calls to functions
that are not marked may_deopt_caller.
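In IR terms, the proposal might look like the following sketch.
may_deopt_caller is the proposed attribute and does not exist in
LLVM; @runtime_deopt and the "deopt" tag are likewise placeholders:

```llvm
declare void @runtime_deopt()

; @callee may call into the runtime and deopt its caller, so it
; carries the proposed attribute and can never be inferred readnone.
define void @callee() may_deopt_caller {
  call void @runtime_deopt()
  ret void
}

define void @caller() {
  ; The deopt bundle is meaningful here only because @callee is
  ; may_deopt_caller; on a call to a function without the attribute
  ; the bundle could be dropped as dead.
  call void @callee() [ "deopt"(i32 0) ]
  ret void
}
```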

This does bring up the semantic issue of whether may_deopt_caller is
truly a property of the callee, or whether I’m just trying to come up
with arbitrary conservative attributes to sweep a complex issue under
the carpet. I’ll have to spend some time thinking about this, but at
this point I think it is the former (otherwise I wouldn’t be writing
this :)) – typically a callee has to do something to deopt its
caller, and that’s usually a call to the runtime. may_deopt_caller in
this case is a conservative attribute stating that the callee may
execute such a deopting call. The most similar existing attribute I
can find is returns_twice.

I really think this just happens to be the special case of deopt, and that it is a mistake to design the IR extension based solely on that use case.

Consider many of the other decorator patterns that have been discussed as uses of this IR functionality. If the runtime logic invoked before or after the function can read or write memory other than what the callee does, we are moving to a point where the call instruction’s annotations (attributes + operand bundles) introduce a more restrictive semantic model than the function attributes alone.

I’m actually much more comfortable with the highly generic approach and eating the cost of teaching the optimizer about this distinction.