RFC: Add guard intrinsics to LLVM

global *ptr
declare @foo() readwrite
def @bar() { call @foo() [ "deopt"(XXX) ]; *ptr = 42 }
def @baz() { call @bar() [ "deopt"(YYY) ]; int v0 = *ptr }

Naively, it looks like an inter-proc CSE can forward 42 to v0, but
that's unsound, since @bar could get deoptimized at the call to
@foo(), and then who knows what'll get written to *ptr.

Ok, I think this example does a good job of getting at the root issue. You
claim this is not legal, I claim it is. :) Specifically, the use of
the inferred information will never be executed in @baz. (see below)

Specifically, I think the problem here is that we're mixing a couple of
notions. First, we've got the state required for the deoptimization to
occur (i.e. deopt information). Second, we've got the actual deoptimization
mechanism. Third, we've got the *policy* under which deoptimization occurs.

The distinction between the latter two is subtle and important. The
*mechanism* of exiting the callee and replacing it with an arbitrary
alternate implementation could absolutely break the deopt semantics as
you've pointed out. The policy we actually use does not. Specifically,
we've got the following restrictions:
1) We only replace callees with more general versions of themselves. Given
we might be invalidating a speculative assumption, this could be a *much*
more general version, including actions and control flow that invalidate any
attribute inference done over the callee.
2) We invalidate all callers of @foo which could have observed the incorrect
inference. (This is required to preserve correctness.)

Yes. I think I was too dramatic when I claimed that the
deoptimization model in LLVM is "wrong" -- the real story is more along
the lines of "frontend authors need to be aware of some subtleties".

I think we probably need to separate out something to represent the
interposition/replacement semantics implied by invalidation deoptimization.
In its most generic form, this would model the full generality of the
mechanism and thus prevent nearly all inference. We could then clearly
express our *policy* as a restriction over that full generality.

Yes. LLVM already has a "mayBeOverridden" flag; we should just add a
function attribute, `interposable`, that makes `mayBeOverridden` return
true.

Per above, I think we're fine for invalidation deoptimization.

For side exits, the runtime function called can never be marked readonly (or
just about any other restricted semantics) precisely because it can execute
an arbitrary continuation.

The problem is a little easier with side exits, since with side exits,
we will either have them at the tail position, or have them follow an
unreachable (so having them as read/write/may-unwind is not a problem).

With guards, we have to solve a harder problem -- we don't want to
mark the guard as "can read write all memory", since we'd like to
forward `val` to `val1` in the example below:

  int val = *ptr;
  guard_on(arbitrary condition)
  int val1 = *ptr;

But, as discussed earlier, we're probably okay if we mark guard_on as
read/write; and use alias analysis to sneakily make it "practically
readonly".

-- Sanjoy

Hi Andy,

Responses inline:

By memory barrier, do you mean things like fences?

That's right. Conservatively, I would not hoist at the LLVM level past
opaque non-readonly calls or fences.

Just for curiosity: why do you care about memory fences specifically?

@trap_on would produce a side effect (crashing), and LLVM needs some
limit on the extent to which side effects are reordered. I don't
particularly care about fences, but if we don't respect them then what
do we respect? Only the front end truly knows the semantics of traps
and other side effects. To me fences are just an initial stand-in at
the LLVM level for those code motion boundaries. More precise intrinsics
could be introduced eventually if we need to move traps across fences.

I actually see fences as a proxy for potential inter-process
communication and I/O. It's important that any opaque library call
could contain a fence.

For example, you brought up the infinite loop case. In general, to
make an infinite loop externally observable, it needs to read from a
volatile. Doing that would naturally prevent traps from being hoisted
above the loop.

I've begun to think that may-unwind (or may fail guard) and readonly
should be mutually exclusive. readonly refers to all system memory,
not just LLVM-visible memory. To achieve the effect of
"aliases-with-nothing-in-llvm" it's cleaner to use alias analysis.

(I've put this on the bug as well:)

18912 – Optimizer should consider "maythrow" calls (those without "nounwind") as having side effects.

(In reply to comment #2)
> The test case in this bug report is fixed by
> http://reviews.llvm.org/rL256728.
>
> I'm closing this because I don't have a test case and I no longer think it
> makes much sense to mark may-unwind calls "readonly". An unwind path always
> touches some memory. The fact that it doesn't alias with LLVM-visible

I'm not sure about this -- why does an unwind path *have* to write
some memory? Can't you implement unwind as "read the appropriate
exception handler RPC from a table, and return to that RPC"?

Just for the record, since the discussion has moved away from "readonly"…

I agree. As I said in the bug though, we should distinguish between
system memory and the memory addresses visible to LLVM. If a
personality function touches memory, the resume/unwind probably should not
be marked readonly even if it doesn't alias with anything in LLVM. That said, I
think the more important issue is whether the unwindable calls can be
reordered. If not, I don't think they should be readonly.

> memory access can be handled by alias analysis.

Btw, I think in this interpretation it is incorrect for -functionattrs
to infer readnone for @foo below (which it does today):

define void @foo({ i8*, i32 } %val) personality i8* null {
resume { i8*, i32 } %val
}

From the PR: the -functionattrs behavior looks like a bug to me, strictly speaking
(how can it assume the memory behavior of the personality function?)…
But maybe there's a good reason for allowing this optimization.

-Andy

Wait a sec… it's legal for the frontend to do the interprocedural CSE, but not LLVM. The frontend can guarantee that multiple functions can be deoptimized as a unit, but LLVM can't make that assumption. As far as it knows, @guard_on will resume in the immediate caller.

-Andy

This makes perfect sense to me now, especially if you want to use
@trap_on for safety checks. Without re-ordering restrictions, a
failed @trap_on will end up looking a lot like UB (since, say, you
could reorder a failing range check across a call to fflush); and so
that would be bad. You'd still have to be careful around
inaccessiblememonly and friends, though.

-- Sanjoy

I either missed that attribute or forgot about it. The semantics aren't well specified. I would hope that it works as follows:
inaccessiblememonly functions, without further constraints, cannot be reordered w.r.t. each other or CSE'd. However, the readonly + inaccessiblememonly attributes could be combined to allow both of those transformations.

@trap_on could be reordered with readonly + inaccessiblememonly calls.

I'm suggesting that readonly should refer to both accessible (LLVM) and inaccessible (system) memory.

-Andy

Assuming everyone is on the same page, here's a rough high level agenda:

# step A: Introduce an `interposable` function attribute

We can bike shed on the name and the exact specification, but the
general idea is that you cannot do IPA / IPO over callsites calling
`interposable` functions without inlining them. This attribute will
(usually) have to be used on function bodies that can deoptimize (e.g. has a
side exit / guard in it); but also has more general use cases.

# step B: Introduce a `side_exit` intrinsic

Specify an `@llvm.experimental.side_exit` intrinsic, polymorphic on the
return type:

- Consumes a "deopt" continuation, and replaces the current physical
  stack frame with one or more interpreter frames (implementation
  provided by the runtime).
- Calls to this intrinsic must be `musttail` (verifier will check this)
- We'll have some minor logic in the inliner such that when inlining @f into @g
  in

    define i32 @f() {
      if (X) return side_exit() [ "deopt"(X) ];
      return i32 20;
    }

    define i64 @g() {
      if (Y) {
        r = f() [ "deopt"(Y) ];
        print(r);
      }
    }

  We get

    define i64 @g() {
      if (Y) {
        if (X) return side_exit() [ "deopt"(Y, X) ];
        print(20);
      }
    }

  and not

    define i64 @g() {
      if (Y) {
        r = X ? (side_exit() [ "deopt"(Y, X) ]) : 20;
        print(r);
      }
    }

# step C: Introduce a `guard_on` intrinsic

Will be based around what was discussed / is going to be discussed on
this thread.

(I think Philip was right in suggesting to split out a "step B" that
only introduces a `side_exit` intrinsic. We *will* have to specify
them, since we'd like to optimize some after we've lowered guards into
explicit control flow, and for that we need a specification of side
exits.)

# aside: non-managed languages and guards

Chandler raised some points on IRC around making `guard_on` (and
possibly `side_exit`?) more generally applicable to unmanaged
languages; so we'd want to be careful to specify these in a way that
allows for implementations in an unmanaged environments (by function
cloning, for instance).

-- Sanjoy

inaccessiblememonly functions, without further constraints, cannot be reordered w.r.t. each other or CSE'd.
However, readonly + inaccessiblememonly attributes could be combined to allow both of those transformations

Although this seems right, from what I understand, readonly by itself should be sufficient to allow these optimizations.

More importantly, inaccessiblememonly was introduced to aid GlobalsAA, so that it can say that a function, even though not marked readonly, may not access a program-visible global. Currently the attribute is not used anywhere, and there is a plan to remove it in future versions.

Thanks,

I noticed this after sending, but the examples have some potential for
confusion -- the X in the deopt state has nothing specifically to do
with the X in the condition.

     define i32 @f() {
       if (X) return side_exit() [ "deopt"(X) ];
       return i32 20;
     }

     define i64 @g() {
       if (Y) {
         r = f() [ "deopt"(Y) ];
         print(r);
       }
     }

   We get

     define i64 @g() {
       if (Y) {
         if (X) return side_exit() [ "deopt"(Y, X) ];
         print(20);
       }
     }

   and not

     define i64 @g() {
       if (Y) {
         r = X ? (side_exit() [ "deopt"(Y, X) ]) : 20;
         print(r);
       }
     }

-- Sanjoy

Assuming everyone is on the same page, here's a rough high level agenda:

# step A: Introduce an `interposable` function attribute

We can bike shed on the name and the exact specification, but the
general idea is that you cannot do IPA / IPO over callsites calling
`interposable` functions without inlining them. This attribute will
(usually) have to be used on function bodies that can deoptimize (e.g. has a
side exit / guard in it); but also has more general use cases.

+1

# step B: Introduce a `side_exit` intrinsic

Specify an `@llvm.experimental.side_exit` intrinsic, polymorphic on the
return type:

I didn't know intrinsics could be polymorphic on the return type.

- Consumes a "deopt" continuation, and replaces the current physical
  stack frame with one or more interpreter frames (implementation
  provided by the runtime).
- Calls to this intrinsic must be `musttail` (verifier will check this)
- We'll have some minor logic in the inliner such that when inlining @f into @g
  in

    define i32 @f() {
      if (X) return side_exit() [ "deopt"(X) ];
      return i32 20;
    }

    define i64 @g() {
      if (Y) {
        r = f() [ "deopt"(Y) ];
        print(r);
      }
    }

  We get

    define i64 @g() {
      if (Y) {
        if (X) return side_exit() [ "deopt"(Y, X) ];
        print(20);
      }
    }

  and not

    define i64 @g() {
      if (Y) {
        r = X ? (side_exit() [ "deopt"(Y, X) ]) : 20;
        print(r);
      }
    }

I understand why you're doing this: explicitly model the resume-at-return path. But…

- It's a bit awkward vs. side_exit(); unreachable, as evidenced by inlining.

- It would be nice to be able to model frequent OSR points as branch-to-unreachable because it may lead to better optimization, codegen, and compile time. I don't think those are really fundamental problems, aside from adding a large number of return block users, but it may be work to find all of the small performance issues.

- Do you think this will make sense for all return argument conventions, including sret?

(I actually think this is a great approach, I'm just playing Devil's advocate here.)

# step C: Introduce a `guard_on` intrinsic

Will be based around what was discussed / is going to be discussed on
this thread.

(I think Philip was right in suggesting to split out a "step B" that
only introduces a `side_exit` intrinsic. We *will* have to specify
them, since we'd like to optimize some after we've lowered guards into
explicit control flow, and for that we need a specification of side
exits.)

+1

-Andy

I've not had time to really dig into all of this thread, but I wanted to point out:

Assuming everyone is on the same page, here's a rough high level agenda:

step A: Introduce an interposable function attribute

We can bike shed on the name and the exact specification, but the
general idea is that you cannot do IPA / IPO over callsites calling
interposable functions without inlining them. This attribute will
(usually) have to be used on function bodies that can deoptimize (e.g. has a
side exit / guard in it); but also has more general use cases.

Note that we already have this exact concept in the IR via linkage for better or worse. I think it is really confusing as you are currently describing it because it seems deeply overlapping with linkage, which is where the whole interposition thing comes from, and yet you never mention how it interacts with linkage at all. What does it mean to have a common linkage function that lacks the interposable attribute? Or a LinkOnceODR function that does have that attribute?

If the goal is to factor replaceability out of linkage, we should actually factor it out rather than adding yet one more way to talk about this.

And generally, we need to be really careful adding function attributes. Look at the challenges we had figuring out norecurse. Adding attributes needs to be viewed as nearly as high cost as adding instructions, substantially higher cost than intrinsics.

step B: Introduce a side_exit intrinsic

Specify an @llvm.experimental.side_exit intrinsic, polymorphic on the
return type:

  • Consumes a ā€œdeoptā€ continuation, and replaces the current physical
    stack frame with one or more interpreter frames (implementation
    provided by the runtime).

I think it would be really helpful to work to describe these things in terms of semantic contracts on the IR rather than in terms of implementation strategies. For example, not all IR interacts with an interpreter, and so I don't think we should use the term "interpreter" to specify the semantic model exposed by the IR.

I didn't know intrinsics could be polymorphic on the return type.

@llvm.experimental.gc.result is polymorphic on its return type, for
instance.

I understand why you're doing this: explicitly model the
resume-at-return path. But…

- It's a bit awkward vs. side_exit(); unreachable, as evidenced by
inlining.

Part of the reason why we thought this scheme,
return-result-of-side-exit, was better than the
side-exit-then-unreachable scheme is that the former is more honest
about data flow; and that would prevent some nastiness around IPA. But
given that we're talking about introducing an `interposable` attribute
that prevents IPA, the side-exit-then-unreachable approach sounds
feasible now. I need to see if there are other reasons for keeping
the return-result-of-side-exit variant; if not, I'll use the
side-exit-then-unreachable scheme.

- It would be nice to be able to model frequent OSR points as
branch-to-unreachable because it may lead to better optimization,
codegen, and compile time.

Agreed.

I don't think those are really fundamental
problems though aside from adding a large number of return block
users

Return block users? Does LLVM coalesce all `ret` instructions to a
single `ret PHI`? I couldn't reproduce this in a small example IR.

but it may be work to find all of the small performance issues.

- Do you think this will make sense for all return argument
conventions, including sret?

I'm not very familiar with sret, but skimming the docs I don't see why
not. But generally, the frontend will have to know to generate
@side_exits that are legal.

-- Sanjoy

# step A: Introduce an `interposable` function attribute

We can bike shed on the name and the exact specification, but the
general idea is that you cannot do IPA / IPO over callsites calling
`interposable` functions without inlining them. This attribute will
(usually) have to be used on function bodies that can deoptimize (e.g. has a
side exit / guard in it); but also has more general use cases.

Note that we already have this *exact* concept in the IR via linkage for
better or worse. I think it is really confusing as you are currently

I was going to have a more detailed discussion on this in the (yet to
be started) review thread for `interposable`: we'd like to be able to
inline `interposable` functions. The "interposition" can only happen
at physical function boundaries, so opt is allowed to do as much
IPA/IPO as it wants once it makes the physical function boundary go away
via inlining. None of the linkage types seem to have this property.

Part of the challenge here is to specify the attribute in a way that
allows inlining, but not IPA without inlining. In fact, maybe it is
best to not call it "interposable" at all?

Actually, I think one of the problems we're trying to solve with
`interposable` is applicable to the available_externally linkage as
well. Say we have

void foo() available_externally {
  %t0 = load atomic %ptr
  %t1 = load atomic %ptr
  if (%t0 != %t1) print("X");
}
void main() {
  foo();
  print("Y");
}

Now the possible behaviors of the above program are {print("X"),
print("Y")} or {print("Y")}. But if we run opt then we have

void foo() available_externally readnone nounwind {
  ;; After CSE'ing the two loads and folding the condition
}
void main() {
  foo();
  print("Y");
}

and some generic reordering

void foo() available_externally readnone nounwind {
  ;; After CSE'ing the two loads and folding the condition
}
void main() {
  print("Y");
  foo();  // legal since we're moving a readnone nounwind function that
          // was guaranteed to execute (hence can't have UB)
}

Now if we do not inline @foo(), and instead re-link the call site in
@main to some non-optimized copy (or differently optimized copy) of
foo, then it is possible for the program to have the behavior
{print("Y"); print ("X")}, which was disallowed in the earlier
program.

In other words, opt refined the semantics of @foo() (i.e. reduced the
set of behaviors it may have) in ways that would make later
optimizations invalid if we de-refine the implementation of @foo().

Given this, I'd say we don't need a new attribute / linkage type, and
can add our restriction to the available_externally linkage.

describing it because it seems deeply overlapping with linkage, which is
where the whole interposition thing comes from, and yet you never mention
how it interacts with linkage at all. What does it mean to have a common
linkage function that lacks the interposable attribute? Or a LinkOnceODR
function that does have that attribute?

What would you say about adding this as a new kind of linkage? I was
trying to avoid doing that since the intended semantics of
GlobalValue::InterposableLinkage don't just describe what a linker
does, but also restrict what can be legally linked in (for the
can-inline-but-can't-IPA property to hold); but perhaps that's the
best way forward?

[Edit: I wrote this section before I wrote the available_externally
thing above.]

If the goal is to factor replaceability out of linkage, we should actually
factor it out rather than adding yet one more way to talk about this.

And generally, we need to be *really* careful adding function attributes.
Look at the challenges we had figuring out norecurse. Adding attributes
needs to be viewed as nearly as high cost as adding instructions,
substantially higher cost than intrinsics.

Only indirectly relevant to this discussion, but this is news to me --
my mental cost model was "attributes are easy to add and maintain", so
I didn't think too hard about alternatives.

I think it would be really helpful to work to describe these things in terms
of semantic contracts on the IR rather than in terms of implementation
strategies. For example, not all IR interacts with an interpreter, and so I
don't think we should use the term "interpreter" to specify the semantic
model exposed by the IR.

That's what I was getting at by:

Chandler raised some points on IRC around making `guard_on` (and
possibly `side_exit`?) more generally applicable to unmanaged
languages; so we'd want to be careful to specify these in a way that
allows for implementations in an unmanaged environments (by function
cloning, for instance).

-- Sanjoy

step A: Introduce an interposable function attribute

We can bike shed on the name and the exact specification, but the
general idea is that you cannot do IPA / IPO over callsites calling
interposable functions without inlining them. This attribute will
(usually) have to be used on function bodies that can deoptimize (e.g. has a
side exit / guard in it); but also has more general use cases.

Note that we already have this exact concept in the IR via linkage for
better or worse. I think it is really confusing as you are currently

I was going to have a more detailed discussion on this in the (yet to
be started) review thread for interposable: we'd like to be able to
inline interposable functions. The "interposition" can only happen
at physical function boundaries, so opt is allowed to do as much
IPA/IPO as it wants once it makes the physical function boundary go away
via inlining. None of the linkage types seem to have this property.

Part of the challenge here is to specify the attribute in a way that
allows inlining, but not IPA without inlining. In fact, maybe it is
best to not call it "interposable" at all?

Yea, this is something very different from interposable. GCC and other compilers that work to support symbol interposition make specific efforts to not inline them in specific ways (that frankly I don't fully understand, as it doesn't seem to be always, which is what the definition of interposable indicates to me...).

Actually, I think one of the problems we're trying to solve with
interposable is applicable to the available_externally linkage as
well. Say we have

void foo() available_externally {
  %t0 = load atomic %ptr
  %t1 = load atomic %ptr
  if (%t0 != %t1) print("X");
}
void main() {
  foo();
  print("Y");
}

Now the possible behaviors of the above program are {print("X"),
print("Y")} or {print("Y")}. But if we run opt then we have

void foo() available_externally readnone nounwind {
  ;; After CSE'ing the two loads and folding the condition
}
void main() {
  foo();
  print("Y");
}

and some generic reordering

void foo() available_externally readnone nounwind {
  ;; After CSE'ing the two loads and folding the condition
}
void main() {
  print("Y");
  foo();  // legal since we're moving a readnone nounwind function that
          // was guaranteed to execute (hence can't have UB)
}

Now if we do not inline @foo(), and instead re-link the call site in
@main to some non-optimized copy (or differently optimized copy) of
foo, then it is possible for the program to have the behavior
{print(ā€œYā€); print (ā€œXā€)}, which was disallowed in the earlier
program.

In other words, opt refined the semantics of @foo() (i.e. reduced the
set of behaviors it may have) in ways that would make later
optimizations invalid if we de-refine the implementation of @foo().

Given this, I'd say we don't need a new attribute / linkage type, and
can add our restriction to the available_externally linkage.

Interesting example, I agree it seems quite broken. Even more interesting, I can't see anything we do in LLVM that prevents this from breaking essentially everywhere. =[[[[[[

link_once and link_once_odr at least seem equally broken because we don't put the caller and callee into a single comdat or anything to ensure that the optimized one is selected at link time.

But there are also multiple different kinds of overriding we should think about:

  1. Can the definition get replaced at link time (or at runtime via an interpreter) with a differently optimized variant stemming from the same definition (thus it has the same behavior but not the same refinement). This is the "ODR" guarantee in some linkages (and vaguely implied for available_externally)

  2. Can the definition get replaced at link time (or at runtime via an interpreter) with a function that has fundamentally different behavior

  3. To support replacing the definition, the call edge must be preserved.

To support interposition you need #3, the most restrictive model. LLVM (I think) actually does a decent job of modeling this, as we say that the function is totally opaque. We don't do IPA or inlining. But I don't think that's what you're looking for.

I'm curious whether your use case is actually in the #1 bucket or #2 bucket. That is, I'm wondering if there is any way in which the "different implementation" would actually break in the face of optimizations on things like non-deduced function attributes, etc.

If your use case looks more like #1, then I actually think this is what we want for link_once_odr and available_externally. You probably want the former rather than the latter as you don't want it to be discardable.

If your use case looks more like #2, then I think it's essentially "link_once" or "link_any", and it isn't clear that LLVM does a great job of modeling this today.

I'd be mildly interested in factoring the discarding semantics from the "what do other definitions look like" semantics. The former are what I think fit cleanly into linkages, and the latter I think we wedged into them because they seemed to correspond in some cases and because attributes used to be very limited in number.

Part of the challenge here is to specify the attribute in a way that
allows inlining, but not IPA without inlining. In fact, maybe it is
best to not call it "interposable" at all?

Yea, this is something *very* different from interposable. GCC and other
compilers that work to support symbol interposition make specific efforts to
not inline them in specific ways (that frankly I don't fully understand, as
it doesn't seem to be always which is what the definition of interposable
indicates to me...).

Sure, not calling it interposable is fine for me. Credit where credit
is due: Philip had warned me about this exact thing offline (that the
term "interposable" is already taken).

In other words, opt refined the semantics of @foo() (i.e. reduced the
set of behaviors it may have) in ways that would make later
optimizations invalid if we de-refine the implementation of @foo().

Given this, I'd say we don't need a new attribute / linkage type, and
can add our restriction to the available_externally linkage.

Interesting example, I agree it seems quite broken. Even more interesting, I
can't see anything we do in LLVM that prevents this from breaking
essentially everywhere. =[[[[[[

link_once and link_once_odr at least seem equally broken because we don't
put the caller and callee into a single comdat or anything to ensure that
the optimized one is selected at link time.

But there are also multiple different kinds of overriding we should think
about:

1) Can the definition get replaced at link time (or at runtime via an
interpreter) with a differently *optimized* variant stemming from the same
definition (thus it has the same behavior but not the same refinement). This
is the "ODR" guarantee in some linkages (and vaguely implied for
available_externally)

2) Can the definition get replaced at link time (or at runtime via an
interpreter) with a function that has fundamentally different behavior

3) To support replacing the definition, the call edge must be preserved.

I'm working in the context of an optimizer that does not know if its
input has been previously optimized or if its input is "raw" IR.
Realistically, I'd say deviating LLVM from this will be painful.
Given that, I don't see how (2) and (3) are different:

Firstly, (1) and (2) are not _that_ different -- a differently
optimized variant of a function can have completely different
observable behavior (e.g. the "original" function could have started
with "if (*ptr != *ptr) { call @unknown(); return; }"). The only
practical difference I can see between (1) and (2) is that in (2)
inlining is incorrect since it would be retroactively invalid on
replacement. In (1) we have the invariant that the function in
question is always *a* valid implementation of what we started with,
but this can not be used to infer anything about the function we'll
actually call at runtime. Thus, I don't understand the difference
between (2) and (3); both of them seem to imply "don't do IPA/IPO,
including inlining" while (1) implies "the only IPA/IPO you can do is
inlining".

I'm curious whether your use case is actually in the #1 bucket or #2
bucket. That is, I'm wondering if there is any way in which the
"different implementation" would actually break in the face of
optimizations on things like *non-deduced* function attributes, etc.

With the understanding I have at this time (which isn't complete, as I
say above) I'd say we're (1). We can replace a possibly inlined
callee with another arbitrary function, but if that happens the runtime
will deoptimize the caller. I'm not sure if I understood your second
statement -- but assuming I did -- we do "manually" attach attributes
to some well-known functions (e.g. in the standard library), but they
never get replaced.

-- Sanjoy

I'm not suggesting that either. I think there is a happy middle ground, but I'm probably not explaining it very effectively, sorry. Lemme just try again.

There are two conceptually separable aspects of IPO as it is commonly performed within LLVM. One is to use attributes on a function to optimize callers. The second is to use the definition of a function to deduce more refined attributes.

This separation is what I was trying to draw attention to between (1) and (2) above. My idea is that with (1) it remains fine to optimize callers based on a function's attributes, but not to deduce more refined attributes. But with (2) I don't think you can do either.

I think (3) differs from both (1) and (2) because in some cases the restrictions only remain if the call edge remains. If you nuke (or rename) the call edge, the restrictions go away completely. In other cases though (my (3) example), the compiler is required to leave that exact call edge in place.

Currently, we clearly don't actually separate these conceptual sides of IPO. We have a very all-or-nothing approach instead. So maybe this distinction isn't interesting. But hopefully it explains how I'm thinking of it. And because frontends can often directly specify some attributes that we know a-priori, it doesn't seem a vacuous distinction to me in theory.

Does that explain things any better?

I'm not suggesting that either. I think there is a happy middle ground, but
I'm probably not explaining it very effectively, sorry. Lemme just try
again.

There are two conceptually separable aspects of IPO as it is commonly
performed within LLVM. One is to use attributes on a function to optimize
callers. The second is to use the definition of a function to deduce more
refined attributes.

But we also have more aggressive opts like IPSCCP that don't fall in
either category.

This separation is what I was trying to draw attention to between (1) and
(2) above. My idea is that with (1) it remains fine to optimize callers
based on a function's attributes, but not to deduce more refined attributes.
But with (2) I don't think you can do either.

I think (3) differs from both (1) and (2) because in some cases the
restrictions only remain *if* the call edge remains. If you nuke (or rename)
the call edge, the restrictions go away completely. In other cases though
(my (3) example), the compiler is required to leave that exact call edge in
place.

Currently, we clearly don't actually separate these conceptual sides of IPO.
We have a very all-or-nothing approach instead. So maybe this distinction
isn't interesting. But hopefully it explains how I'm thinking of it. And
because frontends can often directly specify *some* attributes that we know
a-priori, it doesn't seem a vacuous distinction to me in theory.

Does that explain things any better?

Yes, I think I see what you are going for (what I thought was (2)/(3)
is really just (3) in your scheme). Practically, I don't think it is
useful to differentiate between (1) and (2). To get (2)-like
behavior, the frontend can always emit a function definition without
any pre-defined attributes, annotated with (1)'s linkage type
(available_externally or linkonce_odr).

-- Sanjoy

The "introduce a side exit intrinsic" part of the plan is out for
review: http://reviews.llvm.org/D17732 (it does not require us to
solve the issue with linkage / interposability).

I decided not to call it side_exit, since "side exit" can be mistaken
to mean "side exit from a superblock".