Rotated loop identification

Dear all,

I'm working on a late, target-dependent IR optimization on loops. Part of this
optimization requires deriving "by hand" the trip-count expression of a given
loop. To handle these cases correctly I need to check whether the loop has an
entry guard or not. The problem I have is that, starting from the information I
derive during my analysis (initial IV value, last IV value, comparison kind), I
don't know how to verify whether the entry guard is present, mainly because
algebraic simplifications have changed the entry-guard comparison. Being able to
tell whether a given loop has been rotated would be enough for my purposes, but
I haven't found a way to obtain this.

I think the only way is to track this information in the IR, but I don't know
how to do that in a way that is both safe (the information shouldn't be lost)
and transparent (it should not interfere with the optimizer). Thinking about
custom intrinsics, I don't see how to allow memory optimizations and, at the
same time, prevent the intrinsic from being eliminated or hoisted out of the
target loop.

I would appreciate any hint on this topic.

Thanks in advance for future replies.

Best regards,
Michele Scandale

See ScalarEvolution::isLoopEntryGuardedByCond. It's part of the analysis we use to determine the trip count, and it benefits from rotated loops.

LLVM never annotates the IR with a history of transformations. You could potentially annotate the LoopInfo analysis, but it will only be preserved within the same LoopPassManager (you would have to run right after LoopRotate).

The only robust solution is ScalarEvolution.
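
For concreteness, a rough (untested) sketch of how a late loop pass could ask ScalarEvolution that question, given the initial and limit values your own analysis has already recovered; the helper name here is made up:

/////////////////////////////////////
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Returns true only if every entry into L is dominated by a branch that
// establishes Start <s Limit -- i.e. the guard that LoopRotate emitted.
static bool entryGuardedByIVTest(ScalarEvolution &SE, const Loop *L,
                                 Value *Start, Value *Limit) {
  const SCEV *StartS = SE.getSCEV(Start); // e.g. 0
  const SCEV *LimitS = SE.getSCEV(Limit); // e.g. %div
  return SE.isLoopEntryGuardedByCond(L, ICmpInst::ICMP_SLT, StartS, LimitS);
}
/////////////////////////////////////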

-Andy

Thanks for your reply.

Maybe it wasn't clear, but the optimization I'm writing is target-dependent, so
it's declared inside the target backend and runs after the target-independent
optimizer. Hence I cannot run my pass right after LoopRotate and before
InstCombine. I can still use ScalarEvolution, but the instruction combiner can
in some cases simplify the entry-guard condition.

A small example is the following:

/////////////////////////////////////
extern void x(int);

int foo_int(int a, int b) {
  int i;
  for (i = 0; i < b/2; ++i) {
    x(i);
  }

  return 0;
}
/////////////////////////////////////

the corresponding optimized IR is:

///////////////////////////////////////////////////////////////////////////////
define i32 @foo_int(i32 %a, i32 %b) nounwind uwtable {
entry:
  %div = sdiv i32 %b, 2
  %cmp3 = icmp sgt i32 %b, 1
  br i1 %cmp3, label %for.body, label %for.end

for.body: ; preds = %entry, %for.body
  %i.04 = phi i32 [ %inc, %for.body ], [ 0, %entry ]
  tail call void @x(i32 %i.04) nounwind
  %inc = add nsw i32 %i.04, 1
  %cmp = icmp slt i32 %inc, %div
  br i1 %cmp, label %for.body, label %for.end

for.end: ; preds = %for.body, %entry
  ret i32 0
}
///////////////////////////////////////////////////////////////////////////////

From the IR it is possible to see that the condition in the entry block is

  %cmp3 = icmp sgt i32 %b, 1

but starting from the loop IV one would expect something like

  %cmp3 = icmp slt i32 0, %div

and this is confirmed by the InstCombine debug trace:
////////////////////////////////////////////////
IC: Visiting: %cmp3 = icmp slt i32 0, %div
IC: Old = %cmp3 = icmp sgt i32 %div, 0
    New = <badref> = icmp sge i32 %b, 2
IC: ADD: %cmp3 = icmp sge i32 %b, 2
IC: ERASE %0 = icmp sgt i32 %div, 0
IC: ADD: %div = sdiv i32 %b, 2
IC: Visiting: %div = sdiv i32 %b, 2
IC: Visiting: %cmp3 = icmp sge i32 %b, 2
IC: Old = %cmp3 = icmp sge i32 %b, 2
    New = <badref> = icmp sgt i32 %b, 1
IC: ADD: %cmp3 = icmp sgt i32 %b, 1
IC: ERASE %0 = icmp sge i32 %b, 2
////////////////////////////////////////////////
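
Just to spell out the equivalence InstCombine is using here: for truncating signed division, 0 < b/2 holds exactly when b > 1. A small standalone check of that claim (illustrative only):

/////////////////////////////////////
#include <cassert>
#include <cstdio>

int main() {
  // Check a window around zero; the argument extends to all of i32 because
  // b/2 truncates toward zero, so b/2 > 0 exactly when b >= 2, i.e. b > 1.
  for (int b = -1000; b <= 1000; ++b) {
    bool guardFromIV   = 0 < b / 2; // icmp slt 0, %div
    bool guardCombined = b > 1;     // icmp sgt %b, 1
    assert(guardFromIV == guardCombined);
  }
  std::puts("0 < b/2 and b > 1 agree on [-1000, 1000]");
  return 0;
}
/////////////////////////////////////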

The fact that a loop has been rotated means that it has an entry guard, which in
the case of a countable loop allows the trip-count expression to be optimized:
this is information that can be lost during optimization and may not be easy to
derive with an analysis. Some form of "extra information" in the IR would help
here. If intrinsics with declared side effects are used to carry that
information, I don't know whether they add constraints to the optimizer; on the
other hand, declaring them free of side effects won't preserve the information.

How can all this be solved?

Thanks again.

Best regards,
Michele Scandale

Thanks for the details. Please add them to a bug report.

InstCombine is certainly interfering with our ability to analyze the loop. I think the problem is that ScalarEvolution cannot reason about signed division. This is a general problem independent of your target. At the moment I'm not sure if we can teach ScalarEvolution to reason about this, or if we can defer certain InstCombine transforms until after loop passes run, including target-specific loop passes (it's hard for me to say whether this InstCombine is really a necessary canonicalization before other passes). A bug report would be appropriate.

Absent a fix for the above problem, it is certainly possible for you to add target-specific intrinsics. I can't guarantee that it won't interfere with optimization. It might also be possible to add metadata to the loop exit branch when it is cloned by LoopRotate, indicating that the loop is guarded by the same condition. I don't think we want something like this on trunk though since it is a heavy-handed workaround that could be hard to maintain.

-Andy

Thanks for the details. Please add them to a bug report.

I will do this.

InstCombine is certainly interfering with our ability to analyze the loop. I think the problem is that ScalarEvolution cannot reason about signed division. This is a general problem independent of your target. At the moment I'm not sure if we can teach ScalarEvolution to reason about this, or if we can defer certain InstCombine transforms until after loop passes run, including target-specific loop passes (it's hard for me to say whether this InstCombine is really a necessary canonicalization before other passes). A bug report would be appropriate.

It sounds to me like the same capabilities that InstCombine has should also be
implemented in ScalarEvolution.

Absent a fix for the above problem, it is certainly possible for you to add target-specific intrinsics. I can't guarantee that it won't interfere with optimization. It might also be possible to add metadata to the loop exit branch when it is cloned by LoopRotate, indicating that the loop is guarded by the same condition. I don't think we want something like this on trunk though since it is a heavy-handed workaround that could be hard to maintain.

Following the 'Parallel Loop Metadata' proposal as well, it seems to me that it
faces basically the same problem. Maybe I haven't fully understood things, but
assuming we had the ability to attach metadata to BasicBlocks, such information
could be registered directly on the loop header or latch, without custom
intrinsics and hopefully without interfering with the optimizer.

What do you think about this?

Thanks again.

Best regards,
Michele Scandale

Thanks for the details. Please add them to a bug report.

I will do this.

Thanks.

InstCombine is certainly interfering with our ability to analyze the loop. I think the problem is that ScalarEvolution cannot reason about signed division. This is a general problem independent of your target. At the moment I'm not sure if we can teach ScalarEvolution to reason about this, or if we can defer certain InstCombine transforms until after loop passes run, including target-specific loop passes (it's hard for me to say whether this InstCombine is really a necessary canonicalization before other passes). A bug report would be appropriate.

It sounds to me like the same capabilities that InstCombine has should also be
implemented in ScalarEvolution.

It's doable, just not pretty. We could special-case SCEV's isImpliedCond to look for an sdiv opcode and recognize the inverse of InstCombine's trick. I doubt it makes sense to add an SDIV operator to SCEV since SCEV would want to assume modulus division, and the IR sdiv is a truncating divide.
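
Very roughly, the special case might look something like the sketch below (illustrative only, not existing ScalarEvolution code; the helper name is made up):

/////////////////////////////////////
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"

using namespace llvm;
using namespace llvm::PatternMatch;

// Given a query of the form "(sdiv %X, C) >s 0", rewrite it as "%X >=s C"
// for a positive constant C (valid because sdiv truncates toward zero) and
// check that form against the loop guard instead.
static bool sdivKnownPositiveOnEntry(ScalarEvolution &SE, const Loop *L,
                                     Instruction *Div) {
  Value *X;
  ConstantInt *C;
  if (!match(Div, m_SDiv(m_Value(X), m_ConstantInt(C))) ||
      !C->getValue().isStrictlyPositive())
    return false;
  // e.g. the guard "%b sgt 1" proves "%b sge 2", which proves "%b/2 sgt 0".
  return SE.isLoopEntryGuardedByCond(L, ICmpInst::ICMP_SGE,
                                     SE.getSCEV(X), SE.getSCEV(C));
}
/////////////////////////////////////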

Absent a fix for the above problem, it is certainly possible for you to add target-specific intrinsics. I can't guarantee that it won't interfere with optimization. It might also be possible to add metadata to the loop exit branch when it is cloned by LoopRotate, indicating that the loop is guarded by the same condition. I don't think we want something like this on trunk though since it is a heavy-handed workaround that could be hard to maintain.

Following the 'Parallel Loop Metadata' proposal as well, it seems to me that it
faces basically the same problem. Maybe I haven't fully understood things, but
assuming we had the ability to attach metadata to BasicBlocks, such information
could be registered directly on the loop header or latch, without custom
intrinsics and hopefully without interfering with the optimizer.

What do you think about this?

There's been talk of adding metadata to the branch that terminates the loop latch block. What LLVM calls the "latch" is just a unique backward branch to the loop header and not necessarily even a loop exit.

I'm not sure how you would interpret that metadata, since the exit branch may be rewritten (just like the loop guard). For example, the indvars pass may convert it into a != test.

As long as this is brainstorming time, I actually like the idea of an llvm.invariant intrinsic that the optimizers know to ignore. I like it for other purposes, but it would happen to work for you as a temporary workaround. It could take one or two IR values (as metadata operands) and a metadata language describing the invariant, such as a relational operator and optional constant. In your case, you want to know that the loop counter's starting value is less than its limit, so you could conveniently plop one of those in the loop preheader. The invariant would only go away if no one else used the value, which in your case would make sense (e.g. if the loop test were rewritten in terms of %b, you probably wouldn't need the invariant any more).

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121210/158601.html

But really, the only thing likely to make it onto trunk for this bug is the SCEV fix mentioned above. More details can be discussed in PR15205.

-Andy

If you're in favor of this approach, can I pop out of the woodwork and say that it appears that we have a reasonably-large group of contributors in favor of an invariant intrinsic and we should move forward on that basis?

Thanks again,
Hal

I read the post you suggested. If I understood the idea correctly, I like it very much. Just to better understand: when you say <<< It would be generally useful to model an intrinsic that is free and not a strong use but does have control dependence. That's what I call a meta-intrinsic. >>> do you mean that this class of intrinsics cannot be removed just because they have no uses, and that they can't be moved freely, e.g. hoisted out of a loop?
Also, if they hold a value (i.e. they are users), would that value never be eliminated when its only user is the meta-intrinsic itself? Maybe this kind of behaviour should be a parameter depending on the semantics you need to implement (in the end, if the intrinsic is the only user and it is simply eliminated during lowering, the values it used can then be checked for deadness and deleted).
IMHO this is a property that an intrinsic may or may not have, and the proposed llvm.invariant intrinsic is a particular intrinsic that has this property and whose semantics is to describe invariant facts (via the metadata language you mentioned).

Thanks again.

Best regards,
Michele Scandale

I read the post you suggested. If I understood the idea correctly, I like it very much. Just to better understand: when you say <<< It would be generally useful to model an intrinsic that is free and not a strong use but does have control dependence. That's what I call a meta-intrinsic. >>> do you mean that this class of intrinsics cannot be removed just because they have no uses, and that they can't be moved freely, e.g. hoisted out of a loop?
Also, if they hold a value (i.e. they are users), would that value never be eliminated when its only user is the meta-intrinsic itself? Maybe this kind of behaviour should be a parameter depending on the semantics you need to implement (in the end, if the intrinsic is the only user and it is simply eliminated during lowering, the values it used can then be checked for deadness and deleted).

The value used by the intrinsic would be naturally eliminated as dead code if its only users were meta-intrinsics. It's just metadata that happens to be associated with some position in the control flow graph other than the value's definition. So in that sense it would not be a great solution for attaching metadata to a basic block independent of a value. For that, you probably want metadata directly on the block's terminator.

IMHO this is a property that an intrinsic may or may not have, and the proposed llvm.invariant intrinsic is a particular intrinsic that has this property and whose semantics is to describe invariant facts (via the metadata language you mentioned).

Yes, I agree. I mainly want a uniform approach for the optimizer. Currently we have various ad-hoc approaches.

-Andy

A general llvm.invariant does seem like a convenient thing. I’m not pushing very hard because I don’t need it for anything I’m working on yet, and wasn’t aware that anyone else liked this particular idea. But I haven’t heard any argument against it. I am aware that a lot of people want to attach new semantics to the IR that are not necessarily easy to preserve, so I don’t want to oversell a solution.

I don’t have an opinion on whether each type of invariant should get its own intrinsic ID or we should have one and use the flexibility of metadata operands. The important thing is that the optimizer have a uniform approach to handling them, and that they truly cannot interfere with optimization other than as intended. For the sake of discussion, and to avoid confusion with the current, badly named llvm.invariant, I’ll just call these llvm.meta intrinsics.

I would only endorse a new llvm.meta approach if we’re sure we can unify or replace most of the current meta-style intrinsics. But I first need to better understand the motivation for those approaches, which I have not personally worked with much (they’re somewhat marginalized).

  • llvm.dbg

Are there any problems with our current approach?

  • llvm.lifetime and llvm.invariant (for pointers)

Why are they real value users? We probably don’t care about extra
users of an object base, but are we confident they won’t affect
optimization except as intended? I’d like to see an experiment where
we emit these gratuitously, suppress optimizations that use them,
strip them after -O3, and verify no effect on bitcode.

  • llvm.annotation

Does anyone use these? Supposedly the optimizer “ignores” them. But
the optimizer certainly doesn’t ignore additional uses of a value.

  • llvm.expect

This is injected in the def-use chain. Accidentally running the
optimizer with these intrinsics present would be disastrous. It's
really the same problem as representing an invariant, just different
semantics.

  • load range metadata

We should be able to express general value ranges, independent of loads.
We should be able to express the invariant conditionally, independent of the value’s definition.
The load’s should be gvn-able without losing the invariants.

Future uses:

  • Relating value to other values (not just a constant range).

  • Pointer alignment.

  • Pointers to immutable memory (when dominated by intrinsic).

You mentioned a number of other things you’d like to use invariants for in a previous post; I won’t try to repeat them.

-Andy

I would only endorse a new llvm.meta approach if we're sure we can unify or
replace most of the current meta-style intrinsics. But I first need to
better understand the motivation for those approaches, which I have not
personally worked with much (they're somewhat marginalized).

- llvm.dbg

  Are there any problems with our current approach?

Haven't been following the current discussion, but in general there aren't
any problems with having llvm.dbg as a class of intrinsic.

-eric

Thanks for writing this! As you point out there are already a number of "pseudo-users" in LLVM, and we don't have a good general scheme for handling them. Currently, special handling for llvm.dbg, expect, etc. is coded into several (many) different places to make them appear free, and adding more in a similar way might become unmanageable. Even worse, these intrinsics break the "hasOneUser" check, and this interferes with optimization. This may not be important for debug intrinsics, but certainly is for features intended to help optimization.

Based on this, it seems that we need to differentiate two classes of users: real users and, as Chandler called them, ephemeral users. We could update hasOneUser() to be hasOneUser(bool countEphemerals = false), or something like that, and the trick will be to support this without requiring additional iteration over all users. Thoughts?
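
As a sketch of the idea (isEphemeralUser and hasOneRealUse are made-up names here, not existing LLVM APIs, and a real patch would have to recognize more than debug intrinsics):

/////////////////////////////////////
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Value.h"

using namespace llvm;

// Ephemeral users are the ones hasOneUse-style checks should skip; for the
// sketch we only recognize the debug intrinsics.
static bool isEphemeralUser(const User *U) {
  return isa<DbgInfoIntrinsic>(U);
}

static bool hasOneRealUse(const Value *V) {
  unsigned RealUses = 0;
  for (const User *U : V->users())
    if (!isEphemeralUser(U) && ++RealUses > 1)
      return false;
  return RealUses == 1;
}
/////////////////////////////////////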

We should also use this new class of users to replace a lot of the special-case code that currently exists for making some of these intrinsics free (and removing them before CodeGen, etc.).

Specifically regarding invariants: I think that dbg and lifetime (and annotation?) would need to remain separate intrinsics. The invariant could have both a 'required' and an 'expected' mode, and we could merge expect into it. Value ranges should fit naturally into the semantics of an invariant.

- llvm.dbg

Are there any problems with our current approach?

- llvm.lifetime and llvm.invariant (for pointers)

Why are they real value users? We probably don't care about extra
users of an object base, but are we confident they won't affect
optimization except as intended? I'd like to see an experiment where
we emit these gratuitously, suppress optimizations that use them,
strip them after -O3, and verify no effect on bitcode.

- llvm.annotation

Does anyone use these? Supposedly the optimizer "ignores" them. But
the optimizer certainly doesn't ignore additional uses of a value.

As I recall, the optimizer does not currently consider these free everywhere that it should (by inspection); nevertheless, I've never seen them used anywhere. Why were they introduced?

- llvm.expect

This is injected in the def-use chain. Accidentally running the
optimizer with these intrinsics present would be disastrous. It's
really the same problem as representing an invariant, just different
semantics.

- load range metadata

We should be able to express general value ranges, independent of
loads.
We should be able to express the invariant conditionally, independent
of the value's definition.
The loads should be gvn-able without losing the invariants.

Agreed.

-Hal

Thanks for writing this! As you point out there are already a number of "pseudo-users" in LLVM, and we don't have a good general scheme for handling them. Currently, special handling for llvm.dbg, expect, etc. is coded into several (many) different places to make them appear free, and adding more in a similar way might become unmanageable. Even worse, these intrinsics break the "hasOneUser" check, and this interferes with optimization. This may not be important for debug intrinsics, but certainly is for features intended to help optimization.

Based on this, it seems that we need to differentiate two classes of users: real users and, as Chandler called them, ephemeral users. We could update hasOneUser() to be hasOneUser(bool countEphemerals = false), or something like that, and the trick will be to support this without requiring additional iteration over all users. Thoughts?

Redefining hasOneUser() would be the most efficient approach, and definitely possible. But it would add complexity to a fundamental property of the IR and potentially cause bugs when developers forget to check the meta users.

I was proposing to use meta-data operands in the intrinsic instead. They add no visible users. They can be left dangling and cleaned up lazily. WeakVH are expensive, but I'm not worried about cost yet. The added complexity in this case happens at SSA update, which does need to visit meta users explicitly. I believe this is currently broken for dbg.value and works by luck for simple cases. So we need to fix this one way or another.

I contrived a test case for this and submitted PR15501.

-Andy

I'd still like to be able to chart a way forward on this. Regarding defining invariants, I don't understand how using metadata intrinsics really helps because, while a metadata operand could be used to tie the invariant to a value being constrained, the invariant expression itself would add additional uses to the values used in its subexpressions. When I had discussed this with Chandler (many months ago now), we had developed a working assumption that if the invariant is worth having, then it is worth carrying as a dependency of the values it constrains. This may not be true, but I think it is a viewpoint worth evaluating.

On the other hand, when I started down this road I was motivated by a desire to implement support for __builtin_assume_aligned. I would still like to do this, or something like this to get equivalent functionality. The idea of metadata references seems like it might work well for alignment assumptions, as a restricted special case. Maybe something like this:

tail call void @llvm.assume.aligned(metadata !{i32* %ptr}, i32 16)

but then the problem becomes making sure that these things stick around long enough to help the interesting inlining cases. Thoughts?

I contrived a test case for this and submitted PR15501.

Interesting.

Thanks again,
Hal

I'd still like to be able to chart a way forward on this. Regarding defining invariants, I don't understand how using metadata intrinsics really helps because, while a metadata operand could be used to tie the invariant to a value being constrained, the invariant expression itself would add additional uses to the values used in its subexpressions. When I had discussed this with Chandler (many months ago now), we had developed a working assumption that if the invariant is worth having, then it is worth carrying as a dependency of the values it constrains. This may not be true, but I think it is a viewpoint worth evaluating.

Metadata is a better approach because it doesn’t interfere with optimizations that are free to ignore it.

Yes, if the invariant expression requires IR values and real uses, then that defeats the purpose. I prefer a meta-data language to the alternative of ephemeral expressions. I could be convinced either way. I just think it will be confusing to mix ephemeral expressions with program logic and impractical to make it transparent to optimizations. Not the end of the world though, and I’ll listen to any strong arguments to the contrary.

On the other hand, when I started down this road I was motivated by a desire to implement support for __builtin_assume_aligned. I would still like to do this, or something like this to get equivalent functionality. The idea of metadata references seems like it might work well for alignment assumptions, as a restricted special case. Maybe something like this:

tail call void @llvm.assume.aligned(metadata !{i32* %ptr}, i32 16)

I think this concept is sound except for the problem of enforcing SSA properties. Do meta-data operands need to dominate? If so, we need to fix SSA updater to deal with this. Preserving the meta-data after code duplication would effectively require a meta-phi. We don’t want a real phi because we don’t want a real use. Admittedly this would be annoying, but we could always drop the metadata users instead. You could get by initially without solving this problem, as we’ve done with dbg.value.

I’m a bit less enthusiastic about meta-intrinsics after realizing that the only current example, dbg.value, is a poor use of the concept. It should have been meta-data attached directly to the value with source line info instead.

If we could demonstrate the advantage of converting lifetime intrinsics to meta-intrinsics (not real uses), then I’d be all for it.

but then the problem becomes making sure that these things stick around long enough to help the interesting inlining cases. Thoughts?

Ensuring meta-data sticks around seems like the dual of the problem of ensuring ephemeral values and uses don’t interfere with optimization. I’d rather have the first problem? Is it really harder?

So, you have two problems:

  1. Expressing the invariants. A meta-data language could be invented for this initially. Or simply constant operands to an alignment intrinsic as in your example.

  2. Associating the invariants with IR values. Options are, in order from the easiest to implement and sell to the public to the most complicated but robust:

2a) Attach meta-data to the value. This only works when constraint dominates all uses. Inlining would need to check before propagating.

2b) Add a real use to a special intrinsic: tail call void @llvm.assume.aligned(i32* %ptr, i32 16)

2c) Add a meta-use to an intrinsic (pure meta-intrinsic): tail call void @llvm.assume.aligned(metadata !{i32* %ptr}, i32 16)

I think it’s a question of whether the simpler approaches work well enough for you.

-Andy

Metadata is a better approach because it doesn’t interfere with
optimizations that are free to ignore it.

Yes, if the invariant expression requires IR values and real uses,
then that defeats the purpose. I prefer a meta-data language to the
alternative of ephemeral expressions. I could be convinced either
way. I just think it will be confusing to mix ephemeral expressions
with program logic and impractical to make it transparent to
optimizations. Not the end of the world though, and I’ll listen to
any strong arguments to the contrary.

When you say "metadata language", what do you have in mind? I can think of two general schemes:

1. A language in which each expression in the language is a metadata node; maybe something like:
     !invariant1 = !{ !"eq", !{ !"urem", i32 %value, i32 48 }, i32 0 }

2. A language encoded in a string with some syntax for substitution values; maybe something like:
     !invariant1 = !{ !"(eq (urem #1, 48), 0)", i32 %value}

On the other hand, when I started down this road I was motivated by a
desire to implement support for __builtin_assume_aligned. I would
still like to do this, or something like this to get equivalent
functionality. The idea of metadata references seems like it might
work well for alignment assumptions, as a restricted special case.
Maybe something like this:

tail call void @llvm.assume.aligned(metadata !{i32* %ptr}, i32 16)

I think this concept is sound except for the problem of enforcing SSA
properties. Do meta-data operands need to dominate? If so, we need
to fix SSA updater to deal with this. Preserving the meta-data after
code duplication would effectively require a meta-phi. We don’t want
a real phi because we don’t want a real use. Admittedly this would
be annoying, but we could always drop the metadata users instead.
You could get by initially without solving this problem, as we’ve
done with dbg.value.

I’m a bit less enthusiastic about meta-intrinsics after realizing
that the only current example, dbg.value, is a poor use of the
concept. It should have been meta-data attached directly to the
value with source line info instead.

What if we say that they don't need to dominate? The dependency graph can certainly be well defined without the vector of IR instructions being a topological ordering of that graph. I understand that we like it this way for a number of reasons, but it is not clear to me that any of those reasons apply in this case. If the metadata intrinsic does not return a value, we can't introduce unwanted cycles. If they do return a value and could introduce a cycle, then we'd need to decide how we would handle that.

If we could demonstrate the advantage of converting lifetime
intrinsics to meta-intrinsics (not real uses), then I'd be all for
it.

but then the problem becomes making sure that these things stick
around long enough to help the interesting inlining cases. Thoughts?

Ensuring meta-data sticks around seems like the dual of the problem
of ensuring ephemeral values and uses don’t interfere with
optimization. I’d rather have the first problem? Is it really
harder?

So, you have two problems:

1) Expressing the invariants. A meta-data language could be invented
for this initially. Or simply constant operands to an alignment
intrinsic as in your example.

2) Associating the invariants with IR values. Options are, in order
from the easiest to implement and sell to the public to the most
complicated but robust:

2a) Attach meta-data to the value. This only works when constraint
dominates all uses. Inlining would need to check before propagating.

2b) Add a real use to a special intrinsic: tail call void
@llvm.assume.aligned(i32* %ptr, i32 16)

2c) Add a meta-use to an intrinsic (pure meta-intrinsic): tail call
void @llvm.assume.aligned(metadata !{i32* %ptr}, i32 16)

I think it’s a question of whether the simpler approaches work well
enough for you.

Is it obvious which of these will be easier to deal with in InstCombine? SROA? -- I think that adding a real use will kill a lot of optimizations that InstCombine might do. On the other hand, updating InstCombine to propagate the invariants might also be quite complicated (this is the benefit of adding a real use... we get the "updating" of the invariants for free, and optimizations are blocked only when they'd kill the invariant (or, at least that's the theory)).

Thanks again,
Hal

Metadata is a better approach because it doesn’t interfere with
optimizations that are free to ignore it.

But in practice, they aren't going to be free to ignore it.

If you have metadata which encodes an invariant (in the most general sense,
lifetime intrinsics currently encode an invariant, branch weights do not)
that is useful to an optimization pass to transform the instructions in a
way that would not otherwise be safe or well defined, then you have a new
problem: what happens when an optimization pass does some unrelated
transform that obliquely invalidates this invariant? This is equally true
of an intrinsic which encodes the invariant.

However, there are generally speaking two ways to approach this: 1) use a
construct that optimizers will treat conservatively and explicitly handle
the construct in the places where it is a safe way to do so (and worth
doing), 2) use a construct that the optimizers will ignore, and explicitly
update the construct in all of the places where it ceases to be correct.
Metadata is a design which requires #2, and intrinsics are a design which
requires #1. The question is, which is better.

How my pro/con list breaks down for these approaches:
pro 1: very low probability of producing an invalid program due to missing
handling for the invariant in some pass
pro 1: can teach passes about the invariant lazily as a concrete need
arises, no need to be comprehensive
con 1: will impose a tax on the use of an invariant due to added entries in
the Use graph.

pro 2: zero-overhead on the Use graph or optimizers which don't care
con 2: requires comprehensive audit of transformations for cases where they
invalidate the invariant, *for each invariant*
con 2: when we fail to catch an invalidation, we miscompile the program
rather than failing to optimize the program

I'm strongly in favor of the first strategy for these reasons. Perhaps
there are other reasons I've missed?

Yes, if the invariant expression requires IR values and real uses, then
that defeats the purpose. I prefer a meta-data language to the alternative
of ephemeral expressions. I could be convinced either way. I just think it
will be confusing to mix ephemeral expressions with program logic and
impractical to make it transparent to optimizations. Not the end of the
world though, and I’ll listen to any strong arguments to the contrary.

I've given my argument above. I will add that there is one important
assumption about my argument that impacts the scaling factors: I'm assuming
that there will be relatively few values in the IR which are the subject of
invariants of any form. I also assume that there are likely to be a
reasonable number of different invariants we want to provide. Were we to
have only 1 or 2 invariant forms, and to apply them to a large % of the
values, perhaps strategy #2 would be better, although it would still seem
questionable to me. Debugging a miscompile is far harder than debugging
*this* kind of missed optimization (i.e., a simplification that we can see
wasn't performed, no micro-arch details here).

That said, I'm not sure if email is conveying my thoughts here effectively.
Let me know if this would be a good thing for us to talk about in person at
some point? (Hopefully it doesn't need to wait for the dev mtg...)

On the other hand, when I started down this road I was motivated by a
desire to implement support for __builtin_assume_aligned. I would still
like to do this, or something like this to get equivalent functionality.
The idea of metadata references seems like it might work well for alignment
assumptions, as a restricted special case. Maybe something like this:

tail call void @llvm.assume.aligned(metadata !{i32* %ptr}, i32 16)

I think this concept is sound except for the problem of enforcing SSA
properties. Do meta-data operands need to dominate? If so, we need to fix
SSA updater to deal with this. Preserving the meta-data after code
duplication would effectively require a meta-phi.
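
A tiny example of the duplication problem, using the (hypothetical)
real-use form of @llvm.assume.aligned from above; the assumption sits in a
join block and refers to a value merged by a phi:

///////////////////////////////////////////////////////////////////////////////
declare void @llvm.assume.aligned(i32*, i32)   ; hypothetical intrinsic

define void @f(i1 %c, i32* %a, i32* %b) {
entry:
  br i1 %c, label %then, label %else

then:
  br label %join

else:
  br label %join

join:
  ; %p is a real operand of the call: if %join is tail-duplicated into its
  ; predecessors, the duplication code rewrites %p to %a / %b per copy.
  ; Encoded as metadata !{i32* %p} instead, nothing is obliged to rewrite
  ; the reference, which is exactly where the meta-phi question comes from.
  %p = phi i32* [ %a, %then ], [ %b, %else ]
  tail call void @llvm.assume.aligned(i32* %p, i32 16)
  ret void
}
///////////////////////////////////////////////////////////////////////////////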

Here is a great example of what I described above. I think that the cost of
updating the passes to correctly model invariants will be significantly
higher than updating them to disregard invariant uses of the values.

I’m a bit less enthusiastic about meta-intrinsics after realizing that the
only current example, dbg.value, is a poor use of the concept. It should
have been meta-data attached directly to the value with source line info
instead.

If we could demonstrate the advantage of converting lifetime intrinsics to
meta-intrinsics (not real uses), then I’d be all for it.

Having worked closely with the lifetime intrinsics, the cost of handling
them in their current form is *really* low. I feel like the cost of having
meta-SSA markers would be significantly more complex. Happy to be shown
wrong here though as well. Currently, lifetime markers seem to be really
easy to handle.

I see your point with regard to lifetime markers. I can imagine an IR pass merging allocas and invalidating the markers. Do you know of any existing passes that can invalidate them?
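
For example (the lifetime intrinsics below are real; @use and the merging
pass are hypothetical), the markers become misleading as soon as two
allocas are folded into one without rewriting them:

///////////////////////////////////////////////////////////////////////////////
declare void @llvm.lifetime.start(i64, i8* nocapture)
declare void @llvm.lifetime.end(i64, i8* nocapture)
declare void @use(i8*)

define void @g() {
entry:
  %a = alloca [64 x i8], align 16
  %b = alloca [64 x i8], align 16
  %a.i8 = getelementptr inbounds [64 x i8]* %a, i32 0, i32 0
  %b.i8 = getelementptr inbounds [64 x i8]* %b, i32 0, i32 0

  call void @llvm.lifetime.start(i64 64, i8* %a.i8)
  call void @use(i8* %a.i8)

  ; %b's lifetime is nested inside %a's
  call void @llvm.lifetime.start(i64 64, i8* %b.i8)
  call void @use(i8* %b.i8)
  call void @llvm.lifetime.end(i64 64, i8* %b.i8)

  ; if some pass merged %b into %a (RAUW %b -> %a), %b's end marker above
  ; would now claim %a's storage dead here, even though %a is used below;
  ; the pass would have to rewrite or drop the markers to stay correct
  call void @use(i8* %a.i8)
  call void @llvm.lifetime.end(i64 64, i8* %a.i8)
  ret void
}
///////////////////////////////////////////////////////////////////////////////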

A value invariant is quite different because we’re only constraining the value itself at that point in the CFG, not the semantics of the data pointed to at other places in the CFG. The question is, can RAUW change the value? I’ve been assuming the answer is “no”. It seems like a terrible thing to do, but there’s no rule against it and no way to verify.

OTOH, more importantly, we should avoid supporting new IR constructs for marginal features. So, if we need to handle transparent lifetime marker uses, we should use the same mechanism for value invariants.

I can imagine a domain that uses a few kinds of invariants, like typechecks, liberally. But I don’t know of any plans to make use of that, and it could be argued that LLVM IR is not the right level for these sorts of invariants.

This problem arises from meta-data uses in general. dbg.value already has the problem. Treating them as strict SSA def-use is probably a bad idea. It would be interesting to see what would happen if dbg.value were handled like lifetime in terms of overhead and defeated optimization.

I think we’re getting very lucky with lifetimes because there’s not much we can do anyway to optimize an alloca that didn’t get SROA’d.

Until there is a known, serious problem with either of the following approaches, we should probably restrict meta-data to:

(1) MD nodes attached to concrete values.
(2) Real uses feeding “meta” intrinsics.

-Andy

As long as this is brainstorming time, I actually like the idea of an
llvm.invariant intrinsic that the optimizers know to ignore. I like it for
other purposes, but it would happen to work for you as a temporary
workaround. It could take one or two IR values (as metadata operands) and a
metadata language describing the invariant, such as a relational operator
and optional constant. In your case, you want to know that the loop
counter’s starting value is less than its limit, so you could conveniently
plop one of those in the loop preheader. The invariant would only go away
if no one else used the value, which in your case would make sense (e.g. if
the loop test were rewritten in terms of %b, you probably wouldn’t need the
invariant any more).
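
Sketched in IR, with everything hypothetical (the @llvm.invariant
intrinsic, the metadata-operand encoding, and the !0 descriptor), the
preheader annotation might look roughly like this:

///////////////////////////////////////////////////////////////////////////////
declare void @llvm.invariant(metadata, metadata)  ; hypothetical, not in LLVM
declare void @work(i32)

define void @loop(i32 %n) nounwind uwtable {
entry:
  %limit = sdiv i32 %n, 2
  %guard = icmp sgt i32 %n, 1          ; instcombine's simplified entry guard
  br i1 %guard, label %preheader, label %exit

preheader:
  ; record the fact the original guard expressed, i.e. that the IV's start
  ; value (0) is below %limit, so a late pass can still see it
  tail call void @llvm.invariant(metadata !{i32 %limit}, metadata !0)
  br label %body

body:
  %iv = phi i32 [ 0, %preheader ], [ %iv.next, %body ]
  tail call void @work(i32 %iv) nounwind
  %iv.next = add nsw i32 %iv, 1
  %cmp = icmp slt i32 %iv.next, %limit
  br i1 %cmp, label %body, label %exit

exit:
  ret void
}

!0 = metadata !{metadata !"sgt", i32 0}  ; "%limit sgt 0": relational op + constant
///////////////////////////////////////////////////////////////////////////////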

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121210/158601.html

If you’re in favor of this approach, can I pop out of the woodwork and say
that it appears that we have a reasonably-large group of contributors in
favor of an invariant intrinsic and we should move forward on that basis?

A general llvm.invariant does seem like a convenient thing. I’m not pushing
very hard because I don’t need it for anything I’m working on yet, and
wasn’t aware that anyone else liked this particular idea. But I haven’t
heard any argument against it. I am aware that a lot of people want to
attach new semantics to the IR that are not necessarily easy to preserve,
so I don’t want to oversell a solution.

I don’t have an opinion on whether each type of invariant should get its
own intrinsic ID or we should have one and use the flexibility of metadata
operands. The important thing is that the optimizer have a uniform approach
to handling them, and that they truly cannot interfere with optimization
other than as intended. For the sake of discussion, and to avoid confusion
with the current, badly named llvm.invariant, I’ll just call these
llvm.meta intrinsics.

I would only endorse a new llvm.meta approach if we’re sure we can unify or
replace most of the current meta-style intrinsics. But I first need to
better understand the motivation for those approaches, which I have not
personally worked with much (they’re somewhat marginalized).

Thanks for writing this! As you point out there are already a
number of “pseudo-users” in LLVM, and we don’t have a good general
scheme for handling them. Currently, special handling for
llvm.dbg, expect, etc. is coded into several (many) different
places to make them appear free, and adding more in a similar way
might become unmanageable. Even worse, these intrinsics break the
“hasOneUser” check, and this interferes with optimization. This
may not be important for debug intrinsics, but certainly is for
features intended to help optimization.

Based on this, it seems that we need to differentiate two classes
of users: real users and, as Chandler called them, ephemeral
users. We could update hasOneUser() to be hasOneUser(bool
countEphemerals = false), or something like that, and the trick
will be to support this without requiring additional iteration
over all users. Thoughts?

Redefining hasOneUser() would be the most efficient approach, and
definitely possible. But it would add complexity to a fundamental
property of the IR and potentially cause bugs when developers forget
to check the meta users.

I was proposing to use meta-data operands in the intrinsic instead.
They add no visible users. They can be left dangling and cleaned up
lazily. WeakVH are expensive, but I’m not worried about cost yet.
The added complexity in this case happens at SSA update, which does
need to visit meta users explicitly. I believe this is currently
broken for dbg.value and works by luck for simple cases. So we need
to fix this one way or another.
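
To make the use-list difference concrete (both intrinsic names below are
hypothetical stand-ins): a real operand gives the pointer a second user, so
hasOneUse-style checks on it start failing, while a metadata operand leaves
the use list alone but also leaves nothing to keep the reference up to
date:

///////////////////////////////////////////////////////////////////////////////
declare void @llvm.assume.aligned(i32*, i32)          ; hypothetical, real use
declare void @llvm.assume.aligned.md(metadata, i32)   ; hypothetical, meta use

define i32 @real_use(i32* %p) {
entry:
  ; %p now has two users, the assume call and the load, so any
  ; hasOneUse-based combine looking at %p is blocked
  tail call void @llvm.assume.aligned(i32* %p, i32 16)
  %v = load i32* %p, align 4
  ret i32 %v
}

define i32 @meta_use(i32* %p) {
entry:
  ; the metadata operand adds no entry to %p's use list; the load is still
  ; %p's only user, but nothing keeps the metadata reference up to date
  tail call void @llvm.assume.aligned.md(metadata !{i32* %p}, i32 16)
  %v = load i32* %p, align 4
  ret i32 %v
}
///////////////////////////////////////////////////////////////////////////////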

I’d still like to be able to chart a way forward on this. Regarding
defining invariants, I don’t understand how using metadata
intrinsics really helps because, while a metadata operand could be
used to tie the invariant to a value being constrained, the
invariant expression itself would add additional uses to the values
used in its subexpressions. When I had discussed this with Chandler
(many months ago now), we had developed a working assumption that if
the invariant is worth having, then it is worth carrying as a
dependency of the values it constrains. This may not be true, but I
think it is a viewpoint worth evaluating.

Metadata is a better approach because it doesn’t interfere with
optimizations that are free to ignore it.

Yes, if the invariant expression requires IR values and real uses,
then that defeats the purpose. I prefer a meta-data language to the
alternative of ephemeral expressions. I could be convinced either
way. I just think it will be confusing to mix ephemeral expressions
with program logic and impractical to make it transparent to
optimizations. Not the end of the world though, and I’ll listen to
any strong arguments to the contrary.

When you say “metadata language”, what do you have in mind? I can think of two general schemes:

  1. A language in which each expression in the language is a metadata node; maybe something like:
    !invariant1 = !{ !"eq", !{ !"urem", i32 %value, i32 48 }, i32 0 }

  2. A language encoded in a string with some syntax for substitution values; maybe something like:
    !invariant1 = !{ !"(eq (urem #1, 48), 0)", i32 %value}

I was thinking of the first approach, but not as a pseudo-IR with expression syntax, just as a way to express the few invariants that we support.

Either: @llvm.invariant(i8* %adr, metadata !invariant)

Or: @llvm.invariant(metadata !{i32* %adr}, metadata !invariant)

With: !invariant1 = !{ !"align", i32 48, i32 0 }

Or just skip the meta-data altogether:
@llvm.align(i8*, i32, i32)

I don’t know if it’s better to use strings or enums. I’m not a meta-data designer.

If we only try to solve your immediate problem of __builtin_assume_aligned, isn’t that good enough for now? Are you concerned about generalizing this to __builtin_assume?
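
Assembled into a complete module, the second form above might look
something like this (again, @llvm.invariant and the !0 descriptor are
hypothetical):

///////////////////////////////////////////////////////////////////////////////
declare void @llvm.invariant(metadata, metadata)   ; hypothetical, as above

define i32 @use_aligned(i8* %adr) {
entry:
  ; "%adr is 48-byte aligned, offset 0", described entirely by !0; the
  ; pointer is referenced only through the metadata operand, so the
  ; invariant adds no extra real use of %adr
  tail call void @llvm.invariant(metadata !{i8* %adr}, metadata !0)
  %p = bitcast i8* %adr to i32*
  %v = load i32* %p, align 4
  ret i32 %v
}

!0 = metadata !{metadata !"align", i32 48, i32 0}
///////////////////////////////////////////////////////////////////////////////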

On the other hand, when I started down this road I was motivated by a
desire to implement support for __builtin_assume_aligned. I would
still like to do this, or something like this to get equivalent
functionality. The idea of metadata references seems like it might
work well for alignment assumptions, as a restricted special case.
Maybe something like this:

tail call void @llvm.assume.aligned(metadata !{i32* %ptr}, i32 16)

I think this concept is sound except for the problem of enforcing SSA
properties. Do meta-data operands need to dominate? If so, we need
to fix SSA updater to deal with this. Preserving the meta-data after
code duplication would effectively require a meta-phi. We don’t want
a real phi because we don’t want a real use. Admittedly this would
be annoying, but we could always drop the metadata users instead.
You could get by initially without solving this problem, as we’ve
done with dbg.value.

I’m a bit less enthusiastic about meta-intrinsics after realizing
that the only current example, dbg.value, is a poor use of the
concept. It should have been meta-data attached directly to the
value with source line info instead.

What if we say that they don’t need to dominate? The dependency graph can certainly be well defined without the vector of IR instructions being a topological ordering of that graph. I understand that we like it this way for a number of reasons, but it is not clear to me that any of those reasons apply in this case. If the metadata intrinsic does not return a value, we can’t introduce unwanted cycles. If they do return a value and could introduce a cycle, then we would need to decide how to handle that.

Right, meta-intrinsics should never return a value. It’s a bit nasty to have to guess what to do when the value is RAUW’d though.

If we could demonstrate the advantage of converting lifetime
intrinsics to meta-intrinsics (not real uses), then I’d be all for
it.

but then the problem becomes making sure that these things stick
around long enough to help the interesting inlining cases. Thoughts?

Ensuring meta-data sticks around seems like the dual of the problem
of ensuring ephemeral values and uses don’t interfere with
optimization. I’d rather have the first problem? Is it really
harder?

So, you have two problems:

  1. Expressing the invariants. A meta-data language could be invented
    for this initially. Or simply constant operands to an alignment
    intrinsic as in your example.

  2. Associating the invariants with IR values. Options are, in order
    from the easiest to implement and sell to the public to the most
    complicated but robust:

2a) Attach meta-data to the value. This only works when constraint
dominates all uses. Inlining would need to check before propagating.

2b) Add a real use to a special intrinsic: tail call void
@llvm.assume.aligned(i32* %ptr, i32 16)

2c) Add a meta-use to an intrinsic (pure meta-intrinsic): tail call
void @llvm.assume.aligned(metadata !{i32* %ptr}, i32 16)

I think it’s a question of whether the simpler approaches work well
enough for you.

Is it obvious which of these will be easier to deal with in InstCombine? SROA? – I think that adding a real use will kill a lot of optimizations that InstCombine might do. On the other hand, updating InstCombine to propagate the invariants might also be quite complicated (this is the benefit of adding a real use… we get the “updating” of the invariants for free, and optimizations are blocked only when they’d kill the invariant (or, at least that’s the theory)).

That was also my intuition, and I didn’t like the idea of meta-data interfering with optimization. However, I’m no longer as interested in taking a giant step toward pervasive meta-intrinsics. I would go down the path of real uses first until we have compelling data or hit more concrete problems. E.g., what happens if you add an invariant intrinsic to every load?

The meta-data use is a potential back-up solution if we hit problems.

-Andy