[RFC] How to manifest information in LLVM-IR, or, revisiting llvm.assume

_Johannes_Doerfert · December 16, 2019, 11:16pm

Abstract:

It is often hard or impossible to encode complex, e.g., non-boolean,
information in an `llvm.assume(i1)`. This RFC describes various problems
we have right now and provides alternative design ideas.

Some Existing Problems:

A) The boolean requirement.
  The current `llvm.assume(i1)` expects a boolean that is known to hold
  true at runtime (once the `llvm.assume` call is reached). However,
  forming this boolean for "arbitrary" information is hard or even
  impossible. Alignment, which is an easy case, already requires 3 extra
  instructions, one of which is a `ptrtoint` and therefore not really
  optimizer friendly. Dereferenceability, is even scarier. Thus, we are
  currently limited to (almost) boolean information when we want to
  manifest information in the IR (which happens due to inlining or code
  motion, see https://reviews.llvm.org/D69477 for an example).

B) The one-use checks.
  Various pattern matching passes check the number of uses of a value.
  However, while `llvm.assume` is eventually eliminated by the backend
  it will still increase the use count of the operand. I doubt we are
  able to not increase the use count at all (and keep everything else
  working), but what we can do is make sure the uses in "assumptions"
  are easy to spot, thus filter. This is not the case right now because
  of the additional instructions we need to make the values boolean.
  Even if you have `__builtin_assume(ptr);` the `ptr` use will not be
  the `llvm.assume` call but a `icmp`.

C) The side-effect avoidance.
  `__assume`, `__builtin_assume`, `__builtin_assume_aligned`, and OpenMP
  `omp assume` are all defined to not evaluate their argument, thus to
  not cause the side effects that the evaluation of the argument would
  otherwise imply. The way we implement this restriction is by not
  emitting the argument expression in IR if it might cause a side
  effect. We warn the user if that happens. While this is generally
  speaking fine, it would be interesting to lift the *implementation*
  restriction. One benefit would be that we could implement `assert`
  through `__builtin_assume` properly.

D) The singleton ranges.
  An `llvm.assume` will only provide information for a single program
  point not a range. Even if the beginning and the end of a range have
  an `llvm.assume`, there are cases where the information will not be
  as good as a proper range assumption. OpenMP 5.1 introduces such
  range assumptions but other situations would benefit from them as
  well. Take for example function attributes and inlining. Since we know
  they hold for the entire function and not only when it is entered we
  could encode the information over the entire range of the inlined
  code.

Some Site Notes:

- It seems of little use that we interleave the code for the assumed
  expression with the user code. Having the `llvm.assume` allows us to
  tie information to a program point, beyond that we just clutter the
  function with instructions that we later remove anyway.

- Reconstructing information from the pattern of instructions that feed
into the `llvm.assume` is also not optimal, especially since we do
not need to "optimize" these instructions anyway.

- The current (=arbitrary) side-effects of `llvm.assume` are too strong.
  We have `inaccessiblemem_or_argmemonly` and we might be able to come
  up with something even more specialized for this, e.g.,
  `control_dependences_only` to indicate that there are only control
  dependences.

John_McCall1 · December 18, 2019, 6:40am

Abstract:

It is often hard or impossible to encode complex, e.g., non-boolean,
information in an llvm.assume(i1). This RFC describes various problems
we have right now and provides alternative design ideas.

Some Existing Problems:

A) The boolean requirement.
The current llvm.assume(i1) expects a boolean that is known to hold
true at runtime (once the llvm.assume call is reached). However,
forming this boolean for “arbitrary” information is hard or even
impossible. Alignment, which is an easy case, already requires 3 extra
instructions, one of which is a ptrtoint and therefore not really
optimizer friendly. Dereferenceability, is even scarier. Thus, we are
currently limited to (almost) boolean information when we want to
manifest information in the IR (which happens due to inlining or code
motion, see https://reviews.llvm.org/D69477 for an example).

B) The one-use checks.
Various pattern matching passes check the number of uses of a value.
However, while llvm.assume is eventually eliminated by the backend
it will still increase the use count of the operand. I doubt we are
able to not increase the use count at all (and keep everything else
working), but what we can do is make sure the uses in “assumptions”
are easy to spot, thus filter. This is not the case right now because
of the additional instructions we need to make the values boolean.
Even if you have __builtin_assume(ptr); the ptr use will not be
the llvm.assume call but a icmp.

C) The side-effect avoidance.
__assume, __builtin_assume, __builtin_assume_aligned, and OpenMP
omp assume are all defined to not evaluate their argument, thus to
not cause the side effects that the evaluation of the argument would
otherwise imply. The way we implement this restriction is by not
emitting the argument expression in IR if it might cause a side
effect. We warn the user if that happens. While this is generally
speaking fine, it would be interesting to lift the implementation
restriction. One benefit would be that we could implement assert
through __builtin_assume properly.

D) The singleton ranges.
An llvm.assume will only provide information for a single program
point not a range. Even if the beginning and the end of a range have
an llvm.assume, there are cases where the information will not be
as good as a proper range assumption. OpenMP 5.1 introduces such
range assumptions but other situations would benefit from them as
well. Take for example function attributes and inlining. Since we know
they hold for the entire function and not only when it is entered we
could encode the information over the entire range of the inlined
code.

Some Site Notes:

It seems of little use that we interleave the code for the assumed
expression with the user code. Having the llvm.assume allows us to
tie information to a program point, beyond that we just clutter the
function with instructions that we later remove anyway.

Reconstructing information from the pattern of instructions that feed
into the llvm.assume is also not optimal, especially since we do
not need to “optimize” these instructions anyway.

The current (=arbitrary) side-effects of llvm.assume are too strong.
We have inaccessiblemem_or_argmemonly and we might be able to come
up with something even more specialized for this, e.g.,
control_dependences_only to indicate that there are only control
dependences.

This is all well put; I think you’ve covered the major weaknesses.

Some Design Ideas:

Use named operand bundles to encode information.
If we want to encode property XYZ for a value %V holds at a certain
program point and the property is dependent on %N we could encode
that as:
llvm.assume() ["XYZ"(%V, %N)]
There are various benefits, including:

It is freely extensible.

The property is directly tied to the value. Thus, no need for
encoding patterns that introduce extra instructions and uses and
which we need to preserve and decode later.

Allows dynamic properties even when corresponding attributes do
not, e.g., llvm.assume() ["align"(%arg_ptr, %N)] is fine and
once %N becomes a constant, or we determine a lower bound, we
can introduce the align(C) attribute for %arg_ptr.

Outline assumption expression code (with side-effects).
If we potentially have side-effects, or we simply have a non-trivial
expression that requires to be lowered into instructions, we can
outline the assumption expression code and tie it to the
llvm.assume via another operand bundle property. It could look
something like this:
__builtin_assume(foo(i) == bar(j));
will cause us to generate
/// Must return true!
static bool llvm.assume.expression_#27(int i, int j) {
return foo(i) == bar(j);
}
and a llvm.assume call like this:
llvm.assume() ["assume_fn"(@llvm.assume.expression_#27, %i, %j))] So we generate the expression in a new function which we (only) tie to the llvm.assume()through the "assume_fn" operand bundle. This will make sure we do not accidentally evaluate the code, or assume it is evaluated and produced side-effects. We can still optimize the code and use the information that we learn from it at thellvm.assume`
site though.

I think outlining is abstractly very promising, but I’m worried about
it impacting existing optimizations:

It’s won’t be obvious at all from the IR that the predicate function
is dead code. It would be a shame if we ended up emitting the predicate
function, or some global only it references, because we didn’t delete
all the llvm.assume calls early enough to recognize that it was dead.
Anything passed to the predicate function will by default look like it
escapes. This is particularly true if the predicate takes local
variables by references, which is the easiest and most straightforwardly
correct way for frontends to emit these predicates. So this will block
basic memory analyses (including mem2reg!) unless they’re taught to
remove or rewrite assumptions.

Unfortunately, I don’t have a great counter-proposal that isn’t a
major project.

(The major project is to make the predicates sub-functions within
the caller. This doesn’t fix all the problems, and sub-functions
introduce a host of new issues, but they do have the benefit of
making the analysis much more obviously intra-procedural.)

John.

_Johannes_Doerfert · December 18, 2019, 4:28pm

> Abstract:
>
> It is often hard or impossible to encode complex, e.g., non-boolean,
> information in an `llvm.assume(i1)`. This RFC describes various problems
> we have right now and provides alternative design ideas.
>
>
>
> Some Existing Problems:
>
> A) The boolean requirement.
> The current `llvm.assume(i1)` expects a boolean that is known to hold
> true at runtime (once the `llvm.assume` call is reached). However,
> forming this boolean for "arbitrary" information is hard or even
> impossible. Alignment, which is an easy case, already requires 3 extra
> instructions, one of which is a `ptrtoint` and therefore not really
> optimizer friendly. Dereferenceability, is even scarier. Thus, we are
> currently limited to (almost) boolean information when we want to
> manifest information in the IR (which happens due to inlining or code
> motion, see ⚙ D69477 [InstCombine] keep assumption before sinking calls for an example).
>
> B) The one-use checks.
> Various pattern matching passes check the number of uses of a value.
> However, while `llvm.assume` is eventually eliminated by the backend
> it will still increase the use count of the operand. I doubt we are
> able to not increase the use count at all (and keep everything else
> working), but what we can do is make sure the uses in "assumptions"
> are easy to spot, thus filter. This is not the case right now because
> of the additional instructions we need to make the values boolean.
> Even if you have `__builtin_assume(ptr);` the `ptr` use will not be
> the `llvm.assume` call but a `icmp`.
>
> C) The side-effect avoidance.
> `__assume`, `__builtin_assume`, `__builtin_assume_aligned`, and OpenMP
> `omp assume` are all defined to not evaluate their argument, thus to
> not cause the side effects that the evaluation of the argument would
> otherwise imply. The way we implement this restriction is by not
> emitting the argument expression in IR if it might cause a side
> effect. We warn the user if that happens. While this is generally
> speaking fine, it would be interesting to lift the *implementation*
> restriction. One benefit would be that we could implement `assert`
> through `__builtin_assume` properly.
>
> D) The singleton ranges.
> An `llvm.assume` will only provide information for a single program
> point not a range. Even if the beginning and the end of a range have
> an `llvm.assume`, there are cases where the information will not be
> as good as a proper range assumption. OpenMP 5.1 introduces such
> range assumptions but other situations would benefit from them as
> well. Take for example function attributes and inlining. Since we know
> they hold for the entire function and not only when it is entered we
> could encode the information over the entire range of the inlined
> code.
>
>
> Some Site Notes:
>
> - It seems of little use that we interleave the code for the assumed
> expression with the user code. Having the `llvm.assume` allows us to
> tie information to a program point, beyond that we just clutter the
> function with instructions that we later remove anyway.
>
> - Reconstructing information from the pattern of instructions that feed
> into the `llvm.assume` is also not optimal, especially since we do
> not need to "optimize" these instructions anyway.
>
> - The current (=arbitrary) side-effects of `llvm.assume` are too strong.
> We have `inaccessiblemem_or_argmemonly` and we might be able to come
> up with something even more specialized for this, e.g.,
> `control_dependences_only` to indicate that there are only control
> dependences.

This is all well put; I think you’ve covered the major weaknesses.

> Some Design Ideas:
>
> 1) Use named operand bundles to encode information.
> If we want to encode property XYZ for a value %V holds at a certain
> program point and the property is dependent on %N we could encode
> that as:
> `llvm.assume() ["XYZ"(%V, %N)]`
> There are various benefits, including:
> - It is freely extensible.
> - The property is directly tied to the value. Thus, no need for
> encoding patterns that introduce extra instructions and uses and
> which we need to preserve and decode later.
> - Allows dynamic properties even when corresponding attributes do
> not, e.g., `llvm.assume() ["align"(%arg_ptr, %N)]` is fine and
> once `%N` becomes a constant, or we determine a lower bound, we
> can introduce the `align(C)` attribute for `%arg_ptr`.
>
> 2) Outline assumption expression code (with side-effects).
> If we potentially have side-effects, or we simply have a non-trivial
> expression that requires to be lowered into instructions, we can
> outline the assumption expression code and tie it to the
> `llvm.assume` via another operand bundle property. It could look
> something like this:
> `__builtin_assume(foo(i) == bar(j));`
> will cause us to generate
> ```
> /// Must return true!
> static bool llvm.assume.expression_#27(int i, int j) {
> return foo(i) == bar(j);
> }
> ```
> and a `llvm.assume` call like this:
> `llvm.assume() ["assume_fn"(@llvm.assume.expression_#27, %i, %j))]
> So we generate the expression in a new function which we (only) tie to
> the `llvm.assume()` through the "assume_fn" operand bundle. This will
> make sure we do not accidentally evaluate the code, or assume it is
> evaluated and produced side-effects. We can still optimize the code
> and use the information that we learn from it at the `llvm.assume`
> site though.

I think outlining is abstractly very promising, but I’m worried about
it impacting existing optimizations:

- It’s won’t be obvious at all from the IR that the predicate function
  is dead code. It would be a shame if we ended up emitting the predicate
  function, or some global only it references, because we didn’t delete
  all the llvm.assume calls early enough to recognize that it was dead.

So, we already "delete" llvm.assume and its operands in the backends.
Having a dedicated middle-end pass to do that which also deletes the
associated "assume_fn" doesn't seem hard or complicated to set up.

Worst case (which I really don't think would ever happen), we emit the
functions which are internal and never referenced. The linker should
strip them.

- Anything passed to the predicate function will by default look like it
  escapes. This is particularly true if the predicate takes local
  variables by references, which is the easiest and most straightforwardly
  correct way for frontends to emit these predicates. So this will block
  basic memory analyses (including mem2reg!) unless they’re taught to
  remove or rewrite assumptions.

Partially true and we already have that problem though.

Mem2reg, and others, might need to know about llvm.assume uses but I
fail to see why they need to rewrite much (in the short therm). The
frontend generated code would naturally look like this:

%ptr.addr = alloca i32*
store %ptr, %ptr.addr
...
%ptr.val = load %ptr.addr
llvm.assume() ["align"(%ptr.val)]

Mem2reg should kick in just fine even if %ptr now has a "unknown" use.
But that "unknown" use is much less problematic than what we have now
because the user is the `llvm.assume` call and not some `ptrtoint` which
is used two steps later in an `llvm.assume.

If you feel I missed a problem here, please let me know.

Unfortunately, I don’t have a great counter-proposal that isn’t a
major project.

(The major project is to make the predicates sub-functions within
the caller. This doesn’t fix all the problems, and sub-functions
introduce a host of new issues, but they do have the benefit of
making the analysis much more obviously intra-procedural.)

I don't think inter-procedural reasoning is a problem or bad. Especially
here with internal functions that have a single use, it is really not
hard to make the connection.

We can easily teach the Attributor to follow these "pseudo-calls"
in order to derive information from them.

Thanks,
Johannes

John_McCall1 · December 18, 2019, 6:02pm

Abstract:

It is often hard or impossible to encode complex, e.g., non-boolean,
information in an llvm.assume(i1). This RFC describes various problems
we have right now and provides alternative design ideas.

Some Existing Problems:

A) The boolean requirement.
The current llvm.assume(i1) expects a boolean that is known to hold
true at runtime (once the llvm.assume call is reached). However,
forming this boolean for “arbitrary” information is hard or even
impossible. Alignment, which is an easy case, already requires 3 extra
instructions, one of which is a ptrtoint and therefore not really
optimizer friendly. Dereferenceability, is even scarier. Thus, we are
currently limited to (almost) boolean information when we want to
manifest information in the IR (which happens due to inlining or code
motion, see https://reviews.llvm.org/D69477 for an example).

B) The one-use checks.
Various pattern matching passes check the number of uses of a value.
However, while llvm.assume is eventually eliminated by the backend
it will still increase the use count of the operand. I doubt we are
able to not increase the use count at all (and keep everything else
working), but what we can do is make sure the uses in “assumptions”
are easy to spot, thus filter. This is not the case right now because
of the additional instructions we need to make the values boolean.
Even if you have __builtin_assume(ptr); the ptr use will not be
the llvm.assume call but a icmp.

C) The side-effect avoidance.
__assume, __builtin_assume, __builtin_assume_aligned, and OpenMP
omp assume are all defined to not evaluate their argument, thus to
not cause the side effects that the evaluation of the argument would
otherwise imply. The way we implement this restriction is by not
emitting the argument expression in IR if it might cause a side
effect. We warn the user if that happens. While this is generally
speaking fine, it would be interesting to lift the implementation
restriction. One benefit would be that we could implement assert
through __builtin_assume properly.

D) The singleton ranges.
An llvm.assume will only provide information for a single program
point not a range. Even if the beginning and the end of a range have
an llvm.assume, there are cases where the information will not be
as good as a proper range assumption. OpenMP 5.1 introduces such
range assumptions but other situations would benefit from them as
well. Take for example function attributes and inlining. Since we know
they hold for the entire function and not only when it is entered we
could encode the information over the entire range of the inlined
code.

Some Site Notes:

It seems of little use that we interleave the code for the assumed
expression with the user code. Having the llvm.assume allows us to
tie information to a program point, beyond that we just clutter the
function with instructions that we later remove anyway.

Reconstructing information from the pattern of instructions that feed
into the llvm.assume is also not optimal, especially since we do
not need to “optimize” these instructions anyway.

The current (=arbitrary) side-effects of llvm.assume are too strong.
We have inaccessiblemem_or_argmemonly and we might be able to come
up with something even more specialized for this, e.g.,
control_dependences_only to indicate that there are only control
dependences.

This is all well put; I think you’ve covered the major weaknesses.
Some Design Ideas:

Use named operand bundles to encode information.
If we want to encode property XYZ for a value %V holds at a certain
program point and the property is dependent on %N we could encode
that as:
llvm.assume() ["XYZ"(%V, %N)]
There are various benefits, including:

It is freely extensible.

The property is directly tied to the value. Thus, no need for
encoding patterns that introduce extra instructions and uses and
which we need to preserve and decode later.

Allows dynamic properties even when corresponding attributes do
not, e.g., llvm.assume() ["align"(%arg_ptr, %N)] is fine and
once %N becomes a constant, or we determine a lower bound, we
can introduce the align(C) attribute for %arg_ptr.

Outline assumption expression code (with side-effects).
If we potentially have side-effects, or we simply have a non-trivial
expression that requires to be lowered into instructions, we can
outline the assumption expression code and tie it to the
llvm.assume via another operand bundle property. It could look
something like this:
__builtin_assume(foo(i) == bar(j));
will cause us to generate
/// Must return true!
static bool llvm.assume.expression_#27(int i, int j) {
return foo(i) == bar(j);
}
and a llvm.assume call like this:
llvm.assume() ["assume_fn"(@llvm.assume.expression_#27, %i, %j))] So we generate the expression in a new function which we (only) tie to the llvm.assume()through the "assume_fn" operand bundle. This will make sure we do not accidentally evaluate the code, or assume it is evaluated and produced side-effects. We can still optimize the code and use the information that we learn from it at thellvm.assume`
site though.
I think outlining is abstractly very promising, but I’m worried about
it impacting existing optimizations:

It’s won’t be obvious at all from the IR that the predicate function
is dead code. It would be a shame if we ended up emitting the predicate
function, or some global only it references, because we didn’t delete
all the llvm.assume calls early enough to recognize that it was dead.
So, we already “delete” llvm.assume and its operands in the backends.
Having a dedicated middle-end pass to do that which also deletes the
associated “assume_fn” doesn’t seem hard or complicated to set up.

Worst case (which I really don’t think would ever happen), we emit the
functions which are internal and never referenced. The linker should
strip them.

Anything passed to the predicate function will by default look like it
escapes. This is particularly true if the predicate takes local
variables by references, which is the easiest and most straightforwardly
correct way for frontends to emit these predicates. So this will block
basic memory analyses (including mem2reg!) unless they’re taught to
remove or rewrite assumptions.

Partially true and we already have that problem though.

Mem2reg, and others, might need to know about llvm.assume uses but I
fail to see why they need to rewrite much (in the short therm). The
frontend generated code would naturally look like this:

%ptr.addr = alloca i32*
store %ptr, %ptr.addr
…
%ptr.val = load %ptr.addr
llvm.assume() [“align”(%ptr.val)]

I disagree; the natural way to generate this code in frontends will
actually be to take the variable by reference. We can, of course, make
frontends smart enough to take the variable by value if it’s
obviously only loaded from in the expressions, but if the optimizers
still aren’t generally aware of the intrinsic, that will just mean
that assumptions pessimize slightly more abstracted code.

For example, if I had this:

  Clang::QualType type = …;
  __builtin_assume(!type.hasLocalQualifiers());

At a high level, I want to be able to apply mem2reg to the value of
this QualType; but at a low level, this method call takes type
by reference, and so the predicate function will take it by reference
as well.

Mem2reg should kick in just fine even if %ptr now has a “unknown” use.
But that “unknown” use is much less problematic than what we have now
because the user is the llvm.assume call and not some ptrtoint which
is used two steps later in an `llvm.assume.

If you feel I missed a problem here, please let me know.

Unfortunately, I don’t have a great counter-proposal that isn’t a
major project.

(The major project is to make the predicates sub-functions within
the caller. This doesn’t fix all the problems, and sub-functions
introduce a host of new issues, but they do have the benefit of
making the analysis much more obviously intra-procedural.)

I don’t think inter-procedural reasoning is a problem or bad. Especially
here with internal functions that have a single use, it is really not
hard to make the connection.

It’s certainly not a problem in theory. In theory every
intraprocedural analysis can be taught to go interprocedural
into a predicate.

John.

_Johannes_Doerfert · December 18, 2019, 7:18pm

Hi John,

Is it correct to assume that you are in favor of
  - changing llvm.assume to be more expressive
  - operand bundle uses, especially w/o outlining
  - outlining, if we can show it plays well with transformations, e.g.
    the binding to the code is "weak"

I inlined more comments below.

>>> Abstract:
>>>
>>> It is often hard or impossible to encode complex, e.g., non-boolean,
>>> information in an `llvm.assume(i1)`. This RFC describes various problems
>>> we have right now and provides alternative design ideas.
>>>
>>>
>>>
>>> Some Existing Problems:
>>>
>>> A) The boolean requirement.
>>> The current `llvm.assume(i1)` expects a boolean that is known to hold
>>> true at runtime (once the `llvm.assume` call is reached). However,
>>> forming this boolean for "arbitrary" information is hard or even
>>> impossible. Alignment, which is an easy case, already requires 3 extra
>>> instructions, one of which is a `ptrtoint` and therefore not really
>>> optimizer friendly. Dereferenceability, is even scarier. Thus, we are
>>> currently limited to (almost) boolean information when we want to
>>> manifest information in the IR (which happens due to inlining or code
>>> motion, see ⚙ D69477 [InstCombine] keep assumption before sinking calls for an example).
>>>
>>> B) The one-use checks.
>>> Various pattern matching passes check the number of uses of a value.
>>> However, while `llvm.assume` is eventually eliminated by the backend
>>> it will still increase the use count of the operand. I doubt we are
>>> able to not increase the use count at all (and keep everything else
>>> working), but what we can do is make sure the uses in "assumptions"
>>> are easy to spot, thus filter. This is not the case right now because
>>> of the additional instructions we need to make the values boolean.
>>> Even if you have `__builtin_assume(ptr);` the `ptr` use will not be
>>> the `llvm.assume` call but a `icmp`.
>>>
>>> C) The side-effect avoidance.
>>> `__assume`, `__builtin_assume`, `__builtin_assume_aligned`, and OpenMP
>>> `omp assume` are all defined to not evaluate their argument, thus to
>>> not cause the side effects that the evaluation of the argument would
>>> otherwise imply. The way we implement this restriction is by not
>>> emitting the argument expression in IR if it might cause a side
>>> effect. We warn the user if that happens. While this is generally
>>> speaking fine, it would be interesting to lift the *implementation*
>>> restriction. One benefit would be that we could implement `assert`
>>> through `__builtin_assume` properly.
>>>
>>> D) The singleton ranges.
>>> An `llvm.assume` will only provide information for a single program
>>> point not a range. Even if the beginning and the end of a range have
>>> an `llvm.assume`, there are cases where the information will not be
>>> as good as a proper range assumption. OpenMP 5.1 introduces such
>>> range assumptions but other situations would benefit from them as
>>> well. Take for example function attributes and inlining. Since we know
>>> they hold for the entire function and not only when it is entered we
>>> could encode the information over the entire range of the inlined
>>> code.
>>>
>>>
>>> Some Site Notes:
>>>
>>> - It seems of little use that we interleave the code for the assumed
>>> expression with the user code. Having the `llvm.assume` allows us to
>>> tie information to a program point, beyond that we just clutter the
>>> function with instructions that we later remove anyway.
>>>
>>> - Reconstructing information from the pattern of instructions that feed
>>> into the `llvm.assume` is also not optimal, especially since we do
>>> not need to "optimize" these instructions anyway.
>>>
>>> - The current (=arbitrary) side-effects of `llvm.assume` are too strong.
>>> We have `inaccessiblemem_or_argmemonly` and we might be able to come
>>> up with something even more specialized for this, e.g.,
>>> `control_dependences_only` to indicate that there are only control
>>> dependences.
>>
>> This is all well put; I think you’ve covered the major weaknesses.
>>
>>
>>> Some Design Ideas:
>>>
>>> 1) Use named operand bundles to encode information.
>>> If we want to encode property XYZ for a value %V holds at a certain
>>> program point and the property is dependent on %N we could encode
>>> that as:
>>> `llvm.assume() ["XYZ"(%V, %N)]`
>>> There are various benefits, including:
>>> - It is freely extensible.
>>> - The property is directly tied to the value. Thus, no need for
>>> encoding patterns that introduce extra instructions and uses and
>>> which we need to preserve and decode later.
>>> - Allows dynamic properties even when corresponding attributes do
>>> not, e.g., `llvm.assume() ["align"(%arg_ptr, %N)]` is fine and
>>> once `%N` becomes a constant, or we determine a lower bound, we
>>> can introduce the `align(C)` attribute for `%arg_ptr`.
>>>
>>> 2) Outline assumption expression code (with side-effects).
>>> If we potentially have side-effects, or we simply have a non-trivial
>>> expression that requires to be lowered into instructions, we can
>>> outline the assumption expression code and tie it to the
>>> `llvm.assume` via another operand bundle property. It could look
>>> something like this:
>>> `__builtin_assume(foo(i) == bar(j));`
>>> will cause us to generate
>>> ```
>>> /// Must return true!
>>> static bool llvm.assume.expression_#27(int i, int j) {
>>> return foo(i) == bar(j);
>>> }
>>> ```
>>> and a `llvm.assume` call like this:
>>> `llvm.assume() ["assume_fn"(@llvm.assume.expression_#27, %i, %j))]
>>> So we generate the expression in a new function which we (only) tie to
>>> the `llvm.assume()` through the "assume_fn" operand bundle. This will
>>> make sure we do not accidentally evaluate the code, or assume it is
>>> evaluated and produced side-effects. We can still optimize the code
>>> and use the information that we learn from it at the `llvm.assume`
>>> site though.
>>
>> I think outlining is abstractly very promising, but I’m worried about
>> it impacting existing optimizations:
>>
>> - It’s won’t be obvious at all from the IR that the predicate function
>> is dead code. It would be a shame if we ended up emitting the predicate
>> function, or some global only it references, because we didn’t delete
>> all the llvm.assume calls early enough to recognize that it was dead.
>
> So, we already "delete" llvm.assume and its operands in the backends.
> Having a dedicated middle-end pass to do that which also deletes the
> associated "assume_fn" doesn't seem hard or complicated to set up.
>
> Worst case (which I really don't think would ever happen), we emit the
> functions which are internal and never referenced. The linker should
> strip them.
>
>
>
>> - Anything passed to the predicate function will by default look like it
>> escapes. This is particularly true if the predicate takes local
>> variables by references, which is the easiest and most straightforwardly
>> correct way for frontends to emit these predicates. So this will block
>> basic memory analyses (including mem2reg!) unless they’re taught to
>> remove or rewrite assumptions.
>
> Partially true and we already have that problem though.
>
> Mem2reg, and others, might need to know about llvm.assume uses but I
> fail to see why they need to rewrite much (in the short therm). The
> frontend generated code would naturally look like this:
>
> %ptr.addr = alloca i32*
> store %ptr, %ptr.addr
> ...
> %ptr.val = load %ptr.addr
> llvm.assume() ["align"(%ptr.val)]

I disagree; the natural way to generate this code in frontends will
actually be to take the variable by reference. We can, of course, make
frontends smart enough to take the variable by value if it’s
obviously only loaded from in the expressions, but if the optimizers
still aren’t generally aware of the intrinsic, that will just mean
that assumptions pessimize slightly more abstracted code.

For example, if I had this:
  Clang::QualType type = …;
  __builtin_assume(!type.hasLocalQualifiers());
At a high level, I want to be able to apply mem2reg to the value of
this `QualType`; but at a low level, this method call takes `type`
by reference, and so the predicate function will take it by reference
as well.

At some point we need to realize code is only used in an assumption in
order to actually outline. There is no question about that. Where it
happens is a good question though. I could write a pass to do that in
the IR, in order to test the idea.

Since we don't have this code I won't speculate about it now. What I
will say instead is that we have code to modify the IR in order to pass
an argument by reference instead of by value. I am more than happy to
make it aware of llvm.assume operand bundle uses and I am very much
certain the amount of code needed to transform these from pass by
reference to pass by value is negligible.

> Mem2reg should kick in just fine even if %ptr now has a "unknown" use.
> But that "unknown" use is much less problematic than what we have now
> because the user is the `llvm.assume` call and not some `ptrtoint` which
> is used two steps later in an `llvm.assume.
>
> If you feel I missed a problem here, please let me know.
>
>
>
>> Unfortunately, I don’t have a great counter-proposal that isn’t a
>> major project.
>>
>> (The major project is to make the predicates sub-functions within
>> the caller. This doesn’t fix all the problems, and sub-functions
>> introduce a host of new issues, but they do have the benefit of
>> making the analysis much more obviously intra-procedural.)
>
> I don't think inter-procedural reasoning is a problem or bad. Especially
> here with internal functions that have a single use, it is really not
> hard to make the connection.

It’s certainly not a problem *in theory*. *In theory* every
intraprocedural analysis can be taught to go interprocedural
into a predicate.

We might have different expectations how assumes are used or where our
analyses/transformation are heading. Not every pass needs to look at
assumes at the end of the day. Most information that we use can already,
or should be described through an attribute and we have a working way to
deduce those interprocedurally. The future, I hope, is dominated by
interprocedural analysis and optimization. Given how far we have come
this summer alone, and what was said on the IPO panel at LLVM'dev, I am
confident we'll get there.

Thanks,
Johannes

John_McCall1 · December 18, 2019, 7:54pm

Hi John,

Is it correct to assume that you are in favor of

changing llvm.assume to be more expressive

operand bundle uses, especially w/o outlining

outlining, if we can show it plays well with transformations, e.g.
the binding to the code is “weak”

Yes, this is all correct. I’m excited that we’re looking into making this
less useless.

I inlined more comments below.
Abstract:

It is often hard or impossible to encode complex, e.g., non-boolean,
information in an llvm.assume(i1). This RFC describes various problems
we have right now and provides alternative design ideas.

Some Existing Problems:

A) The boolean requirement.
The current llvm.assume(i1) expects a boolean that is known to hold
true at runtime (once the llvm.assume call is reached). However,
forming this boolean for “arbitrary” information is hard or even
impossible. Alignment, which is an easy case, already requires 3 extra
instructions, one of which is a ptrtoint and therefore not really
optimizer friendly. Dereferenceability, is even scarier. Thus, we are
currently limited to (almost) boolean information when we want to
manifest information in the IR (which happens due to inlining or code
motion, see https://reviews.llvm.org/D69477 for an example).

B) The one-use checks.
Various pattern matching passes check the number of uses of a value.
However, while llvm.assume is eventually eliminated by the backend
it will still increase the use count of the operand. I doubt we are
able to not increase the use count at all (and keep everything else
working), but what we can do is make sure the uses in “assumptions”
are easy to spot, thus filter. This is not the case right now because
of the additional instructions we need to make the values boolean.
Even if you have __builtin_assume(ptr); the ptr use will not be
the llvm.assume call but a icmp.

C) The side-effect avoidance.
__assume, __builtin_assume, __builtin_assume_aligned, and OpenMP
omp assume are all defined to not evaluate their argument, thus to
not cause the side effects that the evaluation of the argument would
otherwise imply. The way we implement this restriction is by not
emitting the argument expression in IR if it might cause a side
effect. We warn the user if that happens. While this is generally
speaking fine, it would be interesting to lift the implementation
restriction. One benefit would be that we could implement assert
through __builtin_assume properly.

D) The singleton ranges.
An llvm.assume will only provide information for a single program
point not a range. Even if the beginning and the end of a range have
an llvm.assume, there are cases where the information will not be
as good as a proper range assumption. OpenMP 5.1 introduces such
range assumptions but other situations would benefit from them as
well. Take for example function attributes and inlining. Since we know
they hold for the entire function and not only when it is entered we
could encode the information over the entire range of the inlined
code.

Some Site Notes:

It seems of little use that we interleave the code for the assumed
expression with the user code. Having the llvm.assume allows us to
tie information to a program point, beyond that we just clutter the
function with instructions that we later remove anyway.

Reconstructing information from the pattern of instructions that feed
into the llvm.assume is also not optimal, especially since we do
not need to “optimize” these instructions anyway.

The current (=arbitrary) side-effects of llvm.assume are too strong.
We have inaccessiblemem_or_argmemonly and we might be able to come
up with something even more specialized for this, e.g.,
control_dependences_only to indicate that there are only control
dependences.

This is all well put; I think you’ve covered the major weaknesses.
Some Design Ideas:

Use named operand bundles to encode information.
If we want to encode property XYZ for a value %V holds at a certain
program point and the property is dependent on %N we could encode
that as:
llvm.assume() ["XYZ"(%V, %N)]
There are various benefits, including:

It is freely extensible.

The property is directly tied to the value. Thus, no need for
encoding patterns that introduce extra instructions and uses and
which we need to preserve and decode later.

Allows dynamic properties even when corresponding attributes do
not, e.g., llvm.assume() ["align"(%arg_ptr, %N)] is fine and
once %N becomes a constant, or we determine a lower bound, we
can introduce the align(C) attribute for %arg_ptr.

Outline assumption expression code (with side-effects).
If we potentially have side-effects, or we simply have a non-trivial
expression that requires to be lowered into instructions, we can
outline the assumption expression code and tie it to the
llvm.assume via another operand bundle property. It could look
something like this:
__builtin_assume(foo(i) == bar(j));
will cause us to generate
/// Must return true!
static bool llvm.assume.expression_#27(int i, int j) {
return foo(i) == bar(j);
}
and a llvm.assume call like this:
llvm.assume() ["assume_fn"(@llvm.assume.expression_#27, %i, %j))] So we generate the expression in a new function which we (only) tie to the llvm.assume()through the "assume_fn" operand bundle. This will make sure we do not accidentally evaluate the code, or assume it is evaluated and produced side-effects. We can still optimize the code and use the information that we learn from it at thellvm.assume`
site though.
I think outlining is abstractly very promising, but I’m worried about
it impacting existing optimizations:

It’s won’t be obvious at all from the IR that the predicate function
is dead code. It would be a shame if we ended up emitting the predicate
function, or some global only it references, because we didn’t delete
all the llvm.assume calls early enough to recognize that it was dead.
So, we already “delete” llvm.assume and its operands in the backends.
Having a dedicated middle-end pass to do that which also deletes the
associated “assume_fn” doesn’t seem hard or complicated to set up.

Worst case (which I really don’t think would ever happen), we emit the
functions which are internal and never referenced. The linker should
strip them.

Anything passed to the predicate function will by default look like it
escapes. This is particularly true if the predicate takes local
variables by references, which is the easiest and most straightforwardly
correct way for frontends to emit these predicates. So this will block
basic memory analyses (including mem2reg!) unless they’re taught to
remove or rewrite assumptions.

Partially true and we already have that problem though.

Mem2reg, and others, might need to know about llvm.assume uses but I
fail to see why they need to rewrite much (in the short therm). The
frontend generated code would naturally look like this:

%ptr.addr = alloca i32*
store %ptr, %ptr.addr
…
%ptr.val = load %ptr.addr
llvm.assume() [“align”(%ptr.val)]
I disagree; the natural way to generate this code in frontends will
actually be to take the variable by reference. We can, of course, make
frontends smart enough to take the variable by value if it’s
obviously only loaded from in the expressions, but if the optimizers
still aren’t generally aware of the intrinsic, that will just mean
that assumptions pessimize slightly more abstracted code.

For example, if I had this:
Clang::QualType type = …;
__builtin_assume(!type.hasLocalQualifiers());
At a high level, I want to be able to apply mem2reg to the value of
this QualType; but at a low level, this method call takes type
by reference, and so the predicate function will take it by reference
as well.
At some point we need to realize code is only used in an assumption in
order to actually outline. There is no question about that. Where it
happens is a good question though. I could write a pass to do that in
the IR, in order to test the idea.

You mean for things like loads that are just passed to the intrinsic?
I agree, although I think other people who’ve worked with llvm.assume
have noticed that the presence of the load can change optimization in
ways that are hard to eliminate, e.g. if a load gets hoisted because
it’s “done twice”.

Since we don’t have this code I won’t speculate about it now. What I
will say instead is that we have code to modify the IR in order to pass
an argument by reference instead of by value. I am more than happy to
make it aware of llvm.assume operand bundle uses and I am very much
certain the amount of code needed to transform these from pass by
reference to pass by value is negligible.

Okay.

Mem2reg should kick in just fine even if %ptr now has a “unknown” use.
But that “unknown” use is much less problematic than what we have now
because the user is the llvm.assume call and not some ptrtoint which
is used two steps later in an `llvm.assume.

If you feel I missed a problem here, please let me know.

Unfortunately, I don’t have a great counter-proposal that isn’t a
major project.

(The major project is to make the predicates sub-functions within
the caller. This doesn’t fix all the problems, and sub-functions
introduce a host of new issues, but they do have the benefit of
making the analysis much more obviously intra-procedural.)

I don’t think inter-procedural reasoning is a problem or bad. Especially
here with internal functions that have a single use, it is really not
hard to make the connection.

It’s certainly not a problem in theory. In theory every
intraprocedural analysis can be taught to go interprocedural
into a predicate.

We might have different expectations how assumes are used or where our
analyses/transformation are heading. Not every pass needs to look at
assumes at the end of the day. Most information that we use can already,
or should be described through an attribute and we have a working way to
deduce those interprocedurally. The future, I hope, is dominated by
interprocedural analysis and optimization. Given how far we have come
this summer alone, and what was said on the IPO panel at LLVM’dev, I am
confident we’ll get there.

I suspect this will always be a practical point of tension, and I
did not come away from that panel feeling like there was anything
other than a fairly hand-wavey hope that somehow we’d get there.

John.

Michael_Kruse2 · December 18, 2019, 7:59pm

1) Use named operand bundles to encode information.
   If we want to encode property XYZ for a value %V holds at a certain
   program point and the property is dependent on %N we could encode
   that as:
     `llvm.assume() ["XYZ"(%V, %N)]`

What is the advantage of using operator bundles over directly using
arguments? That is, why not using

call llvm.assume_fn.i32.i32(@llvm.assume.expression_#27, %i, %j)

Is it to avoid the overloads of llvm.assume_fn? If yes, why are
overloads of llvm.assume a problem?

2) Outline assumption expression code (with side-effects).
  If we potentially have side-effects, or we simply have a non-trivial
  expression that requires to be lowered into instructions, we can
  outline the assumption expression code and tie it to the
  `llvm.assume` via another operand bundle property. It could look
  something like this:
    `__builtin_assume(foo(i) == bar(j));`
  will cause us to generate
    ```
    /// Must return true!
    static bool llvm.assume.expression_#27(int i, int j) {
      return foo(i) == bar(j);
    }
    ```
  and a `llvm.assume` call like this:
    `llvm.assume() ["assume_fn"(@llvm.assume.expression_#27, %i, %j))]
  So we generate the expression in a new function which we (only) tie to
  the `llvm.assume()` through the "assume_fn" operand bundle. This will
  make sure we do not accidentally evaluate the code, or assume it is
  evaluated and produced side-effects. We can still optimize the code
  and use the information that we learn from it at the `llvm.assume`
  site though.

I like the idea. If emitting assume_fns to the object file is an
issue, we could introduce a LinkageType that is never emitted to an
object file (and can only be used in an llvm.assume).

3) Use tokens to mark ranges.
   We have tokens which can be used to tie two instructions together,
   basically forming a range (with some conditions on the initial CFG).
   If we tie two `llvm.assume` calls together we can say that the
   information provided by the first holds for any point dominated by it
   and post-dominated by the second.

Is this the same mechanism as for llvm.lifetime.start/end? It may not
be easy to keep them consistently (post-)dominating each other, see
[llvm-dev] Well-formed @llvm.lifetime.start and @llvm.lifetime.end intrinsics .

Michael

_Johannes_Doerfert · December 18, 2019, 8:09pm

> 1) Use named operand bundles to encode information.
> If we want to encode property XYZ for a value %V holds at a certain
> program point and the property is dependent on %N we could encode
> that as:
> `llvm.assume() ["XYZ"(%V, %N)]`

What is the advantage of using operator bundles over directly using
arguments? That is, why not using

call llvm.assume_fn.i32.i32(@llvm.assume.expression_#27, %i, %j)

Is it to avoid the overloads of llvm.assume_fn? If yes, why are
overloads of llvm.assume a problem?

The advantage is that we can encode all attributes and other "simple"
properties without outlining in a very easy to recognize way. While it
is not as powerful I'm also happy if we adopt something like:
call llvm.assume.pi32(i32* align(8) dereferenceable(16) %ptr)

Pure outlining based assumptions might also work.

> 2) Outline assumption expression code (with side-effects).
> If we potentially have side-effects, or we simply have a non-trivial
> expression that requires to be lowered into instructions, we can
> outline the assumption expression code and tie it to the
> `llvm.assume` via another operand bundle property. It could look
> something like this:
> `__builtin_assume(foo(i) == bar(j));`
> will cause us to generate
> ```
> /// Must return true!
> static bool llvm.assume.expression_#27(int i, int j) {
> return foo(i) == bar(j);
> }
> ```
> and a `llvm.assume` call like this:
> `llvm.assume() ["assume_fn"(@llvm.assume.expression_#27, %i, %j))]
> So we generate the expression in a new function which we (only) tie to
> the `llvm.assume()` through the "assume_fn" operand bundle. This will
> make sure we do not accidentally evaluate the code, or assume it is
> evaluated and produced side-effects. We can still optimize the code
> and use the information that we learn from it at the `llvm.assume`
> site though.

I like the idea. If emitting assume_fns to the object file is an
issue, we could introduce a LinkageType that is never emitted to an
object file (and can only be used in an llvm.assume).

That is a really cool idea.

> 3) Use tokens to mark ranges.
> We have tokens which can be used to tie two instructions together,
> basically forming a range (with some conditions on the initial CFG).
> If we tie two `llvm.assume` calls together we can say that the
> information provided by the first holds for any point dominated by it
> and post-dominated by the second.

Is this the same mechanism as for llvm.lifetime.start/end? It may not
be easy to keep them consistently (post-)dominating each other, see
[llvm-dev] Well-formed @llvm.lifetime.start and @llvm.lifetime.end intrinsics .

Fair, but I argue that is fine as long as it means we only "loose" information.
We would still have the assume calls so point wise information would not
even be lost.

_Johannes_Doerfert · December 18, 2019, 8:21pm

> Hi John,
>
> Is it correct to assume that you are in favor of
> - changing llvm.assume to be more expressive
> - operand bundle uses, especially w/o outlining
> - outlining, if we can show it plays well with transformations, e.g.
> the binding to the code is "weak"

Yes, this is all correct. I’m excited that we’re looking into making this
less useless.

Perfect! I'll go ahead and prototype something then so we can collect
data and experience.

> > > > - Anything passed to the predicate function will by default
> > > > look like it
> > > > escapes. This is particularly true if the predicate takes local
> > > > variables by references, which is the easiest and most
> > > > straightforwardly
> > > > correct way for frontends to emit these predicates. So
> > > > this will block
> > > > basic memory analyses (including mem2reg!) unless they’re
> > > > taught to
> > > > remove or rewrite assumptions.
> > >
> > > Partially true and we already have that problem though.
> > >
> > > Mem2reg, and others, might need to know about llvm.assume uses but I
> > > fail to see why they need to rewrite much (in the short therm). The
> > > frontend generated code would naturally look like this:
> > >
> > > %ptr.addr = alloca i32*
> > > store %ptr, %ptr.addr
> > > ...
> > > %ptr.val = load %ptr.addr
> > > llvm.assume() ["align"(%ptr.val)]
> >
> > I disagree; the natural way to generate this code in frontends will
> > actually be to take the variable by reference. We can, of course,
> > make
> > frontends smart enough to take the variable by value if it’s
> > obviously only loaded from in the expressions, but if the optimizers
> > still aren’t generally aware of the intrinsic, that will just mean
> > that assumptions pessimize slightly more abstracted code.
> >
> > For example, if I had this:
> >
> > ```
> > Clang::QualType type = …;
> > __builtin_assume(!type.hasLocalQualifiers());
> > ```
> >
> > At a high level, I want to be able to apply mem2reg to the value of
> > this `QualType`; but at a low level, this method call takes `type`
> > by reference, and so the predicate function will take it by reference
> > as well.
>
> At some point we need to realize code is only used in an assumption in
> order to actually outline. There is no question about that. Where it
> happens is a good question though. I could write a pass to do that in
> the IR, in order to test the idea.

You mean for things like loads that are just passed to the intrinsic?
I agree, although I think other people who’ve worked with `llvm.assume`
have noticed that the presence of the load can change optimization in
ways that are hard to eliminate, e.g. if a load gets hoisted because
it’s “done twice”.

That's right and I don't know what the perfect solution looks like,
maybe always outlining is not so bad after all. I guess we all agree
that we have to minimize the llvm.assume impact while keeping
information available and correct, e.g., wrt. control dependences.

A crazy idea we could try further down the road:
  Once we have control dependences [0] we can start "moving" the assume
  calls, e.g., towards function entries. We can hoist it over
  conditionals when we can keep track of the control dependences and we
  know it would have been reached, e.g., the original position is in the
  "must-be-executed-context" of the new position [1].

[0] ⚙ D71578 [CodeMoverUtils] Improve IsControlFlowEquivalent.
[1] ⚙ D65186 [MustExec] Add a generic "must-be-executed-context" explorer

> Since we don't have this code I won't speculate about it now. What I
> will say instead is that we have code to modify the IR in order to pass
> an argument by reference instead of by value. I am more than happy to
> make it aware of llvm.assume operand bundle uses and I am very much
> certain the amount of code needed to transform these from pass by
> reference to pass by value is negligible.

Okay.

> > > Mem2reg should kick in just fine even if %ptr now has a
> > > "unknown" use.
> > > But that "unknown" use is much less problematic than what we
> > > have now
> > > because the user is the `llvm.assume` call and not some
> > > `ptrtoint` which
> > > is used two steps later in an `llvm.assume.
> > >
> > > If you feel I missed a problem here, please let me know.
> > >
> > >
> > >
> > > > Unfortunately, I don’t have a great counter-proposal that
> > > > isn’t a
> > > > major project.
> > > >
> > > > (The major project is to make the predicates sub-functions within
> > > > the caller. This doesn’t fix all the problems, and sub-functions
> > > > introduce a host of new issues, but they do have the benefit of
> > > > making the analysis much more obviously intra-procedural.)
> > >
> > > I don't think inter-procedural reasoning is a problem or bad.
> > > Especially
> > > here with internal functions that have a single use, it is
> > > really not
> > > hard to make the connection.
> >
> > It’s certainly not a problem *in theory*. *In theory* every
> > intraprocedural analysis can be taught to go interprocedural
> > into a predicate.
>
> We might have different expectations how assumes are used or where our
> analyses/transformation are heading. Not every pass needs to look at
> assumes at the end of the day. Most information that we use can already,
> or should be described through an attribute and we have a working way to
> deduce those interprocedurally. The future, I hope, is dominated by
> interprocedural analysis and optimization. Given how far we have come
> this summer alone, and what was said on the IPO panel at LLVM'dev, I am
> confident we'll get there.

I suspect this will always be a practical point of tension, and I
did not come away from that panel feeling like there was anything
other than a fairly hand-wavey hope that somehow we’d get there.

FWIW. A few people do actively work towards that goal by actually
implementing interprocedural analyses and optimizations.

_Johannes_Doerfert · December 19, 2019, 9:15am

First prototype: https://reviews.llvm.org/D71692

From what I've seen, we can the proposed changes work if we want to.
Outlining comes with the most overhead when we want to keep the same
amount of information but we can gradually work on this and adopt the
"native" operand bundle encodings first.

I haven't observed a problem wrt. outlining preventing anything but some
information is not properly propagated back yet.

Cheers,
Johannes

_Johannes_Doerfert · January 24, 2020, 6:49pm

We have made some progress on this and we would appreciate feedback to
determine if people are OK with this effort being (slowly) merged in.

For the operand bundle assumption encoding [see rational in A) below]
there are multiple patches prepared already. The two important ones are:
- Generate llvm.assumed with attributes in operand bundles from calls:
     https://reviews.llvm.org/D72475
   This can be used if the call is (re)moved to retain *all* information.
- Initial query API for llvm.assume holding attributes in operand
   bundles:
     https://reviews.llvm.org/D72885
   This allows easy use without dealing with the encodings. More APIs
   for querying the operand bundles are planned, e.g., to extract *all*
   information at once.

Another patch is in the works to ignore uses in llvm.assume operand
bundles where appropriate [see rational in B) below].

Finkel_Hal_J · January 30, 2020, 4:58am

Hi, Johannes,

Thanks for working on this. This seems like a good approach which nicely extends the set of capabilities we have now.

One additional comment, as I agree with everything in your rationale, except this:

- Reconstructing information from the pattern of instructions that feed
into the `llvm.assume` is also not optimal, especially since we do
not need to "optimize" these instructions anyway.

I don't understand what you mean by the reconstructing the pattern from the instructions not being optimal, but in general, we do need to optimize the instructions forming the assumption condition. The expressions might have calls that should be inlined, or as a general matter, have expressions involving temporaries that need to be simplified (especially soon as you start allowing expressions that might have side effects, you'll also have to deal with temporaries that are materialized to the stack).

There is also the matter of inter-procedural code sinking that will come up (assuming that we want to, with this mechanism, get rid of the existing ephemeral-values concept and utility functions). Instructions in the function containing the assumption, that are only used by the assumption, we'll want to sink in the outlined assumption function. I believe you're thinking about addressing this with outlining, but I think that these are separate issues because the outlining needs to be done early. Regarding this late-outlining approach:

A prototype for both operand bundle assumptions and outlined assumptions
is available herehttps://reviews.llvm.org/D71692. Note that most code
is required do to the outlining [see rational in point C) below]. This
is not as actively purposed for now as the operand bundle use.

I don't see how this can work in the case where you have side effects. Once you generate the code in the function containing the assume, if that code has side effects, it's can't be later deleted and moved into the assumption. One might make an argument that we can do this for non-volatile loads (because if the load is used only by the assumption it will be removed anyway), but for stores and other things that will appear from general expressions, it won't be sound to outline them later. We can't know if the side effect is supposed to be there or not, so the outlining needs to happen in the frontend.

-Hal

_Johannes_Doerfert · January 30, 2020, 5:22am

Thanks for working on this. This seems like a good approach which nicely
extends the set of capabilities we have now.

Great. We will move forward with the operand bundle part and hope for
more feedback as we go

One additional comment, as I agree with everything in your rationale, except
this:

> - Reconstructing information from the pattern of instructions that feed
> into the `llvm.assume` is also not optimal, especially since we do
> not need to "optimize" these instructions anyway.

I don't understand what you mean by the reconstructing the pattern from the
instructions not being optimal, but in general, we do need to optimize the
instructions forming the assumption condition. The expressions might have
calls that should be inlined, or as a general matter, have expressions
involving temporaries that need to be simplified (especially soon as you
start allowing expressions that might have side effects, you'll also have to
deal with temporaries that are materialized to the stack).

With "reconstructing the pattern" I meant that we need to figure out
over and over again that the following 5 instructions correspond to
`"align"(%a + 24, 32)` or `"align"(%a, 16)`.

    %ptrint = ptrtoint i32* %a to i64
    %offsetptr = add i64 %ptrint, 24
    %maskedptr = and i64 %offsetptr, 31
    %maskcond = icmp eq i64 %maskedptr, 0
    tail call void @llvm.assume(i1 %maskcond)

The sequence above is basically a hard coded pattern in >=2 places. My
argument now is that if you want to express property XYZ on value %V and
you do it as `"XYZ"(%V)` it is straightforward to reconstruct later. If
we however have to build a non-trivial sequence of instructions to
"encode" XYZ into a boolean value we always have to reconstruct the
sequence (=pattern).

There is also the matter of inter-procedural code sinking that will come up
(assuming that we want to, with this mechanism, get rid of the existing
ephemeral-values concept and utility functions). Instructions in the
function containing the assumption, that are only used by the assumption,
we'll want to sink in the outlined assumption function. I believe you're
thinking about addressing this with outlining, but I think that these are
separate issues because the outlining needs to be done early. Regarding this
late-outlining approach:

For the record, I never intended for "late-outlining" to be used in
practise. I wrote it so we can test how outlined assumptions would work.
It is a prove of concept that we can outlined the code pattern above,
connect the outlined function to the llvm.assume, and still determine
the alignment of a pointer. If we move to outlined assumptions we would
never generate anything else.

> A prototype for both operand bundle assumptions and outlined assumptions
> is available herehttps://reviews.llvm.org/D71692. Note that most code
> is required do to the outlining [see rational in point C) below]. This
> is not as actively purposed for now as the operand bundle use.

I don't see how this can work in the case where you have side effects. Once
you generate the code in the function containing the assume, if that code
has side effects, it's can't be later deleted and moved into the assumption.

I did not make that clear in my earlier email but I tried to say it
above. We would not generate the assumption expression in the user
function but directly in the outlined one. The prototype I have can be
used to "mimic" the result without requiring me to do non-trivial things
in Clang.

One might make an argument that we can do this for non-volatile loads
(because if the load is used only by the assumption it will be removed
anyway), but for stores and other things that will appear from general
expressions, it won't be sound to outline them later. We can't know if the
side effect is supposed to be there or not, so the outlining needs to happen
in the frontend.

I totally agree, outlining would need to happen in the frontend, and
hopefully implemented by someone other than me

Finkel_Hal_J · January 30, 2020, 5:31am

Thanks for working on this. This seems like a good approach which nicely
extends the set of capabilities we have now.

Great. We will move forward with the operand bundle part and hope for
more feedback as we go

One additional comment, as I agree with everything in your rationale, except
this:

- Reconstructing information from the pattern of instructions that feed
    into the `llvm.assume` is also not optimal, especially since we do
    not need to "optimize" these instructions anyway.

I don't understand what you mean by the reconstructing the pattern from the
instructions not being optimal, but in general, we do need to optimize the
instructions forming the assumption condition. The expressions might have
calls that should be inlined, or as a general matter, have expressions
involving temporaries that need to be simplified (especially soon as you
start allowing expressions that might have side effects, you'll also have to
deal with temporaries that are materialized to the stack).

With "reconstructing the pattern" I meant that we need to figure out
over and over again that the following 5 instructions correspond to
`"align"(%a + 24, 32)` or `"align"(%a, 16)`.

     %ptrint = ptrtoint i32* %a to i64
     %offsetptr = add i64 %ptrint, 24
     %maskedptr = and i64 %offsetptr, 31
     %maskcond = icmp eq i64 %maskedptr, 0
     tail call void @llvm.assume(i1 %maskcond)

The sequence above is basically a hard coded pattern in >=2 places. My
argument now is that if you want to express property XYZ on value %V and
you do it as `"XYZ"(%V)` it is straightforward to reconstruct later. If
we however have to build a non-trivial sequence of instructions to
"encode" XYZ into a boolean value we always have to reconstruct the
sequence (=pattern).

There is also the matter of inter-procedural code sinking that will come up
(assuming that we want to, with this mechanism, get rid of the existing
ephemeral-values concept and utility functions). Instructions in the
function containing the assumption, that are only used by the assumption,
we'll want to sink in the outlined assumption function. I believe you're
thinking about addressing this with outlining, but I think that these are
separate issues because the outlining needs to be done early. Regarding this
late-outlining approach:

For the record, I never intended for "late-outlining" to be used in
practise. I wrote it so we can test how outlined assumptions would work.
It is a prove of concept that we can outlined the code pattern above,
connect the outlined function to the llvm.assume, and still determine
the alignment of a pointer. If we move to outlined assumptions we would
never generate anything else.

A prototype for both operand bundle assumptions and outlined assumptions
is available herehttps://reviews.llvm.org/D71692. Note that most code
is required do to the outlining [see rational in point C) below]. This
is not as actively purposed for now as the operand bundle use.

I don't see how this can work in the case where you have side effects. Once
you generate the code in the function containing the assume, if that code
has side effects, it's can't be later deleted and moved into the assumption.

I did not make that clear in my earlier email but I tried to say it
above. We would not generate the assumption expression in the user
function but directly in the outlined one. The prototype I have can be
used to "mimic" the result without requiring me to do non-trivial things
in Clang.

One might make an argument that we can do this for non-volatile loads
(because if the load is used only by the assumption it will be removed
anyway), but for stores and other things that will appear from general
expressions, it won't be sound to outline them later. We can't know if the
side effect is supposed to be there or not, so the outlining needs to happen
in the frontend.

I totally agree, outlining would need to happen in the frontend, and
hopefully implemented by someone other than me

Sounds good

-Hal

_Johannes_Doerfert · June 26, 2020, 9:52pm

[Update]

Of the 4 problems listed below, A) and B) have been (mostly) addressed.
C) and D) remain open.

Over the last few months Gauthier (@tyker) made *a lot* of progress on
this. There is infrastructure to deal with operand bundles on
`llvm.assume`, e.g.,to simplify those operand bundles or to identify
assumes you don't need. Instcombine (among other things) can "cheaply"
reason about uses that can be removed without traversing def-use chains,
etc. There is some work to do to unify the ephemeral value handling and
the "droppable" uses stuff but we are moving in the right direction.

Yesterday, we switched the most commonly generated assuming (alignment)
to the bundle representation. Hopefully that will not only reduce IR
size (with all the usual consequences) but also make it easier to
identify and deal with (attribute) assumptions.

See https://reviews.llvm.org/D71739 for the switch. It simplifies the
handling quite a bit, i.a., compare
https://reviews.llvm.org/D81854?vs=on&id=273782
with
https://reviews.llvm.org/D81854?vs=on&id=270795

I would like to ask people to try the `--enable-knowledge-retention`
flag. It will preserve a lot of the (attribute) information we currently
loose during transformations like inlining or dead code elimination.
There is probably some tuning necessary but any input on the effects
would be very much appreciated.

~ Johannes

P.S.
Feedback and help is always welcome

Topic		Replies	Views
[RFC] __builtin_assume, [[assume]] optimization behavior Clang Frontend llvm	51	1339	March 11, 2024
How best to represent assume statements in LLVM IR? LLVM Dev List Archives	5	104	October 2, 2012
Compiler hint for vectors. LLVM Dev List Archives	1	83	January 27, 2021
[RFC] Invariants in LLVM LLVM Dev List Archives	13	117	July 20, 2014
Llvm.assume implementation details IR & Optimizations	4	139	May 23, 2024

[RFC] How to manifest information in LLVM-IR, or, revisiting llvm.assume

Related topics