Shift-by-signext - sext is bad for analysis - ignore it's use count?

In https://reviews.llvm.org/D68103 the InstCombine learned that shift-by-sext
is simply a shift-by-zext. But the transform is limited to single-use sext.
We can quite trivially get a case where there are two shifts by the same sext:
https://godbolt.org/z/j6mO3t <- We should handle those cases.

In https://reviews.llvm.org/D68103#1686130 Sanjay Patel notes that this
sext is intrusive for analysis, that we will gain far better analysis with zext,
so we should just ignore forego of the one-use check,
and simply replace all shift-by-sext with shift-by-zext.

I implemented this proposed suggestion here:
https://reviews.llvm.org/D68150

Does anyone see any problems with that trade-off?

Roman.

In https://reviews.llvm.org/D68103 the InstCombine learned that shift-by-sext
is simply a shift-by-zext.

Just to make sure I'm following, the reasoning here is that the shift amount must be positive or the shift would produce poison? And thus, it's safe to assume that the sext == zext because we've (at worst) removed UB in the original program?

If so, two slightly off topic ideas.

1) This feels like a demanded bits problem. We know that any shift value outside of a given range is UB, and thus only need to demand the bits necessary to represent the defined range. Might be an interesting extension.

2) Are we possibly missing opportunities by not exploiting knowledge of the a known negative shift amount?

But the transform is limited to single-use sext.
We can quite trivially get a case where there are two shifts by the same sext:
https://godbolt.org/z/j6mO3t <- We should handle those cases.

In https://reviews.llvm.org/D68103#1686130 Sanjay Patel notes that this
sext is intrusive for analysis, that we will gain far better analysis with zext,
so we should just ignore forego of the one-use check,
and simply replace all shift-by-sext with shift-by-zext.

Doing the multi-use case is unfortunately complicated. Your limited use scan might be a reasonable option in practice, but the need for cutoffs creates undesirable dynamics.

A couple ideas on how to possibly approach the problem:

1) If we can prove that one shift dominates the other uses, then if we can find UB which triggers based on overflow, we can do the replacement.

2) Having a general multiple use demanded use routine would be very powerful. Is it worth exploring the harder topic for generality?

3) If we had an anyextend IR node, it might be reasonable to eagerly produce the duplicate nodes, and rely on later CSE. I keep running across cases where we have an extend where we know the high bits don't matter, maybe it's time to represent that?

Thanks for taking a look!

> In https://reviews.llvm.org/D68103 the InstCombine learned that shift-by-sext
> is simply a shift-by-zext.

Just to make sure I'm following, the reasoning here is that the shift
amount must be positive or the shift would produce poison? And thus,
it's safe to assume that the sext == zext because we've (at worst)
removed UB in the original program?

Yes.
zext and sext are equivalent for non-negative inputs.
For negative inputs, sext will produce negative output.
And interpreted as unsigned number, such negative shift amount
is *always* bigger than largest legal shift amount:
* i1 can only be shifted by 0
* i2 can only be shifted by 1 (positive); 2,3 (negative) are not legal
shift amounts
* i3 can only be shifted by 1,2 (positive); 3 (positive) and 4-7
(negative) are not legal shift amounts
* i4 can only be shifted by 1,2,3 (positive); 4-7 (positive) and 8-15
(negative) are not legal shift amounts
* and so on.
Therefore we must not have negative shift amount (or we have UB), so
we can just use zext.

If so, two slightly off topic ideas.

1) This feels like a demanded bits problem. We know that any shift
value outside of a given range is UB, and thus only need to demand the
bits necessary to represent the defined range. Might be an interesting
extension.

Most specifically, we demand only low
bitwidth(shift)-ctlz(bitwidth(shift)-1) bits
of shift amount, yes. I did think of it as a demanded bits problem,
but didn't really succeed, let me see again..

2) Are we possibly missing opportunities by not exploiting knowledge of
the a known negative shift amount?

Anything specific you are thinking of?

> But the transform is limited to single-use sext.
> We can quite trivially get a case where there are two shifts by the same sext:
> https://godbolt.org/z/j6mO3t <- We should handle those cases.
>
> In https://reviews.llvm.org/D68103#1686130 Sanjay Patel notes that this
> sext is intrusive for analysis, that we will gain far better analysis with zext,
> so we should just ignore forego of the one-use check,
> and simply replace all shift-by-sext with shift-by-zext.

Doing the multi-use case is unfortunately complicated. Your limited use
scan might be a reasonable option in practice, but the need for cutoffs
creates undesirable dynamics.

To reiterate, the proposal is to just always transform that sext into zext,
with no care to the use count. So i'm not sure what cutoffs you mean.

A couple ideas on how to possibly approach the problem:

1) If we can prove that one shift dominates the other uses, then if we
can find UB which triggers based on overflow, we can do the replacement.

I'm not sure what all this means.
Here i only want to get rid of *all* of those sext's for good, so everything
else that may have to deal with looking past extension of shift amount
doesn't need to be modified.

2) Having a general multiple use demanded use routine would be very
powerful. Is it worth exploring the harder topic for generality?

There is some support for multi-use demandedbits in backend.
As for middle-end i'm not sure.
For sure, having that powerful mechanism may be useful, in general.

3) If we had an anyextend IR node, it might be reasonable to eagerly
produce the duplicate nodes, and rely on later CSE. I keep running
across cases where we have an extend where we know the high bits don't
matter, maybe it's time to represent that?

> I implemented this proposed suggestion here:
> https://reviews.llvm.org/D68150
>
> Does anyone see any problems with that trade-off?
>

Roman.

Thanks for taking a look!

In https://reviews.llvm.org/D68103 the InstCombine learned that shift-by-sext
is simply a shift-by-zext.

Just to make sure I’m following, the reasoning here is that the shift
amount must be positive or the shift would produce poison? And thus,
it’s safe to assume that the sext == zext because we’ve (at worst)
removed UB in the original program?
Yes.
zext and sext are equivalent for non-negative inputs.
For negative inputs, sext will produce negative output.
And interpreted as unsigned number, such negative shift amount
is always bigger than largest legal shift amount:

  • i1 can only be shifted by 0
  • i2 can only be shifted by 1 (positive); 2,3 (negative) are not legal
    shift amounts
  • i3 can only be shifted by 1,2 (positive); 3 (positive) and 4-7
    (negative) are not legal shift amounts
  • i4 can only be shifted by 1,2,3 (positive); 4-7 (positive) and 8-15
    (negative) are not legal shift amounts
  • and so on.
    Therefore we must not have negative shift amount (or we have UB), so
    we can just use zext.

If so, two slightly off topic ideas.

  1. This feels like a demanded bits problem. We know that any shift
    value outside of a given range is UB, and thus only need to demand the
    bits necessary to represent the defined range. Might be an interesting
    extension.
    Most specifically, we demand only low
    bitwidth(shift)-ctlz(bitwidth(shift)-1) bits
    of shift amount, yes. I did think of it as a demanded bits problem,
    but didn’t really succeed, let me see again…

While the demanded bits analysis in InstCombine doesn’t handle multi-use scenarios (apart from cases where a specific use may be reduced to an instruction operand), it should be possible to handle this as part of BDCE. This would require a) computing demanded bits for the shift amount in the DemandedBits analysis, which currently doesn’t happen, and b) replacing a sext with a zext in BDCE if the top bit of the source operand is not demanded.

Nikita

The thing is, we *don't* "not demand" those high bits.
We *don't* not care what's in those bits - IR shifts don't mask their
shift amounts.
I.e we can't replace `x >> (32-y)` with `x >> (-y)`,
which would be legal transform should we not demand those bits.
We very much demand them. We just know those bits to be zero.

And i'm not sure how to convey that to SimplifyDemandedBits().
I can't pass the Known down the stack, the function resets it first thing,
so Known can only be passed from callee to the caller.

This is why i'm asking whether anyone is concerned if we proceed with
https://reviews.llvm.org/D68150

Bump. Any further thoughts here?

To recap - i don't really see how this can be a demandedbits problem - we do
demand all those bits, we just know they must be zero.
(i would love to be proven wrong though!)

Roman.

Bump. Any further thoughts here?

To recap - i don't really see how this can be a demandedbits problem - we do
demand all those bits, we just know they must be zero.
(i would love to be proven wrong though!)

Roman.

>
> The thing is, we *don't* "not demand" those high bits.
> We *don't* not care what's in those bits - IR shifts don't mask their
> shift amounts.
> I.e we can't replace `x >> (32-y)` with `x >> (-y)`,
> which would be legal transform should we not demand those bits.
> We very much demand them. We just know those bits to be zero.
>
> And i'm not sure how to convey that to SimplifyDemandedBits().
> I can't pass the Known down the stack, the function resets it first thing,
> so Known can only be passed from callee to the caller.
>
> This is why i'm asking whether anyone is concerned if we proceed with
> https://reviews.llvm.org/D68150
>
> >
> >>
> >> Thanks for taking a look!
> >>
> >> > > In https://reviews.llvm.org/D68103 the InstCombine learned that shift-by-sext
> >> > > is simply a shift-by-zext.
> >> >
> >> > Just to make sure I'm following, the reasoning here is that the shift
> >> > amount must be positive or the shift would produce poison? And thus,
> >> > it's safe to assume that the sext == zext because we've (at worst)
> >> > removed UB in the original program?
> >> Yes.
> >> zext and sext are equivalent for non-negative inputs.
> >> For negative inputs, sext will produce negative output.
> >> And interpreted as unsigned number, such negative shift amount
> >> is *always* bigger than largest legal shift amount:
> >> * i1 can only be shifted by 0
> >> * i2 can only be shifted by 1 (positive); 2,3 (negative) are not legal
> >> shift amounts
> >> * i3 can only be shifted by 1,2 (positive); 3 (positive) and 4-7
> >> (negative) are not legal shift amounts
> >> * i4 can only be shifted by 1,2,3 (positive); 4-7 (positive) and 8-15
> >> (negative) are not legal shift amounts
> >> * and so on.
> >> Therefore we must not have negative shift amount (or we have UB), so
> >> we can just use zext.
> >>
> >> > If so, two slightly off topic ideas.
> >> >
> >> > 1) This feels like a demanded bits problem. We know that any shift
> >> > value outside of a given range is UB, and thus only need to demand the
> >> > bits necessary to represent the defined range. Might be an interesting
> >> > extension.
> >> Most specifically, we demand only low
> >> bitwidth(shift)-ctlz(bitwidth(shift)-1) bits
> >> of shift amount, yes. I did think of it as a demanded bits problem,
> >> but didn't really succeed, let me see again..
> >
> >
> > While the demanded bits analysis in InstCombine doesn't handle multi-use scenarios (apart from cases where a specific use may be reduced to an instruction operand), it should be possible to handle this as part of BDCE. This would require a) computing demanded bits for the shift amount in the DemandedBits analysis, which currently doesn't happen, and b) replacing a sext with a zext in BDCE if the top bit of the source operand is not demanded.
> >
> > Nikita
> >

> >> > 2) Are we possibly missing opportunities by not exploiting knowledge of
> >> > the a known negative shift amount?
> >> Anything specific you are thinking of?

Okay, i can now tell that the issue is *somewhat* larger, and we do
certainly miss some opportunities:
https://godbolt.org/z/6ucL9u

int shl0(int x, signed char y) {
  return x << (32 - y);
}
Produces
  %o0 = sext i8 %6 to i32
  %o1 = sub i32 32, %o0
  %r = shl i32 %5, %o1

While
int shl1(int x, unsigned char y) {
  return x << (32 - y);
}
produces
  %n0 = zext i8 %6 to i32
  %n1 = sub i32 32, %n0
  %r = shl i32 %5, %n1
(as one would expect)

And here we again don't need sign-extension:
https://rise4fun.com/Alive/ymk

In my eyes, that doesn't completely invalidate
https://reviews.llvm.org/D68150 approach,
i still think it is a good idea, but clearly we *also* need
something else, more generic.

My first thought would be CVP and LVI, but i did not look yet..

Roman

>
> Bump. Any further thoughts here?
>
> To recap - i don't really see how this can be a demandedbits problem - we do
> demand all those bits, we just know they must be zero.
> (i would love to be proven wrong though!)
>
> Roman.
>
> >
> > The thing is, we *don't* "not demand" those high bits.
> > We *don't* not care what's in those bits - IR shifts don't mask their
> > shift amounts.
> > I.e we can't replace `x >> (32-y)` with `x >> (-y)`,
> > which would be legal transform should we not demand those bits.
> > We very much demand them. We just know those bits to be zero.
> >
> > And i'm not sure how to convey that to SimplifyDemandedBits().
> > I can't pass the Known down the stack, the function resets it first thing,
> > so Known can only be passed from callee to the caller.
> >
> > This is why i'm asking whether anyone is concerned if we proceed with
> > https://reviews.llvm.org/D68150
> >
> > >
> > >>
> > >> Thanks for taking a look!
> > >>
> > >> > > In https://reviews.llvm.org/D68103 the InstCombine learned that shift-by-sext
> > >> > > is simply a shift-by-zext.
> > >> >
> > >> > Just to make sure I'm following, the reasoning here is that the shift
> > >> > amount must be positive or the shift would produce poison? And thus,
> > >> > it's safe to assume that the sext == zext because we've (at worst)
> > >> > removed UB in the original program?
> > >> Yes.
> > >> zext and sext are equivalent for non-negative inputs.
> > >> For negative inputs, sext will produce negative output.
> > >> And interpreted as unsigned number, such negative shift amount
> > >> is *always* bigger than largest legal shift amount:
> > >> * i1 can only be shifted by 0
> > >> * i2 can only be shifted by 1 (positive); 2,3 (negative) are not legal
> > >> shift amounts
> > >> * i3 can only be shifted by 1,2 (positive); 3 (positive) and 4-7
> > >> (negative) are not legal shift amounts
> > >> * i4 can only be shifted by 1,2,3 (positive); 4-7 (positive) and 8-15
> > >> (negative) are not legal shift amounts
> > >> * and so on.
> > >> Therefore we must not have negative shift amount (or we have UB), so
> > >> we can just use zext.
> > >>
> > >> > If so, two slightly off topic ideas.
> > >> >
> > >> > 1) This feels like a demanded bits problem. We know that any shift
> > >> > value outside of a given range is UB, and thus only need to demand the
> > >> > bits necessary to represent the defined range. Might be an interesting
> > >> > extension.
> > >> Most specifically, we demand only low
> > >> bitwidth(shift)-ctlz(bitwidth(shift)-1) bits
> > >> of shift amount, yes. I did think of it as a demanded bits problem,
> > >> but didn't really succeed, let me see again..
> > >
> > >
> > > While the demanded bits analysis in InstCombine doesn't handle multi-use scenarios (apart from cases where a specific use may be reduced to an instruction operand), it should be possible to handle this as part of BDCE. This would require a) computing demanded bits for the shift amount in the DemandedBits analysis, which currently doesn't happen, and b) replacing a sext with a zext in BDCE if the top bit of the source operand is not demanded.
> > >
> > > Nikita
> > >

> > >> > 2) Are we possibly missing opportunities by not exploiting knowledge of
> > >> > the a known negative shift amount?
> > >> Anything specific you are thinking of?
Okay, i can now tell that the issue is *somewhat* larger, and we do
certainly miss some opportunities:
https://godbolt.org/z/6ucL9u

int shl0(int x, signed char y) {
  return x << (32 - y);
}
Produces
  %o0 = sext i8 %6 to i32
  %o1 = sub i32 32, %o0
  %r = shl i32 %5, %o1

While
int shl1(int x, unsigned char y) {
  return x << (32 - y);
}
produces
  %n0 = zext i8 %6 to i32
  %n1 = sub i32 32, %n0
  %r = shl i32 %5, %n1
(as one would expect)

And here we again don't need sign-extension:
https://rise4fun.com/Alive/ymk

In my eyes, that doesn't completely invalidate
https://reviews.llvm.org/D68150 approach,
i still think it is a good idea, but clearly we *also* need
something else, more generic.

My first thought would be CVP and LVI, but i did not look yet..

... and i'm not sure they can help here.
They go from defs to uses, but here when we see sext we don't know
anything about it's uses yet. We'd need to go in other direction:
"this value must have this constant range. now let's back-propagate
that info, what does that tell us about it's defs?"

So i'm not sure how to approach this.
I'm likely missing something basic, feel free to please enlighten me :slight_smile:

Roman

Bump. Any further thoughts here?

To recap - i don't really see how this can be a demandedbits problem - we do
demand all those bits, we just know they must be zero.
(i would love to be proven wrong though!)

Roman.

The thing is, we *don't* "not demand" those high bits.
We *don't* not care what's in those bits - IR shifts don't mask their
shift amounts.
I.e we can't replace `x >> (32-y)` with `x >> (-y)`,
which would be legal transform should we not demand those bits.
We very much demand them. We just know those bits to be zero.

And i'm not sure how to convey that to SimplifyDemandedBits().
I can't pass the Known down the stack, the function resets it first thing,
so Known can only be passed from callee to the caller.

This is why i'm asking whether anyone is concerned if we proceed with
https://reviews.llvm.org/D68150

Thanks for taking a look!

In https://reviews.llvm.org/D68103 the InstCombine learned that shift-by-sext
is simply a shift-by-zext.

Just to make sure I'm following, the reasoning here is that the shift
amount must be positive or the shift would produce poison? And thus,
it's safe to assume that the sext == zext because we've (at worst)
removed UB in the original program?

Yes.
zext and sext are equivalent for non-negative inputs.
For negative inputs, sext will produce negative output.
And interpreted as unsigned number, such negative shift amount
is *always* bigger than largest legal shift amount:
* i1 can only be shifted by 0
* i2 can only be shifted by 1 (positive); 2,3 (negative) are not legal
shift amounts
* i3 can only be shifted by 1,2 (positive); 3 (positive) and 4-7
(negative) are not legal shift amounts
* i4 can only be shifted by 1,2,3 (positive); 4-7 (positive) and 8-15
(negative) are not legal shift amounts
* and so on.
Therefore we must not have negative shift amount (or we have UB), so
we can just use zext.

If so, two slightly off topic ideas.

1) This feels like a demanded bits problem. We know that any shift
value outside of a given range is UB, and thus only need to demand the
bits necessary to represent the defined range. Might be an interesting
extension.

Most specifically, we demand only low
bitwidth(shift)-ctlz(bitwidth(shift)-1) bits
of shift amount, yes. I did think of it as a demanded bits problem,
but didn't really succeed, let me see again..

While the demanded bits analysis in InstCombine doesn't handle multi-use scenarios (apart from cases where a specific use may be reduced to an instruction operand), it should be possible to handle this as part of BDCE. This would require a) computing demanded bits for the shift amount in the DemandedBits analysis, which currently doesn't happen, and b) replacing a sext with a zext in BDCE if the top bit of the source operand is not demanded.

Nikita

2) Are we possibly missing opportunities by not exploiting knowledge of
the a known negative shift amount?

Anything specific you are thinking of?

Okay, i can now tell that the issue is *somewhat* larger, and we do
certainly miss some opportunities:
https://godbolt.org/z/6ucL9u

int shl0(int x, signed char y) {
   return x << (32 - y);
}
Produces
   %o0 = sext i8 %6 to i32
   %o1 = sub i32 32, %o0
   %r = shl i32 %5, %o1

While
int shl1(int x, unsigned char y) {
   return x << (32 - y);
}
produces
   %n0 = zext i8 %6 to i32
   %n1 = sub i32 32, %n0
   %r = shl i32 %5, %n1
(as one would expect)

And here we again don't need sign-extension:
https://rise4fun.com/Alive/ymk

In my eyes, that doesn't completely invalidate
https://reviews.llvm.org/D68150 approach,
i still think it is a good idea, but clearly we *also* need
something else, more generic.

My first thought would be CVP and LVI, but i did not look yet..

... and i'm not sure they can help here.
They go from defs to uses, but here when we see sext we don't know
anything about it's uses yet. We'd need to go in other direction:
"this value must have this constant range. now let's back-propagate
that info, what does that tell us about it's defs?"

So i'm not sure how to approach this.
I'm likely missing something basic, feel free to please enlighten me :slight_smile:

I think this a case where LLVM is missing something, not you.

In general, we don't have any transform - with the exception of demanded bits, and some of the one use rules in instcombine - which produce analysis results which are true for all *current* uses. Our analysis results tend to be true for all *possible* uses. (e.g. KnownBits and LVI) The tend to work forward from definitions, not backwards from uses. And in particular, they don't iterate between top-down and bottom-up.

The challenge is that implementing something which specializes results for existing uses is quite challenging to do efficiently. I vaguely remember some academic research on this topic, but haven't dug into anything. I'd suggest starting there.