[RFC] Canonicalization of unsigned subtraction with saturation

Hi,

This message is a result of a discussion of backend optimization for sub(max) pattern(https://reviews.llvm.org/D25987), which can be either converted to unsigned min-max or unsigned saturation instruction(if the target supports it).
Currently these versions of the code produce different IR(and we need to manage both types in backend):

(1.16)
void foo(unsigned short *p, unsigned short max, int n) {
  int i;
unsigned short m;
for (i = 0; i < n; i++) {
    m = *--p;
    *p =(m >= max ? m-max : 0);
  }
}

(2.16)
void goo(unsigned short *p, unsigned short max, int n) {
  int i;
  unsigned short m;
  for (i = 0; i < n; i++) {
    m = *--p;
    unsigned short umax = m > max ? m : max;
    *p =umax - max;
  }
}

(1.16)
%cmp = icmp ugt i16 %x, %y
%sub2 = sub i16 %y, %x
%res = select i1 %cmp, i16 0, i16 %sub2

or

(2.16)
%cmp = icmp ugt i16 %x, %y
%sel = select i1 %cmp, i16 %x, i16 %y
%sub = sub i16 %sel, %x

Which of these versions is canonical? I think first version is better, because it can be converted to unsigned saturation instruction(i.e. PSUBUS), using existing backend code.

Thanks for posting this question, Julia.

I had a similar question about a signed min/max variant here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html

The 2nd version in each case contains a canonical max/min representation in IR, and this could enable more IR analysis.
A secondary advantage is that the backend recognizes the max/min in the second IR form when creating DAG nodes,
and this directly affects isel for many targets.

A possibly important difference between the earlier example and the current unsigned case:
is a select with a zero constant operand easier to reason about in IR than the canonical min/max?

This seems important. And pattern-matching max(x,y)-y to a saturating subtract seems easy in the backend. It might be in some cases… maybe? I mean, it might be easier to analyze in ComputeMaskedBits or something, but we don’t really do much to optimize selects involving zero. -Eli

Thanks for posting this question, Julia.

I had a similar question about a signed min/max variant here:
http://lists.llvm.org/pipermail/llvm-dev/2016-November/106868.html

The 2nd version in each case contains a canonical max/min representation
in IR, and this could enable more IR analysis.
A secondary advantage is that the backend recognizes the max/min in the
second IR form when creating DAG nodes,
and this directly affects isel for many targets.

This seems important. And pattern-matching max(x,y)-y to a saturating
subtract seems easy in the backend.

A possibly important difference between the earlier example and the
current unsigned case:
is a select with a zero constant operand easier to reason about in IR than
the canonical min/max?

It might be in some cases... maybe? I mean, it might be easier to analyze
in ComputeMaskedBits or something, but we don't really do much to optimize
selects involving zero.

Because of my CPU upbringing, I always see this:
select A, B, 0

as:
and (sext A), B

Any chance of control-flow is bad!
...but now I know that's wrong for IR. :slight_smile:

So forming the min/max sounds like the right answer to me.

Note that we don't actually canonicalize the signed min/max cases from the
earlier thread yet. We detect those in value tracking, and that was good
enough to produce the ideal backend results, but I haven't gotten back to
doing the transform in IR.

So, can we declare the second version as a cannonical?
Second version also has another advantage. If operands of select are different types, it still works as max. In the first case it is separated by trunk and not converts to max instruction in the end. Here is an example for 32 and 16 bit:

(1)
void foo(unsigned short *p, int max, int n) {
  int i;
unsigned m;
for (i = 0; i < n; i++) {
    m = *--p;
    *p = (unsigned short)(m >= max ? m-max : 0);
  }
}

(2)
void goo(unsigned short *p, int max, int n) {
  int i;
  unsigned m;
  for (i = 0; i < n; i++) {
    m = *--p;
    unsigned umax = m > max ? m : max;
    *p = (unsigned short)(umax - max);
  }
}

(1)
%cmp = icmp ugt i32 %x, %y
%sub2 = sub i32 %y, %x
%tr = trunc i32 %sub2 to i16
%res = select i1 %cmp, i16 0, i16 %tr

or

(2)
%cmp = icmp ugt i32 %x, %y
%sel = select i1 %cmp, i32 %x, i32 %y
%sub = sub i32 %sel, %x
%res = trunc i32 %sub to i16

-Julia

There’s no formal process to decide a canonicalization (and it could always be changed), but given that we have:
(a) not heard any objections to the min/max forms

(b) see advantages to the min/max forms

I think it’s worth adding codegen tests for the preferred IR patterns. At the least, we want to make sure that there are no regressions for those forms vs. the forms that do not contain min/max. For x86, we already know that we want to see ‘psubus’ on some of these, so getting that implemented is also a safe step IMO.