killing vicmp and vfcmp

Now that icmp and fcmp have supported returning vectors of i1 for a while, I think it's time to remove the vicmp and vfcmp instructions from LLVM. The good news is that we've never shipped a release that included them so we won't be providing auto-upgrade support.

There is some existing backend support for vicmp and vfcmp that looks different from what icmp and fcmp do. If this actually matters to you, please port the bits you need over to icmp/fcmp, or let me know if you think you still need vicmp/vfcmp and can't switch within (say) a week.

If I don't hear from anyone within a week then I'll go ahead and rip the existing support out.

Thanks!
Nick

Hi all,

I'm working on a project which tries to prove an access to an array is
safe. For example,

int foo(int s) {
  int * p = malloc(s * sizeof int);
  ...
  int q = p[s - 2];
}

then the access of p[s - 2] always stays in bound.

I implemented a prototype using the Scalar Evolution pass. Here are the
pseudo-code of the implementation:

  const SCEV * offset = SE->getMinusSCEV(SE->getSCEV(GEP), GEPBase);
  const SCEV * bounds = SE->getSCEV(objSize);

  if (SE->getSMaxExpr(offset, bounds) == bounds) {
    ++safeGEPs;
  }

But it turns out that SCEVSMaxExpr does not handle the case of SMax(N,
N-2).

My question is, is there a plan to support something like this, or is it
possible to do some refactoring to enhance the capability of Scalar
Evolution? I notice that Scalar Evolution, Instruction Combining and
Memory Dependence Analysis require sort of evaluating symbolic
expression like this.

For this case SMax(A, B) is equivalent to SMax(A-B,0) + B, instruction
combining handles sophisticated expressions like (A+B)-B pretty well.
It would be great if Scalar Evolution can support this.

Any comments are appreciated.

Cheers,

Haohui

Mai, Haohui wrote:

Hi all,

I'm working on a project which tries to prove an access to an array is
safe. For example,

int foo(int s) {
  int * p = malloc(s * sizeof int);
  ...
  int q = p[s - 2];
}

then the access of p[s - 2] always stays in bound.

I implemented a prototype using the Scalar Evolution pass. Here are the
pseudo-code of the implementation:

  const SCEV * offset = SE->getMinusSCEV(SE->getSCEV(GEP), GEPBase);
  const SCEV * bounds = SE->getSCEV(objSize);
   if (SE->getSMaxExpr(offset, bounds) == bounds) {
    ++safeGEPs;
  }

But it turns out that SCEVSMaxExpr does not handle the case of SMax(N,
N-2).

Consider 8-bit integers and N = -127. N-1 equals INT_MIN and N-2 then is equal to INT_MAX, or 127. In that case, the SMax would equal N-2, not N.

In other cases (like N = 2) the SMax would equal N, not N-2.

Because of this, we cannot reduce this SMax any further. Your suggestion that "SMax(A, B) == SMax(A-B, 0) + B" is incorrect.

Nick

Mai, Haohui wrote:
> Hi all,
>
> I'm working on a project which tries to prove an access to an array is
> safe. For example,
>
> int foo(int s) {
> int * p = malloc(s * sizeof int);
> ...
> int q = p[s - 2];
> }
>
> then the access of p[s - 2] always stays in bound.
>
> I implemented a prototype using the Scalar Evolution pass. Here are the
> pseudo-code of the implementation:
>
> const SCEV * offset = SE->getMinusSCEV(SE->getSCEV(GEP), GEPBase);
> const SCEV * bounds = SE->getSCEV(objSize);
>
> if (SE->getSMaxExpr(offset, bounds) == bounds) {
> ++safeGEPs;
> }
>
> But it turns out that SCEVSMaxExpr does not handle the case of SMax(N,
> N-2).

Consider 8-bit integers and N = -127. N-1 equals INT_MIN and N-2 then is
equal to INT_MAX, or 127. In that case, the SMax would equal N-2, not N.

In other cases (like N = 2) the SMax would equal N, not N-2.

Because of this, we cannot reduce this SMax any further. Your suggestion
that "SMax(A, B) == SMax(A-B, 0) + B" is incorrect.
Nick

It seems that there are codes in Scalar Evolution to handle this case.
Many operations in scalar evolution only happen when SCEV can be sign
extended.

Nick,

It might be a little bit difficult to handle SMax correctly. But is it
possible to reduce A+(-A) to 0 in SAddExpr?

Haohui

Mai, Haohui wrote:

Nick,

It might be a little bit difficult to handle SMax correctly. But is it
possible to reduce A+(-A) to 0 in SAddExpr?

Yes, it should already do that. Here's a quick test I wrote up:

   $ cat x.ll
   define i8 @test(i8 %x) {
     %neg = sub i8 0, %x
     %sum = add i8 %x, %neg
     ret i8 %sum
   }
   $ llvm-as < x.ll | opt -analyze -scalar-evolution -disable-output
   Printing analysis 'Scalar Evolution Analysis' for function 'test':
   Classifying expressions for: test
           %neg = sub i8 0, %x ; <i8> [#uses=1]
     --> (-1 * %x)
           %sum = add i8 %x, %neg ; <i8> [#uses=1]
     --> 0
   Determining loop execution counts for: test

You'll note that %sum is deduced to be 0 and not (%x + (-1 * %x)) which indicates that this optimization is working correctly. If you have a more complicated case where it isn't, feel free to file a bug and cc me.

Nick

Hi Nick,

Now that icmp and fcmp have supported returning vectors of i1 for a while,

the code generators don't know how to codegen vectors of i1, so does
this actually work?

Ciao,

Duncan.

Nick,

It works pretty cool.

Thank you.

Haohui

No, but there are no clients of them yet.

-Chris

What's the proposed replacement that will be pattern-matched to the vector comparison instructions in AltiVec, SSE, and NEON?

Nate

Nate Begeman wrote:

Hi Nick,

Now that icmp and fcmp have supported returning vectors of i1 for a
while,

the code generators don't know how to codegen vectors of i1, so does
this actually work?

No, but there are no clients of them yet.

What's the proposed replacement that will be pattern-matched to the vector comparison instructions in AltiVec, SSE, and NEON?

That's a good question. The LLVM IR representation is to use icmp and fcmp on vectors returning a vector of i1. The backend can store that as a vector of whatever MVT it likes.

I finished a patch to remove vicmp and vfcmp (I'll mail it to llvm-commits instead of spamming everyone here) and I had to XFAIL 9 tests, two in CodeGen/X86 and seven in CodeGen/ARM. The two in X86 were from PRs filed by RapidMind who are using LLVM. The icmp/fcmp version should probably be supported first.

That said, having seen the existing VICmp and VFCmp implementation, I'm more motivated than ever to delete it. It's very buggy and is causing bugs in other places, like this one line .ll:

   global <4 x i1> icmp slt ( <4 x i32> <i32 1, i32 1, i32 1, i32 1>, <4 x i32> <i32 1, i32 2, i32 1, i32 2> ) ;

which llvm-as incorrectly rejects. And when you figure out why you should look at how the bitcode reader/writer decides when to use FUNC_CODE_INST_CMP vs. CMP2 and figure out how that ever worked at all...

Nick

[fi]cmp + vector sext should work.

-Chris