if-conversion

Hi.
     I'm trying to vectorize the following piece of code with the Loop Vectorizer (from the LLVM distribution of Nov 2015), but no vectorization takes place:
         int *Test(int *res, int *c, int *d, int *p) {
             int i;

             for (i = 0; i < 16; i++) {
                 //res[i] = (p[i] == 0) ? c[i] : d[i];
                 res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
             }

             return NULL;
         }

     It seems the problem is if-conversion, which does not work on this simple example, although I guess it should.
     Is it a problem of low profitability given the loads?

     Could you please tell me if there is any way to make if-conversion work on programs like this?

     Below is an older message from 2013 about the status of if-conversion in LLVM at that date (from http://lists.llvm.org/pipermail/llvm-dev/2013-November/067427.html).
> Hi Rob,
>
> I chose to answer on the list since from time to time people come back
> to this.
>
> That said, I did implement the generic variant of if-conversion that is
> called "control-flow to data-flow conversion" as a basis for SIMD
> vectorization. Essentially, the conversion removes all control flow
> except for loop back edges and replaces it by masks and blend (select)
> operations.
>
> Details on the algorithm can be found in our paper on "Whole-Function
> Vectorization" (CGO 2011). The old, LLVM-based implementation of the
> algorithm is still online at github I believe. A completely rewritten
> one will be released along with submission of my PhD thesis at the end
> of the year.
>
> That said, if you are only looking for if-conversion of code with
> limited complexity (e.g. no side effects, no loops), it is really simple:
> - Compute masks for every block (entry mask: disjunction of incoming
> masks, exit masks: conjunctions of entry mask and (negated) condition).
> - Replace each phi by a select that uses the entry mask of the
> corresponding block.
> - Order blocks topologically by data dependencies, remove outgoing
> edges, create unconditional branches from each block to the next in the
> list.
>
> Cheers,
> Ralf
>
> > Hi all,
> >
> > Sorry to dig up an old thread but I wondered what the status of
> > if-conversion in LLVM is. Has any work been done towards handling this as a
> > transform pass on the IR?
> >
> > I'm looking to implement an if-conversion pass and wanted to ensure that I'm
> > not duplicating work. Is this something that others would also find useful?
> >
> > Rob

  Thank you,
     Alex

Hi Rob,

The problem here is that the d[i] array is only conditionally accessed, and so we can't if-convert the loop body. The compiler does not know that d[i] is actually dereferenceable for all i from 0 to 15 (the array might be shorter and p[i] is 0 for i past the end of d's extent).

You can inform LLVM that it is safe to vectorize anyway by adding a pragma, like this:

  #pragma clang loop vectorize(assume_safety)

or you can use the OpenMP 4 pragma:

  #pragma omp simd
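
For reference, here is a minimal sketch of where the Clang loop hint goes in the function from the original question. The name TestWithPragma is hypothetical, and whether the loop is actually vectorized depends on the LLVM version and target, as the rest of this thread discusses. The pragma only asserts safety; the caller must guarantee that d really has 16 readable elements.

```c
#include <stddef.h>

/* Sketch only: the loop from the question with the hint applied.
   TestWithPragma is a hypothetical name. The pragma must sit
   immediately before the loop it annotates; compilers that do not
   know it simply ignore it (possibly with a warning). */
int *TestWithPragma(int *res, int *c, int *d, int *p) {
    int i;
    (void)c; /* unused in this variant of the loop */

#pragma clang loop vectorize(assume_safety)
    for (i = 0; i < 16; i++) {
        res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
    }

    return NULL;
}
```

Compiling with -Rpass=loop-vectorize and -Rpass-missed=loop-vectorize makes Clang report whether the loop was vectorized.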

-Hal

Hi,

> Hi Rob,
>
> The problem here is that the d[i] array is only conditionally accessed, and so we can’t if-convert the loop body. The compiler does not know that d[i] is actually dereferenceable for all i from 0 to 15 (the array might be shorter and p[i] is 0 for i past the end of d’s extent).
>
> You can inform LLVM that it is safe to vectorize anyway by adding a pragma, like this:
>
>   #pragma clang loop vectorize(assume_safety)

I don’t think it would be enough (due to current limitations of the vectorizer). The following expression will be vectorizable:

  res[i] = (p[i] == 0) ? (res[i] + c[i]) : (res[i] + d[i]); // Note "+ c[i]"

This way, the vectorizer deals with similar expressions in both the true and false branches, and just needs a vector select to vectorize them. In the original code, it needs to figure out that it needs some kind of “+ 0” in the true branch, and it currently fails to do so.

Another way to help the vectorizer in this case is to manually hoist the loads, like this:

  int tmp_d = d[i];
  res[i] = (p[i] == 0) ? res[i] : (res[i] + tmp_d);
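
Written out as a complete function, the hoisted variant might look like the sketch below. TestHoisted is a hypothetical name. Note the assumption this rewrite bakes in: d[i] is now read on every iteration, so it is only valid when d really has at least 16 readable elements.

```c
/* Sketch only: the load of d[i] is hoisted out of the conditional, so
   it executes unconditionally and the remaining conditional is a pure
   select between two already-computed values.
   Assumption: d has at least 16 readable elements. */
void TestHoisted(int *res, const int *d, const int *p) {
    for (int i = 0; i < 16; i++) {
        int tmp_d = d[i]; /* unconditional load */
        res[i] = (p[i] == 0) ? res[i] : (res[i] + tmp_d);
    }
}
```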

Hope this helps,
Michael

From: "Michael Zolotukhin" <mzolotukhin@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "RCU" <alex.e.susu@gmail.com>, "llvm-dev@lists.llvm.org >> llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Saturday, April 23, 2016 2:25:23 PM
Subject: Re: [llvm-dev] if-conversion

> Hi,
>
> > Hi Rob,
> >
> > The problem here is that the d[i] array is only conditionally
> > accessed, and so we can't if-convert the loop body. The compiler
> > does not know that d[i] is actually dereferenceable for all i from
> > 0 to 15 (the array might be shorter and p[i] is 0 for i past the
> > end of d's extent).
> >
> > You can inform LLVM that it is safe to vectorize anyway by adding a
> > pragma, like this:
> >
> >   #pragma clang loop vectorize(assume_safety)
>
> I don’t think it would be enough (due to current limitations of
> vectorizer). The following expression will be vectorizable:
>
>   res[i] = (p[i] == 0) ? (res[i] + c[i]) : (res[i] + d[i]); // Note "+ c[i]"
>
> This way, vectorizer deals with similar expressions in both true and
> false branches, and just needs a vector select to vectorize them. In
> the original code it needs to figure out, that it needs some kind of
> “+ 0” in the true branch, and it currently fails to do so.

Do we have a bug report open on this? If not, we should. You appear to be right; however, this seems to be not by design but by omission. It's not an explicit check in LoopVectorizationLegality::canVectorizeWithIfConvert; rather, we end up with IR like this:

loop:
  ...
  %res = load ...
  %p = load ...
  %c = icmp eq i32 %p, 0
  br i1 %c, label %do_store, label %load_and_add

load_and_add:
  %d = load ...
  %resd = add i32 %res, %d
  br label %do_store

do_store:
  %v = phi i32 [ %resd, %load_and_add ], [ %res, %loop ]
  store %v, ...

And the relevant check in LoopVectorizationLegality::blockCanBePredicated, does this:

  if (it->mayReadFromMemory()) {
    LoadInst *LI = dyn_cast<LoadInst>(it);
    if (!LI)
      return false;
    if (!SafePtrs.count(LI->getPointerOperand())) {
      if (isLegalMaskedLoad(LI->getType(), LI->getPointerOperand()) ||
          isLegalMaskedGather(LI->getType())) {
        MaskedOp.insert(LI);
        continue;
      }
      return false;
    }
  }

and there's no exception here for (Hints.getForce() == LoopVectorizeHints::FK_Enabled).

Perhaps we should also skip the check just above this (for possibly-trapping constant operands), and the corresponding check in canIfConvertPHINodes, if we're explicitly instructed to vectorize?

Also, one exception: if you target an instruction set with predicated (masked) vector loads, then we will vectorize this loop (with or without the pragma) by generating masked loads.
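
To illustrate why masked loads sidestep the dereferenceability problem, here is a scalar emulation in C of what a masked vector load does. This is a sketch only, not the code LLVM emits: VF, masked_load, and TestMasked are hypothetical names, and the pass-through value of 0 is chosen for illustration so that inactive lanes compute res[i] + 0.

```c
enum { VF = 4 }; /* hypothetical vector width, in lanes */

/* Scalar emulation of a masked vector load: lanes whose mask entry is 0
   are never read from memory and take the pass-through value instead. */
static void masked_load(int *dst, const int *src, const int *mask,
                        const int *passthru) {
    for (int lane = 0; lane < VF; lane++)
        dst[lane] = mask[lane] ? src[lane] : passthru[lane];
}

/* The loop from the question, processed VF elements at a time.
   d[i] is touched only where p[i] != 0, exactly as in the scalar loop. */
void TestMasked(int *res, const int *d, const int *p) {
    const int zero[VF] = {0};
    for (int i = 0; i < 16; i += VF) {
        int mask[VF], dv[VF];
        for (int lane = 0; lane < VF; lane++)
            mask[lane] = (p[i + lane] != 0);
        masked_load(dv, d + i, mask, zero);
        for (int lane = 0; lane < VF; lane++)
            res[i + lane] += dv[lane]; /* inactive lanes add 0 */
    }
}
```

Because inactive lanes never touch memory, this form stays correct even when d is shorter than 16 elements, provided p[i] is 0 wherever d[i] does not exist.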

-Hal

From: "Hal Finkel via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Michael Zolotukhin" <mzolotukhin@apple.com>
Cc: "llvm-dev@lists.llvm.org >> llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Sunday, April 24, 2016 8:09:24 AM
Subject: Re: [llvm-dev] if-conversion

With r267514 and r267515, we should now vectorize this loop with #pragma clang loop vectorize(assume_safety).

-Hal

Hello.
     Thank you very much for your prompt answer.
     It was very helpful - for example, I was able to manually hoist the loads with my Nov2015 LLVM version (int tmp_d = d[i], etc).

     I am curious: you mention that with r267514 and r267515, the loop should now be vectorized with #pragma clang loop vectorize(assume_safety).
     When I run:
      svn log http://llvm.org/svn/llvm-project/llvm/trunk LoopVectorize.cpp
     I get:
      svn: E160013: File not found: revision 267891, path '/llvm/trunk/LoopVectorize.cpp'
     Also, https://llvm.org/svn/llvm-project/llvm/trunk/ says: "llvm-project - Revision 267891: /llvm/trunk".
     So the current revision is 267891, which I guess includes r267514 and r267515.
     Since I don't know Subversion well, I've downloaded LoopVectorize.cpp from https://llvm.org/svn/llvm-project/llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp and am planning to make a new build.

   Best regards,
     Alex