Hi all,
The AVX-512 ISA introduces the new vector instructions VCOMPRESS and VEXPAND, which allow vectorization of loops with the following two specific types of cross-iteration dependencies:
Compress:
  for (int i = 0; i < N; ++i)
    if (t[i])
      *A++ = expr;

Expand:
  for (int i = 0; i < N; ++i)
    if (t[i])
      X[i] = *A++;
    else
      X[i] = PassThruV[i];
This poster ( http://llvm.org/devmtg/2013-11/slides/Demikhovsky-Poster.pdf ) depicts the “compress” and “expand” patterns.
The RFC proposes to support this functionality by introducing two intrinsics to LLVM IR:
llvm.masked.expandload.*
llvm.masked.compressstore.*
The syntax of these two intrinsics is similar to that of llvm.masked.load.* and llvm.masked.store.*, respectively, but the semantics differ so as to match the patterns above.
%res = call <16 x float> @llvm.masked.expandload.v16f32.p0f32 (float* %ptr, <16 x i1> %mask, <16 x float> %passthru)
call void @llvm.masked.compressstore.v16f32.p0f32 (<16 x float> %value, float* %ptr, <16 x i1> %mask)
The arguments %mask, %value, and %passthru all have the same vector length.
The underlying type of %ptr corresponds to the scalar element type of the vector value.
(This is only a brief summary; the full syntax description will be provided in the subsequent full documentation.)
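To make the intended semantics concrete ahead of the full documentation, here is a scalar C sketch of what the two intrinsics compute for a 16-lane float vector. The function and parameter names below are illustrative only and not part of the proposal:

#include <stdbool.h>

/* expandload: consecutive elements are read from %ptr, but only into lanes
   whose mask bit is set; disabled lanes receive the passthru value. */
void ref_expandload(const float *ptr, const bool mask[16],
                    const float passthru[16], float res[16]) {
  int j = 0;                 /* memory index advances only on enabled lanes */
  for (int i = 0; i < 16; ++i)
    res[i] = mask[i] ? ptr[j++] : passthru[i];
}

/* compressstore: enabled lanes of %value are written to consecutive memory
   locations starting at %ptr; disabled lanes are skipped entirely. */
void ref_compressstore(const float value[16], float *ptr, const bool mask[16]) {
  int j = 0;
  for (int i = 0; i < 16; ++i)
    if (mask[i])
      ptr[j++] = value[i];
}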
The intrinsics are planned to be target independent, similar to masked.load/store/gather/scatter. They will be lowered efficiently on AVX-512 and scalarized on other targets, again akin to the other masked.* intrinsics.
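For illustration, the “compress” loop above maps naturally onto VCOMPRESSPS on AVX-512. The following hand-written C sketch uses the corresponding Intel intrinsics and assumes, for the sake of the example only, that N is a multiple of 16 and that expr is simply a value loaded from an array in[]; the code the compiler actually emits may differ:

#include <immintrin.h>

/* Compress pattern: *A++ = in[i] whenever t[i] is non-zero.
   Hand-vectorized sketch; assumes N % 16 == 0. */
float *compress_avx512(const float *in, const int *t, float *A, int N) {
  for (int i = 0; i < N; i += 16) {
    __m512i vt  = _mm512_loadu_si512(&t[i]);
    __mmask16 m = _mm512_cmpneq_epi32_mask(vt, _mm512_setzero_si512());
    __m512 vin  = _mm512_loadu_ps(&in[i]);
    /* VCOMPRESSPS: store only the enabled lanes, packed contiguously at A. */
    _mm512_mask_compressstoreu_ps(A, m, vin);
    /* Cross-iteration dependence: A advances by the number of enabled lanes. */
    A += _mm_popcnt_u32((unsigned)m);
  }
  return A;
}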
The loop vectorizer will query TTI about whether efficient support for these intrinsics exists, and if it does, the vectorizer will be able to handle loops with such cross-iteration dependences.
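Likewise, the “expand” loop can be vectorized using VEXPANDPS. The sketch below (same illustrative assumptions as above) shows how the pointer A, the source of the cross-iteration dependence, is advanced by the number of enabled lanes in each vector iteration:

#include <immintrin.h>

/* Expand pattern: X[i] = *A++ when t[i] is non-zero, else X[i] = PassThruV[i].
   Hand-vectorized sketch; assumes N % 16 == 0. */
const float *expand_avx512(float *X, const float *A, const int *t,
                           const float *PassThruV, int N) {
  for (int i = 0; i < N; i += 16) {
    __m512i vt  = _mm512_loadu_si512(&t[i]);
    __mmask16 m = _mm512_cmpneq_epi32_mask(vt, _mm512_setzero_si512());
    __m512 pass = _mm512_loadu_ps(&PassThruV[i]);
    /* VEXPANDPS: read consecutive elements from A into the enabled lanes;
       disabled lanes keep the passthru value. */
    __m512 v = _mm512_mask_expandloadu_ps(pass, m, A);
    _mm512_storeu_ps(&X[i], v);
    /* Cross-iteration dependence: A advances by the number of enabled lanes. */
    A += _mm_popcnt_u32((unsigned)m);
  }
  return A;
}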
The first step will include the full documentation and the implementation of the CodeGen part.
Additional information about expand load ( https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=expandload&techs=AVX_512 ) and compress store ( https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=compressstore&techs=AVX_512 ) can also be found in the Intel Intrinsics Guide.
- Elena