Error when using the GATHER intrinsic

Hi all,

I am using the gather intrinsic to load a value from the same address twice at the same time. Basically, I used my own pass to change the following bitcode:

%a = getelementptr inbounds [100 x double], [100 x double]* %A, i32, 0, i64 0
%1 = load double, double* a, align

to:

%a = getelementptr inbounds [100 x double], [100 x double]* %A, i32, 0, i64 0
%splat.a = insertelement <2 x double*> undef, double* %a, i32 0
%brcst.a = shufflevector <2 x double*> % splat.a, <2 x double*> undef, <2 x i32> zeroinitializer
%gep.addr = getelementptr <2 x double*> % brcst.a, <2 x i64> zeroinitializer

%1_gather = call <2 x double> @llvm.masked.gather.v8f64(<2 x double*> %gep.addr, i32 8, <2 x i1> <i1 true, i1, true>, <2 x double> undef)

I could load my pass successfully with opt, but I got the following errors when I either run the new bitcode using lli or generate the assembly using llc:

PromoteIntegerOperand Op #2: 0x41bf3a8: v2f64,ch = masked_gather 0x415ec40, 0x41bf030, 0x41bf280, 0x41bbb30, 0x41becb8<LD16[%a]> [ORD=8] [ID=0]

Do not know how to promote this operator's operand!

Any idea about this error? Or could anyone give me an example of how to use the gather intrinsic if there is something wrong with the way I am using it?

Best,
Zhi

Hi Zhi,

> Any idea about this error? Or could anyone give me an example of how to use the
> gather intrinsic if there is something wrong with the way I am using it?

Modulo obvious typos, the snippets look like they ought to work (on
trunk at least). Do you have an actual .ll or .bc and llc invocation
that fails?

Only typo that caught my eye is ‘llvm.masked.gather.v8f64’ which should have v2 instead of v8 to match the <2 x double>

Could that be the problem?

Cheers,
Pete

Hi Tim,

Thanks for your response. It works now if I use llc -mcpu=skylake. But it still fails if I use -mcpu=core-avx2. It seems that avx2 supports gather/scatter, but I am not sure why it doesn’t work.

Best,
Zhi

Hi Pete,

Sorry, that is a typo…

Best,
Zhi

> Only typo that caught my eye is ‘llvm.masked.gather.v8f64’ which should have v2 instead of v8 to match the <2 x double>

There's an extra comma after an "i1" too. But they both just result in
LLVM rejecting the code immediately.
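
For reference, here is a sketch of the snippet from the original post with both typos fixed (v2f64 in the intrinsic name, no stray comma in the mask). It uses the typed-pointer syntax of the LLVM 3.7 era. The wrapper function @load_twice is hypothetical and added only to make the fragment self-contained. The extra vector getelementptr is left out because the broadcast pointer vector can be passed to the intrinsic directly. Treat it as an illustration, not a tested reproduction. Newer LLVM releases mangle the intrinsic name differently.

```llvm
; Overloaded on the result type: <2 x double> gives the .v2f64 suffix.
declare <2 x double> @llvm.masked.gather.v2f64(<2 x double*>, i32, <2 x i1>, <2 x double>)

; Hypothetical wrapper function, not part of the original post.
define <2 x double> @load_twice([100 x double]* %A) {
entry:
  %a = getelementptr inbounds [100 x double], [100 x double]* %A, i32 0, i64 0
  ; Broadcast the scalar pointer %a into both lanes of a pointer vector.
  %splat.a = insertelement <2 x double*> undef, double* %a, i32 0
  %brcst.a = shufflevector <2 x double*> %splat.a, <2 x double*> undef, <2 x i32> zeroinitializer
  ; Operands: pointer vector, alignment, mask (both lanes enabled), pass-through value.
  %1_gather = call <2 x double> @llvm.masked.gather.v2f64(<2 x double*> %brcst.a, i32 8, <2 x i1> <i1 true, i1 true>, <2 x double> undef)
  ret <2 x double> %1_gather
}
```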

> But it still fails if I use -mcpu=core-avx2.

My simple tests get correctly expanded to scalar loads. I've still not
seen a selection failure.
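
To illustrate what "expanded to scalar loads" means here: for a 2-lane gather with an all-true mask, the result should be equivalent to the hand-written IR below. This is a sketch of the semantics, not actual llc output. The function name @gather_scalarized is hypothetical. With a partially false mask, the disabled lanes would take the pass-through operand instead of being loaded.

```llvm
; Hand-written equivalent of a 2-lane gather whose mask is all true.
define <2 x double> @gather_scalarized(<2 x double*> %ptrs) {
entry:
  ; Pull each pointer out of the vector.
  %p0 = extractelement <2 x double*> %ptrs, i32 0
  %p1 = extractelement <2 x double*> %ptrs, i32 1
  ; One ordinary scalar load per lane.
  %d0 = load double, double* %p0, align 8
  %d1 = load double, double* %p1, align 8
  ; Reassemble the loaded values into the result vector.
  %v0 = insertelement <2 x double> undef, double %d0, i32 0
  %v1 = insertelement <2 x double> %v0, double %d1, i32 1
  ret <2 x double> %v1
}
```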

> It seems that avx2 supports gather/scatter, but I am not sure why it doesn't work.

AVX2 supports some gather instructions, but they're more limited than
the AVX-512 variants that @llvm.masked.gather was added for. It looks
like you can get the AVX2 ones using x86-specific intrinsics (look for
@llvm.x86.avx2.gather.d.pd etc. in test/CodeGen/X86).
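
As a rough sketch of the x86-specific route, the same "load one address into both lanes" idea might look like the IR below. The declaration (pass-through, i8* base, <4 x i32> indices, vector mask, i8 scale) is written from memory of the in-tree AVX2 gather tests of that era, so it is an assumption to check against your LLVM version. The function name @load_twice_avx2 is hypothetical. Unlike @llvm.masked.gather, this form takes a base pointer plus an index vector, and the mask is a <2 x double> whose sign bits select the lanes.

```llvm
; Assumed declaration; verify against the AVX2 gather tests in test/CodeGen/X86.
declare <2 x double> @llvm.x86.avx2.gather.d.pd(<2 x double>, i8*, <4 x i32>, <2 x double>, i8)

; Hypothetical wrapper function for illustration only.
define <2 x double> @load_twice_avx2([100 x double]* %A) {
entry:
  ; The intrinsic takes a single base address as i8*.
  %base = bitcast [100 x double]* %A to i8*
  ; All bits set, so the sign bit of each lane is 1 and both lanes are loaded.
  %mask = bitcast <2 x i64> <i64 -1, i64 -1> to <2 x double>
  ; Zero indices with scale 8: both lanes read from the base address.
  %v = call <2 x double> @llvm.x86.avx2.gather.d.pd(<2 x double> undef, i8* %base, <4 x i32> zeroinitializer, <2 x double> %mask, i8 8)
  ret <2 x double> %v
}
```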

It might make sense to use the AVX2 ones for @llvm.masked.gather as
well, but there would be more register shuffling so it might not.
Either way, no-one seems to have done so yet.

Cheers.

Tim.

Hi Tim,

Thanks for your response. Attached is the .bc file produced by my pass. I could generate the assembly with -mcpu=skx but not with -mcpu=core-avx2. Could you please take a look? BTW, I am using LLVM 3.7.

Best,
Zhi

test_opt.bc (3.03 KB)

Hi Zhi,

Got it. Thanks. I will try it with the trunk version.

Hi Tim,

I got the following error when I used llc -O3 -mcpu=skx to generate the assembly for the attached bitcode:

llc: /home/zhi/tools/llvm_3.7/llvm-3.7.1.src/lib/Target/X86/X86ISelLowering.cpp:13472: llvm::SDValue LowerIntVSETCC_AVX512(llvm::SDValue, llvm::SelectionDAG&, const llvm::X86Subtarget*): Assertion `Op0.getValueType().getVectorElementType().getSizeInBits() >= 8 && Op.getValueType().getScalarType() == MVT::i1 && "Cannot set masked compare for this operation"' failed.

Is this also a bug, or am I missing something about AVX-512 comparisons?

Best,
Zhi

ao_opt.bc (32.7 KB)

It's almost certainly a bug in LLVM, though it's difficult to be sure
with such a big .bc file. Your best bet is to reduce it down to the
smallest .bc file you can and report a bug.

Cheers.

Tim.

Hi Tim,

I updated the file a little bit. The error seems to be caused by line 825, which contains a select instruction. The file compiles successfully on trunk if I change the select instruction to an add. Any idea, or should I report a bug?

BTW, the error info is:
llc: /home/zhi/tools/llvm_trunk/source/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:3069: llvm::SDValue llvm::SelectionDAG::getNode(unsigned int, llvm::SDLoc, llvm::EVT, llvm::SDValue): Assertion `Operand.getValueType().bitsLT(VT) && "Invalid sext node, dst < src!"' failed.

Best,
Zhi

ao_opt.ll (132 KB)