Unsafe floating point operation (FDiv & FRem) in LoopVectorizer

Hi,

Consider the following test case:

int foo(float *A, float *B, float *C, int len, int VSMALL) {
  for (int i = 0; i < len; i++)
    if (C[i] > VSMALL)
      A[i] = B[i] / C[i];
}

In this test the div operation is conditional, but LLVM generates an unconditional div for it:

vector.body: ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %0 = getelementptr inbounds float, float* %C, i64 %index
  %1 = bitcast float* %0 to <8 x float>*
  %wide.load = load <8 x float>, <8 x float>* %1, align 4, !tbaa !2, !alias.scope !6
  %2 = fcmp ogt <8 x float> %wide.load, %broadcast.splat30
  %3 = getelementptr inbounds float, float* %B, i64 %index
  %4 = bitcast float* %3 to <8 x float>*
  %wide.masked.load = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* %4, i32 4, <8 x i1> %2, <8 x float> undef), !tbaa !2, !alias.scope !9
  %5 = fdiv <8 x float> %wide.masked.load, %wide.load
  %6 = getelementptr inbounds float, float* %A, i64 %index
  %7 = bitcast float* %6 to <8 x float>*
  call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> %5, <8 x float>* %7, i32 4, <8 x i1> %2), !tbaa !2, !alias.scope !11, !noalias !13
  %index.next = add i64 %index, 8
  %8 = icmp eq i64 %index.next, %n.vec
  br i1 %8, label %middle.block, label %vector.body, !llvm.loop !14

The generated IR seems unsafe because the fdiv does not respect the compare mask.

Since div is the unsafe operation, LLVM should generate predicated divs.

If I change the data type of A, B and C to an integer type, the right code is generated: the div is predicated on the mask, and a scalar div is generated for each lane.
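
The two lowerings contrasted above can be emulated in plain C. This is only a sketch, not vectorizer output: the function names, the fixed width of 8 lanes, the float threshold (the test case passes VSMALL as an int), and modeling the undefined inactive lanes of the masked load as 0.0f are all assumptions made for illustration.

```c
#include <stddef.h>

enum { VF = 8 }; /* vector width taken from the <8 x float> IR above */

/* Emulates the float lowering shown in the IR: every lane is divided,
   and only the lanes whose mask bit is set are stored. */
void div_all_lanes_masked_store(float *A, const float *B, const float *C,
                                float vsmall) {
  int mask[VF];
  float quot[VF];
  for (size_t l = 0; l < VF; ++l)
    mask[l] = C[l] > vsmall;                  /* fcmp ogt */
  for (size_t l = 0; l < VF; ++l)             /* masked load, then fdiv */
    quot[l] = (mask[l] ? B[l] : 0.0f) / C[l]; /* inactive lanes modeled as 0 */
  for (size_t l = 0; l < VF; ++l)             /* masked store */
    if (mask[l])
      A[l] = quot[l];
}

/* Emulates the predicated form generated for integer types:
   a scalar division per lane, executed only when the mask bit is set. */
void div_predicated(float *A, const float *B, const float *C, float vsmall) {
  for (size_t l = 0; l < VF; ++l)
    if (C[l] > vsmall)
      A[l] = B[l] / C[l];
}
```

For float inputs both functions leave A in the same state, because the quotients of inactive lanes are never stored. They differ only in whether a division is executed at all for lanes where the condition is false.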

This seems like a problem in the predicated-instruction detection part of LV, which currently considers only UDiv, SDiv, URem and SRem.

bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I, unsigned VF) {
  if (!Legal->blockNeedsPredication(I->getParent()))
    return false;
  switch(I->getOpcode()) {
  default:
    break;
  case Instruction::UDiv: // <- floating point operations, i.e. FDiv & FRem, are not considered
  case Instruction::SDiv:
  case Instruction::SRem:
  case Instruction::URem:
    return mayDivideByZero(*I);
  }

I don’t have any background on this function, but I feel it should consider the FDiv & FRem instructions as well.

If there are no objections, I will prepare a patch.

Thanks,

Ashutosh

Hi Ashutosh,

Can you elaborate on why you think the floating point operations are
"unsafe" and need to be predicated? Integer division by zero and
remainder by zero are Undefined Behavior, but the corresponding
floating point operations just result in a NaN or infinity in "error"
cases such as division by zero.

You might be thinking about the "floating point exceptions" that these
operations can signal. If so, keep in mind that by default these do
not trap but simply make the operation silently return a default
value such as an infinity, zero, or NaN. The LLVM IR instructions fdiv
and frem (as well as their siblings fadd, fmul, etc.) are assumed to
execute in an environment [1] where this default handling is not
changed and where nobody inspects any flags (e.g., in an FPU status
register) that may be set when exceptions occur. Programs where this
assumption is not true have to use the constrained fp intrinsics [2],
which indeed constrain the vectorizer and all other optimization
passes (LV is far from the only pass that will move an fdiv out of a
conditional).

Cheers,
Robin

[1]: https://llvm.org/docs/LangRef.html#floating-point-environment
[2]: https://llvm.org/docs/LangRef.html (section "Constrained Floating-Point Intrinsics")
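
As a small illustration of the default, non-trapping behavior described above, here is a minimal C sketch. The helper name runtime_div is hypothetical, and the volatile temporaries are only there so the division happens at run time instead of being folded by the compiler. It assumes an IEEE 754 float and that nothing has enabled trapping.

```c
#include <math.h>

/* Divides at run time. In the default floating-point environment a zero
   divisor does not trap; the operation just returns infinity or NaN. */
float runtime_div(float num, float den) {
  volatile float n = num;
  volatile float d = den;
  return n / d;
}
```

Under those assumptions, runtime_div(1.0f, 0.0f) yields positive infinity and runtime_div(0.0f, 0.0f) yields a NaN. This is what an inactive lane of the unconditional vector fdiv computes before the masked store discards it.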

Thanks for the detailed explanation, Robin. I was not aware that LLVM assumes the following for floating point operations:

"The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment."

The test snippet in my previous mail is from the OpenFOAM application; it fails at runtime because of the unconditional FDIV.

Thanks,
Ashutosh

People are working on supporting the floating point environment in
LLVM at the moment, but it looks like that program will need changing
too. You're only allowed to rely on exception state if you have
"#pragma STDC FENV_ACCESS ON", which it doesn't seem to have.

Cheers.

Tim.