Unsafe floating point operation (FDiv & FRem) in LoopVectorizer

Nema_Ashutosh · September 25, 2018, 7:23am

Hi,

Consider the following test case:

int foo(float *A, float *B, float *C, int len, int VSMALL) {
  for (int i = 0; i < len; i++)
    if (C[i] > VSMALL)
      A[i] = B[i] / C[i];
}

In this test the division is conditional, but LLVM generates an unconditional fdiv for it:

vector.body: ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %0 = getelementptr inbounds float, float* %C, i64 %index
  %1 = bitcast float* %0 to <8 x float>*
  %wide.load = load <8 x float>, <8 x float>* %1, align 4, !tbaa !2, !alias.scope !6
  %2 = fcmp ogt <8 x float> %wide.load, %broadcast.splat30
  %3 = getelementptr inbounds float, float* %B, i64 %index
  %4 = bitcast float* %3 to <8 x float>*
  %wide.masked.load = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* %4, i32 4, <8 x i1> %2, <8 x float> undef), !tbaa !2, !alias.scope !9
  %5 = fdiv <8 x float> %wide.masked.load, %wide.load
  %6 = getelementptr inbounds float, float* %A, i64 %index
  %7 = bitcast float* %6 to <8 x float>*
  call void @llvm.masked.store.v8f32.p0v8f32(<8 x float> %5, <8 x float>* %7, i32 4, <8 x i1> %2), !tbaa !2, !alias.scope !11, !noalias !13
  %index.next = add i64 %index, 8
  %8 = icmp eq i64 %index.next, %n.vec
  br i1 %8, label %middle.block, label %vector.body, !llvm.loop !14

The generated IR seems unsafe because fdiv is not respecting the compare mask.

Since division is the unsafe operation, LLVM should generate predicated divisions.
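
For readers following along, the vector body above can be modeled in scalar C roughly as follows. This is only a sketch under stated assumptions: `vector_body_model` and `VF` are made-up names, the threshold is passed as a float (the IR compares against a float splat), and the inactive lanes of the masked load, which are undef in the IR, are modeled as 0.

```c
#include <stddef.h>

#define VF 8 /* vector width used in the IR above */

/* Hypothetical scalar model of one vector.body iteration. */
void vector_body_model(float *A, const float *B, const float *C, float vsmall)
{
    int mask[VF];
    float b[VF];
    float q[VF];
    size_t lane;

    /* fcmp ogt: build the per-lane mask from C. */
    for (lane = 0; lane < VF; lane++)
        mask[lane] = C[lane] > vsmall;

    /* llvm.masked.load: only active lanes of B are read.
       Inactive lanes are undef in the IR; 0 is an assumption here. */
    for (lane = 0; lane < VF; lane++)
        b[lane] = mask[lane] ? B[lane] : 0.0f;

    /* fdiv: every lane is divided, whether or not its mask bit is set. */
    for (lane = 0; lane < VF; lane++)
        q[lane] = b[lane] / C[lane];

    /* llvm.masked.store: only active lanes are written back to A. */
    for (lane = 0; lane < VF; lane++)
        if (mask[lane])
            A[lane] = q[lane];
}
```

If C[lane] is 0 in an inactive lane, the model still evaluates a division by zero for that lane; the quotient is computed and then never stored.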

If I change the data type of A, B & C to an integer type, it generates the right code: the div is predicated on the mask, and a scalar div is generated for each lane.

This looks like a problem in the part of LV that detects instructions needing predication; it currently considers only UDiv, SDiv, URem and SRem.

bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I, unsigned VF) {
  if (!Legal->blockNeedsPredication(I->getParent()))
    return false;
  switch(I->getOpcode()) {
  default:
    break;
  case Instruction::UDiv: // <- floating point operations, i.e. FDiv & FRem, are not considered
  case Instruction::SDiv:
  case Instruction::SRem:
  case Instruction::URem:
    return mayDivideByZero(*I);
  }

I don’t have any background on this function, but I feel it should consider the FDiv & FRem instructions as well.

If there are no objections, I will prepare a patch.

Thanks,

Ashutosh

Robin_Kruppe · September 25, 2018, 9:44am

Hi Ashutosh,

Can you elaborate on why you think the floating point operations are
"unsafe" and need to be predicated? Integer division by zero and
remainder by zero is Undefined Behavior, but the corresponding
floating point operations just result in a NaN or infinity in "error"
cases such as division by zero.

You might be thinking about the "floating point exceptions" that these
operations can signal. If so, keep in mind that by default these do
not trap but simply make the operation silently return a default
value such as an infinity, zero, or NaN. The LLVM IR instructions fdiv
and frem (as well as their siblings fadd, fmul, etc.) are assumed to
execute in an environment [1] where this default handling is not
changed and where nobody inspects any flags (e.g., in an FPU status
register) that may be set when exceptions occur. Programs where this
assumption is not true have to use the constrained fp intrinsics [2],
which indeed constrain the vectorizer and all other optimization
passes (LV is far from the only pass that will move an fdiv out of a
conditional).
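
The non-trapping default described above can be checked with a few lines of C. This is a minimal sketch, assuming an IEEE 754 target running in the default floating-point environment; `quotient` and `quotient_class` are hypothetical helper names, and the volatile locals are only there so the division happens at run time instead of being folded by the compiler.

```c
#include <math.h>

/* Hypothetical helper: divide at run time (volatile blocks constant folding). */
float quotient(float num, float den)
{
    volatile float n = num;
    volatile float d = den;
    return n / d;
}

/* Classify the quotient with the standard fpclassify macro from <math.h>. */
int quotient_class(float num, float den)
{
    return fpclassify(quotient(num, den));
}
```

Under those assumptions, quotient_class(1.0f, 0.0f) yields FP_INFINITE and quotient_class(0.0f, 0.0f) yields FP_NAN, and in neither case does the program trap.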

Cheers,
Robin

[1]: LLVM Language Reference Manual — LLVM 18.0.0git documentation
[2]: LLVM Language Reference Manual — LLVM 18.0.0git documentation

Nema_Ashutosh · September 26, 2018, 4:47am

Thanks for the detailed explanation, Robin. I was not aware that for floating point operations LLVM assumes:

"The default LLVM floating-point environment assumes that floating-point instructions do not have side effects. Results assume the round-to-nearest rounding mode. No floating-point exception state is maintained in this environment."

The test snippet in my previous mail is from the OpenFOAM application, which fails at runtime because of the unconditional FDIV.

Thanks,
Ashutosh

TNorthover · September 26, 2018, 7:16am

People are working on supporting the floating point environment in
LLVM at the moment, but it looks like that program will need changing
too. You're only allowed to rely on exception state if you have
"#pragma STDC FENV_ACCESS ON", which it doesn't seem to have.

Cheers.

Tim.
