Introduction:
We would like to add a new keyword, ‘no-overflow’, to the ‘sdiv’ and ‘udiv’ instructions.
This is an updated solution that came out of the discussion at: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118257.html
The proposed keyword:
“nof” stands for ‘no-overflow’
Syntax:
<result> = sdiv nof <ty> <op1>, <op2>   ; yields ty:result
<result> = udiv nof <ty> <op1>, <op2>   ; yields ty:result
Overview:
If the keyword is present, the compiler may assume that the divisor is never zero and, for sdiv, that the division MIN_INT / -1 never occurs. If either happens, the behavior is undefined.
If the keyword is not present, division by zero (or MIN_INT / -1 for sdiv) returns a poison value.
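As an illustration only, the C sketch below models the proposed semantics for i32. The helper names (`sdiv_i32`, `udiv_i32`, `sdiv_nof_i32`) are hypothetical, and representing poison as a `false` return value is our own assumption. Undefined behavior cannot be modelled executably, so the ‘nof’ variant is shown as a precondition, checked with `assert` purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of 'sdiv i32' WITHOUT 'nof':
   returns false (poison) for b == 0 or INT32_MIN / -1,
   otherwise stores the quotient in *r. */
static bool sdiv_i32(int32_t a, int32_t b, int32_t *r) {
    if (b == 0 || (a == INT32_MIN && b == -1))
        return false;          /* result is poison */
    *r = a / b;                /* C division truncates toward zero, like sdiv */
    return true;
}

/* Hypothetical model of 'udiv i32' WITHOUT 'nof': only b == 0 is poison. */
static bool udiv_i32(uint32_t a, uint32_t b, uint32_t *r) {
    if (b == 0)
        return false;          /* result is poison */
    *r = a / b;
    return true;
}

/* Hypothetical model of 'sdiv nof i32': the optimizer may assume the
   precondition holds; violating it is undefined behavior. The assert only
   documents the precondition in this sketch. */
static int32_t sdiv_nof_i32(int32_t a, int32_t b) {
    assert(b != 0 && !(a == INT32_MIN && b == -1));
    return a / b;
}
```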
Motivation:
Currently, if the loop vectorizer decides to vectorize a loop that contains a predicated integer division, it vectorizes the loop body but scalarizes the predicated division into a sequence of branches, each guarding a scalar division. In some cases the resulting code is not very efficient. Speculating the divisions with the current vector sdiv instruction is not an option, because of the danger of an integer divide-by-zero in a lane whose predicate is false.
There are two ways to ensure the safety of a “vector div under condition”. The first is to use the same condition as the scalar execution; the current serialization approach and the previous masked integer div intrinsic proposal (http://lists.llvm.org/pipermail/llvm-dev/2017-October/118257.html) follow this idea. The second is to check the actual divisor, regardless of the original condition; the ‘no-overflow’ keyword follows this idea. Note that if the original code can, for example, divide by zero, the second approach will end up hiding it by taking advantage of the undefined behavior.
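With the addition of the ‘nof’ keyword, Clang will lower C/C++ division to ‘nof’ div IR, since this keeps the source semantics: in C and C++, division by zero and signed overflow (such as INT_MIN / -1) are already undefined behavior.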
If the vectorizer decides to vectorize one of the predicated divs, it can do so by widening the div to a vector type and dropping the ‘nof’ keyword, which no longer holds (one of the predicated-off lanes may contain zero).
Emitting the widened div without ‘nof’ allows codegen to lower it as a vector instruction while ensuring that lanes which may hold zero do not trigger a trap.
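A minimal sketch of the second approach in C, assuming a vectorization factor of 4 and a source loop of the form `for (i) if (c[i]) r[i] = a[i] / b[i];`. The functions `vector_sdiv4` and `predicated_div` are hypothetical: the first stands in for a widened sdiv without ‘nof’, checking each lane's own divisor so that no lane can trap, and the second speculates the division for all lanes and then uses the original condition to decide which results are stored.

```c
#include <stdint.h>

#define VF 4 /* assumed vectorization factor */

/* Hypothetical stand-in for 'sdiv <4 x i32>' WITHOUT 'nof'.
   Each lane checks its own operands, so no lane can trap. Lanes that
   would be poison get an arbitrary value (0 here). */
static void vector_sdiv4(const int32_t a[VF], const int32_t b[VF], int32_t q[VF]) {
    for (int l = 0; l < VF; l++) {
        int safe = b[l] != 0 && !(a[l] == INT32_MIN && b[l] == -1);
        q[l] = safe ? a[l] / b[l] : 0;
    }
}

/* Vectorized form of: for (i) if (c[i]) r[i] = a[i] / b[i];
   The division is speculated for every lane, then the original
   condition selects which lanes are stored. */
static void predicated_div(const int32_t *a, const int32_t *b, const int32_t *c,
                           int32_t *r, int n) {
    int i = 0;
    for (; i + VF <= n; i += VF) {
        int32_t q[VF];
        vector_sdiv4(&a[i], &b[i], q);
        for (int l = 0; l < VF; l++)
            if (c[i + l])        /* masked store: only active lanes */
                r[i + l] = q[l];
    }
    for (; i < n; i++)           /* scalar remainder loop */
        if (c[i])
            r[i] = a[i] / b[i];
}
```

Lanes that would be poison receive an arbitrary value (0 in the sketch), which is harmless because the original condition discards them. If a lane's condition is true while its divisor is zero, the source already had undefined behavior, and the sketch silently hides it, exactly as described for the second approach.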
Implementation considerations:
Initially, all targets can scalarize a vector sdiv/udiv instruction (without ‘nof’) into per-lane ‘nof’ divisions by guarding each lane:
%r = sdiv <4 x i32> %a, %b can be lowered as follows
(assuming %a = <i32 %a.0, i32 %a.1, i32 %a.2, i32 %a.3>, %b = <i32 %b.0, i32 %b.1, i32 %b.2, i32 %b.3> and %r = <i32 %r.0, i32 %r.1, i32 %r.2, i32 %r.3>):
If CheckSafety(%a.0, %b.0):
    %r.0 = sdiv nof i32 %a.0, %b.0
If CheckSafety(%a.1, %b.1):
    %r.1 = sdiv nof i32 %a.1, %b.1
If CheckSafety(%a.2, %b.2):
    %r.2 = sdiv nof i32 %a.2, %b.2
If CheckSafety(%a.3, %b.3):
    %r.3 = sdiv nof i32 %a.3, %b.3
A lane whose check fails is left as poison.
CheckSafety(a, b) for sdiv:
    b != 0 && (b != -1 || a != MIN_INT)
CheckSafety(a, b) for udiv:
    b != 0
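The guarded scalarization and the CheckSafety predicates can be sketched in C as follows. This is a hypothetical model (the names `check_safety_sdiv`, `check_safety_udiv` and `scalarize_sdiv4` are ours), taking MIN_INT to be `INT32_MIN` for i32 lanes. Each safe lane performs what would be the `sdiv nof i32`, and lanes that fail the check are reported as poison through a bitmask.

```c
#include <stdbool.h>
#include <stdint.h>

/* CheckSafety for sdiv: non-zero divisor and not INT32_MIN / -1,
   so 'sdiv nof' is well defined for this lane. */
static bool check_safety_sdiv(int32_t a, int32_t b) {
    return b != 0 && (b != -1 || a != INT32_MIN);
}

/* CheckSafety for udiv: only the divisor matters. */
static bool check_safety_udiv(uint32_t a, uint32_t b) {
    (void)a; /* dividend is irrelevant for udiv */
    return b != 0;
}

/* Hypothetical lowering of '%r = sdiv <4 x i32> %a, %b' (no 'nof') into
   four guarded scalar divisions. Bit l of the returned mask is set when
   lane l is poison; such lanes of r are left unwritten. */
static unsigned scalarize_sdiv4(const int32_t a[4], const int32_t b[4], int32_t r[4]) {
    unsigned poison = 0;
    for (int l = 0; l < 4; l++) {
        if (check_safety_sdiv(a[l], b[l]))
            r[l] = a[l] / b[l];   /* the 'sdiv nof i32' of lane l */
        else
            poison |= 1u << l;    /* lane l is poison */
    }
    return poison;
}
```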
Changes to the udiv/sdiv instruction descriptions in LangRef.rst: