The Julia language implements sqrt(x) with a conditional branch that is taken if x < 0. Alas, this prevents vectorization of loops containing sqrt. Often the argument can be proven to be non-negative, e.g. sqrt(x*x + y*y). Is there an existing LLVM pass or analysis that does floating-point range propagation to eliminate such unnecessary checks?
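For concreteness, here is a C++ sketch (with made-up names; Julia's actual lowering differs in detail) of the guarded sqrt in question, and why the guard is dead when the argument is a sum of squares:

```cpp
#include <cmath>
#include <stdexcept>

// A guarded sqrt in the style the frontend emits: the branch on x < 0
// is what blocks vectorization of loops containing it.
double checked_sqrt(double x) {
    if (x < 0.0)
        throw std::domain_error("sqrt of negative number");
    return std::sqrt(x);
}

// For a sum of squares the guard is provably dead: x*x + y*y is either
// >= 0 or NaN, and NaN < 0.0 is false, so the domain error can never fire.
double norm2(double x, double y) {
    return checked_sqrt(x * x + y * y);
}
```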

I don’t believe we have much in this area currently.

Generally, something like this would exist in InstCombine and ValueTracking.

Take a look at ComputeSignBit in ValueTracking.cpp. This doesn't apply (?) to floating-point numbers, but we'd need something equivalent for them. It looks like there may already be a start in the form of CannotBeNegativeZero.

Other places to look would be SimplifyFAddInst and InstCombiner::visitFAdd.

For this particular example, you're probably going to want a pattern in SimplifyFCmpInst of the form:

matcher: sqrt_call(fadd(fmul(Value(X), SpecificValue(X)),
                        fmul(Value(Y), SpecificValue(Y))))
         && CannotBeNegativeZero(X) && CannotBeNegativeZero(Y)

You might also look at LazyValueInfo, but that’s probably of secondary interest. It’s purely in terms of constant integer ranges currently.

Thanks for the pointers. Looks like LazyValueInfo has the sort of infrastructure I had in mind. LVILatticeVal could be extended to floating point. (The comment “this can be made a lot more rich in the future” is an invitation :-). I’m thinking a simple lattice would address most cases of interest for floating-point checks. The lattice points for floating-point could be all subsets of the eight-element set:

{-inf,<0,-0,+0,>0,+inf,-nan,+nan},

where <0 and >0 denote finite negative/positive numbers respectively.
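The eight-element lattice above can be sketched as a bitmask, where a lattice point is any subset of the classes, meet and join are bitwise AND and OR, and the domain-check query is a mask test. This is a hypothetical sketch of the proposal, not existing LLVM code:

```cpp
#include <cstdint>

// One bit per class from the eight-element set described above.
struct FPLatticePoint {
    enum : uint8_t {
        NegInf  = 1 << 0,  // -inf
        Neg     = 1 << 1,  // finite, < 0
        NegZero = 1 << 2,  // -0
        PosZero = 1 << 3,  // +0
        Pos     = 1 << 4,  // finite, > 0
        PosInf  = 1 << 5,  // +inf
        NegNaN  = 1 << 6,  // NaN with sign bit set
        PosNaN  = 1 << 7,  // NaN with sign bit clear
    };
    uint8_t bits;

    static FPLatticePoint bottom() { return {0}; }    // unreachable
    static FPLatticePoint top()    { return {0xFF}; } // completely unknown

    FPLatticePoint join(FPLatticePoint o) const { return {uint8_t(bits | o.bits)}; }
    FPLatticePoint meet(FPLatticePoint o) const { return {uint8_t(bits & o.bits)}; }

    // The query that kills a sqrt domain check: "x < 0" can never be
    // ordered-true. -0 compares equal to +0, and NaN compares unordered,
    // so only the strictly negative classes matter.
    bool cannotBeOrderedLessThanZero() const {
        return (bits & (NegInf | Neg)) == 0;
    }
};
```

A sum of squares would map to {+0, >0, +inf, +nan}, for which the query returns true.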

From: "Arch Robison" <arch.robison@intel.com>
To: "Philip Reames" <listmail@philipreames.com>, llvmdev@cs.uiuc.edu
Sent: Thursday, January 8, 2015 12:54:32 PM
Subject: Re: [LLVMdev] Floating-point range checks

I'm not sure. Checks against 1.0 are also common. Why not just add an FP range class, like our constant range, and go from there?

That's certainly another way to go. My worry is that a more complicated lattice gets us deeper into rounding-mode issues and costs considerably more work for smaller gain. I like the idea of creating an FPRange class. We could start with a simple one and extend it as experience warrants.

From: "Arch Robison" <arch.robison@intel.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Philip Reames" <listmail@philipreames.com>, llvmdev@cs.uiuc.edu
Sent: Thursday, January 8, 2015 1:26:41 PM
Subject: RE: [LLVMdev] Floating-point range checks

I worry about that too -- I think creating a simple one and extending from there makes sense.

With floating point, won't you need to model a partial order for the endpoints? I thought there were pairs of floating-point values that are incomparable. I'm not sure a partial order is the right abstraction, but I'm pretty sure a total order isn't.

This may make implementing the range comparisons (which are themselves partially ordered) a bit tricky...

Philip
(Who knows just enough about floating point to know he doesn't really know what he's talking about.)
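Philip's point can be checked directly: NaN compares unordered with every value, itself included, so the trichotomy that a total order guarantees fails. The helper name below is made up for this sketch:

```cpp
#include <cmath>

// For a total order, exactly one of a < b, a == b, a > b holds.
// With NaN, all three can be false at once.
bool trichotomy_holds(double a, double b) {
    return (a < b) || (a == b) || (a > b);
}
```

This is why interval endpoints cannot assume a total order once NaN is in play, and why NaN needs its own lattice classes rather than a place on the number line.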

Just to note, this is probably a different case than eliminating range checks. I'm really surprised that GVN and/or JumpThreading (via LVI constants) doesn't get this though. Propagating a constant along an edge is pretty much independent of any floating-point semantics. In fact, looking at LVI, this looks really simple to add. See the first check in getEdgeValueLocal. We're talking probably 20 LOC (assuming it doesn't flush out other bugs). I haven't glanced at the GVN case.

From: "Philip Reames" <listmail@philipreames.com>
To: "Hal Finkel" <hfinkel@anl.gov>, "Arch Robison" <arch.robison@intel.com>
Cc: llvmdev@cs.uiuc.edu
Sent: Thursday, January 8, 2015 3:04:01 PM
Subject: Re: [LLVMdev] Floating-point range checks

I don't think that's quite the right way to think about it. First, there are FP numbers that fall outside the number line (NaN, etc.), and those need to be represented separately. Second, you need to make sure you're representing your range with sufficient (but perhaps not too much) precision. What I mean is the following. Let's say we have some double-precision value, x, known to be [-1.0, 1.0]. And then we have this code:

y = x + 1e-38;
z = (y > 1.0);

Now, what happens to the range, and is z known to be false? One answer is that the range becomes [-1.0 + 1e-38, 1.0 + 1e-38], and z might be true. Another answer is that z is always false, because for no double-precision x in [-1.0, 1.0] is y greater than exactly 1.0.

So I think the important part is to use exactly the same precision for the range that is used by the values being modeled. APFloat should serve us well in this, so hopefully it will be straightforward.

Yes, the modeling of floating point is trickier. The wrap-around trick used by ConstantRange seems less applicable, and there are the unordered NaNs. Though in all cases, the key abstraction is a lattice of values, so an instance of FPRange should be thought of as a point on a lattice, not an interval. The lattice needs to be complicated enough to cover the cases of interest, but not so complicated that it gobbles up excessive time and space. My guess is that most of the cases of practical interest are eliminating domain checks and optimizations that need to know that NaN/Inf are not in play.

Even with Inf/NaN, you could use this to add fast-math flags to dominated instructions, even if you can't find a specific constant.

For example,

%cmp = fcmp oeq float %x, %y
if (%cmp) {
  // %x and %y aren't NaN here, so we can change
  %add = fadd float %x, 1.0
  // to
  %add = fadd nnan float %x, 1.0
}

This does complicate things a little, as it means that you may not always be able to just insert a constant into a range. For example, with ueq:

%cmp = fcmp ueq float %x, 2.0
if (%cmp) {
  // %x is 2.0 here, OR is NaN.
  // So this doesn't constant fold:
  %add = fadd float %x, 1.0
  // but this does, because %x isn't a NaN (imagine the front-end put the
  // fast-math flags on here, not a pass):
  %add = fadd nnan float %x, 1.0
}
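The ueq predicate is "unordered or equal": it is true when either operand is NaN or when the operands compare equal, which is exactly why the guard above only proves "2.0 or NaN". A C++ model of the predicate (the function name is made up here; the semantics are LLVM's):

```cpp
#include <cmath>

// Model of LLVM's fcmp ueq: unordered (either operand NaN) or equal.
bool fcmp_ueq(float a, float b) {
    return std::isnan(a) || std::isnan(b) || a == b;
}
```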

After writing a simple FPRange, I’ve hit a stumbling block. I don’t know what LLVM code should be extended to use it. I was initially thinking of extending LazyValueInfo, but it appears to be used for passes that don’t address the case that I need for Julia. I’m now wondering if I’m better off extending SimplifyFCmpInst to handle the few cases in question instead of trying to be more general. Here’s an example case for Julia where a trivially true domain check should be removed:

%jl_value_t = type {}

@jl_domain_exception = external global %jl_value_t*

I just want to fold the branch. Following Philip’s earlier suggestion, adding a routine CannotBeOrderedLessThanZero (that’s analogous to CannotBeNegativeZero) and making SimplifyFCmpInst use it would be enough to cover the cases of primary interest. Comments?

Yeah, that seems reasonable to me. LazyValueInfo seems to be more about the value ranges that control flow implies. Knowing that 'fmul float %0, %0' is >= 0 is perfectly reasonable for SimplifyFCmpInst to know.
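The proposed CannotBeOrderedLessThanZero can be sketched as a recursion over the expression. This is a toy model over a hand-rolled expression type, not LLVM's actual API (the real routine would walk llvm::Value operands in ValueTracking):

```cpp
// Minimal expression tree standing in for LLVM IR.
struct Expr {
    enum Kind { Const, Mul, Add, Abs } kind;
    double cval;                       // for Const
    const Expr *lhs, *rhs;             // for Mul/Add (Abs uses lhs)
};

// True when the expression's result is >= 0 or NaN, i.e. an ordered
// "x < 0" comparison against it can never be true.
bool cannotBeOrderedLessThanZero(const Expr &e) {
    switch (e.kind) {
    case Expr::Const:
        return !(e.cval < 0);          // NaN constants also pass
    case Expr::Abs:
        return true;                   // fabs is never < 0
    case Expr::Mul:
        // x * x is >= 0 or NaN; otherwise require both operands non-negative.
        return e.lhs == e.rhs ||
               (cannotBeOrderedLessThanZero(*e.lhs) &&
                cannotBeOrderedLessThanZero(*e.rhs));
    case Expr::Add:
        // Sum of two values that are each >= 0 or NaN is >= 0 or NaN.
        return cannotBeOrderedLessThanZero(*e.lhs) &&
               cannotBeOrderedLessThanZero(*e.rhs);
    }
    return false;
}
```

With this, sqrt(x*x + y*y) reduces to Add(Mul(x, x), Mul(y, y)), which the recursion proves non-negative-or-NaN regardless of x and y.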

From: "Pete Cooper" <peter_cooper@apple.com>
To: "Arch Robison" <arch.robison@intel.com>
Cc: "Philip Reames" <listmail@philipreames.com>, "Hal Finkel" <hfinkel@anl.gov>, llvmdev@cs.uiuc.edu
Sent: Tuesday, January 13, 2015 1:40:42 PM
Subject: Re: [LLVMdev] Floating-point range checks

Agreed (at least for non-NaNs?).

I don't want your work on an FPRange class to get lost, however, because we do need something like that generally speaking.

I'll submit my SimplifyFCmpInst improvement to llvm-commits after the tests finish running.

So it doesn't get lost, attached is what I had in mind for a basic FPRange.h. The name "FPRange" isn't really quite right, since it actually represents a lattice point. It's quite coarse-grained, but cheap to compute, and it hits the basic properties of IEEE arithmetic. The public interface hides the coarseness, so it could be internally extended to be more precise without affecting clients.