Proposal: floating point accuracy metadata (OpenCL related)

Hi,

This is my proposal to add floating point accuracy support to LLVM.
The intention is that the frontend may provide metadata to signal to
the backend that it may select a less accurate (i.e. more efficient)
instruction to perform a given operation. This is primarily a
requirement of OpenCL, which specifies that certain floating point
operations may be computed inaccurately.
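
As a rough sketch of what this could look like in the IR (the exact
spelling and encoding are open to discussion; the !fpaccuracy name and
the single ULP operand here are assumptions, not a final design), a
relaxed single-precision divide might be annotated as:

-----
; Hypothetical syntax: the !fpaccuracy annotation and its float
; operand (maximum error in ULPs) are assumptions, not final.
define float @f(float %x, float %y) {
entry:
  ; The backend may use any implementation accurate to <= 2.5 ulp.
  %div = fdiv float %x, %y, !fpaccuracy !0
  ret float %div
}

!0 = metadata !{float 2.500000e+00}
-----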

Comments appreciated.

0001-Annotate-imprecise-FP-division-with-fpaccuracy-metad.patch (3.84 KB)

Hi Peter,

This sounds like a really good idea. One thing that did occur to me,
though, from an OpenCL point of view, is that ULP accuracy requirements
can differ between embedded and full profile, so that may need to be
handled somehow.

Thanks,
Rob


From: Peter Collingbourne [mailto:peter@pcc.me.uk]
Sent: Thursday, September 08, 2011 10:28 AM
To: Villmow, Micah
Cc: Robert Quill; anton.lokhmotov@arm.com; cfe-dev@cs.uiuc.edu;
llvmdev@cs.uiuc.edu
Subject: Re: [LLVMdev] [cfe-dev] Proposal: floating point accuracy
metadata (OpenCL related)

> Peter,
> Is there a way to make this flag globally available? Metadata can be
> fairly expensive to handle at each node when in many cases it is a
> global flag and not a per operation flag.

There are two main reasons why I think we shouldn't go for global
flags:

1) It becomes difficult if not impossible to correctly link together
   modules with different accuracy requirements, especially if LTO
   is done on the combined module.

2) Some LLVM optimisations will create operations with an accuracy
   requirement different from the language-specified accuracy.
   For example, consider the following OpenCL kernel:

-----
#pragma OPENCL EXTENSION cl_khr_fp64: enable

__kernel void dpdiv(__global float *result, float x, float y) {
  *result = (double) x / (double) y;
}
-----

When compiled to LLVM with optimisations turned off, the function
looks like this:

-----
define void @dpdiv(float* %result, float %x, float %y) nounwind uwtable {
entry:
  %result.addr = alloca float*, align 8
  %x.addr = alloca float, align 4
  %y.addr = alloca float, align 4
  store float* %result, float** %result.addr, align 8
  store float %x, float* %x.addr, align 4
  store float %y, float* %y.addr, align 4
  %tmp = load float* %x.addr, align 4
  %conv = fpext float %tmp to double
  %tmp1 = load float* %y.addr, align 4
  %conv2 = fpext float %tmp1 to double
  %div = fdiv double %conv, %conv2
  %conv3 = fptrunc double %div to float
  %tmp4 = load float** %result.addr, align 8
  store float %conv3, float* %tmp4
  ret void
}
-----

With optimisations turned on:

-----
define void @dpdiv(float* nocapture %result, float %x, float %y) nounwind uwtable {
entry:
  %conv3 = fdiv float %x, %y
  store float %conv3, float* %result, align 4, !tbaa !1
  ret void
}
-----

The main optimisation applied here is near the top of
InstCombiner::visitFPTrunc, which simplifies
fptrunc(fdiv (fpext x), (fpext y)) to fdiv(x, y).
Because double precision floating point divides are accurate in OpenCL,
the single precision divide in the optimised code must also be
accurate, unlike a "direct" single precision divide.
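
Under the proposal, this distinction would be visible in the IR: the
frontend annotates a divide it emitted directly as relaxed, while a
divide synthesised by InstCombine from the accurate double-precision
path carries no relaxation (again, the !fpaccuracy syntax is a sketch,
not a final design):

-----
; Hypothetical !fpaccuracy syntax; the 2.5 ulp figure is OpenCL's
; bound for single-precision divide in full profile.

; Frontend-emitted single-precision divide: may be computed to 2.5 ulp.
%a = fdiv float %x, %y, !fpaccuracy !0

; Divide produced by InstCombine from fptrunc(fdiv double ...):
; no metadata, so it must remain correctly rounded.
%b = fdiv float %x, %y

!0 = metadata !{float 2.500000e+00}
-----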

I would imagine that creating a pinned metadata name for fpaccuracy, as
we currently do for dbg, tbaa and prof, would go some way towards
addressing the efficiency problem.

[Villmow, Micah] Yeah, that could work also.

Hi Peter,

> This is my proposal to add floating point accuracy support to LLVM.
> The intention is that the frontend may provide metadata to signal to
> the backend that it may select a less accurate (i.e. more efficient)
> instruction to perform a given operation.

I'm a bit worried that accuracy metadata may be dropped or modified before
it reaches the backend. Currently, an OpenCL backend can safely assume that
the division operation is the same as specified in the OpenCL standard (i.e.
with the maximum relative error of 2.5 ulps), and map it, say, to the native
division instruction. If "no accuracy metadata" would mean "use correct
rounding", you are correct in that accuracy metadata could be dropped
without negatively affecting accuracy. Dropping accuracy metadata could,
however, negatively affect performance, if the backend would have to go for
a slow software implementation instead of a fast hardware implementation.
It would be better if "no accuracy metadata" meant "use the default
language accuracy", only allowing accuracy to be relaxed further. But
then LLVM IR would remain language-dependent.
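
For concreteness, the 2.5 ulp bound measures the distance from the
correctly rounded result in units in the last place. A small Python
sketch (illustrative only; the function name and the toy approximation
are mine) of how such an error can be counted:

```python
import math

def ulp_error(approx, exact):
    """Error of `approx` relative to `exact`, in units in the last place."""
    return abs(approx - exact) / math.ulp(exact)

# One ulp above 1.0 is exactly 1 ulp away.
print(ulp_error(1.0 + 2**-52, 1.0))   # 1.0

# A crude reciprocal approximation of 1/3 (two Newton-Raphson steps
# from a rough initial guess) is far outside a 2.5 ulp bound.
y, g = 3.0, 0.3
for _ in range(2):
    g = g * (2.0 - y * g)   # Newton-Raphson step for 1/y
print(ulp_error(g, 1.0 / 3.0) > 2.5)  # True
```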

We could also introduce a set of intrinsics for inaccurate FP
operations. The main disadvantage is that we would need to add an
intrinsic for each FP operation, which could add up to a lot of work.
Furthermore, the new intrinsics would not necessarily be recognised
by the existing optimisers.
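
For comparison, the intrinsic-based alternative might look roughly like
this (the intrinsic name and shape are entirely hypothetical, which is
part of the problem: one such intrinsic would be needed per FP
operation and type):

-----
; Hypothetical intrinsic; not part of LLVM.
declare float @llvm.fdiv.relaxed.f32(float, float)

define float @g(float %x, float %y) {
entry:
  %div = call float @llvm.fdiv.relaxed.f32(float %x, float %y)
  ret float %div
}
-----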

The existing optimisers would need to be extended to recognise relaxed
accuracy requirements. Otherwise, they could over-tighten them for safety
(again negatively affecting performance).

I suggest we discuss this on Friday at the LLVM developers' meeting in
London.

Best, Anton.