From: Peter Collingbourne [mailto:peter@pcc.me.uk]
Sent: Thursday, September 08, 2011 10:28 AM
To: Villmow, Micah
Cc: Robert Quill; anton.lokhmotov@arm.com; cfe-dev@cs.uiuc.edu;
llvmdev@cs.uiuc.edu
Subject: Re: [LLVMdev] [cfe-dev] Proposal: floating point accuracy
metadata (OpenCL related)
> Peter,
> Is there a way to make this flag globally available? Metadata can be
fairly expensive to handle at each node when in many cases it is a
global flag and not a per operation flag.
There are two main reasons why I think we shouldn't go for global
flags:
1) It becomes difficult if not impossible to correctly link together
modules with different accuracy requirements, especially if LTO
is done on the combined module.
2) Some LLVM optimisations will create operations with a accuracy
requirement different from the language specified accuracy.
For example, consider the following OpenCL kernel:
-----
#pragma OPENCL EXTENSION cl_khr_fp64: enable
__kernel void dpdiv(__global float *result, float x, float y) {
*result = (double) x / (double) y;
}
-----
When compiled to LLVM with optimisations turned off, the function
looks like this:
-----
define void @dpdiv(float* %result, float %x, float %y) nounwind uwtable
{
entry:
%result.addr = alloca float*, align 8
%x.addr = alloca float, align 4
%y.addr = alloca float, align 4
store float* %result, float** %result.addr, align 8
store float %x, float* %x.addr, align 4
store float %y, float* %y.addr, align 4
%tmp = load float* %x.addr, align 4
%conv = fpext float %tmp to double
%tmp1 = load float* %y.addr, align 4
%conv2 = fpext float %tmp1 to double
%div = fdiv double %conv, %conv2
%conv3 = fptrunc double %div to float
%tmp4 = load float** %result.addr, align 8
store float %conv3, float* %tmp4
ret void
}
-----
With optimisations turned on:
-----
define void @dpdiv(float* nocapture %result, float %x, float %y)
nounwind uwtable {
entry:
%conv3 = fdiv float %x, %y
store float %conv3, float* %result, align 4, !tbaa !1
ret void
}
-----
The main optimisation applied here is near the top of
InstCombiner::visitFPTrunc,
which simplifies fptrunc(fdiv (fpextend x), (fpextend y)) to fdiv(x,
y).
Because double precision floating point divides are accurate in OpenCL,
the single precision divide in the optimised code must also be
accurate, unlike a "direct" single precision divide.
I would imagine that creating a pinned metadata name for fpaccuracy, as
we currently do for dbg, tbaa and prof, would go some way towards
addressing
the efficiency problem.
[Villmow, Micah] Yeah, that could work also.