constant folding for standard math functions

Hi!

I'd like to replace all calls to standard math functions (e.g. sin(0.5)) by
their result.
What strategy do you recommend?
Should I write a pass that does only this or should I copy and
modify the SCCP pass?

A problem with an extra pass could be that that I need to alternate
my pass and SCCP several times since the results of the math functions
could be folded again.

-Jochen

sin/cos etc should already be handled by lib/Analysis/ConstantFolding.cpp.

-Chris

Hi!

sin/cos etc should already be handled by lib/Analysis/ConstantFolding.cpp.
   
Thanks for the hint and it works!
Now I have a new Problem:

I have this function:

float foo(float a, float b)
{
     float x = a * b * 0.0f;
     return cos(0.5) * sin(0.5) * x;
};

after compiling it with clang (cpp mode) and renaming _ZSt3sinf to sin
and _ZSt3cosf to cos I get the following:

define float @_Z3fooff(float %a, float %b) nounwind {
entry:
   %mul = fmul float %a, %b ; <float> [#uses=1]
   %mul2 = fmul float %mul, 0.000000e+000 ; <float> [#uses=1]
   %mul6 = fmul float 0x3FDAED54A0000000, %mul2 ; <float> [#uses=1]
   ret float %mul6
}

the sin and cos calls are folded, but not the mul by zero.
May be this is missing in llvm::ConstantFoldInstOperands in ConsantFolding.cpp?

I would expect the following optimizations, but didn't find them in the code:
x + 0 = x
x * 0 = 0
x * 1 = 1
x * -1 = -x

-Jochen

Is x*0 => 0 true if isnan(x)?

And cos(x)*sin(x) makes me desperately want to fold it to sin(2*x)/2,
but I suppose that's not allowed either.

Nope, any multiply with q.nan results in q.nan, no matter what the other arguments value is.

So '0 * q.nan == q.nan', and not 0.

Is x*0 => 0 true if isnan(x)?
   

then where do I have to add it if I want to make it a non-standard modification
of my local llvm version?
would it make sense to add a subset of float to llvm or a kind of modifier
(e.g. valid float to indicate that it is always valid and not nan) to allow more aggressive optimization?
just like e.g. inbounds for load

-Jochen

You should check out the -enable-finite-only-fp-math and -enable-unsafe-fp-math options.

You should check out the -enable-finite-only-fp-math and -enable-unsafe-fp-math options.

Good hint, but

     llvm::UnsafeFPMath = true;
     llvm::FiniteOnlyFPMathOption = true;

at the beginning of my code does not help.
I found llvm::Reassociate::OptimizeExpression in llvm\lib\Transforms\Scalar\Reassociate.cpp
which looks like it does X * 0 = 0 for int, but it does not get called for int,
but it works for int.

-Jochen

These flags only affect the code generator. If you want to add this optimization to your copy of llvm, you can do so by adding it to the instcombine pass. We'd need IR enhancements to do this sort of thing for real in the mid-level optimizer, some thoughts are here:
http://nondot.org/sabre/LLVMNotes/FloatingPointChanges.txt

However, this is far from being a concrete proposal.

-Chris

These flags only affect the code generator. If you want to add this optimization to your copy of llvm, you can do so by adding it to the instcombine pass. We'd need IR enhancements to do this sort of thing for real in the mid-level optimizer, some thoughts are here:
http://nondot.org/sabre/LLVMNotes/FloatingPointChanges.txt
   
Thanks, now it works. I inserted

       if (Op1F->isNullValue())
         return ReplaceInstUsesWith(I, Op1C); // X * 0 == 0

into llvm::InstCombiner::visitFMul

Why not at first create a compile time option for this so that the
code is already available for special purposes?

-Jochen

I'm not sure how that would work, but it most likely wouldn't fit with the design of llvm. If this is important, I'd rather fix the representational issue.

-Chris

I agree. The option should control the kind of IR generated as Chris outlined
in the document.

We need to figure out exactly what "strict" means. Is it like "restrict" in C
where it only has meaning with respect to other things marked "restrict"
or is it more general? Can I reassociate a strict with a relaxed? Is it
possible that "strict" will mean different things for different operations?

We may need something akin to a fence over which no code motion can
occur.

Some compilers use special operations to represent parentheses for Fortran
codes. I don't think that will work very well for LLVM, however.

I think the proposal is heading in the right direction. We just need to fill
in the details.

                                                       -Dave

I'm not sure how that would work, but it most likely wouldn't fit with the design of llvm. If this is important, I'd rather fix the representational issue.

Just
#ifdef RELAXED_FLOAT
here the code like I posted
#endif

what do you mean by the representational issue? something in the IR?
Perhaps something like the value tracking can be implemented that
tracks if a float can be NaN. then only a flag is needed for the load
instruction that the float from memory is assumed not to be NaN.

-Jochen

I'm not sure how that would work, but it most likely wouldn't fit with the design of llvm. If this is important, I'd rather fix the representational issue.

Just
#ifdef RELAXED_FLOAT
here the code like I posted
#endif

Ok, that's what I thought, we don't want that in mainline.

what do you mean by the representational issue? something in the IR?

Yes. Each fp operation should have the information tagged onto like (like nuw/nsw on integer ops). This allows inlining between functions with different settings, allows expressing things like the fortran parentheses rules etc.

-Chris

Why not at first create a compile time option for this so that the
code is already available for special purposes?

I'm not sure how that would work, but it most likely wouldn't fit with the
design of llvm. If this is important, I'd rather fix the representational
issue.

I agree. The option should control the kind of IR generated as Chris outlined
in the document.

We need to figure out exactly what "strict" means. Is it like "restrict" in C
where it only has meaning with respect to other things marked "restrict"
or is it more general? Can I reassociate a strict with a relaxed? Is it
possible that "strict" will mean different things for different operations?

I think the strict flag should determine the mathematical properties of
the operation itself, i.e. give it all the properties that integer
operations have (associativity, and whatever else is missing).

You then have to write down what mathematical properties you need for
your optimizing transformation, and if ALL operations involved in it
fulfill the properties you can do the xform.
You can have strict and relaxed operations in same functions, and
optimize (even the FP values that would eventually be used in a strict
OP), if the operations involved in the optimization step has all the
needed mathematical properties.

You shouldn't realistically have both relaxed and strict ops in the same
function, but it can happen with inlining.

Lets note associative adds with a+, and strict ones with s+.

For associativity for example you need 2 operations to be associative:
(X a+ Y) a+ Z -> you can transform this to X a+ (Y a+ Z)

(X s+ Y) a+ Z -> you can't transform this

((X s+ Y) a+ X) a+ Y, you can partially transform it,
1) lets note X s+ Y = Z
2) (Z a+ X) a+ Y -> Z a+ (X a+ Y) -> (X s+ Y) a+ (X a+ Y)
3) you can always choose to restrict a relaxed op (a+ -> s+)
    (X s+ Y) a+ (X s+ Y) -> 2 a* (X s+ Y)

We may need something akin to a fence over which no code motion can
occur.

Maybe for signaling NaN, I don't know how those things work.
I tend to avoid using any FP code in my code :wink:

Some compilers use special operations to represent parentheses for Fortran
codes. I don't think that will work very well for LLVM, however.

With strict/relaxed flags for each operation (not value!) I think it can
work, IF the frontend sets the flags appropriately.

Best regards,
--Edwin

> We need to figure out exactly what "strict" means. Is it like "restrict"
> in C where it only has meaning with respect to other things marked
> "restrict" or is it more general? Can I reassociate a strict with a
> relaxed? Is it possible that "strict" will mean different things for
> different operations?

Lets note associative adds with a+, and strict ones with s+.

For associativity for example you need 2 operations to be associative:
(X a+ Y) a+ Z -> you can transform this to X a+ (Y a+ Z)

(X s+ Y) a+ Z -> you can't transform this

((X s+ Y) a+ X) a+ Y, you can partially transform it,
1) lets note X s+ Y = Z
2) (Z a+ X) a+ Y -> Z a+ (X a+ Y) -> (X s+ Y) a+ (X a+ Y)
3) you can always choose to restrict a relaxed op (a+ -> s+)
    (X s+ Y) a+ (X s+ Y) -> 2 a* (X s+ Y)

That's sort of along the lines of what I was thinking. I was just trying to
list the questions that immediately came to mind, not looking for solutions
just yet. :slight_smile:

> We may need something akin to a fence over which no code motion can
> occur.

Maybe for signaling NaN, I don't know how those things work.
I tend to avoid using any FP code in my code :wink:

Signalling NaN is one case. I'm sure there are others.

> Some compilers use special operations to represent parentheses for
> Fortran codes. I don't think that will work very well for LLVM, however.

With strict/relaxed flags for each operation (not value!) I think it can
work, IF the frontend sets the flags appropriately.

I agree.

                                                          -Dave

The only other thing I could imagine that it is useful for is for rounding mode control. IMO rounding mode should be explicitly marked on the instruction as well.

-Chris

> Signalling NaN is one case. I'm sure there are others.

The only other thing I could imagine that it is useful for is for rounding
mode control.

Yep.

IMO rounding mode should be explicitly marked on the
instruction as well.

That would also be useful for some GPUs where each instruction can specify
its own rounding mode.

                                                -Dave

SSE4 also has this for at least one conversion instruction.

-Chris

As does ARM NEON. Sorta. vcvt normally uses round-to-zero, but it can optionally use the mode specified by the control register instead.