I'm interested in implementing an InstCombine optimization that I discovered and verified with Alive-NJ (with the help of the authors of Alive-NJ). The optimization is described in Alive-NJ format as follows:

Effectively the optimization targets code that casts a float to an int with the same width, XORs the sign bit, and casts back to float, and replaces it with a subtraction from -0.0.
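
To make the two forms concrete, here is a minimal C++ sketch of the source pattern and the proposed replacement. The helper names (`float_to_bits`, `bits_to_float`, `negate_via_xor`, `negate_via_fsub`) are made up for illustration and are not from the patch; the sketch assumes a 32-bit IEEE-754 `float` and default (non-fast-math) compilation.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as a 32-bit integer (the first bitcast).
std::uint32_t float_to_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Reinterpret a 32-bit integer as a float (the second bitcast).
float bits_to_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Source pattern: bitcast to int, XOR the sign bit, bitcast back.
float negate_via_xor(float x) {
    return bits_to_float(float_to_bits(x) ^ 0x80000000u);
}

// Proposed replacement: subtraction from -0.0.
float negate_via_fsub(float x) {
    return -0.0f - x;
}
```

For ordinary values under IEEE-754 arithmetic the two functions produce the same bit pattern; the open question is what happens for inputs such as NaNs and denormals on targets with different semantics.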

I am not very familiar with C++ or the LLVM codebase so I would greatly appreciate some help in writing a patch adding this optimization.

I am not entirely sure this is safe. Transforming this to an fsub could change the value stored on platforms that implement negates using arithmetic instead of with bitmath (such as ours) and either canonicalize NaNs or don’t support denormals. This is actually important because this kind of bitmath on floats is very commonly used as part of algorithms for complex math functions that need to get precise bit patterns from the source (similarly for the transformation of masking off the sign bit → fabs). It’s also important because if the float happens to “really” be an integer, it’s highly likely we’ll end up zero-flushing it and losing the data.

Example:

a = load float
b = bitcast a to int
c = xor b, signbit
d = bitcast c to float
store d

Personally I would feel this is safe if and only if the float is coming from an arithmetic operation. In that case, we know that doing another arithmetic operation on it should be safe, since it’s already canonicalized and can’t be a denorm [if the platform doesn’t support them].

I say this coming only a few weeks after our team spent literally dozens of human-hours tracking down an extremely obscure bug involving a GL conformance test in which ints were cast to floats, manipulated with float instructions, then cast back to int, resulting in the ints being flushed to zero and the test failing.

The problem is that Alive-NJ proves the transform correct under a certain set of semantics, but not all targets will share those semantics, which is something you have to be very careful about with floating point. In some cases “correct under IEEE” means “correct under non-IEEE”, but in other cases it doesn’t.

Yes, if transforming the int op to an FP op induces target-dependent behavior, then we can't do this transform in InstCombine without some kind of predication. And we should revert or rein in: http://reviews.llvm.org/rL249702

As I noted in that commit message, it's not clear what the FP model of LLVM IR actually is. Based on existing IR transforms, I assumed it was IEEE-754...ish.

I feel like a reasonable way of defining it would be that it’s IEEE-754-like, except that if the target diverges from IEEE-754 (in terms of ‘missing some part of IEEE-754’ or ‘less precise than IEEE-754’, etc.), the transformations we introduce shouldn’t break other things.

e.g. if a target does not support denormals, float ops throughout LLVM are already free to lose their denormals at any time. This doesn’t normally cause problems, but if you allow turning int ops into float ops, you now allow int operations to destroy denormals at any time, which is more powerful (and more dangerous) than what the target natively defines, I think.
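
To illustrate the hazard, here is a small C++ model working purely on bit patterns. `flush_denormal_bits` and `modelled_fsub_negate` are hypothetical stand-ins for a target whose FP unit flushes denormal inputs to zero; they are not a description of any particular hardware, only a sketch under that assumption.

```cpp
#include <cstdint>

// Hypothetical model of a flush-to-zero FP unit: any input whose exponent
// field is zero (a denormal or zero) keeps only its sign bit.
std::uint32_t flush_denormal_bits(std::uint32_t bits) {
    const std::uint32_t exponent_mask = 0x7F800000u;
    const std::uint32_t sign_mask = 0x80000000u;
    if ((bits & exponent_mask) == 0) {
        return bits & sign_mask;
    }
    return bits;
}

// Integer form: only the sign bit changes, every other bit survives.
std::uint32_t xor_sign(std::uint32_t bits) {
    return bits ^ 0x80000000u;
}

// Modelled FP form on the hypothetical target: the input is flushed
// before the sign is flipped.
std::uint32_t modelled_fsub_negate(std::uint32_t bits) {
    return flush_denormal_bits(bits) ^ 0x80000000u;
}
```

A small integer such as 42 has a zero exponent field when viewed as a float, so in this model `xor_sign(42)` keeps the low bits while `modelled_fsub_negate(42)` returns only the sign bit; a normal value such as 1.5 (`0x3FC00000`) is negated identically by both.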

This doesn’t seem like a good idea to me. There are many architectures where those bitcasts are free operations and the xor will be executed in a shorter pipe than any FP op would. Cell SPU, for example.

It’s definitely one that would need some target hooks, and is probably not actually worth doing without analysing the producers and consumers of the value. If the source and destination values need to be in floating point registers, the cost of FPR<->GPR moves is likely to be a lot higher than the cost of the subtract, even if the xor is free. If the results are going to end up in integer registers or memory, then the xor version is probably cheaper (though, even there, it may be better for register pressure to keep the results in FPRs).

I’d expect that most uses of this pattern are immediately followed by a branch on the result. On some architectures, that can become a branch on a floating point condition code, but on others it’s going to be a move to GPR, which means that you lose the win entirely.

Personal feeling: LLVM should not assume anything about relative costs of floats or ints (if we want to, there should be some sort of target hooks involved). There are some targets where float costs 100 times more than int, and there are some where int costs lots more than float, so I don’t think it’s obvious exactly what is and isn’t canonical for anything where one is choosing between float and int ops, even ignoring issues of correctness.

Yes, I agree with that assessment now. And given the FP correctness issues raised, I don’t see any hope for D18874 as-is.

Creating an fneg intrinsic (or IR instruction?) was proposed in D18874, and I think that’s been considered before (but rejected?). I don’t understand what effects that would have.

Now that we have raised the question of FP correctness, I think we need to answer the question: what can target-independent IR passes assume about the underlying LLVM IR FP machine? We’d like to be flexible enough to handle a target that doesn’t support denorms. Are there other considerations? Is it safe to do any FP transforms in InstCombine?

I am not entirely sure this is safe. Transforming this to an fsub could change the value stored on platforms that implement negates using arithmetic instead of with bitmath (such as ours)

I think it’s probably safe for IEEE754-2008 conformant platforms because negation was clarified to be a non-arithmetic bit flip that cannot cause exceptions in that specification. However, I’m sure it’s unsafe for some IEEE754-1985 platforms because it introduces exceptions when given a NaN.

On MIPS, the semantics for negation depend on a configuration bit (ABS2008) but in practice the majority of MIPS environments use arithmetic negation and trigger exceptions when negating a NaN. That said, the most recently published MIPS specifications require non-arithmetic negation and drop support for the IEEE754-1985 standard.

This could introduce new FP exceptions. It’s also likely to be much worse on platforms with no FPU like early MIPS.

Quite a few modern implementations too. MIPS is often used in domains where having an FPU would be wasteful.

I did some digging into IEEE-754 and it seems like this is actually not even safe on fully conformant IEEE-754-2008 platforms.

5.5.1 Sign bit operations
5.5.1.0 Implementations shall provide the following homogeneous quiet-computational sign bit operations for all supported arithmetic formats; they only affect the sign bit. The operations treat floating-point numbers and NaNs alike, and signal no exception. These operations may propagate non-canonical encodings.

copy(x) copies a floating-point operand x to a destination in the same format, with no change to the sign bit.
negate(x) copies a floating-point operand x to a destination in the same format, reversing the sign bit. negate(x) is not the same as subtraction(0, x) (see 6.3).

Note the MAY. fneg is required to flip the top bit even if the input is a NaN, but it is not required to preserve the other bits. If the input is a non-canonical NaN, fneg MAY canonicalize it. In fact, even ‘copy’ MAY canonicalize it (and it MAY also choose not to).

Thus, if the integer being fneg’d is a non-canonical NaN, fneg MAY modify bits other than the top bit.
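
A bit-level C++ sketch of that difference follows. `canonicalizing_negate` is a hypothetical model of an implementation that takes the canonicalizing option; the choice of `0x7FC00000` as the canonical quiet NaN is an assumption for illustration, since the actual canonical encoding is target-specific.

```cpp
#include <cstdint>

// True if the 32-bit pattern encodes a NaN: exponent all ones, mantissa nonzero.
bool is_nan_bits(std::uint32_t bits) {
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0;
}

// Hypothetical negate that canonicalizes NaNs: the sign bit is flipped as
// required, but a NaN payload is replaced by an assumed canonical quiet NaN.
std::uint32_t canonicalizing_negate(std::uint32_t bits) {
    const std::uint32_t flipped = bits ^ 0x80000000u;
    if (is_nan_bits(bits)) {
        return (flipped & 0x80000000u) | 0x7FC00000u;
    }
    return flipped;
}
```

With the input `0x7FA12345` (a NaN carrying a payload), the integer XOR yields `0xFFA12345`, while this model yields `0xFFC00000`: the sign flips in both, but the payload is lost in the second.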

I just wanted to stress this for future discussions: One important goal of the intermediate representation is to normalize the program.
If something can be represented by two equivalent IR constructs then in general we should try to choose one variant as normal form and transform to that!

If it turns out that it is the wrong variant for the target, we can still transform into the other direction during code selection. Of course this rule cannot universally be applied if reversing the operation in the backend is unreasonable.

(and of course for this specific case we have to decide first whether the two patterns are equivalent anyway given existing llvm backends)

Keep in mind we had a real GLSL conformance test fail because of exactly this sort of “optimization” (using float operations to perform non-destructive operations on values that are actually integers).