Add the nneg (non-negative) poison-generating flag to the uitofp instruction, with the following semantics:
uitofp nneg iN %x to fM returns poison if %x is negative.
A corollary is that uitofp nneg iN %x to fM is equivalent to sitofp iN %x to fM.
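To make the proposed semantics concrete, here is a small illustrative model in Python. It is not LLVM code, and the helper names are hypothetical. An iN value is treated as a raw N-bit pattern, "negative" is modelled as "bit N-1 (the sign bit) is set", and poison is represented as None. The final loop checks exhaustively for i8 that uitofp nneg agrees with sitofp on every input where it is not poison.

```python
from typing import Optional

# Illustrative model of the proposed semantics; not LLVM code.
# An iN value is a raw bit pattern in [0, 2**N); poison is modelled as None.

def sign_bit_set(x: int, n: int) -> bool:
    """'Negative' for an iN value means bit N-1 is set."""
    return (x >> (n - 1)) & 1 == 1

def as_signed(x: int, n: int) -> int:
    """Reinterpret an N-bit pattern as a two's-complement integer."""
    return x - (1 << n) if sign_bit_set(x, n) else x

def sitofp(x: int, n: int) -> float:
    return float(as_signed(x, n))

def uitofp(x: int, n: int, nneg: bool = False) -> Optional[float]:
    if nneg and sign_bit_set(x, n):
        return None  # poison: the nneg promise was violated
    return float(x)  # the bits are read as an unsigned integer

# Exhaustive check for i8: wherever uitofp nneg is not poison, it equals sitofp.
for x in range(1 << 8):
    r = uitofp(x, 8, nneg=True)
    if r is not None:
        assert r == sitofp(x, 8)
```

Because poison may be refined to any value, replacing uitofp nneg with sitofp is always valid. The reverse direction is valid only once %x is known to be non-negative.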
This would be implemented by extending PossiblyNonNegInst to support uitofp (it currently supports only zext).
Motivation
There are two motivations.
This would allow us to canonicalize sitofp iN %x to fM to uitofp nneg iN %x to fM if we could prove %x is non-negative. This canonicalization can be useful in that it preserves more information about the result (namely, that it is non-negative).
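A minimal sketch of the canonicalization on a hypothetical toy IR follows. The `Cast` class and the `known_non_negative` callback are stand-ins invented for illustration, not LLVM's API. After the rewrite, two facts can be read off the instruction itself without re-running value analysis on %x: the result is non-negative (because the opcode is uitofp), and the operand is non-negative (because nneg is set).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Cast:
    """Hypothetical toy IR node for an int-to-fp cast; not LLVM's API."""
    opcode: str         # "sitofp" or "uitofp"
    operand: str        # name of the integer operand, e.g. "%x"
    nneg: bool = False  # the proposed flag (only meaningful on uitofp)

def canonicalize(inst: Cast, known_non_negative: Callable[[str], bool]) -> Cast:
    """Rewrite sitofp %x to uitofp nneg %x when analysis proves %x >= 0."""
    if inst.opcode == "sitofp" and known_non_negative(inst.operand):
        return Cast("uitofp", inst.operand, nneg=True)
    return inst

def result_known_non_negative(inst: Cast) -> bool:
    """Readable from the instruction alone: uitofp never yields a negative value."""
    return inst.opcode == "uitofp"

def may_lower_as_signed(inst: Cast) -> bool:
    """A backend may use a signed conversion for sitofp or for uitofp nneg."""
    return inst.opcode == "sitofp" or (inst.opcode == "uitofp" and inst.nneg)
```

Without the flag, the rewrite would keep the fact about the result but discard the fact about the operand. With nneg attached, both facts survive to the backend.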
Just Re-Prove Signbit When We Want to Change Sign.
For example, if a target prefers a certain sign, just re-prove the sign bit of the input in the backend and try to switch the cast between sitofp and uitofp.
This doesn't really work, as was evident when the canonicalization was introduced without the nneg flag.
We often have better analysis in the middle end, which lets us prove non-negativity in cases we can't in the backend. Furthermore, we throw away information when lowering, so it's often simply unprovable later on.
Use a Different Flag Name
This proposal reuses the nneg flag, which is currently used by zext (for essentially the same reason: allowing the backend to easily convert zext nneg → sext).
Personally, I think the flag name and its current usage fit this extension well, but there may be confusions I am overlooking.
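The existing zext nneg follows the same pattern. A small illustrative model (hypothetical helper names, poison represented as None) shows why a backend may turn zext nneg into sext: the two agree on every input for which the flag's promise holds.

```python
from typing import Optional

def sext(x: int, n: int, m: int) -> int:
    """Sign-extend an n-bit pattern x to m bits (two's complement)."""
    if (x >> (n - 1)) & 1:
        high_ones = ((1 << m) - 1) & ~((1 << n) - 1)  # bits n..m-1 set
        return x | high_ones
    return x

def zext_nneg(x: int, n: int, m: int) -> Optional[int]:
    """zext with the nneg flag: poison (None) if the sign bit of x is set."""
    if (x >> (n - 1)) & 1:
        return None
    return x  # zero-extension to m bits leaves the bit pattern unchanged

# For i8 -> i32: whenever zext nneg is not poison, it equals sext.
for x in range(1 << 8):
    r = zext_nneg(x, 8, 32)
    if r is not None:
        assert r == sext(x, 8, 32)
```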
I'm a bit confused about one aspect of this. The floating-point result of uitofp is, by definition, always non-negative. So, I don't understand the claim that this canonicalization preserves information.
I'm also inclined to take issue with the terminology here, and I suppose this applies to the existing zext nneg construct as well. LLVM IR is generally agnostic as to whether an integer value is signed or unsigned. So when you say “poison if %x is negative” what you really mean is “poison if the sign bit of %x is set”, right? This is a particularly relevant distinction for the case you are proposing here, because the uitofp instruction is one of the rare cases where the LLVM IR definition does make an explicit distinction between signed and unsigned integers: “The ‘uitofp’ instruction regards value as an unsigned integer and converts that value to the ty2 type.”
I'm not sure I see the value of the canonicalization. I understand that if you know the sign bit isn't set you can make a much more efficient lowering of uitofp on x86-based targets, but we already have that same lowering for sitofp, so I don't see the value of the sitofp → uitofp nneg canonicalization.
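Some context on the x86 point: before AVX-512, x86-64 has only a signed 64-bit integer-to-float conversion (cvtsi2sd), so an unsigned i64 conversion needs extra code. The Python sketch below models one commonly used expansion. If the top bit is set, the value is halved with the dropped low bit ORed back in as a sticky bit, converted as signed, and then doubled. When the sign bit is known to be clear (as sitofp semantics or the proposed uitofp nneg would allow), a single signed conversion suffices. The function names are hypothetical, and Python's `float()` stands in for a correctly rounded hardware conversion.

```python
MASK64 = (1 << 64) - 1

def signed_cvt(y: int) -> float:
    """Stand-in for a signed i64 -> double conversion (e.g. cvtsi2sd)."""
    assert -(1 << 63) <= y < (1 << 63)
    return float(y)  # Python's int -> float rounds to nearest, ties to even

def uitofp_i64_generic(x: int) -> float:
    """u64 -> double using only a signed conversion (no sign knowledge)."""
    x &= MASK64
    if x < (1 << 63):
        # Sign bit clear: the signed conversion gives the same result.
        return signed_cvt(x)
    # Sign bit set: halve, keep the dropped bit as a sticky bit so the final
    # rounding is still correct, convert as signed, then double (exact).
    halved = (x >> 1) | (x & 1)
    return 2.0 * signed_cvt(halved)

def uitofp_nneg_i64(x: int) -> float:
    """With nneg (sign bit known clear), one signed conversion suffices."""
    return signed_cvt(x)
```

The generic path needs a test of the sign bit plus the fix-up sequence. The nneg path is the single conversion that sitofp already gets.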
I'm a bit confused about one aspect of this. The floating-point result of uitofp is, by definition, always non-negative. So, I don't understand the claim that this canonicalization preserves information.
That is, if we do sitofp → uitofp once we have proven the operand is non-negative, we may not be able to re-prove that later on (possibly due to erased instructions or lost context), and in general the backend has less sophisticated analysis. By attaching nneg when we do that (or really any time we know the operand of uitofp is non-negative), we preserve what we know so that it can be reused later.
I'm also inclined to take issue with the terminology here, and I suppose this applies to the existing zext nneg construct as well. LLVM IR is generally agnostic as to whether an integer value is signed or unsigned. So when you say “poison if %x is negative” what you really mean is “poison if the sign bit of %x is set”, right? This is a particularly relevant distinction for the case you are proposing here, because the uitofp instruction is one of the rare cases where the LLVM IR definition does make an explicit distinction between signed and unsigned integers: “The ‘uitofp’ instruction regards value as an unsigned integer and converts that value to the ty2 type.”
What about icmp s{...}, fcmp, ashr, …? There are clearly cases where the sign bit is valuable to know, both for the middle end's sake and in the backend.
I'm not sure I see the value of the canonicalization. I understand that if you know the sign bit isn't set you can make a much more efficient lowering of uitofp on x86-based targets, but we already have that same lowering for sitofp, so I don't see the value of the sitofp → uitofp nneg canonicalization.
This is more so canonicalization for canonicalization's sake (similar to sext → zext). It may help with some folds, but even if we scrap the canonicalization part, having the nneg flag on uitofp allows the backend to make good codegen decisions, which it isn't able to do now.
Your original RFC said that the nneg flag would let you know that the result was non-negative (which is always true for uitofp). I'm not disputing the value of knowing that the sign bit is not set on the operand. That makes a very big difference for this instruction on x86.
I was more concerned about the value of “canonicalization for canonicalization's sake.” If I can prove that the sign bit is zero, I could replace uitofp with sitofp and it would have the same effect as the new flag you are proposing for the purposes of lowering in the backend.
But as I was thinking about what the canonical form in this case should be, it led me to a realization that answers my question above (and I apologize if this is what you were saying and I just misunderstood). If we had the flag you are proposing, sitofp i64 %x to float and uitofp nneg i64 %x to float would be equivalent. I was focused on the uitofp form of this, which we know will always return a non-negative value regardless of the nneg flag, but I just realized that we don't know whether sitofp will produce a positive or negative value unless we backtrack and look at value analysis for the input operand.
I think this is a reasonable extension: given that we already have zext nneg, this is a very simple addition, so it's reasonable to do it even if the value is not super high.