UB and known bits

On the subject of undefined behavior and known bits, as I'm sure some of you are aware, code in ValueTracking.cpp is exploiting poison value rules to get a bit of extra precision in the known bits. These rules fire on examples like the ones below. Do we have a set of rules that clients of known bits need to follow to avoid unsoundness? I remember Nuno and/or David Majnemer saying something about this but I don't have it handy.

John

%0:i32 = var
%1:i32 = lshr %0, 1:i32
%2:i32 = addnw 1:i32, %1
infer %2

known from Souper: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
known from compiler: 0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
llvm is stronger

%0:i32 = var (000000000000000xxxxxxxxxxxxxxxxx)
%1:i32 = and 65535:i32, %0
%2:i16 = var
%3:i32 = zext %2
%4:i32 = mulnw %1, %3
infer %4

known from Souper: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
known from compiler: 0xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
llvm is stronger

We used to have a serious problem with isSafeToSpeculativelyExecute utilizing computeKnownBits with division. It’d try to prove the denominator is non-zero to avoid division by zero but get tricked when it was faced with a denominator whose value is produced by a lshr exact and reason that the range of the operation must be constrained within some bound. However, the lshr could produce any value because it could have been fed in undef or poison for either/both operands thus rendering the constraint in computeKnownBits useless. I fixed this not too long ago…

The only remaining “interesting” use is SimplifyDemandedUseBits. It is not far-fetched that we would remove an ‘or’ because we’d think it would be redundant due to computeKnownBits reasoning that undefined behavior would result otherwise. However, the bit in question might actually be undef with the ‘or’ being the only remaining operation keeping an undef value from being replaced with zero.

My personal opinion here is that we create two different interfaces into computeKnownBits: one which reasons that operating on undef is OK and one which is not comfortable with operating on undef.

The one uncomfortable with operating on undef would be ideal for stuff like isSafeToSpeculativelyExecute while the other would be great for isKnownToBeAPowerOfTwo.

My personal opinion here is that we create two different interfaces into computeKnownBits: one which reasons that
operating on undef is OK and one which is not comfortable with operating on undef.
The one uncomfortable with operating on undef would be ideal for stuff like isSafeToSpeculativelyExecute while the
other would be great for isKnownToBeAPowerOfTwo.

This sounds easy and like the right thing to do.

I'm working on a patch that fixes up some of the imprecisions, but so far I'm not doing any of the UB-based ones.

John