ComputeMaskedBits Bug

In tracking down an alignment bug, I think I found a problem in
ComputeMaskedBits. But I am not very well versed in this area of LLVM and
need some more eyes. This is in the 2.3 release, though it looks like the
relevant pieces operate the same way in trunk.

I have the following add recurrence:

  %r849 = select i1 %r848, i64 0, i64 %r847 ; <i64> [#uses=10]
  %r862 = shl i64 %r849, 3 ; <i64> [#uses=3]
  %"$SR_S112.0" = phi i64 [ %r862, %"file alignment.f, line 79, in reduction
loop at depth 0, bb64" ], [ %r1874, %"file alignment.f, line 79, in loop at
depth 1, bb77" ] ; <i64> [#uses=2]
  %r1874 = add i64 %r851, %"$SR_S112.0" ; <i64> [#uses=1]
  %r849 = select i1 %r848, i64 0, i64 %r847 ; <i64> [#uses=10]
  %r851 = shl i64 %r849, 6 ; <i64> [#uses=16]

It's unimportant what the values from the two selects are -- we are deep
enough into ComputeMaskedBits at this point that the Depth threshold takes
hold and we return "unknown" for those instructions.

So the recurrence looks like:

SR_S112.0 = phi(r849 << 3, (r849 << 6) + SR_S112.0)

ComputeMaskedBits correctly identifies the recurrence and sets
L = %r851 = shl i64 %r849, 6 ; <i64> [#uses=16]
R = %r862 = shl i64 %r849, 3

ComputeMaskedBits on R returns KnownZero as 0x7, as expected.

The PHI-analysis code in ComputeMaskedBits sets Mask2 to 0x7. It then calls
itself recursively on L passing Mask2 as the mask.

Here's where the problem creeps in. ComputeMaskedBits returns KnownZero as
0x3f for L. Shouldn't the passed Mask limit it to returning 0x7 since the
comment at the top of ComputeMaskedBits says:

"This code only analyzes bits in Mask, in order to short-circuit processing."

The Shl code for ComputeMaskedBits shifts the incoming Mask to the right by
the (constant) left shift amount. Presumably it does this because we know
those bits will be zero after the Shl and it wants to analyze fewer bits. The
operand of this Shl (and the other one, too) has an unknown number of zero
bits. So ComputeMaskedBits says we know the output has the lower six bits set
to zero, which is correct.
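
In outline, I believe the Shl case does something like this (paraphrasing
from memory, not the exact 2.3 source; the non-constant-shift path is
omitted):

  // Shl by a constant ShiftAmt: only Mask >> ShiftAmt worth of operand bits
  // can matter, so the operand is queried with the shifted mask.
  ComputeMaskedBits(I->getOperand(0), Mask.lshr(ShiftAmt),
                    KnownZero, KnownOne, Depth+1);
  // Shift the results back and record that the low ShiftAmt bits of the
  // Shl's result are zero.
  KnownZero = KnownZero.shl(ShiftAmt) |
              APInt::getLowBitsSet(BitWidth, ShiftAmt);
  KnownOne  = KnownOne.shl(ShiftAmt);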

The PHI code for ComputeMaskedBits then does this:

          KnownZero = Mask &
                      APInt::getLowBitsSet(BitWidth,
                                           KnownZero2.countTrailingOnes());

KnownZero2 is the output of the second call to ComputeMaskedBits (the Shl by
six in this case). Mask is the original mask coming into ComputeMaskedBits
with the PHI, which is all ones in this case.
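
Plugging in the values from this example:

  KnownZero2                      == 0x3f   // from the Shl-by-6 call on L
  KnownZero2.countTrailingOnes()  == 6
  APInt::getLowBitsSet(64, 6)     == 0x3f
  Mask & 0x3f                     == 0x3f   // Mask is all ones here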

So the PHI ends up producing 0x3f as KnownZero.

I think this is wrong. We've got a PHI where one operand is known to have
three low zero bits and the other is known to have six low zero bits. I
contend that after the PHI we only know we have three low zero bits for sure.

Is my analysis correct? If so, is the PHI code the culprit (for not returning
the min of the KnownZero bits) or is the Shl code the culprit (for not paying
attention to the Mask passed in; it right shifts it)?

If my analysis is incorrect, can someone explain how we can know we have six
low zero bits coming out of this?

Thanks!

                                                   -Dave

David Greene wrote:

> Is my analysis correct? If so, is the PHI code the culprit (for not
> returning the min of the KnownZero bits) or is the Shl code the culprit
> (for not paying attention to the Mask passed in; it right shifts it)?

I think your analysis is correct, and that Shl -- and many of the other operations (AShr, LShr, SExt, Add?, Call?) -- should be modified to always respect Mask.

Nick

After thinking about this some more, I'm not so sure. The Shl really DOES
result in six low zero bits, so it should return that. That PHI node has
semantics that it should take care of itself (taking the min of the known
zero and one bits).

It seems a little brittle to me to have the PHI pass its semantics down
to children via the Mask. It should just do what it knows is right in the
first place.

That said, there are many places that don't respect the Mask. Closer
reading of the comment leads me to believe the Mask is simply a
time-saving device, not a correctness-enforcing mechanism.

I've fixed the PHI analysis to do the min in our code and it fixes the
testcase I was working on. Doing a min like this would also allow us
to have PHI nodes compute known zero and one bits even when there
isn't a recurrence.
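
Roughly, the change amounts to something like this (a sketch, not the exact
patch; KnownZero3 here is just a name for the known-zero bits computed for
the other incoming value):

  // Only the minimum number of known trailing zero bits is valid for the
  // PHI, since we don't know which incoming value we'll get.
  KnownZero = Mask &
              APInt::getLowBitsSet(BitWidth,
                                   std::min(KnownZero2.countTrailingOnes(),
                                            KnownZero3.countTrailingOnes()));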

                                                    -Dave

I believe your analysis of the problem here is correct. Thanks for
finding this and figuring it out! If you can submit your fix, please
do so, otherwise let me know.

Dan

David Greene wrote:

> That said, there are many places that don't respect the Mask. Closer
> reading of the comment leads me to believe the Mask is simply a
> time-saving device, not a correctness-enforcing mechanism.

That's fine, but if you fix it that way, please audit
InstructionCombiner SimplifyDemandedBits, which I believe has the same bug.

> I've fixed the PHI analysis to do the min in our code and it fixes the
> testcase I was working on. Doing a min like this would also allow us
> to have PHI nodes compute known zero and one bits even when there
> isn't a recurrence.

Great! Did you commit a patch for this?

Nick

>> That said, there are many places that don't respect the Mask. Closer
>> reading of the comment leads me to believe the Mask is simply a
>> time-saving device, not a correctness-enforcing mechanism.
>
> That's fine, but if you fix it that way, please audit
> InstructionCombiner SimplifyDemandedBits, which I believe has the same bug.

Ok.

>> I've fixed the PHI analysis to do the min in our code and it fixes the
>> testcase I was working on. Doing a min like this would also allow us
>> to have PHI nodes compute known zero and one bits even when there
>> isn't a recurrence.
>
> Great! Did you commit a patch for this?

Not yet. :( I am waiting for some paperwork on this end. Hopefully we only
have to go through this pain once and then I can be much more active. I've
been told it should be approved by the end of the month.

                                                       -Dave

Sorry for being out of touch. Has this been fixed?

Evan

I've been trying to test it but have had problems due to the llvm-gcc
bootstrap problems I mentioned last week.

It's had lots of testing over here so I'll just check it in today.

                                              -Dave

Done.

                                         -Dave