ConstantFold 'undef xor undef'

Hi,

At line 2292 of lib/VMCore/ConstantFold.cpp (LLVM 2.7 release):

Constant *llvm::ConstantFoldBinaryInstruction(unsigned Opcode,
                                              Constant *C1, Constant *C2) {
  ...
  // Handle UndefValue up front.
  if (isa<UndefValue>(C1) || isa<UndefValue>(C2)) {
    switch (Opcode) {
    case Instruction::Xor:
      if (isa<UndefValue>(C1) && isa<UndefValue>(C2))
        // Handle undef ^ undef -> 0 special case. This is a common
        // idiom (misuse).
        return Constant::getNullValue(C1->getType());
      // Fallthrough
    case Instruction::Add:

This function folds 'undef xor undef' to 0 (getNullValue) in this
case. However, according to http://llvm.org/docs/LangRef.html#undefvalues,
undef xor undef may also evaluate to undef:

/////////////////////////////
  %A = xor undef, undef
  [...]
Safe:
  %A = undef
  [...]

This example points out that two undef operands are not necessarily
the same. [...], but the short answer is that an undef "variable" can
arbitrarily change its value over its "live range". This is true
because the "variable" doesn't actually have a live range. Instead,
the value is logically read from arbitrary registers that happen to be
around when needed, so the value is not necessarily consistent over
time.
/////////////////////////////

Which semantics is better? I guess both are fine: if we assume the two
undefs are the same, the result is 0, as in
'ConstantFoldBinaryInstruction'; if we assume they are different, the
result is undef. But the second case seems to include the first one.
If we let undef xor undef be undef, we can later treat this undef as
0, or as any other value, depending on context. Is there any reason
that ConstantFoldBinaryInstruction uses the first assumption?

Thanks.

The right answer is that undef ^ undef = undef. Folding it to 0 is a conservatively correct approximation of undef. This is done because (annoyingly) a lot of people write things like this:

int x;
x = x^x;

As a "clever" way of clearing out x, particularly for vectors which don't have a convenient 0 literal. This is nonsense, but common enough to try to not completely break.

-Chris
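
To make the "conservatively correct approximation" point concrete, here is a minimal toy model in C++. It is not LLVM code, and the names possibleXorOfTwoUndefs and isValidFold are hypothetical. It treats an 8-bit undef as "any value", picks each operand independently, and collects every result that undef ^ undef could produce. A fold to a constant is acceptable in this model if that constant is one of the allowed results.

```cpp
#include <bitset>
#include <cstdint>

// Toy model (not LLVM code): the set of 8-bit results that
// "undef ^ undef" may take when each undef operand is chosen
// independently of the other.
std::bitset<256> possibleXorOfTwoUndefs() {
  std::bitset<256> results;
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      results.set(a ^ b); // a ^ b stays below 256 for 8-bit inputs
  return results;
}

// In this model, folding to the constant c is a valid refinement
// if c is one of the results the expression is allowed to take.
bool isValidFold(const std::bitset<256> &allowed, std::uint8_t c) {
  return allowed.test(c);
}
```

Under these assumptions every 8-bit value is a possible result, which matches "undef ^ undef = undef", and 0 is among them, so picking 0 is one permitted choice rather than the only one.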

Does this also apply to two different variables? Say
   int z, x, y;
   z = x ^ y;
If ConstantFoldBinaryInstruction also folds x ^ y to 0, should the
pass that uses ConstantFold also initialize x and y with the same
initial value? Otherwise z may not be 0 at runtime.

I guess my question is which variables can have undef values. In my
understanding, only uninitialized locals and globals can be undefined,
so the compiler is free to assign them any value at runtime, provided
ConstantFoldBinaryInstruction ensures that x and y can only be locals
or globals. Function parameters, however, cannot be undefined; they
should be arbitrary symbols that depend on the concrete arguments. Is
this correct?

I don't really understand your question here. Are you asking about C or LLVM IR, or something else?

-Chris

Sorry for the confusion. Does ConstantFoldBinaryInstruction also fold
x ^ y into 0 if both x and y are undef? I can understand why x ^ x can
be 0 if x is undef, but if x and y are different variables, then
x ^ y = 0 must rely on the assumption that x and y are initialized
with the same value.

That's a poorly formed question. ConstantFoldBinaryInstruction doesn't do any real analysis - if there is a symbolic operand, it doesn't do the fold. It only folds when the two operands *are* undef. It has no idea if they came from the "same" variable or not.

-Chris
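
The behaviour described above can be sketched with a small stand-alone model. This is not LLVM's API: Kind, Operand, FoldResult and foldXor are hypothetical names, and the "one undef operand gives undef" rule is an assumption, since the snippet quoted at the top of the thread is cut off before that case. The point of the sketch is only that the fold looks at what kind of constant each operand is, never at which variable it came from.

```cpp
#include <cstdint>

// Toy model (not LLVM's classes): an xor operand is a known integer
// constant, an undef constant, or a symbolic (non-constant) value.
enum class Kind { ConstInt, Undef, Symbolic };

struct Operand {
  Kind kind;
  std::uint32_t value; // meaningful only when kind == Kind::ConstInt
};

struct FoldResult {
  bool folded;    // false means "leave the instruction alone"
  Operand result; // meaningful only when folded is true
};

FoldResult foldXor(const Operand &a, const Operand &b) {
  // A symbolic operand means there is nothing to fold.
  if (a.kind == Kind::Symbolic || b.kind == Kind::Symbolic)
    return {false, {Kind::Symbolic, 0}};
  // Both operands are undef constants: the special case folds to 0.
  if (a.kind == Kind::Undef && b.kind == Kind::Undef)
    return {true, {Kind::ConstInt, 0}};
  // Exactly one undef operand: assumed here to fold to undef.
  if (a.kind == Kind::Undef || b.kind == Kind::Undef)
    return {true, {Kind::Undef, 0}};
  // Two known constants: ordinary constant folding.
  return {true, {Kind::ConstInt, a.value ^ b.value}};
}
```

In this model, foldXor on two Undef operands returns the constant 0 regardless of where they originated, while any Symbolic operand (for example a function parameter) leaves the expression unfolded.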