undef * 0

What is the value of undef * 0 in LLVM?

According to its definition in the LLVM IR reference:

"The string 'undef' can be used anywhere a constant is expected..."

Am I correct to say that undef * 0 = 0 following this definition?

Best Regards,
soham

Yes, this is correct. The optimizer can substitute any value for 'undef', so this transformation is correct for multiple reasons.

(a) X * 0 = 0 is correct for any X because of arithmetic rules, so you don't even need 'undef' for X.
(b) undef * X = 0 is correct as well by choosing 0 for undef. Your example is a special case of this by setting X = 0.
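
For concreteness, here is a minimal IR sketch of case (b) (the function name is made up); instsimplify is free to fold the multiply and return a plain 0:

define i32 @mul_undef_zero() {
   %r = mul i32 undef, 0   ; undef * 0: choosing 0 for undef makes this 0
   ret i32 %r              ; so folding to 'ret i32 0' is a valid refinement
}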

-Manuel

LLVM thinks so.

You can easily see what it actually does in cases like this.

echo 'int foo(int a){return 23 * a;}' > undeftimes.c
clang -S -emit-llvm undeftimes.c
vi undeftimes.ll   # change the 23 to undef
llc undeftimes.ll
less undeftimes.s

Personally, I find ARM (or other RISC) code far easier to follow than x86, so I generally add "-march=arm" to the llc step (i.e. `llc -march=arm undeftimes.ll`) if I'm on an x86 host, but that's personal preference.

foo:
    push {r0}
    mov r0, #0
    add sp, sp, #4
    mov pc, lr

Yep … it just returns 0.

Idle question, if anyone is still reading: how do you get llc to do -Os or -Oz? The docs say the argument must be an integer, and anything other than 0-3 is rejected. (In fact, possible bug report: 10 through 39 are also silently accepted, as are 100 through 399, etc.)

Or even `opt -S -O3 undeftimes.ll` to see what the (full) optimizer pipeline does instead of just what llc does (llc only does a limited set of transformations).

@Manuel Jacob and Bruce Hoult

Thanks a lot for your answer.

Best Regards,
soham

I don’t know of a way to do it from the command-line, but if you’re willing to change the IR, you can add the optsize (for -Os) or minsize (for -Oz) IR attribute to the function you’re compiling.
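
As a sketch, assuming a hand-written equivalent of the earlier foo (clang's actual output will look a bit different), the edit would be roughly:

define i32 @foo(i32 %a) optsize {   ; use minsize here for -Oz-like behaviour
   %v = mul i32 23, %a
   ret i32 %v
}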

Hi,

Back to the original question: undef * 0 = 0 may have some issues.

Consider the evaluation of the expression undef * (1 - 1)

(1) undef * (1 - 1) ~> undef * 0 ~> 0 (as suggested)

(2) undef * (1 - 1) ~> undef * 1 + undef * (-1) ~> undef + undef ~> undef
and now the evaluation of undef can be some other value than 0.

Thus it does not follow the distributive law of multiplication over
addition, and the algebraic transformations are ambiguous.

Similarly, associativity in Boolean algebra is also violated, as follows:

undef & a & !a
(1) ~> (undef & a) & !a ~> undef & !a ~> undef
(2) ~> undef & (a & !a) ~> undef & 0 ~> 0

Is there a clear set of rules for how LLVM handles such cases?

Best Regards,
soham

I think you're looking at "undef" the wrong way. "undef" is not a value; it is a placeholder for which the compiler can choose an arbitrary value at every occurrence of "undef".

You could choose different values for "undef" in your example, so the distributive law holds:

undef * (1 - 1) ~> undef * 1 + undef * (-1) ~> 0 * 1 + 0 * (-1) ~> 0 + 0 = 0

or even

undef * (1 - 1) ~> undef * 1 + undef * (-1) ~> undef + undef ~> undef (same as yours so far) ~> 0

Here's a blog post which describes "undef" in more detail, also describing a case where it inhibits optimization.

Hi Soham,

You're right that in LLVM IR arithmetic (with the current definition
of `undef`) is not distributive. You can't replace `A * (B + C)` with
`A * B + A * C` in general, since (exactly as you said) for A =
`undef`, B = `1`, C = `-1` the former always computes `0` while the
latter computes `undef`. This is fundamentally because replacing `A *
(B + C)` with `A * B + A * C` increases the number of uses of `A`
(which is `undef`), and thus injects more behavior into the program.
I _think_ going from `A * B + A * C` to `A * (B + C)` is okay though.
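
To make that concrete, a minimal IR sketch (function names are mine) of the two forms for A = `undef`, B = `1`, C = `-1`:

define i32 @dist_lhs() {
   %s = add i32 1, -1        ; B + C
   %r = mul i32 undef, %s    ; A * (B + C): anything times 0 is 0
   ret i32 %r
}

define i32 @dist_rhs() {
   %p = mul i32 undef, 1     ; A * B
   %q = mul i32 undef, -1    ; A * C
   %r = add i32 %p, %q       ; each use of undef may take a different value,
   ret i32 %r                ; so this sum can be anything
}

Rewriting `@dist_lhs` into `@dist_rhs` is the invalid direction, since it gives the single `undef` two independent uses.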

Here is a simpler example of a similar issue: `X - X` is not
equivalent to `0`, in that `0` cannot be replaced with `X - X`, even
though `X - X` can be folded to `0`.

-- Sanjoy

Sanjoy Das wrote:
> Here is a simpler example of a similar issue: `X - X` is not
> equivalent to `0`, in that `0` cannot be replaced with `X - X`, even
> though `X - X` can be folded to `0`.

I forgot to state that `X` above is some unknown value which can
potentially be `undef`. A full example would be:

define i32 @f_0(i32 %x) {
   ret i32 0
}

define i32 @f_1(i32 %x) {
   %v = sub i32 %x, %x
   ret i32 %v
}

`@f_1` can be optimized to `@f_0` but not vice versa, unless you
somehow know that `%x` cannot be `undef`.

-- Sanjoy

Thanks for your answers.

Another example of an unsound transformation in Boolean algebra.

According to the LLVM documentation (http://llvm.org/docs/LangRef.html#undefined-values) it is unsafe to consider 'a & undef = undef' and 'a | undef = undef', but 'undef xor undef = undef' is safe.

Now, given an expression ((a & (~b)) | ((~a) & b)) where a and b are undef expressions, the value must be specialized to some constant (e.g. 0).

But if the expression is transformed to 'a xor b', then it may return 'undef'.

As a result, the transformation ' ((a & (~b)) | ((~a) & b)) ~> a xor b ' is unsound. LLVM presently performs this transformation.
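
Spelled out as IR (the i1 width and the function name are my choice), the input pattern looks roughly like this; running opt -instcombine typically rewrites the final or into a single xor i1 %a, %b:

define i1 @xor_pattern(i1 %a, i1 %b) {
   %not_b = xor i1 %b, true   ; ~b
   %not_a = xor i1 %a, true   ; ~a
   %l = and i1 %a, %not_b     ; a & ~b
   %r = and i1 %not_a, %b     ; ~a & b
   %v = or i1 %l, %r          ; (a & ~b) | (~a & b)
   ret i1 %v
}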

Best Regards,
soham

Hi Soham,

Soham Chakraborty wrote:
> As a result, the transformation ' ((a & (~b)) | ((~a) & b)) ~> a xor b '
> is unsound. LLVM presently performs this transformation.

This transform looks fine to me.

For simplicity let's look at an `i1` expression. Since these are
bitwise ops we can extend the reasoning below to iN.

Transform: ((A & (~B)) | ((~A) & B)) ~> A ^ B

Neither A nor B are undef:
   Transform is correct by normal boolean algebra

Both A and B are undef:
   LHS = (undef & undef) | (undef & undef) = undef // Since ~undef = undef
   RHS = undef
   Thus transform is correct.

A is undef but B is not undef:
   LHS = ((undef & (~B)) | (undef & B)) // Since ~undef = undef

   Now, by choosing 0 for undef in LHS, we can make LHS be equal to 0,
   and by choosing 1 for undef in LHS, we can make LHS be equal to 1.
   Therefore

   LHS = undef
   RHS = undef ^ B = undef

   Thus the transform is correct.

A is not undef but B is undef:
   Similar reasoning follows.

-- Sanjoy

Hi,

> Both A and B are undef:
>    LHS = (undef & undef) | (undef & undef) = undef // Since ~undef = undef
>    RHS = undef
>    Thus transform is correct.

LLVM documentation (http://llvm.org/docs/LangRef.html#undefined-values) suggests that

it is unsafe to consider (a & undef = undef) and (a | undef = undef).

"As such, it is unsafe to optimize or assume that the result of the 'and' is 'undef'. However, it is safe to assume that all bits of the 'undef' could be 0, and optimize the 'and' to 0. Likewise, it is safe to assume that all the bits of the 'undef' operand to the 'or' could be set, allowing the 'or' to be folded to -1."

As a result,

LHS = (undef & undef) | (undef & undef) = c_1 | c_2, where c_1 and c_2 are constants.

And finally, LHS = c, where c is a constant, not undef.

Please correct me if I am missing something.

Best Regards,

soham

Yes, but the documentation is talking specifically about "a & undef", which does not have the same properties as "undef & undef".
While "a & undef" (for an arbitrary a) can't be folded to undef, "undef & undef" can be folded to undef.

I get your point. I agree the transformation is correct.

As I understand it, instead of evaluating/folding (a & undef) and (~a & undef) individually, LLVM creates a ^ undef directly, and then it is safe to evaluate a ^ undef to undef.

Thanks to all for the discussion.

Best Regards,

soham