According to its definition in the LLVM IR reference;

"The string ‘undef‘ can be used anywhere a constant is expected..."

Am I correct to say that undef * 0 = 0 following this definition?

Yes, this is correct. The optimizer can insert any value for 'undef', so this transformation is correct for multiple reasons.

(a) X * 0 = 0 is correct for any X because of arithmethic rules, so you don't even need 'undef' for X.
(b) undef * X = 0 is correct as well by choosing 0 for undef. Your example is a special case of this by setting X = 0.

vi undeftimes.ll # change the 23 to undef
llc undeftimes.ll

less undeftimes.s

Personally, I find ARM (or other RISC) code far easier to follow than x86, so I generally add a “-march=arm” to the llc step if I’m on an x86 host, but that’s personal preference.

foo:
push {r0}

mov r0, #0
add sp, sp, #4
mov pc, lr

Yep … it just returns 0.

Idle question, if anyone is reading still … how do you get llc to do -Os or -Oz? The docs say the argument must be an integer, and anything other than 0…3 is rejected. (in fact … bug report … 10 thru 39 are also silently accepted as are 100 thru 399 etc)

You can easily see what it actually does in cases like this.

echo 'int foo(int a){return 23 * a;}' >undeftimes.c
clang -S -emit-llvm undeftimes.c
vi undeftimes.ll # change the 23 to undef
llc undeftimes.ll
less undeftimes.s

Personally, I find ARM (or other RISC) code far easier to follow than x86,
so I generally add a "-march=arm" to the llc step if I'm on an x86 host,
but that's personal preference.

Or even `opt -S -O3 undeftimes.ll` to see what the (full) optimizer pipeline does instead of just what llc does (llc only does a limited set of transformations).

I don’t know of a way to do it from the command-line, but if you’re willing to change the IR, you can add the optsize (for -Os) or minsize (for -Oz) IR attribute to the function you’re compiling.

I think you're looking at "undef" the wrong way. "Undef" is not a value, "undef" is a placeholder where the compiler can choose an arbitrary value at every occurence of "undef".

You could choose different values for "undef" in your example, so the law of distribution holds:

You're right that in LLVM IR arithmetic (with the current definition
of `undef`) is not distributive. You can't replace `A * (B + C)` with
`A * B + A * C` in general, since (exactly as you said) for A =
`undef`, B = `1`, C = `-1` the former always computes `0` while the
latter computes `undef`. This is fundamentally because replacing `A *
(B + C)` with `A * B + A * C` increases the number of uses of `A`
(which is `undef`), and thus injects more behavior into the program.
I _think_ going from `A * B + A * C` to `A * (B + C)` is okay though.

Here is a simpler example of a similar issue: `X - X` is not
equivalent `0`, in that `0` cannot be replaced with `X - X`, even
though `X - X` can be folded to `0`.

Sanjoy Das wrote:
> Here is a simpler example of a similar issue: `X - X` is not
> equivalent `0`, in that `0` cannot be replaced with `X - X`, even
> though `X - X` can be folded to `0`.

I forgot to state that `X` above is some unknown value which can
potentially be `undef`. A full example would be:

define i32 @f_0(i32 %x) {
ret i32 0
}
define i32 @f_1(i32 %x) {
%v = sub i32 %x, %x
ret i32 %v
}

`@f_1` can be optimized to `@f_0` but not vice versa, unless you
somehow know that `%x` cannot be `undef`.

Another example of unsound transformation on Boolean algebra.

According to the LLVM documentation (http://llvm.org/docs/LangRef.html#undefined-values) it is unsafe to consider ’ a & undef = undef ’ and ’ a | undef = undef ’ but ‘undef xor undef = undef’ is safe.

Now, given an expression ((a & (~b)) | ((~a) & b)) where a and b are undef expressions, the value must be specialized to some constant (e.g. 0).

But, if the expression is transformed to $a xor b$ then it may return ‘undef’.

As a result, the transformation ’ ((a & (~b)) |((~a) & b)) ~> xor a b ’ is unsound. LLVM presently performs this transformation.

Another example of unsound transformation on Boolean algebra.

According to the LLVM documentation
(http://llvm.org/docs/LangRef.html#undefined-values) it is unsafe to
consider ' a & undef = undef ' and ' a | undef = undef ' but 'undef xor
undef = undef' is safe.

Now, given an expression ((a & (~b)) | ((~a) & b)) where a and b are undef
expressions, the value must be specialized to some constant (e.g. 0).

But, if the expression is transformed to $a xor b$ then it may return
'undef'.

As a result, the transformation ' ((a & (~b)) |((~a) & b)) ~> xor a b '
is unsound. LLVM presently performs this transformation.

Soham Chakraborty wrote:
> As a result, the transformation ' ((a& (~b)) |((~a)& b)) ~> xor a b '
> is unsound. LLVM presently performs this transformation.

This transform looks fine to me.

For simplicity let's look at an `i1` expression. Since these are
bitwise ops we can extend the reasoning below to iN.

Transform: ((A & (~B)) | ((~A) & B)) ~> A ^ B

Neither A nor B are undef:
Transform is correct by normal boolean algebra

Both A and B are undef:
LHS = (undef & undef) | (undef & undef) = undef // Since ~undef = undef
RHS = undef
Thus transform is correct.

A is undef but B is not undef:
LHS = ((undef & (~B)) | (undef & B)) // Since ~undef = undef

Now, by choosing 0 for undef in LHS, we can make LHS be equal to 0,
and by choosing 1 for undef in LHS, we can make LHS be equal to 1.
Therefore

LHS = undef
RHS = undef ^ B = undef

Thus the transform is correct.

A is not undef but B is undef:
Similar reasoning follows.

Both A and B are undef:
LHS = (undef & undef) | (undef & undef) = undef // Since ~undef = undef
RHS = undef
Thus transform is correct.

LLVM documentation () suggests that

it is unsafe to consider (a & undef = undef) and (a | undef = undef).

“As such, it is unsafe to optimize or assume that the result of the ‘and‘ is ‘undef‘. However, it is safe to assume that all bits of the ‘undef‘ could be 0, and optimize the ‘and‘ to 0. Likewise, it is safe to assume that all the bits of the ‘undef‘ operand to the ‘or‘ could be set, allowing the ‘or‘ to be folded to -1.”

As a result,

LHS = (undef & undef) | (undef & undef) = c_1 | c_2 where c_1 and c_2 are constants and as a result,

And finally, LHS = c where c is a constant; not undef.

Yes, but the documentation specifies correctly "a & undef”, which does not have the same properties as “undef & undef”.
While “a & undef” (for an arbitrary a) can’t be folded to undef, “undef & undef” can be folded to undef.

I get your point. I agree the transformation is correct.

As I understand, Instead of evaluating/folding (a & undef) and (~a & undef) individually, LLVM creates a^undef directly and then it is safe to evaluate a^undef to undef.