Is this undefined behavior optimization legal?

Hi,

I've found a test case where SelectionDAG is doing an undefined behavior
optimization, and I need help determining whether or not this is legal.

Here is the example IR:

define void @test(<4 x i8> addrspace(1)* %out, float %a) {
  %uint8 = fptoui float %a to i8
  %vec = insertelement <4 x i8> <i8 0, i8 0, i8 0, i8 0>, i8 %uint8, i32 0
  store <4 x i8> %vec, <4 x i8> addrspace(1)* %out
  ret void
}

Since %vec is a 32-bit vector, a common way to implement this function on a target
with 32-bit registers would be to zero-initialize a 32-bit register to hold
the initial vector and then 'mask' and 'or' the inserted value with the
initial vector. In AMDGPU assembly it would look something like:

v_mov_b32 v0, 0
v_cvt_u32_f32_e32 v1, s0
v_and_b32 v1, v1, 0x000000ff
v_or_b32 v0, v0, v1

The optimization that SelectionDAG performs on this function, though, ends
up removing the mask operation, which gives us:

v_mov_b32 v0, 0
v_cvt_u32_f32_e32 v1, s0
v_or_b32 v0, v0, v1

SelectionDAG does this because it knows that the result
of %uint8 = fptoui float %a to i8 is undefined when the result needs more than
8 bits. So it assumes that the conversion will only set the low 8 bits, because
anything else would be undefined behavior and the program would be broken.
This assumption is what causes it to remove the 'and' operation.

So effectively, what has happened here is that by inserting the result of
an operation with undefined behavior into one lane of a vector, we have
overwritten all the other lanes of the vector.

Is this optimization legal? To me it seems wrong that undefined behavior
in one lane of a vector could affect another lane. However, given that LLVM IR
is SSA and we are technically creating a new vector rather than modifying the old
one, maybe it's OK. I'm just not sure.

Appreciate any insight people may have.

Thanks,
Tom

From: "Tom Stellard via llvm-dev" <llvm-dev@lists.llvm.org>
To: llvm-dev@lists.llvm.org
Sent: Monday, October 3, 2016 3:51:40 PM
Subject: [llvm-dev] Is this undefined behavior optimization legal?

So, to be clear, for values of %a that do not trigger undefined behavior (i.e. that really do produce an integer that can be represented in the i8), the code does indeed store <4 x i8> <i8 %uint8, i8 0, i8 0, i8 0> into *%out? If so, this seems legal to me.

-Hal

That is correct. When there is no undefined behavior, the high 24 bits
(representing lanes 1, 2, 3) of the stored value are always 0.

-Tom

Isn’t it the case that undefined behavior in a program makes the whole program undefined?
I’m not sure why you think there should be a limit on what the optimizer can do specifically for a vector lane, when we don’t usually put any such limit.

There might be a question about your fptoui conversion here, though: is it guaranteed to write zero to the upper bits of the 32-bit register?
In the IR it produces an i8 value and inserts it into a vector. It isn’t clear to me which combine/transformation knows that the fptoui will zero the upper part of the register.

> This assumption is what causes it to remove the ‘and’ operation.

CMIIW, but this assumption appears to be flawed. Initialization values are escaping side effects, and removing them makes a correct program incorrect.

-Kevin

The way insertelement is defined, inserting an element never affects the other elements of the vector ("…"). So the question is whether you’re triggering undefined behavior in some other way.

Looking at LangRef for fptoui, it says “If the value cannot fit in ty2, the results are undefined”, i.e. the value is equivalent to the constant “undef”. Therefore, you should end up storing “<4 x i8> <undef, 0, 0, 0>”, not “<4 x i8> undef”.

Note that there’s a tradeoff here: saying that fptoui for out-of-range values doesn’t have undefined behavior allows us to simplify control flow and hoist operations more aggressively.

-Eli