Mem2reg: load before single store

Hi all!

While playing with LLVM, I’ve found a weird behavior in the mem2reg pass.

When optimizing allocas that have a single store, mem2reg replaces any load that precedes the store with an undef value (the check is based on basic block ordering and a simple dominator analysis, if I remember correctly).

This is the line responsible for the behavior (LLVM 9 does the same):

https://llvm.org/doxygen/PromoteMemoryToRegister_8cpp_source.html#l00629

A problem arises, and I am not sure whether it is really a problem or just weird but C-compliant behavior.

#include <stdio.h>

int a; // or, equally, int a=0;

int main(){
  int b;
  if (b) // (*)
    b=a;
  if (b)
    printf("This will be called");
}

The first load of b, which comes before the single store (at the first branch), is replaced by undef. As a result, the value tested by the second branch becomes a phi node: if the (*) branch is taken, the value is 0 (loaded from a); otherwise, it is undef.

I’m concerned that this is an LLVM bug.

Reproduction:

clang -S -emit-llvm test.c
opt -S -mem2reg test.ll

I’m not at my computer right now, so I can’t show the exact generated code.

Radnai László

Hi again!

The initial code (test.c):

#include <stdio.h>

int a; // or, equally, int a=0;

int main(){
  int b;
  if (b) // (*)
    b=a;
  if (b)
    puts("This will be called");
}

In LLVM IR:

define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %3 = load i32, i32* %2, align 4
  %4 = icmp ne i32 %3, 0
  br i1 %4, label %5, label %7

5: ; preds = %0
  %6 = load i32, i32* @a, align 4
  store i32 %6, i32* %2, align 4
  br label %7

7: ; preds = %5, %0
  %8 = load i32, i32* %2, align 4
  %9 = icmp ne i32 %8, 0
  br i1 %9, label %10, label %12

10: ; preds = %7
  %11 = call i32 @puts(i8* getelementptr inbounds ([20 x i8], [20 x i8]* @.str, i64 0, i64 0))
  br label %12

12: ; preds = %10, %7
  %13 = load i32, i32* %1, align 4
  ret i32 %13
}

After optimizing: clang -S -emit-llvm test.c -O0 -o - | sed 's/optnone//g' | opt -mem2reg | llvm-dis
(Note: optnone is removed so that mem2reg actually optimizes the main function.)

define dso_local i32 @main() #0 {
  %1 = icmp ne i32 undef, 0 ; ← undef!!
  br i1 %1, label %2, label %4

2: ; preds = %0
  %3 = load i32, i32* @a, align 4
  br label %4

4: ; preds = %2, %0
  %.0 = phi i32 [ %3, %2 ], [ undef, %0 ] ; ← undef!!
  %5 = icmp ne i32 %.0, 0
  br i1 %5, label %6, label %8

6: ; preds = %4
  %7 = call i32 @puts(i8* getelementptr inbounds ([20 x i8], [20 x i8]* @.str, i64 0, i64 0))
  br label %8

8: ; preds = %6, %4
  ret i32 0
}

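My problem is with the two values annotated “undef!!”. In the source they are the same value (the uninitialized b), but once mem2reg decouples them into two independent undefs, the program’s behavior changes. Assuming b just holds some arbitrary value, the source can never call puts: if b is nonzero, it is overwritten with a, which is 0; otherwise it is already 0. After the transformation, however, the first branch can be skipped while the phi still yields a nonzero value.
I’m not really sure, though, because the C standard has some rather subtle definitions of undefined behaviour.
Also, I don’t know how LLVM’s “undef” is supposed to work. If undef is an unspecified-like value rather than one that causes undefined behavior, then this could also be a bug in LLVM.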

Please let me know if you need more information.

László

At the if (b) // (*) line, you invoke undefined behavior by reading the value of “b” before it has been initialized. From that point on, the compiler may do whatever it likes.

A problem arises, and I am not sure if it is really a problem or just
weird C-compliant behavior.

int a; // or, equally, int a=0;

int main(){
  int b;
  if (b) // (*)

At this line, you invoke undefined behavior by reading the value of "b",
before it's been initialized. At this point, the compiler may do whatever
it likes.

FTR, it is *not* UB to read the value of b; you will read undef, which is totally fine. It is, however, UB to branch on undef.
At that point, we can stop discussing what "happens next".

~ Johannes

A small note:

Definitions

The C11 draft (or whatever I found; link) says that the undefined behavior lies in the “use” of the uninitialized value:

Under the Undefined Behavior section:
i) “- The value of an object with automatic storage duration is used while it is indeterminate” (see indeterminate value later)

An indeterminate value, in turn, is defined as “either an unspecified value or a trap representation”…

On the other hand, the term “unspecified behavior” is defined as follows:
ii) “unspecified behavior: use of an unspecified value, or other behavior […]”

Consequences?

It seems to me that statements (i) and (ii) contradict each other; however, neither one is stricter than the other…

So, suppose we use an object. It may or may not have automatic storage duration, and it may hold an unspecified value or a trap representation. The question is: is using the value unspecified or undefined?

  1. Based on this, the behavior is unspecified if the object “used” does not have automatic storage duration.

  2. The behavior is undefined if an object with automatic storage duration that holds a trap representation is “used”.

So, if I understand correctly, an object with automatic storage duration holds an unspecified value before it is first modified (I have yet to find an explicit statement of this, but it seems reasonable…). Which of the two contradicting statements holds? Is it unspecified or undefined?

László

Hi,
Section 6.7.9, paragraph 10 says:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.

Under the Undefined Behavior section:
i) “- The value of an object with automatic storage duration is used while it is indeterminate” (see indeterminate value later)

Combined with this, we can say that the source program has UB.

Juneyoung

Combined with this, we can say that the source program has UB.

Ah, thanks, I missed that point. So it is undefined behavior.

On the other hand, the term “unspecified behavior” is defined as follows:

ii) “unspecified behavior: use of an unspecified value, or other behavior […]”

So I could equally say that it is unspecified behavior, since what is used is an unspecified value (presumably not a trap representation?).
That’s a contradiction, which is my point :)

László