Hope this is the right place to post this question. I am writing to understand the assumptions Clang makes about the values of uninitialized variables on the stack.
My observation is that when Clang compiles the following piece of code without any optimization, the assembly code checks the path condition and assigns t whatever value happens to be on the stack, which seems pretty reasonable to me.
int main() {
    char a[20];
    char* p = a;
    int t;
    if (p) {
        t = p[7];
    }
    return t;
}
On the other hand, when optimized with -O2, the whole if condition is gone, and t is assigned zero (i.e., xor eax, eax) and then returned at the end of the main function.
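In other words, the -O2 binary behaves as if I had written the following reduced program (my reading of the generated assembly):

int main() {
    return 0; /* compiled to "xor eax, eax; ret" */
}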
So directly reading from an uninitialized variable is considered “undefined behavior”. And as far as I can see, the compiler shouldn’t make any assumption about that, right? My test environment is 64-bit Ubuntu 18.04 with Clang version 5.0.
I am trying to understand whether clang -O2 uses some analysis to prove that the initial value of stack variables must be zero. At least from the assembly code and the compiler options enabled by -O2, I couldn't figure out any such trick. Am I missing anything here?
Hi Irene,
I am not an expert, but here is my interpretation: “undefined behaviour” means that the behaviour observed by the person running the compiled program is not dictated by the C++ standard, and thus the compiler is free to do whatever it wants.
Setting the value of an uninitialised variable to zero, or to any arbitrary number, or to a random number, or not setting it at all, would be acceptable behaviours for “undefined behaviour”.
Thank you for the clarification. It makes a lot of sense to me.
On the other hand, I am trying to understand the “inconsistency” between different optimization levels with respect to this undefined behaviour. So basically I compiled and ran the presented code on my machine at both levels.
For the -O2 case, since t is zeroed, the return value will be zero anyway. In contrast, for -O0 the return value seems unpredictable. IMHO the inconsistency creates a lot of additional effort and perhaps is not preferred, but I guess it is eventually the programmer's responsibility to deal with that?
Overall, from the assembly code generated by clang -O2 (attached below), uninitialized variables on the stack are assumed to be zero for some reason, and I am writing to ask about the motivation/analysis behind this.
> IMHO the inconsistency creates a lot of additional effort and perhaps
> is not preferred, but I guess it is eventually the programmer's
> responsibility to deal with that?
Correct. The program has undefined behavior, and it is the programmer's
responsibility to solve that. MemorySanitizer (clang -fsanitize=memory)
would reveal the read of uninitialized memory immediately.
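For completeness, one way to give the program fully defined behavior is
simply to initialize the memory before reading it; a minimal sketch (one
of several possible fixes):

int main() {
    char a[20] = {0}; /* zero-initialize the buffer */
    char* p = a;
    int t = 0;        /* t has a defined value on every path */
    if (p) {
        t = p[7];     /* now reads a well-defined 0 */
    }
    return t;         /* returns 0 at any optimization level */
}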
> uninitialized variables on the stack are assumed to be zero for some
> reason,
That is not exactly what happened. The assignment is from uninitialized
memory, which will have an unknown value. Because the value is unknown,
the assignment can be optimized to avoid a read from memory, and
substitute any convenient value, without perturbing any defined property
of the program. The most convenient value to use here is zero.
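Here is that sequence as a rough before/after view, written in C purely
for illustration (a sketch of the effect, not actual compiler output):

/* Before: what the source asks for. */
int before(void) {
    char a[20];
    return a[7]; /* loads an unknown (uninitialized) value */
}

/* After: since the loaded value is unknown, the optimizer may
   substitute any convenient constant; zero is the cheapest. */
int after(void) {
    return 0; /* materialized as "xor eax, eax" */
}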
This is a different sequence of reasoning than what you suggested, which
is more like this: The stack values are assumed to be zero, therefore
we can use value propagation to assign the value zero instead of reading
memory with a known value.
I agree that the net effect here is the same, but the reasoning is
important for correct understanding of the program's semantics.
--paulr
One particular point is: "In contrast, for -O0 the return value seems unpredictable."
Not entirely true. And if you were writing this code to intentionally get an unpredictable value (to seed a random number generator, etc.), that's a security problem. There have been bugs in crypto libraries where they used similar techniques and eventually the compiler broke them, or people found ways to compromise the source of the data (by writing specific values to stack variables elsewhere in the program, making the values more predictable).
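To make that concrete, the dangerous pattern looks roughly like this (a
sketch, not code taken from any particular library):

unsigned bad_seed(void) {
    unsigned junk; /* uninitialized, hoped to be "random" */
    return junk;   /* undefined behavior: the optimizer may legally
                      fold this to a constant, destroying the entropy */
}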
The optimizer isn't specifically trying to catch this case. It just runs a series of transforms which assume that the behavior is defined, and some of those transforms constrain the behavior of the function. This eventually leads to generating an xor which wasn't necessary for the original function. If you're curious about what happens in this particular case, you can use "-mllvm -print-after-all" to see how various transforms change the IR.
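For example, assuming the snippet is saved as test.c, an invocation like "clang -O2 -mllvm -print-after-all test.c" will dump the IR after each pass; the dumps go to stderr, so you may want to redirect them to a file.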