Uninitialized variable - question

Hello,

I was wondering about the case below. I tried to find any information in the C standard, but I found nothing.
In this case, variable "i" is uninitialized, but it is the _same_ value passed as an argument, so only one of "a" or "b" should be printed.

What I found is that with -O2:
LLVM (trunk) prints both "a" and "b"
GCC (4.2) prints both "a" and "b"
GCC (trunk) prints "b" only.

As I said, I don't know what the standard says here.

#include <stdio.h>

void f(int i) __attribute__((noinline));
void g(int i) __attribute__((noinline));

void f(int i) {
  if (i) printf("a\n");
}

void g(int i) {
  if (!i) printf("b\n");
}

int main() {
  int i;
  f(i);
  g(i);
}

- Kuba

Hi Jakub,

I was wondering about the case below. I tried to find any information in the C standard, but I found nothing.
In this case, variable "i" is uninitialized, but it is the _same_ value passed as an argument, so only one of "a" or "b" should be printed.

I'm no C expert, but my understanding is that making any use of an uninitialized
variable results in totally undefined behaviour, where "totally" means anything
goes (eg erasing the contents of your hard-drive). From this point of view you
are lucky it only printed some funny numbers rather than sending a killer sound
pulse from your speakers into your brain (though trying to wrap your head
around C semantics may produce a similar brain curdling result!).

The situation in other languages is quite different, for example using an
uninitialized variable in Ada can result in funny effects but the language
standard carefully delimits just how far they can go, and it's not that far.
As LLVM's "undef" is modelled on C's extreme behaviour, getting correct Ada
semantics in LLVM IR is rather tricky and in fact I haven't solved it yet.

Ciao, Duncan.

As I said, I don't know what the standard says here.

Depends which Standard you read, and which part of it :)

ISO C99, 6.2.6.1/5
  Certain object representations need not represent a value of the object type.
If the stored value of an object has such a representation and is read by an lvalue
expression that does not have character type, the behavior is undefined.

ISO C99 says, in 6.5/5 "If an exceptional condition occurs during
the evaluation of an expression (THAT IS, IF THE RESULT IS NOT
MATHEMATICALLY DEFINED ... ) the behaviour is undefined."
[Emphasis mine]

This suggests that the use of an unspecified value leads to undefined
behaviour in which case all the results you got are permitted.
Unfortunately the C Standard is not known for good wording or
consistency.

C99 attempted to over-constrain integer type representations (IMHO),
and the specifications are themselves not well defined and therefore not
normative. (This is what happens when the political driving forces
don't know any mathematics and barely understand any CS).

I haven't seen the C++11 wording on this. Hopefully they didn't
follow ISO C here.

Passing an uninitialized value as a function argument is undefined behaviour on the spot, regardless of what the callee does (even if it never references that argument).

That aside, there is no way that 'i' has the same value, since it has no value. This:
   int i;
   printf("%d\n", i);
is allowed to print two different values (or, applying the above rule, format your hard drive). The way the standard defines the behaviour that you see when reading a value is by stating that it must have the value that was last stored in it. When no value was stored, there are no rules to apply about what you get each time you look at it, and there is no other guarantee of a consistent value anywhere in the standard. This rule permits your program to print both 'a' and 'b', or neither.

I should mention that the above is for C++, and I don't have a copy of any of the standards handy, but I expect the rules to be the same for C and C++ here.

Nick

I think that the relevant part in C11 is section 6.2.6.1, which tells you that accessing a trap representation, _other than using a char type_, is undefined. Objects with automatic storage duration that have no initializer have an indeterminate value, which is either an unspecified value or a trap representation.

What I found is that with -O2:
LLVM (trunk) prints both "a" and "b"

I can't reproduce this with r168538. I only get "a".

Regards,
Patrik Hägglund

I can reproduce if 'noinline' is dropped.

Dmitri

Passing an uninitialized value as a function argument is undefined behaviour on the spot, regardless of what the callee does (even if it never references that argument).

Cite reference? No? Then you're guessing ;)

That aside, there is no way that 'i' has the same value, since it has no value.

This is definitely NOT correct in ISO C. It has an unspecified value,
and in C99 that may be a "trap value".

You state the rules generically but both C99 and C++ have special
rules for unsigned char, where use of an uninitialised value
is definitely not undefined.

I should mention that the above is for C++, and I don't have a copy of any of the standards handy, but I expect the rules to be the same for C and C++ here.

It's VERY unwise to make such assumptions regarding conformance
issues since C and C++ have completely distinct conformance models.
They also treat uninitialised variables distinctly: the rules in C++ were
constructed independently of ISO C rules, particularly as in C++ there
are classes with constructors etc to consider, and generalised rules
covering such cases as well as scalars and aggregates are likely
to be distinct and have different consequences in their details.

One must be aware that Standards are imperfect documents and often
specifications in one place are incomplete or even wrong, unless
some other place is also considered. You need to be a Standards guru
to really know where to find all the relevant clauses.

Even then, as I pointed out in my prior post on this topic, the Standard
itself can be inconsistent, or fail to achieve a normative requirement
despite the intent of the committee. This is the case with integer
representation rules in C99: it looks reasonable but is actually
non-normative gibberish. However the rules do have an impact,
and they have a very unfortunate impact in over-constraining
integer representations.

In particular if, by specification of your vendor, you have a full
twos complement representation of "int" all possible values of an uninitialised
int variable are valid ints and the behaviour of all mathematical operations
and copying is then specified by the usual rules: it's undefined only if there
is overflow, division by zero, or whatever.

On a 64 bit machine like x86_64 the usual representations of
integers are full, and therefore copying and other operations
are well defined (allowing for undefined behaviour on division
by zero etc).

In particular:

  int x[2]; int y[2];
  x[0]=y[0];
  x[1]=y[1];

is a perfectly valid way to copy a possibly incompletely
filled array provided int has a full representation.

Historically, C was a real mess, with the most traditional
copying of arrays of chars aliasing other values being undefined.

C++ did NOT follow C here. It invented its own, more consistent, set
of rules. Not sure about C++11 though.

Passing an uninitialized value as a function argument is undefined behaviour on the spot, regardless of what the callee does (even if it never references that argument).

Cite reference? No? Then you're guessing ;)

This is a rule in C++ that I'm not sure also applies to C. The applicable text from N3376 is:

   "When a function is called, each parameter shall be initialized with its corresponding argument." [expr.call]/4

the initialization performs lvalue-to-rvalue conversion, which is what ultimately triggers explicit UB:

   "If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior." [conv.lval]/1

That aside, there is no way that 'i' has the same value, since it has no value.

This is definitely NOT correct in ISO C. It has an unspecified value,
and in C99 that may be a "trap value".

You state the rules generically but both C99 and C++ have special
rules for unsigned char, where use of an uninitialised value
is definitely not undefined.

Not to put words in your mouth, but I think you might have been trying to refer to this:

   "For unsigned character types, all possible bit patterns of the value representation represent numbers." [basic.fundamental]

? If so, I don't think that means quite what you think it means. It does not change the effect of being uninitialized (or having "indeterminate value" in C++ parlance); rather, it precludes the possibility of trap bits. Another way to look at it is that an unsigned char is safe to use to examine any byte (pointer aliasing rules are dealt with elsewhere).

And again, I believe that C has effectively the same rules for unsigned char that C++ does, though I haven't a copy of the C standard handy to verify this (I'm on vacation).

I should mention that the above is for C++, and I don't have a copy of any of the standards handy, but I expect the rules to be the same for C and C++ here.

It's VERY unwise to make such assumptions regarding conformance
issues since C and C++ have completely distinct conformance models.
They also treat uninitialised variables distinctly: the rules in C++ were
constructed independently of ISO C rules, particularly as in C++ there
are classes with constructors etc to consider, and generalised rules
covering such cases as well as scalars and aggregates are likely
to be distinct and have different consequences in their details.

Fair enough! The intent of my argument was to point out that -- surely in C++, and I think in C as well -- the fact that uninitialized values may have different values each time you look at them is made possible not because the standard says so, but because it fails to say what value you will observe. It's undefined behaviour not explicitly (as in my lvalue-to-rvalue conversion text above), but rather by the absence of any text for us to quote.

Nick

Just adding my 2 cents - to the best of my understanding, C99 also makes this behavior undefined explicitly.

From the standard (I think it's the draft C99-TC2):

Sec 6.7.8/10: "If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate."
Annex J.2: "The behavior is undefined in the following circumstances: […] The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.8, 6.8)."