(Request for comments) Implicit cast kind for initialization of references.

Abramo Bagnara wrote:

a[i] becomes *(a+i), which dereferences the address, i.e. undefined
behavior, in your case.

Not without an lvalue to rvalue conversion.

That may be the ultimate goal of the C++ committee to word into the Standard
some time in the future, but the current C++11 does not incorporate that
understanding yet. It leaves undefined the effects of evaluating an lvalue
referring to nothing. So evaluating *(a+i) is UB *if* there is no object at
address a+i. Since whether there is an object in this case is unspecified,
the Standard does not require anything from the implementation - in other
words, the behavior is undefined.

To elaborate on what I think the rules are, it's different for cases like
the following

    int a[3][1] = { };
    int x = *(&a[0][0]+1);

Note that the above code is fine, since there is guaranteed to be an object
at the dereferenced address.

    int x = *(&a[0][0] + 2); // UB

This is undefined behavior, because the addition of 2 goes one beyond the
after-the-end address of a[0]. Note that the following is well defined again

    int x = *(&a[0][0] + 1 + 1); // valid, not UB

Note the associativity of binary plus. Having a pointer to a[1][0] allows
you to add 1 again to obtain another after-the-end pointer, which again
points to a valid object, a[2][0].
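
The claim that an object sits right behind a[0][0] can be checked without moving any pointer past the end of a row. The following is only a sketch: the helper name next_row_follows_first is hypothetical, and it assumes a flat address space, since the pointer-to-integer mapping is implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: checks that a[1][0] is stored immediately after a[0][0].
bool next_row_follows_first() {
    int a[3][1] = { };
    static_assert(sizeof(a[0]) == sizeof(int), "a row holds exactly one int");
    static_assert(sizeof(a) == 3 * sizeof(int), "no padding between rows");
    // Compare the addresses as integers, so no pointer arithmetic crosses a row.
    // Assumption: the integer values reflect byte addresses (flat memory model).
    std::uintptr_t first = reinterpret_cast<std::uintptr_t>(&a[0][0]);
    std::uintptr_t second = reinterpret_cast<std::uintptr_t>(&a[1][0]);
    return second - first == sizeof(int);
}
```

This says nothing about which pointer expressions are allowed to reach a[1][0]; it only shows that the storage is laid out as the argument assumes.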

To complete the picture

    (*(int*)0), 42;

This is undefined behavior, because the lvalue "*(int*)0" does not refer to
an object or function. By definition an lvalue refers to an object or
function, so C++11 does not define what happens when it doesn't.

Note that since C99, &*E is considered equivalent to E, so starting
from the original

int a[5];
int *p = &a[5];

we have the following equivalent transformations:

&a[5] => &*(a+5) => a + 5

IMHO following a different line of thought we'd soon come to nonsense.
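
To illustrate why the past-the-end address is useful at all, here is a minimal sketch that uses the uncontroversial spelling a + 5 as a loop bound. The helper name sum_up_to_end is hypothetical; the pointer is only computed and compared, never dereferenced.

```cpp
#include <cassert>

// Hypothetical helper: sums a 5-element array, using the past-the-end
// pointer as the loop bound.
int sum_up_to_end() {
    int a[5] = {1, 2, 3, 4, 5};
    int *end = a + 5;          // the address the thread also writes as &a[5]
    int sum = 0;
    for (int *p = a; p != end; ++p) {
        sum += *p;             // p always points at a real element here
    }
    assert(end - a == 5);      // pointer difference within one array
    return sum;
}
```

Whether the spelling &a[5] is equally acceptable in C++ is exactly the point under discussion; the sketch deliberately avoids it.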

The wording in C++0x 8.3.2 p5 says:

  A reference shall be initialized to refer to a valid object
  or function. [Note: in particular, a null reference cannot
  exist in a well-defined program, because the only way to create
  such a reference would be to bind it to the “object” obtained
  by dereferencing a null pointer, which causes undefined behavior.

Now, lvalue references are initialized using lvalues. If there could be
no invalid lvalues at all, then why state the first sentence above?

It is also worth stressing the use of the words "in particular" in the Note
above: as far as I understand, it suggests that the real cause of UB is
that we are *binding* the invalid lvalue to a reference (i.e., it is not
the mere action of computing the invalid lvalue).

That is, after computing an lvalue, we necessarily have to do one of the
following actions (unless I am missing something):
  - read from it (UB if invalid);
  - write to it (UB if invalid);
  - bind it to a reference (UB if invalid);
  - take its address (sometimes this is well-defined even though the
lvalue is invalid, e.g., for invalid "off-by-one lvalues").
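
The four uses can be sketched on valid lvalues as follows. The function name use_lvalue_four_ways is hypothetical, and the off-by-one address is spelled a + 5 so that the example does not depend on how the question about &a[5] is settled.

```cpp
#include <cassert>

// Hypothetical helper showing each way of using an lvalue, on valid objects.
int use_lvalue_four_ways() {
    int a[5] = {10, 20, 30, 40, 50};
    int read = a[1];        // read: lvalue-to-rvalue conversion
    a[2] = read + 1;        // write through the lvalue
    int &ref = a[3];        // bind the lvalue to a reference
    int *addr = &a[4];      // take the address of the lvalue
    int *past = a + 5;      // off-by-one address, computed without forming a[5]
    assert(past - addr == 1);
    return read + a[2] + ref + *addr;  // 20 + 21 + 40 + 50
}
```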

To our eyes, this view would provide quite a clear picture of the
matter. Is anyone aware of, e.g., comments from the Standard Committee
arguing against such a point of view?

Enea.

Abramo Bagnara wrote:

a[i] becomes *(a+i), which dereferences the address, i.e. undefined
behavior, in your case.

Not without an lvalue to rvalue conversion.

That may be the ultimate goal of the C++ committee to word into the Standard
some time in the future, but the current C++11 does not incorporate that
understanding yet. It leaves undefined the effects of evaluating an lvalue
referring to nothing. So evaluating *(a+i) is UB *if* there is no object at
address a+i. Since whether there is an object in this case is unspecified,
the Standard does not require anything from the implementation - in other
words, the behavior is undefined.

Note that since C99, &*E is considered equivalent to E, so starting
from the original

int a[5];
int *p = &a[5];

we have the following equivalent transformations:

&a[5] => &*(a+5) => a + 5

IMHO following a different line of thought we'd soon come to nonsense.

The wording in C++0x 8.3.2 p5 says:

A reference shall be initialized to refer to a valid object
or function. [Note: in particular, a null reference cannot
exist in a well-defined program, because the only way to create
such a reference would be to bind it to the “object” obtained
by dereferencing a null pointer, which causes undefined behavior.

Now, lvalue references are initialized using lvalues. If there could be
no invalid lvalues at all, then why state the first sentence above?

Simple - dangling references. I'm not entirely sure that the spec says
such things are immediate UB (though I suppose it should be... ).

It is also worth stressing the use of the words "in particular" in the Note
above: as far as I understand, it suggests that the real cause of UB is
that we are *binding* the invalid lvalue to a reference (i.e., it is not
the mere action of computing the invalid lvalue).

I'd err the other way & assume the wording means that it's UB by
dereferencing null, not the act of binding that to a reference.

Here's at least one piece of wording from the standard that states
explicitly that the act of dereferencing something is immediate UB:

"The effect of dereferencing a pointer returned as a request for zero
size is undefined."

- 3.7.4.1 [basic.stc.dynamic.allocation] paragraph 2

Though I'm still trying to find the wording that supports the more
general/obvious case of dereferencing null (or a dangling pointer,
etc), I'd be fairly sure it's the same case.

- David

[...snip...]

The wording in C++0x 8.3.2 p5 says:

A reference shall be initialized to refer to a valid object
or function. [Note: in particular, a null reference cannot
exist in a well-defined program, because the only way to create
such a reference would be to bind it to the “object” obtained
by dereferencing a null pointer, which causes undefined behavior.

Now, lvalue references are initialized using lvalues. If there could be
no invalid lvalues at all, then why state the first sentence above?

Simple - dangling references. I'm not entirely sure that the spec says
such things are immediate UB (though I suppose it should be... ).

If your point of view is consistently pursued (i.e., we cannot
*compute* an invalid lvalue without incurring UB), then we will not be
able to compute the dangling reference at all; hence, the provision
above about proper initialization of references would again be pointless.

If we stick to the other point of view (i.e., some invalid lvalues can
be computed without incurring UB and the UB is incurred when *using*
them in certain ways) then the sentence will make more sense, imho.

It is also worth stressing the use of the words "in particular" in the Note
above: as far as I understand, it suggests that the real cause of UB is
that we are *binding* the invalid lvalue to a reference (i.e., it is not
the mere action of computing the invalid lvalue).

I'd err the other way & assume the wording means that it's UB by
dereferencing null, not the act of binding that to a reference.

Here's at least one piece of wording from the standard that states
explicitly that the act of dereferencing something is immediate UB:

"The effect of dereferencing a pointer returned as a request for zero
size is undefined."

- 3.7.4.1 [basic.stc.dynamic.allocation] paragraph 2

This is quite different.
Note 35 says: "C++ differs from C in requiring a zero request to return
a non-null pointer." So, it looks like the pointer returned as a request
for zero size could be a "wild" pointer, i.e., much worse than the null
pointer or an off-by-one pointer. For instance, as far as I remember,
the simple act of *computing* an off-by-2 pointer can lead to UB, even
before any attempt to read it, store it or dereference it.
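
As a small sketch of the zero-size case: the global operator new either throws or returns a non-null pointer, even for a request of zero bytes, and that pointer must still be released. The helper name zero_size_request_is_non_null is hypothetical; the pointer is never dereferenced.

```cpp
#include <cassert>
#include <new>

// Hypothetical helper: requests zero bytes and reports whether the result
// is non-null. The returned storage is released but never accessed.
bool zero_size_request_is_non_null() {
    void *p = ::operator new(0);   // throws std::bad_alloc on failure
    bool non_null = (p != nullptr);
    ::operator delete(p);          // a zero-size allocation still needs freeing
    return non_null;
}
```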

Though I'm still trying to find the wording that supports the more
general/obvious case of dereferencing null (or a dangling pointer,
etc), I'd be fairly sure it's the same case.

- David

Also see the following change to the standard, which afaict has been
approved and entered the new C++11 standard:

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1102

In a few words, they explicitly removed the usage of null pointer
dereferencing as an example of UB *because* there are discussions going
on arguing that UB is not caused by dereferencing, but by later uses of
the invalid lvalue.

It would be really interesting if someone in contact with people in the
standardization committee could write here their point of view (e.g.,
reporting whether or not there has been progress in those discussions).

Enea.

Enea Zaffanella wrote:

Abramo Bagnara wrote:

a[i] becomes *(a+i), which dereferences the address, i.e. undefined
behavior, in your case.

Not without an lvalue to rvalue conversion.

That may be the ultimate goal of the C++ committee to word into the
Standard some time in the future, but the current C++11 does not
incorporate that understanding yet. It leaves undefined the effects of
evaluating an lvalue referring to nothing. So evaluating *(a+i) is UB
*if* there is no object at address a+i. Since whether there is an object
in this case is unspecified, the Standard does not require anything from
the implementation - in other words, the behavior is undefined.

Note that since C99, &*E is considered equivalent to E, so starting
from the original

int a[5];
int *p = &a[5];

we have the following equivalent transformations:

&a[5] => &*(a+5) => a + 5

IMHO following a different line of thought we'd soon come to nonsense.

The wording in C++0x 8.3.2 p5 says:

  A reference shall be initialized to refer to a valid object
  or function. [Note: in particular, a null reference cannot
  exist in a well-defined program, because the only way to create
  such a reference would be to bind it to the “object” obtained
  by dereferencing a null pointer, which causes undefined behavior.

Now, lvalue references are initialized using lvalues. If there could be
no invalid lvalues at all, then why state the first sentence above?

As stated, this is a diagnosable semantic rule. So when you violate it the
implementation is required to diagnose the program.

The note is non-normative and gives an example of where that rule is
violated, but it also says that such a program is no longer well-defined.
And since the program is no longer well-defined, a diagnostic is no longer
required.

So the undefined behavior does not come from the text you quoted at all, it
comes from a different place (the mere act of dereferencing, not the binding
of the reference).

The note continues with an example of a violation that does not have
undefined behavior (trying to bind a non-const reference to a bit-field),
which the rule you quoted turns into an error requiring a diagnostic.
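
The bit-field case can be sketched as follows. The struct Flags and the function name are hypothetical. The non-const binding is left commented out because a conforming compiler rejects it; the const binding is accepted, but it refers to a temporary copy rather than to the bit-field itself.

```cpp
#include <cassert>

// Hypothetical type with a single 4-bit bit-field.
struct Flags {
    unsigned value : 4;
};

unsigned const_ref_to_bitfield_is_a_copy() {
    Flags f = {1};
    // unsigned &r = f.value;      // ill-formed: requires a diagnostic
    const unsigned &cr = f.value;  // allowed: binds to a temporary copy
    f.value = 2;                   // changes the bit-field, not the temporary
    return cr;                     // still the value copied at binding time
}
```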