LLVM 2.4 problem? (resend)

Hi,

I don't know enough C to know for certain if this is a programmer or compiler error:

In an Objective-C source file I have:

.
static const char sessionEntriesKVO = ' ';
.

Later I use that variable as an ID by taking its address like this:

[feedManager addObserver:self forKeyPath:@"sessionEntriesCount" options:0 context:&sessionEntriesKVO];

and later

.
if (aContext == &sessionEntriesKVO) {
.

With GCC 4.2 everything works as expected, but with LLVM-GCC it seems that the optimizer does something strange to the sessionEntriesKVO variable (I get strange "unrecognized selector sent to instance..." errors at runtime that have nothing to do with sessionEntriesKVO).

Removing the const keyword (or compiling with -O0) fixes the problem.
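
For readers who have not seen the idiom, the pattern above boils down to using the address of a file-scope variable as a unique token. Here is a minimal sketch in plain C that follows the workaround of dropping const. The names entries_context and sessions_context and the helper functions are hypothetical stand-ins, not part of the original program:

```c
#include <stdbool.h>

/* Hypothetical context tokens: only their addresses matter, never their
   values. They are deliberately not const, matching the workaround above. */
static char entries_context;
static char sessions_context;

/* Hand out the tokens the way one would pass them as an observer context. */
const void *entries_token(void)  { return &entries_context; }
const void *sessions_token(void) { return &sessions_context; }

/* Identify a callback by comparing the received pointer with the token. */
bool is_entries_context(const void *context)
{
    return context == &entries_context;
}
```

A callback would call is_entries_context() with the pointer it received and handle anything else as belonging to a different registration.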

Hi Tatu,

With this information it is impossible to tell if it is your fault or llvm's fault. Please file a bug with a testcase that demonstrates the problem, thanks!

-Chris

Hi Chris,

With this program llvm-gcc -O2 optimizes test2 away even though its address is taken in the program (gcc-4.2 does not, neither does llvm-gcc with -O or -O0):

#include <stdio.h>

static const char test1 = 'x';
static const char test2 = 'x';

int main(int argc, char **argv)
{
   printf("%p\n", &test1);
   printf("%p\n", &test2);

   return 0;
}

I can confirm that test2 is replaced with test1 everywhere using llvm-gcc
from svn head.

Ciao,

Duncan.

I guess this is a somewhat academic case, since you can easily circumvent it by marking the variables as "static const volatile", by not initializing them to the same value, or by leaving the initialization out.
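
One of these workarounds, giving each constant a different value, might look like the sketch below. It assumes that the merging seen in this thread only applies to constants with identical contents, as the observations here suggest. The names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical tokens with distinct initial values, so there are no
   identical constants for the optimizer to fold together. */
static const char first_key  = 'a';
static const char second_key = 'b';

const void *first_key_address(void)  { return &first_key; }
const void *second_key_address(void) { return &second_key; }

/* True when the two tokens occupy different addresses. */
bool keys_are_distinct(void)
{
    return first_key_address() != second_key_address();
}
```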

Just wanted to report the mismatch between gcc and llvm-gcc :)

Seems to me that it is perfectly legitimate for the compiler to fold the
two char constants together. Since they are both static and const,
they cannot be changed from outside the compilation unit, and the
compiler sees that they are not changed within the unit.

If you give test2 the value 'y', both test1 and test2 are retained.

True, but note that it is the address of a variable that is used, not the value.

What is more troublesome is that llvm-gcc combines these also across files with -O4:

test.c:
#include <stdio.h>

static const char test1 = 'a';
static const char test2 = 'a';

void testf();

int main(int argc, char **argv)
{
   printf("%p\n", &test1);
   printf("%p\n", &test2);

   testf();

   return 0;
}

test2.c:
#include <stdio.h>

static const char test1 = 'a';
static const char test2 = 'a';

void testf(void)
{
   printf("%p\n", &test1);
   printf("%p\n", &test2);

}

llvm-gcc -O3 test.c test2.c :
0x1ffd
0x1ffe

llvm-gcc -O4 test.c test2.c :
0x1ffd

Yes, but why do you think they should get a different address? I can
understand that it is surprising that they do, but determining whether
this is legal or not requires reading the language standard. Hopefully
a language lawyer can chime in and say whether this transform is valid
or not.

Ciao,

Duncan.

I agree the whole construction is a little bit strange (stupid even). It is, however, a common way to specify context identity in one Objective-C pattern (although I don't think anyone actually uses initialized const variables; I was just playing with them to see how compilers put stuff in segments).

I do think, however, that it's a bit dangerous to combine static constants across compilation units.

Tatu Vaajalahti wrote:

Me too :)

Change the return statement to:

   return &test1 == &test2;

LLVM will constant-fold that to false, which is inconsistent with the other optimization.
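
Assuming the comparison meant here is between the two addresses, the inconsistency can be probed with a sketch like the following. The function names are hypothetical, and the volatile pointers are just one way to stop the compiler from evaluating the second comparison at compile time:

```c
#include <stdbool.h>

static const char test1 = 'x';
static const char test2 = 'x';

/* Direct comparison: the compiler can evaluate this at compile time. */
bool same_address_folded(void)
{
    return &test1 == &test2;
}

/* Comparison through volatile pointers: the loads cannot be folded away,
   so the result reflects where the objects were actually placed. */
bool same_address_at_runtime(void)
{
    const char *volatile p = &test1;
    const char *volatile q = &test2;
    return p == q;
}
```

If the two functions ever disagree, the compiler has folded the comparison one way and laid out the objects the other way.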

Nick

GCC does the same thing with strings in some cases. You shouldn't depend on this behavior if you want portable code. If you avoid marking the global variable const, you should have better luck.

-Chris

ACK! Thank you all for your answers!

FWIW, I've been discussing this with some of my colleagues (who may well be the foremost experts on this topic), and so far we don't have a definite answer (we're looking at C99 and C++). We do think that a strict reading of the standard allows the optimization, but there is also some suspicion that that is unintended (at least in C++).

  Daveed

FWIW, a C++ CoreWG issue has been opened to clarify this.

(My own position is that different objects should have guaranteed different addresses. To alias them, a code generator must prove that it wouldn't change observable behavior.)

  Daveed

Hello, David

However, it’s a pretty common linker optimization to merge constant strings / small literal values. So, even if the compiler itself won’t merge them, they will be emitted into a mergeable section and then the linker will perform this optimization.

Correct, but note that literals (unlike variables) don't define distinct objects per se. For example, two occurrences of "literal" may evaluate to the same (array) object. The standard even explicitly allows for e.g. "literal" to be a subobject of "string literal" (2.13.4/10 in N2723).
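
The point about literals can be illustrated with a small C sketch (function names are hypothetical). Whether the two pointers in the first function compare equal is left to the implementation, so portable code compares contents instead:

```c
#include <stdbool.h>
#include <string.h>

/* Whether two identical literals share storage is up to the implementation,
   so this result may legitimately be true or false. */
bool literals_share_storage(void)
{
    const char *a = "literal";
    const char *b = "literal";
    return a == b;
}

/* Portable code compares contents, not addresses. */
bool same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```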

  Daveed

You all are wrong. Amazingly so.

First, string literals and objects are different. String literals are defined like this:

2 Whether all string literals are distinct (that is, are stored in
   nonoverlapping objects) is implementation-defined.

That applies _only_ to string literals, absolutely nothing else. Objects are defined like so:

Two pointers of
   the same type compare equal if and only if they are both null, both
   point to the same object or function, or both point one past the end
   of the same array.
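
The three cases listed in the quoted wording can be sketched in C as follows. This only illustrates the wording as quoted; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* Case 1: both pointers are null. */
bool both_null(void)
{
    int *p = NULL;
    int *q = NULL;
    return p == q;
}

/* Case 2: both pointers point to the same object. */
bool same_object(void)
{
    int value = 0;
    int *p = &value;
    int *q = &value;
    return p == q;
}

/* Case 3: both pointers point one past the end of the same array. */
bool one_past_end(void)
{
    int items[4] = {0};
    int *p = items + 4;
    int *q = &items[4];
    return p == q;
}
```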

This means they _must_ compare !=, if they are different objects. Whether they are the same object or not is answered by the notion of linkage:

8 An identifier used in more than one translation unit can potentially
   refer to the same entity in these translation units depending on the
   linkage (_basic.link_) of the identifier specified in each translation
   unit.

2 A name is said to have linkage when it might denote the same object,
   reference, function, type, template, namespace or value as a name
   introduced by a declaration in another scope:

To be pedantically clear, entity includes objects:

3 An entity is a value, object, subobject, base class subobject, array
   element, variable, function, instance of a function, enumerator, type,
   class member, template, or namespace.

Now, you ask, how can we be sure these have no linkage across translation units? Because:

3 A name having namespace scope (_basic.scope.namespace_) has internal
   linkage if it is the name of

   --an object, reference, function or function template that is
     explicitly declared static or,

We know that they do not denote the same object because the rules that guide us when they do are not met:

9 Two names that are the same (clause _basic_) and that are declared in
   different scopes shall denote the same object, reference, function,
   type, enumerator, template or namespace if

   --both names have external linkage or else both names have internal
     linkage and are declared in the same translation unit; and

   --both names refer to members of the same namespace or to members, not
     by inheritance, of the same class; and

   --when both names denote functions, the function types are identical
     for purposes of overloading; and

   --when both names denote function templates, the signatures
     (_temp.over.link_) are the same.

We know that they cannot have linkage across translation units because:

   --When a name has external linkage, the entity it denotes can be
     referred to by names from scopes of other translation units or from
     other scopes of the same translation unit.

   --When a name has internal linkage, the entity it denotes can be
     referred to by names from other scopes in the same translation unit.

Welcome to C and C++ 101. I'm amazed that this isn't as plain as day to anyone who works on a compiler. Kinda basic stuff. Ignorance of the rules doesn't mean you can't just read the words of the standard. You don't have to guess.

The standard is meant to be fairly accessible:

Every byte has a unique address.

1 The fundamental storage unit in the C++ memory model is the byte.

5 Unless it is a bit-field (_class.bit_), a most derived object shall
   have a non-zero size and shall occupy one or more bytes of storage.

So, let me state it this way, the address _must_ be different. If you can't tell they are not, you are free to have them be the same.

[...]

Objects are defined like so:

Two pointers of
  the same type compare equal if and only if they are both null, both
  point to the same object or function, or both point one past the end
  of the same array.

This means they _must_ compare !=, if they are different objects.

Aha! Thanks for quoting that: It's from an expired standard (1998, presumably). The 2003 standard has changed the words for that, taking away the property under discussion (for a different reason -- see Core Issue 73).

[...]

So, let me state it this way, the address _must_ be different. If you
can't tell they are not, you are free to have them be the same.

The changes made to address Core Issue 73 invalidate your reasoning in the current standard (and in the working paper for the next standard). However, I'll mention this history to the issues maintainer as support for my position that I want the distinct-address guarantee (back).

  Daveed