C++ objects sometimes initialized to zero before constructor is called?

I'm running into a strange problem with clang (both in Xcode 4.4 and clang's Subversion repository, and maybe earlier versions too). I've tried to dig around in clang, but I lack too much background knowledge to solve this on my own. I'll try to explain it here as best I can, and maybe someone can point me in the right direction?

Here's my problem:

I've got a game built on an old version of the Unreal Engine, which does something strange/scary/interesting to support the engine's scripting language. The scripting language is bridged to C++, so that some objects can be used in either language, back and forth, transparently.

To support this, the game malloc()'s a buffer for a scripted object, memcpy()'s a block of bytes over it that has a bunch of default values for various data members specified in script, and then does a placement-new on that buffer, where the C++ constructor for that object might overwrite some of those default values, bubbling up through all the C++ parent classes.
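
As a rough sketch of that sequence (ScriptedActor and ConstructFromDefaults are made-up names for illustration, not the engine's actual code):

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical scriptable class; not taken from the engine.
struct ScriptedActor {
    int health;  // default value comes from script
    int ammo;    // default value comes from script, overridden in C++
    ScriptedActor() { ammo = 50; }  // touches ammo only, leaves health alone
};

// Hypothetical helper: construct an object on top of script defaults.
// 'defaults' must point at sizeof(ScriptedActor) bytes.
ScriptedActor *ConstructFromDefaults(const void *defaults) {
    void *mem = std::malloc(sizeof(ScriptedActor));   // 1. raw buffer
    assert(mem != nullptr);
    std::memcpy(mem, defaults, sizeof(ScriptedActor)); // 2. script defaults
    return new (mem) ScriptedActor;                    // 3. C++ constructor
}
```

The mechanism depends on health still holding the copied bytes after step 3. Strictly speaking, C++ treats a member the constructor does not initialize as indeterminate, so this is relying on compiler behavior rather than on a language guarantee.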

There's a whole bunch of macro and template magic to make this system work. It's impressive and terrifying, but it has worked on several titles since 1997 or so, across various platforms, versions and targets for CodeWarrior, gcc, and Visual Studio.

Here's the problem: for some of the objects, a memset() that zeroes out the whole object gets inserted between where we memcpy()'d the default values from script and where we call the C++ constructor via placement-new. This memset() is being inserted by clang, not our code.

This is happening around line 416 of clang's lib/CodeGen/CGExprCXX.cpp:

   // Otherwise, just memset the whole thing to zero. This is legal
   // because in LLVM, all default initializers (other than the ones we just
   // handled above) are guaranteed to have a bit pattern of all zeros.
   CGF.Builder.CreateMemSet(DestPtr, CGF.Builder.getInt8(0), SizeVal,
                            Align.getQuantity());

All the scriptable objects get constructed with the same macro, but looking at the disassembly, only some of them get a memset() inserted. It's not clear to me why some do and some don't.

The same macro magic that constructs the object is used in every C++ class that is scriptable, but most of these classes are big and complicated beyond that piece of code. I could arrange for Apple to take a look at the source code off-list if that would be helpful, but it's not my code to hand out to the public.

This isn't my area of expertise, but I need to get rid of that memset() reliably for Unreal's mechanism to function properly. What might make clang decide that a given C++ constructor would need to run through EmitNullBaseClassInitialization()? I wasn't able to find the right incantation of grep to figure out where semantic analysis makes this decision.

(This is tested against clang on Mac OS X. The llvm-gcc that ships in Xcode 4.4 works for the game, but I'd rather ship this game with clang if possible, because clang is awesome. :) )

Thanks,
--ryan.

Hi Ryan,

There are a few different C++ notions which are relevant here:

  • default-initialization is what happens when no initializer is specified for an object
  • value-initialization is what happens when an empty initializer is specified (as empty parens, or in C++11, as empty braces)
  • zero-initialization is the step which is memset()ing your object to 0s.

Both default-initialization and value-initialization can result in a default constructor being called. But value-initialization sometimes performs zero-initialization first, when the default constructor is not user-provided.

Example:

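#include <iostream>
#include <new>

struct A {
  A() { std::cout << n << std::endl; }  // user-provided, never sets n
  int n;
};
struct B {
  A a;      // B's default constructor is implicit (not user-provided)
};
struct C {
  C() {}    // user-provided default constructor
  A a;
};

int main() {
  char *p = new char[sizeof(B)];
  *(int*)p = 1;
  new (p) B; // default-initialization: prints 1

  char *q = new char[sizeof(B)];
  *(int*)q = 1;
  new (q) B(); // value-initialization, zeroed first: prints 0

  char *r = new char[sizeof(C)];
  *(int*)r = 1;
  new (r) C; // default-initialization: prints 1

  char *s = new char[sizeof(C)];
  *(int*)s = 1;
  new (s) C(); // user-provided constructor, no zeroing: prints 1
}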

Note that new B() triggers zero-initialization, because it calls a non-user-provided default constructor using empty parens. The other cases do not, either because they call a user-provided constructor or because they use default-initialization rather than value-initialization.

That was totally it! I removed a "()" from a macro, and now the whole game is running when built with clang.
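
For anyone who hits the same thing, here is a minimal sketch of that kind of change. The macro names, Plain, and the Build helpers are hypothetical stand-ins, not the engine's real code:

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Hypothetical stand-ins for the engine's construction macro.
#define CONSTRUCT_VALUE_INIT(T, mem)   (new (mem) T())  // before: value-initialization
#define CONSTRUCT_DEFAULT_INIT(T, mem) (new (mem) T)    // after: default-initialization

// Hypothetical class with no user-provided default constructor.
struct Plain {
    int health;
    int ammo;
};

// Old behavior: the copied defaults are wiped, because T() zero-initializes
// a class whose default constructor is not user-provided.
Plain *BuildOld(void *mem, const Plain &defaults) {
    assert(mem != nullptr);
    std::memcpy(mem, &defaults, sizeof(Plain));
    return CONSTRUCT_VALUE_INIT(Plain, mem);
}

// New behavior: no zero-fill is requested before construction.
Plain *BuildNew(void *mem, const Plain &defaults) {
    assert(mem != nullptr);
    std::memcpy(mem, &defaults, sizeof(Plain));
    return CONSTRUCT_DEFAULT_INIT(Plain, mem);
}
```

With BuildOld, both members are guaranteed to read back as 0 no matter what was copied in. With BuildNew, the compiler is no longer asked to zero the buffer, which is what the engine's trick needs, although the standard itself still does not promise that the copied bytes survive.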

Thanks so much for a fast and detailed response. After reading this, I was up and running within about 60 seconds. :)

--ryan.