TLDR: we’re looking for a well-defined way to create objects with vptrs via memcpy.
We have a common code pattern for creating new message objects in the protobuf parser:
// Message is almost trivial, except that it has virtual methods.
class Message {
 public:
  virtual ~Message();
  // Allocates memory from the arena and calls placement new.
  // Declared const so it can be called on the const default instance.
  virtual Message* New(Arena* arena) const;
  intptr_t meta_data_;
};

Message* create(const Message* default_instance, Arena* arena) {
  return default_instance->New(arena);
}
This code creates a new instance of Message from the default_instance via the virtual function New. It works fine and has well-defined behavior, but it pays the cost of virtual dispatch. In many cases, the overhead of invoking virtual functions outweighs their benefits, especially since for most protobuf messages these functions do very little work.
A faster approach is to utilize memcpy, copying the bytes from the default instance to a buffer and then materializing a new object from the buffer:
Message* create(const Message* default_instance, Arena* arena) {
  unsigned size = default_instance->size();  // including vptr
  char* buffer = arena->allocate(size);
  // Bitwise copy of the object representation.
  memcpy(buffer, default_instance, size);  // UB1: memcpy on a non-trivially copyable object
  return reinterpret_cast<Message*>(buffer);  // UB2: Message object lifetime is not started
}
This implementation is significantly faster, as observed in the parsing benchmarks, with a reduction of ~10% CPU time and ~8% CPU cycles. Parsing is a critical workload for us.
However, this code has undefined behavior:
- Undefined Behavior 1 (UB1): according to C++ [basic.types.general] p2: “For any object (other than a potentially-overlapping subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes ([intro.memory]) making up the object can be copied into an array of char, unsigned char, or std::byte ([cstddef.syn]).” Message is not trivially copyable, so using memcpy to copy its bytes is not guaranteed to work.
- Undefined Behavior 2 (UB2): according to C++ [intro.object] p1: “An object is created by a definition, by a new-expression (expr.new), by an operation that implicitly creates objects (see below), when implicitly changing the active member of a union, or when a temporary object is created (conv.rval, class.temporary)”. The returned object is not created in any of these ways; in particular, its lifetime has never been started.
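The type-property half of UB1 can be checked at compile time. Below is a minimal sketch, assuming C++17; VirtualMessage and PlainMessage are hypothetical stand-ins for illustration, not the real protobuf classes. It only shows that the virtual destructor alone is enough to remove trivial copyability. It says nothing about what a given compiler does with the memcpy in practice.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for Message: same data member, plus a virtual destructor.
class VirtualMessage {
 public:
  virtual ~VirtualMessage() = default;
  std::intptr_t meta_data_ = 0;
};

// Hypothetical counterpart with the same data but no virtual members.
struct PlainMessage {
  std::intptr_t meta_data_;
};

// A virtual destructor is never trivial, so the class is not trivially
// copyable, and [basic.types.general] p2 does not cover copying its bytes.
static_assert(std::is_polymorphic_v<VirtualMessage>);
static_assert(!std::is_trivially_destructible_v<VirtualMessage>);
static_assert(!std::is_trivially_copyable_v<VirtualMessage>);

// Without virtual members the same layout is trivially copyable.
static_assert(std::is_trivially_copyable_v<PlainMessage>);

// The same facts as values, so they can also be inspected at run time.
constexpr bool kVirtualIsTriviallyCopyable =
    std::is_trivially_copyable_v<VirtualMessage>;
constexpr bool kPlainIsTriviallyCopyable =
    std::is_trivially_copyable_v<PlainMessage>;
```

For the same reason (non-trivial destructor, not an aggregate), such a class is also not an implicit-lifetime type, which is why memcpy cannot implicitly create the object for UB2 either.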
Given the significant performance improvement, we’d like to apply these changes in production. However, our main concern is these undefined behaviors, even though the code works with the current clang toolchain. Some ideas:
Regarding UB1, I don’t have a good answer. The best I can say is that, for this particular case, it should work in practice (maybe it is safe to assume it always works?).
The proposal “Making More Objects Contiguous” (P1945R0) introduces a new contiguous-layout type category, which covers class types with virtual functions. This would make the memcpy well-defined, but the proposal doesn’t seem to have had any updates.
Regarding UB2, we can provide a builtin like __builtin_start_object_lifetime in clang to bless it. The builtin is similar to C++23 std::start_lifetime_as, but permits starting the lifetime of types that are not implicit-lifetime types. (The builtin implementation is not complicated given the current way that clang lowers the code to LLVM IR: for the most part it is a no-op, but we need to make sure that optimizations (-fstrict-vtable-pointers, -fstrict-aliasing) and sanitizers respect it.)
(We also considered restructuring Message to eliminate virtual methods and instead implement a hand-rolled vtable mechanism to make it a trivially copyable type, but that would require a considerable amount of effort for a complete rewrite, so this is not a great option.)
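For reference, the hand-rolled vtable alternative could look roughly like the sketch below. This is only an illustration under stated assumptions: Msg, MsgOps, FooTypeName, kFooOps, kFooDefault and CreateByMemcpy are hypothetical names, not protobuf APIs, and the caller-provided storage stands in for the arena. Because Msg is a trivially copyable aggregate, my understanding of the C++20 implicit object creation rules is that memcpy into suitably aligned storage both copies the bytes and starts the lifetime of a Msg there, so neither UB1 nor UB2 applies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

struct Msg;

// Hypothetical hand-rolled "vtable": one static table per message type.
struct MsgOps {
  std::size_t size;                      // full object size, replaces size()
  const char* (*type_name)(const Msg*);  // replaces one virtual method
};

// The table pointer is an ordinary data member rather than a hidden vptr.
struct Msg {
  const MsgOps* ops;
  std::intptr_t meta_data_;
};

static_assert(std::is_trivially_copyable_v<Msg>);

inline const char* FooTypeName(const Msg*) { return "Foo"; }
inline constexpr MsgOps kFooOps{sizeof(Msg), &FooTypeName};
inline constexpr Msg kFooDefault{&kFooOps, 0};

// Assumption: `storage` has at least default_instance->ops->size bytes and is
// suitably aligned for Msg (in the real code the arena would guarantee this).
inline Msg* CreateByMemcpy(const Msg* default_instance, void* storage) {
  // memcpy implicitly creates a Msg in the destination before copying.
  std::memcpy(storage, default_instance, default_instance->ops->size);
  // launder yields a pointer to the object that now lives in the storage.
  return std::launder(static_cast<Msg*>(storage));
}
```

Calls then go through the table explicitly, for example m->ops->type_name(m). The cost is the one already mentioned above: every virtual method of Message would have to be rewritten in this style.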
Any thoughts and suggestions are very welcome.