Which optimization barriers do I need here?

I am writing sandboxing code for Google Chrome that has the potential to trigger execution of a hook function on any arbitrary system call that the program makes. Inside of the hook function, it performs a read-only operation on a global data structure that describes some of the sandboxing policies. This works fine once sandbox initialization has completed, but it requires special care when initially setting up this data structure as we never want to present invalid data (e.g. use after free) to the hook function. And as system calls are somewhat out of the scope of the C++ language specification, we have to assume that they can happen at (almost) any time.

A very simplified version of our code looks like this:

#include <string.h>

// We have some low-level code that can trigger on any system call and that
// performs a read-only access to “data”. It is guaranteed to only ever access
// indices between [0…n_data) and it doesn’t need to look at “n_data”. So, it
// is OK if a) “data” points to either new or old memory locations as long as
// they contain valid data, and b) the value of “n_data” does not need to be
// consistent.
// We are not concerned about multi-threaded code. Our code is guaranteed to be
// single-threaded at this time.
// We are also not concerned about asynchronous signals. Our code will only ever
// execute synchronously as a direct result of a system call.
static char *data;
static size_t n_data;

void AddToData(char ch) {
// Copy data from “data” to “new_data”. Don’t use realloc() as it could be
// making system calls at unexpected times.
// Then switch pointers without making any interleaving system calls.
char *old_data = data;
char *new_data = new char[n_data + 1];
memcpy(new_data, data, n_data);

// Do we need a barrier here to make sure memcpy() has completed before
// we switch the pointer in “data”?
data = new_data;
data[n_data++] = ch;

// We are worried about the compiler moving the deletion of “old_data” prior
// to the assignment of “data”. As it doesn’t know about us trapping all
// system calls, it could very conceivably reason that “delete[]” doesn’t
// have global side-effect and can thus be moved prior to our assignment.
// We add a optimization barrier to prevent this from happening.
// Is this the correct type of barrier?
asm volatile("" : “=r”(data) : “0”(data) : “memory”);
delete[] old_data;
}

int main(int argc, char *argv[]) {
AddToData(‘X’);
return 0;
}

I realize that we are operating somewhat outside of the guarantees that the language can give us. But we want to make sure we defensively write our code so that it is unlikely to trigger unintended compiler optimizations. Please also note that we have not actually observed any re-ordering that would conflict with the intention of our code; but if we can give hints to the compiler that allow it to better deduce our intentions, then that’s what we want to do.

This code is not actually performance critical; so, aggressively disabling optimizations in favor of correctness is fine with us.

The use of non-portable assembly would be acceptable, but is not our first preference. Using compiler extensions in more recent compilers are also possible, but we have to have a fallback plan for when we compile with older versions of clang or gcc. Maybe, we can just rely on the fact, that their optimizers were sufficiently conservative, that we don’t need to prevent reordering.

Markus

I think you need an atomic store here, or the compiler could e.g.,
split an 8-byte store into two 4-byte stores.

Dmitri

I think we don’t care about the compiler splitting the store, as long as it isn’t interleaved with a system call. Our hook only ever runs as a direct result of an intercepted system call.

We have some (limited) control over when we output system calls, as we know that basic assignments won’t generate system calls behind the scenes. But of course, something like “delete[]” could very legitimately result in system calls. So, our assignment to the pointer has to be completed prior to “delete[]”.

Otherwise, we could end up dereferencing the old pointer from our hook, even though “delete[]” already released the memory.

Similarly, we have to make sure that “memcpy()” has completed prior to us switching the pointer.

N.B.: this is all single threaded code and due to the nature of how the sandbox works, we know it will never be changed to be multi-threaded code. So, as far as the CPU is concerned, this is perfectly deterministic code. We don’t need to worry about CPU barriers or about effects of the memory model. Neither do we have to worry about any asynchronous events. All we care about are compiler barriers.

Markus

asm volatile("":::"memory"); is the traditional compiler optimization barrier and should work fine in this case.

John.