Proposed Enhancement to AddressSanitizer: Initialization Order

+llvmdev, -llvm-dev

Hi Reid,

Hello,

I’m starting work on a project to detect initialization order problems
in C++ files using AddressSanitizer.
The extension in question will hopefully result in AddressSanitizer
being able to detect initializers which read an undefined value from a
static or global variable defined in another TU.
I’m currently working on this as a patch to AddressSanitizer, but I’m
open to suggestions as to what the proper way to implement this
extension would be.

One of the simplest examples is the three-file program shown after the
two runs below. Its output depends on the order in which the initializers
of x and y run, which C++ does not specify across TUs, and the effect is
easy to observe.

When compiled as:
$ clang++ file_1.cpp file_2.cpp main.cpp
$ ./a.out
x: 2
y: 1

However, when compiled as:
$ clang++ file_2.cpp file_1.cpp main.cpp
$ ./a.out
x: 1
y: 2

//file_1.cpp
extern int y;
int x = y + 1;

//file_2.cpp
extern int x;
int y = x + 1;

//main.cpp
#include <iostream>
extern int x, y;

int main() {
  std::cout << "x: " << x << std::endl;
  std::cout << "y: " << y << std::endl;
}
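
To see why the two runs differ, it helps to separate the two phases: both globals are zero-initialized first, and only then do the dynamic initializers run in some order. Here is a minimal sketch of that. The names `Globals` and `run_initializers` are hypothetical and not part of the example program. Note that it models the order in which the initializers run, which need not match the order of the files on the command line.

```cpp
#include <cassert>
#include <utility>

// Hypothetical model of the two globals from file_1.cpp and file_2.cpp.
struct Globals {
  int x = 0;  // static zero-initialization happens before any dynamic init
  int y = 0;
};

// Runs the two dynamic initializers in the requested order and returns (x, y).
inline std::pair<int, int> run_initializers(bool file_1_first) {
  Globals g;
  assert(g.x == 0 && g.y == 0);  // both start out zero-initialized
  if (file_1_first) {
    g.x = g.y + 1;  // file_1.cpp: reads y before y's initializer has run
    g.y = g.x + 1;  // file_2.cpp
  } else {
    g.y = g.x + 1;  // file_2.cpp: reads x before x's initializer has run
    g.x = g.y + 1;  // file_1.cpp
  }
  return std::make_pair(g.x, g.y);
}
```

In this model `run_initializers(true)` gives x = 1, y = 2 and `run_initializers(false)` gives x = 2, y = 1, which are the two outputs shown above.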

Here’s a sketch of the detection algorithm:
For each TU:

  1. Before each TU’s initializers run, conditionally poison the
    global variable shadow memory
    - Each global variable is poisoned, unless it was defined in that TU
    - Additional information is added to struct __asan_global to
    identify which TU a global was defined in
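
The poisoning step can be modelled roughly as follows. This is only a sketch under stated assumptions: `GlobalInfo`, its `tu_id` field, `poison_before_tu_init` and `is_bad_access` are hypothetical names, the real struct __asan_global has a different layout, and a plain bool stands in for shadow memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of the per-global metadata.
struct GlobalInfo {
  std::string name;
  std::size_t tu_id;  // assumed extra field: the TU that defines the global
  bool poisoned;      // stand-in for the state of the global's shadow memory
};

// Before the initializers of `current_tu` run, poison every global that is
// defined in some other TU and unpoison the ones defined in `current_tu`.
inline void poison_before_tu_init(std::vector<GlobalInfo>& globals,
                                  std::size_t current_tu) {
  for (GlobalInfo& g : globals) {
    g.poisoned = (g.tu_id != current_tu);
  }
}

// An access made by an initializer would be reported if it hits a poisoned
// global. The global is expected to be present in the table.
inline bool is_bad_access(const std::vector<GlobalInfo>& globals,
                          const std::string& name) {
  const GlobalInfo* found = nullptr;
  for (const GlobalInfo& g : globals) {
    if (g.name == name) found = &g;
  }
  assert(found != nullptr && "global is not in the table");
  return found != nullptr && found->poisoned;
}
```

With x defined in TU 0 and y in TU 1, calling `poison_before_tu_init(globals, 0)` leaves x accessible and makes a read of y from file_1.cpp's initializer show up as a bad access.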

This could be tricky.
First, we don’t want to poison the linker-initialized globals, because they are always initialized regardless of the TU order.

Second, suppose we have 3 TUs, t1, t2, and t3, each with a global (g1, g2, and g3 respectively) that has an initializer.
When we are running initializers in t2, we need to poison g1 and g3, but so far we have seen only g1.
I don’t know any good and portable way to get g3.

One solution is to run the binary twice: once with the default order of TU initializers, and a second time with the reversed order (not sure if that’s easy).

Or it might be a bit simpler…
Currently, asan creates an unnamed linker-initialized global array for all instrumented globals in a given module.

% cat glob.cc
int foo();
int bar();
int AAA = foo();
int BBB = bar();

% clang -O2 -faddress-sanitizer -S -o - -emit-llvm glob.cc

@2 = private global [2 x { i64, i64, i64, i64 }] [
  { i64, i64, i64, i64 } { i64 ptrtoint ({ i32, [60 x i8] }* @AAA to i64), i64 4, i64 64, i64 ptrtoint ([14 x i8]* @0 to i64) },
  { i64, i64, i64, i64 } { i64 ptrtoint ({ i32, [60 x i8] }* @BBB to i64), i64 4, i64 64, i64 ptrtoint ([14 x i8]* @1 to i64) }]

If we make this array discoverable by other modules (using appending linkage?), the problem is solved.
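
To illustrate what a discoverable array would buy us, here is a small model. Appending linkage is an LLVM IR feature, so this sketch does not use it. It only models the intended effect, a single combined table of every module's globals, built by explicit `append` calls. `ModuleGlobals`, `GlobalTable` and `globals_outside` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one module's instrumented globals.
struct ModuleGlobals {
  std::string module;
  std::vector<std::string> globals;
};

// Models the combined table: the per-module arrays concatenated into one.
class GlobalTable {
 public:
  void append(const ModuleGlobals& m) {
    assert(!m.module.empty() && "module needs a name in this model");
    modules_.push_back(m);
  }

  // Globals that would need poisoning while `module`'s initializers run:
  // everything in the table that belongs to a different module.
  std::vector<std::string> globals_outside(const std::string& module) const {
    std::vector<std::string> result;
    for (const ModuleGlobals& m : modules_) {
      if (m.module == module) continue;
      result.insert(result.end(), m.globals.begin(), m.globals.end());
    }
    return result;
  }

 private:
  std::vector<ModuleGlobals> modules_;
};
```

With t1, t2 and t3 all in the table up front, `globals_outside("t2")` returns both g1 and g3, which is exactly the piece that is missing when t2's initializers run and only g1 has been seen so far.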

–kcc

As we've discussed offline, it may be easy to wrap and re-implement
__libc_global_ctors (which essentially iterates over __CTOR_LIST__ and
calls the ctors for each module in the linkage order).

We can then shuffle the ctors in any order we want, e.g. explicitly
ask for reverse order. Other means of changing the ctor order may
require relinking the binary.
We'll just need to associate each pointer in the __CTOR_LIST__ with
the corresponding per-module structure that describes the globals,
poison all the globals in a certain module after its ctor has been
called, and unpoison all the globals after __libc_global_ctors is
done.
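
A rough model of that loop, to make the poison/unpoison sequence concrete. This is not the real __libc_global_ctors and it does not touch __CTOR_LIST__. `Module` and `run_ctors_reversed` are hypothetical names, and a bool stands in for the shadow state of a module's globals.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-module record: a ctor paired with the poison state of the
// globals that the module defines.
struct Module {
  std::string name;
  std::function<void()> ctor;
  bool globals_poisoned;
};

// Runs the module ctors in reverse list order. After each ctor returns, that
// module's globals are poisoned, so a later ctor that touches them is caught.
// Once every ctor has run, all globals are unpoisoned again.
// Returns the names of the modules in the order their ctors ran.
inline std::vector<std::string> run_ctors_reversed(std::vector<Module>& modules) {
  std::vector<std::string> order;
  for (std::size_t i = modules.size(); i-- > 0;) {
    Module& m = modules[i];
    assert(m.ctor && "every module needs a ctor in this model");
    m.ctor();
    m.globals_poisoned = true;
    order.push_back(m.name);
  }
  for (Module& m : modules) {
    m.globals_poisoned = false;
  }
  return order;
}
```

For a list {a, b} the ctors run as b, then a. While a's ctor runs, b's globals are poisoned, so a read of b's globals from a would be flagged. After the loop nothing is poisoned any more.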