GSoC 2012 proposal : Integrate Baggy Bounds Checking into SAFECode

Dear LLVM developers:
Here is my another proposal of LLVM. Any suggestion would be welcome!

Integrate Baggy Bounds Checking into SAFECode

Abstract: Baggy Bounds Checking (BBC) is an efficient bounds checking technique that pad and align objects to powers of two and enable allocation bounds. It uses a contiguous array as bounds table to enable efficient bounds lookup and thus has low overhead at runtime. This project is aims to integrate BBC into SAFECode.

Motivation

BBC is presented in the paper [1], which uses a binary buddy allocator that pads and aligns objects to powers of two. Thus it can just use only a single byte stored in a contiguous array to compute the base and the size of a pointer. It is an efficient defense against out-of-bounds errors and the authors claim that it is more than two times faster than the fasted previous techniques. By integrating BBC into SAFECode, users can use SAFECode in a more efficient way to check spatial errors.

Project Detail

BBC enforces allocation bounds instead of object bounds. It constrains allocation sizes and alignment to powers of two. For instance,

x = malloc (200). BBC allocs the size of x to be 256 instead:

x = malloc (256).

Previous techniques that recorded a pointer to the start of the object and its size in bounds table, which requires at least eight bytes. By enabling allocation bounds, BBC can use only a single byte to encode bounds information. This single byte is set to be log2(size). We use ‘e’ to stand for this byte. As for the above instance, e = 8. The base and the size of a pointer p can be computed as the following:

size = 1 << e;

base = p & ~(size -1);

BBC uses a contiguous array to implement bounds table. Each entry of this table stores the above single byte. Also, application memory is divided into aligned slots with slot_size bytes. The bound table has an entry for each slot. When checking whether a pointer is out of bound or not, BBC can enable optimized bounds checks. For instance, p is a pointer, we want to check whether the following pointer arithmetic is safe or not: q = p + i. An explicit bonds check is as the following:

First we need compute the size and the base of p:

Size = 1 << table[p>>slot_size]

Base= p & ~(size-1)

Then we check whether q satisfies this condition:

q >= base && q < base + size

BBC optimizes it as the following:

(p^q) >> table[p>>slot_size] ==0

It means that if q is in bound, it should have the same prefix with only the e least significant bits modified, where e is the binary logarithm of the allocation size, stored in the p>>slot_size entry of the bonds table.

Here I give an example to explain how BBC works. (We set slot_size to be 16)

p =malloc (100); // It returns the address 0x12345600

BCD pads and aligns the size to be 128

p = malloc (128); So the upper bound of p is 0x12345680 and the lower bound of p is 0x12345600. table [p>>slot_size] = table[p>>16] = log2128 = 7

There is pointer arithmetic:

q = p + 144 = 0x12345690.

We check: (p^q) >>table[p>>slot_size] = 0x00000090>>7 != 0 So q is out of bound.

Currently some initial work of BBC has been done in SAFECode. But there is still a lot of work to do. The padding and alignment transform works with LLVM 2.x and needs to be updated to LLVM 3. There are no run time checks of BBC, including pointer arithmetic and array indexing, in the trunk of SAFECode now. Also I will extend BBC to check pointer dereferences and access to scalar fields in structure. So this project is to complete BBC at runtime in SAFECode.

Timeline

The following plan is scheduled as weeks.

[1-2] make padding and alignment transform work with LLVM 3.

[3-5] Add run-time checks, including each pointer arithmetic and array indexing operation.

[6-9] Extend BBC to include the following checks: access to scalar fields in structures and pointer dereferences.

[10-11] Test and improvement, make BBC works well with the test-suite.

[12] Document for BBC support in SAFECode.

Project experience

In GSoC2009, I took part in a project: support Scilab language on SWIG [2]. I added a backend module in SWIG, so that it can support all the C features for Scilab language: variables, functions, constants, enums, structs, unions, pointers and arrays.

In GSoC2010, I also successfully finished a project called“epfs”[3] , which means embedding Python from Scilab. This project introduces a mechanism to load and use Python code from Scilab.

I have about one year’s experience for LLVM. I use it mainly to implement control flow integrity for Operating Systems and thus improve system security. I recently submitted a patch for Target.h file to improve compatibility with SWIG, which has been applied on the trunk.

Biography

Name: Baozeng Ding

University: Institute of Software, Chinese Academy of Science

Email: sploving1@gmail.com

IRC name: sploving

References

[1]. P. Akritidis,M. Costa,M. Castro, and S. Hand. Baggy Bounds Checking: An Efficient and Backwards-Compatible Defenseagainst Out-of-Bounds Errors. In Proceedings of the 18th USENIX Security Symposium, August 2009.

[2]. http://code.google.com/p/google-summer-of-code-2009-swig/downloads/list

[3]. http://forge.scilab.org/index.php/p/epfs/