Sandboxing code

Hello,

  I'm absolutely 101% new to LLVM so please bear with me :).

  I'm trying to explore what LLVM can and cannot be used for. One
thing I was wondering is whether it would be possible to execute LLVM
code in a completely sandboxed environment. By sandboxed I mean that
the executed code should not have direct access to any system
resources (e.g., hard drive, networking, devices), only through some
specific API that I would provide. The idea is to be able to execute
arbitrary LLVM code from the internet in a completely safe way (provided
that the specific code adheres to my libs in the first place...
otherwise it shouldn't even compile).

Thanks,
  Peter

The short answer is that you could build a system like this using LLVM, and you could build it more quickly using the SAFECode compiler (which is built on LLVM and will be released as soon as we can get the legal paperwork done). However, you will need to add functionality to the LLVM/SAFECode system in order to do the sandboxing. LLVM does not provide this functionality at present.

The long answer:

1) You can build the program analysis and transformation passes needed to do this as a set of LLVM passes.

2) SAFECode provides control-flow integrity as one of its memory safety properties. It ensures that the return address of a function won't be overwritten, and it instruments indirect function calls with run-time checks to ensure that they call valid functions.

3) You could enhance the instrumentation on indirect function calls to ensure that they don't call system calls or other functions which you consider "dangerous."

4) You can combine this with operating system techniques (e.g., chroot jails, private namespaces (Linux/Plan 9 only), SELinux, etc.) to limit access to operating system resources. Depending on how you want to sandbox the code, using OS isolation techniques and/or virtual machines (e.g., VMware, Xen) may be more straightforward and easier to implement.
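
To make points 2) and 3) a bit more concrete, here is a rough sketch in plain C++ of the kind of run-time check an instrumentation pass might insert in front of an indirect call. This is not SAFECode's actual implementation and it uses no LLVM API; the names api_double, api_negate, dangerous, kAllowed and checked_call are all made up for illustration. It only shows the idea: compare the call target against a list of permitted functions and refuse to call anything else.

```cpp
#include <array>
#include <cassert>

// Signature shared by every function the sandboxed code may call indirectly.
using Fn = int (*)(int);

// Hypothetical "safe" API that the host exposes to sandboxed code.
int api_double(int x) { return 2 * x; }
int api_negate(int x) { return -x; }

// Stand-in for a function the host considers dangerous (e.g. a syscall wrapper).
int dangerous(int x) { return x + 1; }

// Hypothetical allowlist of permitted indirect-call targets.
const std::array<Fn, 2> kAllowed = {api_double, api_negate};

// Returns true if the target is one of the permitted functions.
bool is_allowed(Fn target) {
  for (Fn f : kAllowed) {
    if (f == target) return true;
  }
  return false;
}

// The check a pass could wrap around each indirect call: the target is
// invoked only if it is on the allowlist; otherwise the call is refused
// and result is left untouched.
bool checked_call(Fn target, int arg, int &result) {
  if (!is_allowed(target)) return false;
  result = target(arg);
  return true;
}

// Self-check: an allowed target runs, a disallowed one is refused.
void self_check() {
  int r = 0;
  assert(checked_call(api_double, 21, r) && r == 42);
  assert(!checked_call(dangerous, 1, r) && r == 42);
}
```

A real pass would emit the equivalent comparison in LLVM IR before each indirect call, and would still need the other memory safety guarantees from point 2) so that the check itself cannot be bypassed.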

-- John T.

2009/11/6 Péter Szilágyi <peterke@gmail.com>

I’m trying to explore what LLVM can and cannot be used for. One
thing I was wondering is whether it would be possible to execute LLVM
code in a completely sandboxed environment. By sandboxed I mean that
the executed code should not have direct access to any system
resources (e.g., hard drive, networking, devices), only through some
specific API that I would provide. The idea is to be able to execute
arbitrary LLVM code from the internet in a completely safe way (provided
that the specific code adheres to my libs in the first place…
otherwise it shouldn’t even compile).

It is not the goal of LLVM to provide or enforce program safety.

Other projects do this, either on top of the LLVM representation (e.g., SAFECode, which John already mentioned) or on native code directly (e.g., Native Client: http://code.google.com/p/nativeclient/ ). In the latter case you’d have to compile the LLVM code to native code first.

Misha