secure virtual architecture / safecode

SVA/safecode looks interesting. Is it available to play with? I grepped for strings such as "sva" "secure" "safecode" in the LLVM source tree and didn't find anything, nor did I see obvious links to implementations from the project web pages.

In the short term, questions I'd be interested in answering are: What happens when embedded codes that I care about (which currently lack an LLVM backend but that's a detail...) are compiled safely? Can Safecode beat Deputy in terms of code size and RAM usage?

Thanks,

John Regehr

John,

We (more accurately, John Criswell and Brice Lin) are working on a debugging version of SAFECode right now, which should be robust enough to play with soon.

What kinds of embedded codes do you have in mind? One of our goals has been to minimize or even eliminate run-time checks for embedded code that meets certain restrictions. You can see the following paper for more details:
  "Memory Safety Without Garbage Collection for Embedded Applications"
  http://llvm.org/pubs/2005-02-TECS-SAFECode.html
I'd be happy to discuss this further, on or off the list.

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve

Hi Vikram,

I think it's worth continuing to discuss this on-list. I'm interested in different kinds of embedded software, but specifically in TinyOS
applications.

We (more accurately, John Criswell and Brice Lin) are working on a
debugging version of SAFECode right now, which should be robust enough
to play with soon.

This is great to hear.

I know the SAFECode paper and like it. However, my intuition fails when I try to imagine what your transformations will do to a TinyOS
application.

These applications contain some substantially ugly pointer manipulation in the network stack. The applications will clearly violate some
of the assumptions from your 2005 paper. On the other hand, these apps are definitely no worse than the Linux kernel. They are
substantially more static in nature, containing few or no function pointers and little or no heap allocation.

Our Safe TinyOS work (memory safety via Deputy) suffers from code size problems. Some architectures, such as msp430, are quite flash-poor,
and so even a few percent code bloat will stop large applications from compiling. This is too bad since these large applications are the
ones thst most need safety. Since we have deployed Safe TinyOS as part of a recent release of TinyOS, the problem is not just a
theoretical one! Making things worse, msp430-gcc is not one of the stronger gcc ports.

So I am very interested in approaches that can reduce or eliminate dynamic checks. However, this legacy code base is not going to lend
itself to heavy-handed restrictions. I have been expecting that dragging a heavyweight decision procedure into the bounds check optimizer
will be necessary.

More broadly, I'm intersted in exploring whole-program compilation of embedded codes using LLVM. The transformations that are most
interesting to me are those that do not make sense (and hence, have not been explored deeply) by the general-purpose compiler community. A
good example is getting rid of the call stack (we have an upcoming LCTES paper about this). Clearly it's stupid to do this for Firefox or
MSWord, but it can win for microcontroller codes. Our prototype was partially via source-to-source transformation, partially via gcc
hacks. Overall this was not very fun-- we should have done the work in LLVM (but nobody wrote AVR or msp430 backends for us to use :).

John Regehr

I understand. They only way you can keep the code bloat at a few percent or less is to aggressively eliminate nearly all run-time checks. We haven't measured our code size increases but if you have any codes to send us, we can try to get you some preliminary numbers.

We do have a static array bounds checking algorithm based on the Omega integer programming library, but it is not hugely effective. I think this can be strengthened a *lot*.

We'd have to see the pointer manipulation in your codes to figure out what (if any) run-time checks they might introduce.

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve

We do have a static array bounds checking algorithm based on the Omega
integer programming library, but it is not hugely effective. I think
this can be strengthened a *lot*.

This is great to hear. I now have extremely high expectations :).

I have put a few apps here:

   Index of /~regehr/hacked_tinyos_apps

These are for compiling only-- they will not run in any useful sense. They are embedded C code from which I have deleted processor-specific constructs so that llvm-gcc for x86 can compile them. Lacking actual LLVM backends for the processors of interest, this seemed the easiest way to get started.

These apps contain multiple entry points for interrupt handlers, which llvm-gcc for x86 will ignore. That is OK, since quite a bit of code is reachable from main().

John

I should add that I would be interested in helping with hooking a good decision procedure into LLVM. This will be useful far beyond array bounds check elimination.

What is the current thinking? SMT solver?

John

SMT solvers seem more general-purpose than we need. We just need an ILP solver of some kind (Omega, PIP, TaPAS, ...) for almost all uses I can think of. Do you want to do more powerful symbolic analysis for some purpose?

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve

Hi Vikram-

Certainly I'm far from being an expert on this topic. However, many of the theories supported by SMT-LIB seem useful, such as arrays, bitvectors, etc:

   http://combination.cs.uiowa.edu/smtlib/theories.html

Also, SMT has a uniform input format and so different solvers could be tried out at little cost.

In the slightly longer term, the ability to deal with quantifiers would be nice to help sort out lightweight verification problems of the sort that ESC-Java supports. This seems to be about the right level of specification for many types of problems.

John

Has anyone had time to generate safe versions of one or more of these? If so I'd be interseted to look at the output.

Thanks,

John