[PATCH] Protection against stack-based memory corruption errors using SafeStack

Dear LLVM developers,

Our team has developed an LLVM-based protection mechanism that (i) prevents control-flow hijack attacks enabled by memory corruption errors and (ii) has very low performance overhead. We would like to contribute the implementation to LLVM. We presented this work at the OSDI 2014 conference, at several software companies, and at several US universities. We received positive feedback, and so we’ve open-sourced our prototype, which is available for download from our project website (http://levee.epfl.ch).

There are three components (safe stack, CPS, and CPI), and each can be used individually. Our most stable part is the safe stack instrumentation, which separates the program stack into a safe stack, which stores return addresses, register spills, and local variables that are statically verified to be accessed in a safe way, and an unsafe stack, which stores everything else. This separation makes it much harder for an attacker to corrupt objects on the safe stack, including function pointers stored in spilled registers and return addresses. A detailed description of the individual components is available in our OSDI paper on code-pointer integrity (http://dslab.epfl.ch/pubs/cpi.pdf).
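To make the split concrete, here is a small illustrative sketch of our own (not taken from the patches); the comments reflect the typical classification, while the actual decisions are made by the static analysis in the pass:

  // Illustrative only: how a simple function's locals are split under SafeStack.
  #include <cstring>

  int count_chars(const char *input) {
    int count = 0;              // scalar whose accesses are statically verified
                                // as safe: stays on the safe (regular) stack
    char buf[64];               // buffer written via library code / indexing:
                                // moved to the unsafe stack, away from the
                                // return address and spilled registers
    std::strncpy(buf, input, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (const char *p = buf; *p; ++p)
      ++count;
    return count;               // an overflow of buf can no longer reach the
                                // return address on the safe stack
  }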

The overhead of our implementation of the safe stack is very close to zero (0.01% on the Phoronix benchmarks and 0.03% on SPEC CPU2006 on average). This is lower than the overhead of stack cookies, which are supported by LLVM and commonly used today, yet the security guarantees of the safe stack are strictly stronger than those of stack cookies. In some cases, the safe stack even improves performance due to better cache locality.

Our current implementation of the safe stack is stable and robust: we used it to recompile multiple projects on Linux, including Chromium, and we also recompiled the entire FreeBSD user-space system and more than 100 packages. We ran unit tests on the FreeBSD system and on many of the packages and observed no errors caused by the safe stack. The safe stack is also fully binary-compatible with non-instrumented code and can be applied to parts of a program selectively.

We attach our implementation of the safe stack as three patches against current SVN HEAD of LLVM (r221153), clang (r221154) and compiler-rt (r220991). The same changes are also available on https://github.com/cpi-llvm in the safestack-r221153 branches of corresponding repositories. The patches make the following changes:

– Add the safestack function attribute, similar to the ssp, sspstrong and sspreq attributes.
– Add the SafeStack instrumentation pass that applies the safe stack to all functions that have the safestack attribute. This pass moves all unsafe local variables to the unsafe stack, which has a separate stack pointer, while all safe variables remain on the regular stack managed by LLVM as usual.
– Invoke the pass as the last stage before code generation (at the same time the existing cookie-based stack protector pass is invoked).
– Add -fsafe-stack and -fno-safe-stack options to clang to control safe stack usage (the safe stack is disabled by default).
– Add the __attribute__((no_safe_stack)) attribute to clang, which can be used to disable the safe stack for individual functions even when it is enabled globally (see the usage sketch after this list).
– Add basic runtime support for the safe stack to compiler-rt. The runtime manages unsafe stack allocation/deallocation for each thread.
– Add unit tests for the safe stack.
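As a usage sketch, based only on the options and attribute listed above, enabling the safe stack for a program while opting out a single function looks roughly like this:

  // Build with the new flag from the patch, e.g.:
  //   clang++ -fsafe-stack example.cpp -o example
  // (-fno-safe-stack turns it off again; the default is off.)

  // Opt a single function out even when -fsafe-stack is enabled globally.
  __attribute__((no_safe_stack))
  void legacy_copy(char *dst, const char *src, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
      dst[i] = src[i];
  }

  int main() {
    char src[8] = "abcdefg", dst[8];
    legacy_copy(dst, src, sizeof(src)); // not instrumented
    return dst[0] == 'a' ? 0 : 1;       // the rest of the program is instrumented
  }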

You can find more information about the safe stack, as well as the other parts of our control-flow hijack protection technique, in our OSDI paper. FYI, here is the abstract of the paper:

<< Systems code is often written in low-level languages like C/C++, which offer many benefits but also delegate memory management to programmers. This invites memory safety bugs that attackers can exploit to divert control flow and compromise the system. Deployed defense mechanisms (e.g., ASLR, DEP) are incomplete, and stronger defense mechanisms (e.g., CFI) often have high overhead and limited guarantees.

We introduce code-pointer integrity (CPI), a new design point that guarantees the integrity of all code pointers in a program (e.g., function pointers, saved return addresses) and thereby prevents all control-flow hijack attacks, including return-oriented programming. We also introduce code-pointer separation (CPS), a relaxation of CPI with better performance properties. CPI and CPS offer substantially better security-to-overhead ratios than the state of the art, they are practical (we protect a complete FreeBSD system and over 100 packages like apache and postgresql), effective (prevent all attacks in the RIPE benchmark), and efficient: on SPEC CPU2006, CPS averages 1.2% overhead for C and 1.9% for C/C++, while CPI’s overhead is 2.9% for C and 8.4% for C/C++. >>

(This is joint work with V. Kuznetsov, L. Szekeres, M. Payer, G. Candea, R. Sekar, and D. Song)

We look forward to your feedback and hope for a prompt merge into LLVM, to make software built with clang more secure.

  • Volodymyr Kuznetsov & the CPI team

safestack-llvm.diff (81.6 KB)

safestack-clang.diff (13.2 KB)

safestack-compiler-rt.diff (20.8 KB)

Hi Volodymyr,

I enjoyed the paper, and we're very interested in deploying this technique on FreeBSD. Would you mind putting the patches on Phabricator (http://reviews.llvm.org) and adding me as a reviewer?

David

P.S. If you have patches against the LLVM 3.5 release, we'd be interested in importing them into the copy of LLVM in the FreeBSD tree; otherwise, we'll grab them as part of the next LLVM release.

Hi David,

Thanks a lot for your quick reply and your interest in our technique! I’ve just uploaded the three patches to Phabricator: http://reviews.llvm.org/D6094, http://reviews.llvm.org/D6095 and http://reviews.llvm.org/D6096 .

We will prepare the patches for the 3.5 branch as well (it should be fairly straightforward, as most changes are well isolated) and send them to you tomorrow. We would also be glad to share our changes to the FreeBSD libc that integrate the safe stack runtime in a more direct and clean way than if linked as a compiler-rt library.

Hi Volodymyr,

disclaimer: my opinion is biased because I’ve co-authored AddressSanitizer and SafeStack is doing very similar things.

The functionality of SafeStack is limited in scope, but given the near-zero overhead and non-zero benefits I’d still like to see it in LLVM trunk.

SafeStack, both the LLVM and compiler-rt parts, is very similar to what we do in AddressSanitizer, so I would like to see more code reuse, especially in compiler-rt.
What about the user-visible interface? Do we want it to be more similar to asan/tsan/msan/lsan/ubsan/dfsan flags, e.g. -fsanitize=safe-stack?
I am puzzled why you are doing transformations on the CodeGen level, as opposed to doing it in an LLVM IR pass.

The LLVM code base is C++11 now, so please use C++11 in new code, at least where it leads to simpler code (e.g. "for" loops).
The compiler-rt part lacks tests; same for the clang part.
Are you planning to support this feature in LLVM long term?

You say that SafeStack is a superset of stack cookies.
What are the downsides?
You at least increase the memory footprint by doubling the stack sizes.
You also add some (minor) incompatibility and the need for the new attributes to disable SafeStack.
What else?

I’ve also left a few specific comments in Phabricator.

–kcc

Not quite. The space overhead is constant for each stack frame: you just need to keep track of the tops of two stacks rather than one. The important overhead is that you reduce locality of reference. You will need a minimum of two cache lines for each stack frame instead of one. In practice, this is not a huge problem, because you already need several cache lines live for good stack performance, and the total number of lines is not much different.

There are likely to be some pathological cases, though, when the tops of the safe and unsafe stacks have the same alignment and you are dealing with some other heap data with the same alignment. This will increase contention in the sets of a set-associative cache and may cause more misses.

David

Yes, indeed, the increase in the memory footprint is minimal and constant for each stack frame that uses the unsafe stack: it’s just a single unsafe stack frame pointer per unsafe stack frame. The space for each stack object is still allocated only once, either on the normal or on the unsafe stack, but not both. In practice, we did not observe any measurable increase in the memory footprint due to the safe stack in our experiments.

As for the cache locality, we actually observed that the safe stack sometimes improves the cache hit rate. This is especially the case for programs that allocate large arrays or long-lived objects on the stack that would normally be evicted from the cache but are kept there only because they share cache lines with, e.g., spilled registers. With the safe stack, such objects are moved elsewhere, so the frequently accessed objects on the normal stack end up closer to each other and occupy fewer cache lines in total. Of course, there might be pathological negative cases as well, but as we show in our paper, both the average and the maximum overhead look quite good in practice (see Figures 3 and 4 in http://dslab.epfl.ch/pubs/cpi.pdf).
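To illustrate (our own sketch, not from the paper): in code like the following, the large scratch buffer moves to the unsafe stack, so the hot scalars, register spills, and return address on the regular stack pack into fewer cache lines.

  // Our own illustration of the locality effect described above.
  int summarize(const int *data, int n) {
    int scratch[1024];                 // large, cold after initialization:
                                       // moved to the unsafe stack
    if (n > 1024) n = 1024;
    for (int i = 0; i < n; ++i)
      scratch[i] = data[i] * 2;

    int sum = 0, max = 0;              // hot scalars: remain on the regular
    for (int i = 0; i < n; ++i) {      // stack, now sharing cache lines only
      sum += scratch[i];               // with each other, spills, and the
      if (scratch[i] > max)            // return address
        max = scratch[i];
    }
    return sum + max;
  }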

  • Vova

Hi Kostya,

Thanks for your comments! We are great fans of AddressSanitizer ourselves; we use it extensively and also plan to contribute to it in the future.

Our understanding is that ASAN's main use case is during testing; the goal of SafeStack is to run in production, so it offers a specific form of protection that can be delivered at near-zero overhead. In particular, we don't focus much on bug detection, but rather on making it much harder to write exploits against code that has bugs. With SafeStack enabled, the distance between an on-stack buffer that might overflow and the return addresses (or sensitive spilled registers) that the attacker might want to overwrite becomes much less predictable (or even randomized, if ASLR is employed), as they're now stored on two separate stacks.

Our current SafeStack implementation is just a first step in that direction. In the future, we plan to further protect the regular stack using leak-proof randomization (as described in our paper): the regular stack will be allocated at a random offset, and the instrumentation will ensure that neither the %rsp value nor any other pointer to the regular stack is ever stored on the heap or on the unsafe stack. This would mostly require changes to libc/glibc to instrument setjmp/longjmp, stack unwinding, and other code that accesses %rsp directly. With such protection in place, overwriting return addresses or pivoting the stack would become nearly impossible in practice, along with the many ROP attacks that rely on them.

Please find answers to some of your specific questions below:

> Hi Volodymyr,
>
> disclaimer: my opinion is biased because I've co-authored AddressSanitizer
> and SafeStack is doing very similar things.
>
> The functionality of SafeStack is limited in scope, but given the near-zero
> overhead and non-zero benefits I'd still like to see it in LLVM trunk.
> SafeStack, both the LLVM and compiler-rt parts, is very similar to what we
> do in AddressSanitizer, so I would like to see more code reuse, especially
> in compiler-rt.

That would be great indeed and could simplify the SafeStack code in several places. We will try to figure out how to do it without increasing the overhead or complexity of SafeStack (e.g., without requiring linking with pthreads, etc.).

> What about the user-visible interface? Do we want it to be more similar to
> asan/tsan/msan/lsan/ubsan/dfsan flags, e.g. -fsanitize=safe-stack?

We've picked the -fsafe-stack option as it feels more similar to the -fstack-protector option, whose usage model we follow. The -fsanitize options feel more associated with testing/debugging than with production use (or at least we perceive them this way).

> I am puzzled why you are doing transformations on the CodeGen level, as
> opposed to doing it in an LLVM IR pass.

As I explained on Phabricator, we want to apply the SafeStack transformation as the very last step before code generation, to make sure that it operates on the final stack layout. Doing so earlier might prevent some other optimizations from succeeding (as it, e.g., complicates alias analysis, breaks the mem2reg pass, etc.) or might force the SafeStack pass to move more objects to the unsafe stack than necessary (e.g., if the operations that SafeStack considered potentially unsafe are actually optimized away later). In principle, in some pathological cases, it might even break correctness, e.g., if SafeStack decides to keep some object on the normal stack, but subsequent optimization or instrumentation passes add potentially unsafe operations on that object.
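To illustrate the kind of case we mean (a hypothetical example of ours, not from the patch):

  // Hypothetical illustration: the only "unsafe-looking" use of buf disappears
  // after inlining and DCE, so a late SafeStack run can keep buf on the
  // regular stack, while an early run would move it to the unsafe stack.
  static inline void log_ptr(const char *p) { (void)p; } // trivially removed

  int classify(int x) {
    char buf[16];
    log_ptr(buf);        // address escapes here -- until the call is inlined away
    buf[0] = (char)x;    // constant, in-bounds accesses: statically verifiable
    return buf[0] > 0;
  }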

> The LLVM code base is C++11 now, so please use C++11 in new code, at least
> where it leads to simpler code (e.g. "for" loops).

Great point! I've fixed the code to use C++11 (along with many other issues raised on Phabricator) and will update the patch ASAP.

> The compiler-rt part lacks tests; same for the clang part.

Yes, we plan to add such tests in the future.

> Are you planning to support this feature in LLVM long term?

We certainly want to see SafeStack used in the real world and will do our best to support it in LLVM. That said, please keep in mind that we're just a small group at a research institution with very limited resources, so we can hardly make any promises and would greatly appreciate any help from the community in supporting and improving SafeStack.

> You say that SafeStack is a superset of stack cookies.
> What are the downsides?
> You at least increase the memory footprint by doubling the stack sizes.
> You also add some (minor) incompatibility and the need for the new
> attributes to disable SafeStack.
> What else?

Please see an earlier email <http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-November/078475.html> for a discussion of the memory footprint. The main (minor) incompatibility that we observed was related to the mark-and-sweep garbage collection implementation for C++ that we saw in Chromium: we had to change it to scan the unsafe stack as well (in addition to the regular stack) when searching for pointers to the heap. The change was rather small and well isolated though, and pretty much aligned with the already existing support for AddressSanitizer in that garbage collector.
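For illustration, the change amounts to roughly the following sketch (ours; the stack-bound helpers are hypothetical, not Chromium's actual API):

  // Sketch of a conservative mark phase extended to scan the unsafe stack too.
  #include <cstdint>

  void mark_if_heap_pointer(uintptr_t candidate);                 // hypothetical
  void get_regular_stack_bounds(uintptr_t **lo, uintptr_t **hi);  // hypothetical
  void get_unsafe_stack_bounds(uintptr_t **lo, uintptr_t **hi);   // hypothetical

  void scan_thread_stacks() {
    uintptr_t *lo, *hi;

    get_regular_stack_bounds(&lo, &hi);      // as before: scan the regular stack
    for (uintptr_t *p = lo; p < hi; ++p)
      mark_if_heap_pointer(*p);

    get_unsafe_stack_bounds(&lo, &hi);       // new: address-taken locals now
    for (uintptr_t *p = lo; p < hi; ++p)     // live on the unsafe stack, so it
      mark_if_heap_pointer(*p);              // must be scanned for roots as well
  }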

> I've also left a few specific comments in Phabricator.

Thank you for the comments! We plan to submit the updated patch ASAP.

- Volodymyr Kuznetsov

> Hi Kostya,
>
> Thanks for your comments! We are great fans of AddressSanitizer
> ourselves; we use it extensively and also plan to contribute to it in the
> future.
>
> Our understanding is that ASAN's main use case is during testing; the
> goal of SafeStack is to run in production, so it offers a specific form
> of protection that can be delivered at near-zero overhead. In particular,
> we don't focus much on bug detection, but rather on making it much harder
> to write exploits against code that has bugs. With SafeStack enabled, the
> distance between an on-stack buffer that might overflow and the return
> addresses (or sensitive spilled registers) that the attacker might want
> to overwrite becomes much less predictable (or even randomized, if ASLR
> is employed), as they're now stored on two separate stacks.

No disagreement here.
We do want to use asan in production (and we have some results!) but asan's
overhead will remain much higher than SafeStack's.

> Our current SafeStack implementation is just a first step in that
> direction. In the future, we plan to further protect the regular stack
> using leak-proof randomization (as described in our paper): the regular
> stack will be allocated at a random offset, and the instrumentation will
> ensure that neither the %rsp value nor any other pointer to the regular
> stack is ever stored on the heap or on the unsafe stack. This would
> mostly require changes to libc/glibc to instrument setjmp/longjmp, stack
> unwinding, and other code that accesses %rsp directly. With such
> protection in place, overwriting return addresses or pivoting the stack
> would become nearly impossible in practice, along with the many ROP
> attacks that rely on them.
>
> Please find answers to some of your specific questions below:
>
>> Hi Volodymyr,
>>
>> disclaimer: my opinion is biased because I've co-authored
>> AddressSanitizer and SafeStack is doing very similar things.
>>
>> The functionality of SafeStack is limited in scope, but given the
>> near-zero overhead and non-zero benefits I'd still like to see it in
>> LLVM trunk.
>> SafeStack, both the LLVM and compiler-rt parts, is very similar to what
>> we do in AddressSanitizer, so I would like to see more code reuse,
>> especially in compiler-rt.
>
> That would be great indeed and could simplify the SafeStack code in
> several places. We will try to figure out how to do it without increasing
> the overhead or complexity of SafeStack (e.g., without requiring linking
> with pthreads, etc.).
>
>> What about the user-visible interface? Do we want it to be more similar
>> to asan/tsan/msan/lsan/ubsan/dfsan flags, e.g. -fsanitize=safe-stack?
>
> We've picked the -fsafe-stack option as it feels more similar to the
> -fstack-protector option, whose usage model we follow. The -fsanitize
> options feel more associated with testing/debugging than with production
> use (or at least we perceive them this way).

I have no strong opinion here.

>> I am puzzled why you are doing transformations on the CodeGen level, as
>> opposed to doing it in an LLVM IR pass.
>
> As I explained on Phabricator, we want to apply the SafeStack
> transformation as the very last step before code generation, to make sure
> that it operates on the final stack layout. Doing so earlier might prevent
> some other optimizations from succeeding (as it, e.g., complicates alias
> analysis, breaks the mem2reg pass, etc.) or might force the SafeStack pass
> to move more objects to the unsafe stack than necessary (e.g., if the
> operations that SafeStack considered potentially unsafe are actually
> optimized away later). In principle, in some pathological cases, it might
> even break correctness, e.g., if SafeStack decides to keep some object on
> the normal stack, but subsequent optimization or instrumentation passes
> add potentially unsafe operations on that object.

The asan instrumentation happens at the very end of the optimization chain, so effectively we achieve what you need (running after all optimizations).
I would still suggest that you at least explore that possibility.

>> The LLVM code base is C++11 now, so please use C++11 in new code, at
>> least where it leads to simpler code (e.g. "for" loops).
>
> Great point! I've fixed the code to use C++11 (along with many other
> issues raised on Phabricator) and will update the patch ASAP.
>
>> The compiler-rt part lacks tests; same for the clang part.
>
> Yes, we plan to add such tests in the future.
>
>> Are you planning to support this feature in LLVM long term?
>
> We certainly want to see SafeStack used in the real world and will do our
> best to support it in LLVM. That said, please keep in mind that we're
> just a small group at a research institution with very limited resources,
> so we can hardly make any promises and would greatly appreciate any help
> from the community in supporting and improving SafeStack.
>
>> You say that SafeStack is a superset of stack cookies.
>> What are the downsides?
>> You at least increase the memory footprint by doubling the stack sizes.
>> You also add some (minor) incompatibility and the need for the new
>> attributes to disable SafeStack.
>> What else?
>
> Please see an earlier email
> <http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-November/078475.html>
> for a discussion of the memory footprint. The main (minor) incompatibility
> that we observed was related to the mark-and-sweep garbage collection
> implementation for C++ that we saw in Chromium: we had to change it to
> scan the unsafe stack as well (in addition to the regular stack) when
> searching for pointers to the heap. The change was rather small and well
> isolated though, and pretty much aligned with the already existing support
> for AddressSanitizer in that garbage collector.

I wonder if SafeStack can hook into the existing asan<=>gc interface?

Dear LLVM developers,

We’ve applied the feedback we received on Phabricator to the SafeStack patches, and added tests for all components of SafeStack (thanks to our new developer Alexandre Bique at EPFL, who started working on SafeStack this week). We would appreciate any suggestions on how we can further improve SafeStack and help its inclusion in LLVM.

Hi Volodymyr,

This week is very deadline-heavy for me, but I hope to be able to do another review over the weekend.

David

Hi Stephen,

> Dear LLVM developers,
>
> We've applied the feedback we received on Phabricator to the SafeStack
> patches,

Did you investigate the possibility of moving the transformation from
codegen to the LLVM level, i.e. the same level where asan/msan/tsan/dfsan
work?
I understand that it's a lot of work, but it will pay off with greater
portability and maintainability later.

Also, did you reply to my comments about reusing compiler-rt code from sanitizer_common?
I see lots of places in lib/safestack where you duplicate existing functionality from lib/sanitizer_common.

> and added tests for all components of SafeStack (thanks to our new
> developer Alexandre Bique at EPFL, who started working on SafeStack this week).

Hi Kostya,

>> Dear LLVM developers,
>>
>> We've applied the feedback we received on Phabricator to the SafeStack
>> patches,

> Did you investigate the possibility of moving the transformation from
> codegen to the LLVM level, i.e. the same level where asan/msan/tsan/dfsan
> work?
> I understand that it's a lot of work, but it will pay off with greater
> portability and maintainability later.

We're currently considering doing something in-between. We could place the SafeStack pass in lib/Transforms/Instrumentation, so that it can be easily invoked with opt and operate on IR, but still schedule the pass during code generation by default.

The latter is especially important when using LTO: running SafeStack before LTO would both impair the effectiveness of LTO (e.g., by breaking alias analysis, mem2reg, etc.) and prevent SafeStack from taking advantage of the extra information obtained during LTO (e.g., LTO can remove some pointer uses through inlining and DCE, etc.). Even without LTO, some of the passes scheduled during code generation (in addPassesToGenerateCode) could affect the security and the performance of the generated code as well.

Do you think moving the pass to lib/Transforms/Instrumentation but scheduling it during code generation would make sense? If so, we'll do that and change the safestack tests to use opt instead of llc.
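For concreteness, here is a rough sketch (ours, with illustrative names, against the current legacy pass manager) of the kind of IR-level pass skeleton we have in mind; the factory could then be called both from opt-style pass setup and from the codegen pipeline:

  // Sketch only: an IR-level SafeStack pass skeleton that could live in
  // lib/Transforms/Instrumentation and be created both by opt and by the
  // codegen pipeline. Class and factory names are illustrative.
  #include "llvm/IR/Function.h"
  #include "llvm/Pass.h"

  using namespace llvm;

  namespace {
  class SafeStackLegacyPass : public FunctionPass {
  public:
    static char ID;
    SafeStackLegacyPass() : FunctionPass(ID) {}

    bool runOnFunction(Function &F) override {
      // Only touch functions carrying the safestack attribute added by clang.
      if (!F.hasFnAttribute(Attribute::SafeStack))
        return false;
      // ... find unsafe allocas and rewrite them to use the unsafe stack ...
      return true; // IR was modified
    }
  };
  } // end anonymous namespace

  char SafeStackLegacyPass::ID = 0;

  // Factory callable both from opt-style pass setup and from the codegen
  // pipeline (e.g. next to the existing StackProtector pass).
  FunctionPass *createSafeStackInstrumentationPass() {
    return new SafeStackLegacyPass();
  }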

> Also, did you reply to my comments about reusing compiler-rt code from
> sanitizer_common?
> I see lots of places in lib/safestack where you duplicate existing
> functionality from lib/sanitizer_common.

Yes, we would like to use some of the functions from sanitizer_common (e.g., internal_mmap/munmap and some pthread-related functions), but linking the entire sanitizer_common just for that might be overkill. E.g., it can make small programs like coreutils up to 3x larger, and it requires compiling with -pthread. Perhaps we could move those functions to separate files in sanitizer_common and make them usable independently from the rest of sanitizer_common?

- Vova

+nlewycky

> Hi Kostya,

>
>>> Dear LLVM developers,
>>>
>>> We've applied the feedback we received on Phabricator to the SafeStack
>>> patches,
>>
>> Did you investigate the possibility of moving the transformation from
>> codegen to the LLVM level, i.e. the same level where asan/msan/tsan/dfsan
>> work?
>> I understand that it's a lot of work, but it will pay off with greater
>> portability and maintainability later.

> We're currently considering doing something in-between. We could place
> the SafeStack pass in lib/Transforms/Instrumentation, so that it can be
> easily invoked with opt and operate on IR, but still schedule the pass
> during code generation by default.
>
> The latter is especially important when using LTO: running SafeStack
> before LTO would both impair the effectiveness of LTO (e.g., by breaking
> alias analysis, mem2reg, etc.) and prevent SafeStack from taking advantage
> of the extra information obtained during LTO (e.g., LTO can remove some
> pointer uses through inlining and DCE, etc.). Even without LTO, some of
> the passes scheduled during code generation (in addPassesToGenerateCode)
> could affect the security and the performance of the generated code as
> well.
>
> Do you think moving the pass to lib/Transforms/Instrumentation but
> scheduling it during code generation would make sense? If so, we'll do
> that and change the safestack tests to use opt instead of llc.

I think this should be doable; however, I am not an expert in the pass manager and LTO, so hopefully someone else can comment.
Nick?

>> Also, did you reply to my comments about reusing compiler-rt code from
>> sanitizer_common?
>> I see lots of places in lib/safestack where you duplicate existing
>> functionality from lib/sanitizer_common.

> Yes, we would like to use some of the functions from sanitizer_common
> (e.g., internal_mmap/munmap and some pthread-related functions), but
> linking the entire sanitizer_common just for that might be overkill.
> E.g., it can make small programs like coreutils up to 3x larger, and it
> requires compiling with -pthread. Perhaps we could move those functions
> to separate files in sanitizer_common and make them usable independently
> from the rest of sanitizer_common?

Clearly.
samsonov@ would know more about splitting sanitizer_common into pieces; he has already done similar things.

--kcc

I haven't read the paper or patch yet, but reading the thread it does sound
like we should put it into an IR pass if possible. We'll have the
flexibility to schedule when it runs; I agree in the LTO case it's
important not to run it until right before codegenprepare, but we can sort
that out later (we want the pass pipeline for compiles in LTO builds to be
different from the pipeline for regular compiles producing object files,
but it isn't yet).

There is some access to TargetMachine from the IR passes, but instead of extending that, could we add new intrinsics? There already are @llvm.returnaddress and @llvm.frameaddress. Do you want @llvm.stackaddress, or would @llvm.frameaddress suffice? And while we could add @llvm.stackalignment, would it work to deduce the minimum alignment from the alloca instructions present?

Nick

Hi Nick,

Thanks for your suggestions! Please find some replies and more questions below.

The SafeStack pass essentially picks some of the alloca instructions and replaces them with allocations on the unsafe stack. Since unsafe stack frames are simpler than regular stack frames (e.g., they don’t contain any register spills) and LLVM doesn’t know about the unsafe stack anyway, the SafeStack pass is itself responsible for computing the layout of the unsafe stack frames. This computation is pretty low-level and needs the concrete value of the unsafe stack alignment; an intrinsic wouldn’t suffice for this purpose.

The stack alignment must also be enforced inter-procedurally: each function expects the stack to be aligned to at least a given predefined value on entry. Hence, analyzing local alloca instructions won’t be enough.

In principle, we could just make the alignment some large constant value across all platforms, but that would impact performance. Getting the actual stack alignment for the current platform makes much more sense.

The unsafe stack uses its own stack pointer, which is stored either in a thread-local variable or in the thread control block data structure. This is very platform-dependent, so we added a function to TargetLowering which determines this location for each platform, based on the TargetMachine (similarly to the existing getStackCookieLocation function, which is used for an analogous purpose by the existing StackProtector pass). Should we just create the TargetMachine instance in opt (similarly to how it is created during link-time optimization)?
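To make the discussion concrete, here is a conceptual C-level sketch (ours, not the actual implementation; the thread-local variable name is made up, and its real location is exactly the target-specific detail we query through TargetLowering) of what the instrumented prologue and epilogue amount to:

  // Conceptual sketch of the instrumented prologue/epilogue for one function.
  #include <cstdint>

  extern __thread uint8_t *__unsafe_stack_ptr;  // hypothetical per-thread pointer;
                                                // set up by the compiler-rt runtime

  void example(int n) {
    // Prologue: carve one unsafe frame for all unsafe locals of this function,
    // honoring the platform's stack alignment (16 bytes assumed here).
    uint8_t *saved = __unsafe_stack_ptr;
    uint8_t *frame = (uint8_t *)((uintptr_t)(saved - 64) & ~(uintptr_t)15);
    __unsafe_stack_ptr = frame;

    char *buf = (char *)frame;  // formerly `char buf[64];`, an unsafe alloca
    buf[0] = (char)n;           // unverified accesses now target the unsafe stack

    // Epilogue: restore the unsafe stack pointer on every exit path.
    __unsafe_stack_ptr = saved;
  }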

Thanks!

  • Vova