Zero'ing Registers on Function Return

I’ve been thinking about the issues with securely zero’ing buffers that Colin Percival discusses in his blog article <http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html>, and I think I’d like to take a stab at fixing it in clang. Here’s my proposal:

Add a function attribute, say __attribute__((clear_regs_on_return)) which when a thus annotated function returns will zero all callee owned registers and spill slots. Then, all unused caller owned registers will be immediately cleared by the caller after return.
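To make the proposal concrete, here is a minimal sketch of how an annotated function might look. The attribute clear_regs_on_return is only proposed in this thread and is not implemented anywhere, so the sketch substitutes zero_call_used_regs, which GCC 11 and Clang 15 and later accept. That attribute zeroes call-used registers on return but does not clear spill slots, so it covers only part of the proposal. The macro CLEAR_REGS_ON_RETURN and the function keyed_sum are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro. clear_regs_on_return does not exist; as a stand-in
 * this uses zero_call_used_regs where the compiler supports it. That
 * attribute zeroes call-used registers on return and does NOT clear
 * stack spill slots. */
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define CLEAR_REGS_ON_RETURN __attribute__((zero_call_used_regs("used")))
#  endif
#endif
#ifndef CLEAR_REGS_ON_RETURN
#  define CLEAR_REGS_ON_RETURN /* unsupported compiler: expands to nothing */
#endif

/* Toy keyed checksum standing in for a real secret computation. */
CLEAR_REGS_ON_RETURN
uint32_t keyed_sum(const uint8_t *key, size_t len)
{
    assert(key != NULL || len == 0);

    uint32_t acc = 0;
    for (size_t i = 0; i < len; i++)
        acc = acc * 31u + key[i];
    return acc; /* the return-value register is not zeroed */
}
```

On a compiler without the attribute the macro expands to nothing and the function behaves as ordinary C, so the annotation changes only what is left behind in registers, never the result.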

As for why, I’m concerned with the case where a memory disclosure vulnerability exposes all or a portion of sensitive data via either spilled registers or infrequently used registers (xmm). If an attacker is able to analyze a binary for situations wherein sensitive data will be spilled, leveraging a memory disclosure vulnerability it’s likely one could craft an exploit that reveals sensitive data.

What does the list think?
-Russ Harmon

Add a function attribute, say __attribute__((clear_regs_on_return)) which when a thus annotated function returns will zero all callee owned registers and spill slots.

Seems reasonable, if insufficient. For compiler-guaranteed clearing, the annotated function would have to have other restrictions. (Can’t call external functions; can’t call local functions that aren’t also marked clear-on-return; can’t have optimizations do things like convert memset-style loops into memset calls.)

Then, all unused caller owned registers will be immediately cleared by the caller after return.

Seems completely wrong. Why should the caller clear its own registers? ABI says callee left them alone (or restored them) therefore they contain no sensitive state left behind by the callee. Or did I misunderstand this part of the spec?

–paulr

Seems reasonable, but I think you would need to zap the stack memory too, as well as the memory used for inner calls.

This would probably end up being an LLVM IR function attribute that gets handled in the backend.

* Russell Harmon <eatnumber1@google.com> [2014-09-12 02:30:39 +0000]:

I've been thinking about the issues with securely zero'ing buffers that
Colin Percival discusses in his blog article
<http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html>,
and I think I'd like to take a stab at fixing it in clang. Here's my
proposal:

Add a function attribute, say __attribute__((clear_regs_on_return)) which
when a thus annotated function returns will zero all callee owned registers
and spill slots. Then, all unused caller owned registers will be
immediately cleared by the caller after return.

while true that the abstract machine of c cannot make sure that there are
no lower level leaks (the lower layers are always allowed to hold on to
state at the wrong time and copy it somewhere that the abstract machine
cannot observe) there is a way to avoid information leaks in practice

instead of trying to figure out what are the possible leaks and using
workarounds for them (like volatile function pointer memset) just reexecute
the same code path the secret computation has taken, this is also useful for
verifying that the cryptographic computation was not miscompiled (which
happens and can have catastrophic consequences), this is the "self test trick"
the author seems to be unaware of although it is used in practice:

http://git.musl-libc.org/cgit/musl/tree/src/crypt/crypt_blowfish.c#n760

eg. this is the crypt code in musl libc contributed by Solar Designer

there are ways in which this can still break in theory but it works well
when the language is compiled ahead of time and there is no heavy runtime
(so the exact same code path is taken and the exact same state is clobbered
one just has to make sure that the "test" cannot be optimized away by the
compiler or not inlined with different choice for temporaries).
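The self-test trick described above can be sketched roughly as follows. This is not the musl code; toy_core is a hypothetical stand-in for a real cipher core, and the test vector and its expected value belong to this toy only. The idea is that the second call runs the same code path on public data, so it overwrites the registers and stack slots the secret computation used, and also checks that the computation produces a known answer. The noinline attribute and the volatile function pointer are the assumed means of keeping the compiler from eliding or specializing the second call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_LEN 4

/* Hypothetical toy computation; noinline so both calls share one body. */
__attribute__((noinline))
static uint32_t toy_core(const uint8_t key[KEY_LEN])
{
    uint32_t acc = 0;
    for (size_t i = 0; i < KEY_LEN; i++)
        acc = acc * 31u + key[i];
    return acc;
}

/* Reading a volatile pointer forces a real indirect call each time. */
static uint32_t (*const volatile toy_core_ptr)(const uint8_t *) = toy_core;

/* Returns 0 and stores the result on success, -1 if the self test fails. */
int toy_hash_with_self_test(const uint8_t key[KEY_LEN], uint32_t *out)
{
    static const uint8_t test_key[KEY_LEN] = { 1, 2, 3, 4 };
    const uint32_t test_expected = 31810u; /* toy_core(test_key), by hand */

    assert(key != NULL && out != NULL);

    uint32_t result = toy_core_ptr(key);      /* computation on the secret */
    uint32_t check  = toy_core_ptr(test_key); /* same path, public input */

    if (check != test_expected)
        return -1; /* miscompiled or corrupted: do not trust the result */
    *out = result;
    return 0;
}
```

Note that this only overwrites state that the second run happens to touch in the same way; as the message says, it can still break in theory.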

As for why, I'm concerned with the case where a memory disclosure
vulnerability exposes all or a portion of sensitive data via either spilled
registers or infrequently used registers (xmm). If an attacker is able to
analyze a binary for situations wherein sensitive data will be spilled,
leveraging a memory disclosure vulnerability it's likely one could craft an
exploit that reveals sensitive data.

in general a 'no info leak' attribute is hard to do (the proposed
semantics in the article are grossly underspecified)

the compiler cannot give strong guarantees: the state it is aware of
might not be everything (eg on a high level backend target where state
is handled dynamically, or timing related leaks) and it is hard to apply
recursively: if a function with such attr calls other functions which
also spill registers, then even the proposed "zero all used registers"
is problematic

what is probably doable is a non-recursive version (which still can be a
help to crypto code, but harder to use correctly). however i suspect even
that's non-trivial to specify in terms of the abstract machine of c

for recursive things i think the type system has to be invoked: eg a
sensitive type qualifier that marks state which the compiler has to
cleanup after.

however this whole issue is hard because it only matters if code already
invoked ub (otherwise the state left around is not observable by the
attacker), so probably this kind of hardening is entirely the wrong
approach and anything that deals with sensitive data should just be
completely isolated (priv sep, different process etc)

Add a function attribute, say __attribute__((clear_regs_on_return)) which when a
thus annotated function returns will zero all callee owned registers and spill
slots.

Seems reasonable, if insufficient. For compiler-guaranteed clearing, the
annotated function would have to have other restrictions. (Can't call external
functions; can't call local functions that aren't also marked clear-on-return;
can't have optimizations do things like convert memset-style loops into memset
calls.)

Then, all unused caller owned registers will be immediately cleared by the
caller after return.

Seems completely wrong. Why should the caller clear its own registers? ABI says
callee left them alone (or restored them) therefore they contain no sensitive
state left behind by the callee. Or did I misunderstand this part of the spec?

"zero all the callee-modified registers which are not return values" perhaps?

Jon

Add a function attribute, say __attribute__((clear_regs_on_return)) which when a
thus annotated function returns will zero all callee owned registers and spill
slots.

Seems reasonable, if insufficient. For compiler-guaranteed clearing, the
annotated function would have to have other restrictions. (Can't call external
functions; can't call local functions that aren't also marked clear-on-return;
can't have optimizations do things like convert memset-style loops into memset
calls.)

Suppose a function hardened with this attribute calls another which has not been hardened (like some libc function, for example). It might make sense to have the caller clear any stack slots & registers that are dead across the call. Though, as you say, this provides weaker guarantees than if every function called by a hardened one is also hardened.

Jon

I’m somewhat of a fan of Paul’s solution - disallowing calls to non annotated functions.

Would clearing the stack implicitly help all that much if the programmer has already properly cleared the sensitive data via a call to memset_s?

I was wrong in saying to clear the caller owned registers, although we should also clear all the argument registers on return.

CIL

I’ve been thinking about the issues with securely zero’ing buffers that
Colin Percival discusses in his blog article
<http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html>,
and I think I’d like to take a stab at fixing it in clang. Here’s my
proposal:

Add a function attribute, say __attribute__((clear_regs_on_return)) which
when a thus annotated function returns will zero all callee owned registers
and spill slots. Then, all unused caller owned registers will be
immediately cleared by the caller after return.

while true that the abstract machine of c cannot make sure that there are
no lower level leaks (the lower layers are always allowed to hold on to
state at the wrong time and copy it somewhere that the abstract machine
cannot observe) there is a way to avoid information leaks in practice

instead of trying to figure out what are the possible leaks and using
workarounds for them (like volatile function pointer memset) just reexecute
the same code path the secret computation has taken, this is also useful for
verifying that the cryptographic computation was not miscompiled (which
happens and can have catastrophic consequences), this is the “self test trick”
the author seems to be unaware of although it is used in practice:

http://git.musl-libc.org/cgit/musl/tree/src/crypt/crypt_blowfish.c#n760

eg. this is the crypt code in musl libc contributed by Solar Designer

I also wasn’t aware of this technique, although it’s making quite a few assumptions about the behavior of the compiler. Although an interesting technique, I’d prefer some better guarantees around clearing of hidden state.

Trying to avoid a philosophical debate, I understand this is a difficult problem, but I don’t think that means it’s not worthwhile to attempt.

there are ways in which this can still break in theory but it works well
when the language is compiled ahead of time and there is no heavy runtime
(so the exact same code path is taken and the exact same state is clobbered
one just has to make sure that the “test” cannot be optimized away by the
compiler or not inlined with different choice for temporaries).

As for why, I’m concerned with the case where a memory disclosure
vulnerability exposes all or a portion of sensitive data via either spilled
registers or infrequently used registers (xmm). If an attacker is able to
analyze a binary for situations wherein sensitive data will be spilled,
leveraging a memory disclosure vulnerability it’s likely one could craft an
exploit that reveals sensitive data.

in general a ‘no info leak’ attribute is hard to do (the proposed
semantics in the article are grossly underspecified)

the compiler cannot give strong guarantees: the state it is aware of
might not be everything (eg on a high level backend target where state
is handled dynamically, or timing related leaks) and it is hard to apply
recursively: if a function with such attr calls other functions which
also spill registers, then even the proposed “zero all used registers”
is problematic

I’m not trying to deal with every case. I’m specifically trying to deal with hardening in case of memory disclosure bugs. An attacker e.g. reading from the swap device directly is outside of the scope of this protection, as you require more than just a memory disclosure to exploit.

what is probably doable is a non-recursive version (which still can be a
help to crypto code, but harder to use correctly). however i suspect even
that’s non-trivial to specify in terms of the abstract machine of c

for recursive things i think the type system has to be invoked: eg a
sensitive type qualifier that marks state which the compiler has to
cleanup after.

I’m not clear on why disallowing calls to non-annotated functions from within an annotated function won’t handle these issues.

however this whole issue is hard because it only matters if code already
invoked ub (otherwise the state left around is not observable by the
attacker), so probably this kind of hardening is entirely the wrong
approach and anything that deals with sensitive data should just be
completely isolated (priv sep, different process etc)

Agreed, priv sep is another important feature to have when dealing with secure data, but I see this as a component of a defense-in-depth approach, and in my opinion saying that a program shouldn’t perform ub isn’t really a sound argument to begin with.

* Russell Harmon <eatnumber1@google.com> [2014-09-12 17:02:16 +0000]:

I'm somewhat of a fan of Paul's solution - disallowing calls to non
annotated functions.

considering the abstract machine the compiler is allowed to make
transformations that add new libc function calls in the code
which have no annotations

which is not what you want here, hence you need to be careful
how to specify the behaviour of the attribute

Would clearing the stack implicitly help all that much if the programmer
has already properly cleared the sensitive data via a call to memset_s?

as any other function in annex k memset_s depends on global state
in case of runtime-constraint violation which is, unlike ub, part
of the semantics of the function and hence users can rely upon

since constraint handler is global state it cannot be reasonably
set by a library so i would not recommend the use of annex k
functions in general (there are other problems with functions in
annex k but that's a different topic)
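For comparison, the "volatile function pointer memset" workaround mentioned earlier in the thread needs nothing from annex k. A minimal sketch, with memset_ptr and wipe_buffer as hypothetical names: because the pointer is volatile, the compiler has to load it and make the call, so it cannot assume the target is memset and drop the store as dead. This clears only the named buffer, not copies left in registers or spill slots, which is the gap the thread is about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name. The volatile qualifier keeps the compiler from
 * proving that the call target is memset and removing the call. */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

/* Zero len bytes at buf; does nothing for an empty buffer. */
void wipe_buffer(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len != 0)
        memset_ptr(buf, 0, len);
}
```

Typical use is `wipe_buffer(key, sizeof key);` immediately before the buffer goes out of scope.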

I'm not trying to deal with every case. I'm specifically trying to deal
with hardening in case of memory disclosure bugs. An attacker e.g. reading
from the swap device directly is outside of the scope of this protection,
as you require more than just a memory disclosure to exploit.

ok

> what is probably doable is a non-recursive version (which still can be a
> help to crypto code, but harder to use correctly). however i suspect even
> that's non-trivial to specify in terms of the abstract machine of c
>
> for recursive things i think the type system has to be invoked: eg a
> sensitive type qualifier that marks state which the compiler has to
> cleanup after.
>

I'm not clear on why disallowing calls to non-annotated functions from
within an annotated function won't handle these issues.

eg.

  struct s a = b;

will often translate to memcpy(&a, &b, size), ie a libc call

memcpy will not be annotated as 'cleanup the regs' and it
can clearly cause info leak

it would be annoying if the compiler did such transformation
and then failed to compile the code

so i think a bit more is needed than 'disallow unannotated calls'
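A small example of the case above, assuming a struct large enough that the compiler prefers a library call over inline moves. The source contains no function call at all, yet compiling it with something like `cc -O2 -S` will often show a call to memcpy. Whether that happens depends on the compiler, target, optimization level and struct size, so treat it as something to check rather than a guarantee. The names struct s and copy_secret are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Large enough that many compilers copy it with a memcpy call. */
struct s {
    unsigned char key[256];
};

/* No explicit call appears here, but the struct assignments below may be
 * lowered to memcpy, which carries no "clear the registers" annotation. */
void copy_secret(struct s *dst, const struct s *src)
{
    assert(dst != NULL && src != NULL);

    struct s a = *src; /* same form as "struct s a = b;" in the message */
    *dst = a;
}
```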

* Russell Harmon <eatnumber1@google.com> [2014-09-12 17:02:16 +0000]:

I'm somewhat of a fan of Paul's solution - disallowing calls to non
annotated functions.

Would clearing the stack implicitly help all that much if the programmer
has already properly cleared the sensitive data via a call to memset_s?

I was wrong in saying to clear the caller owned registers, although we
should also clear all the argument registers on return.

other issues with this function attribute approach:

on signals all registers are saved on the stack (or somewhere
else if sigaltstack was used)

so if you absolutely want to avoid info leak then you have to
remember to mask all signals (and disable thread cancellation)
(this affects the self-test trick as well)

a possible mitigation of this is (linux) kernel hardening:
make sigreturn always clean up after itself (i'm not sure if this
can break some tools though)

there are other minor issues if floating point arithmetic is used
(info leaking through fenv) but that's not much different than
writing static or thread local storage from the annotated function
(it's just less obvious)

and in my opinion saying that a program shouldn't perform ub isn't really a
sound argument to begin with.

you misunderstood the point i was trying to make:

if you extend c with new semantics that is only observable
when ub is invoked, you will have a hard time specifying it

Hi Russell,

I didn't realise that Colin had blogged about it. We've discussed it a few times in the past and I have an implementation of it that we are evaluating.

David

Hey David,

I’d love to discuss and/or have a look at your implementation. Szabolcs brings up some very good points about the difficulty of doing this correctly. Are you at the point where you’re willing to share your work?

Thanks,
Russ Harmon

It's in the CTSRD-CHERI LLVM / Clang trees on GitHub. It's quite MIPS / CHERI specific. The implementation is mostly in the back end and stores 0 to any stack slot that is used and invalidates any registers that are used. I did most of the implementation a year ago and haven't had a chance to get back to it. There are a few known issues that are fixable, but not quite implemented.

I described it to Colin at BSDCan, but as it's unpublished work and needs a detailed security evaluation I haven't yet written it up properly.

David

David,

I had a look at your llvm changes, but wasn’t able to find the clang ones. Some questions.

How do you handle the calling of non-zero-on-return functions from within a zero-on-return function?
How much of that work do you think is applicable to other architectures? I’m interested in working on an implementation that (starting with x86) will work across platforms. If you plan on revisiting your work on it though and trying to get it committed upstream, I won’t intrude.

How do you handle the calling of non-zero-on-return functions from within a zero-on-return function?

Currently, clang emits a warning when a function with __attribute__((sensitive)) calls one without. I have some stashed changes that try to ensure that all callee-save registers that are touched before a call are spilled to the stack and zero'd before the call, but it's probably not worth finishing it. __attribute__((sensitive)) also has to imply __attribute__((noinline)) (or the sensitive attribute be propagated into all callers where it's inlined), or there's the potential for information leaks.

How much of that work do you think is applicable to other architectures? I'm interested in working on an implementation that (starting with x86) will work across platforms.

Only the front-end changes really. Most of the work is in the back end, which has to identify which registers are live, which stack slots are used, and zero them.

If you plan on revisiting your work on it though and trying to get it committed upstream, I won't intrude.

Now that Colin has blogged about it and there's been a lot of public discussion, it's probably much more difficult to get it published, which reduces my motivation to finish it a bit. I will probably try to find an interested student to work on it this term, but I'm happy to collaborate on the security evaluation if someone else wants to take the lead on doing a proper implementation.

For our architecture, it's a bit more important because our capability registers contain rights to memory (rather than just data) and these don't want to accidentally end up on the stack where an uninitialised variable in a later function might now suddenly grant the rights to access a chunk of memory (rather than trapping if you try to use it as a pointer).

David

* David Chisnall <David.Chisnall@cl.cam.ac.uk> [2014-09-16 08:53:07 +0100]:

Now that Colin has blogged about it and there's been a lot of public discussion, it's probably much more difficult to get it published ...

heh i didnt know the idea was supposed to be news, but i think
an implementation would be

the register zeroing came up a lot before, eg. just this year
i could find this one (where alan cox points out the leak in
simd registers):

https://plus.google.com/111049168280159033135/posts/YTDoSRTrktc

the self-test trick i mentioned was invented to address such issues

http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/glibc/crypt_blowfish/crypt_blowfish.c.diff?r1=1.22;r2=1.23

and in most related discussions someone goes 'if only the compiler
did this for us..' with various approaches (using a function
attribute is not unheard of)