Security fail (memset being optimized away)

A few days ago Ilja van Sprundel held a talk at the 35C3 (Chaos Computer Club
annual convention) calling it "Memsad - why clearing memory is hard".
It can be found here:

or here:

The problem he described arises, when you have a buffer with sensitive content
(crypto key) and try to clear this from memory, before the buffer goes out of
scope:

#include <string.h>

void foo( int x ) {
     char buf[ 10 ];
     size_t i;
     for( i = 0; i < sizeof( buf ); ++i )
         buf[ i ] = x++;
// ...
     memset( buf, 0, sizeof( buf ) ); // rightfully removed by optimizer
}

He showed how this sort of code could be found in many implementations, thus
"leaking" sensitive information into memory (core dumps), which was intended
to be overwritten. He described the problematic of finding a portable solution
for a "forced memset".

Purely out of curiosity, this made me wonder about how to force the execution
of a function that would otherwise be optimized away - in a portable manner.
I came up with these unsophisticated lines:

#include <string.h>

void foo( int x ) {
     char buf[ 10 ];
     size_t i;
     for( i = 0; i < sizeof( buf ); ++i )
         buf[ i ] = x++;
// ...
     // Declare volvar as volatile pointer to function (void*, int, size_t ) returning void*:
     void* ( *volatile volvar )( void*, int, size_t ) = &memset;
     // Call memset from within this "black box":
     ( *volvar )( buf, 0, sizeof( buf ) );
}

The idea behind this is simple. "volvar" is a volatile variable. Therefore,
the compiler can make no assumptions on its contents at compile time. It
cannot remove the function call, which is based on that variable's content
(at that given time).

This code looks portable to me. I tested it on "Compiler Explorer"
( https://godbolt.org/ ). It compiles without warnings on all targets I
tested. Except for one, all contain the reference to memset and display the
call. The exception is the PowerPC platform in conjunction with CLANG
("power64le clang - trunk" and "powerpc64 clang - trunk". I'm guessing, that
this is running "some version" of memset (inlined?) as well, but references
to ".LC0@toc@ha" and such leave me puzzled.

Without wanting to disturb too much, I'm asking about your opinion.
Is this solution portable?
Does it assure that the function call will not be optimized away?
And - does the CLANG PowerPC version work as well?

Thank you very much and a happy new year!
  LC

Hi:

There's a discussion of this very issue here:

https://wiki.sei.cmu.edu/confluence/display/c/MSC06-C.+Beware+of+compiler+optimizations

-Paul

I do not know PPC assembly, however note that Godbolt WILL highlight (with color) which assembly is the result of each line of code. See here: Compiler Explorer

I note that changing the mem-set value is altering the load-immeidate line that is currently "li 4, 0". Perhaps this is just being inlined?

It appears that the compiler explorer doesn’t show the entire contents of the assembly. The missing part that matters is:

.LC0:
.tc memset[TC],memset

So the pertinent sequence to do the indirect call to memset (I won’t bother describing the concept of the TOC in excruciating detail, but it is a Table Of Contents that contains addresses to globals or in some case global values themselves):
addis 4, 2, .LC0@toc@ha # Load the bottom 16 bits of memset’s TOC entry
ld 4, .LC0@toc@l(4) # Load the high 16 bits of memset’s TOC entry
std 4, 40(1) # Store memset’s TOC entry to the stack
ld 12, 40(1) # Load the actual address of memset into R12
mtctr 12 # Move that value into the CTR (count register)
bctrl # Branch to address in the CTR

Hope that helps.
Nemanja

Please don't start a new thread for every reply, that makes reading the
list unncessary annoying.

A few days ago Ilja van Sprundel held a talk at the 35C3 (Chaos Computer Club
annual convention) calling it "Memsad - why clearing memory is hard".
It can be found here:

media.ccc.de - Memsad
or here:
https://www.youtube.com/watch?v=0WzjAKABSDk

This topic is neither new nor surprising. It is the result of a gray
area between desirable optimisations and hard scrubbing requirements by
implementations. The implementation generally used is to have a new
function ("explicitly_memset" or similar) and either implement it in
assembler or use one of various compiler-specific hacks. Memory barriers
like asm volatile ("":::"memory") for example are typically good enough
even with LTO for most GCC compatible compilers. Indirecting through a
volatile global function pointer is another approach, but comes with a
potentially noticable performance hit. It is the most portable solution
though.

Joerg

If I understand correctly what you’re looking for, this should be relevant: https://github.com/google/benchmark/blob/master/include/benchmark/benchmark.h#L309-L312

Best,

AM,

Why not just write an inline function which does after memset :-

if(*buf!=0) *buf=0;

For security, it’s better to actually verify every byte, sanity check. You never know, maybe programmer cleared wrong size.

Jonny.

This would only zero one byte, but it would get optimized away
as the buffer goes out of scope (= is being discarded).

This is C++ code (my example also works with C) and it uses the usual #ifdef
tree, if I see that correctly. That is, it's not ONE piece of portable code,
but lots of pieces for each different compiler/OS (which I wanted to avoid).

Maybe add an abort() ?

eg

inline void check_memset(void *s, int c, size_t n);
{
     const char * buf = (char*)buf;
     memset(2, 0, n);

     if(0 != *buf)
     {
         abort();
     }
}

Or use a for loop to verify all bytes are now 0.

Jonny

...
Maybe add an abort() ?

eg

inline void check_memset(void *s, int c, size_t n);
{
const char * buf = (char*)buf;
memset(2, 0, n);

 if\(0 \!= \*buf\)
 \{
     abort\(\);
 \}

}

I'm afraid, that won't cut it either. On the "Compiler Explorer"
website ( https://godbolt.org/ ) you can see that many compilers
implement "their own" inlined version of memset - especially
when you turn on "max optimizations" (-O3). The compiler might
simply decide to only clear the first byte as the rest is not
being access anyhow...

Or use a for loop to verify all bytes are now 0.

The compiler knows that the buffer has to be all zeros as it
knows, it just cleared it before. This is basically a more
complicated version of:
{
  int a = 0;
  if( a != 0 )
    abort();
}
This can never call abort and will therefore be removed
completely.

> ...

Maybe add an abort() ?

eg

inline void check_memset(void *s, int c, size_t n);
{
const char * buf = (char*)buf;
memset(2, 0, n);

 if\(0 \!= \*buf\)
 \{
     abort\(\);
 \}

}

I'm afraid, that won't cut it either. On the "Compiler Explorer"
website ( https://godbolt.org/ ) you can see that many compilers
implement "their own" inlined version of memset - especially
when you turn on "max optimizations" (-O3). The compiler might
simply decide to only clear the first byte as the rest is not
being access anyhow...

Did you verify that in practice with my code? (optimising out memset)

I've not not time to run tests on godbolt.org

Or use a for loop to verify all bytes are now 0.

The compiler knows that the buffer has to be all zeros as it
knows, it just cleared it before. This is basically a more
complicated version of:
{
int a = 0;
if( a != 0 )
abort();
}
This can never call abort and will therefore be removed
completely.

Compilers are not static analysers, they don't know when ram addresses were touched as far as I am aware. Do you have a source for this information?

If you don't want to use libc memset() if you feel it might be lost, write your own. Or put those functions in a library that is not optimised at all.

Whatever you do. verify how it looks after compilation to be sure. Share your results, I'm interested to hear.
Jonny

My usual go-to expression for "do not optimize this thing away" is
something like:

    asm volatile ("" : : "r"(thing));

which I've never used with memory, but "m" seems to work as well.

I'm not sure how portable this is. `asm` usually means it's not
portable, but notice that the assembly "code" is just a blank string,
with simply a mention of the thing you want to keep / keep the results
of.

It does seem to work ok with:

* clang, GCC, old/new versions
* x86-64, arm (32bit), arm64
* C11 and C++17
* local buffers of known size and scope. (References to "outside"
memory work with memset with or without this.)

Code:

    int test1() {
        char buf[128];
        memset(buf, 0, sizeof(buf));
        __asm volatile ("" : : "m"(buf));
        return 0;
    }

If this still doesn't cut it, try poking around in Folly's
implementation of "doNotOptimizeAway", which was created for just this
type of thing. (It's buried in the benchmarking code, and not a
public library function, but you can at least see how it's done
there).

https://github.com/facebook/folly/blob/master/folly/Benchmark.h

> The compiler knows that the buffer has to be all zeros as it
> knows, it just cleared it before. This is basically a more
> complicated version of:
> {
> int a = 0;
> if( a != 0 )
> abort();
> }
> This can never call abort and will therefore be removed
> completely.

Compilers are not static analysers, they don't know when ram addresses
were touched as far as I am aware. Do you have a source for this
information?

For the simple 'int a = 0;' case, it's a well understood constant-value
propagation optimization. The value of 'a' is known to be zero, there
are no other assignments to 'a', therefore we can replace uses of 'a'
with '0', which makes 'if (a != 0)' become 'if (0 != 0)' which is
unconditionally false, so the code under it is unreachable, so poof.

A static analyzer would report the always-false condition; the compiler
just optimizes it away.

It requires slightly more smarts to do the same thing for pointed-to
values, but only slightly.

FTR, I was a security guy fighting this same battle 20 years ago, and
compilers have only gotten more clever in the meantime. We generally
used 'volatile' and bought the performance hit.
--paulr