Does inline assembly or a memory barrier disable optimizations in a function?

GCC disables optimizations in a function if it encounters inline
assembly. That means we can use a memory barrier to ensure dead stores
are not optimized out:

    delete m_ptr;
    m_ptr = NULL;

    __asm__ __volatile__ ("" ::: "memory");

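A minimal compilable sketch of the pattern I have in mind (the class and
member names are just placeholders):

    #include <cstddef>

    // Sketch only; the class and member names are placeholders.
    struct Holder {
        int *m_ptr = nullptr;

        void reset() {
            delete m_ptr;
            m_ptr = NULL;

            // Compiler-level memory barrier intended to keep the store alive.
            __asm__ __volatile__ ("" ::: "memory");
        }
    };
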
Clang defines __GNUC__, but it's not clear to me if the same behavior
is present. Looking at Clang's Language Compatibility page, I don't see
any discussion of the behavior.

Does inline assembly or a memory barrier tame the optimizer in a
function so that dead stores are not removed?

Thanks in advance.

That construct is just a memory barrier that applies to escaped objects. Dead stores to unescaped locals for example will be removed during SSA conversion.
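
For illustration, a sketch of what that means in practice (the function
names here are made up): the clobber only constrains stores to objects
the compiler treats as escaped.

    #include <cstddef>
    #include <cstring>

    // Sketch only; function names are made up.
    void zero_escaped(char *buf, std::size_t len) {
        // 'buf' came from the caller and has escaped, so the memset
        // followed by the memory clobber is expected to survive.
        std::memset(buf, 0, len);
        __asm__ __volatile__ ("" ::: "memory");
    }

    void zero_local() {
        // 'key' never escapes, so its dead stores can still be removed
        // during SSA conversion, barrier or not.
        char key[16];
        std::memset(key, 0, sizeof(key));
        __asm__ __volatile__ ("" ::: "memory");
    }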

> That construct is just a memory barrier that applies to escaped objects.
> Dead stores to unescaped locals for example will be removed during SSA
> conversion.

OK, thanks. That should probably be documented because it's different
behavior from GCC.

It's important information for someone writing a zeroizer for
compliance reasons, like FIPS 140. (It might even cause a
security-related defect.)

Jeff

> That construct is just a memory barrier that applies to escaped objects.
> Dead stores to unescaped locals for example will be removed during SSA
> conversion.
>
> OK, thanks. That should probably be documented because it's different
> behavior from GCC.

GCC’s behavior is the same: https://goo.gl/R5AMDG

please see https://llvm.org/bugs/show_bug.cgi?id=15495

Thanks Richard.

That's actually quite interesting. I recently inquired (again) about
this topic on the GCC mailing list, and one of the GCC devs told me to
use it.

The recommendation grew out of a conversation on trying to use
volatile to tame the optimizer modulo Undefined Behavior (casting
tricks to/from T and volatile T) and Ian Lance Taylor's blog post on
the volatile keyword ("Airs – Ian Lance Taylor » volatile"). Taylor's
post is important because GCC will optimize away the stores to the
volatile objects _on occasion_, so it's _not_ a complete remediation.
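
The idiom in question looks roughly like this (just a sketch; as noted
above, not a complete remediation):

    #include <stddef.h>

    // Sketch of the volatile-cast zeroizer idiom; compilers have been
    // observed to drop such stores on occasion, so this is not a
    // complete remediation.
    static void zeroize(void *ptr, size_t len) {
        volatile unsigned char *p = (volatile unsigned char *)ptr;
        while (len--)
            *p++ = 0;
    }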

Complete remediations are important in security engineering. See, for
example, Saltzer and Schroeder's "The Protection of Information in
Computer Systems"
(https://www.acsac.org/secshelf/papers/protection_information.pdf).

Jeff

(resending since I accidentally used the old list address)

I personally have this definition in headers I use for baremetal programming:

// compiler write barrier, limited to specified object
template< typename T > __attribute__((always_inline))
static inline void write_barrier( T const &target ) {
        asm volatile ( "" :: "m"(target) );
}

Using such a barrier on the buffer after the memset should guarantee
it will not get eliminated. As mentioned in the bug thread, using
"r"(&target) will not work, since that only makes the asm block depend
on the pointer value and not on the pointee (this is documented
behaviour of GCC). It does, however, make the pointer "escape", hence
following it (or combining it) with a memory clobber works. I
generally prefer targeted barriers like the one above over a general
memory clobber, though.
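
For example, a usage sketch (the buffer name and size are arbitrary,
and it assumes the write_barrier definition above is in scope):

    #include <cstring>

    // Usage sketch: scrub the buffer, then pin the stores with the
    // targeted barrier above.
    void handle_secret() {
        unsigned char key[32];
        // ... use key ...
        std::memset(key, 0, sizeof(key));
        write_barrier(key);   // asm consumes the bytes of 'key', not just its address
    }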

I think you could replace the "m" constraint with an "X" constraint to
avoid allocating the target in memory if it otherwise would have been
kept in a register. However, in that case I can imagine it's also
possible that even if the data was previously stored in memory, the
memset (since it fully overwrites the target and no pointer to it has
escaped yet) effectively makes a new allocation for the target which
may be in registers, and hence memset + barrier will only affect those
registers and leave the data previously stored in memory intact.
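
Roughly, that variant would look like this (the name value_barrier is
made up; the caveat above applies):

    // Variant sketch using "X" instead of "m". Per the caveat above,
    // this may only pin whatever location currently holds 'target' and
    // can leave stale copies in memory untouched.
    template< typename T > __attribute__((always_inline))
    static inline void value_barrier( T const &target ) {
        asm volatile ( "" :: "X"(target) );
    }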

This actually shows there's a more general fatal flaw to these
approaches: the compiler is free to have transiently stored the data
elsewhere, and there's no way to find or erase such locations in plain
C augmented with asm-barriers. In particular it may leave potentially
sensitive values in registers, which can subsequently get written to
memory on task switch. (Especially if the crypto code uses registers
not used by the calling application, e.g. Neon-optimized crypto
algorithms).

I don't think there's any architecture-independent way out of this
situation without something like an __attribute__((confidential)) to
instruct the compiler to diligently avoid leaving copies of the data
in locations that are invisible to the programmer model.

In the meantime, the only solution I see that has even a remote chance
of being reliable is a tiny bit of architecture-dependent wrapper code,
written in assembly, that calls the crypto code and then clears all
caller-saved registers (except those used for the return value) and the
stack it used. Figuring out how much of the stack to clear would still
be a challenge, though.
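
To make that concrete, a rough x86-64 (System V) sketch of just the
register-clearing part (the function name is made up; it only covers
the integer caller-saved registers and does not deal with vector
registers or the stack):

    // Rough x86-64 SysV sketch: zero the integer caller-saved registers
    // after a sensitive call that returns nothing in registers (rax is
    // cleared too). A real version would live in a .S wrapper, also
    // clear vector registers, and handle the stack; none of that is
    // attempted here.
    static inline void clear_scratch_gprs(void) {
        __asm__ __volatile__ (
            "xorq %%rax, %%rax \n\t"
            "xorq %%rcx, %%rcx \n\t"
            "xorq %%rdx, %%rdx \n\t"
            "xorq %%rsi, %%rsi \n\t"
            "xorq %%rdi, %%rdi \n\t"
            "xorq %%r8,  %%r8  \n\t"
            "xorq %%r9,  %%r9  \n\t"
            "xorq %%r10, %%r10 \n\t"
            "xorq %%r11, %%r11 \n\t"
            ::: "rax", "rcx", "rdx", "rsi", "rdi",
                "r8", "r9", "r10", "r11", "cc"
        );
    }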