Implement `memset_explicit`

cor3ntin · March 5, 2024, 6:30pm

My recollection of discussion in both WG21 and WG14 is that there was a desire that a high-quality implementation would flush cache lines.
There is discussion of that in the WG21 paper secure_clear.

It might be worth asking WG14 to add a better description of their intent to the wording, even if only non-normatively.

@AaronBallman

AaronBallman · March 5, 2024, 7:10pm

To the greatest extent possible, subject to QoI.

FWIW, I agree in general with what Jens is saying (specific details may have different nuances, of course). That matches the discussions we had about this feature in WG14.

The abstract machine in C and C++ makes this highly unlikely. We spent a significant amount of committee time trying to find a reasonable way to specify this function within the bounds of the abstract machine. As soon as you start talking about things like cache lines, even non-normatively, you run into people saying “well my DSP has no cache lines”, etc. And technology is always changing, so whatever we say about today’s latest optimization levers will be outdated in a few years when a new lever comes along.

The security aspects of this function are left up to QoI and the committee clearly stated our intent: we want a version of memset that is as secure as can be made for whatever architecture you care about. How implementations go about deciding what is or isn’t secure enough for them is a matter for discussion, and blog posts telling people things to keep in mind are useful to that process.

When working out the design for this function, the primary question is: can the security of calling this function be defeated by the optimizer? If the answer is yes, then the design is wrong. Other questions, like whether cache lines should or should not be flushed, depend on the target and are left to the implementation, but the expectation is that the function be secure, not fast. If any given tradeoff makes things faster but less secure, it’s a bad tradeoff for this function.
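To make the optimizer concern concrete, here is a minimal sketch of the pattern that motivates the function. The helper name `toy_checksum` and the buffer size are made up for illustration; whether a given compiler actually removes the final `memset` depends on the compiler and optimization level.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies a secret into a stack buffer, computes a
   toy checksum over it, then tries to wipe the buffer with plain memset. */
unsigned toy_checksum(const char *secret, size_t len) {
    char buf[64];
    if (len > sizeof buf)
        len = sizeof buf;
    memcpy(buf, secret, len);

    unsigned sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum = sum * 31u + (unsigned char)buf[i];

    /* buf is never read again before it goes out of scope, so under the
       as-if rule an optimizer is allowed to treat this store as dead and
       delete it, leaving the secret bytes on the stack. */
    memset(buf, 0, sizeof buf);
    return sum;
}
```

The return value is the same whether or not the final store is kept, which is exactly why the compiler is permitted to drop it.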

FWIW, most of the interesting meeting minutes are at https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2767.pdf (see 5.9 secure_clear N2631) and https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3036.pdf (see 5.22 memset_explicit N2897) in case those are of help to folks. I can also dig out my personal meeting notes on the discussions if the public minutes lead to questions.

AaronBallman · March 5, 2024, 7:18pm

Btw, N3096 is the last publicly available draft for C2x. However, be aware that the final version will have quite a few technical differences from that draft because of NB comment resolutions that were applied before publication. You might want to look at an early draft of C2y just to be sure nothing changed.

nickdesaulniers · March 5, 2024, 9:20pm

Is an implementation that calls sleep(10); then powers off the machine conforming? The “sensitive information stored in the object” has been made “inaccessible.” Mission accomplished.

But that’s not what the standard says, as written.

The purpose of this function is to make sensitive information stored in the object inaccessible. 367)

That literally doesn’t mean anything. Is the destination buffer supposed to be unreadable? What does memset or memset_explicit have to do with that?

367) The intention is that the memory store is always performed (i.e. never elided), regardless of optimizations. This is in contrast to calls to the memset function (7.26.6.1).

That’s the only semantically useful requirement for implementers, and the compiler barrier satisfies that. This is a useful semantic point with clear meaning to implementers.
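For reference, a minimal sketch of the "memset plus compiler barrier" approach might look like the following. It assumes a GCC- or Clang-compatible compiler (the empty inline asm is a GNU extension, not ISO C), and the name `sketch_memset_barrier` is hypothetical; this is not the code of any particular libc.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; a sketch only, assuming GNU-style inline asm. */
void *sketch_memset_barrier(void *dest, int ch, size_t count) {
    memset(dest, ch, count);
    /* Empty asm that takes dest as an input and declares a "memory"
       clobber. The compiler must assume the asm may read the bytes just
       written, so it cannot delete the preceding stores as dead. This
       emits no instructions and does nothing about caches, registers,
       or other copies of the data. */
    __asm__ __volatile__("" : : "r"(dest) : "memory");
    return dest;
}
```

Note that this only addresses the "never elided" requirement from the footnote; anything beyond that would need target-specific code.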

But now we have requirements coming out of thin air based on committee discussions. Satisfying these requirements adds significant complexity to implementations, and still doesn’t end up being useful to actual users of non-C abstract machines. If no implementation performs these additional side-channel mitigations (as is currently the case, because the standard doesn’t say they have to), then users cannot portably rely on these non-standard behaviors.

Worse, I suspect we’ll get vulnerabilities reported for not implementing every trick in the book to ensure zeroing. (The man page for explicit_bzero has a funny anecdote on how explicit_bzero can actually decrease security.)

As specified, users cannot simply call memset_explicit and rely on cache clearing, memory ordering, or any of the other QoI suggestions from the blog cited earlier.

Consider the example of a library author that would like to rely on memset_explicit to provide guarantees around cache clearing (and/or memory ordering).

memset_explicit(dest, ch, count);

When their library is linked against libc-a.so, which provides such QoI additions in its implementation of memset_explicit, everything works great. Now their library gets linked against libc-b.so, perhaps on a different platform, which doesn’t have such QoI additions in its implementation of memset_explicit (because the standard doesn’t say so). Perhaps that results in a security vulnerability or exploit. It seems that perhaps they would have been better off with just:

memset(dest, ch, count);
asm(""::"r"(dest):"memory");
arch_specific_cache_flush_not_provided_by_libc(dest);
atomic_signal_fence(memory_order_seq_cst);

That inline asm is not ISO C, and I suspect that whatever arch_specific_cache_flush_not_provided_by_libc does MUST also be implemented using non-ISO C extensions. So as long as WG14 adheres strictly to referencing only ISO C and the C abstract machine, I don’t see how they can compel implementations to implement specific semantics for machines which are unspecified. As such, users cannot rely on semantics that WG14 members wish these functions had but did not specify/standardize, and instead must use non-standard language extensions in order to get the fine-grained guarantees they desire for their target platform. As such, having such additions to the standard library is questionable.

If that’s what you meant, then you should have ~~put a ring on it~~ put it in the spec.

atomic_signal_fence is standard ISO C, so why was it not made explicit in the spec for the implementation of memset_explicit, rather than in out-of-band blog posts?!

jyknight · March 6, 2024, 6:05pm

I don’t think the argument from the POV of “what’s the least the standard might possibly let us get away with” is particularly helpful. From my POV, we should strive to make this function useful for its intended purpose; if that means going beyond the strict requirements of the spec, that’s fine.

But, despite disagreeing with the argument, I agree with the conclusion. Again, (from what I can tell – counterexamples welcome!), it seems that nobody who’s implemented this functionality in their own code has thought that doing a cache-flush was important. Libraries like OpenSSL certainly don’t shy away from using asm if required, so if they thought it was important, I would expect they’d have done it! Same goes for projects like OpenBSD, who historically have been all about security hardening and invented explicit_bzero…

If all these projects are wrong, and we really do need a more strict view of what this function should do, I think there needs to be better rationale as to why. Otherwise, it does seem best to follow the precedent, and simply forbid the compiler from deleting “dead” stores.
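As an illustration of what "forbid the compiler from deleting dead stores" can look like without any compiler extensions, here is a minimal sketch that writes through a volatile-qualified pointer. The name `sketch_clear_volatile` is hypothetical, and this is not claimed to be how any of the projects mentioned above do it. It relies on compilers treating every access through a volatile lvalue as a side effect that must be performed, which mainstream compilers do in practice.

```c
#include <stddef.h>

/* Hypothetical name; a sketch only. Writes count bytes of value ch
   through a volatile pointer so each store counts as an observable
   side effect and is not removed as a dead store. */
void *sketch_clear_volatile(void *dest, int ch, size_t count) {
    volatile unsigned char *p = (volatile unsigned char *)dest;
    while (count--)
        *p++ = (unsigned char)ch;
    return dest;
}
```

The tradeoff is speed: a byte-at-a-time volatile loop is typically slower than an optimized memset, which fits the "secure, not fast" expectation discussed earlier in the thread. It makes no attempt at cache flushing.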

AaronBallman · March 6, 2024, 6:49pm

If that’s the QoI you want, yes, that conforms. I don’t think that’s the QoI we want for our implementation though.

I read that as saying that whatever values were stored in the byte array that’s being overwritten should no longer have their same values when the call exits, modulo values which don’t change (e.g., if there’s already a zero byte in the array, overwriting with a zero should still lead to the value zero). In other words, the semantic effect of the memset cannot be as-if’ed away; the function call itself is sort of like a volatile access. “Inaccessible” as in “you can’t reconstitute the original data from the final data.”

Absolutely correct. Users cannot rely on anything other than QoI here – the committee is assuming that implementers won’t be malicious (which, frankly, is a dangerous assumption because the whole reason we needed this interface at all is because of aggressive optimizations that are sometimes indistinguishable from malice when viewed under the lens of security rather than performance).

All this said, if you have concrete wording improvement suggestions you’d like me to bring to WG14, I’d be more than happy to work with you on the paper and championing it in the committee.

Strongly agreed.

Yeah, I am carefully not expressing an opinion on cache flushing behavior because that level of security is enough outside of my wheelhouse that I think the call should be made by people closer to the problem. The paper in C has “further issues and improvements” which mention caches among other considerations. But again, the goal here is “do your best at ensuring secrets don’t leak” even if that’s slow and even if that’s not consistent across targets you support. The intent is portable, the interface is portable, the implementation details are not portable.

SchrodingerZhu · March 7, 2024, 3:22pm

As updated in llvm/llvm-project pull request #83577, “[libc][c23] add memset_explicit”, what we will do for now is implement only the minimum that the standard requires. However, we may offer options such as LIBC_ENABLE_HARDENING in the future.
