[RFC] Bounds-checking interfaces for LLVM libc

Context

Annex K of C11, “Bounds-checking interfaces”, introduced a set of new, optional functions into the standard C library with the goal of mitigating the security implications of a subset of buffer overflows in existing code. It adds new functions to a broad set of libc headers, including <string.h>, <stdio.h>, <stdlib.h>, and others.

Proposal

We propose to implement the interface by reusing the logic of LLVM libc’s existing functions, but without depending on those functions directly. The common parts of the implementation would be factored out into internal utilities shared between the existing code and the newly added interface. This reduces code duplication and, at the same time, avoids conflicts with symbols of the same name coming from other sources (for example, dynamically linked libraries that define the same symbols, such as glibc or libgcc).

For example, strncpy and strcpy_s share some of the same logic, so that logic is moved into an internal utility, internal::sized_string_copy, which holds the copying logic and is used by both functions.

strcpy_s     strncpy

    |           |

internal::sized_string_copy
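
To make the layering above concrete, here is a minimal sketch of how both entry points could sit on top of one helper. The helper signature, the `_sketch` function names, the `kRsizeMax` constant, and the specific error codes (EINVAL, ERANGE) are assumptions for illustration, not the actual LLVM libc code. Overlap detection and the constraint-handler call are left out to keep it short.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>

namespace sketch {

using errno_t = int;          // Annex K specifies errno_t as int.
using rsize_t = std::size_t;  // Annex K specifies rsize_t as size_t.

// Assumption: RSIZE_MAX is implementation-defined; SIZE_MAX >> 1 is one choice.
constexpr rsize_t kRsizeMax = SIZE_MAX >> 1;

namespace internal {
// Hypothetical shared helper: copies characters from src to dst until the
// null terminator is reached or `max` characters have been copied.
// Returns the number of non-null characters copied; a return value equal
// to `max` means no terminator was found within the first `max` characters.
std::size_t sized_string_copy(char *dst, const char *src, std::size_t max) {
  std::size_t i = 0;
  for (; i < max && src[i] != '\0'; ++i)
    dst[i] = src[i];
  return i;
}
} // namespace internal

// strncpy semantics: copy at most n characters, pad the rest with nulls.
char *strncpy_sketch(char *dst, const char *src, std::size_t n) {
  std::size_t copied = internal::sized_string_copy(dst, src, n);
  for (std::size_t i = copied; i < n; ++i)
    dst[i] = '\0';
  return dst;
}

// strcpy_s semantics (simplified): validate, copy, always terminate.
// A real implementation would also call the runtime-constraint handler.
errno_t strcpy_s_sketch(char *dst, rsize_t dmax, const char *src) {
  if (dst == nullptr || dmax == 0 || dmax > kRsizeMax)
    return EINVAL;
  if (src == nullptr) {
    dst[0] = '\0';
    return EINVAL;
  }
  rsize_t copied = internal::sized_string_copy(dst, src, dmax);
  if (copied == dmax) {  // source plus terminator does not fit
    dst[0] = '\0';
    return ERANGE;
  }
  dst[copied] = '\0';
  return 0;
}

} // namespace sketch
```

Neither public function calls the other in this arrangement, so each can be linked without pulling in the other symbol.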

Potential Steps

  • Annex K interface functions will be available only if the user defines __STDC_WANT_LIB_EXT1__ as 1 before including the relevant header. The standard assigns meaning only to the values 0 and 1, so only the value 1 is checked to mark the interface as available. The types, functions, and macros introduced by the interface won’t be declared unless the user-defined macro is present and equal to 1.
  • A new type, errno_t, is introduced as the return type of the interface functions that report errno values. Besides returning the error code, the specification leaves setting errno itself to the implementation; for compatibility, we propose to also set errno to match the expected behavior.
  • For runtime-constraint handlers, the standard specifies a single process-global handler and says nothing about thread-local handlers (which is a problem for multithreaded applications). We propose to adhere to the standard and implement a process-global runtime-constraint handler, set by the set_constraint_handler_s function. This function returns the previously registered handler; passing NULL installs the implementation’s default handler.
  • Factor the logic common to existing functions and Annex K interface functions into shared utilities, and use them inside dependent functions as needed.
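
As a rough illustration of the first three steps, the sketch below gates its declarations on the opt-in macro, defines errno_t, and keeps one process-global handler in an atomic. Everything inside namespace `sketch` (the function names, the do-nothing default handler, the `report_violation` helper, the recording handler) is hypothetical; the default handler in particular is implementation-defined in the standard.

```cpp
// The user opts in before including any libc header.
#define __STDC_WANT_LIB_EXT1__ 1

#include <atomic>
#include <cerrno>

// Header-side gate: only the value 1 enables the declarations.
#if defined(__STDC_WANT_LIB_EXT1__) && (__STDC_WANT_LIB_EXT1__ == 1)

namespace sketch {

using errno_t = int;  // Annex K specifies errno_t as int.
using constraint_handler_t = void (*)(const char *msg, void *ptr,
                                      errno_t error);

// Assumption: the default handler simply returns to the caller.
void ignore_handler(const char *, void *, errno_t) {}

// One handler for the whole process; atomic so that concurrent
// set/report calls do not race on the pointer itself.
std::atomic<constraint_handler_t> global_handler{&ignore_handler};

// Returns the previously registered handler. A null argument installs
// the default handler, mirroring set_constraint_handler_s.
constraint_handler_t set_constraint_handler(constraint_handler_t handler) {
  if (handler == nullptr)
    handler = &ignore_handler;
  return global_handler.exchange(handler);
}

// Hypothetical helper used by interface functions on a violation:
// call the handler, then set errno and return the same code.
errno_t report_violation(const char *msg, errno_t error) {
  global_handler.load()(msg, nullptr, error);
  errno = error;
  return error;
}

// Example user handler that records the last error it was given.
errno_t last_reported = 0;
void recording_handler(const char *, void *, errno_t error) {
  last_reported = error;
}

} // namespace sketch

#endif
```

A caller would register `recording_handler` once at startup and then check both the return value and errno after each call.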

Further Considerations

The current specification of constraint handlers is limiting for multithreaded applications, and the standard has not yet settled on an API to deal with this. Existing implementations such as MSVC depart from the standard and introduce different APIs to deal with thread-local and process-global handlers.

I suggest adding a new API to our implementation (since this is unspecified in the standard), named set_thread_constraint_handler_s, to manage per-thread handlers. The suggested behavior is for a new thread either to inherit the thread-local handler of its parent thread on creation, or to inherit the global constraint handler.
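
One possible shape for the "inherit the global handler" option is sketched below. A thread-local pointer starts out null in every new thread, and a null value means "use the process-global handler". All names here are hypothetical, and the alternative of inheriting from the parent thread is not shown because it would need a hook at thread creation.

```cpp
#include <atomic>

namespace sketch {

using errno_t = int;  // Annex K specifies errno_t as int.
using constraint_handler_t = void (*)(const char *msg, void *ptr,
                                      errno_t error);

// Two example handlers that leave a distinct tag so they can be told apart.
int last_tag = 0;
void default_handler(const char *, void *, errno_t) { last_tag = 0; }
void handler_a(const char *, void *, errno_t) { last_tag = 1; }
void handler_b(const char *, void *, errno_t) { last_tag = 2; }

// Process-global handler, shared by all threads.
std::atomic<constraint_handler_t> global_handler{&default_handler};

// Per-thread override. Null in every new thread, which here means
// "inherit the global handler".
thread_local constraint_handler_t thread_handler = nullptr;

constraint_handler_t set_global_constraint_handler(constraint_handler_t h) {
  if (h == nullptr)
    h = &default_handler;
  return global_handler.exchange(h);
}

// Returns the previous per-thread override (possibly null).
// Passing null removes the override for the calling thread.
constraint_handler_t set_thread_constraint_handler(constraint_handler_t h) {
  constraint_handler_t prev = thread_handler;
  thread_handler = h;
  return prev;
}

// The handler an interface function would call on the current thread.
constraint_handler_t effective_handler() {
  constraint_handler_t local = thread_handler;
  return local != nullptr ? local : global_handler.load();
}

} // namespace sketch
```

With this layout, changing the global handler is immediately visible to every thread that has not installed its own override.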

Hi! While Annex K is still in the C standard, it’s worth reading N1967 — Field Experience With Annex K — Bounds Checking Interfaces . One of the co-authors of this paper is Carlos O’Donell, one of the glibc maintainers.

A key part of the paper:

Despite more than a decade since the original proposal and nearly ten years since the ratification of ISO/IEC TR 24731-1:2007, and almost five years since the introduction of the Bounds checking interfaces into the C standard, no viable conforming implementations has emerged. The APIs continue to be controversial and requests for implementation continue to be rejected by implementers.

Do you have new information? If not, I think we should follow the lead of folks who’ve studied this and not implement Annex K.

The standard committee decided to retain the Annex K functions in C23 as an optional part of the standard.

Whatever the standard specifies should appear in the implementation, regardless of how “satisfying” the specification is. That’s always my argument: anything in the standard should be in the implementation (ideally 1-to-1).

This is not an implementation problem but a “satisfaction” problem. The discussion should therefore be about how to make such APIs more satisfying, or how to modify the standard to be more useful, which is a matter for the standard rather than for the implementation.

FWIW, that was not a compelling paper. Nobody ever suggested rewriting working code to use new APIs for these interfaces; of course that introduces bugs. That paper was ultimately not sufficiently compelling in WG14 to withdraw the annex and I think it’s even less compelling as a reason to not support the Annex at all.

I support adding Annex K. We’ve had previous user requests for this, both in Discourse and in the issue tracker (from different folks), and it is a reasonably popular feature on Windows (though their implementation is nonconforming because they got bitten by being early adopters of the functionality before it was fully ratified by WG14). Microsoft warns about use of non-Annex K interfaces at some (non-maximal) warning levels: Compiler Explorer

I’m in favor of memory safety, and if someone’s interested in doing the work to implement Annex K I’m not opposed to it. Given that other Linux libcs don’t provide these functions, it may be a useful point of differentiation for LLVM-libc.

On the other hand if there are no existing compliant implementations I’d like to know more about why. Is it just that the Windows API was defined before the standard was complete and they don’t want to break compatibility? If so, what is the difference? Alternately is there a flaw in the standard’s definition that needs to be fixed before a conforming implementation is practical?

Additionally, this is an optional part of the standard, so a compliant implementation doesn’t need to provide it. The end goal is not the most standards-compliant libc, but the libc that’s the most useful for our users. If there are existing users who want this then we can prioritize it, but if this is just for hypothetical users it might be best to focus on other things first.

There are some, such as slibc (sbaresearch/slibc on GitHub), described as an implementation of the C11 Annex K “Bounds-checking interfaces” (ISO/IEC 9899:2011).

Microsoft was an early adopter of Annex K and implemented it before C11 was finalized. WG14 made some modifications very late in the standard cycle and Microsoft was not able to implement those modifications due to ABI issues given that they already had a shipping implementation. That’s why they “don’t conform”. To wit, the differences are:

  • _set_invalid_parameter_handler instead of set_constraint_handler_s
  • No abort_handler_s or ignore_handler_s, nor memset_s (this is required for C23, so I expect they’ll implement it regardless of Annex K)
  • No RSIZE_MAX
  • They do not define __STDC_LIB_EXT1__ to claim a conforming implementation nor require use of __STDC_WANT_LIB_EXT1__ to get the Annex K interfaces.

AFAIK, that’s the full list of differences, but if it’s useful I can go through the Annex and compare it with Microsoft’s headers in more detail.
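
One practical consequence of the last bullet is that portable code cannot assume __STDC_LIB_EXT1__ is defined, even on a platform that ships most of the functions. A minimal sketch of how user code might probe for a conforming implementation and fall back otherwise is below. The `has_annex_k` constant and the `secure_clear` wrapper are hypothetical names, and the volatile-loop fallback is a common idiom rather than something taken from this thread.

```cpp
// Opt in before the first inclusion of <string.h>.
#define __STDC_WANT_LIB_EXT1__ 1

#include <cstddef>
#include <string.h>

namespace sketch {

// __STDC_LIB_EXT1__ is defined by the implementation only when it
// claims conformance to Annex K.
#ifdef __STDC_LIB_EXT1__
constexpr bool has_annex_k = true;
#else
constexpr bool has_annex_k = false;
#endif

// Zeroes `len` bytes of `buf`. Uses memset_s when the implementation
// advertises Annex K, otherwise writes through a volatile pointer so the
// stores are not optimized away. Returns true on success.
bool secure_clear(void *buf, std::size_t len) {
#ifdef __STDC_LIB_EXT1__
  return memset_s(buf, len, 0, len) == 0;
#else
  volatile unsigned char *p = static_cast<volatile unsigned char *>(buf);
  for (std::size_t i = 0; i < len; ++i)
    p[i] = 0;
  return true;
#endif
}

} // namespace sketch
```

On an implementation that matches the differences listed above, the fallback branch would be taken because the conformance macro is not defined.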

There are (at least) two flaws, one of which is actionable and the other of which is not.

  • The standard does not clarify whether the global constraint handler is thread-local or not or whether it needs to guard against race conditions. We can do whatever we think is best (Microsoft added an additional handler for setting thread-local constraint handlers).
  • Having a resumable constraint handler at all means you have to assume any call to a library interface will trigger a constraint violation which calls the handler, that handler may mess with every escaped pointer and global variable, and then resume to the caller. This is (IMO) a design flaw with Annex K, but is not a reason to not implement the annex (if we’re not implementing things based on design flaws, locales enters the room like the Kool Aid Man).

We have users who have asked for it:

I don’t think Annex K is the most critical thing for llvm-libc to support, but I think supporting it would be a good idea if someone is willing to do the implementation effort and maintain it.

First step in the implementation: [libc][stdio] Add fopen_s and bootstrap annex k. by bassiounix · Pull Request #152248 · llvm/llvm-project · GitHub

Active discussion on interface design: [libc] Annex K Design · Issue #156244 · llvm/llvm-project · GitHub

Hi all,

I want to add another data point for the usefulness of this. We at Arm have some users who require some of the functionality, particularly the functions in string.h. If this were implemented in LLVM libc, it’d greatly facilitate their adoption of it.