[RFC] Linux Syscall Cleanup

Currently, Linux syscalls are handled inconsistently, leading to confusion. This design intends to unify how Linux syscalls are handled and make them safer and easier to use. This design is not intended to touch any other OS interface layers; those will be handled in a separate RFC.

Context

Objective

Linux syscalls should be done in a safe, consistent manner in LLVM-libc. For each syscall, there should be exactly one place where the raw syscall function is invoked.

Background

Currently, raw syscalls are used in most places where a syscall is needed. This is mostly in the syscall wrapper functions, but other functions that need access to low-level utilities also call the raw syscall directly, as shown in the diagram.

Calling the raw syscall from many places leads to duplication of logic, since many syscalls actually have multiple possible numbers. For example, mmap can be handled by SYS_mmap or SYS_mmap2. The raw syscall interface also doesn’t have any type checking, so each use is another opportunity for mistakes.
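To make the duplication concrete, here is a minimal sketch of what a single raw call site has to do. It is not taken from the LLVM-libc sources: raw_mmap_anonymous is a hypothetical name, and the system's syscall() function stands in for the libc-internal raw syscall function.

```cpp
#include <sys/mman.h>     // PROT_* and MAP_* constants
#include <sys/syscall.h>  // SYS_mmap / SYS_mmap2
#include <unistd.h>       // syscall(): stand-in for the raw syscall function
#include <cstddef>

// Hypothetical call site. Every place that maps memory through the raw
// interface has to repeat this number selection for itself.
inline long raw_mmap_anonymous(std::size_t size) {
#ifdef SYS_mmap2
  // mmap2 takes its offset in 4096-byte units rather than in bytes.
  constexpr long NUMBER = SYS_mmap2;
#elif defined(SYS_mmap)
  constexpr long NUMBER = SYS_mmap;
#else
#error "no mmap syscall is available on this target"
#endif
  // The call is variadic: nothing checks the order, count, or width of
  // these six arguments, so each one is cast by hand.
  return syscall(NUMBER, nullptr, size,
                 static_cast<long>(PROT_READ | PROT_WRITE),
                 static_cast<long>(MAP_PRIVATE | MAP_ANONYMOUS),
                 -1L,   // fd is ignored for anonymous mappings
                 0L);   // offset
}
```

With the stand-in used here, a failed call returns -1; otherwise the result is the mapped address and has to be cast back to a pointer by the caller, which is one more untyped step.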

There are also some places where syscalls are called using the public syscall wrapper, as shown in the next diagram.

This requires extra logic to handle errno and means these functions depend on the public entrypoints rather than being properly independent, so it is not an ideal solution either.
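As an illustration of the extra errno logic, the sketch below shows a hypothetical internal helper (file_exists is an invented name, not an LLVM-libc function) built on the public open and close wrappers. Because those wrappers report failure through errno, the helper has to save and restore it by hand.

```cpp
#include <cerrno>
#include <fcntl.h>   // open, O_RDONLY, O_CLOEXEC
#include <unistd.h>  // close

// Hypothetical internal helper that reuses the public wrappers.
inline bool file_exists(const char *path) {
  const int saved_errno = errno;  // the public wrappers may clobber errno
  const int fd = open(path, O_RDONLY | O_CLOEXEC);
  if (fd < 0) {
    errno = saved_errno;  // hide the failed probe from our own caller
    return false;
  }
  close(fd);
  errno = saved_errno;  // close() may also have modified errno
  return true;
}
```

Every helper written this way repeats the same save and restore steps, and it links against the public entrypoints instead of standing alone.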

For non-syscall functions the expected behavior is to split functionality away from the public interface so it can be shared internally. As an example, the ctype_utils.h header provides functions like isspace both for the public isspace function and for scanf, which treats space characters as separators.
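The pattern can be sketched roughly as follows. The names and namespace here are illustrative stand-ins, not the actual contents of ctype_utils.h.

```cpp
#include <cstddef>

namespace internal {
// Shared implementation, playing the role that ctype_utils.h is described
// as playing. Covers ' ' plus '\t', '\n', '\v', '\f', and '\r'.
constexpr bool isspace(int c) {
  return c == ' ' || (c >= '\t' && c <= '\r');
}
}  // namespace internal

// Stand-in for the public entrypoint: a thin shim over the shared helper.
inline int public_isspace(int c) { return internal::isspace(c) ? 1 : 0; }

// Stand-in for a scanf-style consumer that treats whitespace as a
// separator. It uses the same helper without touching the public function.
inline std::size_t skip_separators(const char *s) {
  std::size_t i = 0;
  while (s[i] != '\0' && internal::isspace(s[i]))
    ++i;
  return i;
}
```

Both callers depend only on the shared helper, which is the relationship this RFC wants between syscall users and the internal wrapper functions.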

Design

Overview

The proposed design is to move all calls to the “syscall_impl” function into the Linux-specific OSUtil directory (libc/src/__support/OSUtil/linux). Each syscall should be wrapped in an “internal wrapper function”, which should be called anywhere that syscall is needed. The diagram below shows the intended dependency chain:

Calls to the internal wrapper functions should be made from /linux subdirectories, since each one is just a wrapper over the syscall. More target-generic interfaces are out of scope for this proposal.

Detailed Design

The internal wrapper functions should take the arguments for a given syscall, dispatch them to the appropriate syscall number, and return its result as an ErrorOr (equivalent to std::expected<T, int>). Syscalls that have variadic arguments should be implemented using non-variadic arguments with default values. An example implementation of the internal openat is shown below, with blue highlighting the differences from a normal syscall wrapper.
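For readers without the highlighted listing, here is a rough sketch of the shape such a wrapper could take. It is an approximation under stated assumptions, not the proposed implementation: the ErrorOr struct is a minimal stand-in for the real type, the internal namespace and public_openat are hypothetical names, and the system's syscall() function stands in for syscall_impl. Since that stand-in reports errors through errno, the sketch captures and restores errno, whereas a raw syscall function would return the negated error code directly.

```cpp
#include <cerrno>
#include <fcntl.h>        // O_* flags, AT_FDCWD
#include <sys/syscall.h>  // SYS_openat
#include <sys/types.h>    // mode_t
#include <unistd.h>       // syscall(): stand-in for syscall_impl

// Minimal stand-in for ErrorOr<T> (described above as equivalent to
// std::expected<T, int>).
template <typename T> struct ErrorOr {
  bool ok;
  T val;
  int err;
  bool has_value() const { return ok; }
  T value() const { return val; }
  int error() const { return err; }
};

namespace internal {  // hypothetical namespace
// The variadic mode argument of the C openat becomes a defaulted parameter.
inline ErrorOr<int> openat(int dirfd, const char *path, int flags,
                           mode_t mode = 0) {
  const int saved_errno = errno;
  const long ret = syscall(SYS_openat, static_cast<long>(dirfd), path,
                           static_cast<long>(flags), static_cast<long>(mode));
  const int err = errno;
  errno = saved_errno;  // the internal wrapper leaves errno untouched
  if (ret < 0)
    return {false, -1, err};
  return {true, static_cast<int>(ret), 0};
}
}  // namespace internal

// Sketch of a public wrapper: the only layer that writes errno. The real
// public openat is variadic; a defaulted mode keeps the sketch short.
inline int public_openat(int dirfd, const char *path, int flags,
                         mode_t mode = 0) {
  const ErrorOr<int> result = internal::openat(dirfd, path, flags, mode);
  if (!result.has_value()) {
    errno = result.error();
    return -1;
  }
  return result.value();
}
```

Internal callers would use internal::openat and inspect the returned error themselves, so they never need to save or restore errno.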

These functions should be in headers and marked LIBC_INLINE. There should be one function per header, organized the same way as in src, so mmap would go in libc/src/__support/OSUtil/linux/sys/mman/mmap.h and munmap would go in libc/src/__support/OSUtil/linux/sys/mman/munmap.h.

Alternatives Considered

Next Steps

It should be possible to automate much of the refactoring by using the information from headergen and a list of syscalls. Headergen has a list of the function prototypes for all our syscall wrappers, and from that we can generate the trivial internal syscall wrappers. These will need some cleanup (e.g. fixing variable names, adding alternate syscall names, etc.) but it should be less work than rewriting all of them manually.

Assuming this RFC is approved, I will look into writing a follow-up design for a syscall wrapper generator (historically called wrappergen).

Then why target::name instead of linux::name or something (linux::mmap in the example)?

Can we also see an example with a “result parameter”, e.g. fstat? Will these be kept as pointer arguments as they are in the syscall C signatures, or bundled into a return value struct (as is the generally preferred pattern for return values with std::expected-style “result types”)?

I can’t think offhand of a syscall that has a not-just-error return value and also a result parameter (this counts only fixed-sized result parameters, not variable-sized buffers). If there are any, would those use a one-off composite return value struct instead (again, as is the generally preferred idiom with result types)?

For calls like read/write that take buffer+count pairs, would these still be separate arguments to match the syscall C signatures, or be passed as spans as we usually prefer? (For the void* cases being span<byte> or span<uint8_t>.)