[RFC] Porting LLVM libc to macOS

LLVM libc currently supports full builds on Linux, while a handful of other targets (baremetal, GPU, and macOS, for example) have narrower support; macOS (Darwin) can only be built in overlay mode. Being able to statically link LLVM libc into applications that need cross-platform compatibility would be valuable, since applications that expect consistent behavior across platforms would greatly benefit from it. This document outlines a plan for getting a full build to work on macOS.

Plan

Phase 1: Implementing primitives in __support

Implement __support/threads/darwin, since it is fundamental to getting functionality like File and exit to work. This part primarily implements the Mutex abstraction. As comments one and two mention, completing it in cooperation with the libc maintainers of the Linux and generic frontends would make those frontends less Linux-focused.
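A minimal sketch of the kind of Mutex that __support/threads/darwin would need, written in plain C11 atomics rather than LLVM libc's C++ so it stands on its own. It uses the classic three-state futex-style design (0 unlocked, 1 locked, 2 locked with possible waiters). The wait_on/wake_one hooks are hypothetical: on Linux they would map to futex, while on Darwin they would presumably need something like the private __ulock_wait/__ulock_wake calls or a libSystem primitive, which is exactly the open question raised later in this thread. Here they fall back to sched_yield, so the sketch is correct but spins under contention.

```c
#define _POSIX_C_SOURCE 200809L /* for sched_yield under strict C modes */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical Darwin mutex: the state word is the only data. */
typedef struct {
  atomic_uint state;
} darwin_mutex;

enum { UNLOCKED = 0, LOCKED = 1, CONTENDED = 2 };

/* Hypothetical OS hook: a real port would block in the kernel while
 * *addr still equals `expected`. This fallback just yields the CPU. */
static void wait_on(atomic_uint *addr, unsigned expected) {
  (void)addr;
  (void)expected;
  sched_yield();
}

/* Hypothetical OS hook: wake one thread blocked in wait_on(addr, ...).
 * The yield-based fallback has nobody to wake. */
static void wake_one(atomic_uint *addr) { (void)addr; }

static void mutex_init(darwin_mutex *m) { atomic_init(&m->state, UNLOCKED); }

static bool mutex_trylock(darwin_mutex *m) {
  unsigned expected = UNLOCKED;
  return atomic_compare_exchange_strong_explicit(
      &m->state, &expected, LOCKED, memory_order_acquire, memory_order_relaxed);
}

static void mutex_lock(darwin_mutex *m) {
  if (mutex_trylock(m))
    return; /* fast path: uncontended, no OS involvement */
  /* Slow path: mark the lock contended, then wait until our exchange
   * observes it unlocked (at which point we own it). */
  while (atomic_exchange_explicit(&m->state, CONTENDED,
                                  memory_order_acquire) != UNLOCKED)
    wait_on(&m->state, CONTENDED);
}

static void mutex_unlock(darwin_mutex *m) {
  unsigned prev =
      atomic_exchange_explicit(&m->state, UNLOCKED, memory_order_release);
  assert(prev != UNLOCKED && "unlocking a mutex that is not locked");
  if (prev == CONTENDED)
    wake_one(&m->state); /* a waiter may be blocked in wait_on */
}
```

The fast path is a single compare-and-swap, so uncontended lock/unlock never enters the OS; only the two hooks are platform-specific.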

Implement __support/OSUtil/darwin to match the Linux version. This should be slightly easier once the synchronization primitives from the previous step are in place. Functions like exit, abort, and assert will be implemented so that a basic build produces runnable executables.
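Most of what exit does is OS-independent; OSUtil only has to supply the final call that terminates the process. For example, the C standard requires functions registered with atexit to run in reverse order of registration before that happens. A minimal sketch of that ordering, with hypothetical names (sketch_atexit, run_exit_handlers) and a fixed-size table as a simplifying assumption; the Darwin-specific termination call is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 32 /* the C standard requires support for at least 32 */

typedef void (*exit_handler)(void);

static exit_handler handlers[MAX_HANDLERS];
static size_t handler_count = 0;

/* Hypothetical atexit: returns 0 on success, nonzero if the table is full. */
static int sketch_atexit(exit_handler fn) {
  if (handler_count == MAX_HANDLERS)
    return -1;
  handlers[handler_count++] = fn;
  return 0;
}

/* Runs handlers last-registered-first, as exit() must do before it
 * flushes stdio and asks the OS to terminate the process. */
static void run_exit_handlers(void) {
  while (handler_count > 0)
    handlers[--handler_count]();
}

/* Usage: two handlers that record the order in which they ran. */
static char order[3];
static size_t order_len = 0;
static void first(void) { order[order_len++] = 'a'; }
static void second(void) { order[order_len++] = 'b'; }
```

Registering first and then second and running the handlers records "ba".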

Phase 2: File I/O and stdio

With synchronization available, this phase implements the I/O subsystems, enabling basic functionality like printf.

Implement Low-Level I/O Wrappers: Create the Darwin implementations for fcntl, unistd (read, write, lseek, close), and sys/stat (fstat). These are the syscalls needed for file operations.
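One concrete difference these wrappers have to absorb is the error convention. LLVM libc's Linux syscall layer returns -errno on failure, whereas, as far as I understand XNU, a failed BSD syscall sets the carry flag and leaves a positive errno in the result register (x0 on arm64). A minimal sketch of normalizing the Darwin convention into the Linux one, so callers written against the Linux layer could be shared; darwin_syscall_ret is a hypothetical stand-in for whatever an inline-assembly stub would capture after the trap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: what a Darwin syscall stub would capture after the trap.
 * On failure XNU sets the carry flag and leaves a positive errno in the
 * result register. */
typedef struct {
  long raw_result;
  bool carry_set;
} darwin_syscall_ret;

/* Convert to the Linux-style convention: >= 0 on success, -errno on error. */
static long normalize_syscall_ret(darwin_syscall_ret r) {
  return r.carry_set ? -r.raw_result : r.raw_result;
}

/* Typical shape of a public wrapper such as write(): on failure, store the
 * error in errno and return -1; otherwise pass the result through. */
static long finish_posix_call(long ret) {
  if (ret < 0) {
    errno = (int)-ret;
    return -1;
  }
  return ret;
}
```

Since the carry flag already separates success from failure, a Darwin layer could also keep that bit explicitly instead of relying on the sign of the result, which would avoid misreading large successful return values as errors.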

Implement __support::File: Create the __support/File/darwin/ implementation. This internal abstraction requires the Mutex from Phase 1 for thread-safe operations.
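A minimal C sketch of the shape of such an abstraction: a buffer, a lock, and a platform write callback. It is not LLVM libc's actual File class (which is C++); all names are hypothetical, and an atomic_flag spin lock stands in for the Phase 1 Mutex. The point it illustrates is that only the callback is OS-specific, so the buffering and locking logic could be shared between Linux and Darwin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform hook: on Darwin this would wrap the write syscall.
 * Returns the number of bytes written, or -1 on error. */
typedef long (*write_fn)(void *ctx, const char *data, size_t len);

typedef struct {
  atomic_flag lock; /* stand-in for the Phase 1 Mutex */
  write_fn platform_write;
  void *ctx;
  char buf[64];
  size_t used;
} sketch_file;

static void file_lock(sketch_file *f) {
  while (atomic_flag_test_and_set_explicit(&f->lock, memory_order_acquire)) {
  }
}

static void file_unlock(sketch_file *f) {
  atomic_flag_clear_explicit(&f->lock, memory_order_release);
}

/* Caller must hold the lock. Returns 0 on success, -1 on failure. */
static int flush_locked(sketch_file *f) {
  size_t done = 0;
  while (done < f->used) {
    long n = f->platform_write(f->ctx, f->buf + done, f->used - done);
    if (n <= 0)
      return -1;
    done += (size_t)n;
  }
  f->used = 0;
  return 0;
}

/* Buffered, thread-safe write: the whole call holds the lock. */
static size_t file_write(sketch_file *f, const char *data, size_t len) {
  size_t written = 0;
  file_lock(f);
  while (written < len) {
    if (f->used == sizeof f->buf && flush_locked(f) != 0)
      break;
    size_t room = sizeof f->buf - f->used;
    size_t chunk = len - written < room ? len - written : room;
    memcpy(f->buf + f->used, data + written, chunk);
    f->used += chunk;
    written += chunk;
  }
  file_unlock(f);
  return written;
}

/* Usage: an in-memory sink standing in for a file descriptor. */
typedef struct {
  char data[256];
  size_t len;
} mem_sink;

static long mem_write(void *ctx, const char *data, size_t len) {
  mem_sink *s = ctx;
  if (len > sizeof s->data - s->len)
    len = sizeof s->data - s->len;
  memcpy(s->data + s->len, data, len);
  s->len += len;
  return (long)len;
}
```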

Implement stdio: Create the stdio/darwin/ backend. This will provide the FILE* structures (including stdin, stdout, stderr) and functions (fopen, fprintf, fread), which are built on top of the __support::File abstraction.
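One piece of policy this backend has to get right for stdin, stdout, and stderr: the C standard says stderr is not fully buffered, and stdin/stdout are fully buffered only if they can be determined not to refer to an interactive device. A minimal sketch of choosing the initial modes; the buffer_mode enum is hypothetical, and using isatty as the interactivity check and line buffering for interactive streams are common choices assumed here, not requirements.

```c
#include <assert.h>
#include <unistd.h>

typedef enum { MODE_UNBUFFERED, MODE_LINE, MODE_FULL } buffer_mode;

/* Hypothetical policy for the three standard streams. */
static buffer_mode std_stream_mode(int fd, int interactive) {
  if (fd == STDERR_FILENO)
    return MODE_UNBUFFERED; /* stderr must never be fully buffered */
  /* stdin/stdout: full buffering only when known to be non-interactive. */
  return interactive ? MODE_LINE : MODE_FULL;
}

/* Same policy, querying the terminal state of the descriptor itself. */
static buffer_mode std_stream_mode_for(int fd) {
  return std_stream_mode(fd, isatty(fd));
}
```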

Phase 3: Memory, Time, and Process Management

This phase adds the remaining core components required for a modern POSIX environment.

Implement mman: Create sys/mman/darwin/ to provide memory management functions like mmap, munmap, and mprotect.
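A minimal sketch of how the three calls typically combine: mapping a region with an inaccessible guard page in front (the layout thread stacks usually use, which becomes relevant for pthreads later). It uses only POSIX APIs, so the same test could run against both the Linux and Darwin builds; map_with_guard is a hypothetical helper. One Darwin detail worth keeping in mind: Apple silicon uses 16 KiB pages, so the page size must be queried rather than hard-coded as 4096.

```c
#define _DEFAULT_SOURCE  /* glibc: expose MAP_ANONYMOUS */
#define _DARWIN_C_SOURCE /* macOS: expose MAP_ANON alongside POSIX names */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Maps `pages` usable pages preceded by one PROT_NONE guard page.
 * Returns the base of the whole mapping (guard page first) and stores its
 * total size in *total_out, or returns NULL on failure. */
static void *map_with_guard(size_t pages, size_t *total_out) {
  size_t page = (size_t)sysconf(_SC_PAGESIZE); /* 16 KiB on Apple silicon */
  size_t total = (pages + 1) * page;
  char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return NULL;
  if (mprotect(base, page, PROT_NONE) != 0) { /* make the guard inaccessible */
    munmap(base, total);
    return NULL;
  }
  *total_out = total;
  return base;
}
```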

Implement Time: Create __support/time/darwin/ for time abstractions and sys/time/darwin/ for syscalls like gettimeofday.
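Much of __support/time is OS-independent arithmetic on time structures; the Darwin-specific part is the raw clock reading underneath. A minimal sketch of two such helpers plus a gettimeofday built on top of clock_gettime; the names are hypothetical, and building gettimeofday this way is one option rather than what the port would necessarily do.

```c
#define _DEFAULT_SOURCE  /* glibc: expose struct timeval, clock_gettime */
#define _DARWIN_C_SOURCE /* macOS: keep the full set of declarations */
#include <assert.h>
#include <sys/time.h>
#include <time.h>

/* Converts nanosecond resolution to microsecond resolution, truncating. */
static struct timeval timespec_to_timeval(struct timespec ts) {
  struct timeval tv;
  tv.tv_sec = ts.tv_sec;
  tv.tv_usec = ts.tv_nsec / 1000;
  return tv;
}

/* Computes a - b for normalized timespecs (0 <= tv_nsec < 1e9),
 * borrowing one second when the nanosecond field underflows. */
static struct timespec timespec_sub(struct timespec a, struct timespec b) {
  struct timespec r;
  r.tv_sec = a.tv_sec - b.tv_sec;
  r.tv_nsec = a.tv_nsec - b.tv_nsec;
  if (r.tv_nsec < 0) {
    r.tv_sec -= 1;
    r.tv_nsec += 1000000000L;
  }
  return r;
}

/* gettimeofday expressed via clock_gettime (timezone argument omitted). */
static int sketch_gettimeofday(struct timeval *tv) {
  struct timespec ts;
  if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
    return -1;
  *tv = timespec_to_timeval(ts);
  return 0;
}
```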

Implement Architecture-Specifics: Implement the setjmp/darwin/aarch64/ assembly functions for setjmp and longjmp.
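Whatever the assembly looks like, it has to satisfy a few C-level rules that make good portable tests: longjmp(env, 0) makes setjmp return 1, other values come back unchanged, and volatile locals modified after setjmp keep their new values. A minimal sketch of such checks with hypothetical helper names; setjmp is only used in the contexts the C standard permits (a switch or if controlling expression).

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

static void jump_back(int value) {
  longjmp(env, value); /* never returns */
}

/* Calls longjmp(env, sent) and reports what setjmp returned the second
 * time. Only the values this sketch checks get their own case. */
static int value_after_longjmp(int sent) {
  switch (setjmp(env)) {
  case 0:
    jump_back(sent); /* first return: jump back immediately */
    return -1;       /* not reached */
  case 1:
    return 1;
  case 42:
    return 42;
  default:
    return -2; /* unexpected value */
  }
}

/* A volatile local written after setjmp must keep its value after longjmp. */
static int volatile_survives(void) {
  volatile int counter = 0;
  if (setjmp(env) == 0) {
    counter = 7;
    jump_back(1);
  }
  return counter;
}
```

On AArch64, the jmp_buf has to capture at least x19 to x28 and d8 to d15 (callee-saved under AAPCS64), plus the frame pointer x29, the link register x30, and sp.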

Implement Process & Signals: Create spawn/darwin/, sys/wait/darwin/, and signal/darwin/ to support process creation and signal handling.
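spawn/, sys/wait/, and signal/ meet in one observable contract: a child's exit status, or the fact that a signal killed it, has to come back intact through waitpid and the W* macros. A minimal sketch of a test along those lines, using only standard POSIX calls; run_shell is a hypothetical helper, and /bin/sh is assumed to exist (as it does on macOS and typical Linux systems).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Runs `/bin/sh -c script` and returns its exit status, or -1 if it could
 * not be spawned or did not exit normally (e.g. it was killed by a signal). */
static int run_shell(const char *script) {
  pid_t pid;
  char *const argv[] = {"sh", "-c", (char *)script, NULL};
  if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
    return -1;
  int status;
  if (waitpid(pid, &status, 0) != pid)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```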

Phase 4: Full POSIX Compliance and pthread

This final phase completes the library by implementing the remaining high-level POSIX APIs.

I/O Multiplexing: Implement poll (poll/darwin/) and select (sys/select/darwin/). The poll implementation can be built on top of the macOS-native kqueue API (sys/event.h).
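If poll is layered on kqueue, for instance by registering an EVFILT_READ or EVFILT_WRITE filter per pollfd and translating the triggered events back into revents (one plausible mapping, not a settled design), the translation is where bugs would hide. A minimal sketch of a test that pins down the POSIX-visible behavior using a pipe; pipe_read_revents is a hypothetical helper.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Returns the revents poll reports for the read end of a fresh pipe,
 * optionally after writing one byte into it; -1 on error. */
static short pipe_read_revents(int write_first) {
  int fds[2];
  if (pipe(fds) != 0)
    return -1;
  short result = -1;
  if (!write_first || write(fds[1], "x", 1) == 1) {
    struct pollfd p = {.fd = fds[0], .events = POLLIN};
    int n = poll(&p, 1, 0); /* timeout 0: report readiness, never block */
    if (n >= 0)
      result = p.revents;
  }
  close(fds[0]);
  close(fds[1]);
  return result;
}
```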

Networking: Implement the sys/socket/darwin/ API.

Complete sys/ Modules: Implement the remaining POSIX sys/ modules, such as sys/resource, sys/utsname, and termios.

Implement pthreads: Build out the full pthread/ API (e.g., pthread_create) on top of the synchronization primitives from Phase 1 and the thread management APIs.
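A create/join stress test of this shape is a natural end-to-end check that the Phase 1 Mutex and the thread-creation path work together: if the mutex is broken, increments get lost. A minimal sketch using only standard pthread APIs; the thread and iteration counts are arbitrary.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { THREADS = 4, ITERATIONS = 100000 };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void *worker(void *arg) {
  (void)arg;
  for (int i = 0; i < ITERATIONS; i++) {
    pthread_mutex_lock(&lock);
    counter++; /* protected increment: no updates may be lost */
    pthread_mutex_unlock(&lock);
  }
  return NULL;
}

/* Starts THREADS workers and joins every thread that was created.
 * Returns 0 if all of them could be started, -1 otherwise. */
static int run_workers(void) {
  pthread_t tids[THREADS];
  int created = 0;
  while (created < THREADS &&
         pthread_create(&tids[created], NULL, worker, NULL) == 0)
    created++;
  for (int i = 0; i < created; i++)
    pthread_join(tids[i], NULL);
  return created == THREADS ? 0 : -1;
}
```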

Dynamic Linking: Implement dlfcn/darwin/ (dlopen, dlsym). (These are also missing on Linux.)

Tangential Outcomes

  • Many ‘generic’ implementations in the current libc only work on Linux; these would be made OS-agnostic.
  • The libc sources contain many TODOs that could be addressed along the way.
  • Existing documentation can be improved and supplemented.

Testing

The existing entrypoints.txt-based design allows specifying which functions are included in or excluded from the build and, by extension, from testing. Thanks to it, functions can be tested cleanly in CI as they are added to the codebase.

The Linux libc build has many high-level tests. These would be reused as-is or made generic as required. For macOS-specific functionality, new tests would be written alongside each patch.

Notes

This is a long-running project. After phases 1 and 2, phases 3 and 4 could also be opened up to beginner contributors.

Where LLVM libc (Linux) and libSystem (the library that implements libc on macOS) differ in behavior, the port would aim to match LLVM libc (Linux) rather than libSystem.

Only Apple silicon (ARM64) macOS would be targeted.

This project would result in a lot of small patches needing review. If you’re a libc maintainer, please let me know if you are okay with being tagged in PRs related to this project so the review load can be distributed.


@michaelrj-google @SchrodingerZhu @petrhosek @lntue @leonardchan @roland @mikhailrgadelha @RossComputerGuy @jhuber6

FYI, in case this is going down the wrong route, since the plan does not mention how syscalls will be implemented: on macOS, everything in <syscall.h> is private API. Syscall numbers are not stable across OS versions; the stable interface to syscalls on macOS is libc (libSystem) itself.


I am aware that these are private and not meant to be stable. This may sound naive, but do they change that often? I browsed the xnu source across past years and didn’t notice much change.

Does this render the whole project futile?

Yes, absolutely. They can change from version to version (and Go made that mistake before). Unless a statically linked LLVM libc is acceptable when deployed to a single OS version (or a narrow range of them) with no stability guarantees, or LLVM libc somehow maintains the syscall numbers itself (which drastically increases the maintenance cost), this might be a deal breaker for the project.

However, the large portion of libc that does not rely on syscalls is totally doable.

What did they do to fix it? Go back to using libSystem.dylib?

This would probably be math functions and algorithm-type code, correct? Most of the standard functionality (like I/O, exit, etc.) ultimately involves syscalls.

I did some digging. I think the golang bug is mostly tracked here: cmd/link: use libsystem_kernel.dylib or libSystem.dylib for syscalls on macOS · Issue #17490 · golang/go · GitHub

I am not involved in golang fixes, so I may not be the best person to answer that question.

The release notes for Go 1.12 mention using libSystem.

For extra reference, Cosmopolitan libc achieves this through the “high-maintenance” route of tracking syscalls.master in the xnu GitHub repo. They also document the “quality” of the implementation in this article.

Cool. I don’t have a strong opinion on the LLVM libc implementation. I just wanted to bring up the discussion, and I think evaluating how to implement syscalls on macOS should be part of this RFC.

Thanks for your comments! So far I’ve been implementing some functions locally using xnu’s syscalls.master file. I don’t have a strong opinion either on how this should be approached. It would be nice to hear what the libc maintainers think.