[RFC] UEFI platform in LLVM libc

This is an RFC for adding the UEFI (Unified Extensible Firmware Interface) platform into LLVM libc.

Context

Objective

Implement a C standard compatible libc environment targeting UEFI with relevant POSIX extensions.

Background

UEFI functions much like an operating system: it supplies various interfaces called protocols (see the UEFI specification for specifics). UEFI protocols can implement things such as networking, filesystems, graphics output, and more. Supporting these on bare metal is complex and difficult due to the wide range of possible devices. UEFI provides a general interface to the hardware that simplifies things for operating systems and bootloaders.

Many developers of UEFI applications use Tianocore’s EDKII SDK, which provides a simplified libc-like setup. However, using it requires compiling EDKII, and it isn’t easy to adapt outside of the EDKII repository.

Design

Overview

When an application calls an LLVM libc function that traditionally makes a syscall, such as fopen or puts, the UEFI port can call a UEFI protocol function instead.

LLVM libc can provide access to the EFI system table and image handle for applications. Applications need these to reach protocols that LLVM libc itself does not use, because wrapping those protocols is out of scope for a libc. This covers protocols such as graphics, networking, and various device driver interfaces.

Detailed Overview

Options for providing EFI system table & image handle

The EFI system table is the entry point table into every protocol and system API UEFI has to offer. The image handle is an opaque type which represents the UEFI application itself. These are both necessary for applications to perform functions which are not implemented in the libc. Without these, it is impossible to do things like load drivers, interface with user accounts, or perform graphics operations.

  1. Global variables
    • Simplest
    • Smallest code-wise and binary size
  2. Wrapper functions
    • Requires wrapping many functions
    • Needs to still provide direct access to EFI_HANDLE
    • Allows for a more POSIX like experience
      • Adds errno handling
    • Limits what is possible due to needing to wrap more things
      • Possible to only provide enough to get access to other protocols and wrap EFI_SYSTEM_TABLE only
  3. Holding functions
    • Similar to global variables
    • Only two functions
    • Uses the call stack
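To make options 1 and 3 concrete, here is a minimal sketch of both side by side. Everything in it is an assumption for illustration: the stand-in types replace the real UEFI headers, and the names app_image_handle, app_system_table, uefi_hold, uefi_image_handle, and uefi_system_table are hypothetical, not names from LLVM libc. It shows one possible reading of "holding functions", where two accessors are public and the setter is only called by the start file.

```c
#include <stddef.h>

/* Stand-in types so the sketch compiles without UEFI headers (assumption). */
typedef void *EFI_HANDLE;
typedef struct EFI_SYSTEM_TABLE EFI_SYSTEM_TABLE; /* opaque in this sketch */

/* Option 1: global variables, written once by the start file.
 * Both names are hypothetical. */
EFI_HANDLE app_image_handle = NULL;
EFI_SYSTEM_TABLE *app_system_table = NULL;

/* Option 3: holding functions. The storage is private to this file and
 * applications only see the two accessor functions below. */
static EFI_HANDLE held_handle = NULL;
static EFI_SYSTEM_TABLE *held_table = NULL;

/* Hypothetical setter, called by the start file before the application runs. */
void uefi_hold(EFI_HANDLE handle, EFI_SYSTEM_TABLE *table) {
    held_handle = handle;
    held_table = table;
}

/* The two public accessors. */
EFI_HANDLE uefi_image_handle(void) { return held_handle; }
EFI_SYSTEM_TABLE *uefi_system_table(void) { return held_table; }
```

The globals are smaller, while the accessors leave room to change how the values are stored later without breaking applications.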

Start Files (crt0)

The start file needs to provide an EfiMain instead of the traditional _start function. This function needs to keep the EFI_SYSTEM_TABLE and EFI_HANDLE it receives as arguments and store them in a way LLVM libc can use. It would be beneficial to expose these two to applications, because they allow functionality beyond the libc. This functionality includes the graphics output protocol, compression, security, user account management, networking, and multiprocessor management. They also allow direct access to the drivers for various kinds of devices such as USB and PCI.
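As a rough illustration of the start file's job, the sketch below stores both arguments and then hands control to the application. It is not the implementation from the PR: the types are stand-ins for the UEFI headers, the EFIAPI calling convention is omitted, libc_image_handle and libc_system_table are hypothetical names, app_main stands in for the application's main, and SKETCH_EFI_ERROR is a generic failure value chosen only because UEFI error codes have the high bit set.

```c
#include <stddef.h>

/* Stand-in types so the sketch compiles without UEFI headers (assumption). */
typedef void *EFI_HANDLE;
typedef size_t EFI_STATUS; /* UINTN-sized status code */
typedef struct EFI_SYSTEM_TABLE EFI_SYSTEM_TABLE;

#define EFI_SUCCESS ((EFI_STATUS)0)
/* Generic failure for this sketch: high bit set, low bits nonzero. */
#define SKETCH_EFI_ERROR ((~(((EFI_STATUS)-1) >> 1)) | (EFI_STATUS)1)

/* Hypothetical storage that the rest of the libc would read from. */
EFI_HANDLE libc_image_handle;
EFI_SYSTEM_TABLE *libc_system_table;

/* Stand-in for the application's main(); a real start file would call main. */
int app_main(void) { return libc_system_table != NULL ? 0 : 1; }

/* Entry point: keep both arguments, run the application, map its exit code. */
EFI_STATUS EfiMain(EFI_HANDLE image, EFI_SYSTEM_TABLE *table) {
    libc_image_handle = image;
    libc_system_table = table;
    return app_main() == 0 ? EFI_SUCCESS : SKETCH_EFI_ERROR;
}
```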

File I/O

UEFI has a few different ways of providing file I/O. Two typical ways are EFI_SIMPLE_FILE_SYSTEM_PROTOCOL and SHELL_FILE_HANDLE (from the UEFI shell protocol). The former works outside a UEFI shell but requires mapping the devices to find the correct filesystem. On the other hand, a SHELL_FILE_HANDLE can be accessed simply via a UTF-16 string as the path. Fortunately, both types buffer their output, which makes it possible to provide UEFI_FILE as a tagged union that mimics the behavior of a file descriptor. By including the EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL and EFI_SIMPLE_TEXT_INPUT_PROTOCOL as stdout/stderr and stdin, we can provide something akin to this:

typedef struct {
    union {
        EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL* out;
        EFI_SIMPLE_TEXT_INPUT_PROTOCOL* in;
        SHELL_FILE_HANDLE* shell;
        EFI_FILE_PROTOCOL* file;
    } data;
    enum {
        UEFI_FILE_IS_EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL,
        UEFI_FILE_IS_EFI_SIMPLE_TEXT_INPUT_PROTOCOL,
        UEFI_FILE_IS_SHELL_FILE_HANDLE,
        UEFI_FILE_IS_EFI_FILE_PROTOCOL,
    } tag;
} UEFI_FILE;
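
A write path over such a tagged union might dispatch on the tag roughly as follows. This is only a sketch under heavy assumptions: SKETCH_TEXT_OUT and SKETCH_FILE_PROTO are reduced stand-ins that keep a single member each (the real protocol structs have more members and a different layout), SKETCH_FILE has just two of the four variants, input is assumed to be ASCII, and fake_output_string exists only so the code can be exercised on a host without firmware.

```c
#include <stddef.h>
#include <stdint.h>

typedef size_t EFI_STATUS;
typedef uint16_t CHAR16;

/* Reduced stand-ins: only the one member this sketch calls (assumption). */
typedef struct SKETCH_TEXT_OUT {
    EFI_STATUS (*OutputString)(struct SKETCH_TEXT_OUT *self, CHAR16 *string);
} SKETCH_TEXT_OUT;

typedef struct SKETCH_FILE_PROTO {
    EFI_STATUS (*Write)(struct SKETCH_FILE_PROTO *self, size_t *size, void *buffer);
} SKETCH_FILE_PROTO;

/* Two-variant version of the tagged union above. */
typedef struct {
    union {
        SKETCH_TEXT_OUT *out;
        SKETCH_FILE_PROTO *file;
    } data;
    enum { SKETCH_IS_TEXT_OUT, SKETCH_IS_FILE } tag;
} SKETCH_FILE;

/* Write len bytes of ASCII. Returns bytes written, or -1 on error. */
long sketch_write(SKETCH_FILE *f, const char *buf, size_t len) {
    switch (f->tag) {
    case SKETCH_IS_TEXT_OUT: {
        /* The console wants NUL-terminated CHAR16 text, so widen in chunks. */
        CHAR16 wide[65];
        size_t done = 0;
        while (done < len) {
            size_t n = len - done < 64 ? len - done : 64;
            for (size_t i = 0; i < n; i++)
                wide[i] = (unsigned char)buf[done + i];
            wide[n] = 0;
            if (f->data.out->OutputString(f->data.out, wide) != 0)
                return -1;
            done += n;
        }
        return (long)len;
    }
    case SKETCH_IS_FILE: {
        /* The file variant takes raw bytes and reports the size written. */
        size_t size = len;
        if (f->data.file->Write(f->data.file, &size, (void *)buf) != 0)
            return -1;
        return (long)size;
    }
    }
    return -1;
}

/* Host-side fake console that counts the characters it is given. */
size_t fake_chars_seen;
EFI_STATUS fake_output_string(SKETCH_TEXT_OUT *self, CHAR16 *string) {
    (void)self;
    while (*string++)
        fake_chars_seen++;
    return 0;
}
```

The point of the tag is that callers such as puts or fwrite never need to know which protocol sits behind a given stream.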

Networking

It is possible to provide network access via EFI_SIMPLE_NETWORK_PROTOCOL, EFI_TCP4_PROTOCOL, EFI_TCP6_PROTOCOL, EFI_UDP4_PROTOCOL, and EFI_UDP6_PROTOCOL. Fortunately, the networking APIs are asynchronous and can be wired up in a way that is non-blocking.

EFI_SIMPLE_NETWORK_PROTOCOL

The simplest option: it just sends and receives data. However, it requires implementing a full network stack.

EFI_TCP4_PROTOCOL & EFI_TCP6_PROTOCOL

IPv4 & IPv6 TCP protocols

EFI_UDP4_PROTOCOL & EFI_UDP6_PROTOCOL

IPv4 & IPv6 UDP protocols

Pthreads

It is possible to provide a pthreads-compatible API using EFI_BOOT_SERVICES’s events and timers. Using the EFI_MP_SERVICES_PROTOCOL from the UEFI Platform Initialization specification, it is possible to run multi-processor workloads. However, this would require a scheduler, which LLVM libc could either implement or require the application to provide. Although it is possible to implement pthreads, this will likely be a feature implemented much later on.

UEFI Protocol & libc mapping

Below is a non-exhaustive list of UEFI protocol functions which can map to libc functions. Some UEFI protocol functions cannot be mapped or do not make sense to map. Examples not worth mapping are the string functions, memcpy, and memset, as LLVM libc already provides those functions.

UEFI Protocol                    libc
EFI_BOOT_SERVICES.Stall          sleep
EFI_RNG_PROTOCOL.GetRNG          getrandom
EFI_SHELL_PROTOCOL.GetEnv *1     getenv
EFI_SHELL_PROTOCOL.SetEnv *1     setenv
EFI_SHELL_PROTOCOL.Execute *1    exec

*1 Defined by the UEFI shell protocol specification
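
As an example of one row, sleep could sit on top of Stall roughly like this. Stall takes a delay in microseconds, so the wrapper has to convert from seconds. The sketch is an assumption-laden illustration: SKETCH_BOOT_SERVICES is a reduced stand-in holding only Stall, sketch_boot_services and sketch_sleep are hypothetical names, and fake_stall exists only so the code can run on a host.

```c
#include <stddef.h>

typedef size_t UINTN;
typedef UINTN EFI_STATUS;

/* Reduced stand-in for EFI_BOOT_SERVICES with only Stall (assumption). */
typedef struct {
    EFI_STATUS (*Stall)(UINTN microseconds);
} SKETCH_BOOT_SERVICES;

/* Hypothetical pointer the start file would fill from the system table. */
SKETCH_BOOT_SERVICES *sketch_boot_services;

/* Like POSIX sleep: returns the number of seconds left unslept (0 on success). */
unsigned int sketch_sleep(unsigned int seconds) {
    unsigned int remaining = seconds;
    /* Stall one second at a time so the microsecond count cannot
     * overflow a 32-bit UINTN for large arguments. */
    while (remaining > 0) {
        if (sketch_boot_services->Stall(1000000) != 0)
            return remaining;
        remaining--;
    }
    return 0;
}

/* Host-side fake that records the total requested delay. */
UINTN fake_stalled_us;
EFI_STATUS fake_stall(UINTN us) {
    fake_stalled_us += us;
    return 0;
}
```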

Really appreciate this RFC — targeting UEFI in LLVM libc is ambitious but makes a ton of sense given where LLVM’s heading. If we’re serious about LLVM as a systems bootstrap environment, then UEFI is where you land before there’s an OS, and libc is the gateway.

That said, I’d love to see this grounded with a toy spec or user story. As someone using LLVM and UEFI together today, I’m wondering:

  • Am I able to CRUD files from UEFI Shell or even earlier, using fopen/fread/fwrite, abstracted cleanly over EFI_FILE or SHELL_FILE_HANDLE?
  • Can I download and run a binary from a UEFI shell or GRUB using exec-like behavior, pulling from TCP or even HTTP with no kernel?
  • If I wanted to build an agent/operator-style toolchain — lightweight and remote-controllable — how far does this get me with just libc + protocols?
  • Are we planning for POSIX extensions as a goal, or is that just incidental? I’d argue it’s worth explicitly designing for a plumbing/porcelain split so different dialects (Zig, C++, even Rust) can interop cleanly on top.

Also — not a blocker, but pushing libc this low in the stack raises some good questions:

  • Are we exposing too much power too early? Things like exec and getenv sound great, but at UEFI level there’s no real sandboxing — just a flat memory model and access to drivers.
  • How do we secure or validate execution if downloading and running binaries is in scope? If this is going to act like early userland, we might need signature or digest support from the start.

Overall, this feels like one of the missing pieces for LLVM-as-infrastructure. But I’d love to see a minimal walkthrough — even just a main.c that uses fopen, prints to screen, and optionally pulls from a socket — to know what the MVP looks like in practice.

Definitely watching this thread closely. Would be happy to test builds or contribute once it firms up a bit.

—Brett

I think all of these are possible. With POSIX in particular, I believe it may be a necessity to increase compatibility.

There’s no way to sandbox whatsoever so I don’t think it’s worth worrying about this.

This is handled by the UEFI image loader itself and it invokes things like secure boot.

Heh yeah, I’ll be making posts on my blog and LinkedIn as more progress is made. The LLVM libc monthly meetings will cover things as well. Currently, I have https://github.com/llvm/llvm-project/pull/132150 which implements the start file. We mainly just need to answer what we should do with the system table and image handle.

Thanks, Ross. I’ve been reading through your commits, the discussion here, and some of the UEFI Shell spec you linked to. I’m still catching up and trying to fully understand the context, so apologies if I repeat anything or miss something that’s already been covered. Just trying to get aligned with where things are at.

From what I gather, the crt1.cpp implementation gives us a basic UEFI entry point with EfiMain, which sets up access to the image handle and system table. That seems like a solid foundation for launching portable applications in a UEFI environment.

Based on that, it looks like some of the next pieces that might follow could be:

  1. POSIX syscall wrappers
    Mapping common functions like read, write, or open to the UEFI file and console protocols. That would help libc clients rely on familiar interfaces without needing to think about EFI internals.
  2. UTF conversion utilities
    Since UEFI uses UTF-16 and libc usually expects UTF-8, having a clean conversion layer would be key for working with paths and strings.
  3. Memory management
    It looks like malloc and free could be backed by AllocatePool and FreePool using the boot services table. That would enable a wider range of programs to run.
  4. Basic testing
    There’s already a simple test for startup, but maybe some additional ones that try file I/O or console output would help confirm behavior.
  5. Documentation
    Even just a short status update or README would be helpful to understand what’s working so far and what’s still planned.
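
Item 3 above (malloc and free backed by AllocatePool and FreePool) could look roughly like the sketch below. It is an illustration, not a proposal for the final allocator: SKETCH_BOOT_SERVICES is a reduced stand-in with only the two pool functions, SketchLoaderData mirrors the EfiLoaderData memory type, sketch_malloc and sketch_free are hypothetical names, and the fake_ functions back the pool with the host's allocator so the code can run without firmware.

```c
#include <stddef.h>
#include <stdlib.h>

typedef size_t UINTN;
typedef UINTN EFI_STATUS;

/* Stand-in for EFI_MEMORY_TYPE; only the loader-data type is needed here. */
typedef enum { SketchLoaderData = 2 } SKETCH_MEMORY_TYPE;

/* Reduced stand-in for EFI_BOOT_SERVICES with only the pool calls. */
typedef struct {
    EFI_STATUS (*AllocatePool)(SKETCH_MEMORY_TYPE type, UINTN size, void **buffer);
    EFI_STATUS (*FreePool)(void *buffer);
} SKETCH_BOOT_SERVICES;

/* Hypothetical pointer the start file would fill from the system table. */
SKETCH_BOOT_SERVICES *sketch_boot_services;

/* malloc-like wrapper: NULL on failure, like the C function. */
void *sketch_malloc(size_t size) {
    void *p = NULL;
    if (sketch_boot_services->AllocatePool(SketchLoaderData, size, &p) != 0)
        return NULL;
    return p;
}

/* free-like wrapper: freeing NULL is a no-op, like the C function. */
void sketch_free(void *p) {
    if (p != NULL)
        sketch_boot_services->FreePool(p);
}

/* Host-side fakes backed by the C library so the sketch can run. */
EFI_STATUS fake_allocate_pool(SKETCH_MEMORY_TYPE type, UINTN size, void **buffer) {
    (void)type;
    *buffer = malloc(size);
    return *buffer != NULL ? 0 : 1;
}

EFI_STATUS fake_free_pool(void *buffer) {
    free(buffer);
    return 0;
}
```

One thing a real implementation would need to check is alignment: as far as I know, pool allocations are only guaranteed to be 8-byte aligned, which may be less than malloc is expected to provide on some targets.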

I know there are still open questions around how best to expose the system table and image handle. Right now they’re globals, which seems reasonable as a starting point. If that changes later, I’ll keep an eye on how the approach evolves.

Again, I appreciate the work you’ve put into this. Just trying to understand the direction and what’s already in motion before jumping ahead.

—Brett

Yeah, generally LLVM libc uses an internal function when calling functions that do the same thing. It is likely that open and fopen would call this internal function.

Yes, that is actually a bit of a blocker on certain things. We need proper support for UTF-8 and UTF-16 conversion in both directions.

That is possible, but I think it’d be great if the UEFI platform could integrate with SCUDO or other allocators which can plug in. Cough cough, snmalloc.

Yes, there will be more testing as progress is made. It’ll happen quite early on. I had already come up with a method, but it was slow. It is possible we may need to generate a single binary with every unit test in it.

There will be updates via the monthly meeting and under the UEFI section on libc.llvm.org.

Thanks for letting me know — I’m still getting scoped into this community and the RFC process, and every little bit of context really helps. I’ve been working through the UTF-16 to UTF-8 issue, and I get now that if it’s not handled carefully, there’s definitely an overhead penalty. UEFI uses UTF-16 for legacy reasons, so we can’t really rebuild the wheel there. The smarter move seems to be building a lightweight UTF-8 application runtime on top of it — something that creatively bridges the two by reading from UTF-16 when needed, and caching frequent conversions both ways. It doesn’t need to be a full runtime at first — just enough to support UTF-8 where we need it most, especially for anything hitting UEFI calls often. Right now, we can’t cleanly interop with tools like Node, Deno, or Wasmtime because they expect UTF-8 everywhere. Maybe you could patch it on the fly or JIT some bridge, but really, the goal is to carve out a solid UTF-8-based runtime (an ART) that understands how to live alongside UEFI’s UTF-16 without constantly paying the conversion cost.

Once again, I apologize if I’m retracing ground that’s already been architected or discussed — I’m almost certainly walking through ideas others have explored. From what I can tell, Tristan’s RFC and implementation leave us with two clear paths forward:

  1. A chainloaded .efi, which inherits full UEFI access and gives us a clean space to build a UTF-8-native application runtime (ART). This approach is loosely coupled, modular, and well-suited for versioning, experimentation, or composing multiple runtimes.
  2. Embedding the UTF-8 ART directly into the initial .efi, which simplifies the deployment but tightly couples the runtime logic with UEFI’s boot and service layer.

At this stage, the first path feels like the more flexible long-term architecture — it lets us isolate UEFI constraints at the edge, iterate on the UTF-8 runtime independently, and more easily embed lessons learned back into future stages.

That sounds extremely overcomplicated. All we need are functions which convert UTF-8 and UTF-16 and shove them in where needed.

Note that UEFI, like Windows, doesn’t require well-formed UTF-16. That means that converting to UTF-8 could fail, which is not a great thing in a low-level file API. E.g. when you’re iterating files in a directory, it’s not good if there could be filenames on disk that aren’t representable in the API.

So it may be good to decode/encode filenames with WTF-8 instead of UTF-8. Under WTF-8, all well-formed UTF-16 converts to/from the same bytes as UTF-8, but invalid lone surrogates in UTF-16 can also be successfully round-tripped.
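
To show what that means in practice, here is a small sketch of the UTF-16 to WTF-8 direction. The function name utf16_to_wtf8 and its signature are made up for this example and are not an existing LLVM libc API. Valid surrogate pairs are combined and encoded as 4 bytes exactly as in UTF-8, while a lone surrogate is encoded as an ordinary 3-byte sequence instead of being rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode n UTF-16 code units as WTF-8 into out (capacity cap bytes).
 * Returns the number of bytes written, or (size_t)-1 if out is too small.
 * Hypothetical helper for illustration only. */
size_t utf16_to_wtf8(const uint16_t *in, size_t n, uint8_t *out, size_t cap) {
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        /* A high surrogate followed by a low surrogate forms one code point. */
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < n &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        }
        /* Lone surrogates fall through here and get the normal 3-byte form.
         * Strict UTF-8 would have to reject them instead. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (cap - o < need)
            return (size_t)-1;
        if (need == 1) {
            out[o++] = (uint8_t)cp;
        } else if (need == 2) {
            out[o++] = (uint8_t)(0xC0 | (cp >> 6));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (uint8_t)(0xE0 | (cp >> 12));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (uint8_t)(0xF0 | (cp >> 18));
            out[o++] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (uint8_t)(0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

For instance, the lone surrogate 0xD800 comes out as the bytes ED A0 80, which a decoder can turn back into the same code unit, so a filename containing it survives a round trip.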

Yes, that’s a possibility. I’ve done work on Zig’s UEFI support in their stdlib, and WTF-8 is used there. This will obviously need more discussion later on, but it is certainly something I’ve been thinking about. It will likely come as an RFC.