[RFC] Implementation of stdio on baremetal

This RFC discusses the strategy for implementing stdio.h on baremetal.

Context

Currently, FILE is not supported on baremetal platforms. As a result, functions like fprintf are not implemented, which means I cannot use LLVM-libc to compile programs that use both stdout and stderr.

So I submitted a PR (https://github.com/llvm/llvm-project/pull/144567). It turned out there had already been discussions about the design, but they had not been written down.

Requirements

It would be amazing if both of these use cases could be supported.

Semihosting

On ARM platforms, a lot of testing is done through the semihosting API, which, for example, allows interaction with the host filesystem (see section 6, semihosting operations). Currently a lot of the work to do with semihosting is done downstream, which I would like to upstream some time in the future, along with the C runtime (crt0).

True baremetal

True embedded projects (those not connected to a debugger) do not use semihosting; they use another interface, such as UART, for I/O. On true baremetal, code size is an important factor, so ideally we would be able to throw away the scaffolding that supports semihosting.
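As a rough illustration that the two use cases can differ only in the body of a hook, here is a sketch of a UART-backed output hook. It assumes the embedding API's write hook has the fopencookie-like shape `ssize_t __llvm_libc_stdio_write(void *cookie, const char *buf, size_t size)` and that the vendor defines `struct __llvm_libc_stdio_cookie` plus the three stream cookies; please check llvm-libc's baremetal headers for the exact declarations. The UART is simulated with a hypothetical `uart_tx_log` array so the sketch runs on a host.

```c
#include <stddef.h>
#include <sys/types.h> /* ssize_t */

/* Assumption: the vendor defines the cookie type. Here it only records
   which standard stream it belongs to. */
struct __llvm_libc_stdio_cookie {
  int stream_id; /* 0 = stdin, 1 = stdout, 2 = stderr */
};

struct __llvm_libc_stdio_cookie __llvm_libc_stdin_cookie = {0};
struct __llvm_libc_stdio_cookie __llvm_libc_stdout_cookie = {1};
struct __llvm_libc_stdio_cookie __llvm_libc_stderr_cookie = {2};

/* Hypothetical stand-in for the UART transmit path: a real driver would poll
   the peripheral's status register and write its data register instead of
   appending to an array. */
static char uart_tx_log[256];
static size_t uart_tx_len;

static void uart_putc(char c) {
  if (uart_tx_len < sizeof uart_tx_log)
    uart_tx_log[uart_tx_len++] = c;
}

/* Assumed hook prototype (verify against llvm-libc's baremetal headers).
   stdout and stderr both go out over the same UART. */
ssize_t __llvm_libc_stdio_write(void *cookie, const char *buf, size_t size) {
  const struct __llvm_libc_stdio_cookie *c = cookie;
  if (c->stream_id == 0)
    return -1; /* stdin is not writable */
  for (size_t i = 0; i < size; ++i)
    uart_putc(buf[i]);
  return (ssize_t)size;
}
```

A semihosting build would keep the same hook and cookies but issue semihosting write calls in the body instead, so a true-baremetal image never has to link the semihosting scaffolding.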

Design

As there had already been an informal design before I joined the project, I would love to hear about it. I will document a few concerns I had with preliminary testing.

Compatibility with picolibc/newlib

Picolibc and newlib define and declare stdout/stderr/stdin as FILE * const, whereas we define them as FILE *, which is more in line with other libcs. Ideally I'd like a plug-and-play approach with some compatibility between them, but I know that there are a lot of trade-offs.
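To make the difference concrete, here is a small sketch that uses a hypothetical `struct demo_file` in place of FILE (so it does not clash with the host's `<stdio.h>`). With the `const` form the stream pointer itself is read-only, so it cannot be reassigned (and may be placed in read-only memory); with the plain pointer form it can be.

```c
#include <stddef.h>

/* Hypothetical stand-in for FILE. */
struct demo_file {
  int fd;
};

static struct demo_file demo_files[3] = {{0}, {1}, {2}};

/* picolibc/newlib-style declaration, as described above: the pointer object
   itself is const. */
struct demo_file *const demo_stdout_const = &demo_files[1];

/* llvm-libc-style declaration: a modifiable pointer, as in most hosted libcs. */
struct demo_file *demo_stdout = &demo_files[1];

/* Reassigning the standard stream only works with the non-const form. */
void demo_redirect_stdout(struct demo_file *f) {
  demo_stdout = f;
  /* demo_stdout_const = f;  <- would not compile: assignment of read-only variable */
}
```

The compatibility problem follows from C's rules: all declarations of the same object must have compatible types, and `FILE *const` and `FILE *` are differently qualified, so an object compiled against `extern FILE *const stdout;` linked with a libc that defines `FILE *stdout` is technically undefined behaviour, even if it happens to link.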

I’m definitely not an authoritative voice in this space, but I do presently have the time to give my thoughts, and I do have some experience with embedded stdio implementations, so I’ll do so.

As far as I can tell, the linked PR doesn’t introduce any new API surfaces between llvm-libc and baremetal developers hooking up their underlying platform abstractions to stdio. Accordingly, there seems little risk to extending it in the specific scope of the linked PR; there’s a limit to how much functionality one can implement without being able to construct new FILE* instances. There’s some additional complexity to consider with fseek and setvbuf, though, even with just the three stdxxx streams.

Once you can make new FILE* instances, though, we're in entirely different territory: FILE is a struct that needs to contain something, and there are a lot of questions about what it should contain. In particular, the generic stdio implementation in llvm-libc already has a quite generic FILE in service of fopencookie; it seems quite plausible to reuse it for baremetal support, together with an extension of the existing embedding API for the standard streams and a new one for other types of files. I very much doubt we'd want to redo all this work on baremetal if what's in generic mostly suffices. Accordingly, the question becomes how to refactor the libc build so that the generic stdio could optionally be used on baremetal platforms. We may also want to slice and dice it more granularly than we presently can; for example, excluding buffering and/or seeking.

I’d also personally love if the “minimal stdio” (stdxxx streams only) could be weakly linked in and replaced with the “full stdio” (full FILE*) if needed, but that’s a pet feature of mine and others may not agree.

There’s also the question of the various directory manipulation functions; that’s a whole can of worms and I have much less experience with them than others likely do.


I'm also not any sort of authority here, but I do have an interest in this topic because I work in baremetal and want to use llvm-libc for future projects with ARM MCUs. I'm admittedly thinking beyond the scope of your current PR; I'm sort of just rambling right now.

Vendors nowadays may provide their own middleware that contains their own file IO library. These are commonly used to access either removable media or a small onboard flash part. It would be nice to be able to define our own FILE to wrap the vendor-specific type and to be able to implement our own File IO functions to wrap the vendor-specific functions. I’m not sure if the proper path is for llvm-libc to provide partial File IO implementations with the user implementing hooks–like with __llvm_libc_stdio_read/write()–or if llvm-libc should just leave it up to the user to implement whatever File IO functions they need. I would personally lean toward the latter, mostly because I think users might end up needing to do most of the work in the hooks anyway.

I have wondered if a baremetal FILE could be an alias to the __llvm_libc_stdio_cookie or vice versa. I haven’t really thought much about it, so that’s not a proposal or anything like that.

Another issue is that some File IO declarations are needed to build libcxx with llvm-libc. Currently, my build script manually inserts the needed declarations into the generated stdio.h, and that does work; I have even tried out std::println() on my MCU just to see it run. A new CMake option was recently added to llvm-libc that forces all function declarations to be generated even if the implementations are not. That means this build-script workaround wouldn't be necessary, as long as the user is okay with non-File IO declarations being added as well.


Thanks for the comments, I’ve taken some time away from this RFC to focus on other things, but I am coming back. I’ve taken some time to rethink my decision with the PR, and how it will affect things down the line.

We do require full FILE* instances as per our customer requirements, and as @mysterymath said, I think a lot of it could be shared with the generic implementation. I think there were plans (cc: @michaelrj-google) to redo the FILE interface anyways. I would disagree with the idea to let users just implement whatever functions they require, as this ends up rendering most of the libcxx I/O functions useless.

Personally, I have seen two main uses for baremetal: semihosting and UART. As long as we define hooks for write, read, and flags (and possibly seek and flush), I think anything should work.
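One way to read "write, read, seek, flush and flags" is as a per-stream hook table in which seek and flush are optional. The sketch below is hypothetical (none of these names exist in llvm-libc); it shows how a missing optional hook can degrade gracefully, so that a UART backend only needs to supply a write hook.

```c
#include <stddef.h>
#include <stdio.h>     /* EOF, SEEK_SET */
#include <sys/types.h> /* ssize_t, off_t */

#define STREAM_READABLE 1u
#define STREAM_WRITABLE 2u

/* Hypothetical per-stream hook table. read/write are needed for whichever
   directions the stream supports; seek and flush are optional, and a NULL
   entry means the backend does not support that operation. */
struct stream_hooks {
  ssize_t (*read)(void *cookie, char *buf, size_t size);
  ssize_t (*write)(void *cookie, const char *buf, size_t size);
  int (*seek)(void *cookie, off_t *offset, int whence);
  int (*flush)(void *cookie);
  unsigned flags; /* STREAM_READABLE / STREAM_WRITABLE */
};

int stream_flush(const struct stream_hooks *h, void *cookie) {
  if (h->flush == NULL)
    return 0; /* nothing buffered in the backend, e.g. a polled UART */
  return h->flush(cookie) == 0 ? 0 : EOF;
}

int stream_seek(const struct stream_hooks *h, void *cookie, off_t *offset,
                int whence) {
  if (h->seek == NULL)
    return -1; /* not seekable, e.g. a UART */
  return h->seek(cookie, offset, whence);
}

/* A UART backend only has to supply a write hook. */
static size_t uart_bytes_sent;

static ssize_t uart_write(void *cookie, const char *buf, size_t size) {
  (void)cookie;
  (void)buf; /* a real driver would push these bytes to the peripheral */
  uart_bytes_sent += size;
  return (ssize_t)size;
}

static const struct stream_hooks uart_hooks = {
    .write = uart_write,
    .flags = STREAM_WRITABLE,
};
```

A semihosting or filesystem backend would also fill in seek (and perhaps flush); apart from flush and flags, the table has the same shape as fopencookie's `cookie_io_functions_t`.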

In the future we could get a mini-stdio library similar to what a lot of embedded systems currently have or an option to disable full FILE* support.

Proposal 1: Tear down the stdio/baremetal folder

We can leave in some functions like remove(), but this proposal would get rid of stdio/baremetal/printf.cpp and all other functions that the generic implementation already provides.

To do this, we take the Linux implementation and adapt it to the baremetal implementation. In src/__support/File we can use a CookieFile instead of defining a unique class for baremetal. Then, we can hook up the embedding API to these cookies.
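The idea of hooking the embedding API up to cookies can be tried on a hosted system with glibc's public `fopencookie()`, a GNU extension (hence `_GNU_SOURCE`). This is only an analogue of the proposal: llvm-libc's internal CookieFile is C++ and is not shown, and `platform_write`/`struct platform_cookie` are hypothetical stand-ins for a hook such as __llvm_libc_stdio_write and its cookie.

```c
#define _GNU_SOURCE /* fopencookie is a GNU extension in glibc */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical platform cookie: here it captures output in memory; on a
   board it might hold a UART base address or a semihosting handle. */
struct platform_cookie {
  char data[128];
  size_t len;
};

/* Hook with the same shape as the embedding API's write function. */
static ssize_t platform_write(void *cookie, const char *buf, size_t size) {
  struct platform_cookie *c = cookie;
  size_t room = sizeof c->data - c->len;
  size_t n = size < room ? size : room;
  memcpy(c->data + c->len, buf, n);
  c->len += n;
  return (ssize_t)n;
}

static struct platform_cookie platform_stdout_cookie;

/* Wrap the hook in a FILE* so every stdio function works on it. */
FILE *open_platform_stdout(void) {
  cookie_io_functions_t io = {
      .read = NULL,  /* write-only stream */
      .write = platform_write,
      .seek = NULL,  /* not seekable */
      .close = NULL, /* nothing to release */
  };
  return fopencookie(&platform_stdout_cookie, "w", io);
}
```

Once the hook is wrapped in a FILE*, buffering, fprintf, fputs and the rest come from the shared stdio code rather than from per-function baremetal implementations, which is the main appeal of Proposal 1.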

From a user’s POV:

Proposal 2: Stick with the stdio/baremetal folder

In this alternative, we fill in every function that is implemented in the generic folder. This is the approach suggested by the linked PR.

This way, we don’t actually require a full FILE* interface, and we can easily optimise out for code size.

Thoughts

I do think that (1) is the cleaner approach, but I may have missed an approach. I would like to get started on this quite soon so I am open for any suggestions. Thank you for reading.


This tactic sounds like it would have serious licensing problems.

No opinion on the proposal in the abstract, just on this implementation tactic.

Sorry I meant the Linux implementation at libc/src/stdio/linux and libc/__support/File/linux - which is already in the libc project.

First, I wanted to give a huge THANK YOU for taking this on. As someone with experience in baremetal but not in toolchains, I appreciate yours and others’ work to make a real libc for baremetal.

So, would there be things like __llvm_libc_stdio_seek(), __llvm_libc_stdio_flush(), and so on? Would those be usable alongside the current __llvm_libc_stdio functions?

That is, if I just wanted UART or semihosting, could I implement __llvm_libc_stdio_read()/write() like I do now with the cookies? And if I wanted full-on file IO, would I implement additional __llvm_libc_stdio hooks? That would be really great if so.

Full file IO likely won’t be common, but it does happen. We have baremetal projects that can access a USB flash drive, for example, using a vendor-supplied library. I’m flexible with either solution, but if I’m understanding your descriptions correctly then it sounds like (1) is indeed cleaner.

Definitely in favor of option 1, but with the caveat that the code size implication you mentioned will likely be pretty major. In particular, a lot of stuff in stdio may involve malloc, and including that in the link when it isn’t necessary is a huge bummer. It’s definitely possible to structure a stdio implementation such that malloc (et al) isn’t brought into the link unless it’s needed (e.g., setvbuf, fopen), but that may involve some refactoring work in libc’s generic stdio.

From offline discussions, it sounds like llvm-libc’s stdio is generally pretty rudimentary anyway, so it’ll need surgery to support everything it needs to on host too. Once @petrhosek is back online, he’ll likely have a lot more to say on the specifics.

What things in stdio get malloced? Is it just FILE structure or do things like buffers need malloc, too? If it’s just FILE, could something like __llvm_libc_stdio_alloc_FILE() be added to let the user, say, allocate a FILE from a static array or something?

FILE structures do need to be malloced to support an arbitrary number of open files. One could allocate a fixed number in an array (the standard dictates FOPEN_MAX >= 8), but that's a bit of a double-edged sword, since buffers for those 8+ streams would also need to be statically allocated. It's generally expected that, if these streams are buffered, their default buffers be at least BUFSIZ, which is mandated to be at least 256 bytes. So that's a minimum of 2K of RAM (8 × 256 bytes) that may never end up used; in that case it may be preferable to bring a malloc into XIP flash ROM instead.
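A rough sketch of the static-array alternative, along the lines of the __llvm_libc_stdio_alloc_FILE() idea above: a fixed pool of FOPEN_MAX slots, each carrying its own BUFSIZ-sized buffer. The names and the sizing constants (set to the standard's minimums of 8 and 256) are hypothetical; a hosted libc's own FOPEN_MAX and BUFSIZ are usually larger.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizing at the standard's minimums. */
#define POOL_FOPEN_MAX 8
#define POOL_BUFSIZ 256

struct pooled_file {
  bool in_use;
  void *cookie;
  size_t buf_len;
  char buf[POOL_BUFSIZ]; /* default buffer, allocated statically */
};

static struct pooled_file file_pool[POOL_FOPEN_MAX];

/* Hypothetical allocation hook: hand out a slot from a static array
   instead of calling malloc. */
struct pooled_file *pool_alloc_file(void *cookie) {
  for (size_t i = 0; i < POOL_FOPEN_MAX; ++i) {
    if (!file_pool[i].in_use) {
      file_pool[i].in_use = true;
      file_pool[i].cookie = cookie;
      file_pool[i].buf_len = 0;
      return &file_pool[i];
    }
  }
  return NULL; /* every slot taken: fopen would fail */
}

void pool_free_file(struct pooled_file *f) {
  f->in_use = false;
}
```

The pool never calls malloc, but `file_pool` reserves at least 8 × 256 = 2048 bytes of buffer space in RAM whether or not any file is ever opened, which is exactly the trade-off described above.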

Implementing setvbuf completely is also not generally possible without malloc, since it can arbitrarily resize the internal buffer. So it would need to either fail or bring in malloc.
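A malloc-free setvbuf can still honour many requests; it just has to fail once the caller asks the library for a buffer larger than the one it owns (the C standard lets setvbuf return nonzero when a request cannot be honoured). The `mini_stream`/`mini_setvbuf` names and the 256-byte default are hypothetical.

```c
#include <stddef.h>
#include <stdio.h> /* _IOFBF, _IOLBF, _IONBF */

#define DEFAULT_BUFSIZ 256 /* hypothetical size of the built-in buffer */

/* Hypothetical stream with a statically allocated default buffer. */
struct mini_stream {
  char default_buf[DEFAULT_BUFSIZ];
  char *buf;
  size_t buf_size;
  int mode;
};

/* Malloc-free setvbuf sketch: it can switch modes, adopt a caller-supplied
   buffer, or use part of the built-in buffer, but a request for a larger
   library-owned buffer (buf == NULL, size > DEFAULT_BUFSIZ) must fail
   because there is no allocator to grow it. */
int mini_setvbuf(struct mini_stream *s, char *buf, int mode, size_t size) {
  if (mode != _IOFBF && mode != _IOLBF && mode != _IONBF)
    return -1;
  if (mode == _IONBF) {
    s->buf = NULL;
    s->buf_size = 0;
  } else if (buf != NULL) {
    s->buf = buf; /* caller owns the storage */
    s->buf_size = size;
  } else if (size <= DEFAULT_BUFSIZ) {
    s->buf = s->default_buf;
    s->buf_size = size ? size : DEFAULT_BUFSIZ;
  } else {
    return -1; /* would need malloc */
  }
  s->mode = mode;
  return 0;
}
```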

Gotcha, thank you for the explanation! I can see how that becomes a pickle for baremetal. Do stdout, stderr, and stdin have to point to or be backed by valid FILE structures or do they just have to be unique values? That is, do they need to be allocated in that 2KB of RAM?

I ask because I wonder if it is possible to avoid the heap and malloc at least in the case the user just wants a simple UART or Semihosting with stdout, stderr, and stdin like is available now.

Supposedly FILE is opaque, so the only reason the standard streams need to be backed by actual structs is so that calls on them have the proper effects. If the linker can determine that certain calls are impossible, the streams can be made more minimal with no observable effect beyond a pointer change, and a stdio implementation can be rigged to operate this way using, e.g., weak linking. (I've done so in the llvm-mos SDK for 6502 systems: there's a weak minimal stdio where the only legal FILEs are the std streams, and a full stdio with malloc linked in that replaces the minimal stdio if you call e.g. fopen anywhere. I'm obviously biased, but that system seems to work quite well!)
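A sketch of the weak-linking arrangement described here, using the GCC/Clang `__attribute__((weak))` extension. Only the minimal, weak half is shown so that it builds as a single file, and the function names are hypothetical. A full-stdio object would define the same symbols strongly; since a static archive member is only pulled in to resolve an undefined symbol, it is the reference to something like fopen that brings that object, and its strong overrides, into the link.

```c
#include <stdbool.h>

/* Minimal stdio, defined weakly: only stdin/stdout/stderr (ids 0-2) are
   legal streams. A hypothetical full-stdio object would define these same
   symbols without the weak attribute; when it is linked in, its strong
   definitions take precedence over these. */
__attribute__((weak)) bool stdio_stream_is_legal(int stream_id) {
  return stream_id >= 0 && stream_id <= 2;
}

__attribute__((weak)) const char *stdio_flavor(void) {
  return "minimal";
}
```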

I see. Your llvm-mos SDK stdio setup sounds great. I presume doing something like that is what would involve the refactoring work you mentioned previously.

Being able to avoid the heap for stdio for the basic UART case while requiring it for the full-fat version seems very reasonable to me. Admittedly, I have the luxury of using larger MCUs (>=128kB RAM), but from my limited experience the resources needed to support USB or flash memory access have been much greater than the extra resources llvm-libc might need anyway.

Thanks again! I look forward to seeing how this progresses.

Also, I'm not sure if this helps matters at all, but we currently use Microchip's XC32 toolchain, which requires a heap to be declared if you use stdio functions. This is true even for basic UART-type IO. I don't know if this is common or if Microchip's tools are just wack. Still, a major vendor requires a heap, though it can be very small (around 1kB).

I was going to edit my previous post so I’m not clogging this topic, but I guess there’s a time limit for editing and that expired.

I'm not too familiar with Microchip's XC32 toolchain. It seems to be based on GCC? I'm not sure what chip you are targeting, but there are LLVM-based toolchains that ship with LLVM-libc.

LLVM-libc also requires a heap. The heap region is defined in the linker script, which should come with the toolchain.

On another note, here is a link to the PR: [libc] Migrate from baremetal stdio.h to generic stdio.h by saturn691 · Pull Request #152748 · llvm/llvm-project · GitHub


The philosophy I have been following so far for the baremetal implementation of LLVM libc is to avoid pulling in things you don’t need (and relying on linking semantics and linker features for eliminating dead code to avoid having too many options which could lead to combinatorial explosion of multilibs).

To give a concrete example, we migrated several baremetal projects at Google to LLVM libc, but so far none of them needed anything beyond printf and scanf. To me that is a strong signal that many baremetal projects don't need a complete stdio implementation. Thus, the current implementation of LLVM libc only provides printf and scanf (and a few additional functions, none of which use FILE*), which are implemented in terms of __llvm_libc_stdio_write and __llvm_libc_stdio_read. This is about as minimal as you can get.
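Roughly, this tier looks like the sketch below: format into a small stack buffer, then pass the bytes straight to the write hook, with no FILE object, heap, or buffering state involved. The host's vsnprintf stands in for llvm-libc's own formatting code, the hypothetical `demo_stdio_write` stands in for __llvm_libc_stdio_write, and truncating at 64 bytes is a simplification of this sketch only.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>     /* vsnprintf */
#include <sys/types.h> /* ssize_t */

/* Hypothetical stand-in for the write hook; captures output for the demo. */
static char captured[128];
static size_t captured_len;

static ssize_t demo_stdio_write(void *cookie, const char *buf, size_t size) {
  (void)cookie;
  for (size_t i = 0; i < size && captured_len < sizeof captured; ++i)
    captured[captured_len++] = buf[i];
  return (ssize_t)size;
}

/* Hypothetical tier-1 printf: format, then hand the bytes to the hook. */
int tiny_printf(const char *fmt, ...) {
  char buf[64];
  va_list ap;
  va_start(ap, fmt);
  int n = vsnprintf(buf, sizeof buf, fmt, ap);
  va_end(ap);
  if (n < 0)
    return n; /* formatting error */
  /* Sketch-only simplification: anything past the buffer is dropped. */
  size_t len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
  demo_stdio_write(NULL, buf, len);
  return (int)len;
}
```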

With the approach proposed in [libc] Migrate from baremetal stdio.h to generic stdio.h by saturn691 · Pull Request #152748 · llvm/llvm-project · GitHub, we would provide a more complete stdio implementation with little additional code, but we would also introduce an indirect call for every use of stdio functions plus we would need to run the CookieFile constructors at startup. While convenient, this is unnecessary for many projects and counter to the existing philosophy.

I believe we need to provide a more complete stdio implementation, but the direction I envisioned is a layered approach:

  1. The bottommost layer is what we have today, that is printf and scanf directly implemented using __llvm_libc_stdio_write and __llvm_libc_stdio_read.
  2. The middle layer would be minimal stdio where the only legal FILE* are the std* streams akin to what was implemented in [libc] Add putc, fputc, and fprintf to stdio/baremetal by saturn691 · Pull Request #144567 · llvm/llvm-project · GitHub.
  3. The topmost layer would be fopencookie, which would give you full flexibility with the associated overhead.
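The middle layer could look something like this sketch: the only FILE objects that exist are the three statically allocated standard streams, and functions such as fputc reject any other pointer. `mini_FILE`, `mini_fputc` and the capture hook are hypothetical names, and the read side is omitted for brevity.

```c
#include <stddef.h>
#include <stdio.h>     /* EOF */
#include <sys/types.h> /* ssize_t */

/* Hypothetical tier-2 stream: just a cookie and a write hook. */
typedef struct mini_file {
  void *cookie;
  ssize_t (*write)(void *cookie, const char *buf, size_t size);
} mini_FILE;

/* Demo sink standing in for a UART or semihosting write hook. */
static char out_log[64];
static size_t out_len;

static ssize_t capture_write(void *cookie, const char *buf, size_t size) {
  (void)cookie;
  for (size_t i = 0; i < size && out_len < sizeof out_log; ++i)
    out_log[out_len++] = buf[i];
  return (ssize_t)size;
}

/* The only streams that can ever exist: statically allocated, no malloc. */
static mini_FILE mini_std_streams[3] = {
    {NULL, NULL},          /* stdin: read side omitted in this sketch */
    {NULL, capture_write}, /* stdout */
    {NULL, capture_write}, /* stderr */
};

mini_FILE *const mini_stdin = &mini_std_streams[0];
mini_FILE *const mini_stdout = &mini_std_streams[1];
mini_FILE *const mini_stderr = &mini_std_streams[2];

static int is_std_stream(const mini_FILE *f) {
  return f == mini_stdin || f == mini_stdout || f == mini_stderr;
}

/* Tier-2 fputc: any FILE* other than a standard stream is invalid. */
int mini_fputc(int c, mini_FILE *f) {
  if (!is_std_stream(f) || f->write == NULL)
    return EOF;
  char ch = (char)c;
  return f->write(f->cookie, &ch, 1) == 1 ? (unsigned char)ch : EOF;
}
```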

This way, every baremetal project can decide how much of stdio they need and how much overhead they’re willing to pay. I expect that most projects would only need the first two layers; if you need the third layer it’s likely because you’re doing something non-trivial and the overhead of fopencookie is likely not a concern.

A potential alternative is to use the CookieFile abstraction for the middle layer reusing the approach from [libc] Migrate from baremetal stdio.h to generic stdio.h by saturn691 · Pull Request #152748 · llvm/llvm-project · GitHub, but making the CookieFile constructor and related methods constexpr. I tried that locally and it seems like this approach eliminates the need for global constructors but more testing is needed to see if this is a better approach in terms of overhead.

As a first step, how about we support two cases? This would suit both of our needs for the time being, and it shouldn’t be too difficult to add the intermediate stage back in when we need it.

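|                     | Basic                                        | Advanced                                                                                   |
|---------------------|----------------------------------------------|--------------------------------------------------------------------------------------------|
| stdin/stdout/stderr | Users must define their own FILE if required | Provided via CookieFile                                                                    |
| Functions           | printf/scanf                                 | All                                                                                        |
| Implementation      | Existing baremetal folder                    | Generic folder                                                                             |
| CMake option        | Default                                      | LIBC_FULL_BAREMETAL_STDIO                                                                  |
| Overhead            | Minimal                                      | Global constructors evaluated at compile time via constexpr; extra indirect call per function |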

Testing is its own topic: we are running some tests downstream, but we don't have a mechanism for testing stdio.

Also, with the advanced stdio, I did manage to get std::cout working before the patch was reverted. From our side, there is massive demand for C++ on embedded, and we would like to work on this sooner rather than later.

What’s the motivation for introducing a build option? I’d hope that we can support both of these cases within the same build and rely on linker semantics for omitting unused code.

I must have misinterpreted your first statement then (“avoid pulling in things… and [avoid] relying on”). Yes, I would rather rely on linker semantics as well.

In that case, we would just enable the rest of the stdio.h functions as a placeholder, but we would prefer to use the ones in the baremetal folder, as they would have less overhead. That way:

  1. If you’re only using printf/scanf, then there would be no observable difference.
  2. If you’re using other functions outside of the baremetal folder such as fflush, we would have to set up the streams (set to constinit/constexpr and linked in as needed).

The tricky part comes later: if the user calls fprintf(FILE *, ...) and we want to support both a minimal FILE, as you suggested, and a full FILE, I'm not sure the linker can handle that. However, we can fill that in later if it becomes a requested feature.

Thanks everyone for your input. I’d like to progress the discussion.

For purposes of this discussion, let’s call each level of support a tier. Based on your comments, I categorise each tier as follows:

  • Tier 1: the way it is now. Baremetal only provides functions such as printf and scanf.
  • Tier 2: provides all functionality in stdio.h, but only for stdin, stdout and stderr.
  • Tier 3: provides all functionality in stdio.h for general FILE*.

I had discussions with some colleagues at Arm (thanks in particular to @smithp35 and @statham-arm) and we gathered some thoughts about the option of handling it all via linker semantics:

  1. lld will search libraries in the specified search path order, picking the first object that defines an unresolved non-weak reference. The order of symbol references is determined by the order of input objects, so it's not really under the library authors' control.
  2. Having the linker choose a tier based on just symbol references is difficult. If the tiers had no intersection of symbol names between them, it would be doable, as the linker could pull in more than one tier with no risk of a multiple definition error. However, as soon as alternative implementations of the same symbol can exist, this doesn’t work anymore.
  3. Assuming that each tier is a superset of the one above, we could have one separate stdio library for each tier, then delegate the choice to the user by the use of -lstdio-{tier_identifier}. This can be prone to user error if they specify a tier that isn’t enough to satisfy all symbol references used.
  4. One might enable alternative implementations by the use of weak definitions. By that, the most basic tier could have all its public symbols defined as weak, and leave the most complete tier to define its own symbols as strong. Then, the two tiers would be fed to the linker, and in case that advanced functionality is required, both tiers would be pulled in by the linker and the symbols of the most complete one would override the weak ones. Unfortunately this approach has a couple of shortcomings: (a) it can’t be done with more than two tiers, since symbols can only be either weak or strong (that is, there are only two states); (b) the mixture of object files from different tiers can happen inadvertently and might lead to undesired behaviour (some functions will come from weak definitions and others from strong definitions).
  5. Finally, we could also only support the building of a single stdio tier at a time, having as default perhaps the current implementation (called tier 1 in this made up nomenclature). Toolchain providers can switch to another tier at library build time if so they wish. One obvious weakness is that a configuration flag is required in the libc’s build system to select which tier to build.

Expanding on (3), for toolchain providers it's possible to leverage clang's multilib selection mechanism. We already have the multilib custom flags feature that I implemented some time ago (Multilib, Clang 22.0.0git documentation). We could put each stdio variant in its own static library, and clang's driver could choose which one to pick up based on the command line:

  • clang .. -fmultilib-flag=stdio-minimal
  • clang .. -fmultilib-flag=stdio-stdstreams
  • clang .. -fmultilib-flag=stdio-full

In my opinion, the best solution is either (3) or (5). I’d be happy to hear your thoughts in order to make progress.
