A libc in LLVM

Hello LLVM Developers,

Within Google, we have a growing range of needs that existing libc implementations don’t quite address. This is pushing us to start working on a new libc implementation.

Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need, and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project.

To be very clear: we don’t expect our needs to exactly match everyone else’s – part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.

We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:

  1. The project should mesh with the “as a library” philosophy of the LLVM project: even though “the C Standard Library” is nominally “a library,” most implementations are, in practice, quite monolithic.

  2. The libc should support static non-PIE and static-PIE linking. This means providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.

  3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification – there will be some parts which simply aren’t worth implementing, and some parts which cannot be safely used in modern coding practice.

  4. Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors’ extensions.

  5. The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.
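
As a rough illustration of the fuzz-testing goal above, here is a minimal sketch of a libFuzzer harness in C. `LLVMFuzzerTestOneInput` is the standard libFuzzer entry point; `sketch_strlen` is a hypothetical stand-in for a function under test and is not taken from any real implementation. The harness compares it against the system `strlen` as a reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function under test: a byte-at-a-time strlen. */
static size_t sketch_strlen(const char *s) {
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}

/* libFuzzer calls this once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* Copy the input so that it can be NUL-terminated safely. */
    char *buf = malloc(size + 1);
    if (buf == NULL)
        return 0;
    if (size > 0)
        memcpy(buf, data, size);
    buf[size] = '\0';

    /* Differential check against the reference implementation. */
    assert(sketch_strlen(buf) == strlen(buf));

    free(buf);
    return 0;
}
```

With Clang, a harness like this would typically be built with `-fsanitize=fuzzer,address`, which combines fuzzing with sanitizer-supported testing; the `assert` only fires when `NDEBUG` is not defined.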

There are also a few areas which we do not intend to invest in at this point:

  1. Implement dynamic loading and linking support.

  2. Support for more architectures (we’ll start with just x86-64 for simplicity).

For these areas, the community is of course free to contribute. Our hope is that preserving the “as a library” design philosophy will make such extensions easy, and allow retaining the simplicity when these features aren’t needed.

We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.
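
To make the layering idea concrete, here is a minimal sketch in C under assumed names: `layer_strnlen` is implemented by the hypothetical new library itself, while `layer_strchr` has not been reimplemented yet and simply forwards to the system libc. Neither name comes from an actual design; they only illustrate how coverage could grow one function at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Implemented natively by the hypothetical new library. */
size_t layer_strnlen(const char *s, size_t max_len) {
    assert(s != NULL);
    size_t n = 0;
    while (n < max_len && s[n] != '\0')
        ++n;
    return n;
}

/* Not reimplemented yet: forward the call to the system libc. */
char *layer_strchr(const char *s, int c) {
    return strchr(s, c);
}
```

An application calling `layer_strchr` would not notice when the forwarding body is later replaced by a native one, as long as the observable behavior stays the same.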

So, what do you think about incorporating this new libc under the LLVM project?

Thank you,

Siva Chandra and the rest of the Google LLVM contributors

Disclaimer: I work at Google, so don’t take my +1 as an independent vote.

We would like to use this on Fuchsia and I am particularly interested in creating a dynamic linking library for ELF with Roland McGrath’s guidance. We spoke about creating a library for writing dynamic linkers internally and I don’t see why this can’t be upstreamed.

On Fuchsia we critically need support for AArch64. What do you expect to be architecture dependent? I struggled to think of a place where the architecture, and not the operating system, was the issue.

What do you expect the support for Windows to be? Certainly, I don't
expect you to provide Windows support personally if you don't need it,
but given that LLVM supports Windows, it should at least be done in
such a way that the design lends itself to interested parties
contributing Windows support.

Currently clang-cl has several dependencies on having a Visual Studio
installation present on your machine, and one of these is that it
provides the implementation of the CRT (i.e. libc). So having a libc
implementation which supports Windows and is compatible with MSVCRT
would be useful for people using clang on Windows as well.

Hello LLVM Developers,

Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.

Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need,

+1 - This has also been my experience: Many people over many years have expressed a desire to have a libc as part of the LLVM project. It is currently a large gap in our LLVM toolchain offering. Moreover, from the standpoint of my organization, an LLVM libc could provide benefits on both production platforms and research/experimental hardware.

and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project.

To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.

We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:

  1. The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.

  2. The libc should support static non-PIE and static-PIE linking. This means, providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.

  3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.

  4. Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.

  5. The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.

Great.

There are also a few areas which we do not intend to invest in at this point:

  1. Implement dynamic loading and linking support.

It will be useful to have a design document that describes the kind of system and capabilities that you're targeting, and then we can discuss how the libc might have a modular design that can be adapted for other use cases. I mention modularity because, for example, we have accelerator hardware and various kinds of low-variability/embedded environments where many, but not all, POSIX/libc capabilities make sense.

  2. Support for more architectures (we'll start with just x86-64 for simplicity).

For these areas, the community is of course free to contribute. Our hope is that, preserving the "as a library" design philosophy will make such extensions easy, and allow retaining the simplicity when these features aren't needed.

We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.

So, what do you think about incorporating this new libc under the LLVM project?

This is something that I'd like to see.

-Hal

Thank you,

Siva Chandra and the rest of the Google LLVM contributors

Hey Siva,

HardenedBSD is a derivative of FreeBSD that aims to perform a
clean-room reimplementation of the publicly-documented bits of the
grsecurity patchset. We're extremely interested in llvm's CFI to fill
the gap of PaX's/grsecurity's patented/GPLv3'd excellent RAP
implementation. We've made measurable and tangible progress in
researching and integrating Cross-DSO CFI (even producing a pre-alpha
Call-For-Testing of Cross-DSO CFI in HardenedBSD base).

One hard problem I need to solve is tight integration of the sanitizer
library into both our libc and our RTLD while also attempting to keep
diffs minimal with our upstream FreeBSD.

Having a libc that was sanitizer-centric (or, at least, aware) and
could serve as a drop-in replacement for our libc would be a major
win and would even enable quicker development of novel security
technologies in the future.

Hello LLVM Developers,

Within Google, we have a growing range of needs that existing libc
implementations don't quite address. This is pushing us to start working on
a new libc implementation.

Informal conversations with others within the LLVM community have told us
that a libc in LLVM is actually a broader need, and we are increasingly
consolidating our toolchains around LLVM. Hence, we wanted to see if the
LLVM project would be interested in us developing this upstream as part of
the project.

To be very clear: we don't expect our needs to exactly match everyone
else's -- part of our impetus is to simplify things wherever we can, and
that may not quite match what others want in a libc. That said, we do
believe that the effort will still be directly beneficial and usable for
the broader LLVM community, and may serve as a starting point for others in
the community to flesh out an increasingly complete set of libc
functionality.

We are still in the early stages, but we do have some high-level goals and
guiding principles of the initial scope we are interested in pursuing:

   1. The project should mesh with the "as a library" philosophy of the LLVM
      project: even though "the C Standard Library" is nominally "a library,"
      most implementations are, in practice, quite monolithic.

   2. The libc should support static non-PIE and static-PIE linking. This
      means, providing the CRT (the C runtime) and a PIE loader for static
      non-PIE and static-PIE linked executables.

Having a portable, permissively-licensed CSU/CRT that supports static
PIE would be a very welcome project, especially if HardenedBSD could
make use of it.

   3. If there is a specification, we should follow it. The scope that we need
      includes most of the C Standard Library; POSIX additions; and some
      necessary, system-specific extensions. This does not mean we should (or
      can) follow the entire specification -- there will be some parts which
      simply aren't worth implementing, and some parts which cannot be safely
      used in modern coding practice.

   4. Vendor extensions must be considered very carefully, and only admitted
      when necessary. Similar to Clang and libc++, it does seem inevitable that
      we will need to provide some level of compatibility with other vendors'
      extensions.

   5. The project should be an exemplar of developing with LLVM tooling. Two
      examples are fuzz testing from the start, and sanitizer-supported testing.

There are also a few areas which we do not intend to invest in at this point:

   1. Implement dynamic loading and linking support.

That is correct. Implementing a runtime linker (RTLD) is orthogonal.
However, it seems to be the next logical (and welcome!) step. It is not
within the scope of a libc implementation, though.

   2. Support for more architectures (we'll start with just x86-64 for
      simplicity).

For these areas, the community is of course free to contribute. Our hope is
that, preserving the "as a library" design philosophy will make such
extensions easy, and allow retaining the simplicity when these features
aren't needed.

We intend to build the new libc in a gradual manner. To begin with, the
new libc will be a layer sitting between the application and the system
libc. Eventually, when the implementation is sufficiently complete, it will
be able to replace the system libc at least for some use cases and contexts.

So, what do you think about incorporating this new libc under the LLVM
project?

Even if the new libc isn't merged into LLVM, it would be very
interesting to collaborate on. I would hope that Google would remain
interested in keeping it open source, and perhaps maintained in a
fashion that multiple OS vendors can adopt.

Thanks,

<disclaimer: I work at Google, though not on anything related to this project>

We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:

  1. The project should mesh with the “as a library” philosophy of the LLVM project: even though “the C Standard Library” is nominally “a library,” most implementations are, in practice, quite monolithic.

This is awesome. I’d really love to see a corpus of functionality built as a set of libraries that can be sliced and remixed in different ways per the needs of different use-cases.

For these areas, the community is of course free to contribute. Our hope is that, preserving the “as a library” design philosophy will make such extensions easy, and allow retaining the simplicity when these features aren’t needed.

Fantastic!

We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.

So, what do you think about incorporating this new libc under the LLVM project?

I would love to see this, and I think it would fill a significant missing piece in the LLVM ecosystem.

-Chris

I’m not totally sold on the idea of having it be a layer between the system libc and the application. I think this is likely to create a split between Windows and non-Windows platforms that will be difficult to overcome.

It also seems like it brings with it its own set of difficulties. Where can you make a separation in libc such that you’re guaranteed that the two pieces do not share any state, especially given that not everyone is going to be using the same libc?

Have you considered just starting with a blank slate?

Some natural questions:

  1. Will libm be included?
  2. How will llvm libc be different from musl in design perspectives?
    musl is another widely used libc implementation, available on many Linux distributions (https://wiki.musl-libc.org/projects-using-musl.html#Linux-distributions-using-musl and even on Windows! https://midipix.org/), often used by prebuilt packages because it is so lightweight.

It’d be great if the library were designed with multiple kernels in mind. That could be one reason why another libc implementation is needed. :) Then another natural question is how the kernel differences will be effectively isolated. The platform-specific macros in compiler-rt are a bit messy now. I hope we can prevent that situation.

Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors’ extensions.

I’m glad to see this. Many uses of glibc symbol versioning are actually “bug-compatibility”.
It’d be good to push applications to fix their own problems.

Implement dynamic loading and linking support.

Lack of support for dynamic linking circumvents many problems: PLT lazy binding, dlclose, ABI compatibility (newer binary on older loaders), etc. However, it would be good to make clear at an early stage whether the feature will ever be implemented, because it will influence the design of many interfaces.
Forgetting about it entirely may cause trouble if it is eventually decided to implement it.

Support for more architectures (we’ll start with just x86-64 for simplicity).

This is fine. musl has 5 or 6 arch-dependent files for each port (arch//.h) and a few more in the user interface arch//bits/.h . It proves that a new port does not need a bunch of additional logic. Many optimized routines may inevitably get added, though…

Hello LLVM Developers,

Within Google, we have a growing range of needs that existing libc implementations don't quite address. This is pushing us to start working on a new libc implementation.

Are you able to share what some of these needs are? My reason for
asking is to see if there is a particular niche where existing libc
designs are not working, or if there is an approach that will handle
many use cases better than existing libc implementations.

Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need, and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project.

To be very clear: we don't expect our needs to exactly match everyone else's -- part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.

I'm definitely interested in hearing more. Assembling an LLVM based
toolchain when there isn't an obvious native platform C library that
can be used could in theory benefit greatly from something like this.
As you point out, this might not be in your set of needs though.

We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:

The project should mesh with the "as a library" philosophy of the LLVM project: even though "the C Standard Library" is nominally "a library," most implementations are, in practice, quite monolithic.

There can be good reasons for designs to be monolithic though, for
example https://wiki.musl-libc.org/design-concepts.html . I'm not
enough of a C-library expert to say that this is always true, but it
does at least highlight that there is a risk that a toolkit suitable
for many libraries becomes too cumbersome to use in practice.

The libc should support static non-PIE and static-PIE linking. This means, providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.

Interesting. I've seen a static-PIE loader embedded into an
image so that it could relocate itself. As all the dependencies were
statically linked, there were only simple relative relocations to
resolve. Are you thinking of something along those lines or an
external loader program?
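
For readers unfamiliar with what "simple relative relocations" involve, here is a minimal sketch in C. It assumes a simplified record, `struct sketch_rela`, modeled loosely on an ELF64 RELA entry of relative type (such as `R_X86_64_RELATIVE`), and it patches a byte buffer standing in for the loaded image so that it can run anywhere. All names are hypothetical; a real self-relocating startup routine would locate the table through the image's dynamic section and write to the real load address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a relative relocation record. */
struct sketch_rela {
    uint64_t offset; /* location to patch, relative to the load base */
    int64_t addend;  /* link-time address of the target (base 0) */
};

/* For each record, store (load_base + addend) at image[offset]. */
static void sketch_apply_relative(unsigned char *image, size_t image_size,
                                  uint64_t load_base,
                                  const struct sketch_rela *relocs,
                                  size_t count) {
    for (size_t i = 0; i < count; ++i) {
        uint64_t value = load_base + (uint64_t)relocs[i].addend;
        /* Refuse to patch outside the image. */
        assert(relocs[i].offset <= image_size &&
               sizeof value <= image_size - relocs[i].offset);
        memcpy(image + relocs[i].offset, &value, sizeof value);
    }
}
```

No symbol lookup is needed for this relocation type, which is why a statically linked PIE can carry a loader this small inside itself.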

If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification -- there will be some parts which simply aren't worth implementing, and some parts which cannot be safely used in modern coding practice.

I'm interested in what sort of platform the libc could run on and
what would need to be provided externally. In particular, I'm
interested in whether a platform OS is required. I'm also interested
in where the boundaries of the libc lie; for example, I'm thinking of
something like the separation of newlib and libgloss here.

Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors' extensions.

The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.

There are also a few areas which we do not intend to invest in at this point:

Implement dynamic loading and linking support.

Support for more architectures (we'll start with just x86-64 for simplicity).

I strongly recommend you choose at least one other architecture and
build cross platform support in from the beginning. I suspect that
trying to put this in retroactively will put huge stress on the design
and the supporting infrastructure such as the build system. There is
also a danger of baking design decisions favouring one architecture
into the system, 32-bit vs 64-bit support is one obvious case. I'm
thinking that this is one area where the community could contribute.

For these areas, the community is of course free to contribute. Our hope is that, preserving the "as a library" design philosophy will make such extensions easy, and allow retaining the simplicity when these features aren't needed.

We intend to build the new libc in a gradual manner. To begin with, the new libc will be a layer sitting between the application and the system libc. Eventually, when the implementation is sufficiently complete, it will be able to replace the system libc at least for some use cases and contexts.

I'm interested to see which system libc and existing platforms you
intend to support. Does this go as low as embedded systems where the
platform is more like a board support package, or is this purely a
libc for platforms?

So, what do you think about incorporating this new libc under the LLVM project?

Personally I think that if it can satisfy the needs of a sufficiently
broad segment of the community then I'm in favour. I'm looking forward
to seeing more.

Peter

Since I have a little experience in this area, I'd like to chime in on
it. :) TL;DR I think it's a really, REALLY bad idea.

First, writing and maintaining a correct, compatible, high-quality
libc is a monumental task. The amount of code needed is not all that
large, but the subtleties of how it behaves and the difficulties of
implementing various interfaces that have no capacity to fail or
report failure, and the astronomical "compatibility surface" of
interfacing with all C and C++ software ever written as well as a
large amount of software written in other languages whose runtimes
"pass through" the behavior of libc to the applications they host, all
contribute to the scale of work, and of knowledge/expertise, involved
in making something of even decent quality. (As an aside, note that I
love to see hobby libc projects even if they have major problems, but
that's totally different from proposing something that lots of people
will end up stuck using.)

Second, corporate development teams are uniquely qualified to utterly
botch a libc, yet still push it into widespread use, and the cost is
painful compatibility hacks in all applications. Apple did this with
their fork of BSD libc code. Google has done it once already with
their fork of musl in Fuchsia -- a project which I contributed
significant amounts of free labor to in terms of tracking down folks
for license clarification their lawyers wanted, only to have them
never bother to ask me why technical things were done the way they
were before making random useless and broken changes in their fork. A
corporate-led project does not have to answer to the community, and
will leave whatever bugs they introduce in place for the sake of
bug-compatibility with their own software rather than fixing them.

Third, there is tremendous value in non-monoculture of libc
implementations, or implementations of any important library
interfaces or language runtimes. Likewise there's tremendous value in
non-monoculture of tooling (compilers, linkers, etc.). Avoiding
monoculture preserves the motivation for consensus-based standards
processes rather than single-party control (see also: Chrome and what
it's done to the web) and the motivation for people writing software
to write to the standards rather than to a particular implementation.
A big part of making that possible is clear delineation of roles
between parts of the toolchain and runtime, with well-defined
interface boundaries. Some folks have told me that I should press LLVM
to make musl the "LLVM libc" instead of whatever Google wants to do,
but that misses the point: there *shouldn't be* a "LLVM libc", or any
one library implementation that's "first class" for use with LLVM
while others are only "second class".

So, in summary:

Point 1 is why making a libc for real-world use is not to be taken
lightly.

Point 2 is why, if it is done, it shouldn't be a Google project.

Point 3 is why there should not be an "LLVM libc".

Hope this is all helpful.

Regards,

Rich

Doesn't having additional libc implementations to choose from
contribute *to* the ideal of not having a monoculture?

Also, I didn't read the proposal as segregating the world into first
class and second class libc implementations. For example, libc++
currently works fine with non LLVM-based toolchains, and libstdc++
currently works fine with LLVM-based toolchains. Do you see libc as
fundamentally different in this regard?

Regarding your second point, if Google were to write a libc
implementation and then upstream it in bulk, I would agree with you.
But being done in the open appears to solve the exact problem you are
concerned about, which is that corporate interests will lead to
lasting design decisions that aren't in the best interest of the
general public. By doing it in the open, such problems can be
addressed before the code is ever committed.

I’m gonna let the folks working on this respond to technical points, but some meta points about discussion on this list…

Since I have a little experience in this area, I’d like to chime in on
it. :) TL;DR I think it’s a really, REALLY bad idea.

In case there is any confusion, I’m really glad you’re participating in the discussion here because of this background.

Second, corporate development teams are uniquely qualified to utterly
botch a libc, yet still push it into widespread use, and the cost is
painful compatibility hacks in all applications. Apple did this with
their fork of BSD libc code. Google has done it once already with
their fork of musl in Fuchsia

Let’s keep this focused on technical issues and LLVM issues, none of the above (or the text in this paragraph I’ve snipped out) has anything to do with those, and I don’t think the LLVM list is the right place to discuss that.

LLVM has a long and effective history of both individuals and corporations working effectively together in the open as part of the project. I don’t think this project poses any risk there, much like Zach points out in his reply. Google is specifically discussing this early and trying to participate in the open process of the LLVM community from the outset. =]

Also, I’d suggest using more specific technical language than “botch” and “hacks” to make the discussion more productive.

With that, I’ll wander off and let you all dig into the real issues here.
-Chandler

disclaimer: I work at Google so don’t take my +1 as an independent vote forward.

We would like to use this on Fuchsia and I am particularly interested in creating a dynamic linking library for ELF with Roland McGrath’s guidance. We spoke about creating a library for writing dynamic linkers internally and I don’t see why this can’t be upstreamed.

If dynamic linking support is added in an “as a library” fashion, so that it can easily be excluded if not required without affecting the rest of the system, I do not see any problems adding it.

On Fuchsia we critically need support for AArch64; What do you expect to be architecture dependent? I struggled to think of where the architecture and not the operating system was the issue.

I think syscalls are an example of being architecture specific? And, items like program startup and PIE loader are operating system/exe format specific?

Just for my knowledge, why is answering these questions at a general level important?

Syscalls are operating system specific and architecture dependent, so I think we’ll want an abstraction layer around the fundamental operations the syscalls support anyway. Some things, like open, aren’t even syscalls on all operating systems. A generic syscall layer might be added that is architecture specific rather than operating system specific, but even on x86_64 there are, I think, two different ways to make syscalls. Loading, startup, and linking are all both format and operating system specific; a few of the details involved are determined by the architecture, but they’re trivially abstracted away.
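
To sketch what I mean by an abstraction layer around the fundamental operations: the portable part of the library could be written against a small table of primitives that each port fills in. Everything below is hypothetical (`struct sketch_os_ops`, `sketch_puts`, and the in-memory backend are made up for illustration); a real port would back `write` with a syscall instruction or an operating system library call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of primitive operations supplied by each port. */
struct sketch_os_ops {
    long (*write)(void *ctx, int fd, const void *buf, size_t len);
    void *ctx; /* backend-private state */
};

/* Portable layer: written only against the table, never a raw syscall. */
static long sketch_puts(const struct sketch_os_ops *os, const char *s) {
    assert(os != NULL && os->write != NULL);
    long n = os->write(os->ctx, 1, s, strlen(s));
    if (n < 0)
        return n;
    long m = os->write(os->ctx, 1, "\n", 1);
    return m < 0 ? m : n + m;
}

/* Fake backend for testing: captures output in memory. */
struct sketch_capture {
    char data[64];
    size_t used;
};

static long sketch_capture_write(void *ctx, int fd, const void *buf,
                                 size_t len) {
    struct sketch_capture *cap = ctx;
    (void)fd; /* the fake backend ignores the descriptor */
    if (len > sizeof cap->data - cap->used)
        return -1; /* out of space */
    memcpy(cap->data + cap->used, buf, len);
    cap->used += len;
    return (long)len;
}
```

A side benefit of this shape is that the portable code can be unit tested and fuzzed on any host by swapping in a fake backend like the one above.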

why is answering these questions at a general level important?

Because I wanted to make sure I understood the direction and the restriction stated. The restriction on what architecture will be used without stating a restriction on the operating system seemed like an odd statement. I’d very much like operating system abstractions to be considered right out of the gate and this seems like a bigger issue than the architecture to me.

What do you expect the support for Windows to be? Certainly, I don’t
expect you to provide Windows support personally if you don’t need it,
but given that LLVM supports Windows, it should at least be done in
such a way that the design lends itself to interested parties
contributing Windows support.

We are not going to disallow support for an item/features we do not plan to implement ourselves. Contributions will be welcome.

As I have mentioned in another email, we really want to develop everything in an “as a library” fashion so that adding support for new items/features isn’t blocked by design.

This may not be the right place to discuss it, but in that case a separate email thread to discuss and gather requirements for an operating system abstraction layer seems to be required. We don’t want the implementation to be coupled too tightly with Linux if we want to support BSD, Windows, and Fuchsia as well. I also have hopes that hobbyist operating system developers could use this. Libc implementations for hobby OS projects were a pain point for me personally.

Hello LLVM Developers,

Within Google, we have a growing range of needs that existing libc implementations don’t quite address. This is pushing us to start working on a new libc implementation.

Informal conversations with others within the LLVM community have told us that a libc in LLVM is actually a broader need,

+1 - This has also been my experience: Many people over many years have expressed a desire to have a libc as part of the LLVM project. It is currently a large gap in our LLVM toolchain offering. Moreover, from the standpoint of my organization, an LLVM libc could provide benefits on both production platforms and research/experimental hardware.

and we are increasingly consolidating our toolchains around LLVM. Hence, we wanted to see if the LLVM project would be interested in us developing this upstream as part of the project.

To be very clear: we don’t expect our needs to exactly match everyone else’s – part of our impetus is to simplify things wherever we can, and that may not quite match what others want in a libc. That said, we do believe that the effort will still be directly beneficial and usable for the broader LLVM community, and may serve as a starting point for others in the community to flesh out an increasingly complete set of libc functionality.

We are still in the early stages, but we do have some high-level goals and guiding principles of the initial scope we are interested in pursuing:

  1. The project should mesh with the “as a library” philosophy of the LLVM project: even though “the C Standard Library” is nominally “a library,” most implementations are, in practice, quite monolithic.

  2. The libc should support static non-PIE and static-PIE linking. This means, providing the CRT (the C runtime) and a PIE loader for static non-PIE and static-PIE linked executables.

  3. If there is a specification, we should follow it. The scope that we need includes most of the C Standard Library; POSIX additions; and some necessary, system-specific extensions. This does not mean we should (or can) follow the entire specification – there will be some parts which simply aren’t worth implementing, and some parts which cannot be safely used in modern coding practice.

  4. Vendor extensions must be considered very carefully, and only admitted when necessary. Similar to Clang and libc++, it does seem inevitable that we will need to provide some level of compatibility with other vendors’ extensions.

  5. The project should be an exemplar of developing with LLVM tooling. Two examples are fuzz testing from the start, and sanitizer-supported testing.

Great.

There are also a few areas which we do not intend to invest in at this point:

  1. Implement dynamic loading and linking support.

It will be useful to have a design document that describes the kind of system and capabilities that you’re targeting, and then we can discuss how the libc might have a modular design that can be adapted for other use cases. I mention modularity because, for example, we have accelerator hardware and various kinds of low-variability/embedded environments where many, but not all, POSIX/libc capabilities make sense.

I am of the opinion that modularity should be as fine-grained as possible. For example, one should be able to pick and package individual functions into a libc as suitable for their platform.
That said, I am open to other ideas you might have about modularity. I am also open to being convinced that function-level granularity is overkill.

The main concern I have is that Windows is so different from
everything else that there is a high likelihood of decisions being
baked in early on that make things very difficult for people to come
along later and contribute a Windows implementation. This happened
with sanitizers for example (lack of support for weak functions on
Windows), LLDB (POSIX API calls scattered throughout the codebase),
and I worry with libc it will be even more difficult to correctly
design the abstraction because we have to deal with executable file
format, syscalls, operating system loaders, and various linkage
models.

The most immediate thing I think we will run into is that you
mentioned wanting this to take shape as something that sits in between
system libc and application. Given that Windows' libc and other
versions of libc are so different, I expect this to lead to some
interesting problems.

Can you elaborate more on how you envision this working with llvm libc
in between application and system libc?

Syscalls are operating system specific and architecture dependent so I think we’ll want an abstraction layer around the fundamental operations the syscalls support anyway. Some things like open aren’t even syscalls on all operating

Right, syscalls are OS and architecture dependent. So yes, one will have to build abstraction layers over fundamental operations in general.

systems. A generic syscall layer could be added that is architecture-specific rather than operating-system-specific, but even on x86_64 there are two different ways to do syscalls, I think. Loading, startup, and linking are all both format- and operating-system-specific; a few of the details involved are determined by the architecture, but those are trivially abstracted away.
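
To make the layering idea concrete, here is a minimal sketch in C, not a proposed design. The names platform_write and demo_write_all are hypothetical and do not come from any existing libc. It assumes a POSIX-like host: on Linux the backend issues the raw write syscall, and elsewhere it simply calls the host's write() as a stand-in for a native backend. The portable code above the backend never mentions a syscall number.

```c
#define _GNU_SOURCE /* so that <unistd.h> declares syscall() on glibc */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/syscall.h>
#endif

/* Hypothetical internal interface: the portable parts of the libc call
   this and never issue a syscall themselves. */
static long platform_write(int fd, const void *buf, size_t len);

#if defined(__linux__)
/* Linux backend: write is a real syscall here. */
static long platform_write(int fd, const void *buf, size_t len) {
    return syscall(SYS_write, fd, buf, len);
}
#else
/* Stand-in backend for other POSIX-like hosts (an assumption of this
   sketch): defer to the host's own write(). */
static long platform_write(int fd, const void *buf, size_t len) {
    return (long)write(fd, buf, len);
}
#endif

/* Portable layer: keep writing until all bytes are out or an error
   occurs. Returns the byte count on success, -1 on failure. */
long demo_write_all(int fd, const char *buf, size_t len) {
    assert(buf != NULL);
    size_t done = 0;
    while (done < len) {
        long n = platform_write(fd, buf + done, len - done);
        if (n <= 0) {
            return -1;
        }
        done += (size_t)n;
    }
    return (long)done;
}
```

A Windows or Fuchsia port would, under this sketch, supply only another platform_write and leave demo_write_all untouched.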

Why is answering these questions at a general level important?

Because I wanted to make sure I understood the direction and the restriction stated. The restriction on what architecture will be used without stating a restriction on the operating system seemed like an odd statement. I’d very much like operating system abstractions to be considered right out of the gate and this seems like a bigger issue than the architecture to me.

Ah, I see what happened.
So, we are definitely not restricting anything by design here. All we are saying is that we do not intend to contribute beyond x86_64 and Linux to begin with. The community is free to contribute and widen the scope as suitable.

With respect to how exactly we want to build the abstractions, I am of the opinion that we have to go on a case-by-case basis. The scope of the project is so large that I think it is more meaningful to discuss designs at a narrower level based on the area that is being worked on. Sure, we might end up discovering patterns down the road and choose to unify certain things eventually.

A typical application uses a large number of pieces from a libc. But, it is not practical to have everything implemented and ready in a new libc from day one. So for that phase, when the new libc is still being built, we want the unimplemented parts of the new libc to essentially redirect to the system libc. This brings two benefits:

  1. We can build the new libc in a gradual manner.
  2. Applications stay operational while gaining the benefits of the new implementations.
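
As a rough illustration of the redirect idea, and only a sketch: the mechanism is not settled, and the newlibc_ prefix and both function names below are hypothetical. One entry point is implemented natively, while the other is not implemented yet and simply forwards to the system libc's strtol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Already implemented in the (hypothetical) new libc. */
size_t newlibc_strlen(const char *s) {
    assert(s != NULL);
    const char *p = s;
    while (*p != '\0') {
        ++p;
    }
    return (size_t)(p - s);
}

/* Not implemented yet: redirect to the system libc so that callers
   keep working while the new libc is being built out. */
long newlibc_strtol(const char *s, char **end, int base) {
    return strtol(s, end, base);
}
```

Once a native strtol exists, only the body of newlibc_strtol would change and callers would not notice the switch.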

Do you foresee any problems with this approach on Windows?