2025 Runtimes Workshop Notes

These notes were taken in the moment and are not 100% accurate. They also don’t necessarily represent any settled consensus; they’re closer to what people said at the time.

  • LLVM-libc allocators and Scudo
    • Sanitizers have their own allocator, Scudo grew out of that
      • Scudo has quarantining and checksummed chunk headers (using hardware checksum instructions where available)
      • Scudo standalone was a rewrite since the original sanitizer components weren’t production ready
      • Scudo’s the default allocator on Android, and also available on Linux
    • Scudo has been focused more on security than performance.
      • Scudo performance has been improving, closer to jemalloc now.
    • LLVM-libc needs an allocator, didn’t want to write our own
      • Decided to pull in Scudo, since it was already production ready.
      • Also, we have a new malloc for embedded, but it’s not well suited for Linux
      • Pulling in Scudo is annoying, since it’s buried deep in compiler-rt
    • One option to make Scudo easier to pull in would be to make it its own top-level project
    • Scudo’s build system has a lot of configurations
    • Is the standard way to share a piece of code between parts of the LLVM project to split things out into their own folder?
      • There isn’t really an established method; the hand-in-hand libc/libc++ effort has been one of the first.
      • Compiler-rt has some code sharing but it hasn’t been principled.
    • How many allocators does LLVM have?
      • Seems like 3
        • LLVM-libc’s malloc
        • Scudo
        • rpmalloc - maintained externally, but not actually part of LLVM or being actively developed.
      • We should probably do the same thing for all of them
    • What about libc++ new/delete?
      • Is there a way to namespace things so that allocating with malloc and freeing with delete crash?
    • Given that rpmalloc is more of a third-party project, it might be best to put it in a third-party directory instead of mixing it in
      • Also, rpmalloc can’t be used as an allocator for LLVM-libc right now (though there’s been some work done)
    • libc++ new/delete generally just forward to the C allocator so there’s not currently a way to namespace the allocations.
      • There would need to be a new system to tell the system allocator where the allocation is coming from.
      • There is new work going on to do tagged allocations in clang, which would detect this.
        • Apple and Google are both working on this.
    • Should the LLVM-libc embedded allocator be turned into a general allocator?
      • Probably not, it’s much more set up for a system without an MMU.
    • There’s also heapprof, the heap profiling runtime
      • The idea is that you can instrument your code and generate a profile the compiler can then use to optimize.
      • Can dispatch to different allocator entrypoints based on whether an allocation site will be hot or cold
      • Only supported by TCMalloc for now.
  • Build and source organization
    • compiler-rt has a weird build system built on custom commands because it needed to support fat binaries before CMake supported fat binaries
    • bunch of different runtimes which are all treated as LLVM subprojects (kinda like external frontends)
    • Time to deprecate LLVM_ENABLE_PROJECTS for building runtimes.
      • This will hopefully make improving the runtimes build much easier
    • The runtimes should be separated from the LLVM subproject code because they’re very different.
    • Common build logic should be unified, libc++, libc++abi, libunwind are a good example.
      • They are mostly copy/pasted.
    • Petr suggests breaking compiler-rt into many top level projects
      • need to decide what should be on its own and what should be combined.
        • Scudo
        • builtins
        • sanitizers (might want a different name, since the profiler also uses sanitizer common)
    • For the LLVM side, the llvm directory is the root and shares a bunch of CMake etc. which many subprojects (e.g. clang) pull in.
      • There’s something like this which is the runtimes directory, but it’s not as widely used.
      • The cmake directory was broken out relatively recently; we’d like to accelerate that.
      • Every time we move something there are exotic targets that break
      • Example: There previously wasn’t a useful way for CMake to check whether a linker supports a flag. LLVM added one, but it wasn’t ideal, so a second one was added for the runtimes. After that, CMake added a built-in way to do it as well, which works slightly differently again.
    • Duplication of configuration options in the runtimes
      • Specifically libc++ and its companion runtimes (libc++abi and libunwind)
      • Focus on enabling exceptions
      • there are three flags LIBCXX_ENABLE_EXCEPTIONS, LIBCXXABI_ENABLE_EXCEPTIONS, LIBUNWIND_ENABLE_EXCEPTIONS.
        • They all control things separately
      • Louis proposes: RUNTIMES_ENABLE_EXCEPTIONS
        • That flag sets up an interface target at the top level, and subprojects can pull in that interface target if they want to respect the flag.
        • This is a way to carry the state in a way that’s cleaner than flag checks everywhere.
        • This would mean that you can’t build libc++, libc++abi, and libunwind with different flags.
          • Might not matter for exceptions but could matter for other options.
          • Then again, for most of this sort of thing it probably doesn’t matter.
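A minimal sketch of what the proposed mechanism could look like in CMake (the RUNTIMES_ENABLE_EXCEPTIONS name comes from the discussion, but the target name and flags below are illustrative assumptions, not an actual implementation):

```cmake
# One cache variable at the runtimes root...
option(RUNTIMES_ENABLE_EXCEPTIONS "Enable exceptions in all runtimes" ON)

# ...feeding a single INTERFACE target that carries the resulting flags.
add_library(runtimes-exception-flags INTERFACE)
if(NOT RUNTIMES_ENABLE_EXCEPTIONS)
  target_compile_options(runtimes-exception-flags INTERFACE -fno-exceptions)
endif()

# A subproject that wants to respect the flag just links the interface
# target instead of re-checking LIBCXX_ENABLE_EXCEPTIONS and friends:
#   target_link_libraries(cxx_objects PRIVATE runtimes-exception-flags)
```

Since the state lives in one target, subprojects pick it up consistently; the cost is that libc++, libc++abi, and libunwind can no longer diverge on that setting.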
    • Best way to find who’s using each option might be to just add a deprecated flag on the old option.
    • Building with mixed sets might technically work, but it might not be useful to support.
    • Can we unify all the various cmake feature checks?
      • Yes; it’d make things faster since CMake caches results, but the checks are scattered and annoying to find.
    • libc++ has a different system for configuration
      • Basically there’s a cmake file that explains how to do certain things.
        • e.g. how to link in the system libraries or the compiler runtime
      • This system allows more customization, people can bring their own flags, downstream users can bring a whole file and also have their own flags.
    • Are some of the flags not necessary?
      • Are there any compilers that don’t support -fno-exceptions?
      • Yes: MSVC on Windows (it controls exceptions with /EH flags instead).
      • Could query CMake for the OS to get information about the system, but you could end up in a weird spot.
    • Might be useful to have multilibs, so there’s one library built without exceptions and one built with them.
      • Weird compilers in the room?
    • It seems okay to be opinionated on “use only one version of a certain library at once”
    • Right now when you build LLVM, it defaults to using the GNU toolchain. Once LLVM has a full toolchain it might be good to switch the defaults.
      • It would be important to keep the ability to switch individual pieces, e.g. replace just libc++ without changing your libc.
    • Is it possible to build the runtimes from sources for embedded?
      • There would need to be some different decisions
      • Runtimes on demand is one way to do that, but it’s based on Bazel
      • If you were doing this you’d want to install sources as well as libraries
        • Also need to include some metadata to explain the flags for building those sources.
      • Would want to do this in a uniform way. Avoid having 5 different ways for installing sources.
        • Uniform cmake would help with that.
    • Crazy idea:
      • .a file full of self-compiling sources
    • For breaking up compiler-rt, it seems like we want to at least break up builtins
      • The builtins currently have two ways to be built, one that builds just the builtins and one that builds them with the rest of compiler-rt.
      • CMake would need some handling, because it needs to check whether it has builtins.
    • CMake times
      • CMake checks a whole bunch of features of your compiler and libraries. It doesn’t need to do that for well-known configurations.
    • Corey: We have given up on dynamic libraries in our build
      • Everything’s static in the toolchain.
      • They had to redo how cmake does its checks, since you can’t change parts later if they’re static.
      • They do a multistage build.
        • First stage: Just a simple compiler
        • Second stage: Compiler and system headers
        • Third stage: now have a complete compiler and can run tests.
      • Just shipping sources doesn’t really solve this.
      • Deleting cmake would solve this, but it’s too much effort.
    • Aiden did some cmake performance testing for the buildbot
      • 60% of the time is spent in the cmake binary; applying PGO only recovered about 10%.
  • After the break
  • Upstream testing of libc/compiler-rt using emulation
    • Runtime testing is difficult on embedded
      • Current testing assumes you have a complete system, and usually runs a program
      • The programs often have POSIX-y assumptions
    • Emulation is better than hardware, hardware tends to break randomly
      • If you’re doing testing you need to have startup code for your platform
      • Testing LLVM-libc means building startup code
      • There’s been some work to use picolibc for baremetal testing of libc++ under qemu
        • Probably extensible to LLVM-libc
        • Need to decide where the startup code goes.
        • Also need to decide how we handle support
        • ARM has startup code for qemu on arm, but it wouldn’t work on other systems
    • How much benefit is there to having several similar configurations?
      • LLVM-libc is very configurable, you might end up with specific combinations of modules
      • Also it might be worth testing whether a working configuration exists for your platform
    • The libc++ buildbot is currently active
      • is it fast?
        • An individual run is on the order of minutes, but trying every config is longer
      • Want to avoid the combinatorial explosion of configurations; focus on a set of targets that will hit the major configuration points.
      • Also need to set a time budget for precommit
      • Each configuration takes around 5 minutes; it currently takes 10 minutes for the 32-bit and 64-bit builds.
      • Also need to decide precommit vs postcommit
        • Generally: Focus on latency for precommit vs coverage for postcommit
    • The libc++ picolibc bot takes about an hour with an hour wait time.
      • This isn’t a problem for libc++; they have staged builds set up so the fastest things run first and the slower ones later.
      • From the libc++ side having it precommit is much better than postcommit.
        • Having it in postcommit means someone needs to fix it when it breaks, and libc++ doesn’t really want to have someone managing the build.
    • There are situations where you run into weird bugs on specific platforms
      • Might be useful to have regression style tests, so that we can check the things that have been problems in the past.
      • Basically, just run the tests that have failed.
    • LLDB has a test suite where it debugs some example programs; it works well on a host but doesn’t work by default on Hexagon.
      • They ended up making a host with a simulator it reaches into for debugging.
      • Might be useful for other runtimes.
    • For running the LLVM-libc tests, might be useful to break things down smaller, avoid significantly duplicated tests
    • Some targets can be done just with qemu user mode. Peter used that for compiler-rt builtins.
      • Doing a full system boot on baremetal might not actually be very expensive.
      • Also qemu user mode might not be available on every platform
    • Qemu isn’t perfectly accurate; ARM has simulators that are more accurate, but they aren’t open source.
      • Would people be okay with running a buildbot with a closed source simulator?
      • For Corey, they wouldn’t be able to share the emulator at all
        • Probably can’t really help with the pieces that can’t be shared, but if there are close proxy platforms we could keep that green.
    • Qemu testing also involves starting qemu once for each program, which can be slow.
      • Is there a way to have some sort of test server?
      • Maybe but who’ll build it?
    • Does LLVM-libc want to have a skeleton of boot code that can be customized?
      • Fine with me
    • Semihosting is a thing on arm/aarch64, lets you do host communication for things like file writing.
      • might also be supported on 64-bit RISC-V.
      • Problem: Not really testing what you ship.
      • If you want a different approach to testing, Daniel has done some Atari 2600 testing on GitHub Actions.
        • Basically, you can take a CRC of the memory buffer and check if it matches.
  • Libc++ performance and metrics
    • There’s been a bunch of work on libc++ performance, but they need help setting their metrics.
    • They’ve been using microbenchmarks so far, but those may not be representative of real world use.
    • Need to figure out what metrics people actually care about.
    • Ideal world: If you’re contributing to libc++, or even libc and the other runtimes, you have a button to press that will let you check some set of benchmarks before you submit.
    • Aiden: Google’s canonical open source benchmark is Fleetbench.
      • One problem: Fleetbench uses Abseil for some pieces; if you want to be specifically testing libc++ it might be important to replace that.
      • Orthogonal problem: Need to have quiet machines to get useful performance results
        • Eventually need to get dedicated hardware.
      • Other option: clang itself. It’s got a lot of stuff going on.
    • Working on LNT to improve measurement.
    • Where does the noise in the performance signal come from?
      • Lots of sources; macOS specifically has a less noisy configuration that isn’t available publicly.
      • Noise also matters less if you can look at long-term trends.
    • Are there specific functions you care about?
      • Sometimes people ask “can you compare these two libraries?”
        • not really helpful: usually people actually only care about a few functions
      • Currently focusing on the most widely used containers
        • Things like vector, string, etc.
        • People use them widely, even though they’re not the most performant.
        • There’s also a lot of room to optimize; they’ve already been improved significantly
        • Currently looking into inlining vector and string, but that might cause regressions on some targets.
      • The prioritization has been focused on frequency of use within libc++
    • The libc++ performance suite reuses their conformance suite, which can also be run with libstdc++, allowing direct comparison.
    • There can be noise from things like alignment, not just from the machine.
      • For some Google performance tests, there are options to force alignment to avoid that.
      • Forcing alignment doesn’t work on embedded since it blows up code size
    • How could the inlining cause performance issues?
      • It can explode the code size if the compiler can’t eliminate dead code, which can also cause cache misses.
    • There are some performance speedups that are in the pipeline that are still being confirmed for standards conformance
      • Example: Iterating through a linked list from both ends, to give the memory system more time to hide cache misses
    • LNT as community supported?
      • Discussions will be ongoing
      • Hopefully have an LLVM-wide way to track performance over time
    • How are the performance improvements checked for regressions?
      • They’re using the libc++ benchmarks and the SPEC benchmarks (since Apple has a license)
      • They also have the ability to test the new benchmarks on old code, but they have not done it yet.
    • Will libc++ be more aggressive about breaking the ABI?
      • There is the stable ABI and the unstable ABI. The unstable ABI is available and tested.
      • Apple cares a lot about the ABI staying stable right now.
    • Does libc++ test the performance of both the stable and unstable ABI?
      • In the long term the goal is to test all the configurations people care about, including the stable and unstable ABI, also hardening.
      • Performance data for hardening would be very helpful for pushing it forward.
      • Google uses the unstable ABI for both Fuchsia and Linux production.
        • They also care about hardening on those targets.
    • There’s interest in working groups improving the performance of clang as well, with an eye to also improving Rust.
      • This would be relevant to hardening optimizations as well.
    • Binary size also matters for some users.
      • It’s a less noisy metric than runtime performance, might be able to piggyback on the existing setup.
      • Would it be okay to just check the size of the test binary?
        • Might be too coarse grained.
      • “Identical code folding” causes noise in binary size
    • Memory usage also matters.
      • More of a macro scale issue
  • Sharing math code between runtimes
    • There are currently many implementations of the same functions inside of LLVM
    • Things like type conversions, basic arithmetic
    • Not all of these are well tested
    • There was a bug in compiler-rt’s div implementation
    • There is inconsistent support
      • compiler-rt might have just some of the operations
    • lots of assembly implementations in compiler-rt
      • Not necessarily well optimized for all uses
      • Might be large and performance-optimized, but only show up on embedded hardware without a floating-point unit
    • If we focus on generic implementations, we can have broader support and most of the performance
    • Idea: Move to using generic implementations from LLVM-libc
      • Benefits: Well tested, existing, supports specializations
      • Challenges: Missing some targets, system dependent fenv, compiler-rt’s cmake is a pain
    • The work to move LLVM-libc’s math to header only is ongoing
      • Can be used within libc++ and LLVM Support soon.
    • The math functions being discussed here are mostly the basic operations
    • Having assembly versions may be handy; if it’s only a 5% improvement it might not be worth the effort, but 2x or 4x probably would be.
    • The compiler-rt builtins are just kinda there, nobody’s really maintaining them.
      • Moving to the libc team maintaining them would be an improvement.
    • Does LLVM-libc have everything that compiler-rt does?
      • For the math builtins, yes.
      • Let’s do it then!
    • Where would the test suite be?
      • LLVM-libc doesn’t support everything that compiler-rt does.
      • For now, use the compiler-rt test suite.
      • Over time move the correctness testing to the libc side.
    • Are there users depending on the builtins being in C instead of C++?
      • The only places in LLVM that are written in C are the builtins, the profiling runtime, and the Blocks runtime for Apple. It would be beneficial to unify everything in C++.
        • Petr is suggesting we should move everything to C++ regardless.
    • There exists a target where branches are free and shifts are expensive; is it possible to support specializing the builtins for that?
      • Yes, it can use the same specialization as everything else.
  • Building libc++ on LLVM-libc
    • How do we want to handle pieces that libc++ uses but LLVM-libc doesn’t want to support, like locales?
    • Do we want a carve out for some pieces in libc++ or just stub things in LLVM-libc
    • Louis: Might be best to do some of both
      • For things like locales that are widely considered unhelpful, it might be useful to have a carve-out.
      • On the other hand, if something is only unsupported by LLVM-libc, it might be best to just stub it.
      • Example: Fulfilling the interface of the localization side without actually having locales would be fairly easy; Fuchsia already has a header to do that.
    • Sunsetting may be more complex than expected: if you turn off localization, you also need to turn it off in the test suite.
      • Doing this on the libc++ side would be better since it already has a switch to do that.
    • Things like threading are already abstracted
      • Tests don’t call the thread API directly, they call “test::thread” which is provided by a specific header
      • This means that vendors can provide their own override header to provide constants or do something similar.
  • Runtimes have some duplication in how assertion failures are handled
    • Libc++ and LLVM-libc have their own separate internal assertion setups.
    • Would it be good to have a shared assertion implementation?
    • Putting it in some place where any runtime can grab it would be useful
      • That place would need to be very restricted
  • Discussion on switching between libcs as a driver flag
    • There is interest in this.
    • Are the libcs assumed to be compatible?
    • How are you going to find the libc?
    • There’s already an llvm component in the target triple, which selects things like startup code or libraries.
    • One problem: There are a lot more libc implementations than there are C++ standard libraries or the other things that have their own flag.
      • Do we really want to be enumerating all libc implementations in the clang driver?
      • Also, libc has a standardized name: they’re all “libc.a” or “libc.so”
      • Some have other things that need to be linked in, e.g. libgloss
      • Not every platform has the same set of libcs.
    • Is there a way to generalize the multilib YAML into a way to describe what libc means for a given platform?
      • Similar: GCC spec files, which are not good.