These notes were taken in the moment and are not 100% accurate. They also don’t necessarily represent any sort of settled consensus, they’re more what people said at the time.
- LLVM-libc allocators and Scudo
- Sanitizers have their own allocator, Scudo grew out of that
- Scudo has quarantining, checksum instructions
- Scudo standalone was a rewrite since the original sanitizer components weren’t production ready
- Scudo’s the default allocator on Android, also available on Linux
- Scudo has been focused more on security than performance.
- Scudo performance has been improving, closer to jemalloc now.
- LLVM-libc needs an allocator, didn’t want to write our own
- Decided to pull in Scudo, since it was already production ready.
- Also we have a new malloc for embedded, it’s not well suited for linux
- Pulling in scudo is annoying, since it’s buried deep in compiler-rt
- One option to make scudo easier to pull in would be to make it its own top level project
- Scudo’s build system has a lot of configurations
- Is the standard way to share a piece of code between parts of the LLVM project to split things out into their own folder?
- There isn’t really an established method; the libc++/LLVM-libc “hand in hand” effort has been one of the first.
- Compiler-rt has some code sharing but it hasn’t been principled.
- How many allocators does LLVM have?
- Seems like 3
- LLVM-libc’s malloc
- Scudo
- rpmalloc - maintained externally, but not actually part of LLVM or being actively developed.
- We should probably do the same thing for all of them
- What about libc++ new/delete?
- Is there a way to namespace things so that allocating with malloc and freeing with delete crash?
- Given that rpmalloc is more of a third-party project it might be best to put it in a third-party directory instead of mixing it in
- Also rpmalloc can’t be used as an allocator for LLVM-libc right now (though there’s been some work done)
- libc++ new/delete generally just forward to the C allocator so there’s not currently a way to namespace the allocations.
- There would need to be a new system to tell the system allocator where the allocation is coming from.
- There is new work going on to do tagged allocations in clang, which would detect this.
- Apple and Google are both working on this.
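A minimal sketch of why the mismatch mentioned above goes undetected today. The helper names below are hypothetical (not libc++'s actual symbols), but the default global `operator new`/`operator delete` essentially reduce to this forwarding, so both `malloc`/`free` and `new`/`delete` end up on the same C heap with no way to tell the two apart:

```cpp
#include <cstdlib>
#include <new>

// Hypothetical stand-ins for the replaceable global operators: they just
// forward to the C heap, which is why a malloc/delete mismatch is
// indistinguishable from a matched pair without extra tracking state.
void *forwarding_new(std::size_t size) {
  void *p = std::malloc(size ? size : 1); // malloc(0) may return null
  if (!p)
    throw std::bad_alloc();
  return p;
}

void forwarding_delete(void *p) noexcept {
  std::free(p); // identical in effect to a plain free() call
}
```

Because `forwarding_delete` is just `free()`, handing it a `malloc`’d pointer “works” silently; catching the mismatch requires the allocator to carry extra per-allocation state, which is what the tagged-allocation work is about.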
- Should the LLVM-libc embedded allocator be turned into a general allocator?
- Probably not, it’s much more set up for a system without an MMU.
- There’s also heapprof, the heap profiling runtime
- The idea is that you can instrument your code and generate a profile so allocations can be optimized.
- Can dispatch to different allocator entrypoints based on if the section will be hot or cold
- Only supported by TCMalloc for now.
- Build and source organization
- compiler-rt has a weird build system built on custom commands, because it needed support for fat binaries before cmake supported them
- There are a bunch of different runtimes which are all treated as LLVM subprojects (kinda like external frontends)
- Time to deprecate LLVM_ENABLE_PROJECTS for building runtimes.
- This will hopefully make improving the runtimes build much easier
- The runtimes should be separated from the LLVM subproject code because they’re very different.
- Common build logic should be unified, libc++, libc++abi, libunwind are a good example.
- They are mostly copy/pasted.
- Petr suggests breaking compiler-rt into many top level projects
- need to decide what should be on its own and what should be combined.
- Scudo
- builtins
- sanitizers (might want a different name, since the profiler also uses sanitizer common)
- For the LLVM side, the llvm directory is the root and shares a bunch of cmake etc. which many subprojects (i.e. clang) pull in.
- There’s something like this which is the runtimes directory, but it’s not as widely used.
- The cmake directory was broken out relatively recently, we’d like to accelerate that.
- Every time we move something there are exotic targets that break
- Example: There previously wasn’t a useful way for cmake to check if a linker has a flag. LLVM added one, but it wasn’t ideal, so a second one was added for the runtimes. After that cmake added a built-in way to do it as well, and that works slightly differently again.
- Duplication of configuration options in the runtimes
- Specifically libc++ and the libc++ runtimes
- Focus on enabling exceptions
- there are three flags LIBCXX_ENABLE_EXCEPTIONS, LIBCXXABI_ENABLE_EXCEPTIONS, LIBUNWIND_ENABLE_EXCEPTIONS.
- They all control things separately
- Louis proposes: RUNTIMES_ENABLE_EXCEPTIONS
- That flag sets up an interface target at the top level, and subprojects can pull in that interface target if they want to respect the flag.
- This is a way to carry the state in a way that’s cleaner than flag checks everywhere.
- This would mean that you can’t build libc++, libc++abi, and libunwind with different flags.
- Might not matter for exceptions but could matter for other options.
- Then again, for most of this sort of thing it probably doesn’t matter.
- Best way to find who’s using each option might be to just add a deprecated flag on the old option.
- Building with mixed sets might technically work, but it might not be useful to support.
- Can we unify all the various cmake feature checks?
- Yes, it’d make things faster since Cmake caches results, but it’s annoying to find.
- libc++ has a different system for configuration
- Basically there’s a cmake file that explains how to do certain things.
- link in the system libraries, compiler runtime
- This system allows more customization, people can bring their own flags, downstream users can bring a whole file and also have their own flags.
- Are some of the flags not necessary?
- Are there any compilers that don’t support -fno-exceptions?
- Yes, Windows.
- could query cmake for OS to get information about the system, but you could end up in a weird spot.
- Might be useful to have multilibs so there’s one library built without exceptions and one built with them.
- Weird compilers in the room?
- It seems okay to be opinionated on “use only one version of a certain library at once”
- Right now when you build LLVM, it defaults to using the GNU toolchain. Once LLVM has a full toolchain it might be good to switch the defaults.
- It would be important to keep the ability to switch individual pieces, e.g. replace just libc++ without changing your libc.
- Is it possible to build the runtimes from sources for embedded?
- There would need to be some different decisions
- Runtimes on demand is one way to do that, but it’s based on Bazel
- If you were doing this you’d want to install sources as well as libraries
- Also need to include some metadata to explain the flags for building those sources.
- Would want to do this in a uniform way. Avoid having 5 different ways for installing sources.
- Uniform cmake would help with that.
- Crazy idea:
- .a file full of self-compiling sources
- For breaking up compiler-rt, it seems like we want to at least break up builtins
- The builtins currently have two ways to be built, one that builds just the builtins and one that builds them with the rest of compiler-rt.
- cmake would need some special handling here, since it checks whether the builtins are available during configuration.
- cmake times
- cmake checks a whole bunch of features of your compiler and libraries. It doesn’t need to do that for well known configurations.
- Corey: We have given up on dynamic libraries in our build
- Everything’s static in the toolchain.
- They had to redo how cmake does its checks, since you can’t change parts later if they’re static.
- They do a multistage build.
- First stage: Just a simple compiler
- Second stage: Compiler and system headers
- Third stage: now have a complete compiler and can run tests.
- Just shipping sources doesn’t really solve this.
- Deleting cmake would solve this, but it’s too much effort.
- Aiden did some cmake performance testing for the buildbot
- 60% of the time is spent in the cmake binary; applying PGO only gained about 10%.
- After the break
- Upstream testing of libc/compiler-rt using emulation
- Runtime testing is difficult on embedded
- Current testing assumes you have a complete system, usually runs a program
- The programs often have POSIX-y assumptions
- Emulation is better than hardware, hardware tends to break randomly
- If you’re doing testing you need to have startup code for your platform
- Testing LLVM-libc means building startup code
- There’s been some work to use picolibc for baremetal testing of libc++ under qemu
- Probably extensible to LLVM-libc
- Need to decide where the startup code goes.
- Also need to decide how we handle support
- ARM has startup code for qemu on Arm, but it wouldn’t work on other systems
- How much benefit is there to having several similar configurations?
- LLVM-libc is very configurable, you might end up with specific combinations of modules
- Also it might be worth testing whether there exists a configuration that works for your platform
- The libc++ buildbot is currently active
- is it fast?
- An individual run is on the order of minutes, but trying every config is longer
- Want to avoid the geometric explosion of configurations, focus on a set of targets that will hit the major configuration points.
- Also need to set a time budget for precommit
- Each configuration takes around 5 minutes; it currently takes 10 minutes for the 32-bit and 64-bit builds.
- Also need to decide precommit vs postcommit
- Generally: Focus on latency for precommit vs coverage for postcommit
- The libc++ picolibc bot takes about an hour with an hour wait time.
- This isn’t a problem for libc++; they have staged builds set up to run the fastest things first and the slower ones later.
- From the libc++ side having it precommit is much better than postcommit.
- Having it in postcommit means someone needs to fix it when it breaks, and libc++ doesn’t really want to have someone managing the build.
- There are situations where you run into weird bugs on specific platforms
- Might be useful to have regression style tests, so that we can check the things that have been problems in the past.
- Basically, just run the tests that have failed.
- LLDB has a test suite where it debugs some example programs; it works well on a host but doesn’t work by default on Hexagon.
- They ended up running on a host, with a simulator that the debugger reaches into.
- Might be useful for other runtimes.
- For running the LLVM-libc tests, might be useful to break things down smaller, avoid significantly duplicated tests
- Some targets can be done just with qemu user mode. Peter used that for compiler-rt builtins.
- Doing a full system boot on baremetal might not actually be very expensive.
- Also qemu user mode might not be available on every platform
- Qemu isn’t perfectly accurate; ARM has simulators which are more accurate, but they aren’t open source.
- Would people be okay with running a buildbot with a closed source simulator?
- For Corey, they wouldn’t be able to share the emulator at all
- Probably can’t really help with the pieces that can’t be shared, but if there are close proxy platforms we could keep that green.
- Qemu testing also involves starting qemu once for each program, which can be slow.
- Is there a way to have some sort of test server?
- Maybe but who’ll build it?
- Does LLVM-libc want to have a skeleton of boot code that can be customized?
- Fine with me
- Semihosting is a thing on arm/aarch64, lets you do host communication for things like file writing.
- might also be supported on risc-v 64.
- Problem: Not really testing what you ship.
- If you want to do a different approach to testing, Daniel has done some Atari 2600 testing on GitHub Actions.
- Basically, you can take a CRC of the memory buffer and check if it matches a known-good value.
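The memory-checksum check described above can be sketched in a few lines. The CRC-32 routine below is the standard reflected-polynomial implementation; what buffer you hash and what golden value you compare against are entirely up to the test setup:

```cpp
#include <cstddef>
#include <cstdint>

// Standard reflected CRC-32 (polynomial 0xEDB88320, init/xorout
// 0xFFFFFFFF), bit-at-a-time. A golden-value test hashes the emulated
// machine's memory with this and compares against a stored checksum.
uint32_t crc32(const uint8_t *data, size_t len) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int b = 0; b < 8; ++b)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}
```

The usual sanity check for this variant is that hashing the ASCII bytes of "123456789" yields 0xCBF43926.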
- Libc++ performance and metrics
- There’s been a bunch of work on libc++ performance work, but they need help setting their metrics.
- They’ve been using microbenchmarks so far, but those may not be representative of real world use.
- Need to figure out what metrics people actually care about.
- Ideal world: If you’re contributing to libc++, or even libc and the other runtimes, you have a button to press that will let you check some set of benchmarks before you submit.
- Aiden: Google’s canonical open source benchmark is Fleetbench.
- One problem: Fleetbench uses Abseil for some pieces; if you want to be specifically testing libc++ it might be important to replace that.
- Orthogonal problem: Need to have quiet machines to get useful performance results
- Eventually need to get dedicated hardware.
- Other option: clang itself. It’s got a lot of stuff going on.
- Working on LNT to improve measurement.
- Where does the noise in the performance signal come from?
- Lots of sources; macOS specifically has a less noisy configuration that isn’t available publicly.
- Noise also is less important if you can look at long term trends.
- Are there specific functions you care about?
- Sometimes people ask “can you compare these two libraries?”
- not really helpful: usually people actually only care about a few functions
- Currently focusing on the most commonly used containers
- Things like vector, string, etc.
- People use them widely, even though they’re not the most performant.
- They also have a lot of space to optimize, have already improved them significantly
- Currently looking into inlining vector and string, but that might cause regressions on some targets.
- The prioritization has been focused on frequency of use within libc++
- The libc++ performance suite is using their conformance suite which can be run with libstdc++, which allows for direct comparison.
- There can be noise from things like alignment, not just from the machine.
- For some Google performance tests, there are options to force alignment to avoid that.
- Forcing alignment doesn’t work on embedded since it blows up code size
- How could the inlining cause performance issues?
- It can explode the code size if the compiler can’t eliminate dead code, which can also cause cache misses.
- There are some performance speedups that are in the pipeline that are still being confirmed for standards conformance
- Example: Iterating through the linked list from both ends to give more time for cache hits
- LNT as community supported?
- Discussions will be ongoing
- Hopefully have an LLVM-wide way to track performance over time
- How are the performance improvements checked for regressions?
- They’re using the libc++ benchmarks and the SPEC benchmarks (since Apple has a license)
- They also have the ability to test the new benchmarks on old code, but they have not done it yet.
- Will libc++ be more aggressive about breaking the ABI?
- There is the stable ABI and the unstable ABI. The unstable ABI is available and tested.
- Apple cares a lot about the ABI staying stable right now.
- Does libc++ test the performance of both the stable and unstable ABI?
- In the long term the goal is to test all the configurations people care about, including the stable and unstable ABI, also hardening.
- Performance data for hardening would be very helpful for pushing it forward.
- Google uses the unstable ABI for both Fuchsia and Linux production.
- They also care about hardening on those targets.
- There’s interest in working groups improving the performance of clang as well, with an eye to also improve rust.
- This would be relevant to hardening optimizations as well.
- Binary size also matters for some users.
- It’s a less noisy metric than runtime performance, might be able to piggyback on the existing setup.
- Would it be okay to just check the size of the test binary?
- Might be too coarse grained.
- “Identical code folding” causes noise in binary size
- Memory usage also matters.
- More of a macro scale issue
- Sharing math code between runtimes
- There are currently many implementations of the same functions inside of LLVM
- Things like type conversions, basic arithmetic
- Not all of these are well tested
- There was a bug in compiler-rt’s div implementation
- There is inconsistent support
- compiler-rt might have just some of the operations
- lots of assembly implementations in compiler-rt
- Not necessarily well optimized for all uses
- Might be large and performance optimized, but only show up on embedded hardware without a float coprocessor
- If we focus on generic implementations, we can have broader support and most of the performance
- Idea: Move to using generic implementations from LLVM-libc
- Benefits: Well tested, existing, supports specializations
- Challenges: Missing some targets, system dependent fenv, compiler-rt’s cmake is a pain
- The work to move LLVM-libc’s math to header only is ongoing
- Can be used within libc++ and LLVM Support soon.
- The math functions being discussed here are mostly the basic operations
- Having assembly versions may be handy, if it’s only a 5% improvement it might not be worth the effort but 2x or 4x probably would be.
- The compiler-rt builtins are just kinda there, nobody’s really maintaining them.
- Moving to the libc team maintaining them would be an improvement.
- Does LLVM-libc have everything that compiler-rt does?
- For the math builtins, yes.
- Let’s do it then!
- Where would the test suite be?
- LLVM-libc doesn’t support everything that compiler-rt does.
- For now, use the compiler-rt test suite.
- Over time move the correctness testing to the libc side.
- Are there users depending on the builtins being in C instead of C++?
- The only places in LLVM that are written in C are the builtins, profiling, and the Blocks runtime for Apple. It would be beneficial to unify everything in C++.
- Petr is suggesting we should move everything to C++ regardless.
- There exists a target where branches are free and shifts are expensive; is it possible to support specializing the builtins for that?
- Yes, it can use the same specialization as everything else.
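The kind of builtin being discussed can be sketched as plain generic C++. The function below has the shape of compiler-rt's `__clzsi2` (the name and exact interface here are illustrative, not LLVM-libc's actual code): the body is a shift-and-branch binary search, which is exactly the tradeoff the question raises — a target where shifts are expensive or branches are free could plug in its own specialization instead:

```cpp
#include <cstdint>

// Generic count-leading-zeros for a 32-bit value, binary-search style.
// Precondition: x != 0, matching the usual builtin contract.
// A target-specific specialization could replace this wholesale.
int generic_clz32(uint32_t x) {
  int n = 0;
  if ((x >> 16) == 0) { n += 16; x <<= 16; } // top half empty?
  if ((x >> 24) == 0) { n += 8;  x <<= 8; }
  if ((x >> 28) == 0) { n += 4;  x <<= 4; }
  if ((x >> 30) == 0) { n += 2;  x <<= 2; }
  if ((x >> 31) == 0) { n += 1; }
  return n;
}
```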
- Building libc++ on LLVM-libc
- How do we want to handle pieces that libc++ uses but LLVM-libc doesn’t want to support, like locales?
- Do we want a carve out for some pieces in libc++ or just stub things in LLVM-libc
- Louis: Might be best to do some of both
- Things like locales that are widely considered unhelpful it might be useful to have a carveout.
- On the other hand if we have something that only LLVM-libc is not supporting then it might be best to just stub.
- Example: If we want to fulfil the interface of the localization side without actually having locales, that would be fairly easy; Fuchsia already has a header to do that.
- Sunsetting may be more complex than expected: if you turn off localization, you also need to turn it off in the test suite.
- Doing this on the libc++ side would be better since it already has a switch to do that.
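The stub approach can be very small. A sketch of the idea (the names and shape here are hypothetical, loosely inspired by the Fuchsia header mentioned above; a real stub would mirror the C library symbols libc++ expects) in which only the "C" locale ever exists:

```cpp
#include <cstring>

// Hypothetical locale handle: no locale data is carried at all.
struct stub_locale_t { const char *name; };
static stub_locale_t the_c_locale{"C"};

// newlocale-shaped entry point: only the "C"/"POSIX" locale (or the
// default, "") exists; requests for anything else fail. This keeps the
// interface intact without shipping any localization machinery.
stub_locale_t *stub_newlocale(int /*category_mask*/, const char *name,
                              stub_locale_t * /*base*/) {
  if (name[0] == '\0' || std::strcmp(name, "C") == 0 ||
      std::strcmp(name, "POSIX") == 0)
    return &the_c_locale;
  return nullptr;
}
```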
- Things like threading are already abstracted
- Tests don’t call the thread API directly, they call “test::thread” which is provided by a specific header
- This means that vendors can provide their own override header to provide constants or do something similar.
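The override pattern above can be sketched like this (namespace and alias are illustrative, modeled on libc++'s test support headers rather than copied from them): the test suite only ever names `test::thread`, so a vendor can substitute a header that defines it differently, e.g. as a wrapper that sets a platform-specific stack size:

```cpp
#include <thread>

// Default test support header: test::thread is just std::thread.
// A vendor-provided header could alias a custom wrapper type here
// instead, without touching any test source.
namespace test {
using thread = std::thread;
}
```

A test then writes `test::thread t([] { /* ... */ }); t.join();` and picks up whichever definition the build selected.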
- Runtimes have some duplications on how assertion failures are handled
- Libc++ and LLVM-libc have their own separate internal assertion setups.
- Would it be good to have a shared assertion implementation?
- Putting it in some place where any runtime can grab it would be useful
- That place would need to be very restricted
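A minimal sketch of what such a shared piece could look like (all names hypothetical): one freestanding-friendly failure hook that each runtime maps its own assertion macro onto, instead of every runtime re-implementing the print-and-abort path:

```cpp
#include <cstdio>
#include <cstdlib>

// Single shared failure hook; a runtime could swap the body out for
// its platform's reporting mechanism.
[[noreturn]] void runtime_assert_fail(const char *expr, const char *file,
                                      int line) {
  std::fprintf(stderr, "%s:%d: runtime assertion failed: %s\n", file, line,
               expr);
  std::abort();
}

// Each runtime's own macro would expand to something like this.
#define RUNTIME_ASSERT(cond)                                                   \
  ((cond) ? (void)0 : runtime_assert_fail(#cond, __FILE__, __LINE__))
```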
- Discussion on switching between libcs as a driver flags
- There is interest in this.
- Are the libcs assumed to be compatible?
- How are you going to find the libc?
- There’s already a component in the LLVM target triple which selects things like the startup code or library.
- One problem: There are a lot more libc implementations than libc++ and the other things that have their own flag.
- Do we really want to be enumerating all libc implementations in the clang driver?
- Also libc has a standardized name: they’re all “libc.a” or “libc.so”
- Some have other things that need to be linked in, e.g. libgloss
- Not every platform has the same set of libcs.
- Is there a way to generalize the multilib yaml into a way to describe what libc means for my platform?
- Similar: GCC spec files, which are not good.