RFC: Add an LLVM CAS library and experiment with fine-grained caching for builds

The idea entirely replaces object files with a store that contains equivalent data but without the same overheads and redundancies (it’s not a cache). The compiler writes its output to the store; the linker, debugger, and other downstream tools read directly from it to do their jobs. We’ve got incremental compilation working within the system.

FTR, it lives in a repo here and right now can compile, link, and run small C and C++ programs.

I’m not sure what you mean here. Ignoring things like strings and debug info for a moment, we associate a hash of each global object’s (IR) source code with its binary representation, and a hash of each translation unit’s names/linkages/global objects with a list of the same. “Ticket” files take the place of object files as proxy compiler output and linker input, so that existing build systems continue to work.
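
For concreteness, here is a rough sketch of that data model (all names below are illustrative only, not the real Program Repository types):

```cpp
// Illustrative sketch of the scheme described above; hypothetical names.
#include <array>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Digest = std::array<uint8_t, 16>; // hash of the content it names

// Deduplicated binary representation of one global object, keyed by a
// digest of that object's IR.
struct Fragment {
  std::vector<uint8_t> Binary;
};

// One entry per global object in a translation unit.
struct Definition {
  std::string Name;
  unsigned Linkage;
  Digest FragmentDigest; // which fragment provides the body
};

// Keyed by a digest of the TU's names/linkages/global objects.
struct Compilation {
  std::vector<Definition> Definitions;
};

struct Store {
  std::map<Digest, Fragment> Fragments;       // shared across all TUs
  std::map<Digest, Compilation> Compilations;
};

// The "ticket" written in place of a .o file is little more than the key of
// the TU's Compilation record, so build systems still see per-TU outputs.
struct Ticket {
  Digest CompilationDigest;
};
```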

The specially-written back-end store (pstore) has some useful properties for the PR: separate indexes to minimize the linker’s work when searching for a compilation; internal “pointers” to allow fast inter-object references; append-only storage for lock-free read-only access; and so on. This was written because I couldn’t find a suitable pre-existing implementation. My thinking was that such a beast wasn’t really appropriate for upstreaming, so the intention was to ultimately allow that back-end to be replaced.
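
To illustrate the append-only and internal-“pointer” properties, here is a hypothetical sketch (my own assumptions, not pstore’s actual layout): records are only ever appended, so readers never need a lock, and an inter-object reference is just a byte offset that remains valid forever.

```cpp
// Hypothetical sketch, not pstore's real format: an append-only store where
// inter-object references are byte offsets into the file.
#include <cstdint>
#include <fstream>
#include <vector>

using Offset = uint64_t; // internal "pointer": byte offset of a record

struct Record {
  std::vector<Offset> Refs; // fast references to earlier records
  std::vector<uint8_t> Data;
};

// Append a record and return its offset. Because existing bytes are never
// rewritten, concurrent readers can follow offsets without locking.
Offset append(std::fstream &Store, const Record &R) {
  Store.seekp(0, std::ios::end);
  auto At = static_cast<Offset>(Store.tellp());
  auto writeU64 = [&](uint64_t V) {
    Store.write(reinterpret_cast<const char *>(&V), sizeof(V));
  };
  writeU64(R.Refs.size());
  for (Offset Ref : R.Refs)
    writeU64(Ref);
  writeU64(R.Data.size());
  Store.write(reinterpret_cast<const char *>(R.Data.data()),
              static_cast<std::streamsize>(R.Data.size()));
  return At; // existing data never moves, so old offsets stay valid
}
```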

Clearly having some sort of persistent store within the LLVM umbrella could be great for the PR. If it’s possible to implement the PR’s schema using a CAS then I’d love to learn more.

Yes, it very much sounds like it!

These seem like worthy and exciting goals. I watch this effort, prepo, and zapcc from afar with interest.

The repeated work the compiler does was very noticeable in manyclangs (devmtg talk, elfshaker), for example. One observation we have from that is that we could store builds at each commit (plus all of the LLVM tools) with an amortized cost on the order of ~40kiB/build.

This is a number that haunts me a little, because first-pass intuition and basic experiments (e.g. with ccache, or compressing binaries) would tell you to expect it to be much higher. It seems like a strong indicator that the caching and duplicate-work story could be much better than it is.

One of the things manyclangs fights against, though, is that building is already ‘quite cheap’ on modern hardware as it stands. For typical development I’m already in the situation where the wall-clock time is O(10s–1m) for rebases across main of llvm+clang. On a typical day I see a minute or two to update main, then more rapid builds thereafter. With my ccache size set to ~50G and daily builds, I see a lot of hits even with a fair amount of moving around.

We were able to reach an amortized incremental build rate of ~300 builds/hour on one 64-core machine for manyclangs, using only ccache and a jobserver-enabled ninja to avoid OOM.

Another observation I can share: compressing the ccache with elfshaker, I reached ~300MiB to store a whole month’s worth of llvm-project ccache. This again comes with the ‘quite rapid access’ that elfshaker allows; the whole ccache can be unpacked with quite a small amount of CPU time.

It would be useful to have a standard set of experiments where you build a specific set of commits in a particular order from a cold cache, so that you can make a fair comparison between different approaches such as vanilla ccache, Bazel, etc.

Thanks for the clarifications! Seems like a great system and I think we’ll find lots of ways to collaborate.

The “CAS-optimized object file experiment” I mention in the RFC does the same trick as ProgramRepository does with ticket files, writing the CAS object identifier into .o files. It also handles .a files as CAS trees; not sure if ProgramRepository does something similar.

At the very least, I bet ideas from pstore would be useful for the builtin on-disk CAS implementation… pretty similar design goals, but pstore seems like it has had more (any!) in-practice use. The on-disk CAS is also append-only, lock-free for reading (also, rarely locks for writing), and has fast inter-object references (as of last week). (Eventually I imagine garbage collection could be useful—current model is wipe and restart—but doesn’t seem necessary initially. Could be implemented through interposition of a new (initially empty) CAS that imports not-yet-dead stuff from the old one before deleting it, I think.)

I suspect it is (possible). I imagine you could store every entity in the CAS; you could use the first ref (or first byte of data) as a type-id, if that’s useful/necessary. The CAS gives you back a handle (CAS identifier), which you can then use to look it up again. Anything you want/need to look up by something other than its content or its identifier you can add to an action cache (key-value map). One thing missing from the prototype built-in CAS that ProgramRepository might want is a “name service” (similar to “refs” in Git) which can change what they point at; it sounds like maybe the filesystem is providing that service (via the tickets in the .o files), but could also be a useful thing to add.
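
To make that concrete, here is a minimal sketch of the shape I have in mind (hypothetical names, not the proposed LLVM API): objects carry refs plus data, the first ref can act as a type-id, and the action cache is just a key-value map on the side.

```cpp
// Minimal sketch, not the proposed LLVM API: a content-addressed store of
// (refs, data) objects, plus a key-value action cache beside it.
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

using CASID = std::string; // opaque; derived from the content in a real CAS

struct CASObject {
  std::vector<CASID> Refs;   // Refs[0] could name a "type-id" object
  std::vector<uint8_t> Data; // or the first data byte could be the type-id
};

class CAS {
  std::map<CASID, CASObject> Objects;

  static CASID hashContent(const CASObject &Obj) {
    // Stand-in for a real content hash (e.g. SHA1 or BLAKE3).
    std::string Bytes;
    for (const CASID &Ref : Obj.Refs)
      Bytes += Ref;
    Bytes.append(Obj.Data.begin(), Obj.Data.end());
    return std::to_string(std::hash<std::string>{}(Bytes));
  }

public:
  // Storing is idempotent: the identifier is a function of the content.
  CASID store(CASObject Obj) {
    CASID ID = hashContent(Obj);
    Objects.emplace(ID, std::move(Obj));
    return ID;
  }

  std::optional<CASObject> load(const CASID &ID) const {
    auto It = Objects.find(ID);
    if (It == Objects.end())
      return std::nullopt;
    return It->second;
  }
};

// Anything that must be found by something other than its content goes in a
// key-value "action cache", e.g. hash-of-action -> ID-of-result.
class ActionCache {
  std::map<CASID, CASID> Results;

public:
  void put(const CASID &Key, const CASID &Result) { Results[Key] = Result; }

  std::optional<CASID> get(const CASID &Key) const {
    auto It = Results.find(Key);
    if (It == Results.end())
      return std::nullopt;
    return It->second;
  }
};
```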

Could you put a minimal CAS interface with a trivial implementation in llvm/Support and put the spiffy/complicated/real implementation in an Incubator?

-Chris

Yeah, I think the on-disk CAS implementation could be in an incubator.

Does that address the concerns? It’d avoid accidentally promising stability for the on-disk format, which is good. Most of the experimentation would be in LLVM and Clang (changing code to (optionally) use the CAS). We’d also need a CMake option for LLVM to depend on the incubator project (I assume this is possible?).

We should be able to work through injection? LLVM should be able to be built in isolation, and then the incubator built against a pre-built LLVM, injecting its logic in some way.
Basically, users of the CAS would program against an abstract interface, and it’s all about passing in a concrete implementation.

That would also ensure that another implementation of the CAS “backend” could always be used with LLVM (no direct coupling with the incubator implementation of the interface).
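
As a sketch of the injection idea (hypothetical names, not an actual LLVM API): code in LLVM is written only against an abstract interface, and whoever builds the final tool passes in a concrete backend, whether built-in or from an incubator.

```cpp
#include <string>

// Abstract interface that would live in LLVM; no coupling to any backend.
class ObjectStore {
public:
  virtual ~ObjectStore() = default;
  virtual std::string store(const std::string &Data) = 0;
};

// Code inside LLVM/Clang only ever sees the interface.
std::string cacheOutput(ObjectStore &CAS, const std::string &Output) {
  return CAS.store(Output);
}

// An incubator (or any out-of-tree project) supplies the implementation...
class IncubatorOnDiskCAS : public ObjectStore {
public:
  std::string store(const std::string &Data) override {
    // Real on-disk logic would live out of tree; this is a placeholder.
    return "id:" + std::to_string(Data.size());
  }
};

// ...and injects it when wiring the tool together.
int main() {
  IncubatorOnDiskCAS CAS;
  (void)cacheOutput(CAS, "object-file-bytes");
}
```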

Thanks for pointing me at manyclangs and elfshaker! Incredible compression/speed numbers. Seems like there’s some crossover with the CAS-optimized object file experiments we’ve started looking into.

My impression is that many developers are not in this situation yet; but even for those that are, compiling faster and/or with less power is still a good thing :).

Just want to point out that while many of the initial experiments have focused on full compilation caching in Clang, that’s only one place we’re interested in using the CAS; we’re looking to experiment with using it for fine-grained stuff throughout the compiler stack.

That said, a standard set of experiments for full compilation caching makes sense to me. Simplest would be to pick a tag of llvm-project—such as llvmorg-15-init—then build llvm-test-depends on the last 100 affecting commits in order (git rev-list --reverse -100 $TAG -- llvm/).

Sure, that’s possible, and I think it’s the right place to be in the end, but not where we were hoping to start.

What you’re describing is landing plugin support for connecting an external CAS, and the incubator would be one such external CAS plugin. We don’t have a design for plugins, and I don’t think the design will be trivial. (For example, for security model reasons, we are envisioning plugins using IPC, not dlopen; if the builtin on-disk CAS is only accessible via a plugin, then the CAS library in LLVM would need the protocol logic for connecting to an external plugin/daemon; and we might need to solve IPC efficiency issues before experimenting further.)

IIUC, this suggestion would block experimenting in the compilers until we’d resolved all that.

  • Is that the right thing? I’m not saying it’s the wrong thing… just pointing out that it’s a big piece of work that we haven’t done yet. (I was hoping to delay plugins until “later”.)
  • Is it good to have LLVM support an external CAS plugin interface right off the bat? Is the potential instability of that interface easier to reason about than an LLVMCAS library?

Alternatively, if there’s a way to set up a CMake flag to (optionally) pull an incubator into the LLVM build, then we won’t need plugins (yet); or maybe you’re thinking of “injection” in some other way?

I guess you’re bringing up the question of complexity and plugins because you’d like to be able to build, in-tree, the clang binary itself with the CAS implementation built in? I was coming at this from a library-development approach instead.

There is also a precedent for what you’re looking for (which I think is terribly intrusive right now): search for HAVE_TF_API inside LLVM! (We really should aim for something more pluggable and minimally intrusive instead.)

There seems to be a discussion around what specifically should go into llvm/Support. Would it be more reasonable to have the CAS library as its own independent library in LLVM, without putting anything into llvm/Support?

This seems tricky, because you likely should use ADT and other things from Support inside CAS.
That said, I’m in favor of moving libSupport (or a subset of it) to the top level of the repo, which should address this!

To clarify, there is concern with putting it in llvm/lib/CAS (with lib/CAS depending on lib/Support but not the other way around)?

Ah sorry, I thought you meant a top-level /CAS in the repo to make it independent; llvm/lib/CAS may indeed be a good step with the current directory setup!

We’re going to host a video call on Friday at 1pm PT next week for anyone who wants to discuss further!
(@akyrtzi will reply with a time and link soon).

The main agenda is to discuss questions relating to the immediate plan:

  • Should LLVMCAS exist at all, or should its pieces just land in LLVMSupport?
  • Where should we be working on follow-up experiments?
  • Start with SHA1? Or join LLD and use BLAKE3 from the start?

See more details below.

I’ve also included here my interpretation of consensus for the other questions in the RFC (not planning to discuss on the call, but if someone has concerns, let me know).

Questions relating to the immediate plan!

Should LLVMCAS exist at all, or should its pieces just land in LLVMSupport?

LLVMCAS should exist! Plan (all with the usual incremental patches, tests, code review, etc.—this will take some time):

  • Land data structures and filesystem support in LLVMSupport (and/or ADT).
  • Land interfaces for CAS and action cache, with trivial, always-fail implementations, in LLVMCAS (see the sketch after this list).
    • Later: if we need to add a user in LLVMSupport, we may sink this part to LLVMSupport.
  • Land builtin in-memory CAS in LLVMCAS.
    • Only unit tests and experiments (off-by-default) will use this. All tools will use the trivial CAS by default.
  • Land CAS-related utilities in LLVMCAS.
    • Experiments will use these utilities in llvm/ and clang/ and lld/.
    • Unit tests need the in-memory CAS.
    • Loop in Windows experts for design discussions on utilities related to the filesystem.
  • Land builtin on-disk CAS in LLVMCAS.
    • Configured off by default (opt-in CMake flag to compile and test).
    • Later: if LLVMSupport is moved outside of llvm/ (not proposed here!), we can evaluate moving to an incubator that the opt-in CMake flag causes llvm/ to depend on.
    • Loop in Windows experts for design discussions on filesystem and memory mapping.
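
For concreteness, the “trivial, always-fail” implementation mentioned in the plan could look something like this (an illustrative sketch, not the actual proposed API): tools link against the interface unconditionally, and with the trivial CAS every operation reports failure, so default behavior doesn’t change.

```cpp
#include <optional>
#include <string>

// Interface that all tools compile against.
class CAS {
public:
  virtual ~CAS() = default;
  virtual std::optional<std::string> store(const std::string &Data) = 0;
  virtual std::optional<std::string> load(const std::string &ID) = 0;
};

// Default implementation: always fails, so callers must handle the
// "no CAS configured" path, and nothing changes for existing users.
class TrivialCAS final : public CAS {
public:
  std::optional<std::string> store(const std::string &) override {
    return std::nullopt;
  }
  std::optional<std::string> load(const std::string &) override {
    return std::nullopt;
  }
};
```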

Where should we be working on follow-up experiments?

It feels like people are happy with main branch at llvm/llvm-project!

Plan:

  • Prepare incremental patches for experiments (with RFCs where appropriate (e.g., for design questions)).
    • Most of these will add experimental options in existing tools (LLVM/Clang/LLD) using utilities in LLVMCAS.
    • We can decide on a case-by-case basis with relevant reviewers whether to configure out command-line options, or to allow hidden flags like -Xclang -fexperimental-raw-token-caching, or to add more visible flags.
  • Use an incubator project for any experiment that isn’t tightly coupled to existing tools.
    • E.g., an incubator seems like a great fit for a generic toolchain protocol for task discovery.

Start with SHA1? Or join LLD and use BLAKE3 from the start?

How soon is LLD planning to switch to BLAKE3? Happy to switch to that as soon as LLD does!

Other questions

Does LLVMCAS need to support Windows immediately?

No! But:

  • Loop in Windows experts for design discussions.
  • Need to be sure things are implementable on Windows.

On design of the CAS object storage and action cache abstractions

Should the abstractions be stable to help downstream code?

  • No; they evolve incrementally, as usual.

Should the abstractions support plugins?

  • Eventually; and the plugin interface will need to be stable

Should plugins be “figured out” or examples implemented before landing?

  • No

On the implementation of the builtin CAS

Is the serialization of CAS objects stable?
Is the CAS hash function stable?
Is the persistent on-disk format stable?
Should clients be able to configure which stable serialization/hash/etc. to use?

  • Eventually! Not necessary for landing.
  • Also: align hash function with what LLD uses for PDB (currently SHA1, but may move to BLAKE3).

Do we need users of the library in-tree?

Likely moot; we expect to propose experimental adoption in clang and parts of LLVM shortly.

Does accepting this proposal commit us to […]?

No :).

From an inclusivity standpoint, I know there is no time that is universally convenient wherever you are on the planet, but in terms of the day, Friday really does not seem great: it is already the weekend in almost the entire world outside America!

That’s a good point, thank you for raising it! Disregard that time-slot, we’ll do it next week.

We’ll have it on Tuesday, Mar 1, 2:30pm PST! Here’s the Webex link: Cisco Webex Meetings

Thanks to everyone who joined the call for their time!

Notes from the call below (somewhat confusingly, inline with the agenda topics). Please excuse the poor notes; I missed a lot and filled in a few things after the fact (since I didn’t capture my own thoughts live); all mistakes are my fault!

TL;DR

  • Consensus for the plan as posted in the agenda
    • Except: re-evaluate whether on-disk implementation is off-by-default once there’s a patch for review
  • LLD not going to drive adding BLAKE3 to LLVM; we’ll look into it

Should LLVMCAS exist at all, or should its pieces just land in LLVMSupport?

  • Chris: key thing about LLVMSupport is it can’t depend on other libraries (particularly LLVM libraries, but also protobuf)
    • Thread was talking about moving LLVMSupport out of llvm/
    • But goal is for it to be layered underneath LLVM
    • Treat it like a separate thing even though it’s not currently
    • Aside: LLVMSupport looks like a centralized thing, but could be split up with e.g. networking / XPC long-term
      • Core generic interfaces in support make sense
      • Implementation separate is better
  • Reid: filesystem: useful to have it look up in the CAS
  • Matthias: sounds like we want the minimum for unit testing
  • Chris: shouldn’t it be finer-grained than VFS?
    • E.g., rather than duplicating a high-level build system, for instruction selection you want the fine-grained immediate input to the computation (the function itself)
    • Duncan: that’s right
      • The VFS parts are (only) useful for higher-level tasks (such as full compilation caching, or modules)
      • Fine-grained computations should not use the filesystem
  • Chris: Also, lots of this is like a research project
    • Core interfaces (such as VFS)—don’t want to regress these for all research that doesn’t work out
    • Important to be careful; off-by-default / incubator where appropriate; don’t want to add burden / tax where it doesn’t make sense
  • Matthias: Object design
    • Duncan: three kinds of objects
      • Abstract node (data plus references to other objects)
      • Tree (filesystem tree: names sub-objects)
      • Blob (data only)
    • Matthias: difference between tree and node?
      • Duncan: Node is the core interface; Tree has extra metadata for filesystems
      • Motivation for having “tree” at the high level is if it’s useful when plugging in an external CAS

LLVMCAS should exist! Plan (all with the usual incremental patches, tests, code review, etc.—this will take some time):

  • Land data structures and filesystem support in LLVMSupport (and/or ADT).
  • Land interfaces for CAS and action cache, with trivial, always-fail implementations, in LLVMCAS.
    • Later: if we need to add a user in LLVMSupport, we may sink this part to LLVMSupport.
    • Chris: is that going to work?
    • Duncan: I think so
      • The main question is whether we need to add something to llvm::vfs::FileSystem API for the VFS-related CAS stuff
      • If so, we may need to sink a couple of interfaces into LLVMSupport; can figure it out when those utilities get reviewed
  • Land builtin in-memory CAS in LLVMCAS.
    • Only unit tests and experiments (off-by-default) will use this. All tools will use the trivial CAS by default.
  • Land CAS-related utilities in LLVMCAS.
    • Experiments will use these utilities in llvm/ and clang/ and lld/.
    • Unit tests need the in-memory CAS.
    • Loop in Windows experts for design discussions on utilities related to the filesystem.
  • Land builtin on-disk CAS in LLVMCAS.
    • Chris: why isn’t this portable? is it fundamental?
      • Duncan: using memory mapping. Some files are created as 4MB then mapped in as 1GB… on POSIX, once the file grows, the mmap will lazily support it. IIUC, this doesn’t work the same way on Windows; there’s an extra layer of indirection available that I found that should work, but I haven’t done it yet. (See the sketch after these notes.)
    • Chris: recent blog post about databases: mmap not great way to do this stuff
    • Configured off by default (opt-in CMake flag to compile and test).
      • Argyrios: motivation: lack of Windows support
      • Opinions about on/off-by-default
    • Reid: experience: people add things, add tests, tests don’t work out of the box
      • Going through a stabilization process, with things configured off for everyone, seems better
      • Off by default
      • Get it ready
      • Then a switch to turn it on
    • David Blaikie: a different take
      • If you get accustomed to off-by-default, you don’t foresee the constraints
      • Then stabilization period can be really rough as bots start to see the code
      • Whereas developing incrementally could be more portable
      • debuginfod had a kernel dependency
      • Clang modules feel similar. They’re not used by default in the resulting clang binary, but support is always compiled in. (Duncan: one difference with CAS is the Windows bit)
      • Don’t feel super strongly; not convinced
    • Chris: we can lazy-evaluate when the patch is proposed
      • Doesn’t have to be one monolithic choice, individual pieces can be configured on by default when it makes sense.
      • (consensus)
    • Later: if LLVMSupport is moved outside of llvm/ (not proposed here!), we can evaluate moving to an incubator that the opt-in CMake flag causes llvm/ to depend on.
    • Loop in Windows experts for design discussions on filesystem and memory mapping.
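
For reference, here is a sketch of the POSIX trick discussed in the notes above (hypothetical constants and paths; error handling omitted): map a region larger than the file up front, then grow the file with ftruncate; pages within the enlarged file become usable through the existing mapping, without remapping.

```cpp
// POSIX-only sketch of "create small, map big, grow lazily". On Windows a
// file mapping can't outgrow the file the same way, hence the extra layer
// of indirection mentioned above. Error handling omitted for brevity.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

int main() {
  const size_t InitialSize = 4u << 20;  // 4MB file to start
  const size_t ReservedSize = 1u << 30; // 1GB mapping up front

  int FD = open("/tmp/cas.db", O_RDWR | O_CREAT, 0644);
  ftruncate(FD, static_cast<off_t>(InitialSize));

  // Map more than the file currently holds. Touching pages past EOF would
  // fault, but pages inside the file are fine.
  void *Base = mmap(nullptr, ReservedSize, PROT_READ | PROT_WRITE,
                    MAP_SHARED, FD, 0);

  // Later, when the store needs room: grow the file. The existing mapping
  // now covers the new pages too; no remap, so pointers stay stable.
  ftruncate(FD, static_cast<off_t>(8u << 20));
  static_cast<char *>(Base)[InitialSize] = 1; // now a valid access

  munmap(Base, ReservedSize);
  close(FD);
}
```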

Where should we be working on follow-up experiments?

It feels like people are happy with main branch at llvm/llvm-project!

  • Chris: only concern is about research — carving massive tracks through Clang wouldn’t be great
  • If we use same usual standard, should be okay; just watch out for intrusive stuff

Plan:

  • Prepare incremental patches for experiments (with RFCs where appropriate (e.g., for design questions)).
    • Most of these will add experimental options in existing tools (LLVM/Clang/LLD) using utilities in LLVMCAS.
    • We can decide on a case-by-case basis with relevant reviewers whether to configure out command-line options, or to allow hidden flags like -Xclang -fexperimental-raw-token-caching, or to add more visible flags.
  • Use an incubator project for any experiment that isn’t tightly coupled to existing tools.
    • E.g., an incubator seems like a great fit for a generic toolchain protocol for task discovery.
    • Chris: e.g., full distributed compute backend talking to services

Start with SHA1? Or join LLD and use BLAKE3 from the start?

How soon is LLD planning to switch to BLAKE3? Happy to switch to that as soon as LLD does!

  • Reid: no plan to work on it; hashing type records doesn’t actually affect compile time
  • This seems like a use case though
  • One reason it wasn’t prioritized was licence/copyright/etc.
  • Duncan: current licence is public domain and Apache2
  • Clean slate implementation would be easier
    • They may be willing to add Apache 2 with LLVM exception
  • Reid: they have a simple reference implementation
    • And a complete family of vector implementations
    • Could just take the simple one?
  • Create a Phabricator review with reference implementation; ping Chris
Thank you again for driving this discussion, Duncan, and also for capturing the notes!

Is this something that’s part of LLVMCAS or something that prepo would need to add itself?

I should add that prepo’s pstore back-end uses memory mapping and works on both POSIX and Windows today. If there’s anything in there we can share, I may be able to save you some work! I didn’t find it easy to deal with the differences between Windows and POSIX VM.

(Apologies for missing the call: my viasat internet has been out for over a week.)