LLVMCAS Upstreaming

What feels like ages ago now, we posted our RFC about integrating Content Addressable Storage (CAS) into LLVM to enable compiler caching (RFC: Add an LLVM CAS library and experiment with fine-grained caching for builds). If you don’t remember the details, you can also watch our LLVM dev meeting talk (https://youtu.be/E9GdNKjGZ7Y) and read our round-table summary from the meeting (Round Table about CAS and Compiler Caching in 2022 LLVM Dev Mtg).

While we got good initial feedback from the community, we struggled to find reviewers for our patches once we started the upstreaming process. Thanks to @dblaikie and others who put in time and effort to provide valuable feedback, but we didn’t get quite enough review to comfortably land such a major new component. Even though there isn’t much activity happening here, we definitely haven’t given up on the CAS or compiler caching. We have continued to work downstream (GitHub - apple/llvm-project) and improve on what we proposed. Since then, we have Clang modules working with the CAS, and we have prototyped CAS support in the Swift compiler as well.

Now that llvm-17 has branched, it is a good time to revisit CAS upstreaming, and we need your help. If you are interested in CAS, caching, and build performance, please help us upstream our implementation. The overall change is big, so I will try to break it down into different parts to make reviewers’ lives easier. Let me know if you can help with any of them so I can add you as a reviewer (as I create more patches).

CAS Implementation

We have both in-memory and on-disk implementations with basic functionality. Even though we haven’t done much performance tuning yet, our implementation is already efficient (e.g. small on-disk size without compression) and fast (loading/traversing CAS objects). The overall work can be broken down into three categories:
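For readers new to the idea, the core of any CAS is small: an object is identified by a hash of its data (and of the objects it references), so identical content deduplicates automatically and an ID transitively pins an entire object graph. The sketch below is a toy Python model of that concept, not the actual LLVM API; all names (`InMemoryCAS`, `store`, `load`) are illustrative.

```python
import hashlib

class InMemoryCAS:
    """Toy content-addressable store: objects are keyed by the hash of
    their data plus the IDs of the objects they reference."""

    def __init__(self):
        self._objects = {}  # hex digest -> (data, refs)

    def store(self, data: bytes, refs=()) -> str:
        h = hashlib.sha256()
        h.update(data)
        for ref in refs:  # references are part of an object's identity
            h.update(bytes.fromhex(ref))
        digest = h.hexdigest()
        self._objects[digest] = (data, tuple(refs))
        return digest

    def load(self, digest: str):
        return self._objects[digest]

cas = InMemoryCAS()
leaf = cas.store(b"int x = 1;")
root = cas.store(b"translation unit", refs=[leaf])
assert cas.load(root)[1] == (leaf,)
assert cas.store(b"int x = 1;") == leaf  # identical content, identical ID
```

Because the ID is derived from content, storing the same bytes twice is a no-op, and changing any referenced object changes every ID above it.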

Clang Integration

  • llvm VirtualOutputBackends: utilities to virtualize compiler outputs so they can be redirected or mirrored to different OutputBackends. It is a useful tool for integrating the CAS into various tools (clang, llvm-tblgen, swift, etc.).
  • clang dependency scanning daemon: an out-of-process dependency-scanner daemon, which an ongoing GSoC project is also actively exploring:
    • clang -cc1depscand, which starts a daemon from the clang binary
    • the clang driver can coordinate a caching build with the daemon, without build-system support
    • currently it has a very simple protocol and communicates with clang processes via a Unix socket (it needs to support non-Unix platforms)
  • Clang cache integration: with the CAS and the dependency scanner, implement a clang cache that is sound by nature.
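To see why dependency scanning plus a CAS yields a naturally sound cache: if the cache key covers the command line and the content hash of every dependency the scanner discovers, a cache hit can only occur when every input is bit-identical. The following is a toy Python sketch of that keying scheme, not the clang implementation; all names are hypothetical.

```python
import hashlib

def cas_id(data: bytes) -> str:
    """Content hash standing in for a CAS object ID."""
    return hashlib.sha256(data).hexdigest()

def cache_key(command_line: str, dep_contents: list) -> str:
    # The key covers the command line and the *content* of every
    # discovered dependency, so any input change invalidates the entry.
    h = hashlib.sha256()
    h.update(command_line.encode())
    for dep in dep_contents:
        h.update(cas_id(dep).encode())
    return h.hexdigest()

action_cache = {}  # cache key -> CAS ID of the compilation output

deps = [b"#define N 4\n", b"int f();\n"]
key = cache_key("clang -c t.c -O2", deps)
action_cache[key] = cas_id(b"<object file bytes>")

# A change in any dependency produces a different key (cache miss).
assert cache_key("clang -c t.c -O2", [b"#define N 5\n", b"int f();\n"]) != key
assert cache_key("clang -c t.c -O2", deps) == key
```

The scanner's job in this picture is to guarantee the dependency list is complete; given that, the hash-of-all-inputs key makes stale hits impossible by construction.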

Other

  • MCCAS ObjectFormat: using the CAS to store object files efficiently, which was very well received during the dev meeting. It is lower on the priority list because of its dependency on the rest of the work.

Current patches that need review:

There are still lots of patches that need to be created. If you want to see some of the areas mentioned above prioritized, please let me know so I can adjust my work.


How far along is Windows support here? Was the design done with Windows in mind? Windows is sufficiently different that I think we should ensure Windows support is feasible before merging these changes. Ideally, enough of the functionality would be implemented to demonstrate that feasibility. A particular example of a problematic assumption is atomic file moves on the same volume (files cannot always be overwritten while open on Windows).

There are a few layers to the answer to your question. The short answer is that our solution definitely does not rule out any platform from a CAS-enabled build flow.

First of all, we have Windows implementations (the in-memory CAS isn’t really platform specific, while the on-disk implementation is) which seem to work on Windows but probably need an expert to review and endorse them. Your concern about Windows volumes is valid, but it is not a Windows-specific problem. You also can’t pick an arbitrary file system to host our current on-disk CAS implementation on macOS or Linux. The existing implementation relies on a POSIX-conformant file system, and in the long term we need to establish criteria for which file systems can host the OnDiskCAS.

Secondly, we are not trying to tie the bigger concept of speeding up compilation with a CAS to a specific CAS implementation. We want the API to enable the compiler to interface with any CAS implementation. We are actively exploring storing compiler artifacts in a remote CAS that can be shared between multiple machines. As a side note, our OnDiskCAS implementation is extremely good at building/traversing complicated CAS object structures and is the only CAS we are aware of that can enable a CAS-based caching build with almost zero performance overhead.
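The shape of such an API can be pictured as a small abstract interface that the compiler codes against, with interchangeable backends behind it. The following is an illustrative Python model, not the proposed C++ API; every name here is made up. The on-disk variant also shows the write-to-temp-then-rename pattern that the atomic-move discussion above is about.

```python
import hashlib, os, tempfile
from abc import ABC, abstractmethod

class CASBackend(ABC):
    """Minimal interface a compiler could target, independent of whether
    the backing store is in-memory, on-disk, or remote."""

    @abstractmethod
    def store(self, data: bytes) -> str: ...

    @abstractmethod
    def load(self, object_id: str) -> bytes: ...

class InMemoryBackend(CASBackend):
    def __init__(self):
        self._objs = {}

    def store(self, data):
        oid = hashlib.sha256(data).hexdigest()
        self._objs[oid] = data
        return oid

    def load(self, oid):
        return self._objs[oid]

class OnDiskBackend(CASBackend):
    def __init__(self, root):
        self.root = root

    def store(self, data):
        oid = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.root, oid)
        # Write to a temp file, then atomically rename into place so a
        # half-written object is never visible under its final name.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
        return oid

    def load(self, oid):
        with open(os.path.join(self.root, oid), "rb") as f:
            return f.read()

# The "compiler" only sees the interface, never the concrete backend.
def cache_artifact(cas: CASBackend, artifact: bytes) -> str:
    return cas.store(artifact)

with tempfile.TemporaryDirectory() as d:
    for backend in (InMemoryBackend(), OnDiskBackend(d)):
        oid = cache_artifact(backend, b"object file bytes")
        assert backend.load(oid) == b"object file bytes"
```

Because callers never depend on the on-disk layout (or on persistence at all), a Windows-native backend can use a completely different representation behind the same interface.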

Thirdly, the only reason there isn’t a working demo of a CAS-enabled build on Windows is that such a build requires either build-system support or a daemon to coordinate dependency scanning. Our daemon communicates with clang using a Unix socket, and we haven’t implemented support for any build system that is available on Windows. None of this theoretically blocks us from supporting the Windows platform in the near future.

Once upstreamed, we will have more resources (Windows buildbots, CI access) to better support the CAS on Windows.


(Just getting to this after some summer travels…)

I had Windows in mind when I designed this stuff. I researched Windows file mapping when I was writing the OnDiskCAS to be sure things were implementable (e.g., for managing “resizable” large memory maps, Windows actually seems more flexible than posix since there seems to be an extra layer of indirection). Aside from the OnDiskCAS implementation details, I don’t think there’s much Windows-specific stuff besides what hits the usual/non-CAS compiler flow.

You can go back to the original discourse thread (almost two years ago) for a summary of the original RFC call, which included discussion of Windows.

The conclusion at the time was that it was reasonable to make forward progress in tree. Your concerns don’t sound different from the ones raised at the time, but I could be missing something, or maybe it was the wrong conclusion.

Regardless, the CAS abstraction should be flexible enough to allow some Windows-based file-backed CAS to be implemented efficiently, even if the existing POSIX one is the wrong design for Windows. There’s no assumption that they use the same on-disk representation. And things interacting with the CAS don’t count on it being persistent (the InMemoryCAS isn’t!).

These two seem like the main things to land to unblock forward progress. I was following along quite keenly, and was glad that they seemed to be almost ready to commit, but then it looked (to me) like you’d abandoned them. Here’s why I thought you’d moved on:

  • The patches leading to them were LGTM’ed but never committed.
  • While some patch updates were posted after the last reviewer feedback, there was never a comment saying they were ready for another look.
  • Then no pings at all (no updates since January, until this post).

Glad to hear that I understood incorrectly!

If you mention on those reviews that you think the latest concerns were addressed, perhaps @dblaikie would be willing/able to page them back in? (It’s possible he also assumed they were abandoned…) Or if David can’t help anymore, maybe he and/or I can help find a new reviewer…

I feel like once InMemoryCAS lands, including the CAS abstraction, it’ll be much easier to motivate other patches (e.g., the output proxy patches you linked in this thread).

(I’m still happy/eager to help with reviews… especially once you’re working upstream on stuff that I wasn’t the original author of, where I feel I have the distance to give unbiased feedback.)

Hi Duncan,

As a user, I’m very interested in CAS-related developments, but I’m a complete newbie WRT the LLVM contribution process and to phabricator, so I don’t know how this normally works, but I guess you’re talking about these three items:

D133713 [Support] Introduce ThreadSafeAllocator
D133714 [ADT] Introduce LazyAtomicPointer
D139035 [ADT] Add more ArrayRef ↔ StringRef conversion functions

…?

These were all marked “accepted and ready to land”. What was supposed to happen after that?

Once commits are approved, the next step is to land/push them and close the revision.


[Still happy to page in enough context to pickup reviews, especially ones I’ve already engaged with previously - just need to ping them (though I’m out the next couple of weeks traveling) - usual recommendation applies, ping roughly weekly.]

Once commits are approved, the next step is to land/push them and close the revision.

Right, I was holding off on the commits because I didn’t get enough traction on the main review for the CAS interface and implementation. I don’t want to add lots of unused data structures to the repo. I will get them committed soon.

Thanks!

I had Windows in mind when I designed this stuff. I researched Windows file mapping when I was writing the OnDiskCAS to be sure things were implementable (e.g., for managing “resizable” large memory maps, Windows actually seems more flexible than posix since there seems to be an extra layer of indirection).

We switched to a sparse-file-based implementation for the on-disk memory layout, so you might be able to help review. After talking to some file-system people, this is better defined under POSIX. But we do have a file-mapping version working under Windows, so if that works better on Windows, we can bring it back. Like I said, the only gap for Windows is making a file-system abstraction that works with Windows. I will see if I can get something working.

Has this development moved over to github? (I ask because phabricator is going to become read-only in a couple weeks.)

Sorry for missing this comment. I am going to commit the patches approved on Phabricator this week and move all the not-yet-reviewed patches to GitHub PRs.

It is a bit tricky because of all the dependencies between patches. @dblaikie @JamesWidman Should I do the upstreaming one patch at a time, or should I combine some closely connected bits into bigger PRs for review?


Not rightly sure what to do about dependent patches on GitHub - you could post your own branch and one pull request for the first change on the branch, so we can review that and provide some non-official review feedback on the rest of the branch, then post pull requests from there linearly, rebasing the branch based on the pull-request changes/feedback.


To keep my promise and lay a good foundation for the dev meeting, I proceeded to finish the following:

  • commit the patches for the data structures that were reviewed
  • migrate the patches to GitHub, with additional changes that can help reviewers understand the bigger picture of the overall design.

The PR for the CAS implementation is: [CAS] LLVMCAS implementation by cachemeifyoucan · Pull Request #68448 · llvm/llvm-project · GitHub. I will spend more time refining the PR, including breaking it apart and adding documentation and tests. Feel free to take a look and provide high-level feedback!

One of the other pieces that is not directly related to the CAS but is important for compile caching is virtualizing the output; the patch can be seen here: [Support] Add VirtualOutputBackends to virtualize the output from tools by cachemeifyoucan · Pull Request #68447 · llvm/llvm-project · GitHub
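To illustrate what virtualizing the output buys: tools write through an abstract backend instead of opening files directly, so a single write can be redirected or fanned out, e.g. to the real filesystem and a CAS at once. The following is a toy Python model of that idea, not the actual VirtualOutputBackends API; all names are made up.

```python
from abc import ABC, abstractmethod

class OutputBackend(ABC):
    """Toy model of virtualized outputs: tools write through the
    backend instead of opening files directly."""

    @abstractmethod
    def write(self, path: str, data: bytes): ...

class InMemoryOutputs(OutputBackend):
    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

class MirroringBackend(OutputBackend):
    """Fans a single write out to several backends, e.g. the real
    filesystem plus a CAS-backed store."""

    def __init__(self, *backends):
        self.backends = backends

    def write(self, path, data):
        for b in self.backends:
            b.write(path, data)

a, b = InMemoryOutputs(), InMemoryOutputs()
MirroringBackend(a, b).write("t.o", b"obj")
assert a.files == b.files == {"t.o": b"obj"}
```

The tool itself never learns (or cares) where its output actually landed, which is what makes the same code path usable for normal builds, caching builds, and tests.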
