RFC: Stand-alone build support

TL;DR: Skip to the Proposal section.

Hi,

Since the migration to the monorepo and GitHub, stand-alone build configurations for LLVM have existed in a very ambiguous state. There’s no documentation for how to use them, no buildbots testing them, and yet we still make a best effort to support this configuration in tree.

I would like to propose that we come up with concrete requirements for stand-alone builds and make sure that they are a supported build configuration going forward.

What are stand-alone builds?

I think everyone may have their own idea of what ‘stand-alone build’ means, but for the purposes of this proposal I would like to define stand-alone builds as follows:

A stand-alone build is a way of building an LLVM sub-project with only the source from the sub-project’s directory, plus the cmake files, headers, built libraries and tools from any other sub-project it depends on.

The important thing to note about this definition is that it means stand-alone builds should not be referencing source code (outside of the cmake files and includes) of other sub-projects. Some of the stand-alone build implementations in the tree currently use the LLVM_MAIN_SRC_DIR to find the llvm source directory. This would have to be changed if we adopt the above definition of stand-alone builds.

Benefits of stand-alone builds

Stand-alone builds make it possible to quickly update an LLVM sub-project by limiting the amount of code that needs to be built to produce a new build. This is important for Linux distributions (like Fedora and Red Hat Enterprise Linux, which I maintain) that need the flexibility to deliver fixes quickly and without causing too much churn in the distribution. For example, stand-alone builds make it possible to patch a bug in lld and get a new build in 15 minutes vs the 12+ hours it would take to rebuild all the projects.

It also makes security and license analysis of the source code much easier if you only have the code for a single sub-project to analyze versus trying to analyze the entire monorepo.

Stand-alone builds also provide a good test case for building against a pre-installed version of LLVM or Clang. Consumers of the LLVM and Clang libraries build this way, so it’s beneficial that we have a test case for this in tree.

Disadvantages of stand-alone builds

The disadvantage of supporting stand-alone builds is that it adds complication to the build system, which may negatively impact users and developers who aren’t interested in this specific configuration.

Proposal

I propose that we formally support stand-alone builds (as defined above) for the llvm, clang, lld, and lldb sub-projects. Adding support for more sub-projects would need to be proposed via another RFC.

The high-level requirements for supporting stand-alone builds would be:

  • Sub-projects must not use llvm-config and should instead use CMake files for reading llvm project configuration (This will help stand-alone builds operate more like the monorepo builds).
  • Sub-projects must not reference the LLVM_MAIN_SRC_DIR CMake variable when building in stand-alone mode.
  • Sub-projects must be able to build and run make check successfully (or at least not have any extra failures in the case of lldb) in the supported stand-alone configuration (Note: This may be a subset of the tests supported in the monorepo build e.g. lit tests but no unit tests).
  • The supported stand-alone build configurations must be documented including the CMake arguments.
  • There must be a buildbot to test the supported stand-alone build configuration(s).
  • There must be a code owner to ensure these requirements are met.
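
A rough way to see how far a sub-project is from the first two requirements is to search its CMake files for the names involved. The sketch below is only a starting point for review: the helper name audit_standalone_refs is hypothetical, and a match is not necessarily a violation, since the requirement only concerns what happens in stand-alone mode.

```shell
# List CMake files under a sub-project that mention LLVM_MAIN_SRC_DIR or
# llvm-config. Prints "file:line:text" for each match and prints nothing
# when there are no matches. Only CMakeLists.txt and *.cmake files are
# searched; other files (docs, tests) are ignored on purpose.
audit_standalone_refs() {
  subproject_dir=$1
  grep -rnE 'LLVM_MAIN_SRC_DIR|llvm-config' \
    --include='CMakeLists.txt' --include='*.cmake' \
    "$subproject_dir" || true   # grep exits non-zero on "no match"
}

# Example (not run here), from the top of a monorepo checkout:
#   audit_standalone_refs clang
```

Each hit would still need a human to decide whether it is reached in the stand-alone code path.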

I’ve chosen these specific requirements, because I believe they will strike a good balance between providing useful stand-alone builds while not complicating the build system too much. However, I’m also open to other suggestions and ideas.

-Tom

I’m trying to understand this part better. What makes a standalone build different here than a non-standalone build where you use ninja lld to rebuild just LLD?

cmake ../llvm && ninja lld

will rebuild all of libLLVM, not just LLD.

cmake ../lld && ninja lld

will rebuild just LLD against a separate existing pre-built libLLVM.
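
As a minimal sketch of that second workflow, assuming LLVM has already been built and installed to some prefix: the /opt/llvm prefix, the build-lld directory, and the configure_standalone helper are hypothetical names chosen for illustration. LLVM_DIR is the usual CMake package-location variable for find_package(LLVM) and should point at the directory that holds LLVMConfig.cmake; on some systems that is under lib64 rather than lib.

```shell
# The cmake binary can be overridden through the CMAKE environment variable.
CMAKE=${CMAKE:-cmake}

# Configure one sub-project against an already-installed LLVM.
#   $1: sub-project source directory (e.g. ../lld)
#   $2: build directory to create
#   $3: prefix where LLVM was installed (assumed layout: <prefix>/lib/cmake/llvm)
configure_standalone() {
  "$CMAKE" -G Ninja -S "$1" -B "$2" \
    -DCMAKE_BUILD_TYPE=Release \
    -DLLVM_DIR="$3/lib/cmake/llvm"
}

# Example (not run here):
#   configure_standalone ../lld build-lld /opt/llvm
#   ninja -C build-lld lld
```

Whether a given sub-project needs further arguments in this mode is exactly the kind of detail the RFC asks to have documented.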

I’m talking about patching a production binary that is always rebuilt completely from source, so there is no longer a build directory in which to run ninja lld.

Thanks for opening this discussion! This also affects conda-forge (think of a cross-platform distribution that grew out of the python ecosystem), and solving this would definitely help us as well!

We currently build the following subprojects separately:

  • llvm
  • openmp
  • lld
  • clang (together with clang-tools-extra)
  • libcxx (together with libcxxabi)
  • compiler-rt
  • mlir

(and I expect we’ll add flang & bolt in the future).

I think compiler-rt should be considered as well. It already has a COMPILER_RT_STANDALONE_BUILD switch, though that is currently running into just the kind of problems that this RFC is discussing.

The reason I left out compiler-rt, libomp, and libcxx is that I’m not as familiar with the new runtime builds, so I wasn’t sure exactly how hard it would be to support stand-alone builds upstream. There have also been some concerns about stand-alone builds raised by the maintainers of some of those projects in the past.

However, I don’t think it would be impossible to support those, it’s just a matter of trying to make the changes as minimal as possible.

Hi Tom,

Did I get you right, that your main concern is build performance for a clean build (not an incremental one)?

Some alternative solutions that come to mind:

  1. Doesn’t jrtc27’s proposal already solve the problem? It generates a new /build folder and only builds your modified lld. Isn’t that what you wanted?

  2. Solving the problem with hardware rather than engineering effort: If you don’t have frequent builds it’s sometimes cheaper to just throw additional hardware at the problem rather than investing engineering hours.

  3. I’ve seen massive improvements in build performance on pre- and post-merge build servers using local build caching (ccache on Linux, sccache on Windows). We could offer something similar with remote/shared build caching, e.g. using sccache. This way you only need to build things locally that are not available in the cache. We could have a couple of buildbot workers fill the cache in some cloud storage and then let users pull from that cache (read-only).
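
For the local caching case in option 3, one common way to wire a cache into a CMake build is through the compiler launcher variables. This is a sketch under assumptions, not a description of how any existing buildbot is set up: configure_with_cache is a hypothetical helper, and it presumes ccache or sccache is already installed and on PATH. CMAKE_C_COMPILER_LAUNCHER and CMAKE_CXX_COMPILER_LAUNCHER are standard CMake variables.

```shell
# The cmake binary can be overridden through the CMAKE environment variable.
CMAKE=${CMAKE:-cmake}

# Configure a build whose compiler invocations are wrapped by a cache tool.
#   $1: source directory (e.g. ../llvm)
#   $2: build directory to create
#   $3: launcher to use, ccache by default (sccache is the other option named above)
configure_with_cache() {
  launcher=${3:-ccache}
  "$CMAKE" -G Ninja -S "$1" -B "$2" \
    -DCMAKE_C_COMPILER_LAUNCHER="$launcher" \
    -DCMAKE_CXX_COMPILER_LAUNCHER="$launcher"
}

# Example (not run here):
#   configure_with_cache ../llvm build-cached sccache
```

A shared, read-only remote cache would additionally need tool-specific storage configuration, which is not shown here.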

I would like to propose adding flang to this list, since we already have a buildbot that does a standalone build of flang.

Thanks Tom, +1

I’ve been baffled sometimes by standalone builds, and having them documented and tested will be a big improvement. One reason CMake is a haunted graveyard for me is these configurations I don’t understand and can’t verify.

Distro maintainers’ work is extremely valuable and this is clearly important to them, so to me standalone builds are worth the costs. If someone finds a simpler way to achieve the same end, great. But until then, this RFC should make standalone builds both cheaper and better.


Can you clarify the situation around clang-tools-extra? Today it can’t be built fully standalone, but it can be built as part of a standalone clang. This clang build seems to violate “only the source from the subproject’s directory”.

How is this setup affected by this proposal? Will it be supported, keep the ambiguous status quo, or explicitly unsupported? If standalone builds aren’t supported for certain subprojects, can the CMake logic be dropped?

Yes, this is the workflow that I’m proposing we formally support. Up until now, it’s been unclear whether it is supported.

If this were for development builds, then I think this would make sense, but our production build environment is locked down, so there is no internet access and build caching tools like ccache are not allowed.

It looks like the flang stand-alone build uses LLVM_MAIN_SRC_DIR to find googletest sources, so in order to include flang, we would need to either drop this part of the CMake code or amend this proposal to allow using LLVM_MAIN_SRC_DIR.

clang-tools-extra is a special case because it has never supported stand-alone builds. I would be fine supporting it as part of the clang build if it’s not too difficult.

As for the other sub-projects not listed in this proposal, I tried to keep the proposal small and include only the ‘easy’ packages just so we can make some kind of progress. I would like to revisit some of the other packages in a part 2 proposal and see which projects can be practically built in stand-alone mode. But I don’t really want to keep the support ambiguous forever, so we may want to just pick a date to remove standalone build support if the part 2 proposal never materializes.

Sidenote: it would be really nice if the subproject sources didn’t mangle the folder names. I completely agree that the tarball should be versioned - e.g. clang-14.0.0.src.tar.xz, but it would be nice if the contained folder would be called clang rather than clang-14.0.0.src (reason being that we would otherwise need manual editing of the patches we carry for every single release and release-candidate; it wasn’t an issue before LLVM 14 because there was only one folder, but following the addition of the shared cmake folder there are now two).

Would it be better to just package the cmake directory in a separate tarball? It seems like including it in the other tarballs has caused a few issues?

It’s a fair question to ask, but then it still falls back (IMO) on the larger issue that the subproject sources would become useless without the cmake folder, and discovering the need for a separate tarball to go along with them is going to catch a lot of people on the wrong foot.

I understand the case for a shared cmake-infra, so from my POV the best would be to keep that and just adapt llvm/utils/release/export.sh slightly to keep the original folder names (while still producing versioned tarballs).

PS. I don’t know what other issues it caused; it’s possible that it’s still a worthwhile choice long-term to “educate” users to use subproject tarball + cmake tarball.

ISTM it would be good for the subproject sources to look like a filtered version of the whole repository extracted under a single toplevel directory. Having a tarfile extract to multiple toplevel directories is a bit unusual/unexpected. E.g. clang-14.0.0.src.tar.xz should extract to clang-14.0.0.src/clang/{lib,include,...} and clang-14.0.0.src/cmake/... rather than the current clang-14.0.0.src/{lib,include,...} and (unnumbered separate toplevel dir) cmake/....
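
To check how many top-level directories a given source tarball would extract to, something like the following sketch could be used. The helper name tarball_toplevel_dirs is hypothetical, and the example file name is simply the one used in this thread.

```shell
# Print the distinct top-level entries of a tarball, one per line.
# A leading "./" on member names is stripped so "./clang/..." and
# "clang/..." are counted as the same entry.
tarball_toplevel_dirs() {
  tar -tf "$1" \
    | sed -e 's|^\./||' -e '/^$/d' \
    | cut -d/ -f1 \
    | sort -u
}

# Example (not run here):
#   tarball_toplevel_dirs clang-14.0.0.src.tar.xz
```

More than one line of output means the tarball unpacks into several sibling directories, which is the situation described above as unusual.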

I’d have thought clang would be in the same boat here, as the clang unittests also make use of googletest. Are clang and flang doing things differently?

clang does the same thing as flang, and what I’m proposing is that we drop support for LLVM_MAIN_SRC_DIR and thus the unittests in the stand-alone builds. I think this will help simplify the CMake files. However, I am open to modifying this proposal to allow LLVM_MAIN_SRC_DIR or maybe instead we want to do something like LLVM_GOOGLE_TEST_SRC_DIR since it seems like the main use of LLVM_MAIN_SRC_DIR is to find the googletest sources.

Right, so unless we are able to package googletest as a “built library” with external headers, clang and flang lose the ability to build unittests in a standalone build. This is probably not desirable in the long run, but re-enabling that could perhaps be deferred to a separate task.

Here are two more options to consider:

  • Figure out how to eliminate the local patches, so we can use a vanilla unpatched googletest. Then, we can optionally use an existing system-wide installation of the libraries, or ask the user to download and provide their own copy of the sources.
  • Decouple llvm’s patched googletest from llvm, and ship it in each source tarball, like the shared cmake files are.

I like the first option. Why should we ship googletest?

The local patches do things like allow llvm’s string-like types (StringRef etc) to be used with gtest in ways that are natural to the llvm ecosystem. So, while technically they could be eliminated, they’re awfully convenient.

I was going to suggest moving llvm’s copy out to its own sibling tree, so it could be used more easily from other subprojects.

Why should we ship googletest? Because otherwise it’s Yet Another Dependency and llvm historically has wanted to minimize those.