Incrementally compiling LLVM

Hi all,
I noticed that doing a partial recompilation on Windows (I tried
compiling with both MSVC and clang, in both cases using Ninja build
files generated by CMake) does not seem to be any faster than compiling
from scratch.

Is there any way to improve the situation? Is partial recompilation
supported at all, or am I doing something wrong?

Thanks,
Francesco Bertolaccini

Sounds like something's going wrong for sure. How are you measuring
the time? (how much time is it taking) and what are you testing by
"partial recompilation"? Touching an ADT .h file is likely to
recompile nearly everything so might not be much better than a clean
build - but touching a .cpp file (especially a .cpp file for a single
tool, rather than a library) might be quite quick. So try timing a
full clean build (ninja -t clean, ninja clang) and then incrementally
touching just a .cpp file (eg: clang/tools/driver/driver.cpp ) and
see how they compare
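
For example, something like this (run from the build directory, assuming
the usual llvm-project layout and a Unix-like shell; driver.cpp is just a
convenient single file to touch):

    # time a full rebuild
    ninja -t clean
    time ninja clang

    # time an incremental rebuild after touching one .cpp file
    touch ../clang/tools/driver/driver.cpp
    time ninja clang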

I currently do not have sufficient resources to compile LLVM on my
personal machine, so I am (ab?)using GitHub Actions. My workflow is
set up so that the build artifacts are cached and restored on every
workflow run. The time is measured automatically by GitHub itself, and is
around 2 hours for each run, whether from scratch or starting from the cache.

I don't generally edit .h files directly, unless they are generated by
modifying TableGen files.

The fact that someone else has successfully been able to do incremental
builds _does_ make it seem like it's a configuration issue on my part.

For doing incremental compilation in such a context, I’d try enabling ccache instead; we have this on a bot and it’s quite effective!

That bot isn’t using GitHub Actions though, but here is a recipe for GitHub Actions (I haven’t tried it): https://cristianadam.eu/20200113/speeding-up-c-plus-plus-github-actions-using-ccache/
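
As a rough, untested sketch of the idea (the cache directory and size are just placeholders): point CMake at ccache as a compiler launcher, and persist the ccache directory between workflow runs (e.g. with actions/cache):

    # configure LLVM with ccache wrapping the compilers
    cmake -G Ninja ../llvm \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache

    # keep the ccache directory inside the workspace so CI can
    # cache/restore it between runs, and give it some room
    export CCACHE_DIR=$GITHUB_WORKSPACE/.ccache
    ccache -M 2G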

Best,

> Sounds like something's going wrong for sure. How are you measuring
> the time? (how much time is it taking) and what are you testing by
> "partial recompilation"? Touching an ADT .h file is likely to
> recompile nearly everything so might not be much better than a clean
> build - but touching a .cpp file (especially a .cpp file for a single
> tool, rather than a library) might be quite quick. So try timing a
> full clean build (ninja -t clean, ninja clang) and then incrementally
> touching just a .cpp file (eg: clang/tools/driver/driver.cpp ) and
> see how they compare

> I currently do not have sufficient resources to compile LLVM on my
> personal machine,

A side note, have you tried these things:
https://llvm.org/docs/GettingStarted.html#common-problems - use of
binutils ld often makes it difficult to build/link (especially with the
high parallelism of multicore processors), and switching to gold or lld
can make things much more manageable.
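
For reference, the switch is a single CMake cache variable; something like this (mainly relevant on Unix-like hosts where gold or lld is available):

    # link the LLVM build with lld instead of the default ld
    # (use "gold" instead of "lld" if that's what you have installed)
    cmake -G Ninja ../llvm -DLLVM_USE_LINKER=lld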

This flow is being used successfully by both npcomp and circt, which are LLVM incubator projects. You should be able to leverage/build off of the GitHub Actions flows that they define; see https://github.com/llvm/circt and https://github.com/llvm/mlir-npcomp

Steve

I have not yet tried the ccache solution mentioned before, but I tried
changing my flow to use the Visual Studio generator instead of Ninja,
like the linked projects do, and it still takes a long while to do a
recompilation, even though I just re-ran the job
(https://github.com/frabert/llvm-project/runs/2354303561)

I am now wondering: does the fact that I am *not* enabling the host
target affect something? I am only interested in the Mips target so it's
the only one I enable, thinking it was going to speed up compilation,
but maybe I am not doing myself any favors?
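
For context, the relevant part of my CMake invocation is roughly this (other flags trimmed, build type just illustrative):

    # only the Mips backend is built, no host/native target
    cmake ../llvm -DLLVM_TARGETS_TO_BUILD=Mips -DCMAKE_BUILD_TYPE=Release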

Thanks to everyone so far, I am going to try ccache next

Francesco

> This flow is being used successfully by both npcomp and circt, which are
> llvm incubator projects. You should be able to leverage/build off of
> the github actions flows that they define.
> see https://github.com/llvm/circt
> and https://github.com/llvm/mlir-npcomp
>
> Steve
>
> I have not yet tried the ccache solution mentioned before, but I tried
> changing my flow to use the Visual Studio generator instead of Ninja,
> like the linked projects do, and it still takes a long while to do a
> recompilation, even though I just re-ran the job
> (https://github.com/frabert/llvm-project/runs/2354303561)

Yeah, I don't know how GitHub Actions works at all, or what might be
causing it to rebuild everything... maybe it copies the files into a
new location every time the actions run, which updates the last
modified time and causes everything to rebuild? No idea.

> I am now wondering: does the fact that I am *not* enabling the host
> target affect something? I am only interested in the Mips target so it's
> the only one I enable, thinking it was going to speed up compilation,
> but maybe I am not doing myself any favors?

No, I shouldn't think that would adversely affect the ability to rebuild things.

Is it the same flow?
As far as I can tell, that flow isn’t restoring a build directory in order to do an incremental rebuild with modified sources; it is restoring the build artifacts in order to build a downstream project in its entirety.

The only way I can see incremental compilation working would be to restore the entire source tree from the cache alongside the build tree, and then run git fetch && git checkout <rev>. This would touch only the source files actually changed by git during the checkout.
Otherwise, a fresh git clone will have the source files' modification times set to the time of the checkout, so they will be more recent than the cached build dir.
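
Roughly, the restore-and-update step would be something like this (untested sketch; <rev> is whatever commit the workflow is building):

    # after restoring BOTH the source checkout and the build dir from the cache:
    cd llvm-project
    git fetch origin
    git checkout <rev>   # only files that actually changed get new timestamps

    # the incremental build then only recompiles what the checkout touched
    cmake --build build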

Thanks, this seems to be the crucial piece of info I was missing. I was
under the impression that something like the hashes of the files were
taken into account, not the last edit time. I'll experiment with this.

Francesco

Good point. I don’t think it’s the same flow…sorry for the confusion.

Steve

Thanks guys, I finally figured it out. Caching the whole git repo +
build artifacts was the missing piece.

Cheers!

Francesco

FWIW, there are some build systems that do the fingerprinting sort of
thing you're thinking of - but yeah, most do it with timestamps. (Once
the Bazel build files are in-tree, you could try those, for instance,
as Bazel does fingerprinting - though I'm not sure, maybe it skips
that/is faster when the timestamps are enough to conclude that no work
needs to be done.)