Preparing BOLT for LLVM monorepo

Just as a heads up, this pulls in a few dotfiles outside the bolt tree. The .dockerignore adding .git could conceivably break someone, so you might want to commit that separately so that it can be discussed/reverted. The other .github/workflows presumably aren’t intended for this repository either(?).

To diff the merge commit’s first parent (upstream llvm) against the merge commit (assuming bash for brace expansion of {^1,}), ignoring the bolt directory:

$ git diff-tree -r ‘HEAD^{/Merge BOLT into LLVM}’{^1,} :^bolt

:000000 100644 0000000000000000000000000000000000000000 6b8710a711f3b689885aa5c26c6c06bde348e82b A .dockerignore

:000000 100644 0000000000000000000000000000000000000000 477e9927279e0b6706a48f0d18f834137b92f115 A .github/workflows/Dockerfile

:000000 100644 0000000000000000000000000000000000000000 8deef62d15079ace72ac40db2335acf59d9a7e98 A .github/workflows/Dockerfile.aarch64

:000000 100644 0000000000000000000000000000000000000000 02ee68ff868f6865825db153ee3aea7d4f76813f A .github/workflows/docker-image.yml

Regards,

Peter

Good catch! Thanks for spotting this, I’ll remove those.

Best,

Rafael

Should be fixed now.

From a technical repository-health standpoint, this looks reasonable to me. (I have not reviewed and have no opinion on the actual code contained here.)

Your proposed repository has:

  • The expected commit-history-shape: A new unconnected root commit as the parent of a linear history of all the other bolt-specific commits. Followed by a single merge commit, with llvm mainline as the 1st parent, the bolt history in 2nd-parent.
  • The expected contents: All the bolt-side commits touch only files in bolt/. The merge commit causes no content changes: the bolt/ tree is identical compared to the 2nd parent, and the remainder are identical to the 1st-parent.
  • The size is not unexpectedly large (fetching bolt main and then running “git gc --aggressive” adds ~3MB to the repository size – 894M vs 891M).

Hi,

The instructions to build Bolt mention: -DLLVM_ENABLE_PROJECTS=“clang;lld;bolt”

It seems that clang is just a dependency here because you build C++ sources during lit-tests execution, but it’s not clear to me why we don’t use the host compiler or a prebuilt release here? Is there an intrinsic coupling between clang and Bolt at HEAD?

Having to build clang at HEAD just to run the bolt test seems like a non-trivial overhead for the casual developer, if the host compiler (or some clang pulled with apt-get or your favorite package manager) would just work.
Also in CI, even if you have a beefy enough machine, you add risks for the Bolt bot to be broken because Clang/LLVM itself would be broken: so instead of tracking Bolt bugs you end up tracking clang issues. Also the fact that clang being broken implicating you can’t test Bolt means that in the meantime new bugs can land in Bolt and complicate auto-bisection.
(The same question applies to lld I think?)

Thanks!

Hi Auler,

Thank you folks for bringing BOLT to LLVM. I’m supportive for the merge.

I don’t see any serious problems with a quick glance. Here are some general questions/suggestions. But they are not the block issues for the merge.

Hi Mehdi,

You raise a good point. At the moment we reluctantly went with the clang dependency route for small tests and for more involved BOLT development we have an external binary repo that stores large pre-built applications to be used as tests. You are right with respect to the downsides, but the alternative, which is to depend on the host compiler (we were previously using host_cc in LIT for quite a while) can cause more headaches in terms of test stability because we don’t have control over which compiler was used to build LLVM. BOLT operates at level that changing even the linker can already introduce changes that may modify the outcome of a test.

Regarding bugs in clang breaking BOLT tests, that is true. I wonder if this affects lldb a lot, though. It also depends on the trunk clang to build its tests. I imagine a debugger is also quite sensitive to the compiler used to build its test inputs.

To summarize, I see 3 options here:

1 - depend on host compiler

2 - depend on trunk clang

3 - depend on pre-built binaries

#1 is too unstable at the moment. It’s hard to communicate with other devs regarding why their test is failing because I can only repro it in my machine with my host compiler.

#2 may be unstable because trunk clang can be broken, and new bugs may be introduced while clang is not fixed. In my experience, BOLT is not easily broken by LLVM code, though, so we can afford some time with the tests being offline.

#3 is a burden on the repo because the pre-built binaries may occupy significantly more space than source code, and also are hard to read (even if using yaml).

Best,

Rafael

Hi Phoebe,

Thank you for the work of checking the current state of the codebase, I appreciate it. We will open an issue to address these points and will get back to them ASAP.

Best,

Rafael

Thanks for the explanations, it makes sense.
You may want to look at some point in the future into a CMake option for folks to provide a pre-built clang/lld (we even have such an option for providing a prebuilt TableGen when building LLVM).
That way one could invoke cmake -DLLVM_ENABLE_PROJECTS=bolt -DBOLT_CLANG_PATH=/toolchain/bin/clang -DBOLD_LLD_PATH=/toolchain/bin/lld
I would expect that for day-to-day development on a non-very-powerful machine that could make quite a difference for the casual developer.

Best,

That’s a good idea, I’ll look into enabling this. Thanks!

Best,

Rafael

Hi All,

As I understand, the merge is completed now
(https://github.com/llvm/llvm-project/commit/4c106cfdf7cf7eec861ad3983a3dd9a9e8f3a8ae).

My sincerest congratulations to the Meta team and all BOLT developers!

Hopefully, this will drive further adoption and development of BOLT
and BOLT-based tools!

Yours,
Andrey

Hi All,

As I understand, the merge is completed now
(https://github.com/llvm/llvm-project/commit/4c106cfdf7cf7eec861ad3983a3dd9a9e8f3a8ae).

My sincerest congratulations to the Meta team and all BOLT developers!

Congratulations to the BOLT team for landing this! A great milestone and a very useful tool to squeeze that extra performance! Kudos!

Cool! Congrats and thanks for the tremendous effort to make it happen!

David

Thanks again for all feedback and support!