Hi Maksim,
Any updates on adding BOLT to LLVM?
If you need any help / support, feel free to ask. The World is waiting
for BOLT!
Yours,
Andrey
One more thing (to clarify my interest): my team is working on Golang
support in BOLT, and we're keen to open-source our developments
(pending approvals from the higher-ups). We would much prefer to
contribute our code to the LLVM project.
Hi Andrey,
We appreciate your interest and we look forward to collaborating. We are currently rebasing BOLT on top of LLVM trunk. Since it's been a while since the last rebase, this is a bit of an involved task and we need to work through a rather lengthy list of conflicts. After we finish this and make sure BOLT works on the new repo, we plan to publish the list of commits and the merging diff so the community can evaluate a project merge proposal that works.
Regarding the project organization, remember BOLT was created before the LLVM monorepo. To address this, we are currently going for an approach similar to the one used by flang: re-editing all of our history on top of a new folder structure (root repo /bolt, similar to /flang), but trying to keep old commits mostly intact so we preserve project history. I'm happy to change this to whatever makes more sense to the community. The least intrusive way to do this that I found was the flang merge approach. Now, because the project is not so small, we need a starting point that works in LLVM trunk, with everything self-contained in /bolt and as few diffs as possible in /llvm, and then from there possibly work on evolving the project toward another suggested organization (such as breaking up BOLT into a lib in llvm/lib). But first we wanted to start with the rebase that we knew would take some time.
That's the gist of the current direction, thanks for pinging!
-Rafael
Hi Rafael,
Thanks for the update!
I understand that preparing a big project for inclusion into LLVM
properly is a ton of work. Again, if you need any help / support,
please let me know.
Yours,
Andrey
Hi,
We finished rebasing BOLT on top of the LLVM monorepo and we verified that the new BOLT is performing as expected. To make BOLT work, we have a few changes to LLVM libs, which we will submit for review (first changes are already up: D97531, D97830, D97899, D97898, D97891, D97830).
The plan for the initial BOLT commit is to include all its parts under a single directory, either /bolt or /llvm/tools/llvm-bolt. Once complete, this approach will allow people to directly contribute to the project and start using BOLT as part of LLVM. After this phase, we would like to start working with the community to break BOLT into separate components that will make it easier to build new tools based on the BOLT technology. As suggested by Propeller folks, we will split the disassembler component from the rest and make it possible to perform optimizations on low-level binary IR, which will likely have a serializable form.
The proper location of BOLT in the monorepo is still unclear, though. In our rebased branch, we are currently in a /bolt top-level folder in the monorepo, but are also considering /llvm/tools/llvm-bolt.
We are trying to work out the pros and cons of living in these locations and would appreciate community input. From our understanding, living under the /bolt top-level folder would give BOLT the following advantages:
Living in /llvm/tools/llvm-bolt, on the other hand, is perhaps more aligned with a longer-term goal of migrating BOLT to live as a lib under /llvm/lib and has the following advantages:
Any thoughts on this?
Mine is probably not the most relevant opinion here, so take any of this with a grain of salt: Generally I'd err towards inclusion in the llvm subproject, as you say, for easy movement of code into reusable libraries, etc - though I guess if you're a sibling like clang or lld that's still possible - sinking code down into the common llvm infrastructure as desired.
How much code is bolt? If it's in llvm, how much more CPU time does it add to build and test?
As for testing - is llvm-mc'd assembly sufficient for testing? That might be a tipping point in deciding whether it should live separately (so that folks can opt out of it)
"real-world test binaries" are probably not a thing that should be part of the usual testing, if by that you mean existing/production binaries, as opposed to small targeted binaries of only a few instructions (enough to demonstrate some specific feature of bolt). In the same way, lld's test suite doesn't have "real world" object files being linked into full production binaries, but small targeted/hand-crafted examples.
Hi David,
With respect to the amount of code, what we would add is pretty much the code that is in https://github.com/facebookincubator/BOLT but rebased to use updated LLVM interfaces. These files/folders in the root folder of the facebook github repo would then live in llvm/tools/llvm-bolt (that's how we do it but in an older fork of llvm).
Regarding CPU time used for the build of llvm, the burden we add is about 80 new C++ files and 2 binaries to be linked (llvm-bolt and merge-fdata; other tools are just a symlink to llvm-bolt). I did a quick check here and my machine built llvm+clang+lld in 6m5s (user time 273m) and llvm+clang+lld+bolt in 6m20s (user time 284m). Testing in LIT is minimal at 20 tests (running in a few seconds), but we would like to expand it and support it better. Internally we have more LIT tests, but unfortunately they rely a lot on real binaries (not necessarily large, but think of bzip2, for example, which is large enough that it does not make sense to put it into the repo because it doesn't isolate a single feature of bolt that needs testing).
The smaller set of 20 tests we currently have consists of targeted hand-crafted inputs written in assembly, which are nice to read and understand, but the problem is that they need to go through the linker before BOLT can consume them. If we can't use a linker, I guess we could check the binaries directly into the repo if they are minimal, even though people wouldn't be able to easily read the contents. We could make BOLT read .o files directly for testing purposes (straight out of llvm-mc), but that feature needs to be developed.
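As a rough check of the build-time figures quoted above, the relative overhead of adding BOLT can be computed directly. This is only a back-of-the-envelope sketch: the four timings come from the message, while the variable and function names are made up for illustration, and a single run on one machine says little about other configurations.

```python
# Timings quoted in the message above (one run, one machine).
baseline_wall_s = 6 * 60 + 5     # llvm+clang+lld: 6m5s wall-clock
with_bolt_wall_s = 6 * 60 + 20   # llvm+clang+lld+bolt: 6m20s wall-clock
baseline_user_min = 273          # user CPU time in minutes, without bolt
with_bolt_user_min = 284         # user CPU time in minutes, with bolt


def relative_overhead(before, after):
    """Return the relative increase from `before` to `after` as a fraction."""
    return (after - before) / before


wall_overhead = relative_overhead(baseline_wall_s, with_bolt_wall_s)
user_overhead = relative_overhead(baseline_user_min, with_bolt_user_min)

print(f"wall-clock: +{with_bolt_wall_s - baseline_wall_s}s ({wall_overhead:.1%})")
print(f"user time:  +{with_bolt_user_min - baseline_user_min}m ({user_overhead:.1%})")
```

On these numbers the increase is about 4% in both wall-clock and user time.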
In general llvm/tools are supposed to be entry points that exercise the LLVM libraries. I'd be concerned about adding a tool/bolt that contains more than that (i.e. the entire implementation of the framework, instead of having it live in libraries). But it seems like you intend this as a step towards this? Is there a well-defined plan to get there?
Is it difficult / overly involved to split things like the disassembler and other components into libraries that can live in llvm/lib/...
and use them from tools/bolt/? Can this be done ahead of time, upstreaming these libraries first, before bolt itself?
Thanks,
Hi Rafael, Thanks for the update on the plan.
I have a question about upstreaming phase ordering. Is there a strong reason to proceed with the order as proposed? It seems more natural to me to do it the other way around: 1) refactor the bolt code; 2) check in utility libraries in LLVM; and then 3) push the BOLT main implementation. There are many advantages to doing that:
Thoughts?
thanks,
David
Hi Mehdi and David,
Indeed, we share similar concerns. We do intend to move functionality of BOLT to live as a library, but the timeline is unclear. In fact, most of BOLT could live in a library already; it's just a matter of moving some files into separate components. Instead of the files living in tools/llvm-bolt, most could just be moved under lib/something, and we already have an llvm-bolt.cpp file that instantiates the driver that coordinates the binary rewriting process, which is the entry point of BOLT as a library. People could already leverage this to use BOLT in different ways (for example, I wrote some time ago a different utility that runs the driver for two different binaries and compares the two; this was later named boltdiff).
My main reason for committing the project as a whole first, as a project merged into the monorepo in the same way flang did, is that BOLT has been open source for a while: it is a 6-year-old project with about 800 commits and 50K lines of code, and we know we have people who forked the project and would like to contribute to it. If I commit into LLVM a different BOLT (not just rebased), then I (a) break or make it hard for any work on top of it from other contributors, (b) lose the original history or make it harder to preserve it. That's why I was going for a smoother transition. I, as a developer, put value in the ability to blame and to understand why things were built a certain way, and not bringing BOLT's history (in the same way as flang did) would mean we and the community lose a lot of context on the decisions of the project. And I guess that's also the rationale for a monorepo, to have multiple projects merged together.
Because of that, I initially put bolt under /bolt, following flang's model of merging the history so every developer has the right context. But the original location was under llvm/tools.
That makes sense, but something unclear to me is that refactoring it into separate libraries in-tree right after merging it will also "break any work on top of it" from people who forked it, wouldn't it? How would this be managed after Bolt gets in-tree?
I guess a first step could be to produce a "snapshot" of the monorepo after you rebase, so that folks can look at the actual proposal and the code structure, discuss the actual modifications that would be required pre-merge, and agree on the plan post-merge. How does that sound to you?
Best,
As with others, I'm not very aware of the internal architecture of bolt, so take this with a grain of salt:
From what I understand, I have a slight preference for starting this out as a /bolt top-level "subproject", because the code currently sounds monolithic. As the implementation logic is refactored into more reusable units, those libraries can be cleanly moved within the monorepo, e.g. under the llvm-project/llvm directory if appropriate.
The advantage of doing this is that nothing in the llvm-project/llvm repo can come to depend on the bolt code until and if it gets refactored. This is also how things like LLDB started out (and it would be great for more of the reusable libraries in LLDB to be merged into LLVM over time).
Does anyone have any concerns about this approach?
Unrelatedly, I'd also love to see the llvm repository exploded a bit into more top-level repos, e.g. splitting support/adt out to their own thing. It is also worth considering splitting the MC layer out to its own thing as well, LLVM IR and the mid-level optimizer into its own thing, and CodeGen and the targets into its own thing.
The major constraint we need is that the dependences between top-level subprojects form a strong DAG, both now and defensibly into the future, and we don't want minor evolution of the codebase to cause libraries to have to be moved around. The benefits of splitting it up are that layering becomes easier to enforce, LLVM developers are encouraged to work across subprojects a bit more, and it becomes easier for subprojects to depend on slices of "the big llvm directory".
-Chris
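The DAG constraint described above could be checked mechanically. What follows is a minimal sketch, assuming a hand-written map from each subproject to the subprojects it depends on; the names and edges are hypothetical, loosely following the split floated in the message, and do not reflect the actual LLVM layering or any existing tool.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical subproject -> dependencies map. The names and edges are
# illustrative assumptions, not the real LLVM layering.
DEPS = {
    "support": set(),
    "mc": {"support"},
    "ir": {"support"},
    "codegen": {"ir", "mc", "support"},
    "bolt": {"mc", "support"},
}


def layering_order(deps):
    """Return an order with dependencies first, or raise ValueError on a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] lists the nodes that form the detected cycle.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


order = layering_order(DEPS)
print(order)

# If "mc" came to depend on "bolt", the graph would no longer be a DAG.
broken = {**DEPS, "mc": {"support", "bolt"}}
try:
    layering_order(broken)
except ValueError as err:
    print(err)
```

With a check like this, keeping bolt as a leaf means a new edge from a lower layer back to bolt is rejected instead of silently creating a cycle.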
Dropping Bolt into a top-level directory sounds reasonable, but perhaps a hybrid approach similar to what Mehdi mentioned can be applied. Basically, Bolt first goes through a round of refactoring in its upstream github repo, with a design that is close to the future structure in LLVM, and then drops in as a monolithic piece initially. This will make future restructuring much easier. There are other benefits: 1) it is a good opportunity to clean up Bolt's internal APIs; 2) it is time to beef up unittests; 3) it makes code review easier.
David
Chris, the approach of living under /bolt sounds reasonable to me.
Mehdi and David, the difference of doing things in-tree vs out-of-tree is that, currently, BOLT out-of-tree has
(1) different legal requirements for accepting contributions (external contributions require devs to sign a CLA). So I agree with Mehdi that the same forks will get broken as we refactor code, but once BOLT is in the llvm monorepo, at least they will have the chance to upstream it with different legal requirements. If they don't want to upstream it, that's fine too, but I would like to give them a chance.
(2) a different development workflow that is less open than LLVM's. Because we want the input of the community on a refactoring that reflects how they want to use the libraries too, it would be more natural for this to happen in-tree in LLVM.
David, if we try to coordinate this refactoring happening in both repos (library part in LLVM while the client part stays in our separate repo), that will be challenging to do because we wouldn't be able to easily test the LLVM diffs. This is a problem we are already facing with upstreaming our changes to LLVM without BOLT being there to easily show devs how our changes are actually used and tested. Moreover, other contributors who don't have easy access to our github repo will have a hard time working with us on the refactor, as they wouldn't be able to do work on the tool (just the open library).
Mehdi, your suggestion looks good; I intend to show everyone the monorepo snapshot. We are making sure it is ready to be published, and that's why I've been referring to our snapshot as "imagine our github repo contents are under /bolt", because that is pretty much it, but I will present it soon.
Mehdi, here is the snapshot of the LLVM monorepo with bolt living in /bolt:
https://github.com/facebookincubator/BOLT/tree/rebased
Last 6 commits will probably be rewritten or removed as we upstream changes to LLVM that need to land before the commits that change exclusively files in /bolt are pushed.
Hi Rafael, I am not actually proposing an intermediate state where parts of BOLT live in LLVM while the client lives in a separate repo. What I meant is a restructuring step within BOLT before dropping it into LLVM. For instance, in bolt's top directory there are lots of different things: different driver programs, profile readers/writers, debug info handling, exception handling code, BOLT IR/core data structures (BB, Loop, Function), pass managers, etc. The Pass directory is also pretty flat. Some preliminary reorganization with more tests added can reduce a lot of churn in the future. WDYT?
thanks,
David
Let me add my modest +1 vote to committing BOLT as it is, and *then*
restructuring it as a part of LLVM development process -- with proper
reviews, etc.
This is how flang and OpenMP runtime had been added to LLVM project.
This is a sure way to start things going; otherwise we may end up with
a project preparing for inclusion into LLVM ad infinitum.
Yours,
Andrey
I think one thing we can all agree upon is that the community wants a good balance between velocity and quality (ensured by proper reviews). I believe doing some preliminary restructuring and cleanups can not only help quality but improve velocity as well. A good structure serves the purpose of "self-documentation" and will greatly help code reviewers be more effective.
thanks,
David
Let me add my modest +1 vote to committing BOLT as it is, and then
restructuring it as a part of LLVM development process -- with proper
reviews, etc.
This is how flang and OpenMP runtime had been added to LLVM project.
Actually, if I remember correctly, flang went through multiple months of preparatory upgrades that were asked for by some people in the community, and they did so out-of-tree before getting ready to land in a single merge.
This is a sure way to start things going; otherwise we may end up with
a project preparing for inclusion into LLVM ad infinitum.
We just have to make the expectations very clear and avoid a "moving goalposts" situation, and it should work fine. Is there any particular reason that would put us in an "ad infinitum" situation?
As the person who requested the most changes for flang, I concur here. There was some negotiation as to what was reasonable to expect before and what was easier to add after. I think we should get a proposal and a change that shows what we're looking at as far as inclusion, and we can make our evaluations at that point.
Thanks!
-eric