Add more projects into Git monorepo

I am planning to add projects to https://github.com/llvm-project/llvm-project in the near future, possibly this week.

Before doing that, I would like to ask its users a few questions.
In each item below, the first option is my preference. Let me know if you have other suggestions.

  • What is added?
    I will add libunwind, llgo, openmp and parallel-libs.
    May I also add debuginfo-tests?

  • Will inactive projects be dropped?
    No, I won’t drop them.

  • Where shall I push the new monorepo?
    - Create a new repository, like llvm-project-201705XX, or *-(number of repos).
    - Push new branches into the current llvm-project.git under different branch names, like master-201705XX.
    - Force-overwrite llvm-project.git/master. I won’t do this, since I was blamed when I tried it last time.

  • When I migrate trunk to the new one, what will happen to the previous branches?
    - Push a merge commit to the new one. Its commit message will announce the new repo.
    - Leave them as they are.
    - Update the previous branches as well. (I don’t want to do this.)

Thanks, Takumi

I have just done this. Five repos were added, including debuginfo-tests.
The monorepo now includes 17 repos in total.

Rebasing to new branch(es) will take a few minutes. Please be patient.

Thanks, Takumi

Is this intended to be the monorepo that eventually becomes the official repo? If so, I strongly object to putting libunwind, libc++ and libc++abi in it. I have recently been working on bring-up for libc++ and libunwind on a new platform, and the integration of libunwind with the LLVM build system is already annoying (you can’t build it unless you have a working C++ standard library implementation for your target, even though it’s a dependency for libc++). Having to have a complete LLVM checkout would be even more overhead.

David

Hi all,

I hope not. I haven't seen any public poll by the LLVM Foundation about which projects to include (I might have missed it, though).

I'm also opposed to having openmp and parallel-libs in there, as it adds unnecessary overhead for everyone who is only interested in the OpenMP runtime. In the poll on the git migration in general, there was some mention of "only include repositories that are version-locked", i.e. LLVM libs, Clang, compiler-rt and possibly lld (?).

Cheers,
Jonas

Just to be sure:

https://github.com/llvm-project

is different from

https://github.com/llvm

isn’t it?

The migration of the reference/official llvm repository would be towards the latter I presume?

- Matthias

I have just done this. Five repos were added, including debuginfo-tests.
The monorepo now includes 17 repos in total.

- Created the new repo: https://github.com/llvm-project/llvm-project-20170507.git
  Branches will come later.
- The previous repository has a merge commit that includes the new repo. It
  will stay at r302363.

Can you update
http://llvm.org/docs/GettingStarted.html#for-developers-to-work-with-a-git-monorepo
with new instructions for how to get a monorepo that is up to date?

Please clarify the overhead.

Hi Takumi,

I understood you wanted to add projects to the repo, but I didn’t get from the email below that the existing master of llvm-project would stop being updated.
This is not great, since some people used it as a day-to-day development repo and it is still documented on the Getting Started page!

I updated the instructions with the new URL in r302459.

If you are an existing user, you can run ‘git remote set-url origin https://github.com/llvm-project/llvm-project-20170507.git’ and ‘git reset --hard origin/master’ on the master branch to keep everything in the same repo. Rebasing existing branches onto master is prohibitively slow and error prone, so I recommend creating new feature branches and cherry picking the commits from old branches.
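The steps above can be sketched as a small shell helper. This is only a sketch under stated assumptions: `switch_monorepo` and `port_branch` are hypothetical helper names, not git commands, and the explicit `git fetch` is an addition, included because `git remote set-url` by itself downloads nothing.

```shell
#!/bin/sh
# Sketch only: switch_monorepo and port_branch are hypothetical helpers.

# Repoint an existing monorepo clone at the new repository and move the
# local master onto the new history. Run inside the existing clone.
switch_monorepo() {
    new_url=$1
    # Point "origin" at the new repository.
    git remote set-url origin "$new_url" || return 1
    # Assumption: fetch explicitly so that origin/master refers to the new history.
    git fetch origin || return 1
    # Be on master, then discard the old history and any local changes on it.
    git checkout master || return 1
    git reset --hard origin/master
}

# Re-create an old feature branch on top of the new master by
# cherry-picking its commits instead of rebasing it.
#   $1 = old feature branch, $2 = commit it was based on, $3 = new branch name
port_branch() {
    old_branch=$1
    old_base=$2
    new_branch=$3
    git checkout -b "$new_branch" origin/master || return 1
    # Apply every commit that is on the old branch but not on its old base.
    git cherry-pick "$old_base..$old_branch"
}

# Usage (inside the existing clone):
#   switch_monorepo https://github.com/llvm-project/llvm-project-20170507.git
#   port_branch my-feature <old-base-commit> my-feature-new
```

Note that `git reset --hard` throws away uncommitted changes on master, so anything worth keeping should be committed to a branch first.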

Even though it said I had on the order of 17,000 different commits, a simple git pull --rebase finished in a couple of seconds and everything seems to be working as intended.

My clone of libunwind is around 4MB. A clone of LLVM is 2-3 orders of magnitude bigger. The clone on my local system doesn’t matter too much (though it would be an annoying waste), because I have spare disk space, but each project, once it’s working, also gets cloned to our CI system, which is always short on disk space because it archives build artefacts. Network bandwidth is also an issue.
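One way to put numbers on this comparison is to ask git how much object storage a clone uses. The following is a minimal sketch: `repo_object_kib` is a hypothetical helper name, and it only sums the loose and packed object sizes that `git count-objects -v` reports in KiB, so the working tree and build artefacts are not counted.

```shell
#!/bin/sh
# Sketch only: repo_object_kib is a hypothetical helper, not a git command.
# Prints the size of a clone's object store in KiB (loose + packed objects).
#   $1 = path to the clone (defaults to the current directory)
repo_object_kib() {
    git -C "${1:-.}" count-objects -v |
        awk '$1 == "size:" || $1 == "size-pack:" { total += $2 }
             END { print total + 0 }'
}

# Usage:
#   repo_object_kib /path/to/libunwind
#   repo_object_kib /path/to/llvm
```

Taking the figures above at face value, 4 MB times 100 to 1000 puts a full LLVM clone somewhere between roughly 400 MB and 4 GB.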

There’s also the secondary issue that it is valuable to be able to build these components out of tree, yet this is currently fragile and is likely to be broken even more if we’re insisting on the monorepo.

We are currently able to target our platform from LLVM (as a cross-compiler), but not build LLVM to run on it, so it is unhelpful to have stuff that we compile for x86 and stuff that we compile for our target in the same repo, because we aggregate the stuff that we build for the target (libunwind, libc++, and so on) when we build images.

Finally, there’s the philosophical / software engineering issue. There should be no tight coupling between libunwind and anything else in the LLVM tree. Libunwind implements a set of well-documented and stable APIs. These are used by other components, but are equally useful in other contexts (i.e. any compiler for any language that uses the Itanium unwind model). From the perspective of someone hacking on libunwind, LLVM is an unrelated project (though one that shares coding conventions - an analogy would be two projects under the Apache umbrella) and there is absolutely no reason to insist that libunwind developers should clone a massive unrelated project to work on the code that they want to work on.

All of this applies to libc++ and libc++abi as well.

David

>> 2017-05-07 1:01 GMT-07:00 David Chisnall via llvm-dev <llvm-dev@lists.llvm.org>:
>>> Is this intended to be the monorepo that eventually becomes the official
>>> repo, because if so I strongly object to putting libunwind, libc++ and
>>> libc++abi in it. I have recently been working on bring-up for libc++ and
>>> libunwind on a new platform and the integration of libunwind with the LLVM
>>> build system is already annoying (you can’t build it unless you have a
>>> working C++ standard library implementation for your target, even though
>>> it’s a dependency for libc++), having to have a complete LLVM checkout
>>> would be even more overhead.
>>
>> Please clarify the overhead.
>
> My clone of libunwind is around 4MB. A clone of LLVM is 2-3 orders of
> magnitude bigger. The clone on my local system doesn’t matter too much
> (though it would be an annoying waste), because I have spare disk space,
> but each project, once it’s working, also gets cloned to our CI system,
> which is always short on disk space because it archives build artefacts.
> Network bandwidth is also an issue.

I'd expect any CI system to be able to cache this.
Also, if your issue is archiving a lot of build artifacts, the constant
cost of the checkout isn't gonna matter that much.
Finally, the read-only individual repo can still be used by CI, which
addresses this entirely.

> There’s also the secondary issue that it is valuable to be able to build
> these components out of tree, yet this is currently fragile and is likely
> to be broken even more if we’re insisting on the monorepo.

I don't see any rationale for this.
Whatever has a CI is gonna continue to work. This is already the case
today: if you care about a configuration, provide CI for it and it'll
continue to work.

> We are currently able to target our platform from LLVM (as a
> cross-compiler), but not build LLVM to run on it, so it is unhelpful to
> have stuff that we compile for x86 and stuff that we compile for our target
> in the same repo, because we aggregate the stuff that we build for the
> target (libunwind, libc++, and so on) when we build images.
>
> Finally, there’s the philosophical / software engineering issue. There
> should be no tight coupling between libunwind and anything else in the LLVM
> tree. Libunwind implements a set of well-documented and stable APIs.
> These are used by other components, but are equally useful in other
> contexts (i.e. any compiler for any language that uses the Itanium unwind
> model). From the perspective of someone hacking on libunwind, LLVM is an
> unrelated project (though one that shares coding conventions - an analogy
> would be two projects under the Apache umbrella) and there is absolutely no
> reason to insist that libunwind developers should clone a massive unrelated
> project to work on the code that they want to work on.

There is another philosophical perspective: encouraging communities to get
closer together. You talk about "libunwind developers", and there are
"lldb developers" as well; I would rather get closer to "we're working on the
same project", with shared practices and goals. And ultimately, to come
back to your software engineering practices, encouraging code motion and
code reuse between sub-projects.

> All of this applies to libc++ and libc++abi as well.

Ultimately I don't know about libunwind, and if it has to live separately
it is not a big deal. The others (libc++ and libc++abi for instance) are
more tied to the rest of the project though.
We duplicate the demangler from libc++abi in llvm for instance, and this is
quite an important software engineering issue to me.

> I'd expect any CI system to be able to cache this.
> Also, if your issue is archiving a lot of build artifacts, the constant cost of the checkout isn't gonna matter that much.
> Finally, the read-only individual repo can still be used by CI, which addresses this entirely.

If we want to pull in new libunwind fixes from upstream, we’ll also pull in irrelevant LLVM, clang, lldb, lld, and so on changes. This translates to extra bandwidth and storage requirements for *every* copy of the libunwind repo that we need.

If we follow the monorepo approach downstream and merge these independent repos, then we add extra merges for everyone downstream because people committing improvements to our LLVM and clang trees will require rebase pulls for anyone working on libc++ or libunwind, even though the changes were to a component that they’re not needing to build, let alone modify.

> There is another philosophical perspective: encouraging communities to get closer together. You talk about "libunwind developers", and there are "lldb developers" as well; I would rather get closer to "we're working on the same project", with shared practices and goals. And ultimately, to come back to your software engineering practices, encouraging code motion and code reuse between sub-projects.

I disagree, as someone who wears hats as a libunwind, libc++, clang and LLVM developer: I am no more engaged between the different groups by having the repos combined, but I am inconvenienced by having to carry around clones of unrelated code when I am working on one component and by having to rebase my libunwind repo because someone committed to clang.

Combining the clang and LLVM repos is a necessary evil. If we could have clean layering and well-defined APIs for the LLVM APIs needed for clang, then I would be opposed to this as well, but unfortunately this has too high an engineering cost and so we need to be able to perform atomic commits of LLVM and LLVM-using projects (this, unfortunately, means that we often don’t see the cost that this imposes on developers of other front ends). In contrast, if we need to perform an atomic commit between libc++ and clang or libunwind and clang then this tells us that we have a bug: a new version of clang may introduce a feature that relies on a new libc++ or libunwind, but a new libunwind or libc++ should always work with an old clang (or an old gcc, or any other compiler that targets it).

>> All of this applies to libc++ and libc++abi as well.
>
> Ultimately I don't know about libunwind, and if it has to live separately it is not a big deal. The others (libc++ and libc++abi for instance) are more tied to the rest of the project though.
> We duplicate the demangler from libc++abi in llvm for instance, and this is quite an important software engineering issue to me.

The requirements for a libc++abi demangler and a generic LLVM one are very different. For libc++abi, the requirements are:

- Must be small (the binary size of libc++abi is very important)

- Must be tolerant of out-of-memory conditions (it is used for generating error messages when an out-of-memory exception is thrown)

- Must use malloc() / realloc() for providing the demangled string (a requirement of the Itanium ABI public APIs)

In contrast, the demangler for the rest of LLVM:

- Must be flexible (e.g. lldb wants to be able to get the base name of a demangled function, so that it can insert breakpoints on all overloads)

- Must be fast (e.g. lldb wants to demangle every symbol in a library in a UI-critical path)

- Must provide structured information about the demangled symbol, not just a string as output.

- Must integrate with other memory allocation mechanisms (e.g. support std::allocator)

Copying the demangler was a quick way of getting something to work portably, but it wasn’t a good solution given the different requirements (the libc++abi demangler doesn’t do a good job of meeting either set of requirements), so this is a very bad justification for merging the repos.

David

>> I'd expect any CI system to be able to cache this.
>> Also, if your issue is archiving a lot of build artifacts, the constant
>> cost of the checkout isn't gonna matter that much.
>> Finally, the read-only individual repo can still be used by CI, which
>> addresses this entirely.
>
> If we want to pull in new libunwind fixes from upstream, we’ll also pull
> in irrelevant LLVM, clang, lldb, lld, and so on changes. This translates
> to extra bandwidth and storage requirements for *every* copy of the
> libunwind repo that we need.

I'm not sure if you really read the last sentence of what I wrote, or if
you followed the previous discussions on the plan here?
At this point I believe that this concern is non-existent per the read-only
individual repo.

> If we follow the monorepo approach downstream and merge these independent
> repos, then we add extra merges for everyone downstream because people
> committing improvements to our LLVM and clang trees will require rebase
> pulls for anyone working on libc++ or libunwind, even though the changes
> were to a component that they’re not needing to build, let alone modify.

Every downstream has its own choice, I'm not sure what's the point here? If
it is not relevant to you, then don't do it...

>> There is another philosophical perspective: encouraging communities to
>> get closer together. You talk about "libunwind developers", and there
>> are "lldb developers" as well; I would rather get closer to "we're working on
>> the same project", with shared practices and goals. And ultimately, to come
>> back to your software engineering practices, encouraging code motion and
>> code reuse between sub-projects.
>
> I disagree, [...].

We can leave it there :)
There have been extensive discussions, a BoF, and documentation; please
refer to these first (granted, we haven't really talked about libunwind,
but I'm not sure many people will be strongly opposed to libunwind having
its separate life).

>>> All of this applies to libc++ and libc++abi as well.
>>
>> Ultimately I don't know about libunwind, and if it has to live
>> separately it is not a big deal. The others (libc++ and libc++abi for
>> instance) are more tied to the rest of the project though.
>> We duplicate the demangler from libc++abi in llvm for instance, and this
>> is quite an important software engineering issue to me.
>
> The requirements for a libc++abi demangler and a generic LLVM one are very
> different.

They are so different that we ask anyone who wants to change something in
the LLVM demangler to make the change in compiler-rt first and then pull
the patch...
Maintaining two demanglers (or more) is silly IMO (no-one is even
maintaining the existing one...).

> I'm not sure if you really read the last sentence of what I wrote, or if you followed the previous discussions on the plan here?
> At this point I believe that this concern is non-existent per the read-only individual repo.

The read-only repo is only useful if you don’t intend to contribute stuff back upstream. There is no convenient workflow for cloning libunwind / libc++ / libwhatever, hacking on it, and sending pull requests.

> We can leave it there :)
> There have been extensive discussions, a BoF, and documentation; please refer to these first (granted, we haven't really talked about libunwind, but I'm not sure many people will be strongly opposed to libunwind having its separate life).

There have been multiple discussions, and the conclusion from all that I have participated in was that projects that are tightly version locked to LLVM should be in the monorepo, everything else should be separate. Apparently there is now a plan underway to not do this and to make life harder for people who work on the projects that are not version locked to LLVM.

I am simply repeating the objections that numerous people have made in each of these discussions.

David

>> I'm not sure if you really read the last sentence of what I wrote, or if
>> you followed the previous discussions on the plan here?
>> At this point I believe that this concern is non-existent per the
>> read-only individual repo.
>
> The read-only repo is only useful if you don’t intend to contribute stuff
> back upstream.

Your point was about CI...
(unless you're working on some CI that would fix bugs and send PR?)

> There is no convenient workflow for cloning libunwind / libc++ /
> libwhatever, hacking on it, and sending pull requests.

We considered git-svn for this though.

>> We can leave it there :)
>> There have been extensive discussions, a BoF, and documentation; please
>> refer to these first (granted, we haven't really talked about libunwind,
>> but I'm not sure many people will be strongly opposed to libunwind having
>> its separate life).
>
> There have been multiple discussions, and the conclusion from all that I
> have participated in was that projects that are tightly version locked to
> LLVM should be in the monorepo, everything else should be separate.
> Apparently there is now a plan underway to not do this and to make life
> harder for people who work on the projects that are not version locked to
> LLVM.

We have a different understanding.

I feel like I need to take a minute here to voice my supreme frustration with the way this discussion has gone and this small sentence captures it entirely.

From beginning to end these git migration conversations have involved a whole lot of people talking past each other and a lot of assumptions that are not shared. Many of us are very much not on the same page. The only thing that we had a significant consensus on was that we’d like to move to GitHub. Other than that we have more disagreement than agreement.

We do not have consensus on an all-in-one monorepo, and any notion that we do is ignoring the significant dissent. There was less disagreement with a mono-repo that had only tightly coupled projects, but that itself is hard to nail down and define, and there are still many people (myself included) who prefer the multi-repo solution.

Mehdi, I don’t know if it is your intent, but in many places in this thread you sound as if this decision has been made and the community is fully supporting your decision. Please don’t do that. It would be nice if as a community we considered the concerns of our members instead of offhand dismissing them.

I think we should spend some time discussing and understanding the needs of our corporate contributors and the needs of the other open source projects that contribute to, use, and distribute LLVM. I believe that disregarding the concerns of communities like the BSD and Linux communities would be a severe detriment to the LLVM project.

-Chris

Hi Chris,

Personally, I don’t think it is even possible to please everyone. While I don’t want to ignore the dissent, I also don’t want to ignore the fact that the dissent has been a minority, and I believe there has been more dissent against not adopting a monorepo than against adopting one.

Somewhere along the line, someone has to pull the trigger and make a decision. There are people who are just as inconvenienced today (which, btw, has been a lingering inconvenience for many years in some cases) by not having a mono-repo as others would be by having a mono-repo.

Discussions are only meaningful insofar as they contribute something new to the playing field. We have had so many discussions, surveys, talks, email threads, experiments, etc. that at this point I honestly don’t know how else to meaningfully contribute something new.

Thank you for doing this, Takumi. Many of us who choose to use your
unofficial monorepo find it invaluable, and adding these additional
repos will be a big help.

-Justin