Git Move: GitHub+modules proposal

Thank you very much for driving all of this. I just have one quick question:

Will existing clones of the LLVM git mirror and/or of llvm-mirror on GitHub continue to work if we simply switch the remote in the config?

David

How would you coordinate dependent updates to the sub-modules? For
example, in the case where someone makes a change to the LLVM sub-module
that requires changes to the Clang sub-module? Would there be some way
for a developer to push both sets of updates as an atomic update to the
umbrella project? It probably doesn't matter often, as long as the
updates to both sub-modules are pushed close together in time.

Tom.

(This is kinda a sidenote, because it doesn't actually change the
problem-space at all, but.... :)

I really disagree that it'd cause big problems to merge them all.
Especially when using git, which makes it a lot easier to keep a local
shared copy of the repository, so you don't need to re-download the whole
world every time you want a clean checkout. The entire set of llvm code,
all put together, is really just not that big in the end. But I do find it
annoying to have the many different repositories to track, and I don't
really see the value of having as many as we do.

However, even without big problems, it does make some sense to keep the
C/C++ language separate from the (mostly-)language-independent llvm
backend. There are a multitude of other frontends which use LLVM too: go,
swift, rust, etc. Would we really want to pull in all of those into a
single repo, as well, if they happened to get contributed to the llvm
project organization? Probably not.

So, while this wouldn't affect the need for a "llvm-project" repository, it
might be nice to consider merging some of the other ones together....

E.g.:
llvm: Core language-independent functionality: LLVM, assembler, and linker
tools. (merge in lld, and maybe compiler-rt, to the llvm repository).
clang: C/C++ frontend and related libraries (merge in clang-tools-extra,
libcxx, and libcxxabi into the clang repository).

> I think that trying to create an ordering/rev number between independent git
> repositories is fundamentally unreliable.
>
> If we want to keep llvm and clang in lock step we should probably just have
> them in the same repository like
> https://github.com/llvm-project/llvm-project.

That is similar to the proposal we have, except that llvm-project
will have sub-modules.

Having all of them in the same physical repository is a big problem
for those that only use a small subset of the components, which is the
vast majority of users, most of the time (all buildbots, Jenkins,
local development, downstream users, even releases don't clone all
repos).

> (This is kinda a sidenote, because it doesn't actually change the
> problem-space at all, but.... :)
>
> I really disagree that it'd cause big problems to merge them all.
> Especially when using git, which makes it a lot easier to keep a local
> shared copy of the repository, so you don't need to re-download the whole
> world every time you want a clean checkout. The entire set of llvm code,
> all put together, is really just not that big in the end. But I do find it
> annoying to have the many different repositories to track, and I don't
> really see the value of having as many as we do.

This is anecdotal, but using llvm-project on a ~daily basis, I can say that
the place where the larger repo is noticeable is the increased size of the
checkout; this affects the time of `git status` and many other frequently
used commands. It isn't terrible though (even on Windows; at least with an
SSD; I haven't tried an HDD).

> However, even without big problems, it does make some sense to keep the
> C/C++ language separate from the (mostly-)language-independent llvm
> backend. There are a multitude of other frontends which use LLVM too: go,
> swift, rust, etc. Would we really want to pull in all of those into a
> single repo, as well, if they happened to get contributed to the llvm
> project organization? Probably not.

Clang is special in that we have the expectation that developers need to
update clang if their patch to LLVM breaks it. (I assume this is largely
due to its role in self-hosting). It is unlikely that any other frontend
will ever get that special treatment since it does entail a high
maintenance burden. So I don't see a strong reason to split out clang just
because it is a "frontend".

Roughly speaking, I would prefer a repo division (if any) to be along the
lines of "core toolchain" (clang, llvm, lld, compiler-rt) and "extra stuff
not strictly required".

Just my 2c

-- Sean Silva

I hope so. There isn't anything (modulo mistakes) stopping us from
having a clean migration.

We'll have to re-organise the GitHub mirrors, though, as Takumi has
mostly driven them from his own account, and they'll have to be owned
by "The LLVM Project" somehow.

Mehdi suggested that we flip the direction of the mirror (so it goes
from GitHub to LLVM Git to SVN) as soon as we mark them read-only, and
then open read-write access only on GitHub.

We don't want to use GitHub's "pull requests" for every patch, so
we'll have to grant push rights to everyone who is a committer today.

Meaning, you can just change the remote to the official GitHub repo
and get rid of Git-SVN.
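
For anyone scripting the switch across several local clones, here is a minimal sketch of what "change the remote" amounts to. It only wraps `git remote set-url`; the URL below is a placeholder, since the final official repository location is not named in this thread, and the helper names are made up for illustration. Whether an existing clone keeps working afterwards depends on the migration preserving the commit hashes of the current mirror.

```python
import subprocess
from typing import List

# Placeholder only: the thread does not say what the official URL will be.
PLACEHOLDER_URL = "https://github.com/example-org/llvm.git"


def set_url_command(remote: str, new_url: str) -> List[str]:
    """Build the git command that points an existing remote at a new URL."""
    return ["git", "remote", "set-url", remote, new_url]


def switch_remote(repo_dir: str, remote: str, new_url: str) -> None:
    """Run the command inside an existing clone (requires git on PATH)."""
    subprocess.run(set_url_command(remote, new_url), cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is modified.
    print(" ".join(set_url_command("origin", PLACEHOLDER_URL)))
```

Calling `switch_remote("/path/to/clone", "origin", url)` would apply the change; a `git fetch` afterwards shows whether the histories line up.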

cheers,
--renato

We won't. Not now. Maybe later.

Right now, doing that means changing how we work, and we're trying to
make as few changes at a time as possible.

But this is the most requested feature and I'm sure we'll find
something that will work as soon as everything is stable.

cheers,
--renato

The problem comes when different people consider "core" different projects. :)

We're always reviewing the projects and we do split them when people
agree it's needed.

Examples are libunwind coming out of compiler-rt, and the recent
discussion about doing the same with the sanitizers and others. This is
not just about preference; it's about cross-dependencies, and about the
only sane way we can cross-build for multiple targets.

cheers,
--renato

If I understand the proposal and the status quo correctly, the migration to Git won't make cross-repo updates any worse than they are currently. I think that avoiding regressions is an excellent goal for the initial migration.

> Roughly speaking, I would prefer a repo division (if any) to be along the
> lines of "core toolchain" (clang, llvm, lld, compiler-rt) and "extra stuff
> not strictly required".

> The problem comes when different people consider "core" different
> projects. :)

Sure. But selfhost (incl. stuff like selfhost w/ sanitizers) is a fairly
important special case we may be able to agree on. (and I say this as
somebody that largely builds cross-compilers (targeting PS4))

-- Sean Silva

In that case, RT wouldn't "have" to be core. We use GCC rt-libs on GNU
systems, even for self-hosting.

Even if we move to full RT builtins (which I really want), we'd have
to get rid of everything else in that repo to move it to core.

cheers,
--renato

Yup.

--renato

> Sure. But selfhost (incl. stuff like selfhost w/ sanitizers) is a fairly
> important special case we may be able to agree on. (and I say this as
> somebody that largely builds cross-compilers (targeting PS4))

> In that case, RT wouldn't "have" to be core. We use GCC rt-libs on GNU
> systems, even for self-hosting.

I think there is a compelling argument for selfhosting with sanitizers (the
sanitizer bootstrap bot has saved me more times than I care to admit).

-- Sean Silva

> Sure. But selfhost (incl. stuff like selfhost w/ sanitizers) is a fairly
> important special case we may be able to agree on. (and I say this as
> somebody that largely builds cross-compilers (targeting PS4))

> In that case, RT wouldn't "have" to be core. We use GCC rt-libs on GNU
> systems, even for self-hosting.

> I think there is a compelling argument for selfhosting with sanitizers
> (the sanitizer bootstrap bot has saved me more times than I care to admit).

Also libprofile for PGO selfhost.

-- Sean Silva

That makes it fragile, and that’s why I disagree with your “90% done” assessment.
What if the service behind the hook is down for a few days? Who will maintain it?

> That makes it fragile, and that’s why I disagree with your “90% done” assessment.
> What if the service behind the hook is down for a few days?

In the long-term view, a pretty trivial catch-up script ought to be
able to synthesize a sane history after any amount of down-time.
People could even run it locally for their bisecting needs if it was
that important to them.

In the short term, I don't think it's a critical enough service to
worry about, frankly. What we already have is hopelessly fragile:
right now when LLVM's server plays up it takes out absolutely
everything, in the proposed world it would take out this bisecting
convenience feature. Seems like a strict improvement to me.

> Who will maintain it?

I'm not the best scripter and I'd be happy to cede to someone else,
but I'd be willing if it meant we could make progress.

Tim.

> That makes it fragile, and that’s why I disagree with your “90% done” assessment.
> What if the service behind the hook is down for a few days?

> In the long-term view, a pretty trivial catch-up script ought to be
> able to synthesize a sane history after any amount of down-time.
> People could even run it locally for their bisecting needs if it was
> that important to them.

Yup. If the script is stable (as in sort stable), anyone running it
locally will get the same results as upstream and each other.
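
To make the "stable" point concrete, here is a minimal sketch of such a catch-up step. It is not the actual hook or script from the proposal. It assumes each repository's history is available as a list in commit order, and it invents a `Commit` record and a `number_commits` helper for illustration. The merge never reorders commits within one repository, and ties between repositories are broken by repository name, so the numbering depends only on the input histories.

```python
import heapq
from typing import Dict, List, NamedTuple, Tuple


class Commit(NamedTuple):
    repo: str       # hypothetical field: sub-module name, e.g. "llvm"
    sha: str        # commit hash within that repository
    timestamp: int  # commit time, seconds since the epoch


def number_commits(
    histories: Dict[str, List[Commit]], start: int = 1
) -> List[Tuple[int, Commit]]:
    """Interleave per-repo histories and assign sequential IDs.

    Repositories are visited in sorted-name order and the merge key ends
    with the repo name, so two people running this on the same histories
    get identical numbering.
    """
    streams = [histories[name] for name in sorted(histories)]
    merged = heapq.merge(*streams, key=lambda c: (c.timestamp, c.repo))
    return list(enumerate(merged, start))


if __name__ == "__main__":
    llvm = [Commit("llvm", "a1", 100), Commit("llvm", "a2", 300)]
    clang = [Commit("clang", "b1", 200), Commit("clang", "b2", 300)]
    for rev, commit in number_commits({"llvm": llvm, "clang": clang}):
        print(rev, commit.repo, commit.sha)
```

Because the function is a pure computation over the full histories, it can be rerun from scratch after any amount of downtime, or locally by anyone who needs the numbers before upstream has populated them.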

> In the short term, I don't think it's a critical enough service to
> worry about, frankly. What we already have is hopelessly fragile:
> right now when LLVM's server plays up it takes out absolutely
> everything, in the proposed world it would take out this bisecting
> convenience feature. Seems like a strict improvement to me.

I think it's even less important than that. Bisecting will work
*better* when using submodules than it does using SVN (because git
bisect is more powerful, allows me to track all modules' history, and
will rid me of a complicated downstream set of SVN-bisect scripts).

The only thing we *have* to have a sequential number for is
releases. Even that can be run manually.

We agreed to have sequential numbering from the start to allow
infrastructure to migrate slowly to a Git model. That can also have an
extra step to run the script if IDs are not populated yet.

> Who will maintain it?

Whoever maintains the current infrastructure, which is currently the
Foundation. All scripts will be upstream.

So far, they (Tanya, Anton, Galina) have been very responsive to
every downtime and problem I found. I have no doubt that this will
continue to be the trend.

cheers,
--renato

> That makes it fragile, and that’s why I disagree with your “90% done” assessment.
> What if the service behind the hook is down for a few days?

> In the long-term view, a pretty trivial catch-up script ought to be
> able to synthesize a sane history after any amount of down-time.
> People could even run it locally for their bisecting needs if it was
> that important to them.

> Yup. If the script is stable (as in sort stable), anyone running it
> locally will get the same results as upstream and each other.

> In the short term, I don't think it's a critical enough service to
> worry about, frankly. What we already have is hopelessly fragile:
> right now when LLVM's server plays up it takes out absolutely
> everything, in the proposed world it would take out this bisecting
> convenience feature. Seems like a strict improvement to me.

> I think it's even less important than that. Bisecting will work
> *better* when using submodules than it does using SVN (because git
> bisect is more powerful, allows me to track all modules' history, and
> will rid me of a complicated downstream set of SVN-bisect scripts).

> The only thing we *have* to have a sequential number for is
> releases. Even that can be run manually.

LNT and ‘llvmlab bisect’ also currently rely heavily on having sequential numbers as commit identifiers.

Fred

One of the steps of the migration is to re-write the infrastructure to
use Git's history instead of sequential numbers.

LLVM Lab bisect is probably easier than LNT, but as I said, this is
*only* a problem when the service goes down, which shouldn't be common
at all.

cheers,
--renato