GitHub anyone?

Folks,

There has been some discussion on IRC about SVN hosting and the perils
of doing it ourselves. The consensus on the current discussion was
that moving to a Git-only solution would have some disadvantages, but
many advantages. Furthermore, not hosting our own repos would save us
a lot of headaches, admin costs and timed-out connections.

TL;DR: GitHub + git submodules [1] could replace all the functionality
we have currently with SVN.

(Also GitLab, BitBucket, etc.)

Here are some of the arguments made on IRC...

1. Because of SVN, we can't rewrite history. If we use some GitHub
repository settings [2], we could have the same effect.

2. Due to SVN, we have a mandatory time sequence, so commits go first
in LLVM, then Clang (for example), and buildbots don't get lost. If we
use submodules [1], we can have a similar relationship, but in a more
explicit way, and the problem could be solved elegantly.

3. Some people can still only use SVN. For them, GitHub has an SVN
interface [3] to the repositories.

4. We currently host our own SVN/Git, ViewVC and Klaus, Phabricator,
etc. Not only does this incur additional admin costs, but the software
also gets outdated and locally modified, and it needs to be backed up,
etc. GitHub gives us all that for free.

5. We can still use Bugzilla (and disable GitHub's own issue tracker),
but we can also use GitHub's system to manage releases (it's actually
quite good for that).

6. GitHub has automated testing of pull requests, meaning we can have
pre-commit tests enabled on a set of fast bots, triggered by GitHub's
own validation hooks. Even though that wouldn't cover everything,
having a few pre-commit bots would considerably reduce the need to
revert patches [citation needed].

7. With git submodules, we'd probably want to follow the same style we
have today (llvm-projects/<prj>) instead of modelling how they look in
tree (llvm/tools/clang still as a symlink).

8. Once we're Git-only, we can shop around *much* more easily. By
using SVN, we're basically forced to self-host or choose SourceForge.
Using just Git, we can choose GitLab, BitBucket and many others, if
GitHub is not appealing enough. Essentially, it doesn't matter where
we are: the tools there are good and largely replaceable [citation
needed].
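What SVN gives us for free in item 1 is that the server never accepts a push that rewrites history; in git terms, it only accepts fast-forward pushes. As an illustration of that rule (not of how GitHub implements its settings), the check is that the old branch tip must be an ancestor of the new tip. The toy parent map below is a hypothetical stand-in for real history.

```python
from __future__ import annotations

from collections import deque


def is_ancestor(parents: dict[str, list[str]], ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` by following parent links."""
    seen: set[str] = set()
    queue = deque([descendant])
    while queue:
        commit = queue.popleft()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, []))
    return False


def is_fast_forward(parents: dict[str, list[str]], old_tip: str | None, new_tip: str) -> bool:
    """A push only appends history if the branch is new or the old tip is an ancestor of the new tip."""
    return old_tip is None or is_ancestor(parents, old_tip, new_tip)
```

In a real server-side hook, `git merge-base --is-ancestor <old> <new>` answers the same question from the repository itself.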

What do people think? Are there any issues we haven't covered? How
would this disrupt downstream users? Would it be a temporary
disruption with long-lasting benefits? Or would it just break
everything for you?

cheers,
--renato

[1] Git - Submodules
[2] Defining the mergeability of pull requests - GitHub Docs
[3] Support for Subversion clients - GitHub Docs

Peanut-gallery comments from a git hater (we use git for all source
revision control):

> 1. Due to SVN, we can't re-write history. If we use some GitHub
> properties [2], we could have the same effect.

Are you referring to maintaining a linear commit history going
forward, or to something else specifically?

> 2. Due to SVN, we have a mandatory time sequence, so commits go first
> in LLVM, then Clang (for example), and buildbots don't get lost. If we
> use submodules [1], we can have a similar relationship, but in a more
> explicit way, and the problem could be solved elegantly.

Submodules don't work exactly like SVN. AFAIK they are pinned to a
specific commit?
So you get questions like this

So if this path is chosen, please make sure the minimum supported git
version is clarified.
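That is how submodules work: the superproject records one exact commit for each submodule, and `git submodule status` prints that commit with a one-character flag (`-` not initialized, `+` checked-out commit differs from the recorded one, `U` merge conflicts, a space otherwise). A minimal parser sketch, assuming the documented `<flag><sha1> <path> (<describe>)` line format; the sample output and paths used with it are made up for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# One line of `git submodule status`: <flag><sha1> <path> [(<describe>)]
STATUS_LINE = re.compile(
    r"^(?P<flag>[ +\-U])(?P<sha>[0-9a-f]{40}) (?P<path>.+?)(?: \((?P<describe>[^)]*)\))?$"
)

FLAG_MEANING = {
    " ": "matches recorded commit",
    "+": "checked-out commit differs from recorded commit",
    "-": "not initialized",
    "U": "merge conflicts",
}


@dataclass
class SubmoduleStatus:
    path: str
    sha: str  # the exact commit the superproject pins this submodule to
    state: str
    describe: str | None


def parse_submodule_status(text: str) -> list[SubmoduleStatus]:
    """Parse `git submodule status` output into one record per submodule."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue
        m = STATUS_LINE.match(line)
        if m is None:
            raise ValueError(f"unexpected line: {line!r}")
        result.append(SubmoduleStatus(
            path=m["path"],
            sha=m["sha"],
            state=FLAG_MEANING[m["flag"]],
            describe=m["describe"],
        ))
    return result
```

The input would typically come from `subprocess.run(["git", "submodule", "status"], capture_output=True, text=True, check=True).stdout`.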

> 5. We can still use Bugzilla (and lock GitHub's own bug system), but
> we can also use GitHub's system to manage releases (it's actually
> quite good for that).

Bugzilla sucks, but GitHub's issue tracker sucks worse (IMNSHO). I'm
not a stakeholder and have no vote, but if it ain't broke, don't fix
it... or try JIRA (they give free licenses to open source projects,
and it rocks).

> What do people think? Any issue not covered that we should? How would
> that disrupt downstream users? Would it be a temporary disruption, but
> with long lasting benefits? Or will it just break everything for you?

There are already git mirrors of all this stuff on GitHub. *If* you're
going to go down this path, I think the first step is updating the
docs to ask people to use those as the preferred solution. Based on
the feedback, you can wean people off SVN instead of doing a hard cut.

This really falls into two categories: read-only and write.

For read-only, the migration should be straightforward, but write gets
a bit more tricky, as you'd likely need to rely on git-svn (which I
suspect some people are already using).

Even though git 1.7 or so deals with submodules better, I personally
don't like them at all. I'd rather have a convenience script or
something which pulls and clones the sources. Not everyone needs all
the sources, and then there's the question of all the subprojects and
the build, etc. Not changing the workflow here would be the sanest
option.

So: clone llvm; cd tools/; clone ...
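The "clone llvm, then clone the subprojects into place" workflow is easy to script. A minimal sketch, assuming GitHub mirrors under a hypothetical `MIRROR_BASE` and the classic in-tree layout (clang under `llvm/tools`, compiler-rt and libcxx under `llvm/projects`). It only builds the command lines, so they can be reviewed before running each one.

```python
from __future__ import annotations

# Hypothetical mirror location; substitute whatever the project settles on.
MIRROR_BASE = "https://github.com/llvm-mirror"

# Subproject -> checkout path in the classic in-tree layout.
IN_TREE_PATHS = {
    "llvm": "llvm",
    "clang": "llvm/tools/clang",
    "compiler-rt": "llvm/projects/compiler-rt",
    "libcxx": "llvm/projects/libcxx",
}


def clone_commands(wanted: list[str], base: str = MIRROR_BASE) -> list[list[str]]:
    """Build `git clone` argument lists for llvm plus the requested subprojects."""
    unknown = set(wanted) - set(IN_TREE_PATHS)
    if unknown:
        raise ValueError(f"unknown subprojects: {sorted(unknown)}")
    # llvm is always needed, since every other path lives inside it; drop duplicates.
    selected = list(dict.fromkeys(["llvm"] + wanted))
    # Shallower paths first, so parent checkouts exist before children are cloned into them.
    selected.sort(key=lambda p: IN_TREE_PATHS[p].count("/"))
    return [["git", "clone", f"{base}/{p}.git", IN_TREE_PATHS[p]] for p in selected]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; people who don't need compiler-rt or libcxx simply don't ask for them.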

I'm in favour of the move. Git-svn just about works most of the time,
but I find it makes committing to release branches particularly
painful. It also randomly corrupts its database occasionally, just for
the giggles I assume.

Tim.

> What do people think? Any issue not covered that we should?

> I'm in favour of the move. Git-svn just about works most of the time,
> but I find it makes committing to release branches particularly
> painful. It also randomly corrupts its database occasionally, just for
> the giggles I assume.

This hit me over the weekend, it was quite annoying. +1 from me.

> There has been some discussion on IRC about SVN hosting and the perils
> of doing it ourselves. The consensus on the current discussion was
> that moving to a Git-only solution would have some disadvantages, but
> many advantages. Furthermore, not hosting our own repos would save us
> a lot of headaches, admin costs and timed out connections.

Personally, I’m hugely in favor of moving llvm’s source hosting to github at some point, despite the fact that I continue to dislike git as a tool and consider monotonically increasing version numbers to be hugely beneficial.

The killer feature to me is the community aspects of github, allowing people to get involved in the project more easily and make “drive by” contributions through the pull request model. Github also has a very scriptable interface, allowing integration of external bug trackers etc into the workflow (which is good, because its bug tracker is anemic).

> 4. We currently host our own SVN/Git, ViewVC and Klaus, Phabricator,
> etc. Not only this incurs in additional admin cost, but it also gets
> outdated, locally modified, and it needs to be backed up, etc. GitHub
> gives all that for us for free.

Yes, it would be great to get out of this business.

> 5. We can still use Bugzilla (and lock GitHub's own bug system), but
> we can also use GitHub's system to manage releases (it's actually
> quite good for that).

If we made this change, I think we should only change one thing at a time: change source hosting, but not Phabricator or the bug tracker. We could then discuss moving off Phabricator to the GitHub PR model, etc.

> 6. GitHub has automated testing of merge requests, meaning we can have
> pre-commit tests enabled on a set of fast bots, triggered by GitHub's
> own validation hooks.

This works pretty well. The major problem is with tests that are flaky.

-Chris
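One common way to live with flaky tests (an assumption about how the pre-commit bots might be configured, not a GitHub feature) is to rerun a failing test a couple of times and report it as flaky, rather than failing the pull request, if a rerun passes. A minimal sketch of that classification, where `run_test` is any callable returning True on success and `retries` is an illustrative default:

```python
from __future__ import annotations

from typing import Callable


def classify(run_test: Callable[[], bool], retries: int = 2) -> str:
    """Run a test, rerun it on failure, and label the outcome.

    Returns "pass" (passed first time), "flaky" (failed, then passed on a
    rerun) or "fail" (failed on every attempt).
    """
    if run_test():
        return "pass"
    for _ in range(retries):
        if run_test():
            return "flaky"
    return "fail"
```

Reporting "flaky" separately keeps real failures blocking while still collecting data on which tests need fixing.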

SVN sucks, and git with GitHub rocks. It is much easier for new people to do pull and merge than to mess with SVN.

+1 for git

Best regards,
Alexey Bataev

I’m in favor of both going to git as the source of truth, and then switching the hosting to github.

Echoing everyone else, this unlocks a lot of good stuff that I won’t repeat, and most of it can be handled independently from the VCS move.

The major blocker I see for the move is figuring out how we want to coordinate versions between the related LLVM projects. I hear terrible things about submodules, so I’d prefer a different sync mechanism, even if it is a bad reimplementation of repo, gclient, submodules, and all the other multi-repo sync tools.

> Folks,
>
> There has been some discussion on IRC about SVN hosting and the perils
> of doing it ourselves. The consensus on the current discussion was
> that moving to a Git-only solution would have some disadvantages, but
> many advantages. Furthermore, not hosting our own repos would save us
> a lot of headaches, admin costs and timed out connections.

Not everyone thinks git is a step forward. Please do not force people
to use a "git-only" solution.

> 2. Due to SVN, we have a mandatory time sequence, so commits go first
> in LLVM, then Clang (for example), and buildbots don't get lost. If we
> use submodules [1], we can have a similar relationship, but in a more
> explicit way, and the problem could be solved elegantly.

I actually consider the monotonically increasing revisions to be a
feature, but not sufficient to warrant a decision one way or the
other.

> 3. Some people still can only use SVN. For that, GitHub has an SVN
> interface [3] to the repositories.

Are we sure that GitHub's SVN integration works with common Windows
tools, like TortoiseSVN?

> What do people think? Any issue not covered that we should? How would
> that disrupt downstream users? Would it be a temporary disruption, but
> with long lasting benefits? Or will it just break everything for you?

I'm not opposed to moving to GitHub, provided its svn interface proves
to meet our needs. I am opposed to switching to a git-only solution,
but I'm unclear whether that's currently on the table or not.

~Aaron

Likewise, I’d definitely be in favor of doing so. It would be great to have the entire LLDB development community on GitHub instead of the current story.

Kate Stone k8stone@apple.com
 Xcode Low Level Tools

That's a good question. Can you try them out and report back?

cheers,
--renato

Getting a monotonically increasing revision number seems doable in git with some server-side scripting using git notes or named tags (it remains to be seen how to achieve it *reliably* with GitHub hosting).
However, the challenge is more about sharing this number across repositories (i.e. keeping clang and llvm in sync). I can imagine some tooling for that, but with GitHub hosting it may still be fragile.
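For a single repository whose main branch only ever fast-forwards, one stateless way to derive such a number is to count the commits on the first-parent chain, which is what `git rev-list --count --first-parent <commit>` reports; a server-side hook could then publish it as a tag or note. A minimal sketch of the counting, with a toy parent map standing in for real history:

```python
from __future__ import annotations


def first_parent_number(parents: dict[str, list[str]], commit: str) -> int:
    """Length of the first-parent chain ending at `commit` (root commit = 1).

    On a branch that only ever fast-forwards, a given commit's number never
    changes and each new commit gets the previous number plus one, much like
    an SVN revision. Rewriting history would renumber commits.
    """
    count = 0
    current: str | None = commit
    while current is not None:
        count += 1
        chain = parents[current]
        current = chain[0] if chain else None
    return count
```

This only covers one repository; sharing a number across llvm and clang still needs the extra layer discussed next.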

Ideally, I'd prefer the cross-repository aspect to be handled by an extra layer, in a way similar to what is described in: https://gerrit-review.googlesource.com/Documentation/user-submodules.htm (conceptually similar to Android's XML manifest files).
It would be easy to have tooling/scripts for LLVM that would say "check out llvm+clang+compiler-rt+libcxx+clang-extra here", "update all LLVM subprojects under this root", or "check out this specific revision for all of these" (with a monotonic number for the revision).

(+1 to all the rest of what you wrote)

At Linaro, we already have a set of scripts that do that. We're now
moving to git worktree, and I think it's going to simplify our work
considerably. But honestly, I'd rather not force anyone to use any set
of scripts, and let people work directly with git, so I'd be more in
favour of having a server-side solution, if at all possible.

cheers,
--renato

>> Folks,
>>
>> There has been some discussion on IRC about SVN hosting and the perils
>> of doing it ourselves. The consensus on the current discussion was
>> that moving to a Git-only solution would have some disadvantages, but
>> many advantages. Furthermore, not hosting our own repos would save us
>> a lot of headaches, admin costs and timed out connections.
>
> Not everyone thinks git is a step forward. Please do not force people
> to use a "git-only" solution.

Amen.

>> 2. Due to SVN, we have a mandatory time sequence, so commits go first
>> in LLVM, then Clang (for example), and buildbots don't get lost. If we
>> use submodules [1], we can have a similar relationship, but in a more
>> explicit way, and the problem could be solved elegantly.
>
> I actually consider the monotonically increasing revisions to be a
> feature, but not sufficient to warrant a decision one way or the
> other.

Has the situation with git submodules and bisect improved at all, or is
bisecting clang+llvm going to be a manual mess?

Joerg

Strong +1 to move to an external hosted git sooner rather than later!

  1. I personally had very good experiences with git submodules. They are certainly harder to get used to, as you have to learn a bunch of extra magic on top of the already magical git: i.e. “git clone --recurse-submodules”, then learn how to have your submodules point to different commits locally sometimes, etc.
     I have had very good experience in another project that used to do the llvm/clang-style “just check out those two projects at the same date”, and I found submodules a more stable, robust and technically superior solution, at the cost of a steeper learning curve for new contributors.

     So in this context I would recommend changing one thing at a time: only switch svn->git in step 1, and leave the switch to submodules as a second step (or not do it at all if that is the community consensus).

  2. As far as the “increasing revision” numbers go, in my opinion this is about:
     • We really should stay with a linear history and not introduce merge commits.
     • As long as we do not move to submodules, we need a solution to enforce "CommitDate"s (or also "AuthorDate"s) to be the time a commit was pushed to the server, so that our current workflow of checking out llvm/clang at the same time still works.
     I believe those two points to be solvable with some server-side scripting. With those two properties in place, actual increasing numeric revision numbers don't bring that much value to the table, and I would assume we can go without them.

- Matthias
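Both properties could plausibly be checked in a server-side pre-receive hook. The sketch below assumes the hook has already parsed the pushed commits, oldest first, into simple records; the `Commit` type, the `max_skew` tolerance and the "close to push time" rule are illustrative choices, not an existing tool. Since a server cannot rewrite the dates itself, it rejects the push instead, and the pusher can refresh the committer dates (e.g. by rebasing) and try again.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    parents: list[str]
    committer_time: int  # Unix timestamp


def check_push(new_commits: list[Commit], tip_time: int, push_time: int,
               max_skew: int = 600) -> list[str]:
    """Return a list of problems with a push; an empty list means accept it.

    `new_commits` must be oldest first; `tip_time` is the committer time of
    the current branch tip.
    """
    problems = []
    previous = tip_time
    for c in new_commits:
        if len(c.parents) != 1:
            problems.append(f"{c.sha}: merge or root commit; history must stay linear")
        if c.committer_time < previous:
            problems.append(f"{c.sha}: committer date goes backwards")
        if abs(push_time - c.committer_time) > max_skew:
            problems.append(f"{c.sha}: committer date is not close to push time")
        previous = max(previous, c.committer_time)
    return problems
```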

I found bisecting with submodules (in another project) far superior to the manual mess I have to do in clang+llvm today.

- Matthias
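Presumably the difference is that each superproject commit pins one commit of every subproject, so the combined history is a single linear sequence and bisecting it is an ordinary binary search, whereas with independent repositories you first have to work out which commits belong together. A minimal sketch of the search itself, where the revision list stands in for superproject commits and the hypothetical `is_bad` callable for "build and run the test":

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revisions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary search for the first bad revision.

    Assumes revisions[0] is good, revisions[-1] is bad, and that once a
    revision is bad every later one is bad too (the same assumption
    `git bisect` makes).
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] good, revisions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]
```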

for much, it's hard to feel comfortable with my level of testing. I'm
sure for read-only access, this will be sufficient. For read/write
access, I am less confident, so if others have had more experience
with this on Windows, I would appreciate hearing about it.

~Aaron

To be more exact here: I usually do not check out llvm svn at a higher level, because that forces me back to svn (which, last time I used it, did not have built-in support for bisection; not sure if that has changed recently). So while svn gives me a consistent state, I spend my time manually calculating the next commit to check out. What I do instead is use git for bisecting, with some (brittle) scripts that sync multiple git repositories to commits from the same time!

- Matthias
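The core of such a sync script is small, which is also why it is brittle: it is only correct if commit dates reflect push order, which is exactly the CommitDate point made earlier in this thread. A sketch, assuming each repository's history has already been read into oldest-first `(timestamp, sha)` pairs, for example from `git log --first-parent --reverse --format='%ct %H'`:

```python
from __future__ import annotations

import bisect


def commits_at(histories: dict[str, list[tuple[int, str]]], when: int) -> dict[str, str]:
    """For each repository, pick the last commit whose timestamp is <= `when`.

    Each history must be sorted by timestamp, oldest first.
    """
    result: dict[str, str] = {}
    for repo, history in histories.items():
        times = [t for t, _ in history]
        i = bisect.bisect_right(times, when)
        if i == 0:
            raise ValueError(f"{repo} has no commit at or before {when}")
        result[repo] = history[i - 1][1]
    return result
```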

Apparently I wasn't very clear: llvm and clang (and the other projects) would simply be decoupled, individual git repositories. You would be able to check them out however you want and commit to them individually.
There would be an extra "integration repository" on top whose only job is to record that "r12345 is llvm:36c941c clang:eaf492b compiler-rt:6d77ea5". This repository should be managed transparently by some server-side integration.
The scripting I was referring to would just be a convenience that uses this extra layer of metadata (the "integration repository") to check out the individual repositories together at the right "rev-lock" revision.
This would not get in your way if you don't want to use it, but it provides the "single monotonically increasing revision number across multiple repositories" that is convenient for some people.

Makes sense?
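To make the integration repository concrete, here is a minimal in-memory sketch of the bookkeeping it would do. The `IntegrationLog` class and its methods are hypothetical; a real version would be a server-side hook writing each snapshot as a commit in the integration repository. `checkout_commands` additionally assumes each project is checked out in a directory named after it.

```python
from __future__ import annotations


class IntegrationLog:
    """Append-only log mapping a global revision number to per-project commits.

    Every push to any project creates a new global revision holding the
    latest known commit of every project.
    """

    def __init__(self, first_rev: int = 1) -> None:
        self._next_rev = first_rev
        self._current: dict[str, str] = {}
        self._snapshots: dict[int, dict[str, str]] = {}

    def record_push(self, project: str, sha: str) -> int:
        """Record that `project` now points at `sha`; return the new global revision."""
        self._current[project] = sha
        rev = self._next_rev
        self._snapshots[rev] = dict(self._current)  # copy, so later pushes don't alter it
        self._next_rev += 1
        return rev

    def lookup(self, rev: int) -> dict[str, str]:
        """Return the project -> commit mapping for global revision `rev`."""
        return dict(self._snapshots[rev])

    def describe(self, rev: int) -> str:
        parts = " ".join(f"{p}:{sha}" for p, sha in self._snapshots[rev].items())
        return f"r{rev} is {parts}"

    def checkout_commands(self, rev: int) -> list[list[str]]:
        """Commands to put every project at its commit for `rev`."""
        return [["git", "-C", p, "checkout", sha] for p, sha in self._snapshots[rev].items()]
```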

In previous months, I have studied many thousands of GitHub
repositories, and cloned many of them locally to compile and run them.

The following difficulties may come up during LLVM work:

If a directory contains a large number of files (approximately more
than a thousand), only some of these files are displayed and the
others cannot be viewed. You can check this in OS repositories.

During cloning, if there is a reference to another repository,
clone --recursive gives errors about a contained @ sign. In that case
it is necessary to enter the subdirectories and manually clone the
referenced submodule (or a script should do this). Instead of
--recursive, other commands may be used, but all of them have
advantages and disadvantages.

When "Download" is selected for a repository, submodules are not
downloaded into their respective subdirectories. It is necessary to
visit such directories manually, one by one, and download, expand, and
adjust their contents.

When a file is viewed in GitHub and you go back, GitHub jumps to the
top of the directory listing instead of keeping the page at the
current position. When there are a large number of files in a
directory, this makes it difficult to scroll down to the previous
position again and continue from there.

Only limited kinds and sizes of files are shown to the user. For many
kinds of files, the only way to view the content is to save the
repository locally. In my experience, in that case it is not possible
to download any single file from a repository directory to view it.

I consider git's revision numbers (commit hashes) a disastrous design:
a very long number conveying nothing but inconvenience. Therefore it
will not be possible to say "revert to _a_simple_number_" elegantly. I
do not know what could be written other than "revert to _a_cryptic_number_".
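In practice nobody has to type the full 40-character hash: git accepts any unambiguous prefix of at least four hex digits (usually displayed as seven), so "revert to 36c941c" works. The sketch below mimics that lookup over a hypothetical list of full hashes; it illustrates the rule and is not git's actual implementation.

```python
from __future__ import annotations


def resolve_prefix(prefix: str, known_shas: list[str], min_len: int = 4) -> str:
    """Expand an abbreviated commit hash to the single full hash it identifies."""
    prefix = prefix.lower()
    if len(prefix) < min_len:
        raise ValueError(f"prefix must be at least {min_len} characters")
    matches = [sha for sha in known_shas if sha.startswith(prefix)]
    if not matches:
        raise KeyError(f"no commit starts with {prefix}")
    if len(matches) > 1:
        raise ValueError(f"{prefix} is ambiguous: {len(matches)} commits match")
    return matches[0]
```

When a short prefix becomes ambiguous, git simply asks for a longer one, which is why tools print slightly longer abbreviations as a repository grows.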

Previously it was possible to search GitHub for repositories by
keyword. Now they have disabled that feature. It seems that now only
Internet searches may find a repository (in my experience, very few of
them are found), or it may be listed in their category lists, which
only contain the ones selected by GitHub.

For me, as a "visitor user" of GitHub repositories, the points above
are the most significant.
I do not have any experience as a "developer user" of GitHub.

Mehmet Erol Sanliturk