GitHub anyone?

My company uses submodules, for better or worse. It’s not a perfect solution, but it can work with a recent enough version of git and some tooling.

The magic command to update everything to the commit currently recorded for each submodule: git submodule update --init --recursive
To get the latest version you can do: git submodule update --init --recursive --remote
By default this will get the latest version of master, but a change in the .gitmodules file can make it point to release branches.
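
As a rough sketch of that .gitmodules change: the branch that `--remote` follows can be set with `git config -f .gitmodules`. The helper name `track_branch`, the submodule name `tools/clang` and the branch `release_38` are placeholders for illustration, not taken from a real setup.

```shell
# Hypothetical helper: record in .gitmodules which branch
# `git submodule update --remote` should follow for one submodule.
# $1 = submodule name as listed in .gitmodules, $2 = branch to track
track_branch() {
    git config -f .gitmodules "submodule.$1.branch" "$2"
}

# Example (placeholder names):
#   track_branch tools/clang release_38
#   git submodule update --init --recursive --remote
```

The edited .gitmodules still has to be committed so that everyone else picks up the same branch.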

If you want a stable version from days ago, you can have a bot update the submodules every day and push the result, then use that commit.
With a bot like this, manual submodule bumps should be rare, and people unfamiliar with git will soon forget about them.
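
A minimal sketch of what such a bot could run each day, assuming an otherwise clean checkout of the top-level repository and a remote named `origin`. The function name and commit message are made up for illustration.

```shell
# Hypothetical daily job, run from the top-level repository.
# Moves every submodule to the tip of its tracked branch and, if that
# left any difference against the index, commits and pushes the bump.
bump_submodules() {
    git submodule update --init --recursive --remote || return 1
    if git diff --quiet; then
        return 0            # nothing moved, nothing to commit
    fi
    git commit -q -a -m "Bump submodules to latest upstream" &&
        git push -q origin HEAD    # assumes a remote named "origin"
}
```

Each commit the bot pushes is then a known snapshot of all submodules for that day.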

For a linear history, you can have GitHub rebase the changes when merging instead of creating a merge commit. I recommend doing that to keep the history clean and to have fewer “fixup” or “wip” commits in it.

For the people who want to keep SVN, I’ve tried the compatibility layer from GitHub and it worked well enough for me in the past. But I would recommend writing a cheat sheet to help people migrate to Git long-term.
Recent versions of git are not as hard to use as the old ones, and it doesn’t have to be more complicated than SVN.

/Florent

For whatever it's worth, our projects define a `buildnum` git alias:

  alias.buildnum=!sh -c "git rev-list --all | wc -l"

So from the shell:

  $ git buildnum
  17475

This number increases monotonically per commit.

Our build scripts make this number available in various #define forms.

(We use a little extra scripting logic to also determine whether there
are currently any unmerged or uncommitted changes, and add an annotation
to the program version in that case, e.g. "9.3.17475 [unmerged]")

It's all stupidly simple, but seems to work well enough for us.
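
A minimal sketch of the annotation step described above, not the actual build script. It assumes that "unmerged or uncommitted" can be approximated by `git status --porcelain` printing anything; the function name is hypothetical and the `9.3` prefix is just the one from the example.

```shell
# Print "<prefix>.<commit count>", followed by " [unmerged]" when the
# work tree has uncommitted changes or untracked files.
# Assumption: a non-empty `git status --porcelain` means "not clean".
version_string() {
    prefix=$1
    count=$(git rev-list --count --all)
    if [ -n "$(git status --porcelain)" ]; then
        echo "$prefix.$count [unmerged]"
    else
        echo "$prefix.$count"
    fi
}

# Example:
#   version_string 9.3
```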

Regards,

Bill

svn-bisect is a trivial tool and should be part of every good svn
installation. While I never got around to scripting the part of "update all
subrepos to the same revision", it certainly doesn't involve any
additional checks. From what I can tell, git submodules don't even support
that easily. I might be wrong though.

Joerg

This actually sounds like a really good idea even if a full move to
git gets blocked for some reason. It seems like it could be a fairly
common requirement: I don't suppose you know of an existing script
that could do it? If not, I may take a stab.

Tim.

I don't know of a script that would update this automatically, but the Gerrit feature seems pretty close: https://gerrit-review.googlesource.com/Documentation/user-submodules.html ; and it is possible that some Android tooling exists around the repo tool: https://source.android.com/source/developing.html

I think it is fairly easy to set up tools that do that; it is harder to do it reliably (i.e. handle concurrent pushes to clang and llvm, for instance). And what if the server-side script fails the update because of a network issue? Maybe it is not very important and one commit won't be part of the "global numbering"? Alternatively, the push could be "failed" in such cases?

It does not work with branches though (we're not really planning to have branches I believe), but more importantly it won't handle cross-repository versioning (how do you relate the number this command prints in the llvm repo to the number it'll print in the clang repo?), which I believe is something important considering our setup.

In a nutshell:
git submodules basically record a git revision of each submodule with every commit of the toplevel repository.
You can make such revision switches a natural part of commits.
"git submodule update [--recursive]" will bring your submodule checkouts in sync with what the toplevel repository expects.

In any case, we can have this discussion separately from the discussion of moving to git. We can stay with our current way of matching by date for now.

- Matthias

But the move to git introduces the UI regression in the first place. No date
matching or the like is necessary with subversion.

Joerg

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of Mehdi
Amini via llvm-dev
Sent: Tuesday, May 31, 2016 2:38 PM
To: Bill Kelly
Cc: LLVM Dev; Clang Dev; LLDB Dev
Subject: Re: [llvm-dev] GitHub anyone?

>
>> Personally, I’m hugely in favor of moving llvm’s source hosting to github at
>> some point, despite the fact that I continue to dislike git as a tool and
>> consider monotonically increasing version numbers to be hugely beneficial.
>
> For whatever it's worth, our projects define a `buildnum` git alias:
>
> alias.buildnum=!sh -c "git rev-list --all | wc -l"

Or the cheaper "git rev-list --count --all" if your git is new enough.
We do something like this as well.

>
> So from the shell:
>
> $ git buildnum
> 17475
>
> This number increases monotonically per commit.

> It does not work with branches though (we're not really planning to have
> branches I believe),

You can get a per-branch unique number with this tactic. On our local
branches we use "rev-list origin/master.." which is the number of commits
since branching from master, and that's enough for our local purposes.
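
A small sketch of that per-branch count, using `--count` instead of piping to `wc -l`. The function name is hypothetical, and the default base of `origin/master` is simply the one quoted above.

```shell
# Count the commits on the current branch that are not reachable
# from BASE. BASE defaults to origin/master.
branch_buildnum() {
    git rev-list --count "${1:-origin/master}.."
}

# Example:
#   branch_buildnum                 # commits since origin/master
#   branch_buildnum origin/release  # placeholder base name
```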

> but more importantly it won't handle cross-repository
> versioning (how do you relate the number this command prints in the llvm
> repo to the number it'll print in the clang repo?), which I believe is
> something important considering our setup.

Is it really that important? Or are we just used to the convenience?
If the Clang build number is a tuple (cfe-number, llvm-number) instead
of a single number, how horrible is that really? If you consider what
an out-of-tree front end probably does, it's exactly the same thing.
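
To make the tuple idea concrete, here is a sketch that prints such a pair from two separate checkouts. The function name and the directory arguments are assumptions; the output format is only one possible choice.

```shell
# Print "(cfe-number, llvm-number)" for two separate checkouts.
# $1 = path to the clang (cfe) checkout, $2 = path to the llvm checkout
tuple_buildnum() {
    cfe=$(git -C "$1" rev-list --count --all) || return 1
    llvm=$(git -C "$2" rev-list --count --all) || return 1
    echo "($cfe, $llvm)"
}

# Example (placeholder paths):
#   tuple_buildnum ~/src/clang ~/src/llvm
```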

(I admit that locally we mush cfe+llvm into a single branch and do the
rev-list count to get a single number. But that's more for our own
convenience than anything else.)
--paulr

True to some extent, though I know several people (myself included) who would rather put up with the git-svn inconvenience today, simply because git-bisect exists and git is a lot faster at switching revisions than subversion. This tactic also starts failing once you work with custom release branches outside of llvm.org’s control.
In principle you could also put the toplevel llvm svn into a git-svn repository to fix the problems, and I think there is some repository out there that does that, but IMO that just brings back part of the problem of slow checkouts (a few gigabytes of extra disk space required per checkout as well).

  • Matthias

I don't know how important it is. How would you bisect without this "convenience" for instance?
(There is nothing like "push date" in git)

From: mehdi.amini@apple.com [mailto:mehdi.amini@apple.com]
Sent: Tuesday, May 31, 2016 3:54 PM
To: Robinson, Paul
Cc: Bill Kelly; Clang Dev; LLDB Dev; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] GitHub anyone?

> I don't know how important it is. How would you bisect without this
> "convenience" for instance?
> (There is nothing like "push date" in git)

I know that on a single branch, "git bisect" deals with that for you.
I've seen the talk about submodules but I have no clue how that works
or whether git-bisect can operate cleanly in that situation.
--paulr

> I know that on a single branch, "git bisect" deals with that for you.

Sure, but our case is worse than branches: it is *cross-repository rev-locks*.

> I've seen the talk about submodules but I have no clue how that works
> or whether git-bisect can operate cleanly in that situation.

Submodules are one solution; I gave some other pointers earlier in this thread.

Yes, makes sense; we have been doing exactly this for the last few
months. We created our own integration repo (to host our own build
integration scripts) and cloned the llvm and clang repos from (I assume)
https://github.com/llvm-mirror as sub-modules within it. We're just
using the native git command line to manage things and, so far, so good.

We're still working on getting a full continuous integration process in
place (right now we manually pull periodically), but expect to have that
soon. The CI process is just to inform us of conflicts and allow us to
resolve them proactively; we don't release product based on trunk.

Tom.

> What do people think? Any issue not covered that we should?

I'm in favour of the move. Git-svn just about works most of the time,
but I find it makes committing to release branches particularly
painful. It also randomly corrupts its database occasionally, just for
the giggles I assume.

I get hit by that every so often :-(.

As others have mentioned, the monotonically incrementing ids are extremely
useful, particularly when bisecting across clang/llvm. I think that
Mehdi's suggestion may be a viable solution.

As long as a mechanism for bisecting across the repositories is worked out,
definitely a +1 from me.

There has been some discussion on IRC about SVN hosting and the perils
of doing it ourselves. The consensus on the current discussion was
that moving to a Git-only solution would have some disadvantages, but
many advantages. Furthermore, not hosting our own repos would save us
a lot of headaches, admin costs and timed out connections.

Personally, I’m hugely in favor of moving llvm’s source hosting to github at some point, despite the fact that I continue to dislike git as a tool and consider monotonically increasing version numbers to be hugely beneficial.

The killer feature to me is the community aspects of github, allowing people to get involved in the project more easily and make “drive by” contributions through the pull request model. Github also has a very scriptable interface, allowing integration of external bug trackers etc into the workflow (which is good, because its bugtracker is anemic).

Fully agreed.

4. We currently host our own SVN/Git, ViewVC and Klaus, Phabricator,
etc. Not only does this incur additional admin cost, but it also gets
outdated and locally modified, and it needs to be backed up, etc. GitHub
gives us all that for free.

Yes, it would be great to get out of this business.

Yep.

6. GitHub has automated testing of merge requests, meaning we can have
pre-commit tests enabled on a set of fast bots, triggered by GitHub's
own validation hooks.

This works pretty well. The major problem is with tests that are flakey.

Performance can also be an issue; it takes a bunch of fast bots to keep up with developers testing their pull requests, especially when what you’re testing is a very large C++ code base. That said, “test and merge on success” workflows are *wonderful* for keeping the buildbots happy.

  - Doug

TL;DR :-)

Git submodules work fine for bisecting in read-only use. I have a repo for that.
https://github.com/llvm-project/llvm-project-submodule

A simple hooks/post-checkout should make bisecting more effective.
https://github.com/chapuni/llvm-project-scripts/blob/master/hooks/post-checkout

It has refs/notes/commits, aka git-notes.

That said, I am afraid that submodules would lead us to hell for committers.
More discussions would be required to manage multiple git repos.

FYI, I have been using the unified repo, https://github.com/llvm-project/llvm-project , for years.
It requires a wrapper script to invoke git-svn commit-diff.

Indeed. These pre-commit machines have to be dedicated, and we may
have to have more than one, depending on the volume.

But the good news is that they can scale independently, and they'll
remove a huge strain from the current buildbots. Also, I think it's
easier to justify (commercially) a few additional pre-commit bots than
duplicating every single configuration I have today.

cheers,
--renato

With submodules, the current hash of each submodule is recorded in each master commit. If you check out a different master repository commit then you run ‘git submodule update’ and it checks out the corresponding commit in each submodule. I’m not sure why this isn’t automatic with the master repo commit checkout, but in any case it’s not difficult.

So when you do a git bisect, you just need to make sure there is a ‘git submodule update’ at the start of your bisect script.
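
A sketch of such a bisect script, written as a function so the actual test command stays a parameter. The name `bisect_step` and the `./run-test.sh` placeholder are hypothetical. Exit status 125 is what `git bisect run` treats as "this commit cannot be tested, skip it".

```shell
# One bisect step: sync submodules to the commits recorded by the
# currently checked-out master repo commit, then run the given test.
# Returns 125 (skip) if the submodules cannot be brought in sync.
bisect_step() {
    git submodule update --init --recursive || return 125
    "$@"
}

# Example, inside a script passed to `git bisect run`:
#   bisect_step ./run-test.sh    # placeholder test command
```

Returning the test command's own status lets `git bisect run` mark the commit good (0) or bad (1 to 127, except 125).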

It will be fun to get all the correct hashes recorded in the master repo during initial import. But not all that difficult: check out the next svn revision; check in and push any submodules with changes in that revision; then commit the master repo for that revision.
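
A sketch of only the last step of that loop, assuming the submodules have already been committed and pushed for the revision in question. Staging a submodule path records the commit it currently has checked out. The function name and the commit message format are made up for illustration.

```shell
# Record the currently checked-out commit of each given submodule path
# in the master repo, as one commit per imported svn revision.
# $1 = svn revision number, remaining args = submodule paths
record_revision() {
    rev=$1
    shift
    git add -- "$@" &&
        git commit -q -m "Import r$rev"
}

# Example (placeholder revision and paths):
#   record_revision 271234 llvm clang
```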

Because there can be local changes that are not committed in the submodule. That can cause issues when switching to a different commit.