git Status

Did the project ever come to a decision about making a transition to
git? I'm trying to do some longer-term planning and it would be helpful
to know what the roadmap is.

Thanks!

                                 -Dave

Did the project ever come to a decision about making a transition to
git? I'm trying to do some longer-term planning and it would be helpful
to know what the roadmap is.

Me too. I've been catching up on the thread from a couple weeks ago, and I didn't see any clear conclusion. I have some comments about the Mercurial aspect of the discussion.

Comments on that point [Mercurial and local revision numbers] are welcome.

Oh, I really doubt that's true.

I'm a big Mercurial proponent, c.f. ...

  <http://jhw.dreamwidth.org/1868.html>
  <http://jhw.dreamwidth.org/2049.html>

...but even I know when to shut up and just go along to get along. If folks want LLVM to use a DVCS instead of Subversion, then I will be cheering from the sidelines if LLVM switches to Git.

For my purposes, I'll just use the HgGit extension <http://mercurial.selenic.com/wiki/HgGit> as my view on the Git repository. I use that routinely with other projects that use Git as their authoritative DVCS repository, and it works reasonably well. (There are performance improvements coming in both Mercurial and the HgGit extension that should make life even easier for Mercurial users in the not too distant future.)

Have you considered mercurial?

Please let's not go there.

I have to use mercurial from time to time in another project and it is
really painful. I use hg<->git bridge as often as possible, but it
doesn't work as well as the git<->svn one.

Ahh, the joy of anecdotal evidence.

I'll add one too: I use both Git and Hg in numerous projects, and they
are both stable and fast DVCS's and have great communities.

Any one of those two blows svn out of the water in so many ways, and
either choice will work great (bridged or not.)

+1 to this rebuttal.

I've a sneaking suspicion from reading Chris Lattner's messages about reviewing patches in a queue that Mercurial's MQ facility would come in handy for that.

  <http://hgbook.red-bean.com/read/managing-change-with-mercurial-queues.html>

The good news is that with recent Mercurial and Dulwich versions installed, the Hg-git extension is really quite reasonable, and it allows you to use MQ among your distributed Mercurial clones, and still push to a remote Git repository when you've finished editing a patch. I do this all the time.

One point worth noting about the Hg-git extension is that it *really* isn't very good at letting Git users interoperate with a Mercurial repository, mainly because there is metadata captured in the Mercurial format that Git doesn't know how to represent, but it's very nice for Mercurial users who need to interoperate with a Git repository, where a round-trip from Mercurial through Git and back is completely transparent.

Ignore the kvetching about Hg-git from the Git users. It's really not made for them, and most of them don't know what they're missing by not using Mercurial natively.

Seriously, I would be pleased to see LLVM switch the authoritative repository from Subversion to Git. I think it's a good idea for two reasons: A) to improve the integrity of the source code history, by distributing it widely, and B) to improve the performance of source code control operations. (Using a SVN repository is like sucking cold malt extract through a coffee stirring straw. The speed improvement alone, actually, is reason to do it.) An important secondary reason for switching to Git is for its hugely improved representation of merge operations over how Subversion operates.

People like me, who hate Git and much prefer to use Mercurial, will generally be able to work with a Git repository using Hg-git, and with much less grousing than we do today using the Subversion repository and Hg-Subversion. Maybe we won't be as happy as if you picked Mercurial over Git, but we're going to be *greatly* outnumbered by the Git users who would scream bloody murder if you picked Mercurial instead.

In closing, just go with Git and tell people who don't like it to use Mercurial and Hg-git. We're adaptable.

p.s. The Mercurial subrepositories feature is loads better than git submodules and it's built into the tool. But never mind that. Just go with Git and don't look back. Nobody ever got fired for buying from the market leader.

It's stuck on:

1) A misunderstanding that global revision numbers are necessary and that 'git describe' along with frequent tagging (commit hooks) isn't good enough.

2) Nobody writing up how git should be used with the current llvm workflow (which is not going to adapt to an SCM, but the other way around, which is understandable.)
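For readers who have not used 'git describe', here is a minimal sketch of what point 1 refers to. It assumes a throwaway repository, a made-up annotated tag named r100 standing in for whatever a commit hook would create, and a placeholder identity; whether its output is an acceptable substitute for global revision numbers is exactly the open question.

```shell
#!/bin/sh
# Sketch only: the repository, the tag name (r100) and the identity are made up.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git commit -q --allow-empty -m "first commit"
# 'git describe' only considers annotated tags unless --tags is given
git tag -a r100 -m "tag created by a (hypothetical) commit hook"
git commit -q --allow-empty -m "second commit"
git commit -q --allow-empty -m "third commit"

# Prints <tag>-<number of commits since the tag>-g<abbreviated hash>
described=$(git describe)
echo "$described"

# Number of commits reachable from HEAD, another ordered value
count=$(git rev-list --count HEAD)
echo "$count"
```

The name printed by 'git describe' can be passed back to git wherever a commit is expected, so it works as a human-readable handle as well as an ordering hint.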

james woodyatt <jhw@conjury.org> writes:

p.s. The Mercurial subrepositories feature is loads better than git
submodules and it's built into the tool. But never mind that. Just go
with Git and don't look back. Nobody ever got fired for buying from
the market leader.

I don't use submodules enough to be a good judge, but my understanding is
that Git's submodules are conceptually nice, but that most people agree
that the UI is not good. There's a summer of code on that subject, so
there's still hope ;-).

3) Somebody doing all of the thankless infrastructure work to ensure that we don't regress on basic things like the web interface, post-commit email hooks, etc.

4) Having a few stakeholders audit the conversion process to ensure that "we do it right".

5) There are organizations that use the current git-svn bridge extensively to maintain long term topic branches before proposing/integrating them into the public repository. If the conversion changes the SHA1 of commits (and it likely will due to goal #4), then these organizations will need time to prepare for the conversion (to avoid unscheduled downtime and/or any loss of history).

davez

FlyLanguage <flylanguage@gmail.com> writes:

1) A misunderstanding that global revision numbers are necessary and
that 'git describe' along with frequent tagging (commit hooks) isn't
good enough.

I'm not sure what's needed here. I don't care about global revision
numbers at all so I can't really comment on what should be used to
replace them with git. Hopefully someone else who cares can chime in.

2) Nobody writing up how git should be used with the current llvm
workflow (which is not going to adapt to an SCM, but the other way
around, which is understandable.)

I'm willing to take a crack at this as I have to do something like this
already anyway. I won't be able to get to it until next week, though.

                                -Dave

Dave Zarzycki <zarzycki@apple.com> writes:

From chats with some llvm/clang developers at Apple, we're also stuck on:

3) Somebody doing all of the thankless infrastructure work to ensure
that we don't regress on basic things like the web interface,
post-commit email hooks, etc.

It's certainly a concern. Is Apple the only entity that has access to
this stuff or can others help?

4) Having a few stakeholders audit the conversion process to ensure
that "we do it right".

Has anyone been identified to play this role? I haven't seen any
requests go out on the list for volunteers.

5) There are organizations that use the current git-svn bridge
extensively to maintain long term topic branches before
proposing/integrating them into the public repository. If the
conversion changes the SHA1 of commits (and it likely will due to goal
#4), then these organizations will need time to prepare for the
conversion (to avoid unscheduled downtime and/or any loss of history).

I am probably not one of the organizations you're thinking of, but I
currently do a lot of this and am willing to throw away everything I
have to move to git proper. It's that much better.

                             -Dave

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

james woodyatt <jhw@conjury.org> writes:

p.s. The Mercurial subrepositories feature is loads better than git
submodules and it's built into the tool. But never mind that. Just go
with Git and don't look back. Nobody ever got fired for buying from
the market leader.

I don't use submodules enough to be a good judge, but my understanding is
that Git's submodules are conceptually nice, but that most people agree
that the UI is not good. There's a summer of code on that subject, so
there's still hope ;-).

git-subtree is a very nice way to handle the submodule project. Even
though it's clunky through the git-svn bridge, I still use it on this
end. With native git it's a breeze.

git-submodule is indeed a putrid pile of donkey dung. :-)

                              -Dave

The Python language recently migrated from SVN to Mercurial as their version control system. As part of this effort, a detailed migration plan was written:

http://www.python.org/dev/peps/pep-0385/

Now, I’m not proposing that we favor using Mercurial over Git*. But I would suggest that perhaps you could use the Python migration plan as a template for LLVM’s migration.

(*I use both Mercurial and Git on a regular basis. Although I think Git’s branching and history model makes more sense, as a command-line tool I find Mercurial significantly easier to use for day-to-day work.)

– Talin

Hi guys,

Just to be absolutely clear here, Mercurial is not being considered.

-Chris

This is an llvm.org thing, not an Apple thing. Several non-apple people have llvm.org access, e.g. Anton.

-Chris

The Python language recently migrated from SVN to Mercurial as their version control system. As part of this effort, a detailed migration plan was written:

http://www.python.org/dev/peps/pep-0385/

Hi guys,

Just to be absolutely clear here, Mercurial is not being considered.

If you look again at what I wrote, I wasn’t proposing Mercurial. I was proposing a migration process.

FlyLanguage <flylanguage@gmail.com> writes:

2) Nobody writing up how git should be used with the current llvm
workflow (which is not going to adapt to an SCM, but the other way
around, which is understandable.)

Here is a first cut at that. Other git users, please chime in with
suggestions, edits, etc. Non-git users, please ask for clarification
where needed. This is based on my notes on working with LLVM via
git-svn, modified to assume native git. There are hundreds of ways to
design a workflow that works with the current review process. This is
but one.

We should add this to the web page once it's polished if we make the
transition to git.

                         -Dave

LLVMgit.txt (10.8 KB)

David,

A few comments.

Naming Upstream

The initial clone from upstream results in a git remote reference with
the rather unhelpful name of “origin.” As more remote sources get
added, it is easy to forget what “origin” is. Therefore, add a remote
with a more descriptive name.

git remote add llvm-upstream http://llvm.org/git/llvm.git master

If the intent is to rename origin, this can be done directly:
git remote rename origin llvm-upstream

Updating LLVM - no local changes

Splitting this into “no local changes” / “with local changes” seems unnecessarily complicated. Why not just recommend doing ‘git pull --rebase’ all of the time?

I have to ask as well - is a linear history really desired? That seems to be the intent of your instructions.

I ask because for something like a reasonably sized feature that might have multiple commits, having the merge history in place can be useful, if only to separate the set of related changes from sets of unrelated changes. It also aids in reverting an entire feature composed of multiple commits.

One other comment - the way I use git (and from what I’ve read I am not alone), I end up committing multiple times per hour / many times per day, often into different branches. I then go back and use rebase to reorder and squash commits into logically related bite-sized chunks (e.g. combine some commits related to feature A into one or more commits, some related to feature B into one or more commits, and then there might be three bug fix commits for bugs C, D, and E.). It might be helpful to add some guidelines related to this that are in line with current LLVM review process - somewhat the reverse scenario of someone asking for commits to be split. In this case, they probably don’t want to see 15 different commits that in total add 75 lines of code to feature A.

Mark

FlyLanguage<flylanguage@gmail.com> writes:

> 2) Nobody writing up how git should be used with the current llvm
> workflow (which is not going to adapt to an SCM, but the other way
> around, which is understandable.)

Here is a first cut at that. Other git users, please chime in with
suggestions, edits, etc. Non-git users, please ask for clarification
where needed. This is based on my notes on working with LLVM via
git-svn, modified to assume native git. There are hundreds of ways to
design a workflow that works with the current review process. This is
but one.

We should add this to the web page once it's polished if we make the
transition to git.

                          -Dave

Hi Dave,

thanks a lot. This already reads very nicely. Two smaller comments:

Sending Patches for Review
--------------------------

git includes a whole set of tools for managing the patch review
process. We kick things off with git format-patch:

git format-patch -o $HOME/patches/ifconvert --thread --src-prefix=old/ \
                  --dst-prefix=new/ --cover-letter HEAD~1..HEAD

I personally dislike typing such a long command line all the time. Maybe you can also point out how to configure this in .git/config.
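One way to avoid retyping the options, sketched here under assumptions, is a repository-local git alias. The alias name 'fp', the temporary repository and the placeholder identity are invented for the example; the options are the ones from the command quoted above.

```shell
#!/bin/sh
# Sketch only: the alias name 'fp' and the throwaway repository are illustrative.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
out=$(mktemp -d)
cd "$repo"
git init -q .
echo one > file.txt
git add file.txt
git commit -q -m "Start if conversion work"
echo two >> file.txt
git commit -q -a -m "Middle of if conversion work"

# Store the long option list once; this writes an [alias] section to .git/config
git config alias.fp "format-patch --thread --src-prefix=old/ --dst-prefix=new/ --cover-letter"

# Same effect as the full command line, with only the varying parts typed out
git fp -o "$out" HEAD~1..HEAD

# One patch for the single commit in the range, plus the cover letter
patches=$(ls "$out" | wc -l | tr -d ' ')
echo "$patches"
```

Using 'git config --global' instead would make the alias available in every repository.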

This places three text files in $HOME/patches/ifconvert, one for each
commit, plus a cover letter to send before each patch. These will get
sent to the e-mail list with the subject "[PATCH n/2] <commit subject>"
where "n" is the patch number (0 for the cover letter) and
<commit subject> is the first line of the commit message.

Edit these files to add any commentary you desire.

It is helpful for format-patch and send-email to have various bits of
information pre-selected for e-mail interaction. For example, I put
this in my .git/config file in the local repository:

[format]
         numbered = auto
         to = llvm-commits@cs.uiuc.edu
         inline = "---------"

I believe the current policy is not to inline patches, but to attach them. I believe we should keep following this policy to reduce the number of changes this transition brings.

Updating Patches
----------------

Your patches will probably require some editing. git rebase -i and
git add -i are your friends.

For the typical case of editing your patches a bit, use git rebase -i:

git rebase -i HEAD~2

This brings up an editor with a document that looks something like
this:

   pick ef723 Start if conversion work
   pick 443de Middle of if conversion work, something interesting to commit

This is a control file you edit to state how git-rebase should work.
There are essentially three commands pick, edit and squash.

You missed the one I use most: 'fixup'. I use a different approach to edit patches. Here is the text; feel free to add parts of it or to ignore it for the sake of simplicity.
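As an illustration of 'fixup' (not necessarily the approach referred to above), the sketch below records a correction with 'git commit --fixup' and lets 'git rebase -i --autosquash' fold it into the commit it belongs to. The repository, file names and identity are made up; GIT_SEQUENCE_EDITOR=true is used only so that the generated control file is accepted without opening an editor.

```shell
#!/bin/sh
# Sketch only: throwaway repository with two patches and one later correction.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git commit -q --allow-empty -m "base"
echo "first draft" > a.txt
git add a.txt
git commit -q -m "Start if conversion work"
echo "other" > b.txt
git add b.txt
git commit -q -m "Middle of if conversion work"

# A correction that logically belongs to the first patch (HEAD~1 at this point)
echo "reviewed version" > a.txt
git commit -q -a --fixup=HEAD~1

# --autosquash reorders the control file so the fixup follows its target and
# marks it 'fixup'; the sequence editor 'true' accepts that file unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

commits=$(git rev-list --count HEAD)
first_patch_content=$(git show HEAD~1:a.txt)
git log --oneline
```

Unlike 'squash', 'fixup' discards the message of the folded commit, so the original subject line is kept as it was.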

Mark Lacey <641@rudkx.com> writes:

David,

A few comments.

    Naming Upstream
    ---------------
   
    The initial clone from upstream results in a git remote reference with
    the rather unhelpful name of "origin." As more remote sources get
    added, it is easy to forget what "origin" is. Therefore, add a remote
    with a more descriptive name.
   
    git remote add llvm-upstream http://llvm.org/git/llvm.git master

If the intent is to rename origin, this can be done directly:
git remote rename origin llvm-upstream

or simpler, do it at clone time:

  git clone --origin llvm-upstream

Note that git clone does not only set a remote, it also sets it as
"upstream" for the master branch, i.e. "git pull" without argument will
fetch from origin. Adding a new remote will not do that (but one can use
the --set-upstream option of various commands to fix that later).

That said, I don't think it's a good idea to ask users to rename their
upstream in a guideline document. Naming upstream is personal
preference, and keeping the default seems sane for most users. If you
don't remember what "origin" is, then .git/config can remind you.
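To make the difference concrete, here is a small sketch of naming the remote at clone time and checking what git recorded. A local directory stands in for the real upstream repository, and the identity is a placeholder; only the remote name llvm-upstream comes from the discussion.

```shell
#!/bin/sh
# Sketch only: "$work/upstream" is a local stand-in for the real repository.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "initial"

# Name the remote when cloning instead of renaming "origin" afterwards
git clone -q --origin llvm-upstream "$work/upstream" "$work/clone"
cd "$work/clone"

# The only remote is the one named at clone time
remotes=$(git remote)
echo "$remotes"

# clone also made it the upstream of master, which is what a bare "git pull" uses
tracking=$(git config branch.master.remote)
echo "$tracking"

# The same information, as stored in .git/config
git remote -v
```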

    Updating LLVM - no local changes
    --------------------------------

Splitting this into "no local changes" / "with local changes" seems
unnecessarily complicated. Why not just recommend doing 'git pull
--rebase' all of the time?

I'd also recommend "git pull --rebase" if the goal is to keep history
linear. Note that this has to go with a big, fat, warning, telling the
user that rebasing published history is bad. Rebase is a very good tool
to work with private history, but as soon as you've pushed it to some
place visible by other people, you should stop using it.

I ask because for something like a reasonably sized feature that might
have multiple commits, having the merge history in place can be
useful, if only to separate the set of related changes from sets of
unrelated changes.

Rebase is exactly meant to do this separation, and avoid history looking
like

- start working on feature
- merge from upstream
- continue working on feature
- merge again from upstream
...

and remove the useless "merge from upstream" commits, that would just
distract reviewers. Keeping merge history when merging several clean,
published branches is good though.

If you look at how Git itself is developed, people use rebase a lot to
send clean patch series, and the maintainer uses merge a lot to merge
multiple patch series together.
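The effect on history can be checked with a short experiment. The sketch below pulls the same upstream change twice, first with a merge and then with a rebase, and counts the merge commits each leaves behind. Repository paths, commit messages and the identity are made up for the example.

```shell
#!/bin/sh
# Sketch only: a local directory plays the role of the upstream repository.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "A"
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"

# Private, unpublished work in the clone
echo feature > feature.txt
git add feature.txt
git commit -q -m "start working on feature"
before=$(git rev-parse HEAD)

# Meanwhile upstream moves on
git -C "$work/upstream" commit -q --allow-empty -m "G"

# Variant 1: pull with a merge, which records a "merge from upstream" commit
git pull -q --no-rebase --no-edit origin master
merges_after_merge=$(git rev-list --merges --count HEAD)

# Go back to the state before the pull and try the other variant
git reset -q --hard "$before"

# Variant 2: replay the private commit on top of what was fetched
git pull -q --rebase origin master
merges_after_rebase=$(git rev-list --merges --count HEAD)

echo "merge commits: $merges_after_merge (merge) vs $merges_after_rebase (rebase)"
```

This is safe only because the feature commit was never pushed anywhere, which is the condition the warning about published history is about.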

I'd also recommend "git pull --rebase" if the goal is to keep history
linear. Note that this has to go with a big, fat, warning, telling the
user that rebasing published history is bad. Rebase is a very good tool
to work with private history, but as soon as you've pushed it to some
place visible by other people, you should stop using it.

This is enormously important - once submodule maintainers start rebasing, we're screwed.

greened@obbligato.org (David A. Greene) writes:

Updating LLVM - no local changes
--------------------------------

To update your clone to the latest LLVM sources, use git pull:

git pull llvm-upstream

That should probably be either

  git pull

or

  git pull llvm-upstream master

since the first is shorter and works if llvm-upstream is the upstream
for the current branch, and "git pull llvm-upstream" will ask for a
branch name if it isn't.

git cherry-pick 44ef3
git cherry-pick afe3d

The dag looks like this:

...-A---E-F{master}[HEAD]
     \ / /
      B-C-D{ifconvert}

Err, no. cherry-pick won't record B and C as parents.

A common convention would be to call E and F as B' and C' to reflect the
fact that they are different, but similar to B and C.

Also, if you are to draw DAGs in your explanations (which is good), you
should mention gitk and/or git log --oneline --graph so that users can
experiment by themselves.
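For anyone who wants to verify the point about parents, the following sketch cherry-picks one commit from a branch and then inspects the result. Branch names follow the example under discussion; the repository, file names and identity are invented.

```shell
#!/bin/sh
# Sketch only: throwaway repository with a master branch and an ifconvert branch.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
echo a > a.txt
git add a.txt
git commit -q -m "A"

git checkout -q -b ifconvert
echo b > b.txt
git add b.txt
git commit -q -m "B"
picked=$(git rev-parse HEAD)

git checkout -q master
echo c > c.txt
git add c.txt
git commit -q -m "unrelated work on master"

# Copy the change made by B onto master
git cherry-pick "$picked" >/dev/null
copied=$(git rev-parse HEAD)

# One line: the new commit followed by its parent(s)
parents=$(git rev-list --parents -n 1 HEAD)
parent_count=$(( $(echo "$parents" | wc -w) - 1 ))
echo "B' has $parent_count parent(s)"

# The graph shows no edge between B and its copy
git log --oneline --graph --all
```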

git rebase llvm-upstream/master

There may be conflicts from this operation. If so, resolve them in
the usual way.

... and use "git rebase --continue" as needed.

Perhaps point to git-rebase(1) for more details.

Now our history looks like this:

...-A-----G-H-I-E'-F'{master}[HEAD]
     \ /
      B-C-D{ifconvert}

I don't get it. B and C were local commits, and you didn't explain to the
user how to push them, so they're still local. As I understand it, G and
H are the remote commits you've just fetched, so G cannot have C as
parent.

You probably wanted to show this history after fetch

      G-H{llvm-upstream}
     /
...-A--E-F{master}[HEAD]

and this one after rebase

...-A-G-H-E'-F'{master}[HEAD]
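The two pictures can be reproduced with a throwaway setup. In the sketch below the commit messages match the letters in the diagrams, a local directory stands in for the upstream repository, and the identity is a placeholder.

```shell
#!/bin/sh
# Sketch only: commit subjects A, E, F, G, H mirror the letters in the diagrams.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "A"
git clone -q --origin llvm-upstream "$work/upstream" "$work/clone"
cd "$work/clone"

# Local commits E and F on master
echo e > e.txt
git add e.txt
git commit -q -m "E"
echo f > f.txt
git add f.txt
git commit -q -m "F"

# Upstream gains G and H in the meantime
git -C "$work/upstream" commit -q --allow-empty -m "G"
git -C "$work/upstream" commit -q --allow-empty -m "H"

git fetch -q llvm-upstream
# After fetch: two lines of history diverging from A
git log --oneline --graph --all

git rebase -q llvm-upstream/master
# After rebase: a single line, with the copies of E and F on top of H
git log --oneline --graph --all

order=$(git log --reverse --format=%s | paste -s -d ' ' -)
echo "$order"
```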
    

git format-patch -o $HOME/patches/ifconvert --thread --src-prefix=old/ \
                 --dst-prefix=new/ --cover-letter HEAD~1..HEAD

I usually use git send-email right away, it can call format-patch in
modern Gits.

Edit these files to add any commentary you desire.

... comments that are not meant to appear in the commit message should
be added after the --- and before the diffstat in the patches.

?

[format]
        numbered = auto
        thread = shallow

these are the default, so better not annoy new users with them.

git rebase -i HEAD~2

I usually do

git rebase -i origin/master

or in this case

git rebase -i llvm-upstream/master

which has the advantage of offering me to edit local commits, and only
them. I cannot rewrite upstream history by mistake (which would lead to
weird things afterwards).

In this case, let's say that the second commit needs some work. We
edit the control file to do that:

  pick ef723 Start if conversion work
  edit 443de Middle of if conversion work, something interesting to commit

It's not the best example: rebase is not needed at all to edit the last
commit. You should edit the first to make the example more relevant.

git reset --soft HEAD^

git commit -a -c ORIG_HEAD

That seems to be a rather complex and potentially dangerous way of
saying

git commit --amend
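A minimal sketch of the simpler form, assuming a throwaway repository and a placeholder identity. The --no-edit option is added here only so the script runs without opening an editor; leaving it out lets you revise the commit message as well.

```shell
#!/bin/sh
# Sketch only: shows that --amend replaces the last commit instead of adding one.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "draft" > notes.txt
git add notes.txt
git commit -q -m "Start if conversion work"
before=$(git rev-list --count HEAD)

# Further edits that belong in the same commit
echo "more" >> notes.txt

# Fold the edits into the last commit, keeping its message
git commit -q -a --amend --no-edit
after=$(git rev-list --count HEAD)

echo "commits before: $before, after: $after"
git status --short
```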

git reset --mixed HEAD^

Your files are left in the "untracked files"

... or "changed, but not staged for commit"

state

Now commit:

git push llvm-upstream master

"Now commit" is confusing. If you talk to former SVN users, you can say
"now, do the equivalent of svn commit", and otherwise, "Now, send your
changes to the upstream repository".

My 2 (perhaps 3?) cents,

2) Nobody writing up how git should be used with the current llvm
workflow (which is not going to adapt to an SCM, but the other way
around, which is understandable.)

Here is a first cut at that. Other git users, please chime in with
suggestions, edits, etc. Non-git users, please ask for clarification
where needed. This is based on my notes on working with LLVM via
git-svn, modified to assume native git. There are hundreds of ways to
design a workflow that works with the current review process. This is
but one.

We should add this to the web page once it's polished if we make the
transition to git.

Great that you work on this, but I don't quite get the approach, nor some of your examples (which seem buggy).

I feel that:

a) We don't need another git tutorial - there's a lot of really good ones out there, even ones explaining git from the perspective of a current SVN user.

b) We need a document explaining how patches are reviewed and referred to, what commit rules (and hooks) are in place and how this would work in a git world. It would be short and sweet.

Mark Lacey <641@rudkx.com> writes:

    git remote add llvm-upstream http://llvm.org/git/llvm.git master

If the intent is to rename origin, this can be done directly:
git remote rename origin llvm-upstream

Much better. Thanks.

    Updating LLVM - no local changes
    --------------------------------

Splitting this into "no local changes" / "with local changes" seems unnecessarily complicated. Why not just recommend doing 'git pull --rebase'
all of the time?

Actually git pull can sometimes get you into trouble. Probably git
fetch / git rebase is the better combination for LLVM. I agree my
distinction is artificial but for the users who simply want the most
up-to-date LLVM, git pull is simpler. I certainly debated back and forth.
I have no problem reworking this.

I have to ask as well - is a linear history really desired? That seems
to be the intent of your instructions.

It is a stated requirement of the project.

I ask because for something like a reasonably sized feature that might
have multiple commits, having the merge history in place can be
useful, if only to separate the set of related changes from sets of
unrelated changes. It also aids in reverting an entire feature
composed of multiple commits.

Reverting is just as easy with linear history, I think. Chris has
stated he wants reviews to be as simple as possible and linear history
gets us that.

There are some more interesting points made here:

http://randyfay.com/node/91

Here's an interesting article with a proposal on how to get the best of
both:

http://softwareswirl.blogspot.com/2009/04/truce-in-merge-vs-rebase-war.html

The follow-up linked at the bottom explains how it might be implemented.
It requires changes to git, however. Personally, I kind of like the
ideas presented but I doubt it will happen any time soon.

One other comment - the way I use git (and from what I've read I am
not alone), I end up committing multiple times per hour / many times
per day, often into different branches. I then go back and use rebase
to reorder and squash commits into logically related bite-sized chunks
(e.g. combine some commits related to feature A into one or more
commits, some related to feature B into one or more commits, and then
there might be three bug fix commits for bugs C, D, and E.). It might
be helpful to add some guidelines related to this that are in line
with current LLVM review process - somewhat the reverse scenario of
someone asking for commits to be split. In this case, they probably
don't want to see 15 different commits that in total add 75 lines of
code to feature A.

That makes sense. I briefly mentioned squashing in the rebase -i
explanation but I can expand on that.

Thanks for your feedback!

                         -Dave