Howdy + GIT

Hello LLVMers,

It has been a while (8 years?) since I’ve been involved with LLVM but I’m considering picking it up again. My recent review of the code base has led me to wonder if it isn’t time to update the code base, specifically:

  • Convert to GIT
  • Refactor into smaller, separate, reusable libraries
  • Host the master repository at GitHub (amongst other places)

My reasons for doing this don’t have to do with the code repository choice, but rather encouraging innovation. LLVM’s private and centralized repository is now hurting the project, I believe. The GIT model fosters better code sharing and GitHub in particular makes integration of new ideas significantly easier. It is my opinion that LLVM is missing out on contributions simply because it doesn’t utilize the modern cloud-based tools.

So, what’s your (developer) response to this? “No way in hell”, “Interesting idea”, “We should do that”, or “Why haven’t we done it yet?”

FYI: Yes, I’m signing up to do a bunch of the leg work if the development community wants this. If not, I’ll make other plans.

Sorry if this topic has been covered … the archives are not searchable (google group, anyone?).

Reid Spencer

My personal responses:

  • Convert to GIT: I’d love it, but others might object.
  • Refactor into smaller, separate, reusable libraries: Not worth it.
  • Host the master repository at GitHub: I think this might be a good idea.

There are various searchable archives of the mailing list online. In practice, I find google does a really good job of finding relevant discussion threads.

You can use the git mirrors if you want. Please check the mailing list
for the reasons against it.

Joerg

Hello again! :)

I just wanted to note that many of us are already using the official git mirror at http://llvm.org/git/llvm.git and there are other unofficial git mirrors on github. Chapuni even has an all-in-one LLVM git repo at https://github.com/chapuni/llvm-project for easy git bisection.

While many people appreciate the local workflows that git enables, I think as collaborators we appreciate the simple, strictly linear history model that the upstream LLVM SVN repo enforces.

I also think the repository granularity is fine enough today. It is already a challenge for me to keep LLVM, Clang, LLDB, LLD, compiler-rt, and libc++ in sync, and bisecting across them is hard.

In terms of better collaboration tools, I think my biggest pain points have been:

  • Bugzilla isn’t the best issue tracker anymore
  • Occasional downtime for llvm.org and mailing lists
  • Buildbot mail is spammy and inaccurate, but github doesn’t help much here either

As far as code sharing/review tools go, I think Phabricator is doing a pretty good job.

At the end of the day, I don’t think my pain points add up to enough to make it worth switching any infrastructure.

Hi Phillip,

  • Refactor into smaller, separate, reusable libraries

Not worth it.

The effort is trivial with GIT. It is a couple of commands per extracted piece. There is cruft in the repository like hlvm (mine) and the unfinished java compiler, neither of which has been touched in 7 years. Do we really need to keep that legacy in the main repository or is it better spun off to a separate one? Similarly, stacker, sample, llvm-gcc, and probably several others should be in their own repositories. I think smaller repositories make LLVM seem less formidable and should encourage further adoption and contribution.

Sorry if this topic has been covered … the archives are not searchable (google group, anyone?).

There are various searchable archives of the mailing list online. In practice, I find google does a really good job of finding relevant discussion threads.

I find that it doesn’t: searching for “git” in llvm-dev yields 1.8 million results. Trying to sort through the discussions it yields is tedious.

I believe that this is far less of an issue than the massive and often gratuitous API change between versions. It's fine for Google and their rack-scale code refactoring tools, but it's problematic for everyone else unless they get their code in the tree.

The response of 'well, you should get your code in the tree' is not good, because we don't want every possible LLVM consumer to be in the tree (do we want JavaScriptCore to be in the LLVM repo because they use LLVM? What about GHC?).

How much work would you ever get done if every library that you used made significant changes to its public APIs every six months? We see the fallout from this in the FreeBSD ports collection, with the long tail of ports that depend on an old version of LLVM. The graphics stack is still on LLVM 3.3 (DRI drivers need it) and is unlikely to change soon. We've finally managed to get rid of the last ports that depended on 3.2 recently. None of those people is ever going to submit a patch, because even if they did construct one against their version of LLVM, it's so massively different from anything in the tree now that it's likely to be impossible to apply.

There is no other library that I use where I expect to have to rewrite my code that interfaces with it every few months. LLVM is the exception. This is a big barrier to adoption. The selling points of LLVM are:

- You can easily plug in a front end for your language.

- You can easily implement optimisations for your language, or the patterns in your library, on top of LLVM.

The first is sort-of true, but not quite. Using C++ APIs like IRBuilder is much easier than using the C API. We have never had even vaguely stable APIs for common things that optimisations might want to do. Even the clang tooling library, which exists explicitly for third-party consumers, encourages you to use APIs that change every few months.

Note that I'm not talking about A*B*I stability. That's hard for a C++ project, and these days recompiling is not that big a deal. We compile the entire FreeBSD ports collection (around 24K open source packages) every week, on a single machine (it takes about a day). If a library comes with a .so version bump and we need to recompile everything that depends on it, that's not a problem - we do it and the next package set ships with the new version. The packaging tools handle this automatically. Only adopting new A*P*I costs developer time.

As to git / GitHub... We already use git and a repo hosted on GitHub for some downstream projects that require modified LLVM. The fact that upstream uses svn as an authoritative store has precisely zero impact on this and we would not notice any change if the GitHub mirror became the authoritative source.

David

Woah! Long time no see!

I agree with what others have said about converting to git: it’s very useful for local workflows, but having a simple linear history for LLVM mainline is really handy.

—Owen

I've found gmane and nabble to be of use for searching the mailing lists.

How many *libraries* do you use where your first step on looking at the library was to look in and judge its repository, rather than reading documentation? If I want to use a library, I ask my package manager to install the package for it and I look at API docs (and, ideally, tutorials). I do not know or care what the structure of its code is, until I want to contribute to it, and I don't want to contribute to a library until I've been using it for a while. If I ask my package manager to install libfoobar, and then a few months later an update replaces it with a completely incompatible libfoobar, then I become frustrated. That puts me off contributing to libfoobar long before I ever even look at its repository.

As a user of LLVM, I mostly want a libLLVM.so that I can link against and use. Big shared libraries don't bother me (unless they're running a load of constructors), because they're shared and only the bits I need are going to be paged in. One big monolithic library (as a build product, irrespective of the code layout) is more convenient because I don't need complex pkg-config scripts in my build system to work out which subset of a tightly-connected tangle I'm using.

David

Reasons from 4 years ago don’t necessarily still hold.

For those interested:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-July/041671.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-August/042537.html
http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-September/042887.html

My take from this thread is that the main issue raised is the lack of a monotonically increasing revision number.

Mehdi

Welcome back! ;]

Hello LLVMers,

It has been a while (8 years?) since I’ve been involved with LLVM but I’m
considering picking it up again. My recent review of the code base has led
me to wonder if it isn’t time to update the code base, specifically:

   - Convert to GIT

I looked at this, and my conclusion was that it would provide essentially
no benefit over svn + git mirrors + git-svn, while requiring *substantial*
work to ensure we end up with a clean, linear master history. Within the
community there has been long standing strong desire to continue to have
linear master history. Things like push and merge make the incremental
development and post-commit review process substantially harder.

In essence, git isn't a good fit for our desired master behavior, svn is,
and it is sufficiently easy to *use* git while having an svn master.

   - Refactor into smaller, separate, reusable libraries

I think the libraries are already mostly reusable. Where they aren't, that
should be fixed, but I think it is fine to fix these things somewhat lazily
-- IE, when we have a re-use case in mind.

I think the "smaller" is mostly a function of re-use. There is a fairly
natural factoring that results from this, and it doesn't make a lot of
sense to me to split further.

I think making them *separate* is actually a mistake. I think it would add
substantial complexity to the development process and slow the entire
project down. While there are advantages to this, they don't really seem
large enough to make it worth the cost.

   - Host the master repository at GitHub (amongst other places)

I'm not really opposed to this in any way other than the fact that git
seems like a bad fit for the master.

My reasons for doing this don’t have to do with the code repository
choice, but rather encouraging innovation. LLVM’s private and centralized
repository is now hurting the project, I believe.

What evidence do you see of this? Over the past 7 years, I have seen the
community and project growing by pretty significant degrees. I don't think
this kind of low-level structure is really a significant factor in limiting
the project in any way.

My two cents are essentially: the VCS is not going to significantly impact
the likelihood or effectiveness of people contributing. If they want to
contribute, especially significantly over a long period of time, then they
will steamroll right past this, much like they will other annoyances:
build systems, compiling prerequisites, etc.

The point at which we should change is when we have a reasonable number of
contributors all clamoring for a change with specific reasons why that
change will help them be more productive / effective / etc. This is how we
picked up CMake and now Ninja support for example.

-Chandler

Speaking only as a very minor contributor to LLVM, a move to GitHub would be fantastic. My current workflow involves using LLVM as a git-submodule/cmake-subdirectory. While I would eventually like to move to using the official releases (and remove LLVM as a submodule), in practice I’ve found that I run into enough small issues that I need to stick to master. Thus, after identifying+fixing a minor issue in my LLVM submodule, my workflow looks like:

  1. Clone LLVM SVN repo

  2. Copy patch to SVN

  3. Review documentation for submitting a patch to Phabricator for review

  4. Submit patch and CC mailing list

There's no need to switch to SVN. You can just git format-patch the
relevant commits and attach the patch to a mail. For Phabricator you
should increase the amount of context lines; or use Arcanist which does
diffing and submitting to Phab for you.
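
As a rough illustration of the suggestion above, here is a minimal Python sketch that wraps `git format-patch` with a large context setting. The helper names, the default context value, and the `origin/master..HEAD` range in the usage note are assumptions made for the example; only the `-U<n>` and `--stdout` options come from git itself.

```python
import subprocess
from typing import List


def format_patch_argv(revision_range: str, context_lines: int = 999999) -> List[str]:
    """Build a `git format-patch` command line with wide diff context.

    -U<n> asks for <n> lines of context around each hunk, and --stdout
    writes the patches to standard output instead of to numbered files.
    """
    return ["git", "format-patch", f"-U{context_lines}", "--stdout", revision_range]


def write_patch(revision_range: str, out_path: str) -> None:
    """Run git in the current repository and save the patch to out_path."""
    result = subprocess.run(
        format_patch_argv(revision_range),
        check=True,           # raise if git exits with an error
        capture_output=True,  # collect the patch bytes from stdout
    )
    with open(out_path, "wb") as handle:
        handle.write(result.stdout)
```

Calling `write_patch("origin/master..HEAD", "my-fix.patch")` inside a git checkout would produce one file that can be attached to a mail or uploaded for review; the branch name is a placeholder for whatever the local mirror calls upstream.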

-Nico

It’s not clear from your suggestion whether you are planning on breaking the repositories into smaller chunks (e.g., how the llvm and clang repositories are hosted as separate git modules). If this is indeed what you are suggesting, then the following paragraph applies:

Losing atomicity of commits in a project is a pain that you do not want to have to suffer, speaking as a maintainer of a project which has been bifurcated into two distinct repositories for the past 6 years. SVN allows you to do partial tree checkouts, a feature which is to my knowledge not replicated by any major DVCS. You thus get the choice between having one monolithic repository for all projects, or making do with non-atomic commits, and that last option is not viable given the coupling between llvm, clang, and compiler-rt at the very least. Separating them would subject developers to the worst kind of development hell that can only be accurately conveyed to those who have had the misfortune to be sentenced there. Or to those who recall CVS or other abominations of version control. :)

Beyond that, though, my impression of GitHub is that it encourages development models and processes which do not scale to large projects, which llvm undoubtedly is. Its issue tracking system is underpowered, its code review tool underwhelming, and you’re prone to lose important information if you, say, rebase pull requests (at least the last time I checked). It’s designed to encourage its particular development model, and if you want anything that’s different (say, linear history), well, you have to fight it all the way.

GMane, which is infinitely better than Google Groups.

None of that is insurmountable. Though "out of the box" tools in git don't
support that use case trivially, it certainly isn't impossible.

A hook script to make sure master doesn't allow pushing any merge commits.
A script to [fetch, rebase / cherry-pick, push] instead of a simpler push
command and you're done.
Heck, it should be possible to write a hook script to run on the server
that resolves trivial rebases automatically.
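
The core check of such a hook could look roughly like the Python sketch below. It assumes the hook has already obtained the output of `git rev-list --parents <old>..<new>` for the pushed range (a server-side pre-receive hook receives the old and new revisions on standard input); that wiring is left out, and the function names and sample hashes are made up for illustration.

```python
from typing import Iterable, List


def find_merge_commits(rev_list_lines: Iterable[str]) -> List[str]:
    """Return the hashes of commits that have more than one parent.

    Each input line is expected in the format printed by
    `git rev-list --parents <old>..<new>`: the commit hash followed by
    the hashes of its parents, separated by spaces.
    """
    merges = []
    for line in rev_list_lines:
        fields = line.split()
        if len(fields) > 2:  # commit hash plus two or more parents
            merges.append(fields[0])
    return merges


def push_is_linear(rev_list_lines: Iterable[str]) -> bool:
    """A push is acceptable only if it introduces no merge commits."""
    return not find_merge_commits(rev_list_lines)


sample = [
    "c3 c2",     # ordinary commit, one parent
    "c2 c1 f1",  # merge commit, two parents
    "c1 c0",     # ordinary commit, one parent
]
print(find_merge_commits(sample))          # ['c2']
print(push_is_linear(["c3 c2", "c2 c1"]))  # True
```

A real hook would exit with a non-zero status when `push_is_linear` returns False, which makes the server reject the push.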

But I doubt we could be bothered....

That's easy to achieve with git by using Gerrit.

A linear master history is as simple as always using "git merge --no-ff
feature-branch". That creates a single commit on the master (current)
branch for the new feature, while allowing the interested to browse the
individual commits as they were made to the branch.
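
One caveat worth making concrete: `--no-ff` always records a merge commit, so the full commit graph is not linear. What stays linear is the view obtained by following only the first parent of each commit. The Python sketch below models that on a small, entirely hypothetical history; the commit names and helper functions are invented for the example.

```python
from typing import Dict, List, Set

# Hypothetical history: each commit maps to its list of parents,
# first parent first. "m1" and "m2" are --no-ff merges of feature branches.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "a1": ["root"],
    "a2": ["a1"],
    "m1": ["root", "a2"],  # merges feature branch a1..a2 into master
    "b1": ["m1"],
    "m2": ["m1", "b1"],    # merges feature branch b1 into master
}


def first_parent_chain(tip: str, parents: Dict[str, List[str]]) -> List[str]:
    """Walk from tip to the root, following only the first parent."""
    chain = []
    current = tip
    while current is not None:
        chain.append(current)
        commit_parents = parents[current]
        current = commit_parents[0] if commit_parents else None
    return chain


def all_ancestors(tip: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every commit reachable from tip, including tip itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


print(first_parent_chain("m2", PARENTS))  # ['m2', 'm1', 'root']
print(len(all_ancestors("m2", PARENTS)))  # 6
```

In git itself, `git log --first-parent master` gives the same one-commit-per-feature view, while a plain `git log` still shows every commit from the merged branches.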

This article is worth reading:

Reid Spencer wrote:

Convert to GIT

I am surprised no one has mentioned one of the biggest advantages of
Git, which is proper author attribution for non-core and drive-by patch
contributors. For example:

    getMangledTypeStr: clarify how it mangles types, and add tests · llvm-mirror/llvm@8f9d113 · GitHub

lists Philip Reames as the Git author but for the commit message lists
artagnon at gmail. Seeing that makes me wonder how many commits are
authored by one person and committed by another where that second person
*forgets* to mention where the patch came from in the commit message.

As for all the reasons why the LLVM project does not use Git, I wonder
why large complex projects like the Linux kernel, Wine, MinGW-w64,
GHC and many many others don't seem to have any major problems using
Git.

Erik

Lots of projects are also happy with Mercurial, or BZR, or even CVS.

Every open source community has its own established workflows by which developers interact with and ultimately contribute to the mainline repository. Those workflows, and the common use cases that lead to them, strongly impact what specific SCM arrangements will or will not work.

To take the Linux kernel as an example, they use a very different integration strategy from LLVM that is predicated on having a significant hierarchy of developers whose major roles are to act as integrators for incoming patches. Obviously they use (and built!) an SCM tool that supports their workflow.

LLVM does not use such a workflow. The outcome of several iterations of this discussion on this list has been that, for the LLVM community’s workflow, the advantages of moving mainline to git have not been seen as substantial versus our current arrangement, and it will lose some features that are useful.

Any re-opening of this discussion needs to address the fundamental question of how switching mainline to git will actively help the LLVM community’s development process, and whether those benefits will outweigh the downsides.

—Owen

Hence, “I doubt we could be bothered”, since there is an unknown amount of existing infrastructure and processes that depend on the current svn implementation that would need to be migrated.

Even though git could be used in the same way as svn, why migrate just to re-create the current workflow? Doesn’t make too much sense to me. A migration to git would have to include some other benefit, not just be change for the sake of it.