Downstream distributions as first class citizens in the LLVM repository

Hi all,
I mentioned this idea on IRC yesterday and would like to discuss it in
the wider context of the mailing list. NetBSD is about to import LLVM
and Clang into its repository; FreeBSD did so a while ago. This raises
some interesting maintenance questions. FreeBSD has followed the
LLVM/Clang releases and backported various fixes locally. After the 3.4
release, NetBSD will likely end up doing the same. In the past, this
process created some fragmentation for GCC, as various changes tended to
accumulate over time. One part of the problem was the somewhat tedious
process of getting those changes upstream; the other was the difficulty
of keeping track of who exactly had what state.

Luckily, with LLVM we are in a much better position when it comes to
getting changes integrated, so that's not an issue. There is still the
problem of keeping track of who has which additional (bug-fixing)
patches, and of release management in general. For this purpose I would
like to be able to create "vendor" branches in the main repository that
reflect exactly what is used by the corresponding downstream repository.
This would increase the visibility of changes made by any of the vendors
involved, so that others can pick up the same changes. The impact on
mailing list traffic should be low, as such changes are relatively rare
compared to the pace of mainline development. Code access should follow
practices similar to those for release management, e.g. every vendor
branch has a code owner responsible for it.
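
To make the bookkeeping problem concrete, here is a minimal sketch of
the comparison a visible vendor branch would make easy: given the set of
patches each downstream carries on top of a release, work out which ones
another downstream is missing. The type name, the function name, and the
idea of identifying patches by plain strings are assumptions made for
illustration; nothing here is part of LLVM or of any vendor's tooling.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Hypothetical: identifiers of the patches a downstream has applied on top
// of a given release (for example, upstream revision IDs).
using PatchSet = std::set<std::string>;

// Returns the patches carried by `Theirs` that `Ours` does not have yet.
// std::set keeps its elements sorted, which std::set_difference requires.
inline PatchSet missingFrom(const PatchSet &Ours, const PatchSet &Theirs) {
  PatchSet Missing;
  std::set_difference(Theirs.begin(), Theirs.end(), Ours.begin(), Ours.end(),
                      std::inserter(Missing, Missing.end()));
  return Missing;
}
```

Calling missingFrom(netbsdPatches, freebsdPatches) would list the fixes
FreeBSD carries that NetBSD might want to pick up. Without a shared
place to look, assembling the two input sets is the hard part.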

Joerg

I think Diego implemented something like this for GCC at Google.

I would really like to have something like this. However, I think there
should be just one 'vendor' branch. There are already way too many
forks of LLVM, both public and private, and having a common branch for
fixes would be very beneficial and save everyone a lot of work. Plus,
I think it would give users with private forks of LLVM an incentive to
contribute changes back to the project.

-Tom

Having a single branch doesn't work as soon as maintenance for releases
comes into play. Consider FreeBSD 10 shipping with Clang 3.3 and
FreeBSD 11 with Clang 3.4...

Joerg

Of course, what I meant was there should be one branch per version of
clang/llvm. I thought the suggestion was that there should be a NetBSD
branch, a FreeBSD branch, an Ubuntu branch, etc.

-Tom

Yes. Still the same issue applies -- it is quite difficult to keep
downstream always in sync, especially if one platform cares about
certain changes and others don't.

Joerg

<snip />

May I just add a few points

1) Won't get rid of forks - ever.. forget it
2) Branches are "free" - having a single branch for dumping things is unlikely to suit the needs of all the work by everyone
3) Having things consolidated in one more or less easy to find place is better than all over the damn place.

I think that having a single stable branch would be the most efficient way to
track bug fixes for older versions, and help reduce the maintenance
burden on people distributing LLVM. If the stable branch doesn't suit
someone's needs then they can still maintain their own branches using the
stable branch as a base. This would be my preference.

That being said, I think the multiple vendor branches would still be
an improvement over what we have now, and I would sign up for a
Mesa branch if this is the way the community decides to go.

-Tom

Could you maybe give a sampler of the kinds of things that would cause
problems?

I agree. I don’t see how a concept of “official vendor branches” is better than the concept of “stable” branches that take bugfixes. I think it would be simple and work well to just have vendors ask to get patches merged into 3.3.x or 3.4.x (whichever they are based on) stabilization branches, and then do their releases from that.

-Chris

In the FreeBSD tree, we would happily take a patch that fixed a bug that appeared on FreeBSD, even if it caused unacceptable performance regressions on OS X (for example). I would not expect an LLVM stable branch to do the same.

We also have potentially different requirements for ABI stability. We intend to ship clang, LLDB, and some binutils-like tools linked against LLVM libraries, but we explicitly do not support linking anything outside the base system against these libraries, and upgrades to the base system happen atomically. This means that we'd be happy with ABI breakage, as long as APIs used by these components were unchanged (or were simultaneously updated). In contrast, one of the goals of the stable branch that we discussed would be ABI stability.
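
The distinction between API and ABI stability can be sketched in a few lines. The two namespaces below stand for the same hypothetical library type before and after a backported fix; the names `lib_v1`, `lib_v2`, `Options`, and `clientReadsOptLevel` are made up for this illustration and are not real LLVM classes. The source-level API is unchanged, so client code recompiles without edits, but the object layout differs, so a binary built against the old headers would misbehave when linked against the new library.

```cpp
#include <cassert>

// Hypothetical library type as shipped in the older snapshot.
namespace lib_v1 {
struct Options {
  int OptLevel = 0;
  int getOptLevel() const { return OptLevel; }
};
} // namespace lib_v1

// The same type after a fix added a member in front of the existing one.
namespace lib_v2 {
struct Options {
  bool Verbose = false; // new member: changes size and member offsets
  int OptLevel = 0;
  int getOptLevel() const { return OptLevel; }
};
} // namespace lib_v2

// Client code that uses only the public API. It compiles unchanged against
// either version, which is what "API stability" buys.
template <typename OptionsT> int clientReadsOptLevel(const OptionsT &O) {
  return O.getOptLevel();
}

// The layout differs, so the two versions are not ABI compatible: code
// compiled against lib_v1 must be rebuilt before it can use lib_v2.
static_assert(sizeof(lib_v1::Options) != sizeof(lib_v2::Options),
              "object layout changed between versions");
```

When everything that links against the library is rebuilt and upgraded together, as described above for the base system, only the API half of this matters.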

I'm not entirely sure that there is a big advantage to having the FreeBSD-LLVM[1] in the LLVM tree. We import a snapshot of LLVM into the vendor branch in the FreeBSD tree and then merge it. It's quite easy for us to see any locally-applied diffs. Copying there from some LLVM-hosted svn branch wouldn't be any easier than copying there from trunk.

David

[1] We actually have a few forks of LLVM in various Perforce branches and git repositories with experimental features too, although most of those are intended to be merged upstream eventually.

Hi David,

The situation is similar to the Windows-specific patches now done for LLVM projects.

If a patch is applicable to other OSes as well, it’s committed to the main trunk as is.

If it’s really specific to Windows, it’s wrapped in #ifdef and accepted into the trunk.

In either case, it’s usually a negligible burden for the developer submitting Windows patches to make sure the patch plays well with the main trunk.
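
As a rough illustration of the #ifdef approach, the sketch below keeps the platform-specific part in one small guarded function and leaves the rest shared. `nativePathSeparator` and `joinPath` are hypothetical helpers written for this example, not functions from the LLVM tree; `_WIN32` is the macro that compilers targeting Windows predefine.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration; not an actual LLVM function.
// Only the separator is platform-specific, so only that part is guarded.
inline char nativePathSeparator() {
#ifdef _WIN32
  return '\\'; // Windows-only branch
#else
  return '/'; // every other platform
#endif
}

// Shared code path: identical on all platforms.
inline std::string joinPath(const std::string &Dir, const std::string &File) {
  assert(!File.empty() && "file name must not be empty");
  if (Dir.empty())
    return File;
  return Dir + nativePathSeparator() + File;
}
```

A patch shaped like this touches the common code path on no other platform, which is why it can go into trunk without a separate branch.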

So LLVM is doing well without a Windows branch.

Would FreeBSD be any different?

Yaron

This is more or less the model we follow in GCC. Vendors (and, really,
anybody) can create their own branches.
These branches are usually created off of the main release branch for
a particular version. They are completely under the control of the
third party who created them and they usually contain a few additional
patches over the standard release.

The only requirement for third parties is that they should never
allow a bug in their branch to be filed as a GCC bug unless the same
bug can be reproduced in either trunk or the FSF release branch.

This model has worked well for many years.

At Google, we have a slightly more elaborate branching scheme because
of the amount of work we put in the compiler:

- There is one branch off of trunk (called 'integration') which we use
as a base for internal development. This branch contains a few patches
that are needed to integrate into our build system. These are usually
small changes which either make no sense to anyone else but us, or we
have not yet been able to send upstream. Ideally, this branch should
not exist.

- The branch where we do most of our development is called 'main'. It
contains the bulk of all the changes we make to the compiler for peak
performance in our applications. This branch is a buffer for major
development, which allows us to keep an internal development/release
schedule that is independent of upstream GCC. We are constantly
taking patches out of this branch and proposing them for trunk.

- Release branches. These branches are created from our 'main' branch
and the current FSF release branch. We use them for our internal
releases. They also act as continuous release branches that follow
all the minor releases from the FSF. We merge changes from the
upstream release branches to get bug fixes and minor feature changes.

All these branches are publicly available from the FSF repository.
I've been in environments where branches are kept behind closed doors.
They are nothing but a big headache to maintain.

Diego.