[RFC] Introduce an LLVM "Incubator" Process

Hi all,

Today, we maintain a high bar for getting a new subproject into LLVM: first a subproject has to be built far enough along to “prove its worth” to be part of the LLVM monorepo (e.g. demonstrate community, etc). Once conceptually approved, it needs to follow all of the policies and practices expected by an LLVM subproject.

This is problematic for a couple of reasons: it implicitly means that projects have to start somewhere else but proactively decide to follow LLVM design methodology and principles in the hope of being accepted. It is sometimes socially difficult to get these projects going because there are many other forces that could encourage other practices. For example, I personally encountered this at Google with MLIR (“why aren’t you using Google coding standards?”), several of us are currently discussing this in a new skunkworks project in the “compilers for hardware” world, and Flang and other projects have found this challenging in the past. Once a project reaches critical mass with the “wrong” approach, it is very difficult and expensive to convert to the LLVM style, and from a social perspective, inertia sometimes leads to forking off into separate projects instead of folding back into LLVM.

A former colleague recently suggested the idea of introducing an incubator process of some sort (e.g. xref the Apache version of this idea). I think this is a really interesting idea, and it is much easier now that the majority of the “official” code is in the LLVM monorepo.

Here is a sketch of how this could work:

  • We maintain the same high bar to get into the LLVM monorepo, LLVM CI etc. No change here.

  • We have a very light-weight proposal process that allows people to create incubator projects in the LLVM organization, with no code up front. The project would be required to have e.g. a charter document and README.

  • Such projects are required to follow the LLVM developer policy, coding standards, CoC, etc, but can define their own stability and evolution process, code owners, etc.

  • When the project is ready to graduate, it would follow the existing process for becoming a first-class part of the monorepo.

  • We have some policy on when to retire/delete projects, which can be ironed out the first time it comes up (e.g. start with a nomination).

  • We could even encourage new projects to include a ‘mentor’ who has experience with the LLVM project to help nudge things in the right direction and encourage a proper development approach.

What do you think? Is anyone interested in helping to write up a more detailed proposal?

-Chris

+1 from me. I think this is a great idea. A few things I’ve been thinking about recently:

  1. What are best practices for developing ‘out-of-tree’ projects like this? In addition to coding guidelines, code of conduct, etc., these projects would probably benefit from common structures for CMake integration (e.g. LLVM_EXTERNAL_PROJECTS vs. living in llvm/projects vs. CMake configuration export?), buildbots, and code review. Of course, projects should probably be allowed to approach these things in new ways if they desire.

  2. Managing dependencies is often tricky, and could arise between several incubated projects. Even the ‘simple’ dependency on LLVM can be tricky to manage. I’ve often done this by using git-submodules, but this is far from perfect. Some projects put effort into supporting lots of different LLVM versions, with varying levels of success.

  3. Some LLVM projects have nice extensibility interfaces (e.g. “Targets”) which make it relatively easy to live out of tree. Other LLVM projects lack such well-defined interfaces, which commonly results in merge conflicts. Some projects could be relatively tightly integrated, whereas others (MLIR) might not.

Steve

Don't have especially strong feelings in any direction - though I can
see/do appreciate the concerns & the friction on both sides (the
external project looking to fold into LLVM, and LLVM itself dealing
with a project/community/culture that diverges from LLVM itself, etc).
How did some of the earlier projects start - such as Clang, or
Compiler-rt?

Also, this sounds like it might be a bit of a generalization of the
"experimental targets" process? Could that process be subsumed by this
new more general process, perhaps?

Hi Chris,

I think this is a great idea. Indeed, MLIR would have been a good case.

In addition to your points, the problems we have to worry about are also:

- The more subprojects we have, the larger the repo will be, the
harder it will be to know what's what.
- Adding and removing projects can create infrastructure problems in
testing and downstream processes.

What level of community adoption do we want before starting a new
project? A couple of people who are "really" interested and fast
moving? A lot of people and a slow moving project? I think all of the
above, but that would pollute the monorepo, creating the second problem.

So, while the idea is great and I'd love to see it running, I think
some deeper consideration should be done in answering the question:
how do we decide where the new project is going to live?

I can see three possible answers: outside of the LLVM project, inside
the project as a new repo, inside the monorepo. Each with different
criteria, and a clause where community pressure can revise those
criteria at some time, and we could move (not necessarily remove) the
projects later.

Makes sense?

cheers,
--renato

Hi Renato,

To clarify, I’m specifically proposing that these “incubator” projects be in the LLVM organization (under GitHub.com/llvm) but *not* in the LLVM monorepo.

-Chris

Ah! Sorry, I misread. That makes more sense. LGTM! :)

Thanks Chris. As the “former colleague” my +1 is a bit implied :)

I took the liberty of drafting an actual proposal doc: https://github.com/stellaraccident/llvm-www/blob/master/proposals/LP0002-LLVMIncubator.md

From a process perspective, I’m not entirely clear on the next steps here (and this is the first proposal after the proposal to have a proposal process – so I guess we’re dogfooding it). In my mind, even though there seems to be consensus on this RFC thread to move forward, this seems like a large enough change that we should commit a proposal to memorialize it (I imagine we’re going to revise it over the years, and the history will be useful). Should I create a separate “PITCH” thread or just commit a version of the above proposal for further revision? I’m also happy to send it out for an actual review but have actually never made changes to the llvm-www repo and don’t know how we review such things.

Happy to do whatever to move this forward!

(and sorry for the odd “from” name – I usually respond from a different mail client on this account and never realized my profile was set up wrong. My real name is Stella Laurenzo, not Stellar Accident)

Hi Stella,

I’ll give a +1 to this as well. This is something that’s been bugging me for a while - especially as I’ve helped review some of the examples brought up and said “why wasn’t this done…”.

Thanks for writing this up.

-eric

My understanding of the proposed process is that the proposal/PITCH process is only happening on controversial RFCs:

If it can be resolved through normal means, great - no need for additional process. We expect this to continue to be the common case. If a discussion turns controversial, escalate the RFC into a “proposal pitch”, to help frame both sides of the discussion.

The current RFC thread does not seem to have reached this point.

Hey Chris,

Since the “incubator” aspect of this is that these projects are intended to ultimately be integrated into the monorepo (or be deleted from the organization), that means that the charter for starting such a project should have consensus that it conceptually belongs in the monorepo eventually, right? I assume this is the main part of the “light-weight proposal process”?

+1 for me overall.

Thanks,

I think that logically follows, but I would be explicit about evaluating proposals fairly pragmatically (and we could write down some guidelines). There are multiple reasons that I have seen for wanting such incubation repositories, and many of them fall more into the governance and community-alignment category versus a strictly visible code connection to the monorepo (from the outset). There is also an element to incubating new things that creates a certain evolutionary expectation (i.e. might prompt further thought or organizational work on the monorepo vs just a “it would slot in here” kind of judgment). Also, I’ve seen a “what you can’t see and talk about, you can’t collaborate on” theme that would bias me to have more of a guideline-driven default accept policy vs consensus-based evaluation of the technical details. It should be fairly easy to get in, and consequently also, fairly easy to move out (in whole or part) if the alignment doesn’t emerge.

If you were to give me 60 seconds to formulate what my checklist for consideration would be, here are some of the points:

  1. Is this expected to build on or extend by way of dependency an existing core LLVM project?
  2. If successful, could this project provide components that would be of use to a broader audience that uses LLVM technology?
  3. Are the project goals roughly “in kind” with other parts of LLVM (i.e. more language/compiler based rather than, say, word processor based)?
  4. Is there potential, tangible mutual advantage to having the LLVM Foundation’s governance model applied to the project from its inception?
  5. Will the LLVM Foundation hosting this project aid LLVM contributors to better collaborate on the project?

If it were me voting, I would consider #4 and #5 to be sufficient to vote yes, regardless of any ambiguity or unresolved issues about technical alignment. That is all off the top of my head and not deeply thought through.

So in short, at the outset, I would expect to at least be able to imagine a future where some portion of the project graduates to the monorepo, but I would be fairly lenient on being able to see the path from the outset.
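
To make the "#4 and #5 are sufficient" rule concrete, here is a minimal sketch of the checklist as data. The class, field, and function names are all hypothetical, and the function only tests the one sufficient condition stated above; it deliberately says nothing about how to vote when that condition is not met, since that was not spelled out.

```python
from dataclasses import dataclass


@dataclass
class IncubatorChecklist:
    """Answers to the five questions above (all field names are hypothetical)."""
    builds_on_core_llvm: bool         # question 1
    useful_to_broader_audience: bool  # question 2
    goals_in_kind_with_llvm: bool     # question 3
    governance_advantage: bool        # question 4
    hosting_aids_collaboration: bool  # question 5


def meets_sufficient_condition(c: IncubatorChecklist) -> bool:
    """True when questions 4 and 5 are both answered yes.

    This is only the sufficient condition from the text; a False result
    means "needs further discussion", not "reject".
    """
    return c.governance_advantage and c.hosting_aids_collaboration


if __name__ == "__main__":
    proposal = IncubatorChecklist(False, True, True, True, True)
    print(meets_sufficient_condition(proposal))
```

Questions 1 to 3 are kept as fields so they stay visible in a review, even though they do not affect this particular check.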

Sounds good. I wasn’t looking to add process, but it was suggested off-thread that if I had the time, it might be useful to write a proposal doc. And then that is where I found the (new) proposal process.

It seems like the proposal process was created primarily for formalizing decision making for controversial items. Is it possible that it can/should also be used to memorialize “significant” policy decisions (even if not controversial)?

Hi Chris, hi folks,

The idea is great, obviously :)

Here are some questions/concerns regarding the development process of such projects.

TL;DR: we need to outline a development process and create a template for new incubated projects.

1. Can those projects use pull requests?
If yes, would they have to switch to Phabricator as soon as they land in the monorepo?
If not, would they have to start with Phabricator from the very beginning?
Either way, it could be a barrier/needless overhead for contributors.

2. Build system integration.
If I understand correctly, projects in the monorepo should strictly follow LLVM’s approach to CMake.
This implies some limitations, or additional overhead for the maintainers if they want to overcome those limitations.
Anyhow, I think it is necessary to have a template for new projects that want to participate in the incubator program.

3. Continuous Integration
Currently, there are a number of great services that provide CI for OSS projects, while LLVM uses its own infrastructure for that.
So which CI system should the projects use? Extending the LLVM infra for each incubated project seems like overhead for the infra team.
On the other hand, maintainers would have to migrate their CI setup to the LLVM infra as soon as the project gets into the monorepo.

These concerns come from practical experience: if I wanted to include Mull[1] in LLVM at this stage, I would have to:

- rewrite the build system
- give up pull requests
- give up the CI setup and
- (as a consequence) give up nightly builds and easy binary distribution
- reformat the source code :D

[1] https://github.com/mull-project/mull

This seems pretty reasonable. There is a technical process question:
What does the incubator project's Git repository look like?
Specifically, I would recommend that projects that are serious about
this are started as a fork of the llvm-project mono-repository. They'd
live as a separate GitHub repository in the github.com/llvm/
organization, but share history. There'd be an expectation that such
projects regularly merge LLVM master (or rebase on top of it, but that
seems less likely for long-running projects) -- maybe once per LLVM
release cycle initially, and then more frequently as the project
becomes serious about being integrated in the monorepo. This allows an
eventual smooth merging of the project while keeping all history
without weird artefacts.
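
For illustration, the "fork that shares history and regularly merges LLVM master" workflow might boil down to a few git commands. The sketch below only builds the command lists and does not run them. The helper name `upstream_sync_commands` and the remote name `upstream` are hypothetical, the `master` default simply mirrors the branch name used above, and the URL in the usage lines assumes the main monorepo is the merge source.

```python
from typing import List


def upstream_sync_commands(upstream_url: str,
                           branch: str = "master") -> List[List[str]]:
    """Build the git commands for merging the monorepo into an incubator fork.

    The first command is a one-time setup step: `git remote add` fails if a
    remote named "upstream" already exists. The last two would be repeated
    on each sync.
    """
    return [
        ["git", "remote", "add", "upstream", upstream_url],
        ["git", "fetch", "upstream"],
        ["git", "merge", f"upstream/{branch}"],
    ]


if __name__ == "__main__":
    url = "https://github.com/llvm/llvm-project.git"
    for cmd in upstream_sync_commands(url):
        print(" ".join(cmd))
```

A merge-based flow like this keeps the incubator's own commits intact, which matches the stated preference for merging over rebasing in long-running projects.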

Cheers,
Nicolai

  1. What are best practices for developing ‘out-of-tree’ projects like this?

We have the example of Polly here, which recently adopted the “compiler extension” framework, which makes it possible to write new passes that work either as plugins or built-in passes, based on configuration options. I think that’s something we should favor when it makes sense.

Had there been an llvm incubator process, I would have encouraged flang to apply.

Flang became an LLVM subproject this spring after a few years of development in github/flang-compiler.

The LLVM development process and tools are basically the same as what we had been using on GitHub. Transitioning to Phabricator was not difficult.

The main technical difficulty with the transition was the integration of flang with the llvm build system (cmake files). LLVM is big, the build system is complex, test builds are slow, and there are a great many different configurations that must be supported.

The volume of changes coming from llvm-project is huge compared to flang-compiler/f18 -- we dialed back our CI to only trigger builds when files under flang or mlir change.
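
A path filter of the kind described here could look roughly like the following. This is a minimal sketch, not Flang's actual CI configuration: the function name `should_trigger_build` and the `WATCHED_PREFIXES` tuple are hypothetical, and it assumes the CI system can supply the list of paths changed by a commit, relative to the repository root.

```python
from typing import Iterable, Tuple

# Hypothetical: top-level directories whose changes should trigger a build.
WATCHED_PREFIXES: Tuple[str, ...] = ("flang/", "mlir/")


def should_trigger_build(changed_paths: Iterable[str],
                         prefixes: Tuple[str, ...] = WATCHED_PREFIXES) -> bool:
    """Return True if any changed path lies under a watched directory."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return any(path.startswith(prefixes) for path in changed_paths)


if __name__ == "__main__":
    # Example paths are made up for illustration.
    print(should_trigger_build(["clang/docs/index.rst"]))
    print(should_trigger_build(["llvm/README.txt", "mlir/docs/index.md"]))
```

The trailing slash in each prefix matters: it keeps a sibling directory whose name merely starts with "flang" from matching.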

- Steve

Generally +1 on the idea.

This does sound like an extension of the existing “experimental backend” idea. At least at first, it sounds like the two are separate - experimental backends live in monorepo, incubation projects don’t - but there’s definitely some experience we can learn from and adapt.

One concern I have is about fragmentation and branding. Specifically, I’m not sure an end state with a bunch of incubator projects under the LLVM umbrella with distinct developer communities is something we’d want to encourage. This might need some discussion more broadly, but a few specific ideas:

  • Maybe we should require each incubator proposal to have a sponsor from within the existing community? This doesn’t have to be the lead or proposal author, but someone who already contributes who stands up and says they think this is beneficial to LLVM long term, and are willing to put some level (tbd) of supervision and steering into it.
  • Maybe we should be careful about the wording we require for describing such a project? Perception matters, and I’m hesitant to see discussions about “bugs in LLVM” if one incubator has quality issues. Maybe specifically require READMEs to be explicit about incubation status and strongly discourage the use of “an LLVM project” and related phrases for incubators? (e.g. Recommend “X, an LLVM incubator” instead?)

If we’re going to have looser standards for incubators, I think we need to be very explicit about eventual “promotion”. It needs to be very clear in our definition of incubator which items must be fixed for inclusion in the monorepo, and which items must be fixed to be non-experimental. (If experimental is a distinction we maintain, at least.)

Philip

Fwiw, the ASF Incubator has guidelines to similar ends: http://incubator.apache.org/guides/branding.html (see specifically the “Disclaimers”).

If you’d like a guinea pig for such a project, then you might consider the LLVM fork I’m working on now.

I’m developing an LLVM backend that emits code for a classic, well-known 8-bit architecture. As of this writing, the backend can assemble and disassemble machine code for the architecture, including fixups, relocations and relaxations. It also emits ELF and object files that interoperate with lld, llvm-objcopy, llvm-mc, etc.

I’ve tried to be a good boy and follow LLVM conventions, including forking from master and rebasing my changes into sensible pieces. I also run full builds on each checkin for three platforms. But I’m not 100% sure I’ve done the groundwork correctly.

Rather than post a link to my repo here, I’d appreciate the opportunity to submit the repo into your process, and see if I can get it into your incubator.

Thanks for your kind consideration.