I think it might be productive to fork this thread to discuss making the requirements for upstreaming a new LLVM target more explicit and open. I’d also like to gauge interest in an idea I’ve discussed privately with a few community members, namely the concept of having a semi-official “incubation” system whereby proposed backends could get a trial run before becoming part of LLVM mainline.
The proposed system would be something like this: a proposed contribution would receive a branch on llvm.org, and have six months (or some other predetermined length of time) to demonstrate that it and its developers are ready for mainline integration. At the end of their term, incubated projects would be evaluated on the following criteria, and either integrated to mainline, judged to be more appropriate as an external project, or given an extension and “needs improvement” feedback on specific criteria.
Active maintainership - Backends bit rot quickly, and unmaintained backends are large maintenance burden on everyone else. We need a core of developers who are going to actively maintain any candidate backend on mainline. That last point is critical: a code drop every six months is not an acceptable level of maintenance for a mainline target.
Contributions to core - Mainlining a new backend adds the expectation that mainline LLVM developers will invest the time and energy to keep your backend building and working (see test plan, below). However, that expectation of extra work doesn’t come for nothing: we expect you to contribute back fixes and improvements that you find, and to work with other community members to coordinate projects as appropriate. When looking at a new backend, I should expect to see few-to-no diffs outside of lib/Target/YourBackend, and a few other places (Triple.cpp, for example). All other changes should already be upstreamed.
Test plan - If you’re going to expect us to maintain and fix your code, then you need to have a good answer to how to test it. This includes, but is not limited to, a good set of regression tests that are comprehensible to normal developers (so we can fix them when they fail due to mainline change), and continuous testing in the form of buildbots or other infrastructure (so we can know when a patch breaks your backend).
Up to date with mainline - All mainline backends must work with top-of-tree LLVM, all of the time. A candidate for inclusion must be developed at, or close to, mainline. In practice, that probably means updating at least once a week, possibly more.
LLVM coding standards - While small deviations can be fixed after mainlining, gross violations of the LLVM code standards and conventions must be fixed prior to integration.
Actually a branch is a better idea for a couple of reasons:
1) It'll help show that you're dedicated to the idea of maintaining the target, and
2) If the overhead of merging is too high for the target there are probably patches
that need to be either a) upstreamed or b) the target needs to be rewritten since it
shouldn't affect the main code base.
The part about other users is a good idea, but how often are users (not developers)
checking out a copy of random ToT llvm in order to use it?
I am in favor of this. I think having specific criteria and time lines
will be beneficial for both maintainers and reviewers.
However, instead of having a separate branch, what do you think about
adding the backend to the main tree, but not building it by default.
This would make it easier for the backend maintainer to keep it up to
date and also make it easier for users to test it. At the same time,
the backend maintainer would still be responsible for updating it for
changes to the LLVM core API, so other developers wouldn't need to worry
about breaking the "backend-in-training".
This is very common for users of the Mesa Project, which the R600/SI
backend is currently a part of. Also some distributions package the
current TOT of various projects to offer users an easy way to try out the latest code.
[...] Also some distributions package the
current TOT of various projects to offer users an easy way to try out the latest code.
This would still require going to every such distribution and telling
them to flip the special build flag that enables building the
While I have no issue with having a more complete and documented method of submitting backends, the problem is the barrier to entry for some backends is being significantly raised, where they did not exist in the past. In the past AMD has reported issues that we have found from internal development to LLVM, along with patches in some cases. Some have been fixed, but others are unique to our backends and still are not in mainline. Many times the response to attempts to get them fixed has been to get the backend in the mainline development tree first. Now that we are finally able to push our backends out publicly, we are getting pushback that we need to contribute in other places first. While I completely agree with 4 of the five points below, the requirement that we contribute to core for things that are outside the scope of the backend seems overly onerous. Tom's AMDGPU backend is not the only backend AMD would like to push into mainline, as our production AMDIL backend is in the pipeline to be added and AMD has announced that the HSA foundations compiler will also be open sourced(which is a third backend from AMD, plus more, see http://www.phoronix.com/scan.php?page=news_item&px=MTEyMzQ). I would just hate to see these get delayed because of barriers to entry that seem artificial or out of scope of the proposed/required changes.
So I guess the issue is, what else is required to the backends to make them acceptable for LLVM.
Just to add my two cents to this discussion, I completely agree with Micah that expecting patch submitters to first contribute to other, unrelated parts of LLVM is unreasonable. Some of us contribute to the project not by working on the core compiler but by using it in our projects. Most interesting projects (like a code generator) require a lot of TLC; developers of such projects do not have time to work on unrelated problems.
When someone asks that their code be integrated into LLVM, I think it's fair to ask whether there is a long-term maintainer and whether anyone will use the new feature(s). Asking them to make their code more maintainable or to conform to coding standards is fair. Asking people to work on unrelated tasks is a waste of their time.
-- John T.
The idea behind asking folks to contribute to the core llvm bits is two-fold:
1) Pretty much every new back-end, especially one as different as a GPU (when compared to the normal RISC-like backends LLVM is used to), exposes bugs, inefficiencies, and unsupported edge cases, at the least. Improvements to those things can, and should, be done on trunk as much as possible.
2) Mainline patches that are smaller and of a more easily reviewed nature than "here's a new backend" give new developers a chance to develop a solid working relationship with the rest of the core developers and to get a better feel for what folks mean by phrases like, "llvm best practices," "llvm coding style," and "incremental development." This ties in well with (1), as patches for those sorts of things are excellent candidates for (2).
That is, the idea is to leverage the target independent bits of the work a new back end will require anyway to accomplish building the trust and history that's being requested prior to committing a large new backend to trunk. It's not additional work, but rather a means to use work that already needs to be done in order to accomplish the political goals.
While I have no issue with having a more complete and documented method of submitting backends, the problem is the barrier to entry for some backends is being significantly raised, where they did not exist in the past. In the past AMD has reported issues that we have found from internal development to LLVM, along with patches in some cases. Some have been fixed, but others are unique to our backends and still are not in mainline. Many times the response to attempts to get them fixed has been to get the backend in the mainline development tree first. Now that we are finally able to push our backends out publicly, we are getting pushback that we need to contribute in other places first. While I completely agree with 4 of the five points below, the requirement that we contribute to core for things that are outside the scope of the backend seems overly onerous. Tom's AMDGPU backend is not the only backend AMD would like to push into mainline, as our production AMDIL backend is!
Yeah, in these cases, we did not have the authority to submit our backends
until relatively recently(last December was when legal/code approval
finally came through).
So like I mentioned in the other email, the bottleneck is getting the
code into the public, from that point on, we can work on conforming to
the LLVM style. I agree with the incubation style approach, that could
solve these issues.
I imagine that the linux kernel development process has a similar
issue with adding new architectures since the parallels are pretty
significant (incremental development, time-based releases, etc.). It
might be a huge win for one of the more senior community members to
inquire on the LKML about how this similar problem is handled in their
community, whether there is anything that LLVM is currently doing that
"didn't work" for them, and general advice.