I’d like to start a discussion about how to address some important issues we have in the LLVM community regarding our license and patent policy. Before we get started, I’d like to emphasize that this is an RFC, intended for discussion. There is no time pressure to do something fast here – we want to do the right long-term thing for the community (though we also don’t want unnecessary delay, because there are real problems that need to be solved). My primary goal is to kick off the discussion now so that we can use time at the LLVM Dev Meeting to talk about ideas and concerns in person.
If you’re not familiar, our current relevant policies are laid out here, in the “license” and “patents” section:
This is a long email, so I’ve broken it up into three sections:
- The problem
- Potential solutions
- Recommended path forward
The TL;DR version of this is that I think we should discuss relicensing all of LLVM under the Apache 2.0 license and add a runtime exception clause. See below for a lot more details.
We, as a community, have three major problems to solve in this space. The first two are active problems, the third is a lurking one:
- Some contributors are actively blocked from contributing code to LLVM.
These contributors have been holding back patches for quite some time that they’d like to upstream. The root issue is that the wording in the Patent section is fuzzy, novel to LLVM, and overly broad. Corporate contributors (in particular) often have patents on many different things, and while it is reasonable for them to grant access to patents related to LLVM, the wording in the Developer Policy can be interpreted to imply that unrelated parts of their IP could accidentally be granted to LLVM (through “scope creep”). I will not make any claims about whether these fears are founded or not, but I will observe that this is a serious and current problem for the community.
- We cannot easily move code from LLVM to compiler_rt (for example).
We currently have a schism between the compiler code (licensed under the UIUC license) and the runtime library code (licensed under both the UIUC and MIT licenses). This schism exists because the UIUC license carries what is known as a “binary attribution clause”. This clause requires someone to acknowledge their use of LLVM (e.g. in a readme, or an “about” panel) if they link part of the compiler itself into their app. This is reasonable for the compiler, but isn’t reasonable for runtime libraries - you shouldn’t have to acknowledge LLVM just because you happened to build your app with Clang!
Our previous approach to solving this problem was to dual license the runtime libraries under both the UIUC and MIT licenses, because the MIT license doesn’t carry a binary attribution clause. This solved the attribution problem but prevents us from moving code from LLVM to compiler_rt, because the contributor may not have agreed to the use of the code under the MIT license. This is an active problem for ASAN and other technologies that might (for example) want to use MC disassembler functionality from a runtime library.
- The patent coverage provided by the Developer Policy may not provide the protection it intends.
The UIUC and MIT licenses have nothing to say about patents, and the wording in the patents section of the Developer Policy was written by a well-intentioned grad student, not by a lawyer. The wording is fuzzy, imprecise, and potentially incomplete and so it probably doesn’t provide the protection it intends. Fortunately, to my knowledge, this protection hasn’t been tested, but if it ever was, it would be very bad for the community overall. Lack of protection could also lead to a potential user deciding not to use LLVM, if they perceived such use to be too risky.
The board decided to explore this area, and several of us have spent months looking into this space to see what we can do to improve the situation. This is a complicated topic that deals with legal issues and our primary goal is to unblock contributions from specific corporate contributors. Because of that, we’ve done some leg-work exploring options with a few of the largest contributors in this space. The board’s role here is to organize and facilitate a discussion that leads to progress, and I’d like to share what we’ve found so we (as a community) can discuss the options further.
We have four major options here, each of which implies that the license/patent section of the Developer Policy would be removed and rewritten. Once we agree on a course of action, we can discuss exact wording and roll-out strategies:
- We could introduce a novel legal solution.
We discussed a large number of different solutions that involve coming up with something new. For example, we could write an entirely new license, we could write an entirely new patent grant, we could take existing license or grant language and modify it, etc. This approach seems appealing in that we could get something specifically tailored to LLVM and its contributors.
Unfortunately, there are many problems with this. First, using well-known license and patent mechanics makes it much easier for new contributors to get permission to use and contribute to LLVM, because this often requires approval from a corporate legal team, and reviewing something novel takes a long time and a lot of energy. If we go with a well-known, standard approach, everything goes more simply and quickly.
Second, in legal circles, a novel solution (no matter how well intended) is almost always considered to be a bad thing, because it hasn’t been legally tested and hasn’t been scrutinized as much as existing ones. Third, designing such a thing is extremely complicated, because lawyers will all want to optimize the result for their specific organization’s interests, so agreeing on such a new document will take a very long time and may prove impossible. We spent many months talking about possibilities in this space, so we’ve seen some of this in action with no closure in sight.
- We could require new contributors to sign the Apache CLA.
The Apache CLA (the Google CLA is one example of it) is a well-known and highly respected way to define and scope guarantees when it comes to patent contribution and coverage. These are the specific forms I’m referring to: the first is for individuals and the second is for corporate contributors:
The upshot is that adding the Apache CLA would solve some real problems. It would unblock contribution by properly scoping patent contributions and it would ease fear of being sued. It wouldn’t help us with the binary attribution clause problem with the runtime libraries, but we could add a runtime exception to solve that. Rolling this out would be very straightforward: we could start requiring the CLA for new contributions, which provides guarantees going forward, and we could even try to get earlier contributors to provide retroactive coverage for previous contributions.
Unfortunately, adding the Apache CLA also has several disadvantages:
It adds new barriers for new contributors to LLVM. We don’t currently have a process where you need to sign (or click through) a form, and adding one is a barrier in certain situations (e.g. it requires individuals to disclose sensitive personal information like mailing addresses etc, and may require extra levels of legal approval in corporate situations).
It significantly increases the burden for the LLVM project, because we need to track a lot more data for each contributor (including their current employment affiliation) to verify whether we are allowed to accept a patch. Engineers move between companies all of the time, and it is problematic (and invasive of personal privacy) for llvm.org to have to know about this.
The CLA requires corporations to keep an updated list of which employees are allowed to contribute from their company (see Schedule A of http://www.apache.org/licenses/cla-corporate.txt). This is a significant burden on llvm.org as well as on corporate contributors. The logical end result of this is that a company would designate a few people as contributors and funnel many other people’s work through them, which is not good for the existing successful engineering culture of the project.
The CLA has specific wording that lawyers at multiple prominent corporate contributors are reluctant to sign. The changes they request are small, but once we make changes to the CLA, it is now a novel document and we end up with all the problems of solution #1.
The CLA also provides power that I (personally) don’t think we “want” as a community. For example, it would allow the LLVM Foundation to arbitrarily relicense the project without approval from the copyright holders. While it may seem ironic based on what I’m suggesting below, I think that it is in the best interest of the project for any relicensing effort to be a painful and expensive process that requires contacting all contributors. Changing the license of the project is a big deal, and not something we should do frequently. Further, some individuals and corporations are wary of contributing to a project when their code can be taken and relicensed to something that they didn’t agree to, and we may lose them if we start requiring that.
- We could relicense all of LLVM under the Apache 2.0 license and add a runtime exception.
The Apache 2.0 license is a well known, widely used, and highly respected way to define and scope guarantees when it comes to patent contribution and coverage. This is the license in question:
The runtime exception would be a short addendum. For example, it could be something like:
“As an exception, if you use this Software to compile your source code and portions of this Software are embedded into the binary product as a result, you may redistribute such product without providing attribution as would otherwise be required by Sections 4(a), 4(b) and 4(d) of the License.”
This approach solves all three of the problems above: it would unblock the contributors in question by providing a known scope of patent grant based on contribution and it provides patent protection to users of LLVM. Adding a runtime exception solves the runtime library issue, and is a standard way that compilers do this sort of thing (e.g. GCC does something similar to avoid compiled code having to be GPL compatible).
Switching LLVM to the Apache 2.0 license has some large advantages:
We believe that we can use the license as-is, which avoids having to design a novel solution. Many companies already contribute to projects that use the Apache 2.0 license. Some of the companies we have spoken to have responded with comments like “Apache 2? Of course that would be fine with us.”
The patent coverage in the license is “self executing,” which means that coverage is associated with a contributor putting code into the repository. There are no additional CLAs to agree to, no book-keeping or process issues for llvm.org or corporate contributors, no personal information that needs to be distributed, etc. A company or individual merely needs to decide whether they want to contribute a patch or not (a decision they need to make in any case!) and everything else is automatic.
However, it also has one big disadvantage: the time and cost of doing it. Relicensing a large code base with many contributors is expensive, time consuming, and complicated. That said, it has been done with other large projects before (e.g. Mozilla) and in our early analysis, we believe it can be done with LLVM too.
- We could switch to another well-known solution.
We discussed many different well known licenses, CLAs, and other approaches to solving these problems. None of the ones considered fared as well as the Apache CLA or the Apache 2.0 license, but of course something may have been missed.
Recommended path forward
NOTE: This is an RFC - this is a recommendation, not a dictate
This probably isn’t a surprise at this point, but the LLVM Foundation board and I recommend that we relicense all of the LLVM Project code (including clang, lldb, …) currently under the UIUC and MIT licenses (i.e. not including the GPL bits like llvm-gcc and dragonegg) under the Apache 2 license. We feel that it is more important to do the right thing for the long term, even if it is a complicated and slow process, than to do the wrong thing in the short term.
It will take quite some time to roll this out - potentially 18 months or more, but we believe it is possible and have some thoughts about how to do it. We have confirmed that choosing to go down this path would immediately unblock contributions from the affected companies, because we could start accepting new contributions under the terms of the UIUC/MIT/Apache licenses together and repeal the wording in the developer policy. If we get broad agreement that this is the right direction to go, we can lay out our early ideas for how to do it, and debate and fully bake that plan in public discussion.
With all this said, I’d love to hear what you all think. If you have a specific comment or discussion about one aspect of this, it might be best to start a new thread. I’d also appreciate it if you would mention whether you are a current active contributor to LLVM, whether you are using LLVM but blocked from contribution (and whether this approach would unblock that), or whether you’re an interested community member who is following or using LLVM but not currently contributing (or on the edge of doing so). I’d be particularly interested if you are blocked from contributing but this proposal doesn’t solve your concern.
If you plan to attend the developer meeting and would like to discuss it there, the LLVM Foundation BOF would be a great place to talk about this.