RFC: Improving license & patent issues in the LLVM community

Hi Everyone,

I’d like to start a discussion about how to improve some important issues we have in the LLVM community, regarding our license and patent policy. Before we get started, I’d like to emphasize that this is an RFC, intended for discussion. There is no time pressure to do something fast here – we want to do the right long-term thing for the community (though we also don’t want unnecessary delay, because there are real problems that need to be solved). My primary goal is to kick off the discussion now so that we can use time at the LLVM Dev Meeting to talk about ideas and concerns in person.

If you’re not familiar, our current relevant policies are laid out here, in the “license” and “patents” sections:
http://llvm.org/docs/DeveloperPolicy.html#license

This is a long email, so I’ve broken it up into three sections:

  1. The problem
  2. Potential solutions
  3. Recommended path forward

The TL;DR version of this is that I think we should discuss relicensing all of LLVM under the Apache 2.0 license and add a runtime exception clause. See below for a lot more details.

The problem

We, as a community, have three major problems to solve in this space. The first two are active problems, the third is a lurking one:

  1. Some contributors are actively blocked from contributing code to LLVM.

These contributors have been holding back patches for quite some time that they’d like to upstream. The root issue is that the wording in the Patent section is fuzzy, novel to LLVM, and overly broad. Corporate contributors (in particular) often have patents on many different things, and while it is reasonable for them to grant access to patents related to LLVM, the wording in the Developer Policy can be interpreted to imply that unrelated parts of their IP could accidentally be granted to LLVM (through “scope creep”). I will not make any claims about whether these fears are founded or not, but I will observe that this is a serious and current problem for the community.

  2. We cannot easily move code from LLVM to compiler_rt (for example).

We currently have a schism between the compiler code (licensed under the UIUC license) and the runtime library code (licensed under the UIUC & MIT licenses). This schism exists because the UIUC license carries what is known as a “binary attribution clause”. This clause requires someone to acknowledge their use of LLVM if they link part of the compiler itself into their app (e.g. in a readme, or an “about” panel). This is reasonable for the compiler, but isn’t reasonable for runtime libraries - you shouldn’t have to acknowledge LLVM just because you happened to build your app with Clang!

Our previous approach to solving this problem was to dual license the runtime libraries under both the UIUC and MIT licenses, because the MIT license doesn’t carry a binary attribution clause. This solved the attribution problem but prevents us from moving code from LLVM to compiler_rt, because the contributor may not have agreed to the use of the code under the MIT license. This is an active problem for ASAN and other technologies, that (e.g.) might want to use MC disassembler functionality from a runtime library.

  3. The patent coverage provided by the Developer Policy may not provide the protection it intends.

The UIUC and MIT licenses have nothing to say about patents, and the wording in the patents section of the Developer Policy was written by a well-intentioned grad student, not by a lawyer. The wording is fuzzy, imprecise, and potentially incomplete and so it probably doesn’t provide the protection it intends. Fortunately, to my knowledge, this protection hasn’t been tested, but if it ever was, it would be very bad for the community overall. Lack of protection could also lead to a potential user deciding not to use LLVM, if they perceived such use to be too risky.

Potential solutions

The board decided to explore this area, and several of us have spent months looking into this space to see what we can do to improve the situation. This is a complicated topic that deals with legal issues and our primary goal is to unblock contributions from specific corporate contributors. Because of that, we’ve done some leg-work exploring options with a few of the largest contributors in this space. The board’s role here is to organize and facilitate a discussion that leads to progress, and I’d like to share what we’ve found so we (as a community) can discuss the options further.

We have four major options here, each of which implies that the license/patent section of the Developer Policy would be removed and rewritten. Once we agree on a course of action, we can discuss exact wording and roll-out strategies:

  1. We could introduce a novel legal solution.

We discussed a large number of different solutions that involve coming up with something new. For example, we could write an entirely new license, we could write an entirely new patent grant, we could take existing license or grant language and modify it, etc. This approach seems appealing in that we could get something specifically tailored to LLVM and its contributors.

Unfortunately, there are many problems with this. First, using well known license and patent mechanics makes it much easier for new contributors to get permission to use and contribute to LLVM, because this often requires approval from a corporate legal team, and reviewing something novel takes a long time and a lot of energy. If we go with a well known / standard approach, everything goes more simply and quickly.

Second, in legal circles, a novel solution (no matter how well intended) is almost always considered to be a bad thing, because it hasn’t been legally tested and hasn’t been scrutinized as much as existing ones. Third, designing such a thing is extremely complicated, because lawyers will all want to optimize the result for their specific organization’s interests, so coming to an agreement on such a new document would take a very long time and may prove impossible. We spent many of those months talking about possibilities in this space, so we’ve seen some of this in action with no closure in sight.

  2. We could require new contributors to sign the Apache CLA.

The Apache CLA (the Google CLA is one example of it) is a well known and highly respected way to define and scope guarantees when it comes to patent contribution and coverage. These are the specific forms I’m referring to: the first is for individuals and the second is for corporate contributors:

https://www.apache.org/licenses/icla.txt
http://www.apache.org/licenses/cla-corporate.txt

The upshot is that adding the Apache CLA would solve some real problems. It would unblock contribution by properly scoping patent contributions and it would ease fear of being sued (and though it wouldn’t help us with the binary attribution clause problem with the runtime libraries, we could add a runtime exception to solve that). Rolling this out would be very straight-forward: we could start requiring the CLA for new contributions, which provides guarantees going forward, and we could even try to get earlier contributors to provide retroactive coverage for previous contributions.

Unfortunately, adding the Apache CLA also has several disadvantages:

  • It adds new barriers for new contributors to LLVM. We don’t currently have a process where you need to sign (or click through) a form, and adding one is a barrier in certain situations (e.g. it requires individuals to disclose sensitive personal information like mailing addresses etc, and may require extra levels of legal approval in corporate situations).

  • It significantly increases the burden for the LLVM project, because we need to track a lot more data for each contributor (including their current employment affiliation) to verify whether we are allowed to accept a patch. Engineers move around between companies all of the time, and it is problematic (and invasive into personal privacy) for llvm.org to have to know about this.

  • The CLA requires corporations to keep an updated list of which employees are allowed to contribute from their company (see Schedule A of http://www.apache.org/licenses/cla-corporate.txt). This is a significant burden on llvm.org as well as on corporate contributors. The logical end result of this is that a company would designate a few people as contributors and funnel many other people’s work through them, which is not good for the existing successful engineering culture of the project.

  • The CLA has specific wording that lawyers at multiple prominent corporate contributors are reluctant to sign. The changes they request are small, but once we make changes to the CLA, it is now a novel document and we end up with all the problems of solution #1.

  • The CLA also provides power that I (personally) don’t think we “want” as a community. For example, it would allow the LLVM Foundation to arbitrarily relicense the project without approval from the copyright holders. While it may seem ironic based on what I’m suggesting below, I think that it is in the best interest of the project for any relicensing effort to be a painful and expensive process that requires contacting all contributors. Changing the license of the project is a big deal, and not something we should do frequently. Further, some individuals and corporations are wary of contributing to a project when their code can be taken and relicensed to something that they didn’t agree to, and we may lose them if we start requiring that.

  3. We could relicense all of LLVM under the Apache 2.0 license and add a runtime exception.

The Apache 2.0 license is a well known, widely used, and highly respected way to define and scope guarantees when it comes to patent contribution and coverage. This is the license in question:
http://www.apache.org/licenses/LICENSE-2.0

The runtime exception would be a short addendum. For example, it could be something like:
“As an exception, if you use this Software to compile your source code and portions of this Software are embedded into the binary product as a result, you may redistribute such product without providing attribution as would otherwise be required by Sections 4(a), 4(b) and 4(d) of the License.”

This approach solves all three of the problems above: it would unblock the contributors in question by providing a known scope of patent grant based on contribution and it provides patent protection to users of LLVM. Adding a runtime exception solves the runtime library issue, and is a standard way that compilers do this sort of thing (e.g. GCC does something similar to avoid compiled code having to be GPL compatible).

Switching LLVM to the Apache 2.0 license has some large advantages:

  • We believe that we can use the license as-is, which avoids having to design a novel solution. Many companies already contribute to projects that use the Apache 2.0 license. Some of the companies we have spoken to have responded with comments like “Apache 2? Of course that would be fine with us.”

  • The patent coverage in the license is “self executing,” which means that coverage is associated with a contributor putting code into the repository. There are no additional CLAs to agree to, no book-keeping or process issues for llvm.org or corporate contributors, no personal information that needs to be distributed, etc. A company or individual merely needs to decide whether they want to contribute a patch or not (a decision they need to make in any case!) and everything else is automatic.

However, it also has one big disadvantage: the time and cost of doing it. Relicensing a large code base with many contributors is expensive, time consuming, and complicated. However, it has been done with other large projects before (e.g. Mozilla) and in our early analysis, we believe it can be done with LLVM too.

  4. We could switch to another well-known solution.

We discussed many different well known licenses, CLAs, and other approaches to solving these problems. None of the ones considered fared as well as the Apache CLA or the Apache 2.0 license, but of course something may have been missed.

Recommended path forward

NOTE: This is an RFC - this is a recommendation, not a dictate :)

This probably isn’t a surprise at this point, but the LLVM Foundation board and I recommend that we relicense all of the LLVM Project code (including clang, lldb, …) currently under the UIUC and MIT licenses (i.e. not including the GPL bits like llvm-gcc and dragonegg) under the Apache 2 license. We feel that it is more important to do the right thing for the long term, even if it is a complicated and slow process, than to do the wrong thing in the short term.

It will take quite some time to roll this out - potentially 18 months or more, but we believe it is possible and have some thoughts about how to do it. We have confirmed that choosing to go down this path would immediately unblock contributions from the affected companies, because we could start accepting new contributions under the terms of the UIUC/MIT/Apache licenses together and repeal the wording in the developer policy. If we get broad agreement that this is the right direction to go, we can lay out our early ideas for how to do it, and debate and fully bake that plan in public discussion.

With all this said, I’d love to hear what you all think. If you have a specific comment or discussion about one aspect of this, it might be best to start a new thread. I’d also appreciate it if you would mention whether you are a current active contributor to LLVM, whether you are using LLVM but blocked from contribution (and whether this approach would unblock that) or if you’re an interested community member that is following or using LLVM, but not currently (or on the edge of) contributing. I’d be particularly interested if you are blocked from contributing but this proposal doesn’t solve your concern.

If you plan to attend the developer meeting and would like to discuss it there, the LLVM Foundation BOF would be a great place to talk about this.

Thanks!

-Chris

Hi Chris,

IANAL etc, but I think this is a good move. Open Source license
proliferation is a serious problem and it does incur very long
delays in contributions from some companies / groups.

I can't comment on the virtues of the Apache license (versus BSD,
LGPL, etc), and I personally don't have preferences as to which
license we move to, as long as it's well known, widely used and
provides the features we need. If the Apache 2.0 (plus the binary
clause, that GCC uses too) fits the bill, LGTM. :)

The move may impose some extra work on the companies that already
contribute to LLVM but still don't have legal approval for Apache 2.0
licensed projects. But if the time frame is 18+ months, I think that's
not going to be that big of a deal.

cheers,
--renato

> 1) We could introduce a novel legal solution.

Please, no.

> 2) We could require new contributors to sign the Apache CLA.

To me, this is the most acceptable option of the listed terms.

> 3) We could relicense all of LLVM under the Apache 2.0 license and add a runtime exception.

This one I would consider a regression over the status quo. Your list is
missing "the license is significantly longer and harder to read".

Joerg

Why is this a consideration?

The Apache license is incredibly well known, and easy to analyze.

>> 1) We could introduce a novel legal solution.
>
> Please, no.
>
>> 2) We could require new contributors to sign the Apache CLA.
>
> To me, this is the most acceptable option of the listed terms.

Please explain: why?

>> 3) We could relicense all of LLVM under the Apache 2.0 license and add a runtime exception.
>
> This one I would consider a regression over the status quo. Your list is
> missing "the license is significantly longer and harder to read".

To repeat Danny’s point, this doesn’t seem like a concern to me. Please explain your concern: does this affect users of llvm, contributors to llvm, or someone else? How?

-Chris

Hi Chris,

> 1) The problem
> 2) Potential solutions
> 3) Recommended path forward
>
> The TL;DR version of this is that I think we should discuss relicensing all of LLVM under the Apache 2.0 license and add a runtime exception clause. See below for a lot more details.

I agree that this is a problem. In another community, we’ve deployed an Apache-style CLA. From a legal perspective, it’s definitely the best way forward, but it does add a significant barrier to entry (though not as big as copyright assignment, as for FSF projects). I have two concerns, one related to the Apache 2 license in general, the other related to switching licenses in general.

Because LLVM has not had a policy of including copyright holders in files (something else that we should change), it’s difficult to identify copyright holders. When we relicensed libcompiler_rt and libc++ under the MIT license, there were only a few contributors and it was easy to identify us all. Over LLVM, it’s not clear that the people who have committed code on behalf of others have been good at ensuring that it’s correctly attributed. I’d be interested to hear what the Foundation’s strategy for dealing with this is (and what will happen if a contributor of a significant amount of code does not permit their code to be relicensed if the overall consensus appears to be in favour of relicensing).

On the Apache 2 front specifically, we’ve been slowly reducing the amount of Apache 2 code in FreeBSD and would be quite unhappy to suddenly increase it. LLVM is one of the largest bits of contrib code in our base system and, for us, it would be a step in the wrong direction. One worry is that Apache 2 is incompatible with GPLv2 (is it incompatible with other licenses?), which limits the number of places where it can be used (though possibly not to a degree worth worrying about). A related concern is that I can read the UIUC and, as a non-lawyer, be pretty sure that I understand it. I can not make the same claim about Apache 2, in spite of having read it several times. It’s probably not a show-stopper for us, but it would probably reduce LLVM contributions from within the FreeBSD community, which is something that’s been slowly increasing recently.

> With all this said, I’d love to hear what you all think. If you have a specific comment or discussion about one aspect of this, it might be best to start a new thread. I’d also appreciate it if you would mention whether you are a current active contributor to LLVM, whether you are using LLVM but blocked from contribution (and whether this approach would unblock that) or if you’re an interested community member that is following or using LLVM, but not currently (or on the edge of) contributing. I’d be particularly interested if you are blocked from contributing but this proposal doesn’t solve your concern.

I’m wearing a few LLVM-related hats (including a downstream consumer with my FreeBSD Core Team hat and a few research projects, and a contributor to various bits of LLVM, including clang, libc++, libcompiler_rt, optimisers, codegen and the MIPS back end). I’m not blocked from contributing by the current status quo (other than the fact that most of the changes in the LLVM trees that I’m responsible for at the moment are of no interest to people who don’t [yet?] have access to some currently quite rare hardware).

David

> Unfortunately, adding the Apache CLA also has several disadvantages:
>
> - It adds new barriers for new contributors to LLVM. We don’t
> currently have a process where you need to sign (or click through) a
> form, and adding one is a barrier in certain situations (e.g. it
> requires individuals to disclose sensitive personal information like
> mailing addresses etc, and may require extra levels of legal approval
> in corporate situations).

If you want to extend a patent license to any LLVM user, you need legal approval from the patent holder, and that inevitably means paperwork.

> - The CLA also provides power that I (personally) don’t think we
> “want” as a community. For example, it would allow the LLVM
> Foundation to arbitrarily relicense the project without approval from
> the copyright holders.

That's actually a necessity. Laws change, interpretations change, and it may be necessary to change the license to achieve the original goals.

If you want to make clear that it is not going to be "power to the LLVM", you can state what kinds of license change might be done in the future.

> 4) We could switch to another well-known solution.

One solution I haven't seen mentioned: Become one of the projects under the Apache Foundation umbrella.
Benefits are legal protection, established licensing terms, established organizational procedures, a pool of experience to tap for questions like Code of Conduct, legalese, or organisational details.
Downside is that you need to relicense everything under the Apache 2.0 license, but if you consider going there, it may be a good idea to go the full distance.

> With all this said, I’d love to hear what you all think.

Speaking as a potential user of LLVM, I'd be happy with almost anything. Well, maybe not the GPL because that would restrict my own licensing options, but the GPL isn't on the table anyway, so I don't have any stakes in this.

Regards,
Jo

Speaking as an IP lawyer: no, it does not require more than the CLA or
the license provides.
Period.
If you want details, I'm happy to chat, but suffice to say, either the
CLA or the relicensing option would provide the same patent
protection.

I really really do not like armchair lawyer discussions and this is
just flamebait if I've ever seen it...

It seems important to be clear in communications like this that the proposal is for LLVM to be relicensed under the “Apache 2.0 with extra LLVM runtime exception” license, NOT under the “Apache 2.0” license, with an extra LLVM runtime exception.

That seems important to be absolutely clear about, because at the very least presumably LLVM would not be allowed to integrate any code under the normal Apache 2.0 license, or else it’d lose the runtime exception, right?

>
>> 1) We could introduce a novel legal solution.
>
> Please, no.
>
>> 2) We could require new contributors to sign the Apache CLA.
>
> To me, this is the most acceptable option of the listed terms.

Please explain: why?

First part for me is that switching the code to a different license
doesn't address some of the legal concerns regarding "tainted" code.
A CLA can formalise this part and as long as the process is not too
obnoxious, it doesn't create a significant hurdle. While a click-through
CLA might not be doable for individual contributors, a process like "sign
this, scan it, mail Chris a copy by email and post" sounds quite
acceptable as compromise. It means a new contributor can start
committing decently fast and if the snail mail copy doesn't arrive in a
decent time, it should still be possible to revert the contributions as
worst case.

Second part is the mentioned issue of patents. Let's say a
non practicing entity submits a uboot patch to LLVM and later starts to
sue LLVM users. I'm not clear on how the APL2 by itself would help resolve
this case. I can understand the concerns of corporate contributors to
overly broad IP language in a CLA, but that's more a practical issue.
For me it seems to be more important to ensure that commits are clean
and place the burden of proof on the contributing entity.

Third part is the re-licensing question. This might be in some cases the
most troublesome part as you mentioned. Two considerations here. First
is that the CLA (not necessarily the Apache one) should provide
irrevocable license conditions for all contributors under the license at
the time of contribution. The second question is whether making things less
restricted is considered a problem or not. Given the existing BSDish
nature, I'm almost inclined to say no, but that can be answered by the
CLA as well.

Fourth, the management overhead for corporate contributors. A given
contributor is responsible for either clearing with their employer that
they can contribute to Open Source projects and it is not considered to
be IP of the company. The patent question is the same as for any
non-employed contributor then. If it is considered part of work, someone
has to manage such a list? I think a combination of good faith and
providing appropriate tools is enough. But before making this a huge
problem, I'd defer this point to our Apple^WGoogle^Wcompany overlords.

>> 3) We could relicense all of LLVM under the Apache 2.0 license and add a runtime exception.
>
> This one I would consider a regression over the status quo. Your list is
> missing "the license is significantly longer and harder to read".

To repeat Danny’s point, this doesn’t seem like a concern to me.
Please explain your concern: does this affect users of llvm,
contributors to llvm, or someone else? How?

Users, primarily. The more complicated it is to understand the license,
the more likely it is to distract folks. Not everyone has a resident
corporate lawyer to explain things and for a software license, quantity
is certainly not a good thing. I certainly agree that the APL2 is much
more readable than e.g. the GPL, but it is still significantly longer
and more complicated than the license we currently have.

Joerg

This is interesting, I did not know that...

"Despite our best efforts, the FSF has never considered the Apache
License to be compatible with GPL version 2, citing the patent
termination and indemnification provisions as restrictions not present
in the older GPL license."

It seems to be compatible with GPLv3, though. But I think most
companies that use LLVM are stuck with GPLv2 due to the patents issue.

cheers,
--renato

Then how is a change in licensing needed at all?

The CLA = the Apache CLA option
The License = The Apache License option

Since neither of those options is currently used, ...

I really really do not like armchair lawyer discussions and this is
just flamebait if I've ever seen it...
---------------
#1 Is the submarine patent risk really that bad? (What's driving this)

It is a non-issue until it is a major issue. As I tried to explain in the writeup, “submarine patents” are at best a tertiary issue. The other two issues are driving issues and actively causing a problem.

#2 Pragmatically have "you" even considered how to execute on this
relicense plan?

The answer is “yes”, but I’d prefer to keep the focus of this discussion on “what the right thing is” and now “how to roll it out”. It is clear that it will take a lot of time and be expensive, but if relicensing is the right thing to do, then let's do it.

-Chris

I really really do not like armchair lawyer discussions and this is
just flamebait if I've ever seen it...
---------------
#1 Is the submarine patent risk really that bad? (What's driving this)

It is a non-issue until it is a major issue. As I tried to explain in the writeup, “submarine patents” are at best a tertiary issue. The other two issues are driving issues and actively causing a problem.

#2 Pragmatically have "you" even considered how to execute on this
relicense plan?

The answer is “yes”, but I’d prefer to keep the focus of this discussion on “what the right thing is” and now “how to roll it out”.

^ and NOT “how to roll it out”.

Yes, this is a known concern with the Apache 2 license, but I don’t know if there is an actual “known” answer to this question. DannyB or someone else can comment for sure, but my understanding is that the terms of the GPL2 prevent *any* license from including the sort of patent protection that we are looking for.

FWIW, this is what the FSF has to say about the topic:
http://www.gnu.org/licenses/license-list.en.html#apache2

Note that the FSF *recommends* the Apache 2 license for permissively licensed projects.

If you are still concerned about this issue, my question is simply: what specific GPL2 compiler (or other user) that might want to use LLVM would be affected?

-Chris

If you want to discuss Chris' plan, talk to Chris. I felt his posting had questions implied and tried to answer them from my perspective, best as I could.
If my posting was misinterpreted as mere flamebait, I guess my feedback isn't helpful or welcome, so I'll just shut up.

The point I was trying to make was that to accept patented code, the LLVM project would need a copyright and a patent license, and given published expert opinion (as far as I have seen it), this seems to be a lot easier for copyright than for patents.
E.g. a submitter could get away with claiming that the patent grant was unintentional, while for copyright that would be hard to believe if the act of submission is also the act of publication.

I wasn't 100% clear on that, as that was more a side issue rather than something I felt was very central.
Feel free to correct Chris if you think the current submission workflow is fully sufficient to handle patents.

Note: GPLv2 and GPLv3 are *also* incompatible with each other (see the
FSF's "Frequently Asked Questions about the GNU Licenses"), so you
have larger issues if you are mixing those ;-).

Besides the obvious ones, there are subtle ones:
For example, it's not okay to use an old GPLv2 version of gcc and a
new version of GPLv3 libgcc with the runtime exception, because the old
compiler is not "GPL-compatible" as that license defines it.