LLVM as a shared library

Hello LLVM community,

Over the last few years the LLVM team here at Apple and development teams elsewhere have been busily working on finding new and interesting uses for LLVM. Some of these uses are traditional compilers, but a growing number of them aren’t. Some of LLVM’s new clients, like WebKit, are embedding LLVM into existing applications. These embedded uses of LLVM have their own unique challenges.

Over the next few months, a few of us at Apple are going to be tackling a few new problems that we would like solved in open source so other projects can benefit from them. Some of these efforts will be non-trivial, so we’d like to start a few discussions over the next few weeks.

Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.

The list of the problems we’re currently planning to tackle is:

(1) Reduce or eliminate static initializers, global constructors, and global destructors
(2) Clean up cross compiling in the CMake build system
(3) Update LLVM debugging mechanisms for being part of a dynamic library
(4) Move overridden sys calls (like abort) into the tools, rather than the libraries
(5) Update TableGen to support stripping unused content (i.e. Intrinsics for backends you’re not building)
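Item (4) amounts to letting the library report failures and leaving process-level decisions such as calling `abort()` to the tool. A minimal sketch in C, assuming an invented library function `mylib_parse_opt_level` and an invented tool entry point `tool_main` (neither is an LLVM API):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical library status codes (illustrative names only). */
typedef enum { MYLIB_OK = 0, MYLIB_ERROR = 1 } mylib_status;

/* Library code: reports failure to the caller instead of calling abort(). */
mylib_status mylib_parse_opt_level(const char *text, int *level_out,
                                   char *err, size_t err_len) {
  if (text == NULL || text[0] < '0' || text[0] > '3' || text[1] != '\0') {
    snprintf(err, err_len, "invalid optimization level '%s'",
             text ? text : "(null)");
    return MYLIB_ERROR;
  }
  *level_out = text[0] - '0';
  return MYLIB_OK;
}

/* Tool code: only the standalone tool decides that an error ends the process. */
int tool_main(const char *arg) {
  char err[128];
  int level = 0;
  if (mylib_parse_opt_level(arg, &level, err, sizeof err) != MYLIB_OK) {
    fprintf(stderr, "error: %s\n", err);
    abort();
  }
  return level;
}
```

An embedder would call the same library function and handle the error its own way, rather than having its whole process terminated from inside the library.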

We will be sending more specific proposals and patches for each of the changes listed above starting this week. If you’re interested in these problems and their solutions, please speak up and help us develop a solution that will work for your needs and ours.

Thanks,
-Chris

Sounds reasonable.

(adding Juergen and Pete who will be working on this with me)

We haven’t fully fleshed out the exact implementation yet. Our target user for the initial work is WebKit. For WebKit we want to generate a shared library which only exports the C API. We were discussing doing this with an exports list for the linker, but visibility annotations are another option.
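A minimal sketch of the visibility-annotation route, with made-up names: compile the library with `-fvisibility=hidden` and mark only the C entry points with a default-visibility macro. The exports-list route instead keeps the sources unannotated and hands the linker the list of symbols to export (for example `-exported_symbols_list` with Apple's ld, or a `--version-script` with GNU ld).

```c
/* Build the shared library with -fvisibility=hidden so that only
   explicitly annotated symbols are exported (GCC/Clang). */
#include <stddef.h>
#include <string.h>

#if defined(_WIN32)
#define EXAMPLE_C_API __declspec(dllexport)
#else
#define EXAMPLE_C_API __attribute__((visibility("default")))
#endif

/* Internal helper (stand-in for a C++ implementation detail). Not annotated,
   so under -fvisibility=hidden it does not appear in the dynamic symbol table. */
size_t example_internal_length(const char *s) { return strlen(s); }

/* Hypothetical public C entry point: annotated, so it stays exported. */
EXAMPLE_C_API size_t example_name_length(const char *name) {
  return name ? example_internal_length(name) : 0;
}
```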

-Chris

This is exciting!

I would be happy to help.

The list of the problems we’re currently planning to tackle is:

(1) Reduce or eliminate static initializers, global constructors, and global destructors
(2) Clean up cross compiling in the CMake build system
(3) Update LLVM debugging mechanisms for being part of a dynamic library
(4) Move overridden sys calls (like abort) into the tools, rather than the libraries
(5) Update TableGen to support stripping unused content (i.e. Intrinsics for backends you’re not building)

Also:

(6) Determine if command line options are the best way of passing configuration settings into LLVM.

It’s an awkward abstraction when LLVM is embedded. I suspect (6) will be closely related to (1) since command line option parsing was the hardest impediment to getting rid of static initializers.
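Static initializers can usually be replaced by initialize-on-first-use. In LLVM a major source is the global `cl::opt` objects, whose constructors register each option when the library is loaded. A minimal C sketch of the alternative, using POSIX `pthread_once` and a made-up option registry (the names and values are illustrative only):

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical option table that global constructors would otherwise build at load time. */
typedef struct {
  const char *name;
  int value;
} option_entry;

static option_entry registry[2];
static size_t registry_size;
static pthread_once_t registry_once = PTHREAD_ONCE_INIT;
static int registry_init_runs; /* how many times the initializer has run */

/* Runs at most once, on first use, rather than when the library is loaded. */
static void init_registry(void) {
  registry[0] = (option_entry){"example-option-a", 225};
  registry[1] = (option_entry){"example-option-b", 0};
  registry_size = 2;
  registry_init_runs++;
}

/* Every entry point that needs the options goes through this accessor. */
const option_entry *get_option_registry(size_t *count) {
  pthread_once(&registry_once, init_registry);
  *count = registry_size;
  return registry;
}

int option_registry_init_runs(void) { return registry_init_runs; }
```

Loading the library then costs nothing until a client actually asks for the options, and `pthread_once` keeps the first use safe when several threads race to it.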

My understanding of the shared library proposal is that the library only exposes the C API, since the C++ API is not intended to allow for binary compatibility. So I think we need to add the following either as an explicit goal of the shared library work or as a closely related project:

(7) Make the C API truly great.

I think it’s harmful to LLVM in the long run if external embedders use the C++ API. I think that one way of ensuring that they don’t have an excuse to do it is to flesh out some things:

- Add more tests of the C API to ensure that people don’t break it accidentally and to give more gravitas to the C API backwards compatibility claims.
- Increase C API coverage.
  - For example, WebKit currently sidesteps the C API to pass some command-line options to LLVM. We don’t want that.
  - Add more support for reasoning about targets and triples. WebKit still has to hardcode triples in some places even though it only ever does in-process JITing where host==target. That’s weird.
  - Expose debugging and runtime stuff and make sure that there’s a coherent integration story with the MCJIT C API.
    - Currently it’s difficult to round-trip debug info: creating it in C is awkward and parsing DWARF sections that MCJIT generates involves lots of weirdness. WebKit has its own DWARF parser for this, which shouldn’t be necessary.
    - WebKit is about to have its own copies of both a compact unwind and an EH frame parser. The contributor who “wrote” the EH frame parser actually just took it from LLVM. The licenses are compatible, but nonetheless, copy-paste from LLVM into WebKit should be discouraged.
- Engage with non-WebKit embedders that currently use the C++ API to figure out what it would take to get them to switch to the C API.

I think that a lot of the time, when C API discussions arise, lots of embedders give excuses for using the C++ API. WebKit used the C API for generating IR and even doing some IR manipulation, and for driving the MCJIT. It’s been a positive experience and we enjoy the binary compatibility that it gives us. I think it would be great to see if other embedders can do the same.

-Filip

One other thing that I'd like to see is a common framework for
defining, describing and inferring architectural support.

Tools use a lot of string parsing and, as Filip said, command line
options don't normally mean the exact same thing across tools.

So that the same arch/cpu/fpu/abi/target options are parsed
identically across all tools (and external users) and mean exactly the
same thing to the back-end, building a new sub-target should only
accept a TargetDescription object, or whatever ends up holding all the
options.
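A minimal sketch of this suggestion in C, with hypothetical names (`target_description`, `target_description_set`, and `subtarget_init` are not existing LLVM APIs): one shared parser fills a description object, and sub-target creation only accepts that object, so no tool can interpret a raw option string differently.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical description object: every tool fills this in the same way. */
typedef struct {
  char arch[16];
  char cpu[32];
  char fpu[16];
  char abi[16];
} target_description;

/* Single shared parser for "key=value" options, used by every tool and embedder. */
bool target_description_set(target_description *td, const char *option) {
  const char *eq = strchr(option, '=');
  if (!eq)
    return false;
  size_t key_len = (size_t)(eq - option);
  const char *value = eq + 1;
  char *field = NULL;
  size_t cap = 0;
  if (key_len == 4 && strncmp(option, "arch", 4) == 0) { field = td->arch; cap = sizeof td->arch; }
  else if (key_len == 3 && strncmp(option, "cpu", 3) == 0) { field = td->cpu; cap = sizeof td->cpu; }
  else if (key_len == 3 && strncmp(option, "fpu", 3) == 0) { field = td->fpu; cap = sizeof td->fpu; }
  else if (key_len == 3 && strncmp(option, "abi", 3) == 0) { field = td->abi; cap = sizeof td->abi; }
  else
    return false; /* unknown key: reject instead of guessing */
  if (strlen(value) >= cap)
    return false; /* reject rather than silently truncate */
  strcpy(field, value);
  return true;
}

/* Hypothetical sub-target: built only from a description, never from raw strings. */
typedef struct {
  target_description desc;
} subtarget;

bool subtarget_init(subtarget *st, const target_description *td) {
  if (td->arch[0] == '\0')
    return false; /* an architecture is mandatory */
  st->desc = *td;
  return true;
}
```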

cheers,
--renato

Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.

This sounds great.

The list of the problems we’re currently planning to tackle is:

(1) Reduce or eliminate static initializers, global constructors, and global destructors
(2) Clean up cross compiling in the CMake build system

One problem we have with the Mesa project is that the automake and CMake build
systems produce different shared libraries. Automake builds libLLVM-major.minor.so
while CMake builds a different shared library for each component, e.g. libLLVMSupport.so.

To cope with this, Mesa's build system has to try to guess which build system
was used in order to find the libraries.

Do you have plans to standardize the shared libraries produced by LLVM's build systems?

Even better, will improving cross compiling in the CMake build system make
it possible to completely drop automake?

-Tom

(7) Make the C API truly great.

Honestly I think if you want to make the C API great we should burn it
to the ground and come up with another one - and one that can be
versioned as well so we don't have the problems of being limited in
what we can do to llvm by needing compatibility with the C API.
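One common way to make a C API versionable is to have callers stamp the structures they pass with the API revision they were compiled against. A minimal sketch with invented names (nothing here is part of LLVM's C API):

```c
/* Hypothetical API revision implemented by this build of the library. */
#define EXAMPLE_API_VERSION 2u

/* Callers set `version` to the EXAMPLE_API_VERSION they were compiled against. */
typedef struct {
  unsigned version;
  int opt_level;        /* present since revision 1 */
  int enable_fast_isel; /* added in revision 2 */
} example_options;

unsigned example_get_api_version(void) { return EXAMPLE_API_VERSION; }

/* Normalizes caller options to the current revision.
   Returns 0 on success, -1 for a revision this library does not understand. */
int example_configure(const example_options *in, example_options *out) {
  if (in->version == 0 || in->version > EXAMPLE_API_VERSION)
    return -1; /* refuse rather than guess at an unknown layout */
  out->version = EXAMPLE_API_VERSION;
  out->opt_level = in->opt_level;
  /* Only read fields that exist in the caller's revision; default the rest. */
  out->enable_fast_isel = in->version >= 2 ? in->enable_fast_isel : 0;
  return 0;
}
```

A caller built against revision 1 never sets `enable_fast_isel`, so the library supplies the default instead of reading it; an interface can then be changed or retired by bumping the revision rather than breaking old clients silently.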

-eric

One problem we have with the Mesa project is that the automake and CMake build
system produce different shared libraries. Automake builds libLLVM-major.minor.so
while CMake builds a different shared library for each component: e.g. libLLVMSupport.so

To cope with this, Mesa's build system has to try to guess which build system
was used in order to find the libraries.

Do you have plans to standardize the shared libraries produced by LLVM's build systems?

We had not planned to standardize the build systems, but it is interesting to consider. IMHO, maintaining two build systems is a royal pain.

Even better, will improving cross compiling in the CMake build system make
it possible to completely drop automake?

I would really like to think so, but there will be quite a bit of work involved in getting everyone using the Automake build system to migrate off. One snag with getting off automake that I’m aware of is compiler-rt. I haven’t been able to make sense of the compiler-rt CMake configs well enough to come up with a good solution for cross-compiling.

-Chris

Honestly I think if you want to make the C API great we should burn it
to the ground and come up with another one - and one that can be
versioned as well so we don't have the problems of being limited in
what we can do to llvm by needing compatibility with the C API.

Or at least document what our backwards compatibility promises are and
how we transition away from old APIs.

Two examples where we do break C APIs:

* A hypothetical off-by-one source range bug in clang. It will break
a user of libclang that might have been compensating for the bug. In
cases like this we seem to just assume there is a low risk and just
fix the bug.

* Dropping features like the old JIT. It will break users of the C API
that depend on the old JIT. In cases like this we provide an upgrade
path (MCJIT) and a deprecation period.
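The "upgrade path plus deprecation period" approach can be made visible to C API clients at compile time. A minimal sketch with illustrative names (these are not the actual old JIT or MCJIT entry points): the old function stays as a thin shim over the new one and carries a deprecation attribute until it is removed.

```c
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_DEPRECATED(msg) __attribute__((deprecated(msg)))
#else
#define EXAMPLE_DEPRECATED(msg)
#endif

/* Hypothetical handle type for the replacement engine. */
typedef struct {
  int uses_new_backend;
  int opt_level;
} example_engine;

/* Replacement entry point: the upgrade path. */
example_engine *example_create_engine(int opt_level) {
  example_engine *e = malloc(sizeof *e);
  if (e) {
    e->uses_new_backend = 1;
    e->opt_level = opt_level;
  }
  return e;
}

/* Old entry point kept for one deprecation period: it forwards to the new
   code path, and calls to it produce a compile-time warning on GCC/Clang. */
EXAMPLE_DEPRECATED("use example_create_engine; scheduled for removal")
example_engine *example_create_engine_legacy(void) {
  return example_create_engine(2); /* hypothetical default the old API implied */
}

void example_dispose_engine(example_engine *e) { free(e); }
```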

Cheers,
Rafael

Just to give a bit of perspective from another external LLVM client:

GoLLVM [1], the LLVM bindings for Go used by the llgo compiler [2], mostly
uses the C bindings, but it does need to resort to the C++ API (with its
own set of C bindings) for a few things:

Exporting bitcode to memory buffer:
https://github.com/go-llvm/llvm/blob/master/bitwriter.cpp

Use of attribute masks above 1 << 31:
https://github.com/go-llvm/llvm/blob/master/core.cpp

Debug info generation:
https://github.com/go-llvm/llvm/blob/master/dibuilder.cpp

Loading plugins and setting flags:
https://github.com/go-llvm/llvm/blob/master/support.cpp

Adding instrumentation passes:
https://github.com/go-llvm/llvm/blob/master/transforms_instrumentation.cpp

I think most of this could be upstreamable in some shape or form, but I heard
from debug info experts a few months ago that the IR format was unstable,
so the solution we went with was to wrap the C++ API so that we would be
notified (by the compiler) when the format changes, rather than creating
debug info directly and having it potentially silently discarded. I'm not
sure if the debug info situation has changed since then.

The plugin/flags stuff is valuable to external projects for exactly the same
reasons that Clang supports plugins and LLVM flags. I don't see any reason
to make the specific flag semantics stable, and we can document this as such.

Thanks,

Hi Chris,

Our primary goals are to (1) make it easier to embed LLVM into external projects as a shared library, and (2) generally improve the performance of LLVM as a shared library.

That sounds great. Just as a note, using LLVM libraries for external
projects using CMake was recently improved [1] (mostly by Brad King).
I've never built LLVM as a single shared library using CMake (I'm not
aware of there being a CMake option to do so), but it would be great
if doing so didn't create any problems for users of this interface.

If you'd like me to take a look at anything related to this, please let me know.

(2) Clean up cross compiling in the CMake build system

Brad King might be interested in this so it might be a good idea to CC
him in any patches related to this.

[1] Building LLVM with CMake (LLVM documentation)

Thanks,
Dan.

Can you come up with specific reasons why building a new API would be better for the community than maintaining the one we’ve got?

-Filip

Rafael came up with a few in his email, but also having an API that lightly
wraps the C++ API is hard if we want to change a major C++ interface
or completely remove a class, etc. There's no existing way in the API
to either version or remove an interface given our current promises.

-eric

Honestly I think if you want to make the C API great we should burn it
to the ground and come up with another one - and one that can be
versioned as well so we don't have the problems of being limited in
what we can do to llvm by needing compatibility with the C API.

Or at least document what our backwards compatibility promises are and
how we transition away from old APIs.

Right. I believe this is the right approach.

Two examples where we do break C APIs:

* A hypothetical off-by-one source range bug in clang. It will break
a user of libclang that might have been compensating for the bug. In
cases like this we seem to just assume there is a low risk and just
fix the bug.

Yup.

Speaking for WebKit: we would be happy to get rid of workarounds even if it meant a brief period of breakage. We would handle that breakage on our end by ensuring that we don’t build the new (sans workaround) version of WebKit against the old (pre bugfix) version of LLVM or vice-versa. We’re OK with short-term pain for long-term gain.

* Dropping features like the old JIT. It will break users of the C API
that depend on the old JIT. In cases like this we provide an upgrade
path (MCJIT) and a deprecation period.

Yup. I wonder how many people still use the old JIT via the C API. I know of old JIT users but I thought that many (most? all?) were using the C++ API.

-Filip

Speaking for WebKit: we would be happy to get rid of workarounds even if it meant a brief period of breakage. We would handle that breakage on our end by ensuring that we don’t build the new (sans workaround) version of WebKit against the old (pre bugfix) version of LLVM or vice-versa. We’re OK with short-term pain for long-term gain.

This is a really good point Filip. I'm totally down with being able to
do the occasional API migration etc. Also, since we've got a release
branch now, I wonder what kind of support a "we won't break it between
releases" promise would get. At any rate, being able to migrate API
versions in some clean way would be nice.

I'm almost tempted to want a SWIG wrapped API but I'm pretty sure
that's actually worse than what we've got and still wouldn't solve the
migration issues. ;)

* Dropping features like the old JIT. It will break users of the C API
that depend on the old JIT. In cases like this we provide an upgrade
path (MCJIT) and a deprecation period.

Yup. I wonder how many people still use the old JIT via the C API. I know of old JIT users but I thought that many (most? all?) were using the C++ API.

Not sure. I seem to recall a few, but they may have moved off when I
wasn't looking.

-eric

Can you give a specific example of an intended C++ API change that wasn’t possible because of a C API?

Just because you have an API doesn’t mean that things can’t be deprecated, or that the API layer can’t be hacked to give the illusion of old behavior. Can you give an example of a C API deprecation proposal that was intended to make some C++ change possible, that was rejected on the grounds that it would break the C API?

I can only recall cases where the C API was broken by accident because of lack of testing, and in all of those cases, the issue was either resolved, or there is a plan to resolve it and a workaround was made available.

-Filip

Sure, these are going to be a bit vague because I'm a bit busy at the
moment, but I recall a couple of times during the year that we've had
API up for review (or even committed temporarily) that exposed
internal constants via enums, and I think Rafael had some issues with
visibility changes for the same reasons.

In a more recent case here's a thread:

[LLVMdev] Inconsistent third field in global_ctors (was Re: [llvm]
r214321 - UseListOrder: Visit global values)

and

[PATCH] Add return value attribute to C interface

also I think the conversation we were having in here:

[PATCH] Expose MCInst in C Disassembler API

is somewhat relevant :)

Just a couple of quick things I could find with a search. I could
probably dig up more given some more time.

-eric

Filip,

As a non-WebKit embedder currently using the C++ API (www.liblikely.org), here are my thoughts:

  • Perhaps the only reason I’m using the C++ API instead of the C API is that the Kaleidoscope tutorial is written against the C++ API. That’s where I started, and momentum has prevented me from switching. It may make sense to rewrite this tutorial using the C API if we want to encourage new developers to default to this interface.
  • Based on the (excellent) documentation online and a close following of this mailing list, I haven’t been convinced that the C API is really given first class support in LLVM. Perhaps this is just an issue of advertising better, but it makes me hesitant to change.
  • I run into enough minor bugs that my project ends up tracking the master branch pretty closely. As such, I don’t think I’ll get away from static builds in the near future. Being unable to switch to shared-library-tagged-releases disincentivizes the API switch.
  • A barebones transition guide hosted with the rest of the LLVM docs could lower the activation energy needed to switch.

Hope that helps!

v/r,
Josh

Can you give a specific example of an intended C++ API change that wasn’t
possible because of a C API?

Just because you have an API doesn’t mean that things can’t be deprecated,
or that the API layer can’t be hacked to give the illusion of old behavior.
Can you give an example of a C API deprecation proposal that was intended to
make some C++ change possible, that was rejected on the grounds that it
would break the C API?

I can only recall cases where the C API was broken by accident because of
lack of testing, and in all of those cases, the issue was either resolved,
or there is a plan to resolve it and a workaround was made available.

Sure, these are going to be a bit vague because I’m a bit busy at the
moment, but I recall a couple of times during the year that we’ve had
API up for review (or even committed temporarily) that exposed
internal constants via enums, and I think Rafael had some issues with
visibility changes for the same reasons.

In a more recent case here’s a thread:

[LLVMdev] Inconsistent third field in global_ctors (was Re: [llvm]
r214321 - UseListOrder: Visit global values)

There appears to be a patch up for review that takes care of this with a slightly careful dance and there is a PR tracking deprecating the bad construct (two-field version) eventually. So, I don’t think this qualifies as a change that was made impossible by the C API, since the patch demonstrates that it is possible.

and

[PATCH] Add return value attribute to C interface

This appears to be an observation that we should extend the API to better handle attributes. I agree with that observation and with the general sentiment that strings are better than bits. This isn’t a reason to burn the API to the ground. Someone should just make the change, which would probably involve allowing C API clients to use either bits or strings for the time being.
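A minimal sketch of accepting both forms, with invented names and an illustrative bit-to-name table (not LLVM's actual attribute values): attributes are stored by name, and the legacy 32-bit mask entry point is kept as a translation layer. It also shows why a mask parameter runs out of room at bit 31, the limit GoLLVM works around through the C++ API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_MAX_ATTRS 16

/* Attributes stored by name: adding a new attribute never needs a new bit. */
typedef struct {
  const char *names[EXAMPLE_MAX_ATTRS]; /* caller keeps the strings alive */
  size_t count;
} attr_set;

bool attr_set_has(const attr_set *s, const char *name) {
  for (size_t i = 0; i < s->count; i++)
    if (strcmp(s->names[i], name) == 0)
      return true;
  return false;
}

bool attr_set_add(attr_set *s, const char *name) {
  if (attr_set_has(s, name))
    return true; /* adding twice is harmless */
  if (s->count == EXAMPLE_MAX_ATTRS)
    return false;
  s->names[s->count++] = name;
  return true;
}

/* Illustrative mapping for the legacy interface: bit i means legacy_bit_names[i]. */
static const char *const legacy_bit_names[] = {"zeroext", "signext", "noreturn", "inreg"};
#define LEGACY_BIT_COUNT (sizeof legacy_bit_names / sizeof legacy_bit_names[0])

/* Legacy bit-mask entry point, kept as a thin layer over the string form.
   A uint32_t mask cannot name anything past bit 31; the string form has no such limit. */
bool attr_set_add_legacy_mask(attr_set *s, uint32_t mask) {
  if (mask >> LEGACY_BIT_COUNT)
    return false; /* reject bits this table does not know about */
  for (unsigned bit = 0; bit < LEGACY_BIT_COUNT; bit++)
    if ((mask & (UINT32_C(1) << bit)) && !attr_set_add(s, legacy_bit_names[bit]))
      return false;
  return true;
}
```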

also I think the conversation we were having in here:

[PATCH] Expose MCInst in C Disassembler API

The arguments against exposing MCInst were pretty vague. I never agreed with any of them. It was an obviously useful addition and someone should still do it. That being said, this thread just covered adding more stuff to the C API; it’s not an example of a C++ change that couldn’t be made because of the C API.

is somewhat relevant :)

Just a couple of quick things I could find with a search. I could
probably dig up more given some more time.

Your current examples are just small bugs that can be fixed - and in two out of the three examples, there are sensible patches up for review already.

-Filip

Filip,

As a non-WebKit embedder currently using the C++ API (www.liblikely.org), here are my thoughts:

  • Perhaps the only reason I’m using the C++ API instead of the C API is that the Kaleidoscope tutorial is written against the C++ API. That’s where I started, and momentum has prevented me from switching. It may make sense to rewrite this tutorial using the C API if we want to encourage new developers to default to this interface.

That’s a great point! It would be awesome to have this. I remember that when I started working on WebKit’s FTL JIT I had to sort of mentally translate a lot of examples written against the C++ API. It wasn’t easy at first.

  • Based on the (excellent) documentation online and a close following of this mailing list, I haven’t been convinced that the C API is really given first class support in LLVM. Perhaps this is just an issue of advertising better, but it makes me hesitant to change.

It’s first-class enough that WebKit has been using it for months now. WebKit is a fairly big project and there is overlap between LLVM and WebKit contributors. In practice this means that the C API doesn’t break often and when it does, it gets fixed.

  • I run into enough minor bugs that my project ends up tracking the master branch pretty closely. As such, I don’t think I’ll get away from static builds in the near future. Being unable to switch to shared-library-tagged-releases disincentivizes the API switch.

We track master as well, because we also often find bugs. Using the C API makes this easier. With the C++ API you risk having to change stuff on your end because something in the C++ API changed to support some LLVM refactoring. With the C API we almost never have this problem.

  • A barebones transition guide hosted with the rest of the LLVM docs could lower the activation energy needed to switch.

Good point!

Hope that helps!

It does, thanks!