The future of LLVM's C APIs: Notes and BoF.

Some users of llvm-c want stable API interfaces into various parts of
the LLVM infrastructure, others want further ABI guarantees about this
usage, and still others simply want a way to bind to LLVM through their
language frontend’s existing FFI support for C.

If we want to improve the situation for any of these users, we need to
properly understand how these APIs are being used (or abused)
today. Juergen and I will be hosting a BoF at the dev meeting where we
can discuss what the requirements of a sustainable C API are, and how we
can organize things in LLVM to support this.

There's been a fair amount of discussion about this lately, and it's
pretty clear that the "hopefully stable bindings to whatever APIs
somebody needed" approach isn't really good enough for anybody. There
are a couple of points that I think are fairly non-controversial, but
will more or less drive the discussion at the BoF:

1. It isn't practical to keep a bindings API stable, unless the
   underlying API is also stable.

2. Handrolling bindings as they're needed tends to leave conspicuous
   gaps where some API is inaccessible for no good reason.

So based on (1), we'll really want to create some purpose built APIs
that we can keep stable for various tasks. What's needed here? People
want to do things like building a pass manager, setting up a canned JIT
config, and to some degree even emit IR. We'll discuss what's practical
and what people want, and hopefully strike a good balance.

Similarly, (2) implies that if we really need a *full* bindings API
we'll want to automate it. But what is a full bindings API? Who uses it,
and what do they want from it? If it's automated, should installing LLVM
install this API, or should we simply provide an easy way to generate
it?

In the end, I hope to have a good idea of what people are actually using
these APIs for, and both to support stable API users less haphazardly
and to make unstable API more thorough and/or easier to create.

  1. It isn’t practical to keep a bindings API stable, unless the
    underlying API is also stable.

  2. Handrolling bindings as they’re needed tends to leave conspicuous
    gaps where some API is inaccessible for no good reason.

So based on (1), we’ll really want to create some purpose built APIs
that we can keep stable for various tasks. What’s needed here? People
want to do things like building a pass manager, setting up a canned JIT
config, and to some degree even emit IR. We’ll discuss what’s practical
and what people want, and hopefully strike a good balance.

I’m beginning to think that the need here isn’t as obvious as it has been portrayed to be. So if you want a stable C API for anything, please come prepared to justify its existence at all :)

Similarly, (2) implies that if we really need a full bindings API
we’ll want to automate it. But what is a full bindings API? Who uses it,
and what do they want from it? If it’s automated, should installing LLVM
install this API, or should we simply provide an easy way to generate
it?

Want and can are two different things here. It would be nice to automate it, but I wouldn’t think it is required.

Thanks.

-eric

(Moving this to llvm-dev)

So based on (1), we'll really want to create some purpose built APIs
that we can keep stable for various tasks. What's needed here? People
want to do things like building a pass manager, setting up a canned JIT
config, and to some degree even emit IR. We'll discuss what's practical
and what people want, and hopefully strike a good balance.

Do people actually care about it being an API for "LLVM"? Would they be
happy with a library that is merely "LLVM-powered"? It seems like an
"LLVM-powered" library would have a similar maintenance burden as a
(non-clang) language frontend; the downside being that it would not allow
blindly copying what is on the C++ side and hence would require a
significant up-front API design effort.

(not necessarily wanting to discuss that here in this thread, but just
throwing a wild stab into the pot for discussion)

-- Sean Silva

One actual use:

A couple of us in the Modula3 support community are working on splicing an
LLVM back end onto our Modula3 compiler front end. We have zero-thickness
bindings written in Modula3 which match C bindings like Core.h. In our front
end executable, we only build LLVM IR in memory, then use LLVMPrintModuleToFile
and/or LLVMWriteBitcodeToFile. We only link in the parts of the LLVM
infrastructure needed to build and write the LLVM IR. Then we run stock llc on the
IR code in the emitted file.

We have had to write an additional binding to DIBuilder for this purpose,
as well as to a few odd other C++ functions here and there. The latest
version of this is for llvm 3.6.1, after the separation of metadata from
values. It has had minimal testing, but is fairly complete. Older bindings
to an older DIBuilder exist, and are at least partly working, but only
contain specifically needed functions.

This scheme is passing a majority of preexisting compiler tests of code
function, but there is not much testing of emitted debug info. Speaking
for myself, better debugger support is one of the primary motivations for
using llvm. Our older back ends use a really cobbled-up extension of
stabs+. It's a mess, and lots of debugger function is difficult to
provide.

The DIBuilder binding includes .h and .cpp files similar to core, that provide
a C binding, with lots of [un]wrapping, etc. Like core, it loses the type
hierarchy. Modula3 has a type hierarchy, but machine-level representation
is undoubtedly not ABI compatible with any C++ compiler. The current handling
of this is not thoroughly thought out, just using lots of unsafe casting of
pointers. I would like to think of something better, but haven't had much time
for that, so far.
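One way to get some safety back after the type hierarchy is flattened into a single handle type is a kind tag plus checked downcasts, in the same spirit as the LLVMIsA* functions that Core.h offers for values. The following is only a minimal sketch in plain C that does not use LLVM at all; `NodeRef`, `NodeKind`, and every function name in it are hypothetical stand-ins for whatever a DIBuilder binding would actually expose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds standing in for a C++ class hierarchy:
   Node <- Type <- CompositeType, and Node <- Subprogram. */
typedef enum {
    NODE_BASIC_TYPE,
    NODE_COMPOSITE_TYPE,
    NODE_SUBPROGRAM
} NodeKind;

/* A real binding would keep this struct opaque; it is spelled out here
   only so the sketch is self-contained. */
struct Node {
    NodeKind kind;
    const char *name;
};
typedef struct Node *NodeRef;

/* Checked downcast: returns the same handle if the node is any kind of
   type, and NULL otherwise. */
NodeRef node_as_type(NodeRef n) {
    if (n == NULL) return NULL;
    return (n->kind == NODE_BASIC_TYPE || n->kind == NODE_COMPOSITE_TYPE)
               ? n : NULL;
}

/* Checked downcast to the most derived kind. */
NodeRef node_as_composite_type(NodeRef n) {
    if (n == NULL) return NULL;
    return n->kind == NODE_COMPOSITE_TYPE ? n : NULL;
}

/* Accessor that is only valid for composite types. The assert makes the
   precondition explicit, where a raw pointer cast would skip the check. */
const char *composite_type_name(NodeRef n) {
    assert(node_as_composite_type(n) != NULL);
    return n->name;
}
```

A binding in another language can then call `node_as_composite_type` before treating a handle as the derived type, instead of casting the pointer unconditionally.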

I brought up some similar things to this in the very very long email discussions before.

-eric

Yup that is quite similar to my use case, but in addition I use LLVM to JIT as well from C using MCJIT.

I have a patch for a DIBuilder binding, but it does change the API a bit, so depending on how this discussion goes, it may or may not be possible to add it. It was necessary because the current bindings wrap all metadata in values, which is very wasteful when dealing with a large amount of metadata. The breakage is small (all my existing code still compiles and works even using the old binding API) but very real. I obviously think the change is good or I wouldn’t have made it, but that may be too much for others.

Back on point, the important things for me to do through the C API are:

  • Write IR, including debug info and other metadata (it is mostly good; some IR specifics are not covered, like ordered loads/stores, but these are edge cases that can be added. I can prepare a patch if people want these).

  • Read IR, so I can do transformations.

  • Codegen to object files.

  • JIT (using MCJIT is fine; Orca would be better, but making a good C API for it may prove challenging).

I would prefer that the ABI not be broken, as typechecking won’t work across the language boundary. That being said, I’m OK with this being a “best practice” or whatever people want to call it rather than a strong, enforced requirement. However, if it is not a strong, enforced requirement, we need a way to test which version of LLVM is currently in use, so as to choose the right version of the function to call. As long as ABI breakages are not too frequent, this is workable as far as I’m concerned, and puts fewer constraints on the evolution of the whole thing.
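To make the version-testing idea concrete, here is a rough sketch of a binding that picks an entry point at run time from the library version it finds. It is plain C with no LLVM dependency, and everything in it is hypothetical: `ApiVersion`, the two `emit_debug_info_*` functions, and the 3.6 cutover (chosen only because the metadata/value split was mentioned earlier in the thread). C and C++ clients can instead test `LLVM_VERSION_MAJOR` and `LLVM_VERSION_MINOR` at compile time, but an FFI client in another language has no access to those macros.

```c
#include <assert.h>

/* Hypothetical version record, as a binding might obtain it from a
   version query exported by the library. */
typedef struct {
    int major;
    int minor;
} ApiVersion;

/* Two hypothetical, incompatible generations of one entry point. */
typedef const char *(*emit_fn)(void);
static const char *emit_debug_info_v1(void) { return "metadata-as-value"; }
static const char *emit_debug_info_v2(void) { return "separate-metadata"; }

/* Returns 1 if v is at least major.minor, 0 otherwise. */
int version_at_least(ApiVersion v, int major, int minor) {
    return v.major > major || (v.major == major && v.minor >= minor);
}

/* Selects the entry point that matches the library in use.
   Assumption for illustration only: the break happened at 3.6. */
emit_fn select_emit_debug_info(ApiVersion v) {
    /* Negative components would indicate a failed version query. */
    assert(v.major >= 0 && v.minor >= 0);
    return version_at_least(v, 3, 6) ? emit_debug_info_v2
                                     : emit_debug_info_v1;
}
```

The binding would call `select_emit_debug_info` once at start-up and keep the returned pointer, so the cost of supporting an occasional ABI break stays in one place.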

Last but not least, I started working on a test that would read IR and regenerate it. This would enforce that IR is readable and writable from the C API, and that breakages are not introduced unknowingly, as has happened several times in the past. I really would like this to move forward whatever policy is chosen. Random breakage plus detective work to figure out what happened is the worst outcome when we could detect these things automatically.

As a final note, I’d like for us to focus on resolving this soon enough so that some work can be done related to the C API before 3.8 is out. Nothing is on fire yet, but let’s not let this slip too much.

Just to clarify, are your requirements for:

- A C API?
- A stable C API?
- An ABI-stable C API?
- An API?
- A stable API?
- An ABI-stable API?

For most of these things, I’d be happy with a stable C++ API (ABI-stable within major releases, recompile - but nothing else - needed between), which could potentially be wrapped in a machine-generated C API for bindings in other languages.

David

I'm using LLVM in my own DSL implemented in pure C, for in-memory
compilation/execution of the DSL within a long-running application. I use
the LLVM C API heavily to build LLVM modules, generate machine code out of
them and execute it. Since I link LLVM components into my library
statically, I'm OK with an unstable yet rich C API (possibly
automatically generated, yet clearly documented in header files), and
with adjusting/re-compiling my own things for each new LLVM release. I realize
that I would not necessarily need the absolutely full set of LLVM APIs,
so I would like to have some options in automatic C API generation. My
use cases:

    - Using the full IR building capabilities;

    - running passes over modules and individual functions via pass
      manager, configuring pass manager via pass manager builder (possibly
      more advanced pass manager and/or pm builder uses);

    - I'm using LLVMCreateJITCompilerForModule() in 3.4, 3.5. (Tried
      MCJIT in 3.6 but could not reach an equivalent of the former.);

    - getting pointers to compiled objects (LLVMGetPointerToGlobal)
      and accessing/executing them at run time;

    - generating textual IR/assembly for debugging purposes;

    - during my DSL recompile, it's important to cleanly dispose the
      LLVM context and other LLVM resources using C API in a long running
      application (months);

    - something off this topic: nice to have an option to not
      link/depend on libraries related to interactive console and/or
      graphical features.

There’s been a fair amount of discussion about this lately, and it’s
pretty clear that the “hopefully stable bindings to whatever APIs
somebody needed” approach isn’t really good enough for anybody.

I actually almost entirely disagree with the above. :)

I believe that “somewhat stable bindings to whatever APIs people needed” IS good enough for users of these bindings, and I still would like to see that as the official policy (as per the code review I sent out before, proposing that, http://reviews.llvm.org/D12685).

Yes, the current situation sucks, but as I see it, the major problem is that additional bindings that people want, and attempt to add, get VETOED, because of the fear of imposing an impossible future compatibility guarantee.

The solution is to stop imposing an impossible compatibility guarantee. I strongly believe that’s literally the ONLY change that needs to be made to fix the current situation.

Again, while we’re causing this self-inflicted wound: what do users actually gain from it? Basically nothing. They still need to use the C++ API anyways, so the stability provided is illusory. It’s essentially impossible to depend on the theoretical 100% ABI/API guarantee today.

There
are a couple of points that I think are fairly non-controversial, but
will more or less drive the discussion at the BoF:

  1. It isn’t practical to keep a bindings API stable, unless the
    underlying API is also stable.

Yes, I agree with this.

  2. Handrolling bindings as they’re needed tends to leave conspicuous
    gaps where some API is inaccessible for no good reason.

I disagree on the main cause of the gaps. Once we fix the compatibility policy, APIs can start being added again by interested parties to keep pace with additions to the LLVM C++ API (as happened in the earlier days of the project, before we hit this logjam!).

So based on (1), we’ll really want to create some purpose built APIs
that we can keep stable for various tasks. What’s needed here? People
want to do things like building a pass manager, setting up a canned JIT
config, and to some degree even emit IR. We’ll discuss what’s practical
and what people want, and hopefully strike a good balance.

I think this is basically a diversion from the issue at hand.

Firstly, I don’t believe there is a real need for inventing a brand new, super-restricted (necessarily! because if it was fully general it couldn’t be stable!) set of stable APIs. Even if there is a real need – which has NOT been demonstrated – someone will have to do that work. I’ve seen nobody volunteer to do that. I find this discussion of a yet-to-be-invented-actually-supportably-stable API a diversion from what to actually do, RIGHT NOW, to unblock proper maintenance for our EXISTING API.

Maybe someone will want to come up with a new set of APIs and a way to build a llvm shared library with a stable so-version, someday. Maybe they won’t. If they do in the future – that’s fine. Let’s have the discussion then. But in the meantime, I’d really like the existing APIs to start having a sane maintenance policy.

Similarly, (2) implies that if we really need a full bindings API
we’ll want to automate it. But what is a full bindings API? Who uses it,
and what do they want from it? If it’s automated, should installing LLVM
install this API, or should we simply provide an easy way to generate
it?

I disagree with this as well. I think the quality of the APIs that come out of auto-generated bindings is typically below par, and that the LLVM-C APIs are actually overall well designed, and, before a few years ago, covered much of what people wanted to do. I do not think they should be discarded or deprecated. Furthermore, these auto-generated bindings are, again, non-existent – nobody has yet actually proposed what these would look like or worked on creating such a thing.

Again, the problem I see is that people are being effectively blocked from adding APIs to the LLVM-C bindings right now – people ARE volunteering to do that work but cannot!

A C API (the code is not written in C or C++, and C is kind of the Esperanto of programming). Preferably an ABI-stable C API, as no typechecking is going to be done at the language boundary. That being said, I would put the ABI condition as a should rather than as a must.