C interface

Hi all,

I’m authoring a C interface to the LLVM IR type system. Since this is Really Quite Tedious, I would like to solicit opinions before I get too far down any paths that seem offensive. I’ve attached the header, where I’ve mapped a portion of Module and most of Type and its subclasses. This is working, and I’ve built ocaml bindings on top of it.[1] My intent is to extend this work (only) far enough to author a language front-end. The C bindings should help other languages which want to have self-hosting front-ends, and probably a C interface to the JIT would be well-received.

My naming conventions are similar to the Carbon interfaces in OS X. (Should I prefer a Unixy flavor instead?) Naming prefix is LLVM, which may be a bit long. (Would LL be better?) Pointers are opaque, obviously. I find myself copying enums, which is mildly scary.
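For readers unfamiliar with the opaque-pointer idiom mentioned above, here is a minimal, self-contained sketch in plain C. Every name in it (the `Sketch` prefix, `SketchModuleRef`, and the three functions) is hypothetical and chosen only for illustration; it is not the attached header. A real binding would presumably wrap a C++ object rather than a C struct, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* ---- What a public header might expose (all names hypothetical) ---- */

/* Incomplete struct type: callers can hold the pointer but not look inside. */
typedef struct SketchOpaqueModule *SketchModuleRef;

SketchModuleRef SketchCreateModule(const char *Name);
const char *SketchGetModuleName(SketchModuleRef M);
void SketchDisposeModule(SketchModuleRef M);

/* ---- What the implementation file might contain ---- */

struct SketchOpaqueModule {
  char *Name; /* private copy of the module identifier */
};

/* Returns a new module, or NULL if allocation fails. */
SketchModuleRef SketchCreateModule(const char *Name) {
  assert(Name != NULL);
  SketchModuleRef M = malloc(sizeof *M);
  if (M == NULL)
    return NULL;
  size_t Len = strlen(Name) + 1; /* include the terminating '\0' */
  M->Name = malloc(Len);
  if (M->Name == NULL) {
    free(M);
    return NULL;
  }
  memcpy(M->Name, Name, Len);
  return M;
}

const char *SketchGetModuleName(SketchModuleRef M) {
  assert(M != NULL);
  return M->Name;
}

/* The creator keeps ownership and must dispose of the module exactly once. */
void SketchDisposeModule(SketchModuleRef M) {
  if (M == NULL)
    return;
  free(M->Name);
  free(M);
}
```

A caller would pair `SketchCreateModule("test.bc")` with a later `SketchDisposeModule`, much as the OCaml example below pairs `create_module` with `dispose_module`.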

I’m using C strings instead of const char*, size_t tuples. This avoids having to write things like “tmp”, strlen(“tmp”) in C, and is well-supported for language bindings. Nevertheless, most languages other than C have binary-safe string types, so I’m certainly willing to have my mind changed if we want to prefer correctness over inconvenience to the C programmer. (Providing overloads is silly, though.)

I’m putting the headers in include/llvm-c. I created a new library called Interop to house the C bindings—but it might make more sense to implement the C bindings in each library instead. They’re just glue which the linker will trivially DCE, so that approach may have merit.

— Gordon

[1]

$ cat emit_bc.ml
open Llvm

let emit_bc filename =
  let m = create_module filename in

  let big_fn_ty = make_pointer_type
    (make_function_type (void_type ())
      [| make_vector_type (float_type ()) 4;
         make_pointer_type
           (make_struct_type [| double_type ();
                                x86fp80_type ();
                                fp128_type ();
                                ppc_fp128_type () |] true);
         make_pointer_type
           (make_struct_type [| make_integer_type 1;
                                make_integer_type 3;
                                i8_type ();
                                i32_type () |] false);
         make_pointer_type
           (make_array_type (make_opaque_type ()) 4) |]
      false) in

  (* string_of_lltype is implemented in ocaml, so the info on stdout
     shows that make_*_type isn't a write-once/read-never interface. *)
  print_endline ("big_fn_ty = " ^ (string_of_lltype big_fn_ty));

  ignore(add_type_name m "big_fn_ty" big_fn_ty);

  if not (write_bitcode_file m filename)
  then print_endline ("write failed: " ^ filename);

  dispose_module m

let _ =
  if 2 = Array.length Sys.argv
  then emit_bc Sys.argv.(1)
  else print_endline "Usage: emit_bc FILE"

$ make emit_bc
ocamlc -cc g++ -I …/llvm/Release/lib/ocaml llvm_ml.cma -o emit_bc emit_bc.ml
$ ./emit_bc test.bc
big_fn_ty = void (< 4 x float >, { double, x86fp80, fp128, ppc_fp128 }, { i1, i3, i8, i32 }, [ 4 x opaque ])
$ llvm-dis -o - test.bc
; ModuleID = 'test.bc'
%big_fn_ty = type void (<4 x float>, <{ double, x86_fp80, fp128, ppc_fp128 }>, { i1, i3, i8, i32 }, [4 x opaque])

VMCore.h (5.25 KB)

Hi all,

I'm authoring a C interface to the LLVM IR type system. Since this is Really Quite Tedious, I would like to solicit opinions before I get too far down any paths that seem offensive.

Sounds good.

I've attached the header, where I've mapped a portion of Module and most of Type and its subclasses. This is working, and I've built ocaml bindings on top of it.[1]

Oooh, look at the long doubles :wink:

My intent is to extend this work (only) far enough to author a language front-end. The C bindings should help other languages which want to have self-hosting front-ends, and probably a C interface to the JIT would be well-received.

Sounds good, it seems like anyone who wants more can extend it on demand :slight_smile:

My naming conventions are similar to the Carbon interfaces in OS X. (Should I prefer a Unixy flavor instead?) Naming prefix is LLVM, which may be a bit long. (Would LL be better?)

LLVM seems fine to me, and the naming convention seems ok (using lowercase + underscores makes the name longer). I do find things like this slightly strange:

/* Same as Module::addTypeName. */
int AddTypeNameToModule(LLVMModuleRef M, const char *Name, LLVMTypeRef Ty);

I'd expect it to be named something like "LLVMModuleAddTypeName" or something, using NamespaceClassMethod uniformly.

Pointers are opaque, obviously. I find myself copying enums, which is mildly scary.

Copying the enums does seem scary. Is there any way around this? Is LLVMTypeKind that useful?

I'm using C strings instead of const char*, size_t tuples. This avoids having to write things like "tmp", strlen("tmp") in C, and is well-supported for language bindings. Nevertheless, most languages other than C have binary-safe string types, so I'm certainly willing to have my mind changed if we want to prefer correctness over inconvenience to the C programmer. (Providing overloads is silly, though.)

I think this makes sense. In order to support arbitrary strings, you could have a:

void LLVMValueSetName(LLVMValueRef, const char *, unsigned len);

... function that works with arbitrary strings.
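To make the trade-off concrete, here is a small sketch of a length-taking setter next to a C-string convenience wrapper. The `SketchName` type and both function names are hypothetical stand-ins, not part of the proposed bindings; the point is only that the length-taking form preserves embedded '\0' bytes while the C-string form cannot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an object that carries a name.
   Initialize with {NULL, 0} before first use. */
typedef struct {
  char *Data; /* Len bytes followed by one extra '\0' for convenience */
  size_t Len;
} SketchName;

/* Binary-safe setter: copies exactly Len bytes, embedded '\0' included.
   Returns 0 on success, -1 if allocation fails (the old name is kept). */
int SketchSetNameWithLength(SketchName *N, const char *Str, size_t Len) {
  assert(N != NULL && (Str != NULL || Len == 0));
  char *Copy = malloc(Len + 1);
  if (Copy == NULL)
    return -1;
  if (Len > 0)
    memcpy(Copy, Str, Len);
  Copy[Len] = '\0';
  free(N->Data); /* free(NULL) is a no-op, so a fresh SketchName is fine */
  N->Data = Copy;
  N->Len = Len;
  return 0;
}

/* C-string convenience form: saves the caller from writing
   "tmp", strlen("tmp"), but stops at the first '\0'. */
int SketchSetName(SketchName *N, const char *Str) {
  assert(Str != NULL);
  return SketchSetNameWithLength(N, Str, strlen(Str));
}
```

With this shape, `SketchSetName(&N, "tmp")` covers the common case, and a language binding with binary-safe strings can call the length-taking form directly.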

I'm putting the headers in include/llvm-c. I created a new library called Interop to house the C bindings—but it might make more sense to implement the C bindings in each library instead. They're just glue which the linker will trivially DCE, so that approach may have merit.

Nice! You'll make a lot of friends with this :). Adding the bindings to the libraries in question makes sense.

-Chris

I’ve attached the header, where I’ve mapped a portion of Module and most of Type and its subclasses. This is working, and I’ve built ocaml bindings on top of it.[1]

Oooh, look at the long doubles :wink:

Oh, that’s what this comment means:

    PackedStructTyID, ///< 10: Packed Structure. This is for bitcode only

Maybe we can hide that better? ocaml mappings really (really really) like sequential enums with no dead values.

Oh well. I’ll remap the values.
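One way such a remapping could look is sketched below. The internal enum, its values, and the public `Sketch*` names are all made up for illustration, and folding the serialization-only ID into the ordinary struct kind is an assumption of this sketch, not something decided in this thread.

```c
#include <assert.h>

/* Hypothetical stand-in for an internal enum that contains an ID the
   bindings should not expose. The values here are invented. */
enum InternalTypeID {
  InternalVoidTyID = 0,
  InternalIntegerTyID = 1,
  InternalStructTyID = 2,
  InternalPackedStructTyID = 3, /* serialization-only; hidden from bindings */
  InternalPointerTyID = 4
};

/* Hypothetical public enum: sequential, with no dead values, which is
   the shape the ocaml mapping wants. */
typedef enum {
  SketchVoidTypeKind,
  SketchIntegerTypeKind,
  SketchStructTypeKind,
  SketchPointerTypeKind
} SketchTypeKind;

/* Translate an internal ID to the public kind. Assumption: the hidden
   packed-struct ID is reported as an ordinary struct. */
SketchTypeKind SketchRemapTypeID(enum InternalTypeID ID) {
  switch (ID) {
  case InternalVoidTyID:
    return SketchVoidTypeKind;
  case InternalIntegerTyID:
    return SketchIntegerTypeKind;
  case InternalStructTyID:
  case InternalPackedStructTyID:
    return SketchStructTypeKind;
  case InternalPointerTyID:
    return SketchPointerTypeKind;
  }
  assert(0 && "unknown internal type ID");
  return SketchVoidTypeKind;
}
```

Writing the switch without a default case lets most compilers warn when a new internal enumerator is added but not mapped, which takes some of the scariness out of the copy.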

My naming conventions are similar to the Carbon interfaces in OS X. (Should I prefer a Unixy flavor instead?) Naming prefix is LLVM, which may be a bit long. (Would LL be better?)

LLVM seems fine to me, and the naming convention seems ok (using lowercase + underscores makes the name longer). I do find things like this slightly strange:

/* Same as Module::addTypeName. */
int AddTypeNameToModule(LLVMModuleRef M, const char *Name, LLVMTypeRef Ty);

I’d expect it to be named something like “LLVMModuleAddTypeName” or something, using NamespaceClassMethod uniformly.

I tried that at first; I do like to do “method completion for C” in XCode by typing something like LLVMModule esc. Unfortunately, the names got bizarre and unreadable. I can go back to that, but it wasn’t ‘doing it’ for me.

Pointers are opaque, obviously. I find myself copying enums, which is mildly scary.

Copying the enums does seem scary. Is LLVMTypeKind that useful?

Uhm. Just a little bit important? :slight_smile: I’ll need to do the same thing with instructions kinds, too.

Is there any way around this?

Well, we could move the enums into the C interfaces and include the C interfaces from the C++ code. That moves the values to the global namespace, though; neither Type::FooTyID nor llvm::Type::FooTyID would be valid. The types themselves can be typedef’d back where they belong.

I’m using C strings instead of const char*, size_t tuples. This avoids having to write things like “tmp”, strlen(“tmp”) in C, and is well-supported for language bindings. Nevertheless, most languages other than C have binary-safe string types, so I’m certainly willing to have my mind changed if we want to prefer correctness over inconvenience to the C programmer. (Providing overloads is silly, though.)

I think this makes sense. In order to support arbitrary strings, you could have a:

void LLVMValueSetName(LLVMValueRef, const char *, unsigned len);

… function that works with arbitrary strings.

That’s true for this case, but I’m not sure there’s always a backdoor like that available. For some things it doesn’t matter, of course; valid filenames can’t contain '\0' on anything notable except Mac OS X, for one. I guess it’s a case-by-case decision. While I’m sure someone, somewhere, would appreciate consistency, that person is not me.

I’m putting the headers in include/llvm-c. I created a new library called Interop to house the C bindings—but it might make more sense to implement the C bindings in each library instead. They’re just glue which the linker will trivially DCE, so that approach may have merit.

Nice! You’ll make a lot of friends with this :slight_smile:

:slight_smile: Of course, you realize they won’t be happy until they don’t have to link using g++…

adding the bindings to the libraries in question makes sense.

I’ll do that.

— Gordon

This obviously should’ve had its prefix, at the very least. That was just an oversight.

— Gordon

Hello Gordon,

I'm part of the felix dev team, and I've been interested in making a backend for felix in llvm. It's very exciting to hear that you're making an ocaml interface to llvm. Do you have any of the libraries exposed to the public yet? Also, what license do you plan on using for the code? Felix is bsd, like llvm, so if there's any chance that you'll use a bsd-compatible license, we'd be very thankful.

-e


Hi Erick,

Gordon Henriksen wrote:

I'm authoring a C interface to the LLVM IR type system. I've attached the header, where I've mapped a portion of Module and most of Type and its subclasses. This is working, and I've built ocaml bindings on top of it.

I'm part of the felix dev team, and I've been interested in making a backend for felix in llvm. It's very exciting to hear that you're making an ocaml interface to llvm.

:slight_smile:

Do you have any of the libraries exposed to the public yet?

I've not published it since I haven't yet wrapped enough of the API to do anything useful. The snippet included every function I had mapped at the time I sent the message! Stay tuned.

Also, what license do you plan on using for the code? Felix is bsd, like llvm, so if there's any chance that you'll use a bsd-compatible license, we'd be very thankful.

My intent is to contribute this work to the LLVM project, so you won't have any licensing problems. In fact, I've simply integrated the ocaml bindings into LLVM's source tree so that they are built and installed if configure can find ocamlc.

— Gordon

Now with constants and global variables. Functions and basic blocks next, then on to LLVMBuilder.

— Gordon

//===-- c-bindings.patch (+730) -------------------------------===//

include/llvm/CHelpers.h (+94)
include/llvm-c/BitWriter.h (+42)
include/llvm-c/Core.h (+221)
lib/Bitcode/Writer/BitWriter.cpp (+51)
lib/VMCore/Core.cpp (+322)

Tedious C bindings for libLLVMCore.a and libLLVMBitWriter.a!

  • The naming prefix is LLVM.
  • All types are represented using opaque references.
  • Functions are not named LLVM{Type}{Method}; the names became unreadable goop.
    Instead, they are named LLVM{ImperativeSentence}.
  • Where an attribute only appears once in the class hierarchy (e.g., linkage
    only applies to values; parameter types only apply to function types), the
    class is omitted from identifiers for brevity. Tastes like methods.
  • Strings are C strings or string/length tuples on a case-by-case basis.
  • APIs which give the caller ownership of an object are not mapped
    (removeFromParent, certain constructor overloads). This keeps
    memory management as simple as possible.

For each library with bindings:

llvm-c/.h - Declares the bindings.
lib//.cpp - Implements the bindings.

So just link with the library of your choice and use the C header
instead of the C++ one.

This patch is independent.

//===-- ocaml-make.patch (+380 -27) ---------------------------===//

configure (+111 -25)
Makefile.config.in (+2)
bindings/ocaml/Makefile.ocaml (+263)
Makefile (+2 -2)
autoconf/configure.ac (+2)

I add a generic ocaml Makefile which will be used by the ocaml
language bindings.

configure is schooled how to sniff ocamlc and ocamlopt.

This patch is independent.

//===-- ocaml-bindings.patch (+936) ---------------------------===//

bindings/ocaml/llvm
bindings/ocaml/llvm/llvm.ml (+226)
bindings/ocaml/llvm/llvm_ocaml.c (+394)
bindings/ocaml/llvm/llvm.mli (+168)
bindings/ocaml/llvm/Makefile (+24)
bindings/ocaml/bitwriter
bindings/ocaml/bitwriter/llvm_bitwriter.mli (+18)
bindings/ocaml/bitwriter/bitwriter_ocaml.c (+31)
bindings/ocaml/bitwriter/llvm_bitwriter.ml (+18)
bindings/ocaml/bitwriter/Makefile (+23)
bindings/ocaml/Makefile (+13)
bindings/README.txt (+3)
bindings/Makefile (+18)

Adds ocaml language bindings to LLVM. They are built automatically
if configure detects the ocamlc compiler and are installed to the
ocaml standard library.

This patch depends on c-bindings and ocaml-make.

c-bindings.patch (27.3 KB)

ocaml-bindings.patch (38.2 KB)

ocaml-make.patch (21.5 KB)

Hi Gordon,

> I'm authoring a C interface to the LLVM IR type system.

It's great to see a C interface being added. A minor niggle:

+typedef enum {
+ LLVMVoidTypeKind = 0, /* type with no size */
...
+typedef enum {
+ LLVMExternalLinkage = 0,/* Externally visible function */
...
+typedef enum {
+ LLVMDefaultVisibility = 0, /* The GV is visible */

It's defined by the language that the first enumerator is zero, so the
initialisation is redundant. As someone reading the source, it would
make me halt and wonder why it's been done, "What am I missing?".
Similar to seeing a `static int foo = 0'.
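A small illustration of the point, using made-up enumerator names rather than the ones in the patch: in C the first enumerator is 0 unless stated otherwise, and each later one is one more than its predecessor, so the two enums below carry identical values.

```c
#include <assert.h>

/* Hypothetical names. No initializers: values are assigned implicitly. */
typedef enum {
  SketchExternalLinkage, /* implicitly 0 */
  SketchLinkOnceLinkage, /* implicitly 1 */
  SketchWeakLinkage      /* implicitly 2 */
} SketchLinkage;

/* Same enum with the redundant "= 0" spelled out on the first enumerator. */
typedef enum {
  SketchExplicitExternalLinkage = 0,
  SketchExplicitLinkOnceLinkage,
  SketchExplicitWeakLinkage
} SketchExplicitLinkage;

/* Returns 1 if both spellings produce the same values, 0 otherwise. */
int SketchEnumsAgree(void) {
  return (int)SketchExternalLinkage == (int)SketchExplicitExternalLinkage &&
         (int)SketchLinkOnceLinkage == (int)SketchExplicitLinkOnceLinkage &&
         (int)SketchWeakLinkage == (int)SketchExplicitWeakLinkage;
}
```

Whether to keep the explicit `= 0` is then purely a readability choice; it does not change the values the bindings see.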

Cheers,

Ralph.

Most likely these code blocks were copied and pasted from the C++ source, which has the same attribute. :slight_smile:

— Gordon