Some additions to the C bindings

My front-end is sync'd with the trunk now, and working well, but it
required some additional functions exposed in the C bindings. I
hereby submit them for review and approval for inclusion in the trunk.

cbindings.patch (7.1 KB)

LLVMGetAttribute had a bug in it. Here's the revised version of the patch

cbindings.patch (7.1 KB)

Hi Kenneth!

I wouldn't say that I'm the best reviewer, but I've been doing some
work with the c bindings recently so hopefully I have some idea of
what I'm talking about :) Comments are inlined:

+/** See the llvm::Use class. */
+typedef struct LLVMOpaqueUse *LLVMUseRef;

Thanks. Let me start by talking a bit about my project.

I'm working on a compiler/language that supports run-time code
generation and compile-time code execution. Besides the obvious
benefits of easier JITting, I also get the benefits of C++ templates
and metaprogramming without all of the headaches.

To make this work, the compiler actually compiles functions down into
function generators, outputting calls to the LLVM C-bindings that
generate a "regular" function. The programmer can then either leave
them in that form for run-time JITting, or have the compiler JIT and
execute those function generators in order to get "regular" functions.
Either or both can be exposed as public functions and left in place
by the optimizer. The function generator gets its own set of
parameters, and multiple functions with variations can be generated at
compile time or runtime.

The programmer can also put compile-time expressions inside the body of
functions, so that when the function generator runs, the compile-time
expressions are evaluated and used for function generation. Those
compile-time expressions can use global variables and/or the function
generator parameters.

Anyway, this scheme means that extensive LLVM capability needs to be
available to generated code, since it's the generated code that
creates all of the "regular" functions. Generated code has a much
easier time calling the C bindings than the C++ API.

+/** See the llvm::Use class. */
+typedef struct LLVMOpaqueUse *LLVMUseRef;
+
...
+void LLVMReplaceAllUsesWith(LLVMValueRef OldVal, LLVMValueRef NewVal);
...
+/* Operations on Uses */
+LLVMUseRef LLVMGetFirstUse(LLVMValueRef Val);
+LLVMUseRef LLVMGetNextUse(LLVMUseRef U);
+LLVMValueRef LLVMGetUser(LLVMUseRef U);
+LLVMValueRef LLVMGetUsedValue(LLVMUseRef U);

These seem okay to me, but I don't have too much experience with using
the Use classes. The impression I've gotten from the other developers
is that the C bindings are really designed to just get data into llvm,
and any complex manipulations should really be done in C++ passes.
What's your use case for exposing these more complex manipulations?

I'm using it to support renaming functions and still allowing
generated code to look up those functions by name; basically searching
for all global strings containing the function name, and replacing all
uses of them with uses of the new function name.

I would like to do away with that, though, but I haven't quite managed
to get rid of all cases where LLVMGetNamedFunction is called by
generated code.

Also, I've gotten the impression from other developers that the
C-bindings are considered incomplete and that there is a general
desire to expose more functionality, and eventually all LLVM
functionality, through them.

+/* Operations on Users */
+LLVMValueRef LLVMGetOperand(LLVMValueRef Val, unsigned Index);

So how are you using this, since you aren't exposing any of the other
operand functionality?

This supports the "address-of" operator. Any Value that is a LoadInst
can have its address taken. I need the pointer operand of the
LoadInst to get the address Value.

I figured GetOperand was a good starting point, and could support most
of the operand use cases out there.

+unsigned long long LLVMConstIntGetZExtValue(LLVMValueRef ConstantVal);
+long long LLVMConstIntGetSExtValue(LLVMValueRef ConstantVal);

I'm not sure about these functions. There isn't any way to get the
value of other kinds of constants, so why do you need this?

When I've parsed an int literal and put it on my evaluation stack as a
Value, there's a case where I need to get it back as an int.
Specifically, the LLVMBuildExtractValue function requires an int, not
a Constant, to represent the member. I believe that GEP does as well
when applied to a struct.

/* Operations on composite constants */
@@ -464,6 +479,7 @@
LLVMValueRef LLVMConstVector(LLVMValueRef *ScalarConstantVals, unsigned Size);

/* Constant expressions */
+unsigned LLVMGetConstOpcode(LLVMValueRef ConstantVal);

This seems okay to me, but there really should be an LLVMInstruction
enum defined instead of a raw unsigned value. Could you also add a
LLVMConstExpr that wraps ConstantExpr::get?

That shouldn't be a problem.

+int LLVMHasInitializer(LLVMValueRef GlobalVar);

Seems fine to me. I can commit this now.

+LLVMAttribute LLVMGetFunctionAttr(LLVMValueRef Fn);
+LLVMAttribute LLVMGetAttribute(LLVMValueRef Arg);

I've never really done much with attributes. What are you using this for?

In order to do away with include files, I'm supporting importing
modules in bitcode form. To call a function from an imported module,
I need to put an external into the compiled module, and it really
ought to have the same function and argument attributes as the
original. And I want to be able to do that while JITting at runtime
as well.

You're already doing something a bit more complicated than me :) This
does seem a bit more advanced than what llvm-c is intended for,
though. Is there a reason why you can't make a C++ library to do all
this advanced stuff, and just expose some C hooks for your generated
code?

I'm using it to support renaming functions and still allowing
generated code to look up those functions by name; basically searching
for all global strings containing the function name, and replacing all
uses of them with uses of the new function name.

I would like to do away with that, though, but I haven't quite managed
to get rid of all cases where LLVMGetNamedFunction is called by
generated code.

Also, I've gotten the impression from other developers that the
C-bindings are considered incomplete and that there is a general
desire to expose more functionality, and eventually all LLVM
functionality, through them.

While it's lacking in some areas, it's intentional that not all of
llvm is exposed through llvm-c. I learned that after my patches to
expose APInt/APFloat were turned down :) LLVM is a large object-oriented
project, and maintaining a mapping between the C and C++ APIs
would be pretty challenging, especially since LLVM promises to never
remove anything from llvm-c until 3.0. In order to ease development,
it's really designed to just provide the minimum interface for getting
data into llvm. If you want to do something advanced like modify the
bytecode, you really should be writing against the c++ api.

This supports the "address-of" operator. Any Value that is a LoadInst
can have its address taken. I need the pointer operand of the
LoadInst to get the address Value.

I figured GetOperand was a good starting point, and could support most
of the operand use cases out there.

I'm not sure if I understand. The load instruction takes an address as
an argument and loads the value at that address into a register, so you
must already have the address. Or am I misinterpreting what you're
saying?

When I've parsed an int literal and put it on my evaluation stack as a
Value, there's a case where I need to get it back as an int.
Specifically, the LLVMBuildExtractValue function requires an int, not
a Constant, to represent the member. I believe that GEP does as well
when applied to a struct.

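GEP doesn't need to take a constant index to work, at least for arrays
(struct field indices do have to be constants).

%0 = alloca [2 x i32]
%1 = alloca i32
store i32 0, ptr %1
%2 = load i32, ptr %1
%3 = getelementptr [2 x i32], ptr %0, i32 0, i32 %2
%4 = load i32, ptr %3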

extractvalue should only be used if you're using value arrays or
structs, and you need to statically know the indexes. If you don't,
then you really should be using GEPs and let the optimizations do
their thing.

In order to do away with include files, I'm supporting importing
modules in bitcode form. To call a function from an imported module,
I need to put an external into the compiled module, and it really
ought to have the same function and argument attributes as the
original. And I want to be able to do that while JITting at runtime
as well.

If I understand correctly, why aren't the functions already marked
external? If they aren't then an optimizer could theoretically
optimize them away. It may also be more appropriate to pass the
function information through some different channel by the frontend,
rather than directly processing the bytecode. Anyone else have any
experience with doing this?

You're already doing something a bit more complicated than me :) This
does seem a bit more advanced than what llvm-c is intended for,
though. Is there a reason why you can't make a C++ library to do all
this advanced stuff, and just expose some C hooks for your generated
code?

I suppose not. It seemed easier for me and advantageous for y'all for
me to get these functions added. But I can ship my own bridge library
as part of my stdlib.

I'm using it to support renaming functions and still allowing
generated code to look up those functions by name; basically searching
for all global strings containing the function name, and replacing all
uses of them with uses of the new function name.

I would like to do away with that, though, but I haven't quite managed
to get rid of all cases where LLVMGetNamedFunction is called by
generated code.

Also, I've gotten the impression from other developers that the
C-bindings are considered incomplete and that there is a general
desire to expose more functionality, and eventually all LLVM
functionality, through them.

While it's lacking in some areas, it's intentional that not all of
llvm is exposed through llvm-c. I learned that after my patches to
expose APInt/APFloat were turned down :) LLVM is a large object-oriented
project, and maintaining a mapping between the C and C++ APIs
would be pretty challenging, especially since LLVM promises to never
remove anything from llvm-c until 3.0. In order to ease development,
it's really designed to just provide the minimum interface for getting
data into llvm. If you want to do something advanced like modify the
bytecode, you really should be writing against the c++ api.

Then the assumptions under which I submitted the patch were wrong. I
guess it does make sense to ship my own bridge library, then.
Actually, it might be better for me to compile it with llvm-gcc and
ship it as bitcode, come to think of it... one more place that the
optimizer can do its thing.

This supports the "address-of" operator. Any Value that is a LoadInst
can have its address taken. I need the pointer operand of the
LoadInst to get the address Value.

I figured GetOperand was a good starting point, and could support most
of the operand use cases out there.

I'm not sure if I understand. The load instruction takes an address as
an argument and loads the value at that address into a register, so you
must already have the address. Or am I misinterpreting what you're
saying?

When I parse an expression, it gets turned into a Value and stored
away for further processing. (Actually, it gets turned into calls
into LLVM for creating that Value object when the function generator
is run, but anyway...) At that point, I don't keep separate track of
what went into the Value... I can examine the Value itself to get that
information, or do without it.

Any value that lives in memory is represented by a LoadInst from a
pointer to that memory. To take the address, I get the pointer back
out of the LoadInst. Anything that isn't a LoadInst cannot have its
address taken. I end up with about the same rules that C and C++ have
for when an address can be taken.

When I've parsed an int literal and put it on my evaluation stack as a
Value, there's a case where I need to get it back as an int.
Specifically, the LLVMBuildExtractValue function requires an int, not
a Constant, to represent the member. I believe that GEP does as well
when applied to a struct.

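GEP doesn't need to take a constant index to work, at least for arrays
(struct field indices do have to be constants).

%0 = alloca [2 x i32]
%1 = alloca i32
store i32 0, ptr %1
%2 = load i32, ptr %1
%3 = getelementptr [2 x i32], ptr %0, i32 0, i32 %2
%4 = load i32, ptr %3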

extractvalue should only be used if you're using value arrays or
structs, and you need to statically know the indexes. If you don't,
then you really should be using GEPs and let the optimizations do
their thing.

That works in most cases. Perhaps it should be that way in all cases.
I wanted to be able to work with struct values without having to
spill them first. Not that it would make any real difference in the
optimized code.

In order to do away with include files, I'm supporting importing
modules in bitcode form. To call a function from an imported module,
I need to put an external into the compiled module, and it really
ought to have the same function and argument attributes as the
original. And I want to be able to do that while JITting at runtime
as well.

If I understand correctly, why aren't the functions already marked
external? If they aren't then an optimizer could theoretically
optimize them away. It may also be more appropriate to pass the
function information through some different channel by the frontend,
rather than directly processing the bytecode. Anyone else have any
experience with doing this?

The functions are marked external in the imported module. But I must
create a matching declaration in the module I'm compiling in order to
create calls to them.

Functions that are not marked external are not imported.

Also, functions in the imported module can be JITted and called at
compile time. Public function/type generators would be used
extensively this way, and would let you ship the equivalent of
template functions/classes in compiled form, something you *still*
can't do with most existing C++ compilers.

Anyway, consider the patch withdrawn (except for that one bit you
already committed). Thank you for looking at it and telling me more
about the motivation behind the C bindings' current state.

Hi Kenneth,

Thanks for working on this. I have some additional comments:

+/** See the llvm::Use class. */
+typedef struct LLVMOpaqueUse *LLVMUseRef;

My understanding is that this actually conceptually corresponds to use_iterator, not Use. Please name this something like LLVMUseIterator. Also, please document this, not just referring to llvm::Use.

+int LLVMHasInitializer(LLVMValueRef GlobalVar);
  LLVMValueRef LLVMGetInitializer(LLVMValueRef GlobalVar);

Isn't LLVMHasInitializer just LLVMGetInitializer(x) != 0?

Otherwise, looks ok to me,

-Chris

Hi Kenneth,

Thanks for working on this. I have some additional comments:

+/** See the llvm::Use class. */
+typedef struct LLVMOpaqueUse *LLVMUseRef;

My understanding is that this actually conceptually corresponds to
use_iterator, not Use. Please name this something like LLVMUseIterator.
Also, please document this, not just referring to llvm::Use.

I was following the pattern of Functions, Globals, etc., where you get
a Use* (not a use_iterator), and then pass it back to a GetNextUse
call, which turns it back into an iterator and advances it.

+int LLVMHasInitializer(LLVMValueRef GlobalVar);
LLVMValueRef LLVMGetInitializer(LLVMValueRef GlobalVar);

Isn't LLVMHasInitializer just LLVMGetInitializer(x) != 0?

Otherwise, looks ok to me,

-Chris

So you want the whole patch, or just the pieces you highlighted?

Last time I tried that, LLVMGetInitializer threw an assertion when the
global variable didn't actually have one. Has this changed?

No idea. It would be more C-like to return null. The C implementation of the function can check and return null if not set.

I was following the pattern of Functions, Globals, etc., where you get
a Use* (not a use_iterator), and then pass it back to a GetNextUse
call, which turns it back into an iterator and advances it.

Conceptually you're returning an iterator. It happens to be implemented as a tight wrapper around the Use.

So you want the whole patch, or just the pieces you highlighted?

Please resend an updated patch (the whole thing)

-Chris

All right. You should see it by tonight.

Here it is.

cbindings.patch (8.65 KB)

thanks, applied in r83821