Some question on LLVM design

Hi everybody,

I'm currently looking at LLVM as a possible back-end to a dynamic programming system (in the tradition of Smalltalk) we are developing. I have read most of the llvmdev archives, and I'm aware that some things are 'planned' but not implemented yet. We are willing to contribute the code we'll need for our project, but before I can start coding I'll have to submit to my boss a very concrete proposal on which changes I'll make and how long they're going to take.

So before I can present a concrete proposal, I have some doubts on the design of LLVM and on how some particular constructs should be mapped onto its bytecode representation. Please understand that these questions are not intended to criticize LLVM, but instead to better my understanding of it.

1. Opcodes and intrinsics

Which are the differences between opcodes and intrinsics? How is it determined, for an operation, to implement it as an opcode or as an intrinsic function?

As I understand it, compilation passes can both lower intrinsics into opcodes and also replace opcode sequences, so in the end some of them are interchangeable. For example, why is there a store opcode and a llvm_gcwrite intrinsic? Couldn't the front-end just produce stores/volatile stores and then a compilation pass transform them into a write-barrier if necessary?

A possible view of intrinsics could be "operations that don't depend on the target architecture, but instead on the language runtime". But then wouldn't malloc/free be intrinsics?

2. Stack and registers

As the LLVM instruction set has a potentially infinite number of registers which are mapped onto target registers or the stack by the register allocator, why is there a separate stack? I would understand it, if the stack was more accessible, as a way to implement closures, but it has been repeated here that the correct way to do that is to use heap-allocated structures, as functions can't access other functions' stacks. Is it to signal locations that need to be changed in-place?

3. Control transfer

Why are the control transfer operations so high level when compared to actual processors? Usually processors have instructions to jump to a concrete location and everything else is managed by the compiler (saving into the stack, getting result parameters, etc.) depending on the language's calling conventions. In LLVM there's just one way to transfer control, and the only proposal I've seen (http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps this high level. What are the difficulties in having low level transfer control operations, with explicitly managed arguments, saving registers, etc?

Well, that's all for now. Thanks in advance,

Marc Ordinas i Llopis | Tragnarion Studios

I'm currently looking at LLVM as a possible back-end to a dynamic
programming system (in the tradition of Smalltalk) we are developing.

Neat!

I have read most of the llvmdev archives, and I'm aware that some
things are 'planned' but not implemented yet. We are willing to
contribute the code we'll need for our project, but before I can start
coding I'll have to submit to my boss a very concrete proposal on
which changes I'll make and how long they're going to take.

Sounds good!

1. Opcodes and intrinsics

Which are the differences between opcodes and intrinsics? How is it
determined, for an operation, to implement it as an opcode or as an
intrinsic function?

Opcodes are such things that are usually implemented in one or just a
few machine instructions, i.e. things that have a direct correlation to
a machine ISA, such as 'add', 'mul', 'div', etc.

Intrinsics are such things that provide a more complex interaction with
the processor (or the language runtime), but the implementation would be
so different from one platform to the next, that it will be more than
just a "few instructions". An example of such would be the llvm.gc*
intrinsics that you mention, which are for garbage collected languages.
Other intrinsics include debugging support, etc.

As I understand it, compilation passes can both lower intrinsics into
opcodes and also replace opcode sequences, so in the end some of them
are interchangeable.

That's not really correct. The intrinsics such as llvm.frameaddress and
llvm.returnaddress have no equivalents in LLVM opcodes -- the meaning of
the intrinsics is specifically machine-dependent, and LLVM (and its
opcodes) are machine-independent, so there is no valid interchange of
these intrinsics with any combination of LLVM opcodes.

The llvm.memcpy intrinsic can be lowered onto an explicit LLVM loop, but
the memcpy intrinsic provides a succinct representation of memory copy
and so stays as such. Plus, it is harder to do the reverse analysis --
prove that a loop is essentially a memcpy().

For example, why is there a store opcode and a llvm_gcwrite intrinsic?

The 'store' opcode is for generic writes, the gcwrite is specifically
for garbage collection. I think someone else would be able to give you
more information about the GC intrinsics than I am, however.

Couldn't the front-end just produce stores/volatile stores and then a
compilation pass transform them into a write-barrier if necessary?

Again, see above: gcwrite is not a "generic store" in the way that the
'store' opcode is, nor is it a barrier.

A possible view of intrinsics could be "operations that don't depend
on the target architecture, but instead on the language runtime". But
then wouldn't malloc/free be intrinsics?

Good question. Due to the amount of pointer/data analysis in LLVM, it
is often necessary to consider memory allocation instructions to see
where memory is allocated and deallocated. As such, the malloc/free
instructions provide that higher-level construct that make it easier to
manipulate memory-operating instructions since that is done so often.

Consider: for an llvm.malloc intrinsic, one would have to say something
like this every time we wanted to analyze memory usage:

if (CallInst *CI = dyn_cast<CallInst>(I))
  if (CI->getCalledFunction()->getName() == "malloc") {
    // ...
  }

whereas with the malloc instruction raising/lowering pass, you would say

if (MallocInst *MI = dyn_cast<MallocInst>(I)) {
  // ...
}

Given the prevalence of such code, this makes it more efficient and
cleaner to write the second way. Note that LLVM will work just fine if
you use direct calls to malloc() and never create malloc instructions.

2. Stack and registers

As the LLVM instruction set has a potentially infinite number of
registers which are mapped onto target registers or the stack by the
register allocator, why is there a separate stack?

The stack is used for two things: storing structures allocated with
alloca() or automatic variables declared to be function-local. The
spilling of the registers is done automatically (transparently), but the
allocation of DATA is explicit in LLVM to analyze the pointer aliases.

I would understand it, if the stack was more accessible, as a way to
implement closures, but it has been repeated here that the correct way
to do that is to use heap-allocated structures, as functions can't
access other functions' stacks. Is it to signal locations that need to
be changed in-place?

I'm sorry, I don't quite understand this question.

3. Control transfer

Why are the control transfer operations so high level when compared to
actual processors?

Because LLVM is target-independent. It is the job of the backend code
generator (see llvm/lib/Target/*) to convert high-level control transfer
"call" instruction (see below) to the stack-pop/push or register-passing
method of the target. These details are largely different from one
machine to the next, and have no relevance to LLVM analyses.

Usually processors have instructions to jump to a concrete location
and everything else is managed by the compiler (saving into the stack,
getting result parameters, etc.) depending on the language's calling
conventions.

LLVM has the 'call' instruction that abstracts the target machine's
calling convention. See http://llvm.cs.uiuc.edu/docs/LangRef.html#i_call
for more information.

In LLVM there's just one way to transfer control, and
the only proposal I've seen
(http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps
this high level.

This is to have more options for calling conventions that may be
specified for a target that are non-standard. Mostly these are
optimizations (i.e., some registers do not need to be saved if calling a
particular library routine, etc.).

What are the difficulties in having low level transfer control
operations, with explicitly managed arguments, saving registers, etc?

Choosing any lower-level mechanism will make LLVM no longer be
target-independent. Choosing to have the union of all machines is
nearly impossible and is overly complicated and unnecessary for any
LLVM-to-LLVM analyses and transformations.

What is that you are looking to express that isn't captured by the
`call' instruction?

Hope that helps,

Hi everybody,

Hi Marc

I'm currently looking at LLVM as a possible back-end to a dynamic
programming system (in the tradition of Smalltalk) we are developing.

Great!

I
have read most of the llvmdev archives, and I'm aware that some things
are 'planned' but not implemented yet. We are willing to contribute the
code we'll need for our project, but before I can start coding I'll have
to submit to my boss a very concrete proposal on which changes I'll make
and how long they're going to take.

So before I can present a concrete proposal, I have some doubts on the
design of LLVM and on how some particular constructs should be mapped
onto its bytecode representation. Please understand that these questions
are not intended to criticize LLVM, but instead to better my
understanding of it.

Okay, I'll take a crack at it. Others will probably want to give you
better answers than mine :slight_smile:

1. Opcodes and intrinsics

Which are the differences between opcodes and intrinsics? How is it
determined, for an operation, to implement it as an opcode or as an
intrinsic function?

The opcodes are generally fixed as they represent the LLVM mid-level IR.
Changing the opcode set can have wide-reaching impact on all the
analysis, transform, and codegen passes in LLVM so its not a change
taken lightly. However, when it makes sense, they are added
occasionally. For example, we recently added the "unreachable"
instruction which allows a front end compiler to identify code locations
that should not be reached. This can help some of the passes.

As for intrinsics, these are basically function calls that LLVM knows
about. For example, things like memset and memcpy could be implemented
as a function but could also be implemented with direct code if the
processor supports it. Intrinsics are place holders for either code
generation or invocation of a runtime library function. This is probably
where you'd want to extend.

As I understand it, compilation passes can both lower intrinsics into
opcodes and also replace opcode sequences, so in the end some of them
are interchangeable. For example, why is there a store opcode and a
llvm_gcwrite intrinsic? Couldn't the front-end just produce
stores/volatile stores and then a compilation pass transform them into a
write-barrier if necessary?

I believe the llvm_gcwrite intrinsic is for garbage collection, which is
entirely optional. If all stores were to have write-barrier semantics
there could be significant performance penalties.

A possible view of intrinsics could be "operations that don't depend on
the target architecture, but instead on the language runtime". But then
wouldn't malloc/free be intrinsics?

Intrinsics are intended to be replaceable by the target's code
generation, if possible/necessary, or emulated with a function call if
not. Language runtimes should be just that .. calls to functions located
in libraries.

2. Stack and registers

As the LLVM instruction set has a potentially infinite number of
registers which are mapped onto target registers or the stack by the
register allocator, why is there a separate stack? I would understand
it, if the stack was more accessible, as a way to implement closures,
but it has been repeated here that the correct way to do that is to use
heap-allocated structures, as functions can't access other functions'
stacks. Is it to signal locations that need to be changed in-place?

I'm not sure what you mean by a "separate stack". Register allocation
can spill registers to THE stack. I'll let someone more knowledgeable
about this answer.

3. Control transfer

Why are the control transfer operations so high level when compared to
actual processors? Usually processors have instructions to jump to a
concrete location and everything else is managed by the compiler (saving
into the stack, getting result parameters, etc.) depending on the
language's calling conventions. In LLVM there's just one way to transfer
control, and the only proposal I've seen
(http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps
this high level. What are the difficulties in having low level transfer
control operations, with explicitly managed arguments, saving registers,
etc?

I don't think there are any difficulties in having low-level control
transfer operations, I think its more that we don't want them. The point
behind the mid-level SSA IR is to make it simple to generate code from a
compiler front end and to not restrict choices that could be made by
analysis, transform, and codegen passes. The fact that we have a
relatively high level for control transfer operations goes right along
with the rest of the LLVM IR (e.g. gep, call, typed operators, malloc,
free).

The point is to make a simple, consistent, small IR with the following
goals:

* front ends can easily generate code for it because the number of
  instructions is small and model is simple.
* analyses and transforms are simplified because they don't have to
  reason over mind-numbing complexity of lower-level instruction sets
* code generation has significant freedom in the code it generates

Well, that's all for now. Thanks in advance,

Marc Ordinas i Llopis | Tragnarion Studios

Thanks for the interesting questions!

Reid.

Hi everybody,

Hi!

I'm currently looking at LLVM as a possible back-end to a dynamic
programming system (in the tradition of Smalltalk) we are developing. I

Very cool!

have read most of the llvmdev archives, and I'm aware that some things
are 'planned' but not implemented yet. We are willing to contribute the
code we'll need for our project, but before I can start coding I'll have
to submit to my boss a very concrete proposal on which changes I'll make
and how long they're going to take.

Ok.

So before I can present a concrete proposal, I have some doubts on the
design of LLVM and on how some particular constructs should be mapped
onto its bytecode representation. Please understand that these questions
are not intended to criticize LLVM, but instead to better my
understanding of it.

Sure. Before we begin, let me point out that IR design is really a black
art of balancing various forces. Adding stuff to the IR makes it more
powerful but also more complex to deal with and process (making it more
likely that bugs occur). However, we need to add things when there are
features or capabilities that cannot be represented in LLVM. Finally,
LLVM is not immutable: though it's quite stable now, we are always
interested in feedback and ideas. :slight_smile:

1. Opcodes and intrinsics

Which are the differences between opcodes and intrinsics? How is it
determined, for an operation, to implement it as an opcode or as an
intrinsic function?

This is a hard thing to really decide, as there frankly isn't much
difference. Adding intrinsics is far easier than adding instructions, but
intrinsics cannot do some things (for example, they cannot be
terminator instructions). I tend to prefer to make only very simple and
orthogonal operations be the instructions, relegating the rest to be
intrinsics. In practice, the biggest difference might be in the bytecode
encoding: instructions are encoded more densely than intrinsics.

As I understand it, compilation passes can both lower intrinsics into
opcodes and also replace opcode sequences, so in the end some of them
are interchangeable. For example, why is there a store opcode and a
llvm_gcwrite intrinsic?

As pointed out by others, these are completely different operations. In
particular, the gcwrite intrinsic is used by the front-end to indicate
that a write-barrier may be needed. Depending on the implementation of
the garbage collector, this may expand to code or just a normal store
instruction.

Couldn't the front-end just produce stores/volatile stores and then a
compilation pass transform them into a write-barrier if necessary?

Sortof. The problem with this is that (without gcwrite) there is no way
to identify the stores that should be turned into write barriers. In
particular, the heap may have multiple parts to it, some of which are GC'd
and some are not. For example, the .NET framework has different pointer
types for managed and unmanaged pointers: without the ability to represent
both in LLVM, we could not support it.

A possible view of intrinsics could be "operations that don't depend on
the target architecture, but instead on the language runtime".

There are a least these catagories of intrinsics:

1. Target specific intrinsics: these are things like llvm.returnaddress
   and the varargs stuff that are specific to code generators: there is
   simply no way to express this in LLVM without an intrinsic. These
   could be made into LLVM instructions, but would provide no real
   advantage if we did so.
2. GC intrinsics: These are their own catagory because they require
   support for the LLVM code generator and GC runtime library to
   interpret.
3. Extended langauge intrinsics. These include llvm.memcpy and friends.
   Note that the intrinsics are different (more powerful) than the libc
   equivalent, because they allow alignment info to be expressed.

But then wouldn't malloc/free be intrinsics?

Heh, good question. This is largely historical, but you're right, they
probably should have been made intrinsics. Perhaps in the future they
will be converted to be intrinsics, but for now they remain instructions.

2. Stack and registers

As the LLVM instruction set has a potentially infinite number of
registers which are mapped onto target registers or the stack by the
register allocator,

Yes.

why is there a separate stack? I would understand it, if the stack was
more accessible, as a way to implement closures, but it has been
repeated here that the correct way to do that is to use heap-allocated
structures, as functions can't access other functions' stacks. Is it to
signal locations that need to be changed in-place?

I'm not sure what you mean. In particular, the alloca instruction is used
to explicitly allocate stack space. Because it is not possible to take
the address of LLVM registers, this the mechanism that we use to allocate
stack space. Certain langauges do not need a stack, and thus do not need
to use alloca, other languages (e.g. C) do. If you clarify your question
I'll be able to give a more satisfactory answer.

3. Control transfer

Why are the control transfer operations so high level when compared to
actual processors? Usually processors have instructions to jump to a
concrete location and everything else is managed by the compiler (saving
into the stack, getting result parameters, etc.) depending on the
language's calling conventions.

The idea is to make the producer of the LLVM code as independent from the
target as possible. In particular, a front-end for a type safe language
should be able to produce a type-safe module that works on all targets.

In LLVM there's just one way to transfer control, and the only proposal
I've seen
(http://nondot.org/sabre/LLVMNotes/CustomCallingConventions.txt) keeps
this high level. What are the difficulties in having low level transfer
control operations, with explicitly managed arguments, saving registers,
etc?

I'm not sure what it is that you are trying to do. The abstraction
provided by the call instruction is good for the optimizers and analyses
(because they see exactly what they need, not extra details), good for
compilation speed, and good for target independence. Custom calling
conventions (link above) are specifically designed to give the code
generator the flexibility to pick the most efficient calling convention
for a particular call/return pair.

Given all of the above (efficient compiled code, easy to write
analysis/xforms, happy front-ends, fast compile times), I'm not sure what
your approach would give. Care to elaborate?

-Chris

> A possible view of intrinsics could be "operations that don't depend
> on the target architecture, but instead on the language runtime". But
> then wouldn't malloc/free be intrinsics?

Good question. Due to the amount of pointer/data analysis in LLVM, it
is often necessary to consider memory allocation instructions to see
where memory is allocated and deallocated. As such, the malloc/free
instructions provide that higher-level construct that make it easier to
manipulate memory-operating instructions since that is done so often.

This information could be provided by an intrinsic just as well.

Consider: for an llvm.malloc intrinsic, one would have to say something
like this every time we wanted to analyze memory usage:

if (CallInst *CI = dyn_cast<CallInst>(I))
  if (CI->getCalledFunction()->getName() == "malloc") {
    // ...
  }
whereas with the malloc instruction raising/lowering pass, you would say
if (MallocInst *MI = dyn_cast<MallocInst>(I)) {

Not true at all. Consider the "llvm/IntrinsicInst.h" header, which lets
us write stuff like this, today:

  if (MemCpyInst *MCI = dyn_cast<MemCpyInst>(I))
    MCI->getSource() MCI->getLength() ...

The one advantage that mallocinst has over using an intrnisic is that
instructions can have different return values in various parts of the
program (e.g., you can write 'malloc int' instead of '(int*)malloc(4)').

-Chris

OK, then you could say that the *real* advantage of the malloc/alloca
instructions is that it can take a type as an operand (which an
intrinsic cannot do) and hence can represent code in a more
target-independent manner.

Consider malloc(sizeof(long)). That may be 4 on one target, 8 on
another, so either you always allocate 8 to be correct (with an
intrinsic) or you tie the bytecode to a particular size of long, where
as with the instruction, just say 'malloc long'.

Yup, exactly.

Note that this is *still* syntactic sugar, as you can write 'sizeof'
portably in LLVM. The real question is why we give special support for
malloc/free, which we don't give to other language runtime allocation
functions (e.g. operator new' in C++, new in Java, etc).

Going forward, it will be interesting to see how we can generalize support
for other allocators, I don't think that there is anything intrinsically
hard, but it is something that we eventually want to do.

-Chris

The real question is why we give special support for
malloc/free, which we don't give to other language runtime allocation
functions (e.g. operator new' in C++, new in Java, etc).

Going forward, it will be interesting to see how we can generalize support
for other allocators, I don't think that there is anything intrinsically
hard, but it is something that we eventually want to do.

This is something we probably need to do soon (e.g., for Java), to enable our language-independent pointer analyses to work since they need to know about 'new'. We will have to make sure the type behavior is correct, but that should not be difficult since there should (normally) be an explicit LLVM type for each static source type.

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/

Thanks all for the fast answers, I'm certainly understanding LLVM better. Some more comments below:

Chris Lattner wrote:

Couldn't the front-end just produce stores/volatile stores and then a
compilation pass transform them into a write-barrier if necessary?

Sortof. The problem with this is that (without gcwrite) there is no way
to identify the stores that should be turned into write barriers. In
particular, the heap may have multiple parts to it, some of which are GC'd
and some are not. For example, the .NET framework has different pointer
types for managed and unmanaged pointers: without the ability to represent
both in LLVM, we could not support it.

Ok, I understand this now. Would allocation to different heaps be represented using different intrinsics, then?

Also, some answers were provided by Reid Specer, so I'm merging the two threads here:

> Intrinsics are intended to be replaceable by the target's code
> generation, if possible/necessary, or emulated with a function call if
> not. Language runtimes should be just that .. calls to functions located
> in libraries.

But sometimes it's more efficient to have knowledge of the runtime in the compiler, i.e. inlining method lookup. Would that be implemented using a lowering pass to substitute the function call for the runtime code?

2. Stack and registers

I'm not sure what you mean. In particular, the alloca instruction is used
to explicitly allocate stack space. Because it is not possible to take
the address of LLVM registers, this the mechanism that we use to allocate
stack space. Certain langauges do not need a stack, and thus do not need
to use alloca, other languages (e.g. C) do. If you clarify your question
I'll be able to give a more satisfactory answer.

So would it be possible to pass a pointer to a structure allocated in the stack to a called function?

As to what I'm trying to do, more below.

Reid Spencer wrote:
> I'm not sure what you mean by a "separate stack". Register allocation
> can spill registers to THE stack. I'll let someone more knowledgeable
> about this answer.

I meant a stack separate from the infinite registers. I see now that we need a stack for taking the address of objects, thanks.

3. Control transfer

I'm not sure what it is that you are trying to do. The abstraction
provided by the call instruction is good for the optimizers and analyses
(because they see exactly what they need, not extra details), good for
compilation speed, and good for target independence. Custom calling
conventions (link above) are specifically designed to give the code
generator the flexibility to pick the most efficient calling convention
for a particular call/return pair.

Given all of the above (efficient compiled code, easy to write
analysis/xforms, happy front-ends, fast compile times), I'm not sure what
your approach would give. Care to elaborate?

Right now, I'm just trying to think on how I'm going to map our language features on LLVM, to see if we can use it as our back-end with as little modification as possible.

Thanks everyone,

Marc Ordinas i Llopis | Tragnarion Studios

Misha Brukman wrote:

1. Opcodes and intrinsics

That's not really correct. The intrinsics such as llvm.frameaddress and
llvm.returnaddress have no equivalents in LLVM opcodes -- the meaning of
the intrinsics is specifically machine-dependent, and LLVM (and its
opcodes) are machine-independent, so there is no valid interchange of
these intrinsics with any combination of LLVM opcodes.

Yes, I understand that those intrinsics are mapped differently on different machines, but isn't 'add' mapped differently too?

It seems to me that those intrinsics are there to support the GNU C extensions, making them a 'language feature' of sorts. That's why they are intrinsics and not opcodes.

3. Control transfer

LLVM has the 'call' instruction that abstracts the target machine's
calling convention. See http://llvm.cs.uiuc.edu/docs/LangRef.html#i_call
for more information.

Yes, it abstracts the target's C calling convention. But I'm trying to see if other types of languages can be implemented using LLVM.

What is that you are looking to express that isn't captured by the
`call' instruction?

Tail calls, closures, continuations, lazy allocation... Basically I'm trying to see how am I going to implement high-level functions on LLVM.

Hope that helps,

Certainly, thanks!

Marc Ordinas i Llopis | Tragnarion Studios

> Sortof. The problem with this is that (without gcwrite) there is no way
> to identify the stores that should be turned into write barriers. In
> particular, the heap may have multiple parts to it, some of which are GC'd
> and some are not. For example, the .NET framework has different pointer
> types for managed and unmanaged pointers: without the ability to represent
> both in LLVM, we could not support it.
>

Ok, I understand this now. Would allocation to different heaps be
represented using different intrinsics, then?

Hrm, not necessarily intrinsics. The GC interface currently only supports
one logical heap, but it would be trivial to have a custom GC that does
whatever you'd like.

Also, some answers were provided by Reid Specer, so I'm merging the two
threads here:

> Intrinsics are intended to be replaceable by the target's code
> generation, if possible/necessary, or emulated with a function call if
> not. Language runtimes should be just that .. calls to functions located
> in libraries.

But sometimes it's more efficient to have knowledge of the runtime in
the compiler, i.e. inlining method lookup. Would that be implemented
using a lowering pass to substitute the function call for the runtime code?

It depends on what you mean. If you mean virtual method calls, we use
pointer analysis to expose the targets of virtual method calls. In the
future, extensions to LLVM may be able to capture more detailed
information from the front-end (using class hierarchy analysis or other
techniques), but we have not started on that work yet.

>>2. Stack and registers
>
> I'm not sure what you mean. In particular, the alloca instruction is used
> to explicitly allocate stack space. Because it is not possible to take
> the address of LLVM registers, this the mechanism that we use to allocate
> stack space. Certain langauges do not need a stack, and thus do not need
> to use alloca, other languages (e.g. C) do. If you clarify your question
> I'll be able to give a more satisfactory answer.
>

So would it be possible to pass a pointer to a structure allocated in
the stack to a called function?

Exactly.

As to what I'm trying to do, more below.

Reid Spencer wrote:
> I'm not sure what you mean by a "separate stack". Register allocation
> can spill registers to THE stack. I'll let someone more knowledgeable
> about this answer.

I meant a stack separate from the infinite registers. I see now that we
need a stack for taking the address of objects, thanks.

Yup.

>>3. Control transfer
> I'm not sure what it is that you are trying to do. The abstraction
> provided by the call instruction is good for the optimizers and analyses
> (because they see exactly what they need, not extra details), good for
> compilation speed, and good for target independence. Custom calling
> conventions (link above) are specifically designed to give the code
> generator the flexibility to pick the most efficient calling convention
> for a particular call/return pair.
>
> Given all of the above (efficient compiled code, easy to write
> analysis/xforms, happy front-ends, fast compile times), I'm not sure what
> your approach would give. Care to elaborate?
>

Right now, I'm just trying to think on how I'm going to map our language
features on LLVM, to see if we can use it as our back-end with as little
modification as possible.

Okay. If you can express the issues you have in C or C++, one easy way to
do this is to see what LLVM code the C/C++ front-end generates for a
particular construct. Other options are to explain what you need and ask
here :slight_smile:

-Chris

Misha Brukman wrote:
>>1. Opcodes and intrinsics
>>
> That's not really correct. The intrinsics such as llvm.frameaddress and
> llvm.returnaddress have no equivalents in LLVM opcodes -- the meaning of
> the intrinsics is specifically machine-dependent, and LLVM (and its
> opcodes) are machine-independent, so there is no valid interchange of
> these intrinsics with any combination of LLVM opcodes.
>

Yes, I understand that those intrinsics are mapped differently on
different machines, but isn't 'add' mapped differently too?

It seems to me that those intrinsics are there to support the GNU C
extensions, making them a 'language feature' of sorts. That's why they
are intrinsics and not opcodes.

Yes, that is accurate. In fact, these intrinsics COULD be made into
opcodes, but it would add little value to do so.

>>3. Control transfer
>>
> LLVM has the 'call' instruction that abstracts the target machine's
> calling convention. See http://llvm.cs.uiuc.edu/docs/LangRef.html#i_call
> for more information.
>

Yes, it abstracts the target's C calling convention. But I'm trying to
see if other types of languages can be implemented using LLVM.

Sure they can. :slight_smile:

> What is that you are looking to express that isn't captured by the
> `call' instruction?
>

Tail calls, closures, continuations, lazy allocation... Basically I'm
trying to see how am I going to implement high-level functions on LLVM.

As for tail calls, my plans are here:
http://nondot.org/sabre/LLVMNotes/GuaranteedEfficientTailCalls.txt
Note that LLVM already has a pretty aggressive tail-call elimination pass,
so this is really more for completeness than anything else.

Closures and continuations should not be a problem. Closures are
represented as a pair, where the first element is a pointer to a struct
(containing any environment needed by the closure) and a function pointer.
To invoke the closure, just call the function pointer, passing the pointer
to struct.

I'm not sure what support you would need in LLVM for lazy allocation.

-Chris

I agree with Chris's points below, but I would add a couple of caveats:

What is that you are looking to express that isn't captured by the
`call' instruction?

Tail calls, closures, continuations, lazy allocation... Basically I'm
trying to see how am I going to implement high-level functions on LLVM.

As for tail calls, my plans are here:
http://nondot.org/sabre/LLVMNotes/GuaranteedEfficientTailCalls.txt
Note that LLVM already has a pretty aggressive tail-call elimination pass,
so this is really more for completeness than anything else.

Actually, although we can optimize a tail call of a procedure calling itself, we cannot express an optimized tail call from one procedure to another. We don't have a cross-procedure branch in LLVM. Of course, a cross-procedure tail call could be optimized to a branch within a native back-end but that's not really what you want.

Closures and continuations should not be a problem. Closures are
represented as a pair, where the first element is a pointer to a struct
(containing any environment needed by the closure) and a function pointer.
To invoke the closure, just call the function pointer, passing the pointer
to struct.

I agree that closures should not be a problem. In the case of closures for polymorphic functions, though, the type information for the function pointer in the closure will be imprecise and casts will probably be needed to use the function pointer. The loss of type information is really due to lack of polymorphism at the LLVM level, not due to closures per se.

I'm not sure how well continuations will work. As Chris implied, they can be expressed as closures, but "rediscovering" the continuations for any CPS-based analysis or optimizations could be inconvenient.

I'm not sure what support you would need in LLVM for lazy allocation.

Me neither. You should be able to *implement* it, but I don' t know what should be expressed in LLVM directly.

-Chris

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/