Memory Alignment, Heap allocation.

Hi,

1.

A small question: How do I ensure memory alignment? I want all malloced
memory, globals and functions to be 4-byte aligned. Does llvm have any
".align" keyword?

I'm currently implementing a small scheme toy compiler, and want to use
the lowest 2 bits for type tags. It's currently 380 lines of scheme
code[1], quite similar to the compiler in SICP[2], which I hope to make
self-applicable later on.
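The trick here relies on 4-byte alignment guaranteeing that the low two bits of every pointer are zero, leaving them free for a tag. A minimal C sketch of the idea (all names invented for illustration):

```c
#include <stdint.h>

/* With every object at least 4-byte aligned, the low two bits of a
 * pointer are always 00, so they are free to carry a type tag. */
enum { TAG_FIXNUM = 0, TAG_VECTOR = 1, TAG_FUNCTION = 3, TAG_MASK = 3 };

static uintptr_t make_fixnum(uintptr_t n)    { return n << 2; }
static uintptr_t fixnum_value(uintptr_t v)   { return v >> 2; }
static uintptr_t tag_pointer(void *p, int t) { return (uintptr_t)p | (uintptr_t)t; }
static int       get_tag(uintptr_t v)        { return (int)(v & TAG_MASK); }
static void     *untag(uintptr_t v)          { return (void *)(v & ~(uintptr_t)TAG_MASK); }
```

Untagging is just masking the low bits back to zero, which is why the alignment guarantee matters.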

2.

Can I change the calling conventions / frame handling, so that call frames
are allocated on the heap instead of on the stack? Right now all my
compiled functions take an environment as an argument to look up variables
in the scheme function. It would perhaps be nicer if I could use the call
frames instead, but I can't, since lambdas defined in a frame can escape
when the frame is popped off the stack, for example:

  (lambda (x) (lambda (z) (+ x z))) ; the inner lambda is returned.

Regards, Tobias

[1] www.ida.liu.se/~tobnu/compile.ss

    It can currently, for example, compile and run:

    (compiler '((lambda (x y z)
                  (if (seteq (car (cdr (cons x (cons y 3)))) z)
                    (add 1 0)
                    (sub 2 1)))
                1 2 2))

[2] Structure and Interpretation of Computer Programs, Abelson & Sussman.

Tobias Nurmiranta wrote:

Hi,

Chris and others can give you better ideas on the ideal ways to implement what you want, but I'll give some ideas/answers for now.

1.

A small question: How do I ensure memory alignment? I want all malloced
memory, globals and functions to be 4-byte aligned. Does llvm have any
".align" keyword?

No, LLVM does not currently have an alignment keyword (that I know of). It might be useful to have one, though.

I'm currently implementing a small scheme toy compiler, and want to use
the lowest 2 bits for type tags. It's currently 380 lines of scheme
code[1], quite similar to the compiler in SICP[2], which I hope to make
self-applicable later on.

I think the only reliable way you could do this would be to implement your own memory allocator function that returned memory from the heap. Your code could ensure that every pointer it returned was on a 4-byte boundary.

Any other technique would take advantage of side-effects of the LLVM code generators.
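Such an allocator usually over-allocates and rounds up. A hedged C sketch of the idea (`gc_alloc`/`gc_free` are invented names, not LLVM APIs):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Over-allocate, round up to a 4-byte boundary, and stash the original
 * malloc pointer just below the aligned block so it can be freed later. */
static void *gc_alloc(size_t size) {
    unsigned char *raw = malloc(size + sizeof(void *) + 3);
    if (raw == NULL)
        return NULL;
    uintptr_t addr = (uintptr_t)(raw + sizeof(void *));
    addr = (addr + 3) & ~(uintptr_t)3;              /* round up to a multiple of 4 */
    memcpy((void *)(addr - sizeof(void *)), &raw, sizeof raw);
    return (void *)addr;
}

static void gc_free(void *p) {
    if (p != NULL) {
        void *raw;
        memcpy(&raw, (unsigned char *)p - sizeof(void *), sizeof raw);
        free(raw);
    }
}
```

The `memcpy` of the original pointer avoids any unaligned-store concerns on stricter platforms.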

2.

Can I change the calling conventions / frame handling, so that call frames
are allocated on the heap instead of on the stack? Right now all my
compiled functions take an environment as an argument to look up variables
in the scheme function. It would perhaps be nicer if I could use the call
frames instead, but I can't, since lambdas defined in a frame can escape
when the frame is popped off the stack, for example:

I suppose you could change the LLVM code generators to use whatever calling conventions you want, but I believe your current implementation is the correct way of passing information from a function back to its caller.

Changing the code generators to do what you describe essentially "breaks" the LLVM calling convention model (i.e., in LLVM, items on the stack disappear when a function returns; you do not want to assume that something on a function's stack still exists after the function has returned).

Passing a pointer may look a little kludgy, but it is correct and should be fine.

-- John T.

A small question: How do I ensure memory alignment? I want all malloced
memory, globals and functions to be 4-byte aligned. Does llvm have any
".align" keyword?

In the medium term, we plan to add alignment requirements to the
alloca/malloc instructions and to globals (vars/functions) but we do not
have this yet. Currently the code generator defines the alignment of
various structures based on the preferred alignment (usually specified by
the platform ABI). If you have a 32-bit object (like an int or a float),
you can be pretty certain that the object is going to be four-byte aligned
(for global vars, allocas, and mallocs). Functions SHOULD be at least
four-byte aligned: if they are not, please file a bug and we'll get it
fixed ASAP!

I'm currently implementing a small scheme toy compiler, and want to use
the lowest 2 bits for type tags. It's currently 380 lines of scheme
code[1], quite similar to the compiler in SICP[2], which I hope to make
self-applicable later on.

Cool! I'm currently out of town so I can't try it out, but I will when I
get a chance. This sounds like a neat project!

Can I change the calling conventions / frame handling, so that call frames
are allocated on the heap instead of on the stack? Right now all my
compiled functions take an environment as an argument to look up variables
in the scheme function. It would perhaps be nicer if I could use the call
frames instead, but I can't, since lambdas defined in a frame can escape
when the frame is popped off the stack, for example:

I think that taking an environment pointer is the best way to go. The
semantics of LLVM are supposed to match those of a microprocessor, so if
you want custom semantics for calls (allocating the frame on the heap),
they should be implemented explicitly in the LLVM code. If the resulting
generated code is not good, please let us know and we can tune the code
generator or potentially add a new domain-specific optimization.
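Passing the environment explicitly like this is essentially closure conversion. A C sketch of what compiled code for `(lambda (x) (lambda (z) (+ x z)))` could look like (all names invented, assuming a heap-allocated environment):

```c
#include <stdlib.h>

/* A closure is a code pointer plus a heap-allocated environment, so
 * captured variables outlive the call frame that created them. */
struct env  { int x; };                                /* captures x */
struct clos { int (*code)(struct clos *, int); struct env *env; };

/* compiled body of the inner (lambda (z) (+ x z)) */
static int inner_code(struct clos *self, int z) {
    return self->env->x + z;
}

/* compiled body of the outer (lambda (x) ...): builds the inner closure;
 * the environment lives on the heap, so it survives after this frame is
 * popped off the stack. */
static struct clos *outer(int x) {
    struct env  *e = malloc(sizeof *e);
    struct clos *c = malloc(sizeof *c);
    e->x = x;
    c->code = inner_code;
    c->env  = e;
    return c;            /* the inner lambda escapes safely */
}
```

The escaping-lambda example from earlier in the thread is exactly why `struct env` cannot live in `outer`'s stack frame.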

One important thing that we don't have (but which will be added when there
is interest) is support for explicitly marked tail calls. Currently there
is support for tail call *optimizations* (e.g., turning a naive pow into a
loop), but no way for a front-end to guarantee that it happens. We will
eventually allow the front-end to mark a call as being a tail call, but
no one has implemented this yet (it shouldn't be hard). In any case, the
optimizer is pretty aggressive about eliminating tail calls, so you
probably won't run into problems except in absurd situations.
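To illustrate the pow example: in the naive form the recursive call is not a tail call, while the accumulator form is and can be turned into a loop (a generic C illustration, not LLVM-specific):

```c
/* Naive pow: the multiply happens after the recursive call returns, so
 * the call is not a tail call and each step needs a stack frame. */
static long pow_naive(long b, long n) {
    return n == 0 ? 1 : b * pow_naive(b, n - 1);
}

/* Accumulator form: the self-call is the last thing the function does,
 * so a tail-call optimizer can turn it into a plain loop. */
static long pow_acc(long b, long n, long acc) {
    return n == 0 ? acc : pow_acc(b, n - 1, acc * b);
}
```

Rewriting the first form into the second (introducing the accumulator) is the kind of transformation the optimizer can do automatically.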

Writing a scheme front-end for LLVM sounds like a great project: please
keep us informed how it goes, and when it gets mostly functional, let us
know so we can add a link on the web site. :)

-Chris

I just wanted to add that the llvm "malloc" instruction turns into a
direct call to the libc malloc implementation. All libc implementations
that I know of guarantee at least 4-byte alignment, and many provide
8-byte. If you malloc your data, you should be fine.

-Chris

Hi, now I've had some free coding time.

Writing a scheme front-end for LLVM sounds like a great project: please
keep us informed how it goes, and when it gets mostly functional, let us
know so we can add a link on the web site. :)

-Chris

Just to keep you informed. My small scheme compiler[1] of 1K lines is now
self-applicable, with the types fixnum, symbol, string, function and
vector (cons cells are seen as vectors of size 2).

You can for example do:

cat compile.ss|mzscheme --script compile.ss|llvm-as -o=ccomp.bc
echo '(display "hello")'|lli ccomp.bc|llvm-as -o=hello2.bc

But be warned, the resulting programs are painfully slow :). The next step
is to implement garbage collection for it, since right now it just
joyfully mallocs away :). (It actually runs out of memory if I try to
compile compile.ss with ccomp.bc, i.e. "cat compile.ss|lli ccomp.bc".)

A question: would it be difficult to make my compiled compiler
(ccomp.bc) call functions in llvm for the creation of basic blocks and
instructions, instead of using the text format? (See under "LLVM
primitives" in the scheme code.) I'll try to read up on how to use the
JIT facilities, but I won't say no to any hints :).

, Tobias

[1] http://www.ida.liu.se/~tobnu/compile.ss

Hi, now I've had some free coding time.

> Writing a scheme front-end for LLVM sounds like a great project: please
> keep us informed how it goes, and when it gets mostly functional, let us
> know so we can add a link on the web site. :)
>
> -Chris

Just to keep you informed. My small scheme compiler[1] of 1K lines is now
self-applicable, with the types fixnum, symbol, string, function and
vector (cons cells are seen as vectors of size 2).

That is wonderful! Wow, you did this in just ~1 month? :)

You can for example do:

cat compile.ss|mzscheme --script compile.ss|llvm-as -o=ccomp.bc
echo '(display "hello")'|lli ccomp.bc|llvm-as -o=hello2.bc

You're right, it does indeed work :)

But be warned, the resulting programs are painfully slow :). The next step
is to implement garbage collection for it, since right now it just
joyfully mallocs away :). (It actually runs out of memory if I try to
compile compile.ss with ccomp.bc, i.e. "cat compile.ss|lli ccomp.bc".)

Cool, ok. Have you seen the LLVM GC support that is already available:
http://llvm.cs.uiuc.edu/docs/GarbageCollection.html

It should be able to support scheme well, though there may be some missing
bits.

A question: would it be difficult to make my compiled compiler
(ccomp.bc) call functions in llvm for the creation of basic blocks and
instructions, instead of using the text format? (See under "LLVM
primitives" in the scheme code.) I'll try to read up on how to use the
JIT facilities, but I won't say no to any hints :).

Actually we were just talking about this recently. I believe that Patrick
(among other things) is working on a C API to the LLVM IR classes. Given
a C-style API, it will definitely be callable from LLVM code (that's kinda
the point of Patrick's work). Maybe he can say some more about where this
stands?

[1] http://www.ida.liu.se/~tobnu/compile.ss

This is very neat. If you put together a little web page for it, and send
me a blurb, I would be very happy to add this to the projects page. :)

As you continue development on it, would you be interested in integrating
this into the LLVM tree?

-Chris

Hi, thanks for your mail!

That is wonderful! Wow, you did this in just ~1 month? :)

Yes :), even less, but that is because I used the structure from SICP; see
the URL below.

Cool, ok. Have you seen the LLVM GC support that is already available:
http://llvm.cs.uiuc.edu/docs/GarbageCollection.html

It should be able to support scheme well, though there may be some missing
bits.

Yes, I will read it more closely and try to use it, and say if I miss
something. GCs and persistent memory are what my Ph.D. work is supposed to
be about :). The scheme compiler is just a small hack for now, so that I
can easily understand and debug it later on. But maybe I'll get obsessed. :)

Actually we were just talking about this recently. I believe that Patrick
(among other things) is working on a C API to the LLVM IR classes. Given
a C-style API, it will definitely be callable from LLVM code (that's kinda
the point of Patrick's work). Maybe he can say some more about where this
stands?

Ok, I look forward to it!

> [1] http://www.ida.liu.se/~tobnu/compile.ss

This is very neat. If you put together a little web page for it, and send
me a blurb, I would be very happy to add this to the projects page. :)

Ok, here you go,

  http://www.ida.liu.se/~tobnu/scheme2llvm/

(what's a blurb? :)

As you continue development on it, would you be interested in integrating
this into the LLVM tree?

Sure, it could probably at least be used as an example of how to use the
GC support. Maybe it's smaller and easier to understand than a complete
JVM. :)

, Tobias

> Cool, ok. Have you seen the LLVM GC support that is already available:
> http://llvm.cs.uiuc.edu/docs/GarbageCollection.html
>
> It should be able to support scheme well, though there may be some missing
> bits.

Yes, I will read it more closely and try to use it, and say if I miss
something. GCs and persistent memory are what my Ph.D. work is supposed to
be about :). The scheme compiler is just a small hack for now, so that I
can easily understand and debug it later on. But maybe I'll get obsessed. :)

Sounds good. Hopefully you can even improve what we have :)

> > [1] http://www.ida.liu.se/~tobnu/compile.ss
>
> This is very neat. If you put together a little web page for it, and send
> me a blurb, I would be very happy to add this to the projects page. :)

Ok, here you go,

  http://www.ida.liu.se/~tobnu/scheme2llvm/

Looks great!

(what's a blurb? :)

Just a summary, so that I can add an entry to this page:
http://llvm.cs.uiuc.edu/ProjectsWithLLVM/

> As you continue development on it, would you be interested in integrating
> this into the LLVM tree?

Sure, it could probably at least be used as an example of how to use the
GC support. Maybe it's smaller and easier to understand than a complete
JVM. :)

That is what I was thinking. :)

-Chris

> http://www.ida.liu.se/~tobnu/scheme2llvm/

Looks great!

> (what's a blurb? :)

Just a summary, so that I can add an entry to this page:
http://llvm.cs.uiuc.edu/ProjectsWithLLVM/

Maybe this for now:

"This is a small self-applicable scheme compiler for LLVM.

The code is quite similar to the code in the book SICP (Structure and
Interpretation of Computer Programs), chapter five, with the difference
that it implements the extra functionality that SICP assumes the
explicit-control evaluator (virtual machine) already has. Much of the
compiler's functionality is implemented in a subset of scheme,
llvm-defines, which are compiled to llvm functions."

modulo my mistakes in English :). Or, write whatever you like.

> > As you continue development on it, would you be interested in integrating
> > this into the LLVM tree?
>
> Sure, it could probably at least be used as an example of how to use the
> GC support. Maybe it's smaller and easier to understand than a complete
> JVM. :)

That is what I was thinking. :)

Good.

, Tobias

Great, I added it here:
http://llvm.cs.uiuc.edu/ProjectsWithLLVM/

Thanks! Also, please let us know how the front-end is going :)

-Chris

Hi,

Regarding llvm.gcroot, do I have to allocate stack space for all pointers
in a function? Right now I mostly use SSA variables, and let llvm's
register allocation allocate stack space when needed. Also, what happens
when I run the mem2reg pass: does it handle llvm.gcroots that are moved
from the stack to registers?

I'm thinking along these lines: shouldn't one mark all SSA variables that
contain pointers to objects as roots, and then, depending on whether the
variables end up on the stack or in registers, let the compiler use
llvm.gcroot?

All my objects are currently type-tagged uints. The llvm scheme code in
my compiler explains this, I think:

     ;; The low two bits are the type tag: 00 = fixnum, 01 = vector,
     ;; 11 = function. Fixnums are stored shifted left by two.
     (llvm-define (make-number x) (bit-shl x 2))
     (llvm-define (raw-number x) (bit-shr x 2))
     (llvm-define (clear-tag x) (bit-shl (bit-shr x 2) 2))
     (llvm-define (get-tag x) (bit-and x 3))
     (llvm-define (make-pointer x) (bit-or (clear-tag x) 1))
     (llvm-define (make-function-pointer x) (bit-or (clear-tag x) 3))
     (llvm-define (points-to x) (clear-tag x))
     (llvm-define (number? x) (seteq (get-tag x) 0))
     (llvm-define (vector? x) (seteq (get-tag x) 1))
     (llvm-define (procedure? x) (seteq (get-tag x) 3))

     ;; A vector is a malloc'd block whose first word holds the size.
     (llvm-define (make-vector raw-size)
                  (make-pointer
                   (cast "uint*" (store raw-size (malloc (add raw-size 1))) "uint")))

, Tobias

Regarding llvm.gcroot, do I have to allocate stack-space for all
pointers in a function? Right now I mostly use SSA-variables, and let
llvm's register allocation allocate stack-space when needed.

Yes. This reflects the fact that the GC can move objects (to compact the
heap) at unpredictable times.
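One way to picture this: the gcroot stack slots together form an explicit list of root locations that the collector can walk and update in place. A rough C sketch of the concept (an illustration only, not LLVM's implementation):

```c
#include <stddef.h>

/* Each frame registers the addresses of its pointer slots on a linked
 * list; a collector walks the list and may rewrite any root in place. */
struct root_frame {
    struct root_frame *prev;
    void             **roots;   /* addresses of this frame's GC pointer slots */
    size_t             nroots;
};

static struct root_frame *root_top = NULL;

static void push_roots(struct root_frame *f, void **roots, size_t nroots) {
    f->prev   = root_top;
    f->roots  = roots;
    f->nroots = nroots;
    root_top  = f;
}

static void pop_roots(void) {
    root_top = root_top->prev;   /* on function exit */
}

/* What a collector's root scan looks like. */
static size_t count_roots(void) {
    size_t n = 0;
    for (struct root_frame *f = root_top; f != NULL; f = f->prev)
        n += f->nroots;
    return n;
}
```

Because the collector may overwrite `*slot` through these addresses at any collection point, the roots must live in memory rather than in registers.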

Also, what happens when I run the mem2reg pass, does it handle
llvm.gcroot's that are moved from stack to registers?

Not currently. In the future this will definitely be improved. In
particular, when the code generators are enhanced to provide more accurate
mapping information for GC pointers (e.g. which registers contain
pointers), we can do this. This is in the long term plans, but I suspect
that performance will be fine without it for many applications.

I'm thinking along these lines: shouldn't one mark all SSA variables that
contain pointers to objects as roots, and then, depending on whether the
variables end up on the stack or in registers, let the compiler use
llvm.gcroot?

llvm.gcroot is an abstraction. Your front-end is telling the code
generator and optimizer what locations can be modified unpredictably at
runtime by the GC system. Right now we do not have many clients of the
gc interfaces, so they do not produce wonderful code, but this will change
in time. :)

All my objects are currently type-tagged uints. The llvm scheme code in
my compiler explains this, I think:

I think that this should be fine. The GC interfaces are explicitly
designed to be able to support lisp/scheme-style type tagging like this,
though the current garbage collector is not fully implemented.

-Chris


_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://mail.cs.uiuc.edu/mailman/listinfo/llvmdev

-Chris

> Regarding llvm.gcroot, do I have to allocate stack-space for all
> pointers in a function? Right now I mostly use SSA-variables, and let
> llvm's register allocation allocate stack-space when needed.

Yes. This reflects the fact that the GC can move objects (to compact the
heap) at unpredictable times.

> Also, what happens when I run the mem2reg pass, does it handle
> llvm.gcroot's that are moved from stack to registers?

Not currently. In the future this will definitely be improved. In
particular, when the code generators are enhanced to provide more accurate
mapping information for GC pointers (e.g. which registers contain
pointers), we can do this. This is in the long term plans, but I suspect
that performance will be fine without it for many applications.

How are pointers to heap-data in registers handled right now? Or am I only
allowed to use pointers indirectly through the stack?

> I'm thinking along these lines: shouldn't one mark all SSA variables
> that contain pointers to objects as roots, and then, depending on
> whether the variables end up on the stack or in registers, let the
> compiler use llvm.gcroot?

llvm.gcroot is an abstraction. Your front-end is telling the code
generator and optimizer what locations can be modified unpredictably at
runtime by the GC system. Right now we do not have many clients of the
gc interfaces, so they do not produce wonderful code, but this will change
in time. :)

Ok, I guess in the future it would be nice if the llvm.gcroot abstraction
could be applied to SSA variables as well as the stack, and then
transformed by the compiler.

Is the code available for any of these clients?

, Tobias

> Not currently. In the future this will definitely be improved. In
> particular, when the code generators are enhanced to provide more accurate
> mapping information for GC pointers (e.g. which registers contain
> pointers), we can do this. This is in the long term plans, but I suspect
> that performance will be fine without it for many applications.

How are pointers to heap-data in registers handled right now? Or am I only
allowed to use pointers indirectly through the stack?

Unfortunately we don't have any front-ends using the GC currently (though
several are in the process of being developed). The idea is to have a
callback, provided by the front-end, which provides the pointer mapping
information.

In the Java-style world, this hook would look at the type-information
hanging off of the object vtable to know where the pointers are in the
object. In a scheme world, this hook could look at the elements directly
and "know" whether they are pointers (based on bit-patterns).
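The bit-pattern test Chris describes could look roughly like this in C, reusing the compiler's 2-bit tags (the slot layout and names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* Tags as in the compiler: 00 = fixnum, 01 = vector, 11 = function.
 * Only tagged values refer to the heap, so pointer-ness can be decided
 * from the bit pattern alone. */
static int is_heap_pointer(uintptr_t v) {
    int tag = (int)(v & 3);
    return tag == 1 || tag == 3;
}

/* Sketch of a trace hook: visit every slot of an object that holds a
 * pointer, so the GC can mark or relocate what it refers to. */
static size_t trace_slots(uintptr_t *slots, size_t n,
                          void (*visit)(uintptr_t *slot)) {
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_heap_pointer(slots[i])) {
            if (visit != NULL)
                visit(&slots[i]);
            found++;
        }
    }
    return found;
}
```

The collector passes `visit` so it can both mark and, in a copying GC, rewrite the slot with the object's new address.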

> llvm.gcroot is an abstraction. Your front-end is telling the code
> generator and optimizer what locations can be modified unpredictably at
> runtime by the GC system. Right now we do not have many clients of the
> gc interfaces, so they do not produce wonderful code, but this will change
> in time. :)

Ok, I guess in the future it would be nice if the llvm.gcroot abstraction
could be applied to SSA variables as well as the stack, and then
transformed by the compiler.

Yes, it will be.

Is the code available for any of these clients?

llvm-java is in public CVS, but it is not GC enabled. Unfortunately, the
GC implementation is not 100% complete, but it does give you all of the
information you need from the code generator (the ability to identify and
modify roots on the stack).

-Chris

Hi, I'm thinking out loud, please give me some feedback.

Regarding llvm.gcread and llvm.gcwrite, wouldn't it be nicer if they were
implemented as:

llvm.gcread(sbyte** object, uint offset)
llvm.gcwrite(sbyte* data, sbyte** object, uint offset)

Where you also have the offset into the object. This way the GC would
know where the header of the object being read from or written to is. It
would also mean less traffic to the pointers on the stack, and in some
languages you only want to allow pointers to the headers of objects,
which makes object traversal easier.
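Concretely, a collector might implement the proposed offset-based barriers along these lines (a sketch of the proposal, not existing LLVM code; names are illustrative):

```c
#include <stdint.h>

/* Proposed form: the barriers receive the object's header address plus
 * a byte offset, so the collector always sees the head of the object
 * (where mark bits or a forwarding pointer would live). */
static void *gc_read(void *object, unsigned int offset) {
    /* a real collector could consult the header at `object` here */
    return *(void **)((char *)object + offset);
}

static void gc_write(void *data, void *object, unsigned int offset) {
    /* a generational collector would record `object` in a remembered set */
    *(void **)((char *)object + offset) = data;
}
```
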

Btw, could you give some hints on where the GC functionality is
handled in the source tree (llvm.gcroot, gcread, gcwrite)?

, Tobias

Disregard this question, found it :).

Yes, this makes a tremendous amount of sense. Do you think you could
prepare some patches to make this happen? If you have any questions, feel
free to ask :)

-Chris

Ok, a patch[1] is attached. I didn't bother to coerce the offset, since I
assume that it is a uint, but maybe I should? Hopefully I've understood
the llvm source correctly.

, Tobias

[1] I'm kind of a newbie with cvs, but I did:
"cvs -z3 -d :pserver:anon@llvm-cvs.cs.uiuc.edu:/var/cvs/llvm diff llvm > gcpatch"

gcpatch (8.92 KB)

[1] I'm kind of a newbie with cvs, but I did:
"cvs -z3 -d :pserver:anon@llvm-cvs.cs.uiuc.edu:/var/cvs/llvm diff llvm > gcpatch"

That patch is well-formed. You did exactly the right thing. :)

Ok, a patch[1] is attached. I didn't bother to coerce the offset, since I
assume that it is a uint, but maybe I should? Hopefully I've understood
the llvm source correctly.

This will work, but it would be better to take two pointers instead of
a pointer and an offset. This allows the front-end to emit target-generic
code instead of target-specific code (where it would have to know the
offset of the field). To be more specific, llvm_gc_read should look like
this by default:

void *llvm_gc_read(void *ObjPtr, void **FieldPtr) {
  return *FieldPtr;
}
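For symmetry, the corresponding default write barrier would presumably just store through the field pointer (my sketch of the symmetric form, following the same convention):

```c
/* ObjPtr identifies the object head so a smarter collector could use
 * it (e.g. for a remembered set); the default just does the store. */
void llvm_gc_write(void *V, void *ObjPtr, void **FieldPtr) {
  (void)ObjPtr;   /* unused by the default implementation */
  *FieldPtr = V;
}
```
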

Also, please don't forget to update docs/GarbageCollection.html and
LangRef.html

Thanks for the help, sorry I didn't mention this before. :)

-Chris

Hm, but doesn't FieldPtr need to be calculated in a target-specific way in
those cases?

My thought was that it's better to have just one pointer into the heap and
reference fields by offset, rather than alloca'ing memory for each
FieldPtr needed (to make sure the recording of gcroots is sound). If we
use FieldPtr, we get the problem again of the GC traversing pointers into
the middle of objects, rather than to their headers.

Or have I missed something? :)

, Tobias