Memory Alignment, Heap allocation.

> void *llvm_gc_read(void *ObjPtr, void **FieldPtr) {
>   return *FieldPtr;
> }

> Hm, but doesn't FieldPtr need to be calculated in a target-specific way
> in those cases?

For the field pointer, one could use the getelementptr instruction:

  %pairty = type { sbyte, sbyte, int* }

  %pairPtr = ...
  %fieldptr = getelementptr %pairty* %pairPtr, int 0, uint 2
  %FieldVal = llvm.gc_read(%pairPtr, %fieldptr)

Because the getelementptr instruction is used, we don't have an explicit
offset encoded into the bytecode file. The other advantage is that when
the gc_read/write intrinsics are lowered to loads/stores, the resultant
LLVM code will be much simpler.
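
For instance, with the trivial read barrier above, the lowered form is
conceptually just a load (a rough sketch, not actual LowerGC output):

  ; hypothetical lowered form: the gc_read call with the trivial
  ; *FieldPtr barrier becomes a plain load of the field
  %fieldptr = getelementptr %pairty* %pairPtr, int 0, uint 2
  %FieldVal = load int** %fieldptr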

> My thought was that it's better to just have one pointer to the heap, and
> reference fields by offset, rather than alloca'ing memory for each
> FieldPtr needed (to make sure the recording of gcroots is sound). If we
> use FieldPtr, we get the problem again of the GC traversing pointers into
> objects rather than to their headers.

You're right, but you shouldn't need to put the inner pointer into an
alloca. The idea is that the GC will (eventually) only stop a thread at a
GC safe point, so heap-directed pointers that don't live across a safe
point can live in an LLVM register. Currently, function calls are the only
safe points, though we will add an explicit gc.safepoint intrinsic when we
get threads (to put safe points in loops and other places that don't have
calls). Since field accesses like this are all very short-lived, they can
all just live in registers.
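
To sketch what I mean (hypothetical IR, just illustrating the intent): the
object pointer that lives across a call sits in a gcroot'd alloca, while
the derived field pointer never crosses a safe point and can stay in a
register:

  %objroot = alloca sbyte*
  call void %llvm.gcroot(sbyte** %objroot, sbyte* null)
  ...
  %obj = load sbyte** %objroot     ; reload the root before using it
  %field = getelementptr ...       ; inner pointer, only lives briefly
  ; ... use %field here, with no call in between ...
  call void %foo()                 ; safe point: only %objroot is reported to the GC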

-Chris

Ok, that makes sense :).
, Tobias

Ok, here's the new patch. (Please tell me if I shouldn't mail patches
directly on the mailing list.)

While I was editing LowerGC.cpp I made a little test (not part of this
patch, but the diff with LowerGC.cpp in cvs is attached). I've added a new
intrinsic called llvm.gcroot_value(sbyte*, sbyte*), which takes a pointer
directly instead and transforms it into an alloca. The idea is the
following,

this:

  %X = call sbyte* %llvm_gc_allocate(uint 16)
  call void %llvm.gcroot_value(sbyte* %X, sbyte* null)

is transformed into:

  %stackGcRoot = alloca sbyte*
  %X = call sbyte* %llvm_gc_allocate(uint 16)
  store sbyte* %X, sbyte** %stackGcRoot
  call void %llvm.gcroot(sbyte** %stackGcRoot, sbyte* null)

and any other use of %X, for example:

  store sbyte 10, sbyte* %X

is transformed into:

  %loadGcRoot = load sbyte** %stackGcRoot
  store sbyte 10, sbyte* %loadGcRoot

In this way, the frontends don't have to explicitly alloca the gcroots,
but instead just use the intrinsic, which hides how the gcroots are
implemented.

, Tobias

gcpatch2 (10.3 KB)

lgc.diff (2.82 KB)

> Ok, here's the new patch.

Looks great, thanks! I've applied the patches here:
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20040719/016339.html
through here:
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20040719/016344.html

> (Please tell me if I shouldn't mail patches directly on the mailing
> list.)

In general patches should be sent to the llvmbugs list, but this subject
has some design discussion, so it's ok for it to be on llvmdev :)

> While I was editing LowerGC.cpp I made a little test (not part of this
> patch, but the diff with LowerGC.cpp in cvs is attached). I've added a new
> intrinsic called llvm.gcroot_value(sbyte*, sbyte*), which takes a pointer
> directly instead and transforms it into an alloca. The idea is the

...

> In this way, the frontends don't have to explicitly alloca the gcroots,
> but instead just use the intrinsic, which hides how the gcroots are
> implemented.

Unfortunately this won't really work. It might be reasonable to include
something like this in the front-end that you're working on, but including
this at the LLVM level is a bad idea. The basic idea of the alloca model
is that it directly exposes to the optimizer the fact that GC pointers may
change at unpredictable times, which is something that memory can do, but
SSA registers do not do.

In particular, if the pointers are exposed as raw LLVM pointers, the
optimizer might start doing transformations to the pointers that break the
program. The reason for the current design is that it is guaranteed to
disable any optimizations that might break the program, while it allows
gc-aware optimizations to notice the GC roots and do something trickier on
a case by case basis. The important thing is that we default to being
safe, while allowing snazzy smart optimizations at the same time.
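
As a concrete (made-up) example of the failure mode with a moving
collector:

  %X = call sbyte* %llvm_gc_allocate(uint 16)
  call void %foo()            ; safe point: the collector may move %X's object
  store sbyte 10, sbyte* %X   ; %X may now be stale; with the alloca model the
                              ; pointer is reloaded from the gcroot slot, which
                              ; the collector is able to update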

Currently the code generators don't do anything particularly smart with
the GC pointers, so they end up living in actual memory. As the code
generators mature, however, I expect this to change. :)

I hope this makes sense. If not, please feel free to ask for
clarification. Thanks for the patches!!

-Chris

Ok. But if the gcroot_value transformation is done before any
optimizations, I get the same result as if I had used gcroot and alloca.
So maybe I can move it to an independent pass which I then use with my
frontend.

Hmm, it felt nicer to mark raw LLVM pointers as roots and then transform
them, since I then didn't have to see the overhead in my frontend :).

Btw, how hard would it be to implement the system described in this
paper[1], with support for collection at every instruction? You would need
to carry the gc root information all the way down to machine code and
register allocation. Just curious :).

, Tobias

[1] http://portal.acm.org/citation.cfm?id=301652&dl=ACM&coll=portal

> Unfortunately this won't really work. It might be reasonable to include
> something like this in the front-end that you're working on, but including
> this at the LLVM level is a bad idea. The basic idea of the alloca model
> is that it directly exposes to the optimizer the fact that GC pointers may
> change at unpredictable times, which is something that memory can do, but
> SSA registers do not do.

> Ok. But if the gcroot_value transformation is done before any
> optimizations, I get the same result as if I had used gcroot and alloca.
> So maybe I can move it to an independent pass which I then use with my
> frontend.

Keeping it as part of your front-end is fine, but including it with the
LLVM distro is not. The problem is that the pass subtly changes the
semantics of LLVM instructions, which is a bad thing for many reasons.
What the pass really does is create a dialect of LLVM, one which is easier
for you to produce in your front-end, and that is easily translatable to
standard LLVM form.

> Hmm, it felt nicer to mark raw LLVM pointers as roots and then transform
> them, since I then didn't have to see the overhead in my frontend :).

I'm not sure what overhead you mean. I understand that you basically have
a single-assignment source language, so you don't need to deal with
allocas in the usual case, but every other front-end we have produces
alloca instructions for EVERY user-defined variable, regardless of whether
it's a GC root. Obviously most of these get optimized away :)
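
For instance, a front-end would typically emit something along these lines
for two user variables, only one of which is a GC root (a hypothetical
sketch; mem2reg later promotes the non-root alloca into a register):

  %p = alloca sbyte*                   ; user variable that is a GC root
  call void %llvm.gcroot(sbyte** %p, sbyte* null)
  %i = alloca int                      ; plain user variable, not a root
  ...
  store int 0, int* %i                 ; this alloca is trivially promoted to a
                                       ; register by the mem2reg pass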

> Btw, how hard would it be to implement the system described in this
> paper[1], with support for collection at every instruction? You would need
> to carry the gc root information all the way down to machine code and
> register allocation. Just curious :).
> [1] http://portal.acm.org/citation.cfm?id=301652&dl=ACM&coll=portal

I don't understand what the benefit of this would be. The problem is that
once dynamically assigned, an SSA value cannot change its contents until
it is redefined by the same definition. Tracking GC information at every
instruction would not be able to address this fundamental problem.

LLVM *does* have a way to represent something that can change like we
want... I'm not sure why you're so afraid of it! :) :)

-Chris

> Hmm, it felt nicer to mark raw LLVM pointers as roots and then transform
> them, since I then didn't have to see the overhead in my frontend :).

> I'm not sure what overhead you mean.

I think I should have emphasized "felt", since it only would make me
produce less code in the frontend :). But I think my transformation
approach won't work, so I'll do it like the other front-ends do, but
thanks for the feedback on my thoughts!

> Btw, how hard would it be to implement the system described in this
> paper[1], with support for collection at every instruction? You would need
> to carry the gc root information all the way down to machine code and
> register allocation. Just curious :).
> [1] http://portal.acm.org/citation.cfm?id=301652&dl=ACM&coll=portal

> I don't understand what the benefit of this would be. The problem is that
> once dynamically assigned, an SSA value cannot change its contents until
> it is redefined by the same definition. Tracking GC information at every
> instruction would not be able to address this fundamental problem.

This is regarding every machine instruction, not every LLVM instruction,
with the benefit that threads wouldn't have to be advanced to gc-safe
points when the gc is invoked. But maybe it's not worth the work.

, Tobias

> > Hmm, it felt nicer to mark raw LLVM pointers as roots and then transform
> > them, since I then didn't have to see the overhead in my frontend :).
>
> I'm not sure what overhead you mean.

> I think I should have emphasized "felt", since it only would make me
> produce less code in the frontend :). But I think my transformation
> approach won't work, so I'll do it like the other front-ends do, but
> thanks for the feedback on my thoughts!

:)

> I don't understand what the benefit of this would be. The problem is that
> once dynamically assigned, an SSA value cannot change its contents until
> it is redefined by the same definition. Tracking GC information at every
> instruction would not be able to address this fundamental problem.

> This is regarding every machine instruction, not every LLVM instruction,
> with the benefit that threads wouldn't have to be advanced to gc-safe
> points when the gc is invoked. But maybe it's not worth the work.

That's a good point, but there is a big cost. In particular, it's
somewhat infeasible for a static compiler to generate maps for every
instruction: it's very complex (from an engineering standpoint) and
results in large maps. At some point, it might be interesting for someone
to experiment with this approach (and the results could be interesting!),
but it is not going to be me. :)

-Chris