Dataflow-analysis-based optimisations

I'm working on an LLVM-based compiler for a very closure-centric
language. It's becoming apparent that it's going to suffer heavily from
garbage collector churn, as all the useful programming idioms the
language makes possible are going to involve memory allocations, either
to create objects or to allocate storage for upvalues.

However, it's possible to optimise away a lot of heap allocations, and
replace them with allocas instead, *if* we know that the lifetime of the
object is going to be shorter than the lifetime of the stack frame ---
in other words, the object will be used strictly inside the function and
nowhere else. This applies to both upvalues and objects.

Does LLVM have any mechanism for doing the kind of dataflow analysis
required to track object lifetime like this?

There are passes which mark function parameters as "nocapture", which
means that the function does not store the passed-in pointer for use
after that function returns. If pointers to a newly created object
are only ever passed through "nocapture" parameters, never stored in a
global, and not returned from the function, then that object is dead
when the function that created the object returns.

There may be other possibilities for proving an object dead, but this
should definitely be safe.
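The escape conditions above can be modeled as a simple predicate. Here is a sketch in plain C++ rather than against the real LLVM API, with the possible use kinds collapsed into an enum (all names here are made up for illustration):

```cpp
#include <vector>

// Toy model of the escape check described above: an allocation may be
// lowered to an alloca only if every use of the pointer is benign.
enum class Use {
    PassedNocapture,   // passed through a "nocapture" parameter
    PassedCapturing,   // passed through a parameter that may capture it
    StoredToGlobal,    // stored into a global (or heap object)
    Returned,          // returned from the allocating function
};

// Returns true if the object is provably dead when the allocating
// function returns, so the heap allocation can become an alloca.
bool canStackAllocate(const std::vector<Use>& uses) {
    for (Use u : uses) {
        if (u != Use::PassedNocapture)
            return false;  // any capture, store, or return escapes
    }
    return true;
}
```

In a real pass the "uses" would come from walking the use list of the allocation call and checking the nocapture attribute on each call site's parameters.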

That sounds ideal; thanks!

I assume that I would need to implement my own memory-allocation
instruction and then produce a custom pass, running after the
FunctionAttrsPass, which would then lower the instructions to either a
call to malloc or an alloca instruction. I see vague references to an
LLVM malloc instruction; did this disappear after 2.6?

Yes, the LLVM malloc instruction is no more. There might still be an
IRBuilder::CreateMalloc, but that now emits a regular call to library
"malloc".

Now the case where function A calls function B, and function B needs
to return a pointer to a freshly-allocated object, is fairly common.
One way to attack this is to have each function create its own
"region" or "memory pool" and destroy it when it returns. Then,
functions that need to return pointers to newly allocated objects get
a "memory pool" parameter added, and use that memory pool instance to
allocate the memory. A global "memory pool" instance maps to the GC
heap. Anyway, if function A observes the restrictions I mentioned
earlier with respect to that object, it passes its own memory pool
instance to function B. If function A returns the pointer itself but
observes all the other restrictions, then it will declare a memory
pool parameter and pass whatever its caller passed in down to B. This
scheme will allow any call depth to be handled without a GC heap
allocation, as long as no pointer to the object is ever stored in a
global or heap object or passed through a capturing function
parameter.
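At the source level, the pool-parameter scheme might look like the following sketch. `MemoryPool`, `makeObj`, and the A/B pair are all hypothetical names, and the fall-back to the GC heap on pool exhaustion is elided:

```cpp
#include <cstddef>
#include <new>

// Hypothetical per-function region: bump-allocates from a buffer and
// frees everything at once when the owning function returns.
class MemoryPool {
    alignas(std::max_align_t) unsigned char buf_[4096];
    std::size_t used_ = 0;
public:
    void* allocate(std::size_t n) {
        n = (n + alignof(std::max_align_t) - 1)
              & ~(alignof(std::max_align_t) - 1);
        if (used_ + n > sizeof(buf_))
            return nullptr;            // real code: fall back to the GC heap
        void* p = buf_ + used_;
        used_ += n;
        return p;
    }
};

struct Obj { int value; };

// "Function B": returns a fresh object, allocated from the caller's pool.
Obj* makeObj(MemoryPool& pool, int v) {
    void* mem = pool.allocate(sizeof(Obj));
    return new (mem) Obj{v};           // placement-new into the region
}

// "Function A": the object doesn't escape A, so A passes its own pool;
// the whole region dies with A's stack frame.
int useObj() {
    MemoryPool pool;                   // destroyed on return
    Obj* o = makeObj(pool, 42);
    return o->value;
}
```

If A instead returned the pointer, it would take a `MemoryPool&` parameter of its own and forward that to `makeObj`, which is how the scheme composes through any call depth.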

That leaves the case where a pointer to one object is stored in
another object. If the containing object is allocated in a pool that
is not higher in the call stack than the referenced object, it is safe
to keep the containing object out of the GC heap. There might be cases where that
determination can be made at compile time, and it may or may not be
worth it to try to determine it at run time.
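One way the run-time determination could work is to tag each pool with its call-stack depth and compare tags at the store. This is only a sketch; the `depth` field is a hypothetical bookkeeping value, not anything LLVM provides:

```cpp
// Sketch of a run-time check for storing a pointer to `referent` inside
// `container`. Depth 0 is the global/GC "pool"; depth grows as you go
// down the call stack, so a deeper pool is shorter-lived.
struct Pool { int depth; };

// Safe iff the container's pool is not higher (longer-lived) in the call
// stack than the referent's pool: the container then dies no later than
// the object it points to, so the stored pointer can never dangle.
bool storeIsSafe(const Pool& container, const Pool& referent) {
    return container.depth >= referent.depth;
}
```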

Since the allocator can either allocate from a memory pool or the GC
heap, the allocation needs to be done through a virtual function. A
"partial specialization" pass exists that might help devirtualize
this, but it needs more development and testing. You could always
codegen both specialized versions of each function yourself and let
dead code elimination clean it up for you, but you might not want the
compile/optimization overhead of that.
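The virtual-allocator arrangement might be sketched as follows. All names are hypothetical, and the GC heap is stubbed out with plain `malloc`:

```cpp
#include <cstddef>
#include <cstdlib>

// The allocation site only sees this interface; which concrete allocator
// it receives decides pool vs. GC heap.
struct Allocator {
    virtual void* allocate(std::size_t n) = 0;
    virtual ~Allocator() = default;
};

// Stub for the GC heap (real code would hand this off to the collector).
struct GCAllocator : Allocator {
    void* allocate(std::size_t n) override { return std::malloc(n); }
};

// Per-function bump pool, as described earlier.
struct PoolAllocator : Allocator {
    alignas(std::max_align_t) unsigned char buf[1024];
    std::size_t used = 0;
    void* allocate(std::size_t n) override {
        n = (n + alignof(std::max_align_t) - 1)
              & ~(alignof(std::max_align_t) - 1);
        void* p = buf + used;          // sketch: no exhaustion check
        used += n;
        return p;
    }
};

// Compiled against the interface; cloning this per concrete allocator
// type is what would let the virtual call be devirtualized.
int* makeInt(Allocator& a, int v) {
    int* p = static_cast<int*>(a.allocate(sizeof(int)));
    *p = v;
    return p;
}
```

Specializing `makeInt` for `PoolAllocator` and `GCAllocator` by hand and letting dead code elimination prune the unused clone is the manual version of what the partial-specialization pass would automate.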

Is this project open-source?

[...]

Now the case where function A calls function B, and function B needs
to return a pointer to a freshly-allocated object, is fairly common.
One way to attack this is to have each function create its own
"region" or "memory pool" and destroy it when it returns.

[...]

This
scheme will allow any call depth to be handled without a GC heap
allocation, as long as no pointer to the object is ever stored in a
global or heap object or passed through a capturing function
parameter.

If I've understood you correctly, this still requires a run-time memory
allocation from the appropriate memory pool --- so I still get the
overhead of running a heap, although I do still get some optimisation
due to being able to explicitly free the pool rather than relying on the
garbage collector to do it.

I think all I need is the simple case where I lower calls to malloc to
alloca instructions if the result is only ever passed to nocapture
functions and is not returned. The function-that-returns-a-pointer case
(which my language does a lot) may be trivially manageable by relying on
said functions being inlined; I've yet to get LLVM to inline anything,
so I don't know how well this will work in practice.
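At the source level, the malloc-to-alloca lowering amounts to the following before/after pair. The `Upvalue` struct and the function names are made up for illustration:

```cpp
#include <cstdlib>

struct Upvalue { int slot; };  // hypothetical closure storage

// A "nocapture"-style consumer: reads through the pointer, never stashes it.
int read(const Upvalue* u) { return u->slot; }

// Before: the front end naively heap-allocates the upvalue.
int heapVersion() {
    Upvalue* u = static_cast<Upvalue*>(std::malloc(sizeof(Upvalue)));
    u->slot = 5;
    int r = read(u);   // only ever passed to a nocapture function
    std::free(u);
    return r;
}

// After: the pointer is never stored, captured, or returned, so the
// object can live in the frame instead -- the effect of the alloca lowering.
int stackVersion() {
    Upvalue u;         // the alloca
    u.slot = 5;
    return read(&u);
}
```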

[...]

Is this project open-source?

It will be once I get it into a fit state to release...

[...]

Now the case where function A calls function B, and function B needs
to return a pointer to a freshly-allocated object, is fairly common.
One way to attack this is to have each function create its own
"region" or "memory pool" and destroy it when it returns.

[...]

This
scheme will allow any call depth to be handled without a GC heap
allocation, as long as no pointer to the object is ever stored in a
global or heap object or passed through a capturing function
parameter.

If I've understood you correctly, this still requires a run-time memory
allocation from the appropriate memory pool --- so I still get the
overhead of running a heap, although I do still get some optimisation
due to being able to explicitly free the pool rather than relying on the
garbage collector to do it.

Each function's allocator can start by using (preallocated) stack
space, then switch to allocating from the heap when stack space is used
up. Each heap allocation can be the same size, making fragmentation
issues go away... your function allocator simply moves a pointer
through its allocated space and doesn't support any form of delete.
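A sketch of such an allocator: preallocated stack space, fixed-size heap blocks once that runs out, pointer-bump only, no per-object delete. The sizes and names are arbitrary, and allocations larger than one block are not handled:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

class FrameAllocator {
    static constexpr std::size_t kBlockSize = 4096;       // every heap block the same size
    alignas(std::max_align_t) unsigned char stack_[512];  // preallocated stack space
    unsigned char* cur_;
    unsigned char* end_;
    std::vector<void*> blocks_;                           // heap blocks, freed together
public:
    FrameAllocator() : cur_(stack_), end_(stack_ + sizeof(stack_)) {}
    ~FrameAllocator() {                  // everything dies with the frame
        for (void* b : blocks_) std::free(b);
    }
    void* allocate(std::size_t n) {
        n = (n + alignof(std::max_align_t) - 1)
              & ~(alignof(std::max_align_t) - 1);
        if (cur_ + n > end_) {           // stack space used up: grab a heap block
            void* b = std::malloc(kBlockSize);
            blocks_.push_back(b);
            cur_ = static_cast<unsigned char*>(b);
            end_ = cur_ + kBlockSize;
        }
        void* p = cur_;                  // just bump a pointer; no delete
        cur_ += n;
        return p;
    }
};
```

Because all heap blocks are the same size, exhausted blocks can go back on a free list instead of to `free`, which removes fragmentation concerns entirely.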

I've used this scheme manually in several C++ projects (usually
without a fallback garbage collector, although I occasionally reach
for boost::shared_ptr), and found it greatly reduced development
effort while improving code quality and performance. In my own front
end, I'm planning to automate the process much like I've outlined
above, and in the early versions, have it flag an error when it can't
prove the usage to be safe and would need to fall back to a gc
allocation.

I think all I need is the simple case where I lower calls to malloc to
alloca instructions if the result is only ever passed to nocapture
functions and is not returned. The function-that-returns-a-pointer case
(which my language does a lot) may be trivially manageable by relying on
said functions being inlined; I've yet to get LLVM to inline anything,
so I don't know how well this will work in practice.

As long as the function-that-returns-a-pointer is fairly small, it
should almost always be inlined. You can slap an "alwaysinline"
attribute on such functions if you like, and that should guarantee it
(outside of a few cases like functions that use setjmp, indirect
branches, or recursion).

Looking through the code, I see that functions with dynamic alloca's
don't get inlined into callers that don't have dynamic alloca's,
unless you use the alwaysinline attribute.

One problem with using inlining this way... if you ever try to expose
a dynamic library with functions like this, clients of the dynamic
library won't even try to inline across module boundaries. This means
that your library functions will alloca the space they need, and then
return pointers into their own stack frames.