GSoC '11: Segmented Stacks

Hi All!

This is the third iteration of my GSoC proposal, which I'm mailing here
for feedback. I've already posted the proposal on Melange.

The proposal is in two parts. The first, which answers the questions in
the application template on Melange, is here [1]. I've pasted the
most relevant part here:

'''
Implement segmented stacks inside LLVM. Once this is implemented,
instead of having to allocate a worst-case (large) amount of contiguous
stack space to each thread, we'll be able to get each thread to allocate
stack space in small atomic blocks, as and when more space is required.
'''

The second part, which addresses the implementation issues and on which
I most need feedback, follows:

SEGMENTED STACKS FOR LLVM

This is the timeline I intend to follow during the GSoC period. I've
also posted this on Melange. Please give me some feedback on whether
this sounds realistic or not.

Instead of talking in terms of doing so much per week, I've decided to
divide the work into sub-tasks, to which I've assigned some
(estimated) quantity of time. These sub-tasks include getting the
relevant code merged - I don't want to end my GSoC with a large unmerged
patch-set.

Prologue changes for all functions:

This sub-task involves getting _every_ function to check for stack space
(as mentioned in the implementation) and then (conditionally) execute
a call to setup_block. Writing the code for setup_block and destroy_block
is also included in this sub-task. This should take three weeks. The
deliverable is a set of patches which allows an LLVM user to switch
segmented stacks on (if they are supported on the platform) and see a
reduction in the amount of stack space needed. Performance will be
worse, since there will be no link-time optimizations at this stage.
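
To make the prologue check concrete, here is a rough user-space
simulation in C. It is only a sketch under assumptions: the real check
would be emitted by the code generator and would compare the stack
pointer against a per-thread limit. Only the names setup_block and
destroy_block come from the text above; the struct layout, BLOCK_SIZE,
prologue(), epilogue() and recurse() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BLOCK_SIZE 1024u  /* assumed default block size, illustrative only */

/* Hypothetical block descriptor; the field names are not from the proposal. */
struct stack_block {
    struct stack_block *prev;  /* block that was active before this one */
    size_t size;               /* usable bytes in this block */
    size_t used;               /* bytes taken by live frames */
};

static struct stack_block *current_block = NULL;
static int blocks_allocated = 0;

/* Make a new block that can hold at least `needed` bytes the current one. */
static void setup_block(size_t needed) {
    struct stack_block *b = malloc(sizeof *b);
    if (b == NULL)
        abort();
    b->prev = current_block;
    b->size = needed > BLOCK_SIZE ? needed : BLOCK_SIZE;
    b->used = 0;
    current_block = b;
    blocks_allocated++;
}

/* Drop the current block and fall back to the previous one. */
static void destroy_block(void) {
    struct stack_block *b = current_block;
    assert(b != NULL);
    current_block = b->prev;
    free(b);
}

/* Stand-in for the emitted prologue: check for space, grow only if needed.
   Returns 1 when this frame caused a new block to be set up. */
static int prologue(size_t frame_size) {
    int grew = 0;
    if (current_block == NULL ||
        current_block->size - current_block->used < frame_size) {
        setup_block(frame_size);
        grew = 1;
    }
    current_block->used += frame_size;
    return grew;
}

/* Stand-in for the epilogue: pop the frame, and release the block if this
   frame was the one that created it. */
static void epilogue(size_t frame_size, int grew) {
    current_block->used -= frame_size;
    if (grew)
        destroy_block();
}

/* A recursive function that pretends each of its frames needs 256 bytes. */
static int recurse(int depth) {
    const size_t frame = 256;
    int grew = prologue(frame);
    int result = depth == 0 ? 0 : 1 + recurse(depth - 1);
    epilogue(frame, grew);
    return result;
}
```

With these numbers, four 256-byte frames fit in one block, so a call to
recurse(9) (ten frames) sets up three blocks and releases all of them
again on the way back up.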

Emit correct unwind info:

This sub-task involves (as the title suggests) changing the
DWARF-related code to ensure that the stack is unwound correctly. This
should take two to three weeks to complete. The deliverable is a set of
patches which, when applied on top of the previous patch-sets, show the
correct debug info in GDB (i.e. stack traces, local variables etc.) for
the generated executables.

Link Time Optimizations:

I will add a link-time optimization pass which will try to remove the
stack-check from as many function prologues as possible. This will
change the code introduced by the first sub-task to run conditionally,
based on values computed by this link step. This should take three weeks
to complete. The deliverable is a set of patches which don't change any
semantics, but give much better performance.
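
One kind of value such a link step could compute is the worst-case
stack need of each function over the whole call graph. The C sketch
below is only an assumption about how that might look, not the planned
design: struct func, worst_case() and the example graph are all
hypothetical. A function that can reach recursion has no bound and
would keep its check; for a function with a bound, a single check for
the whole amount at its entry could, in principle, cover everything it
calls.

```c
#include <stddef.h>

#define MAX_CALLEES 4
#define UNBOUNDED   ((size_t)-1)  /* marks "recursion reachable, no bound" */

/* Hypothetical call-graph node: frame size plus indices of direct callees. */
struct func {
    size_t frame_size;
    int    n_callees;
    int    callees[MAX_CALLEES];
};

enum visit { UNVISITED = 0, IN_PROGRESS, DONE };

/* Depth-first walk returning the most stack (in bytes) that can be live
   from entry to `f` until it returns, or UNBOUNDED if a cycle is reachable. */
static size_t worst_case(const struct func *g, int f,
                         enum visit *state, size_t *memo) {
    if (state[f] == DONE)
        return memo[f];
    if (state[f] == IN_PROGRESS)
        return UNBOUNDED;  /* back edge: f is on a recursive cycle */

    state[f] = IN_PROGRESS;
    size_t deepest = 0;
    for (int i = 0; i < g[f].n_callees; i++) {
        size_t d = worst_case(g, g[f].callees[i], state, memo);
        if (d == UNBOUNDED) {
            deepest = UNBOUNDED;
            break;
        }
        if (d > deepest)
            deepest = d;
    }
    memo[f] = (deepest == UNBOUNDED) ? UNBOUNDED : g[f].frame_size + deepest;
    state[f] = DONE;
    return memo[f];
}

/* Made-up graph: 0 = entry, 1 = a, 2 = b, 3 = rec (calls itself). */
static const struct func example[4] = {
    {  64, 2, {1, 2} },  /* entry calls a and b */
    { 128, 1, {2}    },  /* a calls b           */
    {  32, 0, {0}    },  /* b is a leaf         */
    {  16, 1, {3}    },  /* rec is recursive    */
};

/* Convenience wrapper that runs the analysis on the example graph. */
static size_t example_worst_case(int f) {
    enum visit state[4] = { UNVISITED };
    size_t memo[4] = { 0 };
    return worst_case(example, f, state, memo);
}
```

In this made-up graph the entry function needs at most 64 + 128 + 32 =
224 bytes, so one check for 224 bytes there would make the checks in a
and b redundant on that path, while rec has no bound and keeps its own
check.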

I've left some slack, around three to four weeks, to:

* Fix bugs and other issues that crop up from time to time.
* Investigate what needs to be done to implement this on platforms
other than X86 and X86-64, and perhaps even get started on that work.