Position-independent stacks

Hi,
I am toying with the idea of having LLVM generate code that has position-independent stacks. This would be a very useful property for implementing all sorts of micro-thread libraries (I am thinking of something similar to Python greenlets), because you’d be able to easily save a threadlet’s state on one OS thread and later restore it on another.
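To make the problem concrete, here is a minimal C sketch of why an ordinary stack is not position-independent. All names are hypothetical: `struct stack_image` stands in for a saved region of native stack. Once the address of a local has been stored inside that region, copying the region somewhere else leaves the stored pointer aimed at the old copy.

```c
#include <string.h>

/* Hypothetical stand-in for a saved region of native stack: a local
   variable plus a pointer to that same local, as might be left behind
   after the program took `&local`. */
struct stack_image {
    int local;
    int *local_addr;
};

/* Fill in the image and record an interior pointer to its own local. */
void capture(struct stack_image *img, int v) {
    img->local = v;
    img->local_addr = &img->local; /* points into the image itself */
}

/* Move the image to another buffer, as restoring a threadlet onto a
   different OS thread's stack would. A plain byte copy does not fix up
   interior pointers. */
void relocate(struct stack_image *dst, const struct stack_image *src) {
    memcpy(dst, src, sizeof *dst);
}
```

After `relocate(&new_img, &old_img)`, `new_img.local_addr` still equals `&old_img.local` rather than `&new_img.local`. That stale interior pointer is exactly what would have to be eliminated.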

On the surface, it seems entirely doable - basically, one needs to get rid of everything that points into the stack. It should be sufficient to:

  1. write a function pass that finds all local variables whose address is ever taken and hoists them into a heap-allocated secondary “stack frame”,
  2. either turn off frame base pointers, or make sure they are adjusted after the stack has been relocated,
  3. … can’t think of anything else, actually.
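Step 1 can be illustrated by doing the transformation by hand in C. This is a minimal sketch under a few simplifying assumptions. The function and struct names are hypothetical. There is a single address-taken local, and the code calls `malloc`/`free` once per call. A real pass would rewrite LLVM IR allocas rather than C source.

```c
#include <stdlib.h>

/* Some callee that receives a pointer to a local. */
void increment(int *p) { *p += 1; }

/* Before: `counter` lives on the native stack and its address escapes,
   so a pointer into the stack is passed around. */
int count_before(void) {
    int counter = 41;
    increment(&counter);
    return counter;
}

/* After (hypothetical output of the proposed pass): the address-taken
   local moves into a heap-allocated secondary frame. The only thing left
   on the native stack is a pointer to the heap, which stays valid if the
   native stack is copied elsewhere. */
struct count_frame {
    int counter;
};

int count_after(void) {
    struct count_frame *f = malloc(sizeof *f);
    if (f == NULL)
        abort();
    f->counter = 41;
    increment(&f->counter);
    int result = f->counter;
    free(f);
    return result;
}
```

Both versions return 42. A practical implementation would probably carve these frames out of a per-threadlet arena rather than calling `malloc` on every call. The invariant is the same either way: nothing on the native stack points back into the native stack.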

What do you guys think? Any reasons this approach wouldn’t fly?

Vadim

I've implemented something similar, but with the motivation of implementing
SFI sandboxing rather than making the stack relocatable.

The code is here: Issue 29743003, "Add passes for applying SFI sandboxing at
the LLVM IR level". In particular, see the ExpandAlloca pass.

That code implements sandboxing at the level of LLVM IR. It restricts all
memory accesses to a range of address space by truncating the memory
address and adding a base pointer. Here are some notes explaining further:
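As a rough illustration of the address rewriting described above, here is a C sketch that assumes a 4 GiB (32-bit) sandbox window inside a 64-bit address space. The function name and the 32-bit width are assumptions, not details taken from the linked code.

```c
#include <stdint.h>

/* Hypothetical SFI address rewrite: truncate an untrusted address to its
   low 32 bits, then add the sandbox base. Every result therefore lies in
   [base, base + 4 GiB), whatever the original address was. */
static inline uint64_t sandbox_address(uint64_t base, uint64_t addr) {
    return base + (uint32_t)addr;
}
```

For example, with `base = 0x100000000`, the wild address `0xDEADBEEF12345678` is rewritten to `0x112345678`.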

Cheers,
Mark

Thanks Mark! That’d be a useful starting point.

For the rest of the people here: to be a bit more specific, I am having doubts about the following:

  • Can later optimization passes or code generation still create machine code that takes and stores (perhaps in a register) the address of something on the stack, even if it is not semantically visible to the programmer? Can this somehow be detected?

  • Can frame pointers be disabled on all architectures? If not, is the frame pointer chain always walkable?

  • Can frame pointers be disabled on a per-function basis? (This matters in case only some of the program’s stacks need to be made relocatable.)

Vadim

If you are concerned about later optimisation IR passes, you should
probably ensure that you run something like ExpandAlloca as a last pass,
with no general-purpose IR passes afterwards.

Otherwise, for code generation, I can think of one case where you will get
a non-relocatable stack: by-value argument passing. For example:

struct foo { int a, b; };  /* any struct type */
void receive_ptr(struct foo *p);

void receive_struct(struct foo x) {
  receive_ptr(&x);
}

If you don't need to keep the normal calling conventions of the host
architecture, you can lower and remove all of the "byval" IR attributes.
The ExpandAlloca pass I referred to assumes this has already been done by
the ExpandByVal pass, which is one of PNaCl's IR simplification passes
[1].
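The effect of lowering `byval` can be approximated at the C level. The by-value parameter becomes a plain pointer, and the caller makes the copy that by-value semantics require. All names below are hypothetical. This is only an analogy for what an IR pass such as ExpandByVal does, not its actual output.

```c
/* Hypothetical struct and callee. */
struct foo { int a, b; };

int sum_foo(const struct foo *p) { return p->a + p->b; }

/* Before: `x` is passed by value. Depending on the ABI this may become a
   `byval` pointer argument in IR; either way, `&x` is a pointer to a
   stack slot. */
int receive_struct(struct foo x) { return sum_foo(&x); }

/* After (sketch): the parameter is an explicit pointer... */
int receive_struct_lowered(const struct foo *x) { return sum_foo(x); }

/* ...and the caller owns the copy, so the callee can never modify the
   caller's original. `copy` is an ordinary address-taken local. */
int caller_lowered(const struct foo *original) {
    struct foo copy = *original;
    return receive_struct_lowered(&copy);
}
```

Because `copy` in `caller_lowered` is an ordinary local whose address is taken, it is exactly the kind of allocation a pass like ExpandAlloca can move off the native stack.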

Otherwise, it might be possible to write a more sophisticated version of
ExpandByVal that keeps "byval" but prevents a pointer into the stack from
being passed around. For example, for by-value arguments, the above would
become:

void receive_struct(struct foo x) {
  // The alloca for x_copy should be expanded out later by ExpandAlloca,
  // so that it will point into the non-relocatable stack.
  struct foo x_copy = x;
  // &x may be stored onto the relocatable stack and may become invalid,
  // but that's OK because we don't use it further.
  receive_ptr(&x_copy);
}

However, I don't think you can make that work for returning structs by
value.

Varargs has a similar problem to passing structs by value.
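A similar C-level sketch applies to varargs. With `va_list`, the callee walks argument slots that the caller laid out, typically in memory on the native stack. The lowered form instead has the caller pack the extra arguments into an explicit buffer and pass a pointer to it. The function names and the all-`int` argument list are simplifying assumptions.

```c
#include <stdarg.h>

/* Before: a variadic function reading `count` int arguments. */
int sum_variadic(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* After (sketch): the variable arguments arrive as an explicit buffer.
   That buffer is an ordinary caller-side allocation, so it could be
   hoisted into a heap-allocated frame like any other address-taken
   variable. */
int sum_lowered(int count, const int *args) {
    int total = 0;
    for (int i = 0; i < count; i++)
        total += args[i];
    return total;
}
```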

Do you want to be able to relocate the stack only at function call
boundaries, or at any place that execution might be asynchronously
interrupted?

Cheers,
Mark

[1]
https://chromium.googlesource.com/native_client/pnacl-llvm/+/7f634ce6f622188cd551dbf283131b03d019583d/lib/Transforms/NaCl/ExpandByVal.cpp

> If you are concerned about later optimisation IR passes, you should
> probably ensure that you run something like ExpandAlloca as a last pass,
> with no general-purpose IR passes afterwards.
>
> Otherwise, for code generation, I can think of one case where you will get
> a non-relocatable stack: by-value argument passing. For example:
>
> <snip>
>
> However, I don't think you can make that work for returning structs by
> value.
>
> Varargs has a similar problem to passing structs by value.

I think I can hoist all those variables into the heap-allocated frame as
well. The same goes for returned structs. In this case it's actually very
fortunate that it's the front end that has to deal with the platform ABI;
otherwise I'd have to guess which return values the backend code generator
is going to rewrite as returns into a caller-allocated buffer. (Or are
there platforms which do that in the backend???)
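The struct-return case can be sketched the same way in C. The hidden result pointer becomes explicit, and the caller places the result buffer in its heap-allocated frame. This assumes the front end has already decided to return the struct through memory, which LLVM IR expresses with an `sret` argument. `malloc` merely stands in for a slot in that frame, and all names are hypothetical.

```c
#include <stdlib.h>

struct pair { long first, second; };

/* Before: returns a struct by value. Depending on the platform ABI, the
   front end may lower this to a hidden pointer to caller-allocated
   storage. */
struct pair make_pair(long a, long b) {
    struct pair p = { a, b };
    return p;
}

/* After (sketch): the result pointer is an explicit parameter. */
void make_pair_into(struct pair *out, long a, long b) {
    out->first = a;
    out->second = b;
}

/* The caller allocates the result slot off the native stack (here with
   malloc as a stand-in for the heap-allocated frame). */
long sum_pair_via_heap(long a, long b) {
    struct pair *result = malloc(sizeof *result);
    if (result == NULL)
        abort();
    make_pair_into(result, a, b);
    long sum = result->first + result->second;
    free(result);
    return sum;
}
```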

> Do you want to be able to relocate the stack only at function call
> boundaries, or at any place that execution might be asynchronously
> interrupted?

Synchronous only.

thanks!
Vadim