RFC: alloca -- specify address space for allocation

Currently, the alloca instruction always allocates in the generic address space (0).

This proposal is to extend the semantics of alloca instruction to allow allocation in any specified address space.

Proposed Syntax

= alloca [inalloca] [, ] [, align ] [, addrspace ] ; yields type addrspace(num)*:result

Motivation

Managed language clients typically use address space 1 to represent GC-pointers.

Some managed clients (CLR, in particular) allow construction of GC pointers to stack locations.

For example, MSIL instruction ldloca (ECMA 335, p 370) creates a GC pointer to a stack local.

Creating an addrespace(1)* pointer to a stack location currently involves two steps: the alloca, and an addrspacecast.

Managed code transformations (ex: RewriteStatepointsForGC) do not handle arbitrary address space casting.

So, it is desirable to obtain the addrespace(1)* pointer by construction.

Thanks,

Swaroop.

+1. It would be useful to fully avoid all of the special semantics that addrspace(0) implies.

Currently, the alloca instruction always allocates in the generic address space (0).

This proposal is to extend the semantics of alloca instruction to allow allocation in any specified address space.

Proposed Syntax

= alloca [inalloca] [, ] [, align ] [, addrspace ] ; yields type addrspace(num)*:result

Motivation

Managed language clients typically use address space 1 to represent GC-pointers.

Some managed clients (CLR, in particular) allow construction of GC pointers to stack locations.

For example, MSIL instruction ldloca (ECMA 335, p 370) creates a GC pointer to a stack local.

Creating an addrespace(1)* pointer to a stack location currently involves two steps: the alloca, and an addrspacecast.

Managed code transformations (ex: RewriteStatepointsForGC) do not handle arbitrary address space casting.

So, it is desirable to obtain the addrespace(1)* pointer by construction.

I don’t have a particular or specific objection here necessarily, but I have to admit I don’t understand the use case (yet). I’m hoping you can explain it to me more thoroughly.

What semantics do you expect for GC pointer to a stack location? That’s what I don’t really understand. Some questions that leap to mind, but may be the wrong questions (so feel free to point me in the right direction here)…

  • Can I promote such a location to an SSA register?
  • What does it mean to collect a stack location?
  • Can I merge stack allocations that are GC pointers? Can I re-use them? What semantics do I get?
  • Do GC pointers to stack locations have any different aliasing properties?
  • When the optimizer needs to create a new stack allocation, what address space should it be in?

Ultimately, to make this change, you’ll also need to change a decent amount of code in the optimizer that will blindly create stack allocations in address space zero.

Without a better understanding of both the motivation and the resulting consequences such as what I’ve outlined in my questions above, it is very hard for me to feel comfortable with this kind of change.

From: "Swaroop Sridhar via llvm-dev" <llvm-dev@lists.llvm.org>
To: "llvm-dev" <llvm-dev@lists.llvm.org>, "Philip Reames" <listmail@philipreames.com>, "Sanjoy Das"
<sanjoy@playingwithpointers.com>
Sent: Wednesday, August 26, 2015 8:46:08 PM
Subject: [llvm-dev] RFC: alloca -- specify address space for allocation

Currently, the alloca instruction always allocates in the generic
address space (0).

This proposal is to extend the semantics of alloca instruction to
allow allocation in any specified address space.

Proposed Syntax

<result> = alloca [inalloca] <type> [, <ty> <NumElements>] [, align
<alignment>] [, addrspace < num >] ; yields type
addrspace(num)*:result

Motivation

Managed language clients typically use address space 1 to represent
GC-pointers.

Some managed clients (CLR, in particular) allow construction of GC
pointers to stack locations.

For example, MSIL instruction ldloca (ECMA 335, p 370) creates a GC
pointer to a stack local.

Creating an addrespace(1)* pointer to a stack location currently
involves two steps: the alloca , and an addrspacecast .

Managed code transformations (ex: RewriteStatepointsForGC ) do not
handle arbitrary address space casting.

So, it is desirable to obtain the addrespace(1)* pointer by
construction.

Is the mental model here that a function using this feature actually has multiple stacks, in different address spaces, and this allows creating local stack allocations in those non-addrspace-0 stacks?

Would there be any defined interaction with @llvm.stacksave/@llvm.stackrestore?

Normally we assume that distinct allocas can't alias, but what about allocas with different address spaces? Can they alias?

-Hal

For my use case, some of the assumptions about addrspace(0) don’t make any sense, so having the option of not using it for stack allocations would avoid some special case problems. For example, it’s assumed that 0 is the address space of code, and stack, but for us these are unrelated concepts and have different pointer sizes. The assumption that 0 is not a valid pointer value is also incorrect, since we want stack pointers to be a 32-bit offset and 0 is valid. For this use case, all allocas should be created with the same address space, so it could be part of the datalayout. In the backend, we have 3 different memory types we would like to place stack objects, so being able to mark these with different address space IDs would also be helpful (although we wouldn’t care about the middle end deciding that some allocas should be in a different address space from a single one specified by the data layout)

Just sent an email about this assumption in another thread ("Globals in non-zero address spaces might be null” on llvm-commits):

“Isn’t something that should be target configurable? Any reason for not having TTI with a default behavior which would be the current one, but that targets could customize?”

But it also seems like we’re patching little hacks here and there to make the address spaces usable, because they were not part of LLVM at the beginning and we don’t have a real plan to make them first class (maybe it was considered and the conclusion was that the effort wasn’t worth it?).

Please see inline.

It is not clear to me if these GC pointers are actually living in a different address space when allocated on the stack (aka. you have two different stacks) or if the address space is just a way to mark that that pointer is a GC-pointer , but all the stack allocation actually lives in the same address space (presumably 0).

Marcello

+1

—Owen

FWIW, these use cases seem really straight forward for address spaces. For example, I don’t think any of my questions about the semantics apply. It seems straight forward to define a set of allowed address spaces for allocas in the data layout, and a default address space[1] for them that the optimizer could use if it creates or fuses allocas in ways that leave some ambiguity.

But this is very different from the idea of using an alloca with an address spaces for GC pointers to stack objects…

Anyways, if this kind of support of specific address space stack allocations to better utilize a diverse memory hierarchy is of interest, pursuing that makes a lot of sense to me provided we specify a good semantic model for the optimizer to follow when interacting with them…

-Chandler

[1]: I actually also really agree with Mehdi here that things like a “default” address space (and the fact that for your target “0” isn’t really a great default today) are maybe more of an issue with LLVM having unfortunate legacy. I’m inclined to think we should fix LLVM to let “0” be a reasonable default for allocas to simplify the work needed in the optimizer. But i’ve not spent a ton of time thinking through this set of issues.

We have patches locally that allow you to specify the address space for allocas in the DataLayout, because in one of our ABIs the stack moves to a non-zero address space. These would be relatively easy to extract, and if it’s an approach that would be useful then I could probably post them for review this week.

Having allocas in two address spaces seems plausible if they’re conceptually on different stacks, but using them for GC’d allocations seems odd, as you’d need to ensure that they were copied if live at the end of the stack frame, effectively turning the model into a collection of garbage collected activation records, rather than a stack. For this use, I suspect that you’d need something more like a gc-alloca instruction, which could be promoted to an SSA register if its address isn’t taken, to a stack allocation if escape analysis can prove that pointers to it don’t persist beyond the end of the function, or to a GC’d heap allocation in other cases.

David

Currently, the alloca instruction always allocates in the generic address space (0).

This proposal is to extend the semantics of alloca instruction to allow allocation in any specified address space.

Proposed Syntax

= alloca [inalloca] [, ] [, align ] [, addrspace ] ; yields type addrspace(num)*:result

Motivation

Managed language clients typically use address space 1 to represent GC-pointers.

For the proposed solution, what I think Swaroop is going for is simply having a mechanism to distinguish some allocas from others without otherwise effecting code generation. They are still stack based objects; they don’t escape unless unused at a safepoint; they’re not relocated. They may contain pointers to objects which do need to be relocated, so we need to have a mechanism to track them. In particular, it would be perfectly legal to promote to SSA since the interior pointers would become addrspace(1) SSA values and be handled correctly by the relocation logic. I can’t answer the merging question above. There are entirely sane semantics which could be assigned to stack allocation merging, but I’m not sure what expectations the CLR GCs have. Swaroop - If I’ve misstated anything above, please correct me. This is basically my reaction as well. This seems like a big hammer to a problem I’m not sure is actually a nail.

As a bit of background, the reason we originally went with addrspaces to mark GC pointers was that the managed and non-managed heaps for us are semantically non-overlapping. It’s not entirely clear to me that the notion of two non-overlapping stacks is the right mental model for approaching the stack allocated object portion.

Philip

For the record, at least two other implementation strategies to address the original problem have been discussed.

Option 1 (Rejected) - Explode the stack based object into individual fields individual fields before the safepoint and store them back into the alloca afterwards. Only the exploded SSA values would get listed in the stackmap. This could work today, but does not meet a functional requirement. A pointer to a stack based object can be escaped and a callee (or another thread) can write to the field location. As a result, the reform of the stack based object after the safepoint can introduce writes which didn’t exist in the original program.

Option 2 - Consider pointers within stack based objects (allocas in addrspace 0) as being live and add their addresses (not values) to the stackmap. This could be done by extending the liveness analysis already in RewriteStatepointsForGC to consider allocas containing pointers. There wouldn’t be any addrspace casts involved since the members of the stack based object are addrspace(1) pointers to start with. Loading them produces an SSA value which can be relocated as usual. I don’t remember hearing a clearly expressed reason why this doesn’t work. I believe there were some concerns about the density/size of the generated stack maps, but I haven’t seen any evaluation of this.Â

Inline.

Some vocabulary, to avoid confusion later:

- the "alloca" == the pointer to the stack slot that the alloca
   allocated (e.g. %rbp - 0x30)

- the "value" stored or contained in an alloca == at a given point
   during the execution, what is the N-byte value that is present in
   the location that is the result of the alloca instruction.

This is my understanding of the situation: the allocas in question are
not themselves GC pointers, but the value stored in those allocas are.
We could represent the alloca's as address space 0 pointers if we had
a way to do a precise liveness analysis on them late.

However, I think there is a basic problem with doing a precise late
liveness analysis to determine which allocas live over a safepoint
*contain* a gc reference. Since the pointee type of a pointer is not
a strong distinction (more so once we finally have a singular "ptr"
pointer type) we can have IR like:

entry:
  br i1 %cond, label %left, label %right

left:
  %x = alloca addrspace(1)*
  store addrspace(1)* %gcptr, addrspace(1)** %x
  %xc = bitcast addrspace(1)** %x to i8*
  br label %merge

right:
  %y = alloca i64
  store i64 42, i64* %y
  %yc = bitcast i64* %y to i8*
  br label %merge

merge:
  %m = phi i8* [ %xc ] [ %yc ]
  safepoint()
  br i1 %cond, label %left1, label %right1

right1:
  ret void

left1:
  %xd = bitcast i8* %m to addrspace(1)**
  %gcptr = load addrspace(1)*, addrspace(1)** %xd
  use(%gcptr)

Now the only thing live over the safepoint is the phi of allocas ==
`%m` but depending on control flow the runtime value of `%m` (which
will be an alloca) can either contain a legitimate gc pointer, or
i64 42.

Therefore, if we wish to precisely distinguish, at compile time,
between "normal" allocas that cannot contains gc pointers and
"interesting" allocas that *can contain* gc pointers, then we need to
make the pointer types of the "normal" and "interesting" allocas
fundamentally different (i.e. be of different address spaces).
However, I think re-using addresspace(1) for this is a mistake, we
should use a second address space for this (addressspace(2) let's say)
that distinguish pointers that point to locations that may *contain*
gc pointers, but are not themselves gc pointers.

-- Sanjoy

Philip: I think there are two separate issues:
1) Handling GC pointers stored in stack locations -- tracking their liveness, reporting them to the GC. etc.
2) Treating pointers to stack locations as GC-pointers.

I think the two options that you presented here are for solving (1).
I'm trying to solve issue (2) using alloca in addrspace(1).

Basically, we want to create a pointer to a stack location (the location itself may not have any gc-pointers) and give it a type such that it can be used interchangeably with GC-pointers.
Today, we can achieve this by using an alloca followed by an addrspacecast. It could be done in one instruction if the alloca can specify the addrspace of the pointer created.

The advantage of doing this in one step is, for example, that we can check that there are no (other) addrspace(0) -> addrspace(1) casts.
(introduced inadvertently by user code or compiler transformations).

Swaroop.

My response has the same problem, so you can basically ignore it.

-- Sanjoy

Philip: I think there are two separate issues:
1) Handling GC pointers stored in stack locations -- tracking their liveness, reporting them to the GC. etc.
2) Treating pointers to stack locations as GC-pointers.

I think the two options that you presented here are for solving (1).

Your right. I was focused on (1), mostly because I don't understand the challenge with (2).

I'm trying to solve issue (2) using alloca in addrspace(1).

Basically, we want to create a pointer to a stack location (the location itself may not have any gc-pointers) and give it a type such that it can be used interchangeably with GC-pointers.

Can you explain what you mean here? Is this an issue with being able to pass the pointer to a function which expects gc references without having to bitcast? Or is there a lowering challenge I'm not seeing?

Today, we can achieve this by using an alloca followed by an addrspacecast. It could be done in one instruction if the alloca can specify the addrspace of the pointer created.

The advantage of doing this in one step is, for example, that we can check that there are no (other) addrspace(0) -> addrspace(1) casts.
(introduced inadvertently by user code or compiler transformations).

Is there a reason that your custom verifier can't use metadata or pattern matching to accept these special addrspace casts of allocas? We have a similar custom verifier which uses a metadata based exception mechanism which has been pretty successful for us. We've had to occasionally fix a dropped metadata issue, but I think we've hit one or two of these ever.