[RFC] Stackmap and Patchpoint Intrinsic Proposal

This is a proposal for adding Stackmaps and Patchpoints to LLVM. The
first client of these features is the JavaScript compiler within the
open source WebKit project.

A Stackmap is a record of variable locations (registers and stack
offsets) at a particular instruction address.

A Patchpoint is an instruction address at which space is reserved for
patching a new instruction sequence at runtime.

These two features are close friends because it wouldn't be possible
for a runtime to patch LLVM generated code without a stack map telling
it where relevant variables live. However, the proposed stackmaps are
useful without patchpoints. In fact, the typical use-case for
stackmaps is implementing a simple trap to the runtime.

Stackmaps are required for runtime traps because without them the
optimized code would be dominated by instructions for marshaling
values into fixed locations. Even if most of the extra code can be
sunk into cold paths, experiments have shown that the impact on
compile time and code size is enormous.

Explicitly branching to runtime traps handles many situations. But
there are also cases in which the runtime needs to patch the runtime
call or the surrounding code. There are two kinds of patching we need
to support. The first case involves trampling LLVM-generated code to
safely invalidate it. This case needs to have zero impact on
optimization and codegen aside from keeping some values live. A second
case involves dynamically optimizing a short code sequence, for
example, to implement a dynamic inline cache. In this case, the
commonly executed code must be a call-free fast path. Some runtime
events may require rewriting the check guarding the fast path (e.g. to
change a type ID) or even rewriting the code that accesses a field to
change the offset. Completely invalidating the code at these events is
undesirable.

Two proposed intrinsics, llvm.stackmap and llvm.patchpoint, solve all
of the problems outlined above. The LangRef doc section is included at
the end of this proposal. The LLVM implementation of the intrinsics is
quite straightforward as you can see from the patches that I'll be
sending to llvm-commits.

Both intrinsics can generate a stack map. The difference is that an
llvm.stackmap is just a stack map: it doesn't generate any
code. llvm.patchpoint always generates a call, which the runtime may
later overwrite with a dynamically optimized inline cache.

llvm.stackmap is simple. It takes an integer ID for easy reference by
the runtime and a list of live values. It can optionally be given a
number of "shadow" bytes. The shadow bytes may be set to nonzero to
ensure that the runtime can safely patch the code following the
stackmap. This is useful for invalidating compiled code by trapping at
arbitrary points.
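For concreteness, here is a minimal IR sketch of a stackmap site. It uses the signature proposed in the LangRef text quoted later in this thread; the ID, shadow-byte count, and live values are invented for illustration, and the varargs call spelling follows the IR syntax of this era.

    declare void @llvm.webkit.stackmap(i32, i32, ...)

    define void @deopt_site(i32 %a, double %b) {
    entry:
      ; Record %a and %b under ID 5, with 8 shadow bytes that the runtime
      ; could later overwrite, e.g. with a jump to a deoptimization off-ramp.
      call void (i32, i32, ...)* @llvm.webkit.stackmap(i32 5, i32 8, i32 %a, double %b)
      ret void
    }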

The LLVM backend emits stackmaps in a special data section. This
design works for JITs that are confined to the LLVM C API. Each
intrinsic results in a stackmap record with the ID and offset from
function entry. Each record contains an entry for each live value with
its location encoded as a register or stack offset.

llvm.patchpoint is the fusion of a normal call and an
llvm.stackmap. It additionally takes a call target and specifies a
number of call arguments. The call target is an opaque value to LLVM,
so the runtime is not required to provide a symbol. The calling
convention can be specified via the normal "cconv" marker on the call
instruction. Instead of a "shadow" where code can be patched, it
reserves a block of encoding space in which the call to the target is
initially emitted, followed by nop padding.
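The patchpoint signature itself is not reproduced in this excerpt, so the sketch below assumes a plausible (id, reserved bytes, target, argument count, arguments...) form based on the description above; the intrinsic name, operand types, and constants are illustrative, not normative.

    declare void @llvm.webkit.patchpoint.void(i32, i32, i8*, i32, ...)

    define void @ic_site(i8* %object, i8* %runtimeStub) {
    entry:
      ; Reserve 20 bytes of encoding space at this site. A call to
      ; %runtimeStub is emitted initially, padded with nops; the runtime may
      ; later rewrite the region with an inline cache fast path.
      call void (i32, i32, i8*, i32, ...)* @llvm.webkit.patchpoint.void(i32 42, i32 20, i8* %runtimeStub, i32 1, i8* %object)
      ret void
    }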

Everything about the design and implementation of these intrinsics is
as generic as we can conceive at this time. I expect the next client
who wants to optimize their managed runtime to be able to do most, if
not all, of what they want with the existing design. In the meantime,
the open source WebKit project has already added optional support for
llvm.stackmap, and llvm.patchpoint support will follow shortly.

The initial documentation and patches name these intrinsics in a
"webkit" namespace. This clarifies their current purpose and conveys
that they haven't been standardized for other JITs yet. If someone on
the dev list says "yes we want to use these too, just the way
they are", then we can just drop the "webkit" name. More likely, we
will continue improving their functionality for WebKit until some
point in the future when another JIT customer tells us they would like
to use the intrinsics but really want to change the interface. At that
point, we can review this again with the goal of standardization and
backward compatibility, then promote the name. WebKit is maintained
against LLVM trunk, so it can be quickly adjusted to a new interface. The
same may not be true of other JITs.


This sort of functionality could probably be used to greatly improve the
usability of DTrace's USDT tracing.

These are the proposed changes to LangRef, written by Juergen and me.

WebKit Intrinsics
-----------------

This class of intrinsics is used by the WebKit JavaScript compiler to
obtain additional information about the live state of certain variables
and/or to enable the runtime system / JIT to patch the code afterwards.

The use of the following intrinsics always generates a stack map. The
purpose of a stack map is to record the location of function arguments
and live variables at the point of the intrinsic function in the
instruction stream. Furthermore, it records a unique callsite id and the
offset from the beginning of the enclosing function.

'``llvm.webkit.stackmap``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:
"""""""

::

      declare void (i32, i32, ...)* @llvm.webkit.stackmap(i32 <id>, i32 <numShadowBytes>, ...)

Overview:
"""""""""

The '``llvm.webkit.stackmap``' intrinsic records the location of live
variables in the stack map without generating any code.

Last I checked LLVM IR doesn't have "variables" in this sense (except
globals; it seems like a handful of other places in the LangRef have this
wording slip too). Shouldn't the wording be more like "the run time
location of the provided values"?

Arguments:
""""""""""

The first argument is a unique id and the second argument is the number
of shadow bytes following the intrinsic. The variable number of
arguments after that are the live variables.

The purpose of the "id" isn't described.

Semantics:
""""""""""

The stackmap intrinsic generates no code in place, but its offset from
function entry is stored in the stack map. Furthermore, it guarantees a
shadow of instructions following its instruction offset during which
neither the end of the function nor another stackmap or patchpoint
intrinsic may occur.

It's meaningless to discuss the semantics when important terms are
undefined:
* "stack map" (and the format of a stack map, and where it is emitted/how
it can be accessed, etc.)
* "shadow": while it's fairly clear roughly what is meant by this, this is
Lang*Ref*, not "LangOverview" or "LangTour"

It may be that the inherently experimental nature of these intrinsics
does not lend itself to being documented adequately enough for inclusion
in LangRef at this point, in which case I would suggest demoting this
description to a new page for experimental intrinsics until they have
settled enough.

This allows the runtime to patch the code at this point in response to an
event triggered from outside the code.

Here and elsewhere, I suggest avoiding saying "the runtime". It is more
accurate to describe properties of the code, rather than the runtime (which
LLVM doesn't provide and which is not a concept in the LangRef). For
example this sentence could be "This permits the code to be safely patched".

-- Sean Silva

The initial documentation and patches name these intrinsics in a
"webkit" namespace. This clarifies their current purpose and conveys
that they haven't been standardized for other JITs yet. If someone on
the dev list says "yes we want to use these too, just the way
they are", then we can just drop the "webkit" name. More likely, we
will continue improving their functionality for WebKit until some
point in the future when another JIT customer tells us they would
like
to use the intrinsics but really want to change the interface. At
that
point, we can review this again with the goal of standardization and
backward compatibility, then promote the name. WebKit is maintained
against LLVM trunk, so it can be quickly adjusted to a new interface. The
same may not be true of other JITs.

I recommend, this being the case, to replace 'webkit' with 'experimental'. Having webkit in the name implies some dependence on webkit, and there is none. Plus, this functionality will be used by outside projects as soon as it lands in trunk, and I suspect that having webkit in the initial name will end up as a naming incongruity that no one will really think is worth the effort to change.



It's meaningless to discuss the semantics when important terms are
undefined:
* "stack map" (and the format of a stack map, and where it is
emitted/how it can be accessed, etc.)

I'd like to second this; we need to document the format of the stack map. We might want to do this in a separate document (like the document which elaborates on the description of the exception handling intrinsics). This document should be seeded with, at least, the description for x86 (and we can add descriptions for the other backends as they're validated).

This looks like a very-useful piece of functionality.

-Hal

100% agree with all your comments. I’m pulling the intrinsic docs into a separate page. I’ll create a phabricator diff and we can continue reviewing the docs on llvm-commits independent of this proposal.

-Andy


I recommend, this being the case, to replace ‘webkit’ with ‘experimental’. Having webkit in the name implies some dependence on webkit, and there is none. Plus, this functionality will be used by outside projects as soon as it lands in trunk, and I suspect that having webkit in the initial name will end up as a naming incongruity that no one will really think is worth the effort to change.

You’re correct that there is no dependence. I’m fine dropping the webkit name, but only if we can go straight to the final name (no need for “experimental”).

Again, the only reason to start with the webkit name is that it’s easy to change webkit later to use different intrinsics. I was waiting to see how much interest there is in using these intrinsics as-is for other clients. So far, there seems to be strong interest. If there isn’t much debate regarding the intrinsic format then I’ll drop the webkit name.



I’d like to second this; we need to document the format of the stack map. We might want to do this in a separate document (like the document which elaborates on the description of the exception handling intrinsics). This document should be seeded with, at least, the description for x86 (and we can add descriptions for the other backends as they’re validated).

This looks like a very-useful piece of functionality.

I’m moving the intrinsic docs to a separate page, like exception intrinsics, and documenting the stack map format there.

-Andy

I'm strongly against naming this "webkit" or anything else to do with any
other single consumer of LLVM which is not even an LLVM project. It is
really confusing and implies a whole boat of things that aren't true.

I don't understand why you are pushing for "the final name or the webkit
name". I think the recommendation of "experimental" is great. It clarifies
that the exact interface isn't fully baked and may change, and clients must
be prepared to update following LLVM trunk as opposed to expecting full
backwards compatibility.

If this feature were *only* applicable to WebKit, I'm not even sure it
would belong in the main open source repository. But it isn't, it's a
really interesting general purpose feature for doing dynamic patching of
call sites, and we should figure out a way to design and evolve it as such.

I think that Hal’s idea of “experimental” is the right approach here. The major thing we want is to avoid having to be backwards compatible with this intrinsic in subsequent llvm releases. “experimental” sends that message, where webkit does not (and is also bad for the reasons Hal mentions).

-Chris

Done. I’ll update the patches on llvm-commits.

For the record, I wasn’t aware of any precedent for “llvm.experimental”, but if it will help avoid backward compatibility issues then it’s a good thing.

-Andy

I don't think we have precedent, but I think it will be really good to
establish precedent. =]


What would be the criteria for eventually dropping ‘experimental’ from the intrinsic names?

Evan

Right!

-Chris

At the least, I’d like to get some experience on these. Having webkit actually ship something based on this seems like a minimal requirement to demonstrate that it will actually work (end to end) in practice. Beyond that, we’d want to be happy enough with it that we’d be willing to autoupgrade it if it ever evolves in future releases: i.e. we’d be promising backward compatibility with the intrinsic.

-Chris

Sounds like excellent criteria for me on the name. Thanks Chris! :)

-eric

I have a couple of comments on your proposal. None of these are major enough to prevent submission.

- As others have said, I'd prefer an experimental namespace rather than a webkit namespace. (minor)
- Unless I am misreading your proposal, your proposed StackMap intrinsic duplicates existing functionality already in llvm. In particular, much of the StackMap construction seems similar to the Safepoint mechanism used by the in-tree GC support. (See CodeGen/GCStrategy.cpp and CodeGen/GCMetadata.cpp). Have you examined these mechanisms to see if you can share implementations?
- To my knowledge, there is nothing that prevents an LLVM optimization pass from manufacturing new pointers which point inside an existing data structure. (e.g. an interior pointer to an array when blocking a loop) Does your StackMap mechanism need to be able to inspect/modify these manufactured temporaries? If so, I don't see how you could generate an intrinsic which would include this manufactured pointer in the live variable list. Is there something I'm missing here?
- Your patchpoint mechanism appears to be one very specialized use of a patchable location. Would you mind renaming it to something like patchablecall to reflect this specialization?

Yours,
Philip


These stackmaps have nothing to do with GC. Interior pointers are a problem unique to precise copying collectors.

In particular, the stackmaps in this proposal are likely to be used for capturing only a select subset of state and that subset may fail to include all possible GC roots. These stackmaps are meant to be used for reconstructing state-in-bytecode (where bytecode = whatever your baseline execution engine is, could be an AST) for performing a deoptimization, if LLVM was used for compiling code that had some type/value/behavior speculations.

- Your patchpoint mechanism appears to be one very specialized use of a patchable location. Would you mind renaming it to something like patchablecall to reflect this specialization?

The top use case will be heap access dispatch inline cache, which is not a call.

You can also use it to implement call inline caches, but that's not the only thing you can use it for.

-Filip


These stackmaps have nothing to do with GC. Interior pointers are a problem unique to precise copying collectors.

I would argue that while the use of the stack maps might be different, the mechanism is fairly similar. In general, if the expected semantics are the same, a shared implementation would be desirable. This is more a suggestion for future refactoring than anything else.

I agree that interior pointers are primarily a problem for relocating collectors. (Though I disagree with the characterization of it being *uniquely* a problem for such collectors.) Since I was unaware of what you're using your stackmap mechanism for, I wanted to ask. Sounds like this is not an intended use case for you.

In particular, the stackmaps in this proposal are likely to be used for capturing only a select subset of state and that subset may fail to include all possible GC roots. These stackmaps are meant to be used for reconstructing state-in-bytecode (where bytecode = whatever your baseline execution engine is, could be an AST) for performing a deoptimization, if LLVM was used for compiling code that had some type/value/behavior speculations.

Thanks for the clarification. This is definitely a useful mechanism. Thank you for contributing it back.

- Your patchpoint mechanism appears to be one very specialized use of a patchable location. Would you mind renaming it to something like patchablecall to reflect this specialization?

The top use case will be heap access dispatch inline cache, which is not a call.
You can also use it to implement call inline caches, but that's not the only thing you can use it for.

Er, possibly I'm misunderstanding you. To me, an inline call cache is a mechanism to optimize a dynamic call by adding a typecheck+directcall fastpath. (i.e. avoiding the dynamic dispatch logic in the common case) I'm assuming this is what you mean by the term "call inline cache", but I have never heard of a "heap access dispatch inline cache". I've done a google search and didn't find a definition. Could you point me to a reference or provide a brief explanation?

Philip


These stackmaps have nothing to do with GC. Interior pointers are a problem unique to precise copying collectors.

I would argue that while the use of the stack maps might be different, the mechanism is fairly similar.

It’s not at all similar. These stackmaps are only useful for deoptimization, since the only way to make use of the live state information is to patch the stackmap with a jump to a deoptimization off-ramp. You won’t use these for a GC.

In general, if the expected semantics are the same, a shared implementation would be desirable. This is more a suggestion for future refactoring than anything else.

I think that these stackmaps and GC stackmaps are fairly different beasts. While it’s possible to unify the two, this isn’t the intent here. In particular, you can use these stackmaps for deoptimization without having to unwind the stack.

I agree that interior pointers are primarily a problem for relocating collectors. (Though I disagree with the characterization of it being uniquely a problem for such collectors.) Since I was unaware of what you’re using your stackmap mechanism for, I wanted to ask. Sounds like this is not an intended use case for you.

In particular, the stackmaps in this proposal are likely to be used for capturing only a select subset of state and that subset may fail to include all possible GC roots. These stackmaps are meant to be used for reconstructing state-in-bytecode (where bytecode = whatever your baseline execution engine is, could be an AST) for performing a deoptimization, if LLVM was used for compiling code that had some type/value/behavior speculations.

Thanks for the clarification. This is definitely a useful mechanism. Thank you for contributing it back.

  • Your patchpoint mechanism appears to be one very specialized use of a patchable location. Would you mind renaming it to something like patchablecall to reflect this specialization?

The top use case will be heap access dispatch inline cache, which is not a call.
You can also use it to implement call inline caches, but that’s not the only thing you can use it for.

Er, possibly I’m misunderstanding you. To me, an inline call cache is a mechanism to optimize a dynamic call by adding a typecheck+directcall fastpath.

Inline caches don’t have to be calls. For example, in JavaScript, the expression “o.f” is fully dynamic but usually does not result in a call. The inline cache - and hence patchpoint - for such an expression will not have a call in the common case.

Similar things arise in other dynamic languages. You can have inline caches for arithmetic. Or for array accesses. Or for any other dynamic operation in your language.

(i.e. avoiding the dynamic dispatch logic in the common case) I’m assuming this is what you mean by the term “call inline cache”, but I have never heard of a “heap access dispatch inline cache”. I’ve done a google search and didn’t find a definition. Could you point me to a reference or provide a brief explanation?

Every JavaScript engine does it, and usually the term “inline cache” in the context of JS engines implies dispatching on the shape of the object in order to find the offset at which a field is located, rather than dispatching on the class of an object to determine what method to call.

-Filip

I think Philip R is asking a good question. To paraphrase: If we introduce a generically named feature, shouldn’t it be generically useful? Stack maps are used in other ways, and there are other kinds of patching. I agree and I think these are intended to be generically useful features, but not necessarily sufficient for every use.

The proposed stack maps are very different from LLVM’s gcroot because gcroot does not provide stack maps! llvm.gcroot effectively designates a stack location for each root for the duration of the current function, and forces the root to be spilled to the stack at all call sites (the client needs to disable StackColoring). This is really the opposite of a stack map and I’m not aware of any functionality that can be shared. It also requires a C++ plugin to process the roots. llvm.stackmap generates data in a section that MCJIT clients can parse.
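To make the contrast concrete, this is roughly what the llvm.gcroot style looks like (a sketch only; "shadow-stack" is just one in-tree strategy name). The root is pinned to an alloca for the whole function instead of being described by a per-callsite map:

    declare void @llvm.gcroot(i8**, i8*)

    define void @uses_gcroot(i8* %obj) gc "shadow-stack" {
    entry:
      %root = alloca i8*
      call void @llvm.gcroot(i8** %root, i8* null)
      store i8* %obj, i8** %root
      ; Calls made here can find %obj through the fixed %root slot; no
      ; per-callsite record of its location is produced.
      ret void
    }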

If someone wanted to use stack maps for GC, I don’t know why they wouldn’t leverage llvm.stackmap. Maybe Filip can see a problem with this that I can’t. The runtime can add GC roots to the stack map just like any other live value, and it should know how to interpret the records. The intrinsic doesn’t bake in any particular interpretation of the mapped values. That said, my proposal deliberately does not cover GC. I think that stack maps are the easy part of the problem. The hard problem is tracking interior pointers, or for that matter exterior/out-of-bounds or swizzled pointers. LLVM’s machine IR simply doesn’t have the necessary facilities for doing this. But if you don’t need a moving collector, then you don’t need to track derived pointers as long as the roots are kept live. In that case, llvm.stackmap might be a nice optimization over llvm.gcroot.

Now with regard to patching. I think llvm.patchpoint is generally useful for any type of patching I can imagine. It does look like a call site in IR, and it’s nice to be able to leverage calling conventions to inform the location of arguments. But the patchpoint does not have to be a call after patching, and you can specify zero arguments to avoid using a calling convention. In fact, we only currently emit a call out of convenience. We could splat nops in place and assume the runtime will immediately find and patch all occurrences before the code executes. In the future we may want to handle NULL call target, bypass call emission, and allow the reserved bytes to be less than that required to emit a call.

-Andy

I’m going to respond to Andrew Trick’s followup for this portion. Thank you for the clarification. I am familiar with the patching optimizations performed for property access, but had not been aware of the modified usage of the term “inline cache”. I was also unaware of the term “heap access dispatch inline cache”. I believe I now understand your intent.

Taking a step back in the conversation, my original question was about the naming of the patchpoint intrinsic. I am now convinced that you could use your patchpoint intrinsic for a number of different inline caching schemes (method dispatch, property access, etc.). Given that, my concern about naming is diminished, but not completely eliminated. I don’t really have a suggestion for a better name, but given that a “stackmap” intrinsic can be patched, the “patchpoint” intrinsic name doesn’t seem particularly descriptive. To put it another way, how are the stackmap and patchpoint intrinsics different? Can this difference be encoded in a descriptive name for one or the other?

As a secondary point, it would be good to update the proposed documentation with a brief description of the intended usage (i.e. inline caching). This might prevent a future developer from being confused on the same issues.

Yours,
Philip


If someone wanted to use stack maps for GC, I don’t know why they wouldn’t leverage llvm.stackmap. Maybe Filip can see a problem with this that I can’t.

You’re right, it could work.

If you were happy with spilling all of your GC roots, then you could put them into allocas and then pass the allocas’ addresses to a stackmap. This will give you an FP offset for the roots.
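A rough sketch of that approach, reusing the stackmap signature proposed earlier (the ID and types are invented for illustration):

    declare void @llvm.webkit.stackmap(i32, i32, ...)

    define void @scan_me(i8* %obj) {
    entry:
      %rootslot = alloca i8*
      store i8* %obj, i8** %rootslot
      ; Passing the alloca's address makes the stackmap record a
      ; frame-relative slot where a collector could find the root.
      call void (i32, i32, ...)* @llvm.webkit.stackmap(i32 11, i32 0, i8** %rootslot)
      ret void
    }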

If you were happy with an accurate GC that couldn’t move objects referenced from the stack then you could have each safepoint call use patchpoint, and then if you also implemented stack unwinding, you could use the patchpoints’ implicit stackmaps to figure out which registers (or stack slots) contained pointers.

These would be niche uses, I think. If you care about performance then you’re not going to use an accurate GC that requires spilling roots; you’ll go for some GC algorithm that can handle conservative stack roots. If you’re using accurate GC support for moving objects then it’s usually because you need to move all objects (after all you can move most objects without any GC roots or stackmaps by using Bartlett’s algorithm or similar) so the calls-as-patchpoints approach won’t work.

I could kind of see some real-time GC’s using the alloca+stackmap approach, but it’s a bit of a stretch.

So, I don’t see stackmaps as being particularly practical for accurate GC, but I do concede that you could implement some kind of accurate GC that uses stackmaps for some part of its stack scanning.