LLVM as a back end for HHVM

Hi All,

Our team at Hip-Hop Virtual Machine (http://hhvm.com) has been experimenting with using LLVM as a code generator for x86-64. We have been running it successfully for quite some time as a secondary back end. We had to modify our version of LLVM, and our modifications were based on the 3.5 release. At this point we feel our requirements have become stable enough to start upstreaming our diffs.

A high-level overview of the LLVM changes can be found at:

https://github.com/facebook/hhvm/tree/master/hphp/tools/llvm

The set of patches will be loosely based on the above, as some of our interfaces have changed since we merged with trunk.

All feedback is welcome. Please let me know if you are interested and I’ll CC you explicitly on the reviews.

Thanks,
Maksim

Hi,

Hi All,

Our team at Hip-Hop Virtual Machine (http://hhvm.com) has been experimenting with using LLVM as a code generator for x86-64. We have been running it successfully for quite some time as a secondary back end. We had to modify our version of LLVM, and our modifications were based on the 3.5 release. At this point we feel our requirements have become stable enough to start upstreaming our diffs.

Great to read that you will upstream stuff!

A high-level overview of the LLVM changes can be found at:

https://github.com/facebook/hhvm/tree/master/hphp/tools/llvm

The set of patches will be loosely based on the above, as some of our interfaces have changed since we merged with trunk.

All feedback is welcome. Please let me know if you are interested and I’ll CC you explicitly on the reviews.

The patch is huge; I expect many of the small patches won’t be too controversial, but it would be nice to have some RFC-like document to discuss the high-level design details.

And I’ll be happy to be CC’ed on the reviews.

Thanks,

Hi,

The patch is huge; I expect many of the small patches won’t be too controversial, but it would be nice to have some RFC-like document to discuss the high-level design details.

That makes sense. I would think features like “location records” to be useful outside of our project, and agree that it’ll require an RFC.

And I’ll be happy to be CC’ed on the reviews.

Sounds good!

Thanks,
Maksim

Hi Maksim,

This looks really great, and I'm interested in helping review your
changes as they become ready.

Specifically on "Location records" --

Is it legal for the optimizer to drop the `!locrec` metadata that you
attach to instructions? The general convention in LLVM is that
dropping metadata should not affect correctness, and if the location
record information is not best-effort or optional then metadata is
perhaps not the best way to represent it.

We're currently developing a scheme called "operand bundles" (more
details at [1], patches at [2]) that can be used to tag calls and
invokes with arbitrary values in a way that they won't be dropped
by the optimizer. The exact semantics of operand bundles are still
under discussion, and I'd like to make the mechanism broadly useful,
including for things like location records. So it'd be great if you
could take a look to see if location records can be implemented using
operand bundles. I was thinking of something like

  musttail call void @foo(i64 %val) [ "locrec"(i32 42) ]

where the "locrec"(i32 42) gets lowered to some suitable form in
SelectionDAG.

OTOH, if location records are optional and the optimizer dropping
location records is a quality of implementation issue, not a
correctness issue then what I said above is probably irrelevant.

I'm also curious about HHVM's notion of side exits -- how do you track
the abstract state to which you have to exit? Our primary use-case
for operand bundles is to track the abstract state of a thread (the
"interpreter state") that we need for side exits and asynchronous code
invalidation.

[1]: http://lists.llvm.org/pipermail/llvm-dev/2015-August/089070.html
[2]: http://reviews.llvm.org/D12455 http://reviews.llvm.org/D12456
     http://reviews.llvm.org/D12457

-- Sanjoy

Great to see these contributions, Maksim. I’d be very happy to help with the reviews, and I can probably recruit a few more if you need more eyes on it.

Thanks…
-Dave

Glad to hear you're able to start upstreaming changes.

I'd be happy to act as a reviewer for the location records and smashable changes. This sounds very close to what we have/need and coming up with a common representation based on your work would be great.

Philip

Maksim,

I would be very happy to assist in any way possible. Please include me on any code review or RFCs.

Chad

Specifically on "Location records" --

Is it legal for the optimizer to drop the `!locrec` metadata that you
attach to instructions? The general convention in LLVM is that
dropping metadata should not affect correctness, and if the location
record information is not best-effort or optional then metadata is
perhaps not the best way to represent it.

Unfortunately not - all of our uses of locrecs are required for correctness.

We're currently developing a scheme called "operand bundles" (more
details at [1], patches at [2]) that can be used to tag calls and
invokes with arbitrary values in a way that they won't be dropped
by the optimizer. The exact semantics of operand bundles are still
under discussion, and I'd like to make the mechanism broadly useful,
including for things like location records. So it'd be great if you
could take a look to see if location records can be implemented using
operand bundles. I was thinking of something like

   musttail call void @foo(i64 %val) [ "locrec"(i32 42) ]

where the "locrec"(i32 42) gets lowered to some suitable form in
SelectionDAG.

That sounds like it should work. One of the ideas behind locrecs was that they'd work with any instruction, not just call. We currently only use locrecs on call/invoke, and I can't think of anything we haven't yet implemented that would benefit from locrecs on other instructions (that may change in the future, of course).

I'm also curious about HHVM's notion of side exits -- how do you track
the abstract state to which you have to exit? Our primary use-case
for operand bundles is to track the abstract state of a thread (the
"interpreter state") that we need for side exits and asynchronous code
invalidation.

All VM state syncing for side exits is explicit in the IR we lower to LLVM (as a series of stores on an unlikely path), so we don't need anything special from LLVM here. We use locrecs to update our jump smashing stubs, so they know the address of the jump that entered the stub and should be smashed.

-Brett
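
For readers unfamiliar with jump smashing, here is a minimal sketch of the idea in Python. It is not HHVM's code. It models machine code as a plain byte buffer and assumes the smashable jump is an x86-64 `jmp rel32` (opcode 0xE9, five bytes long). The `locrecs` table, the id 42, and all addresses are hypothetical; the only point is that a location record tells the runtime where the jump ended up, so the stub can rewrite its displacement.

```python
import struct

JMP_REL32 = 0xE9  # x86-64 opcode for `jmp rel32`
JMP_LEN = 5       # one opcode byte plus a 4-byte signed displacement

def jump_target(code, base, jmp_addr):
    """Decode the absolute target of the `jmp rel32` located at jmp_addr."""
    off = jmp_addr - base
    assert code[off] == JMP_REL32, "not a jmp rel32"
    (disp,) = struct.unpack_from("<i", code, off + 1)
    # The displacement is relative to the end of the jump instruction.
    return jmp_addr + JMP_LEN + disp

def smash_jump(code, base, jmp_addr, new_target):
    """Rewrite the displacement so the jump at jmp_addr goes to new_target."""
    off = jmp_addr - base
    assert code[off] == JMP_REL32, "not a jmp rel32"
    disp = new_target - (jmp_addr + JMP_LEN)
    struct.pack_into("<i", code, off + 1, disp)  # raises if out of rel32 range

# Hypothetical layout: a 64-byte code buffer mapped at BASE, filled with NOPs.
BASE = 0x1000
code = bytearray(b"\x90" * 64)

# Hypothetical location-record table: id -> address of the emitted jump.
locrecs = {42: 0x1010}

# Initially the jump goes to a stub at 0x1030.
code[locrecs[42] - BASE] = JMP_REL32
smash_jump(code, BASE, locrecs[42], 0x1030)

# Later the stub learns the real destination and smashes the jump that
# entered it, using the address recorded for id 42.
smash_jump(code, BASE, locrecs[42], 0x1008)
```

A real JIT would also have to deal with atomicity of the write, alignment of the patched bytes, and page permissions, none of which this sketch attempts.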

Specifically on "Location records" --

Is it legal for the optimizer to drop the `!locrec` metadata that you
attach to instructions? The general convention in LLVM is that
dropping metadata should not affect correctness, and if the location
record information is not best-effort or optional then metadata is
perhaps not the best way to represent it.

Unfortunately not - all of our uses of locrecs are required for correctness.

This will need to be a function attribute or operand bundle when upstreamed then, but that's a pretty simple change to make.

We're currently developing a scheme called "operand bundles" (more
details at [1], patches at [2]) that can be used to tag calls and
invokes with arbitrary values in a way that they won't be dropped
by the optimizer. The exact semantics of operand bundles are still
under discussion, and I'd like to make the mechanism broadly useful,
including for things like location records. So it'd be great if you
could take a look to see if location records can be implemented using
operand bundles. I was thinking of something like

   musttail call void @foo(i64 %val) [ "locrec"(i32 42) ]

where the "locrec"(i32 42) gets lowered to some suitable form in
SelectionDAG.

That sounds like it should work. One of the ideas behind locrecs was that they'd work with any instruction, not just call. We currently only use locrecs on call/invoke, and I can't think of anything we haven't yet implemented that would benefit from locrecs on other instructions (that may change in the future, of course).

Interesting. What type of use cases are you imagining for locrecs on non-call instructions? Are you thinking of things like implicit null and div-by-zero checks? (The former is already supported in LLVM today.) Or something else entirely?

I'm also curious about HHVM's notion of side exits -- how do you track
the abstract state to which you have to exit? Our primary use-case
for operand bundles is to track the abstract state of a thread (the
"interpreter state") that we need for side exits and asynchronous code
invalidation.

All VM state syncing for side exits is explicit in the IR we lower to LLVM (as a series of stores on an unlikely path), so we don't need anything special from LLVM here. We use locrecs to update our jump smashing stubs, so they know the address of the jump that entered the stub and should be smashed.

It sounds like you're essentially pre-reserving a set of allocas for the spill locations, somehow registering those with your runtime once, then emitting stores down the unlikely path into those allocas. Is that roughly right?

How are you handling things like constants and duplicate values appearing in the VM state? Our experience has been that constants are fairly common and so are duplicate values (particularly when combined with GC state). It would seem like your frame sizes would be inflated if you had to pre-reserve space for each constant and each copy of a value. Have you found this to be true? If so, has it been problematic for you?

Philip

Specifically on "Location records" --

Is it legal for the optimizer to drop the `!locrec` metadata that you
attach to instructions? The general convention in LLVM is that
dropping metadata should not affect correctness, and if the location
record information is not best-effort or optional then metadata is
perhaps not the best way to represent it.

Unfortunately not - all of our uses of locrecs are required for
correctness.

This will need to be a function attribute or operand bundle when
upstreamed then, but that's a pretty simple change to make.

I think switching from metadata to operand bundles wouldn't be a problem,
assuming the "locrec" operand will have no effect on optimizations and codegen.

Interesting. What type of use cases are you imagining for locrecs on
non-call instructions? Are you thinking of things like implicit null
and div-by-zero checks? (The former is already supported in LLVM
today.) Or something else entirely?

One possible scenario is locating a constant address generation, e.g. for
an indirect call destination. It's a rather hypothetical example, as we
don't use it in this way at the moment. Substituting such an address isn't
quite straightforward, as the value could be scattered across several
instructions on some architectures, or be placed in a data section.

In general, we found locrecs useful for annotating IR and exploring the
resulting assembly. You could mark an instruction in the IR, and the
assembly dump will include annotations showing all machine instructions
generated from it. However, this particular feature is orthogonal to our
JIT requirements and could go in separately if there's enough interest.

Maksim

It sounds like you're essentially pre-reserving a set of allocas for the
spill locations, somehow registering those with your runtime once, then
emitting stores down the unlikely path into those allocas. Is that
roughly right?

It's even simpler than that: the stores to spill VM state go directly to the VM locations where those values belong. We only ever side-exit at bytecode boundaries, so every live value has a place to live on the eval stack or in a local variable (every PHP function has a fixed number of local variables, and space for those is allocated on the eval stack). This means that the side-exit path doesn't have to know if the next bytecode is going to be executed by the interpreter or by more jitted code, since they'll both read values from the same locations.

There are some downsides to this, of course, and we've been thinking about ways to have a faster ABI between snippets of jitted code, like passing the top n elements of the eval stack in registers. But we have no concrete plans to do that in the near future.
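
The side-exit scheme described above can be sketched with a toy model in Python. Everything here is an assumption made for illustration (the four-instruction bytecode, the `Frame` layout, the `jitted` stand-in), not HHVM's actual design. It only shows the property being described: the side-exit path stores live values into the ordinary VM locations at a bytecode boundary, so whatever runs next reads them from the same place.

```python
from dataclasses import dataclass, field

# Hypothetical bytecode: push local 0, push local 1, add, pop into local 2.
BYTECODE = [("push_local", 0), ("push_local", 1), ("add",), ("pop_local", 2)]

@dataclass
class Frame:
    local_vars: list                           # fixed number of local slots
    stack: list = field(default_factory=list)  # eval stack

def interpret(frame, pc):
    """Generic interpreter: resumes at bytecode index pc, reading VM locations."""
    while pc < len(BYTECODE):
        op = BYTECODE[pc]
        if op[0] == "push_local":
            frame.stack.append(frame.local_vars[op[1]])
        elif op[0] == "add":
            rhs = frame.stack.pop()
            lhs = frame.stack.pop()
            frame.stack.append(lhs + rhs)
        elif op[0] == "pop_local":
            frame.local_vars[op[1]] = frame.stack.pop()
        pc += 1

def jitted(frame):
    """Stand-in for jitted code specialized for int + int.

    Returns the bytecode index at which execution should continue.
    """
    a, b = frame.local_vars[0], frame.local_vars[1]  # held in "registers"
    if not (type(a) is int and type(b) is int):
        # Side exit at the boundary just before `add`: store the live values
        # into the eval-stack slots that the next bytecode expects.
        frame.stack.extend([a, b])
        return 2
    frame.local_vars[2] = a + b  # fast path, no eval-stack traffic
    return len(BYTECODE)

def run(frame):
    interpret(frame, jitted(frame))
    return frame.local_vars[2]

fast = run(Frame([1, 2, None]))      # stays on the specialized path
slow = run(Frame(["a", "b", None]))  # side-exits; the interpreter finishes
```

In this toy the interpreter is the only continuation, but nothing on the exit path depends on that: another jitted snippet entered at index 2 would find its inputs in the same slots.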

How are you handling things like constants and duplicate values
appearing in the VM state? Our experience has been that constants are
fairly common and so are duplicate values (particularly when combined
with GC state). It would seem like your frame sizes would be inflated
if you had to pre-reserve space for each constant and each copy of a
value. Have you found this to be true? If so, has it been problematic
for you?

If I'm understanding this question correctly it doesn't apply to our situation given my answer to the previous question, but let me know if that's not the case and I can try to expand :). We don't currently have a GC - everything is done using reference counting, though we do have a few people working on changing that.

-Brett

One possible scenario is locating a constant address generation, e.g. for
an indirect call destination. It's a rather hypothetical example, as we
don't use it in this way at the moment. Substituting such an address isn't
quite straightforward, as the value could be scattered across several
instructions on some architectures, or be placed in a data section.

Just to be clear, you're talking about representing a call to a function at a constant address, right? With no additional constraints? If so, introducing a function declaration and using the link/resolver functionality provided by MCJIT seems like a more natural fit. The only real downside to that is that you end up with a generic far-call and we don't have a way to indicate that a particular call is in fact near enough for a pc-relative offset.

In general, we found locrecs useful for annotating IR and exploring the
resulting assembly. You could mark an instruction in the IR, and the
assembly dump will include annotations showing all machine instructions
generated from it. However, this particular feature is orthogonal to our
JIT requirements and could go in separately if there's enough interest.

This actually sounds like both a useful debugging feature and a reasonable use of metadata. Given there's no *correctness* requirement that the metadata be preserved, it seems like a reasonable fit. If you wanted to propose this upstream, I could see this being really useful with some revision. My biggest concern would be whether this overlaps with something we already have in the debug info support.

Ah, gotcha. I'd assumed you were intermixing your language and execution stacks. Yeah, if you're maintaining them separately, transitioning becomes much easier.

Just as a sanity check, the compiled code does end up forwarding all the loads and keeping everything in registers within a single compilation, right? I'd assume you'd have to for decent performance, but your description almost makes it sound like it doesn't.

Philip

One possible scenario is locating a constant address generation, e.g. for
an indirect call destination. It's a rather hypothetical example, as we
don't use it in this way at the moment. Substituting such an address isn't
quite straightforward, as the value could be scattered across several
instructions on some architectures, or be placed in a data section.

Just to be clear, you're talking about representing a call to a function
at a constant address, right? With no additional constraints? If so,
introducing a function declaration and using the link/resolver
functionality provided by MCJIT seems like a more natural fit. The only
real downside to that is that you end up with a generic far-call and we
don't have a way to indicate a particular call is in fact near enough
for a pc-relative offset.

With calls to fixed addresses we do exactly that and it works fine.
I’m talking about a hypothetical case where the generation of the address
happens in the code. In general the idea is to be able to locate code
resulting from any instruction.

BTW, is there any documentation on the optimizer dropping metadata? What
are the exact rules? We are also using module flags metadata to alter
alignment for code generation and we certainly don’t want that to be
dropped. Perhaps there should be a way to distinguish metadata that
can/cannot be arbitrarily removed?
Oh wait, that’s the idea behind operand bundles :) They are mandatory
metadata, right?

In general, we found locrecs useful for annotating IR and exploring the
resulting assembly. You could mark an instruction in the IR, and the
assembly dump will include annotations showing all machine instructions
generated from it. However, this particular feature is orthogonal to our
JIT requirements and could go in separately if there's enough interest.

This actually sounds like both a useful debugging feature and a
reasonable use of metadata. Given there's no *correctness* requirement
that the metadata be preserved, it seems like a reasonable fit. If you
wanted to propose this upstream, I could see this being really useful
with some revision. My biggest concern would be whether this overlaps
with something we already have in the debug info support.

Cool. If this happens, location records will complement debug info.

Maksim

BTW, is there any documentation on the optimizer dropping metadata? What
are the exact rules? We are also using module flags metadata to alter
alignment for code generation and we certainly don’t want that to be
dropped.

I don't have a reference, but the general distinction is the following:
- Attributes affect correctness. They must be preserved at all costs. They can limit optimizations, or even affect the ABI.
- Metadata provides optional hints. It is *always* legal - though not desirable - for the optimizer to drop all metadata before doing anything else. We try to preserve metadata, but failing to do so is at worst a performance problem, not a correctness issue. (*)

(*) Well, some of the module level metadata used by ObjectiveC breaks this rule, but if it's broken, no one has screamed yet. If it ever does break, the response will likely be "don't use metadata".
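
The distinction above can be illustrated with a toy model in Python. This is not LLVM's API: the `Call` class, the `strip_all_metadata` pass, and the `locrec_ids` step are all hypothetical names made up for the sketch. It only shows why information that is required for correctness should not live in metadata: a pass that drops every metadata attachment is legal by convention, and anything encoded that way disappears with it, while an operand bundle is part of the call and survives.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class Call:
    callee: str
    metadata: dict = field(default_factory=dict)  # optional hints
    bundles: dict = field(default_factory=dict)   # operands of the call itself

def strip_all_metadata(insts):
    """Model of a pass that drops every metadata attachment."""
    return [replace(inst, metadata={}) for inst in insts]

def locrec_ids(insts):
    """Hypothetical backend step: collect locrec ids from either encoding."""
    ids = []
    for inst in insts:
        if "locrec" in inst.metadata:
            ids.append(inst.metadata["locrec"])
        if "locrec" in inst.bundles:
            ids.append(inst.bundles["locrec"])
    return ids

# The same annotation, encoded two ways.
as_metadata = [Call("foo", metadata={"locrec": 42})]
as_bundle = [Call("foo", bundles={"locrec": 42})]

lost = locrec_ids(strip_all_metadata(as_metadata))  # annotation is gone
kept = locrec_ids(strip_all_metadata(as_bundle))    # annotation survives
```

If the location record is only a debugging aid, losing it is acceptable, which is why metadata remains a reasonable fit for the annotation use case discussed earlier in the thread.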

Perhaps there should be a way to distinguish metadata that can/cannot be
arbitrarily removed?
Oh wait, that’s the idea behind operand bundles :) They are mandatory
metadata, right?

Essentially, yes. :)

Ah yeah, forgot to mention that we had a separate eval stack :). We've tossed around a few ideas about how to merge the C++ and PHP stacks but it would be a lot of work and I don't expect it to happen any time soon.

And yeah, within a single compilation unit we keep whatever we can in registers. There are a few cases where I've seen our optimizations miss something that LLVM cleaned up later but those are rare and tend to be in patterns that we haven't seen show up in hot code.

-Brett