llvm.experimental.gc.statepoint genarates wrong Stack Map (or does it?)

Hello, list

I am not quite sure if what I'm experiencing is a bug or intentional behavior.

In the code below the result of a function call is a deopt arg to llvm.experimental.gc.statepoint
(http://llvm.org/docs/Statepoints.html#llvm-experimental-gc-statepoint-intrinsic).
Therefore a Stack Map containing location of this variable is created upon code generation.

Here's the complete example:

     define i64 addrspace(1)* @func() {
         %p = inttoptr i64 42 to i64 addrspace(1)*
         ret i64 addrspace(1)* %p
     }

     define i8 @main() #0 gc "statepoint-example" {
         %result = call i64 addrspace(1)* @func()

         %token = call i32 (i64, i32, i64 addrspace(1)* ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_p1i64f (
             i64 111, i32 0,
             i64 addrspace(1)* ()* @func,
             i32 0, i32 0, i32 0,
             i32 1,
             i64 addrspace(1)* %result) ; the only deopt arg

         %b = ptrtoint i64 addrspace(1)* %result to i64
         %c = trunc i64 %b to i8
         ret i8 %c
     }

     declare i32 @llvm.experimental.gc.statepoint.p0f_p1i64f(i64, i32, i64 addrspace(1)* ()*, i32, i32, ...)
     declare i64 addrspace(1)* @llvm.experimental.gc.result.p1i64(i32)

     attributes #0 = { "no-frame-pointer-elim"="true" }

I compile with `llc -O3 -x86-asm-syntax=intel`.

Here is the meat of the emitted asm:
     call func
     mov rbx, rax
     mov qword ptr [rbp - 16], rbx
     call func

Here is the corresponding Stack Map Location:
     .byte 2
     .byte 8
     .short 7
     .long 0
Which means, according to http://llvm.org/docs/StackMaps.html#stack-map-format, that it's a Direct location rsp + 0 (rsp has DWARF register number 7).
Which I find is not exactly true, as it should rather be an Indirect location [rsp + 0], or better yet, considering I specified "no-frame-pointer-elim"="true", [rbp - 16].

The only explanation about direct locations in the doc is "If an alloca value is passed directly to a stack map intrinsic, then LLVM may fold the frame index into the stack map as an optimization to avoid allocating a register or stack slot. These frame indices will be encoded as Direct locations in the form BP + Offset." And as far as I can see it's not the case here.

I have a new version of LLVM which I built yesterday.

Please help me understand what's happening here. Thanks in advance

Viad,

My initial impression is that you've stumbled across a bug. I suspect that we - the only active users of the deopt info in the statepoint I know of - have been inverting the meaning of Direct and Indirect throughout our code. (i.e. we're consistent, but swapped on the documented meaning) I've asked Sanjoy to confirm that, and if he believes that is actually the case, we will fix upstream to use Indirect in this case.

I'll try to have more information for you early next week.

Can I ask what you're using the deopt information for? Do you have a language runtime which supports deoptimization? Or are you using it for something different? If so, what? I'm curious to know how others are using the infrastructure.

Philip

Vlad,

My initial impression is that you've stumbled across a bug. I suspect
that we - the only active users of the deopt info in the statepoint I
know of - have been inverting the meaning of Direct and Indirect
throughout our code. (i.e. we're consistent, but swapped on the
documented meaning) I've asked Sanjoy to confirm that, and if he
believes that is actually the case, we will fix upstream to use
Indirect in this case.

I'll try to have more information for you early next week.

Hi, Philip! Thank you for taking time to look into it!
May be it's a separate question, but I hope it's OK to ask in this thread. Would it be hard to make the statepoint intrinsic produce frame pointer based offset (instead of sp-based) if "no-frame-pointer-elim"="true" attribute is specified?

Can I ask what you're using the deopt information for? Do you have a
language runtime which supports deoptimization? Or are you using it
for something different? If so, what? I'm curious to know how others
are using the infrastructure.

Sure.
I am working on LLV8, which is an attempt to use LLVM MCJIT as a backend for Google V8. So yes, we have a language runtime which supports deoptimization.
Deoptimization support wasn't hard. We use the stackmap intrinsic for that. (And we've encountered each type of Location except Direct so far).
Now I am implementing the safepoint functionality which is why I use the gc.statepoint intrinsic. The two utility passes have been extremely helpful. The use-cases they support are much more general than my needs though. So I use a modified version of RewriteStatepointsForGC, which only appends the pointer values live across the statepoint to <deopt

Hi Vlad,

>> Vlad,
>>
>> My initial impression is that you've stumbled across a bug. I suspect
>> that we - the only active users of the deopt info in the statepoint I
>> know of - have been inverting the meaning of Direct and Indirect
>> throughout our code. (i.e. we're consistent, but swapped on the
>> documented meaning) I've asked Sanjoy to confirm that, and if he
>> believes that is actually the case, we will fix upstream to use
>> Indirect in this case.
>>
>> I'll try to have more information for you early next week.

This is definitely what looks to be going on: we report a stack slot
as "Direct" when the doc says we have to report these as "Indirect".
I'll send in a fix sometime this week.

[Some details below, feel free to skip]

The problem is we reuse emitPatchpoint in
X86TargetLowering::EmitInstrWithCustomInserter even though
patchpoint's notion of what needs to be Direct vs. what needs to be
Indirect is different from ours. Specifically, emitPatchpoint will
take any FrameIndex machine operand and report it as Direct, and will
only use Indirect for memory operands that it could fold into the
PATCHPOINT node i.e. PATCHPOINT(load_addr(%rsp + 40)) gets reported as
an Indirect location [RSP+40], but PATCHPOINT(%rsp + 40) gets reported
as Direct. For us, logically, we've already done the "folding"
earlier, so the above behavior isn't quite right for us.

> May be it's a separate question, but I hope it's OK to ask in this
> thread. Would it be hard to make the statepoint intrinsic produce frame
> pointer based offset (instead of sp-based) if
> "no-frame-pointer-elim"="true" attribute is specified?

For non-aligned frames (i.e. frames that don't align themselves) this
shouldn't be fundamentally difficult, but there are some tricky math
around frame layout you'll have to get right.

For aligned frames, I'm not sure if there is a way to get to all of
the stack locations using RBP (assuming it contains the "SP at entry"
value as usual). But please do check.

>> Can I ask what you're using the deopt information for? Do you have a
>> language runtime which supports deoptimization? Or are you using it
>> for something different? If so, what? I'm curious to know how others
>> are using the infrastructure.
>
> Sure.
> I am working on LLV8, which is an attempt to use LLVM MCJIT as a backend
> for Google V8. So yes, we have a language runtime which supports
> deoptimization.
> Deoptimization support wasn't hard. We use the stackmap intrinsic for
> that. (And we've encountered each type of Location except Direct so far).
> Now I am implementing the safepoint functionality which is why I use the
> gc.statepoint intrinsic. The two utility passes have been extremely
> helpful. The use-cases they support are much more general than my needs
> though. So I use a modified version of RewriteStatepointsForGC, which
> only appends the pointer values live across the statepoint to <deopt

.

If I understand you correctly, that will work only if your GC isn't a
relocating GC. Is that the case?

-- Sanjoy

Hi, Sanjoy,

Hi Vlad,

Vlad,

My initial impression is that you've stumbled across a bug. I suspect
that we - the only active users of the deopt info in the statepoint I
know of - have been inverting the meaning of Direct and Indirect
throughout our code. (i.e. we're consistent, but swapped on the
documented meaning) I've asked Sanjoy to confirm that, and if he
believes that is actually the case, we will fix upstream to use
Indirect in this case.

I'll try to have more information for you early next week.

This is definitely what looks to be going on: we report a stack slot
as "Direct" when the doc says we have to report these as "Indirect".
I'll send in a fix sometime this week.

Thanks!

May be it's a separate question, but I hope it's OK to ask in this
thread. Would it be hard to make the statepoint intrinsic produce frame
pointer based offset (instead of sp-based) if
"no-frame-pointer-elim"="true" attribute is specified?

For non-aligned frames (i.e. frames that don't align themselves) this
shouldn't be fundamentally difficult, but there are some tricky math
around frame layout you'll have to get right.

For aligned frames, I'm not sure if there is a way to get to all of
the stack locations using RBP (assuming it contains the "SP at entry"
value as usual). But please do check.

Hmm. Do you think it would be a good change to push upstream given we might not be able to do it in all cases (say, always fall back to rsp-based if the frame is aligned)?

Can I ask what you're using the deopt information for? Do you have a
language runtime which supports deoptimization? Or are you using it
for something different? If so, what? I'm curious to know how others
are using the infrastructure.

Sure.
I am working on LLV8, which is an attempt to use LLVM MCJIT as a backend
for Google V8. So yes, we have a language runtime which supports
deoptimization.
Deoptimization support wasn't hard. We use the stackmap intrinsic for
that. (And we've encountered each type of Location except Direct so far).
Now I am implementing the safepoint functionality which is why I use the
gc.statepoint intrinsic. The two utility passes have been extremely
helpful. The use-cases they support are much more general than my needs
though. So I use a modified version of RewriteStatepointsForGC, which
only appends the pointer values live across the statepoint to <deopt >.

If I understand you correctly, that will work only if your GC isn't a
relocating GC. Is that the case?

V8's GC is a relocating GC. When it moves an object which is live at a safe point it knows where the pointer is located (typically it's a stack slot) from info stored in the associated safepoint table so it can change it right there. As far as I can see, we are getting the relocations right.

[If you are interested in more detail on my use-case, read the next paragraph]

Let's go a little into how Lithium works. After register allocation phase it emits safepoint tables i.e. locations of live pointers for each safe point. To define the live pointers it simply looks which live ranges cover the safe point (and which of them hold pointers).
I want to emit safepoint tables for LLVM-generated code as well, but I lack the info which becomes available only after register allocation. So I use the mighty Stack Maps to get it after codegen. And to emit Stack Maps in this case I use the <deopt args> of gc.statepoint (liveness analysis is done on the CFG in terms of SSA values in this case). I don't use <transition args> and <gc params> at all.

Vladislav Ivanishin wrote:
> Hmm. Do you think it would be a good change to push upstream given we
> might not be able to do it in all cases (say, always fall back to
> rsp-based if the frame is aligned)?

I'm not sure -- I'd say write up a patch and send it in for review;
and see what people think.

>
>>>> Can I ask what you're using the deopt information for? Do you have a
>>>> language runtime which supports deoptimization? Or are you using it
>>>> for something different? If so, what? I'm curious to know how others
>>>> are using the infrastructure.
>>>
>>> Sure.
>>> I am working on LLV8, which is an attempt to use LLVM MCJIT as a backend
>>> for Google V8. So yes, we have a language runtime which supports
>>> deoptimization.
>>> Deoptimization support wasn't hard. We use the stackmap intrinsic for
>>> that. (And we've encountered each type of Location except Direct so
>>> far).
>>> Now I am implementing the safepoint functionality which is why I use the
>>> gc.statepoint intrinsic. The two utility passes have been extremely
>>> helpful. The use-cases they support are much more general than my needs
>>> though. So I use a modified version of RewriteStatepointsForGC, which
>>> only appends the pointer values live across the statepoint to <deopt
>>> >.
>>
>> If I understand you correctly, that will work only if your GC isn't a
>> relocating GC. Is that the case?
>
> V8's GC is a relocating GC. When it moves an object which is live at a
> safe point it knows where the pointer is located (typically it's a stack
> slot) from info stored in the associated safepoint table so it can
> change it right there. As far as I can see, we are getting the
> relocations right.

I don't know your system well enough to say anything definitive, but
the deopt state only gives you spill semantics, not reload semantics.
So it is legal to lower

   %v = gc.statepoint(@callee | %gcptr) ;; %gcptr is in the deopt section
   load_from(%gcptr)

as

   ;; %gcptr is in %r12
   [%rsp + 40] = %r12
   call @callee [[ stackmaps contains [%rsp + 40] ]]
   load_from(%rcx)

You'll only update [%rsp + 40] at the safepoint at "call @callee", and
will dereference a stale pointer at "load_from(%rcx)".

To get around this, you need to use the gc_args section of
gc.statepoint. The good news is that RewriteStatepointsForGC should
do all the heavy lifting for you.

We gave a talk about this in last year's dev meeting [1], and please
feel free to ask any questions here.

[1]: http://llvm.org/devmtg/2014-10/#talk4

-- Sanjoy

Vladislav Ivanishin wrote:

Hmm. Do you think it would be a good change to push upstream given we
might not be able to do it in all cases (say, always fall back to
rsp-based if the frame is aligned)?

I'm not sure -- I'd say write up a patch and send it in for review;
and see what people think.

Okay.

Can I ask what you're using the deopt information for? Do you have a
language runtime which supports deoptimization? Or are you using it
for something different? If so, what? I'm curious to know how others
are using the infrastructure.

Sure.
I am working on LLV8, which is an attempt to use LLVM MCJIT as a backend
for Google V8. So yes, we have a language runtime which supports
deoptimization.
Deoptimization support wasn't hard. We use the stackmap intrinsic for
that. (And we've encountered each type of Location except Direct so
far).
Now I am implementing the safepoint functionality which is why I use the
gc.statepoint intrinsic. The two utility passes have been extremely
helpful. The use-cases they support are much more general than my needs
though. So I use a modified version of RewriteStatepointsForGC, which
only appends the pointer values live across the statepoint to <deopt
>.

If I understand you correctly, that will work only if your GC isn't a
relocating GC. Is that the case?

V8's GC is a relocating GC. When it moves an object which is live at a
safe point it knows where the pointer is located (typically it's a stack
slot) from info stored in the associated safepoint table so it can
change it right there. As far as I can see, we are getting the
relocations right.

I don't know your system well enough to say anything definitive, but
the deopt state only gives you spill semantics, not reload semantics.
So it is legal to lower

  %v = gc.statepoint(@callee | %gcptr) ;; %gcptr is in the deopt section
  load_from(%gcptr)

as

  ;; %gcptr is in %r12
  [%rsp + 40] = %r12
  call @callee [[ stackmaps contains [%rsp + 40] ]]
  load_from(%rcx)

You'll only update [%rsp + 40] at the safepoint at "call @callee", and
will dereference a stale pointer at "load_from(%rcx)".

Ouch. You are right, I'll need more than spill semantics. Thank you for pointing this out.

We gave a talk about this in last year's dev meeting [1], and please
feel free to ask any questions here.

[1]: http://llvm.org/devmtg/2014-10/#talk4

Thanks for the link. I'll watch the talk.

BTW, there's a wrong file attached to the "Video (Computer)" link.
"Video (Mobile)" is fine though. (But it's 360p.)