Stackmap question

Hello, all

We are working on the Multi-OS-Engine (https://multi-os-engine.org), which uses Android's ART runtime ported to iOS to allow Java developers to write iOS applications. We are currently working on version 2.0, which will use LLVM to generate native code from Dalvik bytecode in order to support Apple's new Bitcode requirement for its App Store. We already have a working version in which the generated code actively maintains the managed stack data, but this has a significant performance impact.

I am currently trying to use stackmaps to make important values accessible to the runtime through libunwind. The general design was to put one stackmap after every call. From the entry points, we walk over the frames and, for each frame, compute the instruction offset by subtracting the function's starting IP from the current IP, use that offset to look up the relevant stackmap record, and finally load the values with libunwind based on that record.
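
The lookup step of this design can be sketched as follows. This is a minimal C sketch, not our actual runtime code: StackMapRecord and find_record are illustrative names, and the records are assumed to have already been parsed out of the stackmap section for the function that owns the frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-parsed view of one stackmap record. */
typedef struct {
    uint64_t id;           /* ID passed to llvm.experimental.stackmap */
    uint32_t instr_offset; /* offset of the stackmap position from the function start */
} StackMapRecord;

/*
 * Look up the record for a frame: subtract the function's starting IP
 * from the frame's current (return) IP and search for a record with
 * exactly that instruction offset. Returns NULL when nothing matches.
 */
const StackMapRecord *find_record(const StackMapRecord *records, size_t count,
                                  uint64_t func_start, uint64_t return_ip)
{
    if (return_ip < func_start)
        return NULL; /* IP does not belong to this function */

    uint64_t offset = return_ip - func_start;
    for (size_t i = 0; i < count; i++) {
        if (records[i].instr_offset == offset)
            return &records[i];
    }
    return NULL;
}
```

Note that this is an exact-match lookup: it only finds a record when the stackmap position coincides with the return address of the call.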

We encountered some unforeseen issues with this design: the stackmap position is not guaranteed to be immediately adjacent to the entry point call. In most cases what ends up in between is just a register copy instruction that restores a previous register value. This may break not only the instruction offset but also the locations, because we might try to load a value from a register before it has been restored to its original, proper value.

Here is an example where the generated stackmap position is not directly adjacent to the entry point call.

LLVM IR:

declare i32 @A()
declare void @llvm.experimental.stackmap(i64, i32, ...)

define i32 @F() {
entry:
%0 = call i32 @A()
call void (i64, i32, ...) @llvm.experimental.stackmap(i64 0, i32 0, i32 %0)
%1 = call i32 @A()
ret i32 %0
}

Generated x86-64 assembly:

  .section __TEXT,__text,regular,pure_instructions
  .macosx_version_min 10, 11
  .globl _F
  .p2align 4, 0x90
_F: ## @F
  .cfi_startproc
## BB#0: ## %entry
  pushq %rbp
Ltmp0:
  .cfi_def_cfa_offset 16
Ltmp1:
  .cfi_offset %rbp, -16
  movq %rsp, %rbp
Ltmp2:
  .cfi_def_cfa_register %rbp
  pushq %rbx
  pushq %rax
Ltmp3:
  .cfi_offset %rbx, -24
  callq _A
  movl %eax, %ebx
Ltmp4:
  callq _A
  movl %ebx, %eax
  addq $8, %rsp
  popq %rbx
  popq %rbp
  retq
  .cfi_endproc

  .section __LLVM_STACKMAPS,__llvm_stackmaps
__LLVM_StackMaps:
  .byte 1
  .byte 0
  .short 0
  .long 1
  .long 0
  .long 1
  .quad _F
  .quad 24
  .quad 0
  .long Ltmp4-_F
  .short 0
  .short 1
  .byte 1
  .byte 4
  .short 3
  .long 0
  .short 0
  .short 0
  .p2align 3

.subsections_via_symbols
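
For reference, the single location in the record above can be decoded as follows. This is a minimal C sketch, assuming the version 1 stackmap layout emitted above and a little-endian target; StackMapLocation and decode_location are illustrative names, not part of LLVM or of our runtime.

```c
#include <stdint.h>

/* Location kinds as numbered in the LLVM stackmap format. */
enum {
    LOC_REGISTER = 1,
    LOC_DIRECT = 2,
    LOC_INDIRECT = 3,
    LOC_CONSTANT = 4,
    LOC_CONST_INDEX = 5
};

/* One 8-byte location entry of a stackmap record. */
typedef struct {
    uint8_t kind;       /* one of the LOC_* values */
    uint8_t size;       /* second byte; 4 in the output above (assumed to be the value size) */
    uint16_t dwarf_reg; /* DWARF register number */
    int32_t offset;     /* offset or small constant */
} StackMapLocation;

/* Decode one location from 8 little-endian bytes. */
StackMapLocation decode_location(const uint8_t *p)
{
    StackMapLocation loc;
    loc.kind = p[0];
    loc.size = p[1];
    loc.dwarf_reg = (uint16_t)(p[2] | (p[3] << 8));
    uint32_t raw = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
                   ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    loc.offset = (int32_t)raw;
    return loc;
}
```

For the location bytes above (1, 4, 3 as a 16-bit value, then a zero offset) this gives a Register location in DWARF register 3, which is %rbx on x86-64. The record therefore expects the value in %rbx at Ltmp4.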

The label Ltmp4, which is used to compute the instruction offset, does not come right after the first call to _A; the movl %eax, %ebx copy sits in between.

As I understand it, stackmaps should be usable for extracting interesting values from a whole stack trace, but I don’t see how this is possible if the stackmap is not generated for the position right after the call.

One possible alternative would be to reference the registers and stack slots that are used for restoring the clobbered registers, so that the stackmap would not have to rely on registers that still have to be restored.

Did I make some incorrect assumptions or is something wrong with the IR I generate?

Thanks!

Best regards,
Daniel Mihalyi