I missed that the test case returns a struct. You are right about VARegSaveSize.
For the callee:
sub sp, sp, #16
push {r11, lr}
mov r11, sp
sub sp, sp, #8
str r3, [r11, #20]
str r2, [r11, #16]
str r1, [r11, #12]
ldr r1, [r11, #76]
The beginning of the input struct is at sp_at_entry - 16 - 8 + 12 = sp_at_entry - 12.
The number of leftover bytes passed on the stack is 67 - 12 = 55.
r11 + 76 is at sp_at_entry - 24 + 76 = sp_at_entry + 52. This is incorrect: the next argument should start at align(55, 4) = 56, i.e. sp_at_entry + 56.
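The callee arithmetic above can be checked with a small Python model of the frame. This is only a sketch under stated assumptions: addresses are relative to sp_at_entry (taken as 0), the 67-byte struct size comes from the test case, r0 is assumed to hold the sret pointer so r1-r3 carry the first 12 bytes of the struct, and stack arguments are assumed to occupy 4-byte slots. The helper and variable names are hypothetical.

```python
# Hypothetical model of the callee frame; all addresses are relative to
# sp_at_entry, which is taken to be 0.

STRUCT_SIZE = 67   # byval struct size in the test case
REG_BYTES = 12     # assumption: r1-r3 carry the first 12 bytes (r0 = sret pointer)
SLOT = 4           # assumption: stack arguments use 4-byte slots


def align_to(n, a):
    """Round n up to the next multiple of a."""
    return (n + a - 1) // a * a


sp = 0
sp -= 16           # sub sp, sp, #16  (presumably the VARegSaveSize area)
sp -= 8            # push {r11, lr}
r11 = sp           # mov r11, sp  -> sp_at_entry - 24

struct_start = r11 + 12               # str r1, [r11, #12]
leftover = STRUCT_SIZE - REG_BYTES    # struct bytes passed on the stack
next_arg = align_to(leftover, SLOT)   # where the 2nd argument should start
emitted = r11 + 76                    # address read by ldr r1, [r11, #76]

print(f"struct starts at sp_at_entry {struct_start:+d}")
print(f"leftover bytes: {leftover}")
print(f"2nd arg expected at sp_at_entry {next_arg:+d}")
print(f"emitted load reads sp_at_entry {emitted:+d}")
```

The model reproduces the mismatch: the emitted load reads sp_at_entry + 52, four bytes short of the padded end of the struct.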
For the caller:
mov r0, sp
ldr r1, .LCPI1_0
str r1, [r0, #56]
The 2nd argument is at sp_at_entry + 56, which is correct.
On my setup (built from TOT), I got "ldr r1, [r11, #80]" instead of #76, which reads sp_at_entry - 24 + 80 = sp_at_entry + 56, the correct location.
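As a cross-check, the caller's store offset can be converted into the r11-relative offset the callee should use. This sketch assumes the caller's sp at the call equals sp_at_entry and that the callee prologue is exactly the sub #16 / push {r11, lr} sequence quoted above; the constant names are hypothetical.

```python
# Sketch: map the caller's store offset to the callee's r11-relative load.

CALLER_STORE_OFFSET = 56   # str r1, [r0, #56] with r0 = sp (== sp_at_entry)
VA_REG_SAVE = 16           # sub sp, sp, #16
PUSHED = 8                 # push {r11, lr}

r11_rel_entry = -(VA_REG_SAVE + PUSHED)   # r11 == sp_at_entry - 24
expected_ldr = CALLER_STORE_OFFSET - r11_rel_entry

print(f"ldr r1, [r11, #{expected_ldr}]")
```

This prints the same instruction that the TOT build emits, so the two sides agree there.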
Thanks,
Manman