[libunwind]ARM EHABI co-operate with libgcc_s hang in Android

Android has an malloc debug function which described in https://android.googlesource.com/platform/bionic/+/master/libc/malloc_debug/. When it is enabled, it use libunwind _Unwind_Backtrace to record malloc stacktrace and can analyze memory problems such as memory leaks.
Condition 1:

It works from Android 5.0 with libunwind(still in libcxxabi) commit https://llvm.org/viewvc/llvm-project?view=revision&revision=216730.
This commit avoids call personality function likes __gxx_personality_v0 but with &__gxx_personality_v0 which means only in static build it can continue unwinding.

Condition 2:
Android 6.0 pick up libunwind(still in libcxxabi) commit https://llvm.org/viewvc/llvm-project?view=revision&revision=226822.
This commit try to fix Condition 1 by call personality function in Generic Model.
And it leads to the problem I will describe later.

Condition 3:
Android 7.0 comes with libunwind move out from libcxxabi and pick up libunwind commit https://llvm.org/viewvc/llvm-project?view=revision&revision=238560.
This commit do further more and only calls personality function in _Unwind_Backtrace. To use this commit, libcxxabi should pick up commit https://llvm.org/viewvc/llvm-project/libcxxabi/trunk/src/cxa_personality.cpp?r1=238561&r2=238560&pathrev=238561.

Problem in ARM EHABI likes armv7a:
When using malloc debug in Android, it means the application loads libc_malloc_debug first, and it is compiled with libunwind(llvm). And my application still use gnustl static or shared which means libgcc_s.a is used for user application.
And when the application calls malloc, it will go into libc_malloc_debug and call libunwind _Unwind_Backtrace() and _Unwind_Backtrace() will call personality function. In Android 6.0, it calls libgcc_s personality function in https://android.googlesource.com/toolchain/gcc/+/ndk-r15-release/gcc-4.9/libstdc+±v3/libsupc++/eh_personality.cc. And it dies with the backtrace as follows:

02-25 21:42:28.269 F/DEBUG ( 285): pid: 12453, tid: 12453, name: mo.helloandroid >>> com.jean.demo.helloandroid <<<

02-25 21:42:28.269 F/DEBUG ( 285): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0

02-25 21:42:28.289 F/DEBUG ( 285): r0 00000001 r1 00004080 r2 00000000 r3 00000008

02-25 21:42:28.289 F/DEBUG ( 285): r4 00004080 r5 bea56a28 r6 00000080 r7 00000000

02-25 21:42:28.289 F/DEBUG ( 285): r8 bea564e4 r9 bea565a8 sl 80000000 fp 00000005

02-25 21:42:28.289 F/DEBUG ( 285): ip b39c61b8 sp bea563b8 lr b39d6955 pc b39d64bc cpsr 240d1c30

02-25 21:42:28.300 F/DEBUG ( 285):

02-25 21:42:28.300 F/DEBUG ( 285): backtrace:

02-25 21:42:28.300 F/DEBUG ( 285): #00 pc 000144bc /data/app/com.jean.demo.helloandroid-1/lib/arm/libnative-lib.so (_Unwind_VRS_Pop+47)


02-25 21:42:28.300 F/DEBUG ( 285): #01 pc 00014951 /data/app/com.jean.demo.helloandroid-1/lib/arm/libnative-lib.so (__gnu_unwind_execute+162)


02-25 21:42:28.300 F/DEBUG ( 285): #02 pc 00014b45 /data/app/com.jean.demo.helloandroid-1/lib/arm/libnative-lib.so (__gnu_unwind_frame+32)


02-25 21:42:28.300 F/DEBUG ( 285): #03 pc 00004599 /data/app/com.jean.demo.helloandroid-1/lib/arm/libnative-lib.so (__gxx_personality_v0+336)


02-25 21:42:28.301 F/DEBUG ( 285): #04 pc 00008517 /system/lib/libc_malloc_debug_leak.so (_Unwind_Backtrace+130)

02-25 21:42:28.301 F/DEBUG ( 285): #05 pc 00006003 /system/lib/libc_malloc_debug_leak.so (get_backtrace(unsigned int*, unsigned int)+34)

02-25 21:42:28.301 F/DEBUG ( 285): #06 pc 00006a7d /system/lib/libc_malloc_debug_leak.so (leak_malloc+84)

02-25 21:42:28.301 F/DEBUG ( 285): #07 pc 00007911 /data/app/com.jean.demo.helloandroid-1/lib/arm/libnative-lib.so (operator new(unsigned int)+12)

02-25 21:42:28.301 F/DEBUG ( 285): #08 pc 00006eb1 /data/app/com.jean.demo.helloandroid-1/lib/arm/libnative-lib.so (char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator const&, std::forward_iterator_tag)+144)

02-25 21:42:28.302 F/DEBUG ( 285): #09 pc 000071e3 /data/app/com.jean.demo.helloandroid-1/lib/arm/libnative-lib.so (std::basic_string<char, std::char_traits, std::allocator >::basic_string(char const*, std::allocator const&)+30)

02-25 21:42:28.302 F/DEBUG ( 285): #10 pc 000041f3 /data/app/com.jean.demo.helloandroid-1/lib/arm/libnative-lib.so (Java_com_jean_demo_helloandroid_MainActivity_stringFromJNI+58)

02-25 21:42:28.302 F/DEBUG ( 285): #11 pc 008629e9 /data/app/com.jean.demo.helloandroid-1/oat/arm/base.odex (offset 0x432000) (java.lang.String com.jean.demo.helloandroid.MainActivity.stringFromJNI()+76)

02-25 21:42:28.302 F/DEBUG ( 285): #12 pc 008626f9 /data/app/com.jean.demo.helloandroid-1/oat/arm/base.odex (offset 0x432000) (void com.jean.demo.helloandroid.MainActivity.onCreate(android.os.Bundle)+444)

After some investigate to this problem, I find there are some problems here.
(1) Is it designed to mix libunwind(llvm) with libgcc_s personality routine? libgcc_s has compatible _Unwind_Control_Block with libunwind but has incompatible _Unwind_Context with libunwind.
In gcc, _Unwind_Context is

struct core_regs
_uw r[16];
/* We use normal integer types here to avoid the compiler generating
coprocessor instructions. /
struct vfp_regs
_uw64 d[16];
_uw pad;
struct vfpv3_regs
Always populated via VSTM, so no need for the “pad” field from
vfp_regs (which is used to store the format word for FSTMX). /
_uw64 d[16];
struct wmmxd_regs
_uw64 wd[16];
struct wmmxc_regs
_uw wc[4];
typedef struct
The first fields must be the same as a phase2_vrs. /
_uw demand_save_flags;
struct core_regs core;
_uw prev_sp; /
Only valid during forced unwinding. */
struct vfp_regs vfp;
struct vfpv3_regs vfp_regs_16_to_31;
struct wmmxd_regs wmmxd;
struct wmmxc_regs wmmxc;
} phase1_vrs;

But in libunwind, it is a class with vptr UnwindCursor<LocalAddressSpace, Registers_arm>.

In https://android.googlesource.com/toolchain/gcc/+/ndk-r15-release/gcc-4.9/libgcc/config/arm/pr-support.c, it also calls
_Unwind_VRS_Set (context, _UVRSC_CORE, R_PC, _UVRSD_UINT32,
to store context but what’s the context here?

(2) Since I have no way because Android use libunwind and our application should still use gnustl for a while. It means the mix will long exists.
(3) Actually in Condition 1, it sounds it does not call personality function for Generic Model, but since it calls _Unwind_VRS_Interpret, it really does as libcxxabi __gxx_personality_v0 does!
So, I have a question, can we inline libcxxabi __gxx_personality_v0 in _Unwind_Backtrace to avoid external call to personality function like libgcc_s and old libcxx+gabi++, and it mays dies in some condition.
If it can, actually we should only pick up __gnu_unwind_frame() for personality function.


If mixing libunwind with other personality function like libgcc_s
First, we use _US_VIRTUAL_UNWIND_FRAME | _US_FORCE_UNWIND in _Unwind_Backtrace() which calls continue_unwind(), so why not call _US_UNWIND_FRAME_RESUME? It calls continue_unwind() too. https://llvm.org/viewvc/llvm-project/libunwind/trunk/src/UnwindLevel1-gcc-ext.c?view=markup
Second, _Unwind_Context is not compatible.

To make _Unwind_Context binary compatible in core registers with phase1_vrs in libgcc_s, we should:
(1) Swap _registers and _addressSpace in UnwindCursor class.
(2) Make UnwindCursor aligned to 4 for ARM EHABI.
I don’t know why the first member of UnwindCursor align to 8 for ARM EHABI. UnwindCursor implements the interface class AbstractUnwindCursor.

To conclude,

  1. Mixing libunwind _Unwind_Backtrace with other personality function will lead to incompatible behavior. I suggest we call continue_unwinding() than call personality function.
  2. Though we could change UnwindCursor to make it compatible with libgcc_s in some case and even use _US_UNWIND_FRAME_RESUME to make it compatible with gabi++, it is not a good method.


FWIW Android is moving to its own unwinder soon (https://android.googlesource.com/platform/system/core/+/master/libunwindstack/) and the NDK is removing all STLs but libc++ (and thus NDK apps won’t use libgcc for unwinding any more) in NDK r18.

Thank Dan for pointing out Android is moving to its own unwinder soon, so maybe the problem will be missing in Android P.

Dan, when I investigate this problem, I find prebuilts/ndk sources/cxx libc++ for Android 5.0/6.0/7.0 is built with libcxx + gabi++.
It it strange that Android 5.0 prebuilts/ndk libc++ works but Android 6.0 fails from https://android.googlesource.com/platform/prebuilts/ndk/+/45b1d17d3c24e23b0955928e62881e6fd0f7b8c5.
Is it a way that I can rebuilt this commit to test why it fails?

Can anyone give me some suggestion why UnwindCursor sounds to align to 8 for ARM EHABI? I just find ArmEabiPort - Debian Wiki but it does not tell anything about C++ vptr.