VMRange merging in ProcessElfCore and DoReadMemory

Hi Todd,

I am a bit lost in the code base and might be confused, but I think we are still not getting ProcessElfCore::DoReadMemory right.

Your fix (rev 201214) for llvm.org/PR18769, which avoids merging VMRange regions when they come from different locations in the core file, is correct.
But when we are asked to read across the boundary of two such non-merged VMRanges, we read correct data from the file up to ‘last_entry->data.GetRangeEnd()’ and then fill the rest with ‘\0’.
I think that in this case we should split the requested read across all such ranges.

Is that correct?
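
To make the suggestion concrete, here is a minimal standalone sketch of a read that walks every range it overlaps instead of stopping at the end of the first one. It is not the actual ProcessElfCore code: `CoreRange`, `ReadAcrossRanges` and the in-memory `file` vector are hypothetical names made up for this illustration, and bytes that no range covers are simply left as zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one non-merged VMRange -> file range mapping.
struct CoreRange {
  uint64_t vm_start;    // first virtual address covered
  uint64_t size;        // number of bytes covered
  uint64_t file_offset; // where those bytes live in the core file
};

// Reads `len` bytes starting at virtual address `addr` into `dst`.
// `ranges` must be sorted by vm_start and must not overlap.
// Returns the number of bytes copied from the file; bytes that fall
// in no range (or past the end of the file) stay zero-filled.
size_t ReadAcrossRanges(const std::vector<CoreRange> &ranges,
                        const std::vector<uint8_t> &file, uint64_t addr,
                        uint8_t *dst, size_t len) {
  assert(dst != nullptr || len == 0);
  std::memset(dst, 0, len);
  size_t copied = 0;
  const uint64_t end = addr + len;
  for (const CoreRange &r : ranges) {
    const uint64_t r_end = r.vm_start + r.size;
    if (r_end <= addr)
      continue; // range lies entirely below the request
    if (r.vm_start >= end)
      break; // sorted input: nothing further can overlap
    const uint64_t lo = std::max(addr, r.vm_start);
    const uint64_t hi = std::min(end, r_end);
    const uint64_t off = r.file_offset + (lo - r.vm_start);
    if (off >= file.size())
      continue; // range claims data the file does not contain
    const size_t n =
        static_cast<size_t>(std::min<uint64_t>(hi - lo, file.size() - off));
    std::memcpy(dst + (lo - addr), file.data() + off, n);
    copied += n;
  }
  return copied;
}
```

With two adjacent ranges whose data sits at unrelated file offsets, a read that straddles the boundary then picks up the tail of the first range and the head of the second, rather than zeros for the second half.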

I initially thought this was the problem I am after, since I can still reproduce the issue with some dumps (mainly multithreaded ones), but I haven’t nailed it down to this case. However, I still believe there is (at least potentially) an issue there.

I will be able to provide this fix during the weekend, provided my analysis is correct, but for now I want to focus on my original issue, which currently leads to the ModuleList not being preserved for some reason once it is created in DynamicLoaderPOSIXDYLD::DidLaunch for core files.

That is still only a guess, though, and I need to play with it a bit more.

Cheers,
/Piotr

[Moving LLVM to BCC and adding lldb-dev]

Hi Piotr!

Thanks for the note.

Hi,

It was just a theoretical issue I speculated about, because I was trying to find the reason why the Unwind ThreadPlan fails for any thread other than the thread group leader - and this seems to be already created and cached.

If you want to give it a shot, I’ll attach one of my puppets (10 threads, doing nothing but sleeping).
Also, I use the bfd linker, so it has nothing to do with gold in my case.

The core was created by the kernel (ulimit -c unlimited; kill -3, IIRC).

Please note the strange backtraces (clearly .text is not accessible):

(lldb) target create a.out -c core.9025
Core file '/home/prak/soft/test/core.9025' (x86_64) was loaded.
Process 0 stopped
* thread #1: tid = 0, 0x00007fddcd16b4a2 libpthread.so.0`pthread_join + 162, name = 'a.out', stop reason = signal SIGQUIT
    frame #0: 0x00007fddcd16b4a2 libpthread.so.0`pthread_join + 162
libpthread.so.0`pthread_join + 162:
-> 0x7fddcd16b4a2: addb %al, (%rax)
   0x7fddcd16b4a4: addb %al, (%rax)
   0x7fddcd16b4a6: addb %al, (%rax)
   0x7fddcd16b4a8: addb %al, (%rax)
  thread #2: tid = 1, 0x00007fddcc867aad libc.so.6, stop reason = signal SIGQUIT
    frame #0: 0x00007fddcc867aad libc.so.6
libc.so.6`??? + 45:
-> 0x7fddcc867aad: addb %al, (%rax)

libc.so.6`??? + 47:
   0x7fddcc867aaf: addb %al, (%rax)

libc.so.6`??? + 49:
   0x7fddcc867ab1: addb %al, (%rax)

libc.so.6`??? + 51:
   0x7fddcc867ab3: addb %al, (%rax)
  thread #3: tid = 2, 0x00007fddcc867aad libc.so.6, stop reason = signal SIGQUIT
    frame #0: 0x00007fddcc867aad libc.so.6

and so on for the other threads; the memory there reads back as 0x00 …

We did manage to unwind the main thread, though.

(lldb) bt
* thread #1: tid = 0, 0x00007fddcd16b4a2 libpthread.so.0`pthread_join + 162, name = 'a.out', stop reason = signal SIGQUIT
  * frame #0: 0x00007fddcd16b4a2 libpthread.so.0`pthread_join + 162
    frame #1: 0x0000000000469c77 a.out`std::thread::join() + 39
    frame #2: 0x0000000000433e0b a.out`main + 162 at main.cc:30
    frame #3: 0x00007fddcc7d2b05 libc.so.6`__libc_start_main + 245
    frame #4: 0x0000000000433b81 a.out`_start + 41
(lldb) thread 2
invalid command 'thread 2'
(lldb) thread select 2
* thread #2: tid = 1, 0x00007fddcc867aad libc.so.6, stop reason = signal SIGQUIT
    frame #0: 0x00007fddcc867aad libc.so.6
libc.so.6`??? + 45:
-> 0x7fddcc867aad: addb %al, (%rax)

libc.so.6`??? + 47:
   0x7fddcc867aaf: addb %al, (%rax)

libc.so.6`??? + 49:
   0x7fddcc867ab1: addb %al, (%rax)

libc.so.6`??? + 51:
   0x7fddcc867ab3: addb %al, (%rax)

* thread #2: tid = 1, 0x00007fddcc867aad libc.so.6, stop reason = signal SIGQUIT
    frame #0: 0x00007fddcc867aad libc.so.6
libc.so.6`??? + 45:
-> 0x7fddcc867aad: addb %al, (%rax)

libc.so.6`??? + 47:
   0x7fddcc867aaf: addb %al, (%rax)

libc.so.6`??? + 49:
   0x7fddcc867ab1: addb %al, (%rax)

libc.so.6`??? + 51:
   0x7fddcc867ab3: addb %al, (%rax)
(lldb) image dump sections
Dumping sections for 6 modules.
Sections for '/home/prak/soft/test/a.out' (x86_64):
SectID     Type Load Address                            File Off.  File Size  Flags      Section Name
0x0000000e code [0x0000000000433840-0x000000000048b7b2) 0x00033840 0x00057f72 0x00000006 a.out…text

Sections for '/usr/lib/libc.so.6' (x86_64):
SectID     Type Load Address                            File Off.  File Size  Flags      Section Name
0x0000000d code [0x00007fddcc7d0490-0x00007fddcc8fb253) 0x0001f490 0x0012adc3 0x00000006 libc.so.6…text

(lldb) memory read 0x0000000000433840
0x00433840: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
0x00433850: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
(lldb) x 0x00007fddcceb9510
0x7fddcceb9510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …
0x7fddcceb9520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 …

The other sections from the ELF files are not there either.

Some single-threaded programs like /usr/bin/sleep fail with a core created by the Linux kernel, yet work fine with one created by gcore (maybe it adds the .text section to the core? x <address of .text> seems fine then).

I’ve noticed that all Target::ReadMemory calls and friends go directly to the Process instead of trying to read from Modules.
That’s probably no problem for plugins like POSIXProcess and derived ones, since those have everything mapped in memory and can read it anyway.

I won’t pursue it any further today, since it is already past my bedtime, but I’ll probably tinker with it a bit more during the weekend.

Thanks,
/Piotr

main.cc (384 Bytes)

Hi Piotr,

I'm not able to compile your test code (clang 3.4), but as a data
point I have no trouble with a similar C test case of mine, on
FreeBSD. Perhaps you can build it and compare.

My test code is here: https://github.com/emaste/userland-cores

(lldb) bt all
* thread #1: tid = 0, 0x0000000800aa03fc libc.so.7`__sys_thr_kill + 12
at thr_kill.S:3, name = 'gen-core-v1', stop reason = signal SIGABRT
  * frame #0: 0x0000000800aa03fc libc.so.7`__sys_thr_kill + 12 at thr_kill.S:3
    frame #1: 0x0000000800b4487b libc.so.7`abort + 75 at abort.c:65
    frame #2: 0x00000000004007c1
gen-core-v1`thread_fn_2(arg=0x0000000000600c20) + 81 at gen-core.c:37
    frame #3: 0x0000000800822dc4
libthr.so.3`thread_start(curthread=0x0000000801008000) + 260 at
thr_create.c:284

  thread #2: tid = 1, 0x0000000800b2707c libc.so.7`__sys_nanosleep +
12 at nanosleep.S:3, name = 'gen-core-v1', stop reason = signal
SIGABRT
    frame #0: 0x0000000800b2707c libc.so.7`__sys_nanosleep + 12 at nanosleep.S:3
    frame #1: 0x0000000800a929c8 libc.so.7`__sleep(seconds=60) + 40 at
sleep.c:58
    frame #2: 0x0000000800825078 libthr.so.3`___sleep(seconds=60) + 40
at thr_syscalls.c:592
    frame #3: 0x0000000000400755
gen-core-v1`thread_fn_1(arg=0x0000000000600c08) + 69 at gen-core.c:23
    frame #4: 0x0000000800822dc4
libthr.so.3`thread_start(curthread=0x0000000801007c00) + 260 at
thr_create.c:284

  thread #3: tid = 2, 0x0000000800b2707c libc.so.7`__sys_nanosleep +
12 at nanosleep.S:3, name = 'gen-core-v1', stop reason = signal
SIGABRT
    frame #0: 0x0000000800b2707c libc.so.7`__sys_nanosleep + 12 at nanosleep.S:3
    frame #1: 0x0000000800a929c8 libc.so.7`__sleep(seconds=60) + 40 at
sleep.c:58
    frame #2: 0x0000000800825078 libthr.so.3`___sleep(seconds=60) + 40
at thr_syscalls.c:592
    frame #3: 0x0000000000400755
gen-core-v1`thread_fn_1(arg=0x0000000000600bf0) + 69 at gen-core.c:23
    frame #4: 0x0000000800822dc4
libthr.so.3`thread_start(curthread=0x0000000801007800) + 260 at
thr_create.c:284

  thread #4: tid = 3, 0x0000000800b2707c libc.so.7`__sys_nanosleep +
12 at nanosleep.S:3, name = 'gen-core-v1', stop reason = signal
SIGABRT
    frame #0: 0x0000000800b2707c libc.so.7`__sys_nanosleep + 12 at nanosleep.S:3
    frame #1: 0x0000000800a929c8 libc.so.7`__sleep(seconds=60) + 40 at
sleep.c:58
    frame #2: 0x0000000800825078 libthr.so.3`___sleep(seconds=60) + 40
at thr_syscalls.c:592
    frame #3: 0x000000000040088f gen-core-v1`main(argc=1,
argv=0x00007fffffffd540) + 191 at gen-core.c:54
    frame #4: 0x0000000000400681 gen-core-v1`_start(ap=<unavailable>,
cleanup=<unavailable>) + 161 at crt1.c:97

Hi Ed

Sorry for the late reply, and thanks for looking into it.

Your example fails the same way as mine for all combinations of static/dynamic and gcc/clang.
I wonder if FreeBSD adds the .text section to core files.

What would the result of disassembly be for you?

For me it always looks like this:

dis -b -f
libc.so.6`__GI_raise:
0x7f72a9b11330: 00 00 addb %al, (%rax)
0x7f72a9b11332: 00 00 addb %al, (%rax)
0x7f72a9b11334: 00 00 addb %al, (%rax)
0x7f72a9b11336: 00 00 addb %al, (%rax)
0x7f72a9b11338: 00 00 addb %al, (%rax)
0x7f72a9b1133a: 00 00 addb %al, (%rax)
0x7f72a9b1133c: 00 00 addb %al, (%rax)
0x7f72a9b1133e: 00 00 addb %al, (%rax)
0x7f72a9b11340: 00 00 addb %al, (%rax)
0x7f72a9b11342: 00 00 addb %al, (%rax)
0x7f72a9b11344: 00 00 addb %al, (%rax)
0x7f72a9b11346: 00 00 addb %al, (%rax)
0x7f72a9b11348: 00 00 addb %al, (%rax)

Could you please compare the output of readelf/eu-readelf with mine?

For me it is:

eu-readelf -l core_lnx.1125

Program Headers:

Type Offset    VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
NOTE 0x000388  0x0000000000000000 0x0000000000000000 0x001940 0x000000     0x0
LOAD 0x002000  0x0000000000400000 0x0000000000000000 0x000000 0x0c2000 R E 0x1000
LOAD 0x002000  0x00000000004c2000 0x0000000000000000 0x003000 0x003000 RW  0x1000
LOAD 0x005000  0x00000000004c5000 0x0000000000000000 0x006000 0x006000 RW  0x1000
LOAD 0x00b000  0x0000000001906000 0x0000000000000000 0x002000 0x002000 RW  0x1000
LOAD 0x00d000  0x0000000001908000 0x0000000000000000 0x000000 0x021000 RW  0x1000
LOAD 0x00d000  0x00007f6e4be02000 0x0000000000000000 0x001000 0x001000     0x1000
LOAD 0x00e000  0x00007f6e4be03000 0x0000000000000000 0x800000 0x800000 RW  0x1000
LOAD 0x80e000  0x00007f6e4c603000 0x0000000000000000 0x001000 0x001000     0x1000
LOAD 0x80f000  0x00007f6e4c604000 0x0000000000000000 0x800000 0x800000 RW  0x1000
LOAD 0x100f000 0x00007f6e4ce04000 0x0000000000000000 0x001000 0x001000     0x1000
LOAD 0x1010000 0x00007f6e4ce05000 0x0000000000000000 0x800000 0x800000 RW  0x1000
LOAD 0x1810000 0x00007fffcef2b000 0x0000000000000000 0x022000 0x022000 RW  0x1000
LOAD 0x1832000 0x00007fffceffe000 0x0000000000000000 0x002000 0x002000 R E 0x1000
LOAD 0x1834000 0xffffffffff600000 0x0000000000000000 0x001000 0x001000 R E 0x1000

I am mostly interested in whether, in your core, the phdrs of the LOAD segments that cover .text have a non-zero FileSiz:

For me the program's .text is clearly not there (FileSiz is 0, MemSiz is not):
LOAD 0x002000 0x0000000000400000 0x0000000000000000 0x000000 0x0c2000 R E 0x1000
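
To spell out what that row means for a reader of the core, here is a small sketch that classifies an address against a list of LOAD segments: mapped and present in the core, mapped but with no bytes in the core (FileSiz smaller than MemSiz), or not mapped at all. `LoadSegment`, `Backing` and `Classify` are hypothetical names for this illustration and are not LLDB types; the fields just mirror the usual ELF program header members.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one PT_LOAD program header.
struct LoadSegment {
  uint64_t offset; // p_offset: where the segment data starts in the core
  uint64_t vaddr;  // p_vaddr:  first virtual address of the segment
  uint64_t filesz; // p_filesz: bytes actually stored in the core
  uint64_t memsz;  // p_memsz:  bytes the segment occupied in memory
};

enum class Backing { NotMapped, MappedButAbsent, InCoreFile };

// Tells whether `addr` can be read from the core file itself.
Backing Classify(const std::vector<LoadSegment> &segs, uint64_t addr) {
  for (const LoadSegment &s : segs) {
    if (addr < s.vaddr || addr - s.vaddr >= s.memsz)
      continue; // address is outside this segment
    // Inside the segment: only the first `filesz` bytes were dumped.
    return (addr - s.vaddr < s.filesz) ? Backing::InCoreFile
                                       : Backing::MappedButAbsent;
  }
  return Backing::NotMapped;
}
```

Fed with the first two LOAD rows above, any address in [0x400000, 0x4c2000) comes back as MappedButAbsent, while 0x4c2000 comes back as InCoreFile.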

I am probably onto something, because adding the following hack fixes things for me:

diff --git a/source/Target/Target.cpp b/source/Target/Target.cpp
index e781626..21cb29a 100644
--- a/source/Target/Target.cpp
+++ b/source/Target/Target.cpp
@@ -1311,7 +1311,7 @@ Target::ReadMemory (const Address& addr,
     if (!addr.IsSectionOffset())
     {
         SectionLoadList &section_load_list = GetSectionLoadList();
-        if (section_load_list.IsEmpty())
+        if (true || section_load_list.IsEmpty())
         {
             // No sections are loaded, so we must assume we are not running
             // yet and anything we are given is a file address.
@@ -1332,7 +1332,7 @@ Target::ReadMemory (const Address& addr,
         resolved_addr = addr;

-    if (prefer_file_cache)
+    if (true || prefer_file_cache)
     {
         bytes_read = ReadMemoryFromFileCache (resolved_addr, dst, dst_len,
error);
         if (bytes_read > 0)

This basically forces Target to totally ignore SectionLoadList and with
this change applied it starts to work as expected:

Core file
'/home/prak/tmp/userland-cores/Linux/3.12.8-1-ARCH/x86_64/clang/3.4/core_lnx.1112'
(x86_64) was loaded.
Process 0 stopped
* thread #1: tid = 0, 0x00007f72a9b11369 libc.so.6`__GI_raise + 57, name =
'gen-core-v1', stop reason = signal SIGABRT
    frame #0: 0x00007f72a9b11369 libc.so.6`__GI_raise + 57
libc.so.6`__GI_raise + 57:
-> 0x7f72a9b11369: cmpq $-0x1000, %rax
   0x7f72a9b1136f: ja 0x3538a ; __GI_raise + 90
   0x7f72a9b11371: rep
   0x7f72a9b11372: retq
  thread #2: tid = 1, 0x00007f72a9b92aad libc.so.6, stop reason = signal
SIGABRT
    frame #0: 0x00007f72a9b92aad libc.so.6
libc.so.6`??? + 45:
-> 0x7f72a9b92aad: movq (%rsp), %rdi

libc.so.6`??? + 49:
   0x7f72a9b92ab1: movq %rax, %rdx

libc.so.6`??? + 52:
   0x7f72a9b92ab4: callq 0xf1930 ;
__libc_disable_asynccancel

libc.so.6`??? + 57:
   0x7f72a9b92ab9: movq %rdx, %rax
  thread #3: tid = 2, 0x00007f72a9b92aad libc.so.6, stop reason = signal
SIGABRT
    frame #0: 0x00007f72a9b92aad libc.so.6
libc.so.6`??? + 45:
-> 0x7f72a9b92aad: movq (%rsp), %rdi

libc.so.6`??? + 49:
   0x7f72a9b92ab1: movq %rax, %rdx

libc.so.6`??? + 52:
   0x7f72a9b92ab4: callq 0xf1930 ;
__libc_disable_asynccancel

libc.so.6`??? + 57:
   0x7f72a9b92ab9: movq %rdx, %rax
  thread #4: tid = 3, 0x00007f72a9b92aad libc.so.6, stop reason = signal
SIGABRT
    frame #0: 0x00007f72a9b92aad libc.so.6
libc.so.6`??? + 45:
-> 0x7f72a9b92aad: movq (%rsp), %rdi

libc.so.6`??? + 49:
   0x7f72a9b92ab1: movq %rax, %rdx

libc.so.6`??? + 52:
   0x7f72a9b92ab4: callq 0xf1930 ;
__libc_disable_asynccancel

libc.so.6`??? + 57:
   0x7f72a9b92ab9: movq %rdx, %rax

Also, I expect it might have been failing for a pretty long time without being
noticed, since for live debugging it wouldn't matter whether we load those
values using SectionLoadList or Process::DoReadMemory, because the
information is the same and correct in both places.
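
One less blunt direction than the `true ||` hack might be to try the core first and fall back to the on-disk file cache only for the bytes the core does not provide. The sketch below is purely illustrative and is not how Target::ReadMemory works today: `Reader` and `ReadWithFallback` are hypothetical names, and it assumes the core reader reports a short read for bytes the core does not contain instead of zero-filling them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical reader signature: fills `dst` and returns how many
// bytes, counted from `addr`, it could actually provide.
using Reader =
    std::function<size_t(uint64_t addr, uint8_t *dst, size_t len)>;

// Tries the core first; whatever it cannot supply is requested from
// the file cache (the module's object file on disk).
size_t ReadWithFallback(const Reader &from_core,
                        const Reader &from_file_cache, uint64_t addr,
                        uint8_t *dst, size_t len) {
  assert(dst != nullptr || len == 0);
  const size_t n = from_core(addr, dst, len);
  if (n >= len)
    return len; // the core had everything
  // Ask the file cache only for the missing tail.
  return n + from_file_cache(addr + n, dst + n, len - n);
}
```

For live debugging both readers would agree, so the order would not matter; for a core whose .text segment has FileSiz 0, only the second reader can return real instructions.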

Hints are more than welcome :)

Will dig a bit more...

Cheers,
/Piotr