ASAN not generating core dump

alexolog · January 30, 2024, 5:08pm

We’re trying to generate a core dump with ASAN.

Env variables:

ASAN_OPTIONS=disable_coredump=0:disable_core=0:unmap_shadow_on_exit=1:abort_on_error=1
UBSAN_OPTIONS=print_stacktrace=1:halt_on_error=1

(UBSAN is not enabled in compile flags)

A report is written to the log, but no core dump is generated. Sanitized report:

==1==Hint: address points to the zero page.
    #0 0x55bf1304f1f9 in func1() /.../source1.cpp:40:8
    #1 0x55bf1304efb4 in func2() /.../source1.cpp:28:5
    #2 0x55bf13046cc3 in func3() /.../header2.h:27:9
    #3 0x55bf13045708 in func4() /.../source3.cpp:20:15
    #4 0x55bf12f028a5 in main /.../main.cpp:59:12
    #5 0x7f8be8918d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
    #6 0x7f8be8918e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
    #7 0x55bf12e41984 in _start (/opt/something/app+0x4ec984) (BuildId: ffb5a25ac9f21df13570d5fe528eeaba4aa2fc24)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /.../source1.cpp:40:8 in func1()
==1==ABORTING
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

Please advise.

DavidSpickett · January 31, 2024, 9:59am

Using sanitizers and clang built from main, on AArch64 Linux:

$ cat /tmp/test.c
int main() {
  char arr[10];
  arr[11] = 0;
  return 0;
}
$ rm core.*
$ export ASAN_OPTIONS=disable_coredump=0:unmap_shadow_on_exit=1:abort_on_error=1
$ ./bin/clang /tmp/test.c -o /tmp/test.o -fsanitize=address -g
/tmp/test.c:3:3: warning: array index 11 is past the end of the array (that has type 'char[10]') [-Warray-bounds]
    3 |   arr[11] = 0;
      |   ^   ~~
/tmp/test.c:2:3: note: array 'arr' declared here
    2 |   char arr[10];
      |   ^
1 warning generated.
$ /tmp/test.o
=================================================================
==3973626==ERROR: AddressSanitizer: stack-buffer-overflow on address 0xffff85f0002b at pc 0xaaaaab076154 bp 0xffffe0f70d30 sp 0xffffe0f70d28
WRITE of size 1 at 0xffff85f0002b thread T0
<...>
SUMMARY: AddressSanitizer: stack-buffer-overflow /tmp/test.c:3:11 in main
<...>
==3973626==ABORTING
Aborted (core dumped)
$ file core.test.o.3973626.tcwg-jade-03-dev.1706694768
core.test.o.3973626.tcwg-jade-03-dev.1706694768: ELF 64-bit LSB core file, ARM aarch64, version 1 (SYSV), SVR4-style, from '/tmp/test.o', real uid: 15194, effective uid: 15194, real gid: 10000, effective gid: 10000, execfn: '/tmp/test.o', platform: 'aarch64'
$ ./bin/lldb /tmp/test.o --core ./core.*
(lldb) target create "/tmp/test.o" --core "./core.test.o.3973626.tcwg-jade-03-dev.1706694768"
Core file '/home/david.spickett/build-llvm-aarch64/core.test.o.3973626.tcwg-jade-03-dev.1706694768' (aarch64) was loaded.
(lldb) bt
<...>
    frame #7: 0x0000aaaaab076154 test.o`main at test.c:3:11
    frame #8: 0x0000ffff8858be10 libc.so.6`__libc_start_main(main=(test.o`main at test.c:1), argc=1, argv=0x0000ffffe0f70f48, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=<unavailable>) at libc-start.c:308:16
    frame #9: 0x0000aaaaaaf9fad8 test.o`_start + 52
(lldb) up
<...>
frame #7: 0x0000aaaaab076154 test.o`main at test.c:3:11
   1    int main() {
   2      char arr[10];
-> 3      arr[11] = 0;
   4      return 0;
   5    }

So it can work, and disable_coredump is the right option to be setting.

I think:

AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

May be the clue to what’s going on, but I don’t know enough about sanitizers to say what.

I see older references to this error, such as “asan: no report when several threads hit bug” (issue #858 in google/sanitizers on GitHub), but I would presume those are fixed by now.

What version of the sanitizers are you using? As in, what llvm release did they come from?

@vitalybuka maybe you know what causes this error?

DavidSpickett · January 31, 2024, 10:00am

Though you should also check that (1) core dumps are enabled on your system overall and (2) ulimit is not set so low that the core files can’t be written.
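The two checks mentioned above could be sketched roughly as follows. This is only an illustration: `classify_core_limit` is a hypothetical helper name, and the script assumes a shell whose `ulimit` builtin accepts `-c` (bash and dash do). On Linux it reads `/proc/sys/kernel/core_pattern`; on macOS it falls back to the `kern.corefile` sysctl.

```shell
#!/bin/sh
# classify_core_limit VALUE: interpret the value printed by `ulimit -c`.
# Hypothetical helper, for illustration only.
classify_core_limit() {
  case "$1" in
    0)         echo "disabled" ;;
    unlimited) echo "unlimited" ;;
    *)         echo "limited" ;;
  esac
}

# Check 1: the per-process core size limit for this shell and its children.
limit=$(ulimit -c)
echo "core size limit: $limit ($(classify_core_limit "$limit"))"

# Check 2: where (and whether) the system writes core files.
if [ -r /proc/sys/kernel/core_pattern ]; then
  # Linux: a pattern starting with '|' pipes the core to a handler program
  # instead of writing a file in the current directory.
  echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
elif command -v sysctl >/dev/null 2>&1; then
  # macOS: kern.corefile holds the path template for core files.
  echo "kern.corefile: $(sysctl -n kern.corefile 2>/dev/null)"
fi
```

If the limit is reported as disabled, `ulimit -c unlimited` in the same shell before running the test binary raises it, provided the hard limit allows that.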

alexolog · January 31, 2024, 2:50pm

That particular example was from our build machine, which is using some combination of v.14 (compiler) and v.15 (tools), but I’m in the process of changing everything to 17.0.6.

My dev machine is a Mac M1 running LLVM 17.0.6 prebuilt from Homebrew.

Running your example under a debugger (in VSCode), I get a break here:

[image]

Then a crash here:

[image]

With the following disassembly:

[image]

DavidSpickett · January 31, 2024, 3:07pm

I think it’s normal for the final exception to come from within the asan code itself. That’s why I went “up” a bunch of times in the call stack in my example: 4/5 levels up is the code that did the out-of-bounds write.

I’m no asan expert, but if you consider that the out of bounds write wouldn’t normally be an exception, asan has to generate some exception and doing that means calling a function to do it. So there’ll always be some extra frames after the access.

I wonder: if you use a simpler example program, do you still get the same “nested bug in the same thread, aborting” message? Or does this happen for an example of any complexity?

alexolog · January 31, 2024, 5:36pm

I don’t think it is normal for the program to segfault in the call to abort → sigprocmask.

And yes, same issue:

==58723==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x00016f2e6ceb at pc 0x000100b19528 bp 0x00016f2e6cb0 sp 0x00016f2e6ca8
WRITE of size 1 at 0x00016f2e6ceb thread T0
    #0 0x100b19524 in main+0x17c (0test:arm64+0x100001524)
    #1 0x1843190dc  (<unknown module>)
    #2 0xc0517ffffffffffc  (<unknown module>)

Address 0x00016f2e6ceb is located in stack of thread T0 at offset 43 in frame
    #0 0x100b193b4 in main+0xc (0test:arm64+0x1000013b4)

  This frame has 1 object(s):
    [32, 42) 'arr' (line 3) <== Memory access at offset 43 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (0test:arm64+0x100001524) in main+0x17c

8< snip snip >8

==58723==ABORTING
AddressSanitizer:DEADLYSIGNAL
AddressSanitizer: nested bug in the same thread, aborting.

Just to clarify, a normal call to abort() works just fine.
I think asan does something funny with signals.

alexolog · January 31, 2024, 10:09pm

Making some progress:

The crash is triggered by the unmap_shadow_on_exit flag.
Without it, I get:
$ export ASAN_OPTIONS=disable_coredump=0:abort_on_error=1
$ build/0test/0test

==60647==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x00016b31ed0b at pc 0x000104ae1528 bp 0x00016b31ecd0 sp 0x00016b31ecc8
WRITE of size 1 at 0x00016b31ed0b thread T0
    #0 0x104ae1524 in main+0x17c (0test:arm64+0x100001524)
    #1 0x1843190dc  (<unknown module>)
    #2 0x9d26fffffffffffc  (<unknown module>)

Address 0x00016b31ed0b is located in stack of thread T0 at offset 43 in frame
    #0 0x104ae13b4 in main+0xc (0test:arm64+0x1000013b4)

  This frame has 1 object(s):
    [32, 42) 'arr' (line 6) <== Memory access at offset 43 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (0test:arm64+0x100001524) in main+0x17c

8< snip snip >8

==60647==ABORTING
[1]    60647 abort      build/0test/0test

Still no core dump

vitalybuka · February 3, 2024, 5:46am

I have never tried core dumps with sanitizers, so they may be very broken. The point of sanitizers is to have everything needed for the fix in the sanitizer report.

Without unmap_shadow_on_exit, they will be huge because of the shadow. But coredumping with shadow is likely not very well maintained, as it is expensive to test.
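As a rough illustration of the size involved: ASan maps every 8 bytes of application memory to 1 shadow byte, so the shadow reservation is 1/8 of the usable address space. The sketch below assumes that default 1/8 scale and a 47-bit user address space (typical for x86_64 Linux) or a 48-bit one; neither figure comes from this thread, and `shadow_tib` is a hypothetical helper name.

```shell
#!/bin/sh
# shadow_tib BITS: size in TiB of a 1/8-scale shadow region for a BITS-bit
# address space. Hypothetical helper; assumes the default ASan shadow scale of 3.
shadow_tib() {
  bits=$1
  # address space = 2^bits bytes, shadow = 2^(bits-3) bytes, 1 TiB = 2^40 bytes
  echo $(( (1 << (bits - 3)) >> 40 ))
}

echo "47-bit address space: $(shadow_tib 47) TiB of shadow"
echo "48-bit address space: $(shadow_tib 48) TiB of shadow"
```

This is virtual address space that is reserved rather than fully touched, so how much of it actually ends up in a core file depends on the kernel and on what the process accessed. It is consistent with the ASan flag description, which says disable_coredump defaults to 1 on 64-bit to avoid dumping a 16T+ core file.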

“nested bug in the same thread” means something crashed during error reporting. Usually I run under debugger to see what exactly happened.

alexolog · February 4, 2024, 10:15pm

The point of sanitizers is to have everything needed for the fix in the sanitizer report.

They don’t show you the full stack frames, heap, and global variables, so while you know where the issue occurred, you need a core dump to figure out how it got to that point.

Without unmap_shadow_on_exit, they will be huge because of the shadow. But coredumping with shadow is likely not very well maintained, as it is expensive to test.

I would prefer to unmap it, but unfortunately with that flag, I get a segfault inside the sanitizer itself and still no core.

I don’t have the sources ready and mapped, but when I ran it under the debugger, it broke in _asan_shadow_memory_dynamic_access with “bad access” - see image in my previous post.
