Resolve line numbers with LLVM symbolizer

Hi LLVM people,

I would like llvm-symbolizer to resolve both function names and line/column numbers for me. Function names are resolved, but line/column numbers are not.

I’m using the Memory Sanitizer example with LLVM 3.3. This is what I did:

➜ msan_test cat test.c
#include <stdio.h>

int main(int argc, char** argv) {
  int a[10];
  a[5] = 0;
  if (a[argc])
    printf("xx\n");
  return 0;
}
➜ msan_test clang -fsanitize=memory -fno-omit-frame-pointer -g -O2 -o test test.c
➜ msan_test ./test
==9547== WARNING: Use of uninitialized value
#0 0x55ea59c9c476 (/home/nathan/test/c/msan_test/test+0x25476)
#1 0x7fbdfc187a3f (/lib/x86_64-linux-gnu/libc.so.6+0x20a3f)
#2 0x55ea59c9c2a8 (/home/nathan/test/c/msan_test/test+0x252a8)
Exiting

I then copied the file/offset pairs from the report into report.txt. (I am aware of MSAN_SYMBOLIZER_PATH.)
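The manual copy step above can also be scripted. The following is only a sketch: the names frame_to_query and convert_report are made up for illustration, and it assumes every frame line has the `#N 0xADDR (module+0xOFFSET)` shape seen in the report above.

```cpp
#include <istream>
#include <ostream>
#include <string>

// Turn one sanitizer frame line such as
//   "#0 0x55ea59c9c476 (/path/to/test+0x25476)"
// into the "module offset" form, e.g. "/path/to/test 0x25476".
// Returns an empty string for lines that do not look like a frame.
std::string frame_to_query(const std::string& line) {
    const std::string::size_type open = line.find('(');
    const std::string::size_type close = line.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return std::string();

    // Text between the parentheses: "<module>+0x<offset>".
    const std::string inner = line.substr(open + 1, close - open - 1);

    // Use the last "+0x" so module names containing '+' still work.
    const std::string::size_type plus = inner.rfind("+0x");
    if (plus == std::string::npos || plus == 0)
        return std::string();

    return inner.substr(0, plus) + " " + inner.substr(plus + 1);
}

// Read a whole report from `in` and write one query line per frame to `out`.
void convert_report(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        const std::string query = frame_to_query(line);
        if (!query.empty())
            out << query << '\n';
    }
}
```

A small main that calls convert_report(std::cin, std::cout) could then sit between ./test and llvm-symbolizer in a pipe, instead of editing report.txt by hand.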

➜ msan_test cat report.txt
/home/nathan/test/c/msan_test/test 0x25476
/lib/x86_64-linux-gnu/libc.so.6 0x20a3f
/home/nathan/test/c/msan_test/test 0x252a8
➜ msan_test llvm-symbolizer < report.txt
main
??:0:0

??
??:0:0

_start
??:0:0

➜ msan_test

It’s unclear to me why no line numbers show up. This is what readelf has to say:

➜ msan_test readelf -WS test | egrep '.(stab|debug)'
[28] .debug_info PROGBITS 0000000000000000 02ec10 040d9e 00 0 0 1
[29] .debug_abbrev PROGBITS 0000000000000000 06f9ae 006943 00 0 0 1
[30] .debug_loc PROGBITS 0000000000000000 0762f1 0637d3 00 0 0 1
[31] .debug_aranges PROGBITS 0000000000000000 0d9ac4 000570 00 0 0 1
[32] .debug_ranges PROGBITS 0000000000000000 0da034 014fe0 00 0 0 1
[33] .debug_line PROGBITS 0000000000000000 0ef014 0091de 00 0 0 1
[34] .debug_str PROGBITS 0000000000000000 0f81f2 010e90 01 MS 0 0 1
[36] .debug_macinfo PROGBITS 0000000000000000 1090cd 000000 00 0 0 1
[37] .debug_pubtypes PROGBITS 0000000000000000 1090cd 000000 00 0 0 1

Can someone point me in the right direction?

Thanks in advance!

This can get a little tricky because of Address Space Layout Randomization (ASLR). llvm-symbolizer can map an address to file:line:column, but only if the address it is given matches the address recorded in the ELF file.

ASLR randomizes the upper address bits at load time, so a raw runtime address cannot be looked up unless you first remove the randomized part.
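To make the arithmetic concrete, here is a small sketch using the numbers from the report earlier in the thread. The function name load_base is made up for illustration, and the sketch assumes the `(module+0x...)` part of each frame is the offset from the start of the loaded module.

```cpp
#include <cstdint>

// Load base of a module = runtime address - offset inside that module.
constexpr std::uint64_t load_base(std::uint64_t runtime_addr,
                                  std::uint64_t module_offset) {
    return runtime_addr - module_offset;
}

// Frames #0 and #2 of the report are both inside `test`. They yield the
// same page-aligned base, which is the part that ASLR picks at load time.
static_assert(load_base(0x55ea59c9c476ULL, 0x25476ULL) == 0x55ea59c77000ULL,
              "frame #0");
static_assert(load_base(0x55ea59c9c2a8ULL, 0x252a8ULL) == 0x55ea59c77000ULL,
              "frame #2");
```

Going the other way, subtracting that base from a runtime address leaves only the offset, which does not change from run to run.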

If you are using the dladdr function (available in glibc on Linux) in the process of getting a backtrace, then it reports the base address at which the object containing the queried address is loaded. (This is true at least for shared libraries; I’m not sure if it is also true for those parts of the program that are not in a shared library.) Subtracting that base address from the address in question gives you the offset into the library, which is the address that llvm-symbolizer really wants.

Dl_info info;  // declared in <dlfcn.h>
if (dladdr(addr, &info)) {
  // addr - info.dli_fbase is the address that must be passed to llvm-symbolizer
  std::string so_name = info.dli_fname;
  intptr_t offset = (char*)addr - (char*)info.dli_fbase;

  // invoke llvm-symbolizer with so_name and offset
}
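A more complete version of the fragment above might look like the following. This is a sketch, not a tested recipe: the helper name symbolizer_query is hypothetical, it assumes glibc's dladdr (g++ and clang++ define _GNU_SOURCE by default; older glibc versions may also need -ldl at link time), and, as noted above, whether the subtraction gives the right value for code in the main executable is not established here.

```cpp
#include <dlfcn.h>   // dladdr, Dl_info

#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: build one "module offset" input line for
// llvm-symbolizer from a runtime address in the current process.
// Returns an empty string if dladdr cannot attribute the address.
std::string symbolizer_query(const void* addr) {
    Dl_info info;
    if (!dladdr(addr, &info) || info.dli_fname == nullptr)
        return std::string();

    // Offset of addr from the load base of the object that contains it.
    const std::uintptr_t offset =
        reinterpret_cast<std::uintptr_t>(addr) -
        reinterpret_cast<std::uintptr_t>(info.dli_fbase);

    char hex[32];
    std::snprintf(hex, sizeof hex, "0x%llx",
                  static_cast<unsigned long long>(offset));
    return std::string(info.dli_fname) + " " + hex;
}
```

Each returned line has the same shape as the entries in report.txt and can be written to llvm-symbolizer's standard input.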