llvm-symbolizer memory usage

I work on a Linux program with restricted RSS limits (a couple hundred MB), and one of the things this program does is symbolication. Ideally, we’d like to use llvm-symbolizer for this symbolication (because we get things like function inlining that we can’t get from cheaper symbolizers), but for large binaries, the memory usage gets pretty huge.

Based on some memory profiling, it looks like the majority of this memory cost comes from mmap-ing the binary to be symbolized (via `llvm::object::createBinary`). This alone comes with hundreds of MB of cost in many cases.

I have 2 questions here:

  1. Does it seem feasible to make llvm-symbolizer work without loading the full binary into memory (perhaps just reading sections from disk as needed, at the cost of some extra CPU)?
  2. If we figured this out, and put it behind something like a “--low-memory” flag, would it be something the upstream community would accept?

Francis

(Adding Hyoun who’s been looking at memory use of llvm-symbolizer recently too)

> I work on a Linux program with restricted RSS limits (a couple hundred MB), and one of the things this program does is symbolication. Ideally, we’d like to use llvm-symbolizer for this symbolication (because we get things like function inlining that we can’t get from cheaper symbolizers), but for large binaries, the memory usage gets pretty huge.

> Based on some memory profiling, it looks like the majority of this memory cost comes from mmap-ing the binary to be symbolized (via `llvm::object::createBinary`). This alone comes with hundreds of MB of cost in many cases.

> I have 2 questions here:

>   1. Does it seem feasible to make llvm-symbolizer work without loading the full binary into memory (perhaps just reading sections from disk as needed, at the cost of some extra CPU)?

Does memory mapping the file actually use real memory? Or is it just reading from the file, effectively? I don’t think the mapped file was part of the memory usage Hyoun and I encountered when doing memory accounting. What we were talking about was an LRU cache of DwarfCompileUnits, or something like that - to strip out the DIEArrays and other associated data structures after they were used.

Are you running llvm-symbolizer on many input addresses in a single run? Only a single address? Optimized or unoptimized build of llvm-symbolizer?

>   2. If we figured this out, and put it behind something like a “--low-memory” flag, would it be something the upstream community would accept?

Maybe, though I’m hoping we can avoid having to have too much of a perf tradeoff for low memory usage, so we can keep it all together without a flag.

(Adding Hyoun who’s been looking at memory use of llvm-symbolizer recently too)

>> I work on a Linux program with restricted RSS limits (a couple hundred MB), and one of the things this program does is symbolication. Ideally, we’d like to use llvm-symbolizer for this symbolication (because we get things like function inlining that we can’t get from cheaper symbolizers), but for large binaries, the memory usage gets pretty huge.

>> Based on some memory profiling, it looks like the majority of this memory cost comes from mmap-ing the binary to be symbolized (via `llvm::object::createBinary`). This alone comes with hundreds of MB of cost in many cases.

>> I have 2 questions here:

>>   1. Does it seem feasible to make llvm-symbolizer work without loading the full binary into memory (perhaps just reading sections from disk as needed, at the cost of some extra CPU)?

> Does memory mapping the file actually use real memory? Or is it just reading from the file, effectively? I don’t think the mapped file was part of the memory usage Hyoun and I encountered when doing memory accounting. What we were talking about was an LRU cache of DwarfCompileUnits, or something like that - to strip out the DIEArrays and other associated data structures after they were used.

I might be wrong because I’m not familiar with LLVM, but when I tried to reduce the RSS of our symbolizer usage, I saw that both the input file mapping and the internal data structures (DIEArrays, line tables, etc.) took significant memory.

As Dave mentioned, I’ve tried LRU caching for the internal data structures, and that reduced memory usage quite a bit for our use case of symbolizing many addresses in a single run. We’re working on upstreaming the caching in some form.

The input file part seems more complicated. For us, the file is memory-mapped and the kernel only brings in the pages that are needed. It became a problem because we symbolize many addresses, and the kernel couldn’t handle that access pattern very well, leaving the entire file in memory. I could reduce RSS by inserting madvise(MADV_DONTNEED) calls here and there, but I don’t think that approach is likely to be upstreamed.

While following the code path for memory-mapping the input file, I vaguely recall seeing other code paths that allocate a buffer the size of the entire file and copy into it when a memory-mapped file is not available. Is this the case for you?

Thanks,
HK