Overview
Producing usable logs often requires referring to the addresses at which events occurred, for backtraces, sanitizer reports, and the like. Since logs are generally intended to be human readable, it’s important that addresses be referenced symbolically.
Symbolizing requires a considerable amount of information: the symbol table of the binary, where shared libraries were mapped, and the symbol tables of those libraries. Typically, the binary itself symbolizes the logs, possibly in concert with the underlying OS. Accordingly, all this information needs to be available at runtime.
For size-constrained environments, it’s desirable to strip out as much information as possible from the runtime environment. Ideally, this could be done without producing unreadable logs.
Instead of symbolizing, a binary can defer the process by embedding markup in its logs. The markup provides necessary contextual information, like memory layout, as well as presentation context, like whether an address is a line of a backtrace. Later, a symbolizing filter can process the logs and replace the markup with human-readable symbols, formatted appropriately.
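As a rough illustration of what deferring symbolization looks like on the producing side, the sketch below formats contextual and presentation elements as plain strings. It assumes the triple-brace, colon-separated syntax of the Fuchsia format. The helper names, the module name, the build ID, and the addresses are all made up for the example, and the field layouts reflect one reading of the Fuchsia specification rather than a normative definition.

```python
def element(tag, *fields):
    """Format one markup element as {{{tag:field:...}}}."""
    return "{{{" + ":".join([tag, *map(str, fields)]) + "}}}"


def backtrace_lines(frames):
    """Emit one presentation element per backtrace frame (hypothetical helper)."""
    return [element("bt", index, hex(addr)) for index, addr in enumerate(frames)]


# Contextual elements describe the memory layout; all values are illustrative.
log = [
    element("reset"),
    element("module", 0, "libexample.so", "elf", "83238ab56ba10497"),
    element("mmap", hex(0x7ACBA69D5000), hex(0x5A000), "load", 0, "rx", hex(0x1000)),
    # Presentation elements carry only raw addresses, no symbol names.
    *backtrace_lines([0x7ACBA69D6123, 0x7ACBA69D7456]),
]

for line in log:
    print(line)
```

The binary needs no symbol table to produce this output; it only reports where modules were loaded and which addresses it observed.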
Decoupling symbolization from execution makes the process more flexible. For instance, symbolization could be done lazily on-demand or in batch. The resulting output could be plain text, colored terminal output, or rich HTML with links to hosted sources. For embedded development, the symbolizing filter would typically run on considerably more powerful hardware, and the debug binaries could be hosted remotely via debuginfod.
Markup-based log symbolization has seen extensive use in Fuchsia, but it makes no onerous assumptions about the host or target platform. The approach should generalize well to the platforms currently supported by LLVM.
Proposal
We propose incorporating the symbolizer markup format currently used by Fuchsia into LLVM.
A simple symbolizing filter should be added: llvm-logsym. This would replace the markup in logs from stdin with human-readable symbol information and output the resulting text to stdout.
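The shape of such a filter can be sketched in a few lines. This is not the proposed implementation: the regular expression, the function names, and the toy address table standing in for real debug information are all assumptions made for illustration, and only a pc element is handled.

```python
import re
import sys

# Matches one markup element and captures everything between the braces.
ELEMENT = re.compile(r"\{\{\{(.*?)\}\}\}")

# Hypothetical stand-in for real symbol lookup via debug info.
TOY_SYMBOLS = {0x1000: "main", 0x2000: "helper"}


def toy_symbolize(tag, fields):
    """Return replacement text, or None to leave the element untouched."""
    if tag == "pc" and fields:
        return TOY_SYMBOLS.get(int(fields[0], 16))
    return None


def filter_line(line, symbolize):
    """Replace every element in the line that the symbolize callback resolves."""
    def replace(match):
        tag, _, rest = match.group(1).partition(":")
        fields = rest.split(":") if rest else []
        result = symbolize(tag, fields)
        return match.group(0) if result is None else result

    return ELEMENT.sub(replace, line)


def run_filter(lines, write, symbolize=toy_symbolize):
    """Stream lines through the filter; text outside markup is copied as-is."""
    for line in lines:
        write(filter_line(line, symbolize))
```

Wiring it to the standard streams would be `run_filter(sys.stdin, sys.stdout.write)`. Processing line by line keeps the filter usable in a pipeline on a live log.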
A symbolization markup parsing library should be added to LLVM’s symbolization library. llvm-logsym should use this library internally. The library would allow more exotic symbolizing filters to be written in conjunction with the existing symbolizer library, depending on specific user needs.
A markup reference document should be added to LLVM based on the contents of the existing specification.
Implementation Notes
llvm-logsym should share a number of options with llvm-symbolizer:
- --basenames
- --debuginfod
- --demangle
- --dwp
- --fallback-debug-path
- --functions
- --inlining
- --relativenames
- --dia
- --default-arch
- --dsym-hint
- The opposing variants of any of these flags
The options not shared with llvm-symbolizer are those that control output in ways that conflict with the markup specifiers or that specify specific objects to symbolize.
The common options should be extracted into a library for constructing a Symbolizer configuration from flags. This library should then be used in both llvm-symbolizer and llvm-logsym.
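The idea of a shared option library can be modeled with a single function that registers the common flags on whichever tool's parser is passed in. The sketch uses Python's argparse purely as an analogy; the real tools are written in C++. The function names and default values are assumptions, and only the flags believed to have a --no- counterpart in llvm-symbolizer are given opposing variants here.

```python
import argparse


def add_common_symbolizer_options(parser):
    """Register the options shared by both tools (hypothetical helper)."""
    # Flags assumed to have an opposing --no-<flag> variant.
    parser.add_argument("--demangle", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--inlining", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--debuginfod", action=argparse.BooleanOptionalAction, default=None)
    # Plain switches.
    for name in ("--basenames", "--relativenames", "--dia"):
        parser.add_argument(name, action="store_true")
    # Options that take a value.
    parser.add_argument("--functions", choices=["none", "short", "linkage"], default="linkage")
    parser.add_argument("--dwp", metavar="PATH")
    parser.add_argument("--fallback-debug-path", metavar="PATH")
    parser.add_argument("--default-arch", metavar="ARCH")
    parser.add_argument("--dsym-hint", metavar="PATH", action="append", default=[])


def build_parser(prog):
    """Each tool builds its parser, then layers tool-specific options on top."""
    parser = argparse.ArgumentParser(prog=prog)
    add_common_symbolizer_options(parser)
    return parser


logsym_parser = build_parser("llvm-logsym")
symbolizer_parser = build_parser("llvm-symbolizer")
args = logsym_parser.parse_args(["--no-demangle", "--default-arch", "arm64"])
print(args.demangle, args.default_arch)
```

Keeping registration in one place means a flag added for one tool becomes available to the other without duplicated parsing code.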
Markup tags not understood by llvm-logsym should be ignored and passed through unchanged; this would allow later passes to handle them. The presence of unhandled markup should cause contextual markup to be passed through unchanged; otherwise the markup could become impossible to interpret.
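The pass-through rule can be sketched as follows. The sets of contextual and handled tags, the placeholder output, and above all the granularity of the check (the whole log text at once) are assumptions made for this example; the proposal does not fix them.

```python
import re

# Captures the tag and, if present, the remaining colon-separated fields.
ELEMENT = re.compile(r"\{\{\{([^:}]+)(?::(.*?))?\}\}\}")

CONTEXTUAL = {"reset", "module", "mmap"}  # assumed contextual tags
HANDLED = {"pc", "bt"}                    # tags this hypothetical filter resolves


def filter_log(text):
    """Symbolize handled tags; keep context whenever unknown tags remain."""
    tags = {match.group(1) for match in ELEMENT.finditer(text)}
    has_unhandled = bool(tags - CONTEXTUAL - HANDLED)

    def replace(match):
        tag = match.group(1)
        if tag in HANDLED:
            # Placeholder for real symbolization output.
            return f"<symbolized {tag} {match.group(2)}>"
        if tag in CONTEXTUAL and not has_unhandled:
            return ""  # context fully consumed; nothing downstream needs it
        return match.group(0)  # unknown tag, or context a later pass may need

    return ELEMENT.sub(replace, text)
```

With only known tags in the log, the module and mmap elements are consumed. As soon as one unknown element appears, they are left in place so that a later pass still has the memory layout it needs to interpret that element.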
The markup parser should be simple enough to be a single class in the existing Symbolize library.