Implement an XRay mode ‘xray-profiling’ that gathers stack trace latencies/durations and builds histograms to provide basic statistics about where time is going in an execution of the application.
XRay has two modes currently implemented in compiler-rt: a basic (nee naive) mode and flight data recorder (FDR) mode.
Basic mode logging, when enabled, will collect traces and write verbose logs to disk. This mode has some unsophisticated filtering and stack-depth limiting support. This mode is useful for tracing short-lived applications, so that execution can be traced throughout the lifetime of the application. We’ve used basic mode logging to identify sources of latency in Clang/LLVM.
FDR mode logging is designed to record traces and aggressively filter out events that fall below a latency threshold. This mode uses a fixed amount of heap space to record events as they’re happening, and only writes out the logs on demand. FDR mode is used mostly for long-running applications and services.
We also already have tools in LLVM that can turn these XRay traces into other forms that are easier to digest. One tool turns XRay traces into something that the Chrome Trace Viewer can render  and another summarises it to generate latency flame graphs .
Both basic and FDR mode logging can create massive traces. One example, where a fully-instrumented clang binary was traced with XRay’s basic mode, we found that it could generate an 89GB trace compiling the canonical “Hello, World!” program.
One way to address this problem is to limit the amount of memory being used by the implementation through flags. Unfortunately that also means basic mode XRay will block more on writing out the log files.
Filtering at runtime also takes some CPU time, so the overheads of writing out the log as well as filtering the records become non-trivial for heavily instrumented programs.
We’d like to propose a profiling mode to XRay, called ‘xray-profiling’. The intent is to implement handlers that will generate profile data  that other tools can digest. We’ll export the data in an XRay-specific format for a more compact representation. We intend to implement conversion tools to support the profile.proto format that pprof can handle as input.
To do this, we need to record the “stack” of functions that are XRay instrumented, represented as a prefix tree (or forest, as we’ll see later). This prefix tree will represent the XRay instrumented functions that have been entered, and then exited.
A() → B() → C()
Each node in the tree will contain a histogram of the latencies/durations of the function by subtracting the timestamp at the time of exit from the timestamp at entry.
When the tracing functionality is finalised and flushed, we end up with a data file that represents the prefix tree in a summarised form following the profile.proto format.
As with the Basic and FDR modes, the profiling mode will be implemented in compiler-rt and will be default-linked into any XRay instrumented binary.
In the future we can provide more targeted archives that can be controlled in clang, such that users can determine which runtime implementations they’d like to be linked into the final binary. This work may be done in parallel, but is unrelated to the work involved with the XRay profiling mode implementation.
One of the issues we’re finding with the XRay work is there’s very little community knowledge about what’s happening in the project. I would like to change that by encouraging more reviewers, interested in XRay, to please sign up to be a reviewer. Eventually, we’d like to grow the list of people contributing to XRay from the community and this is an opportunity to do so.
- The triggering mechanism is still through programmatically turning on/off XRay instrumentation in an XRay-instrumented binary. We’re exploring some other options based on ptrace (will send another RFC at some point about this) and a tool that will explicitly control the XRay tracing externally. The alternative to this is having a dedicated XRay thread that will periodically check whether a signal has been tripped, and do the standardised procedure for enabling/disabling/flushing trace data. Any alternatives to this that we may not have considered yet?
Thanks in advance!