I’m writing this post to gauge interest in the upstreaming of my recent work on shared library instrumentation with XRay.
XRay enables the insertion of patchable “NOP sleds” at function entry and exit points [1].
We are currently evaluating the use of XRay in conjunction with established profiling tools from the HPC domain, such as Score-P.
To make XRay viable for this purpose, we implemented support for the instrumentation of dynamic shared objects (DSOs).
This extension is backward compatible with the current XRay API.
Originally created by Dean Michael Berris at Google, XRay hasn’t been in active development in quite some time.
XRay enables patchable instrumentation of the main executable, but currently provides no way of collecting performance data from DSOs.
This limitation has come up in several earlier discussions and appears to be of interest to the community.
We have recently developed a prototype that adds DSO instrumentation support and we would like to make this available upstream.
This work was done as part of a research tool for selective instrumentation of HPC applications (https://github.com/tudasc/CaPI).
CaPI uses the instrumentation capabilities of XRay in conjunction with the Score-P profiling infrastructure and the Extrae tracing tool.
A prototype implementation of DSO support is available in a fork of LLVM 13: https://github.com/sebastiankreutzer/llvm-project-xray-dso (branch
The following changes were required:
- Changes to the main XRay runtime library and API, in order to support multiple patchable objects.
- Addition of the xray-dso runtime library, to identify and collect patchable sleds for each individual DSO.
- Minor changes to the Clang driver to automatically link in the
The feature is enabled via the
During the creation of a shared library, the newly added xray-dso runtime library is statically linked into each DSO alongside local trampoline definitions.
Its purpose is to identify the XRay sled addresses and relay this information to the main XRay runtime at program start.
On the executable side, the XRay runtime maintains a map that keeps track of all currently registered DSOs.
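To make this bookkeeping concrete, here is a minimal sketch of what such a registry could look like. All names (`ObjectRegistry`, `SledRange`, `registerObject`, `deregisterObject`) are hypothetical and are not the prototype's actual API. The sketch also assumes that ID 0 is reserved for the main executable, and it omits the synchronization a real runtime would need.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical description of the sled table of one patchable object.
struct SledRange {
  const void *Begin; // first sled entry of the object
  const void *End;   // one past the last sled entry
};

// Hypothetical registry of patchable objects, keyed by object ID.
// Not thread-safe; a real runtime would have to guard these operations.
class ObjectRegistry {
public:
  // Would be called by a DSO's startup code; returns the assigned object ID.
  std::int32_t registerObject(SledRange Sleds) {
    std::int32_t Id = NextId++;
    Objects.emplace(Id, Sleds);
    return Id;
  }

  // Would be called when a DSO is unloaded; returns false for unknown IDs.
  bool deregisterObject(std::int32_t Id) { return Objects.erase(Id) == 1; }

  bool contains(std::int32_t Id) const { return Objects.count(Id) != 0; }
  std::size_t size() const { return Objects.size(); }

private:
  std::map<std::int32_t, SledRange> Objects;
  std::int32_t NextId = 1; // assumption: 0 is reserved for the main executable
};
```

In this sketch, IDs are handed out in increasing order and are not reused after a DSO is deregistered, which keeps stale IDs from silently referring to a different object.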
Functions are identified by a packed function ID consisting of the object ID and the object-local function ID (Fig. 2).
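As an illustration of the packing scheme, the following sketch combines an object ID and an object-local function ID into one 32-bit value. The 8/24-bit split and the helper names (`packId`, `objectIdOf`, `functionIdOf`) are assumptions made for this example; Fig. 2 shows the layout actually used.

```cpp
#include <cstdint>

// Assumed layout, for illustration only: the upper 8 bits hold the object ID,
// the lower 24 bits hold the object-local function ID.
constexpr int ObjectIdBits = 8;
constexpr int FunctionIdBits = 32 - ObjectIdBits;
constexpr std::uint32_t FunctionIdMask = (1u << FunctionIdBits) - 1;

// Combines both IDs into one packed value. Object IDs that do not fit into
// ObjectIdBits are truncated by the shift; function IDs are masked.
constexpr std::uint32_t packId(std::uint32_t ObjectId,
                               std::uint32_t FunctionId) {
  return (ObjectId << FunctionIdBits) | (FunctionId & FunctionIdMask);
}

// Extracts the object ID from a packed value.
constexpr std::uint32_t objectIdOf(std::uint32_t Packed) {
  return Packed >> FunctionIdBits;
}

// Extracts the object-local function ID from a packed value.
constexpr std::uint32_t functionIdOf(std::uint32_t Packed) {
  return Packed & FunctionIdMask;
}

// With object ID 0, the packed ID equals the plain function ID.
static_assert(packId(0, 42) == 42, "object 0 leaves function IDs unchanged");
```

If the main executable is given object ID 0 (an assumption in this sketch), function IDs of the main executable keep their existing values, which is one way such a scheme can stay compatible with code that only knows plain function IDs.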
Patching of DSO functions is analogous to functions from the main executable, but uses DSO-local relocatable trampoline implementations.
An overview of the interaction between the runtime components is shown in Fig. 1.
Figure 1: Overview of runtime interaction
Figure 2: Packed function ID layout
For a more in-depth explanation of the required changes, please refer to our recent paper [2].
Since our use case employs XRay in conjunction with third-party profiling libraries, we have not worked on XRay’s own tracing/logging capabilities thus far.
As such, the trace analysis features offered by the llvm-xray utility are currently incapable of correctly mapping symbol names for DSOs.
If there is interest in the DSO instrumentation feature, we will extend these tools accordingly.
Furthermore, the relocatable trampoline variants are currently only implemented for X86.
We expect the expansion to other targets to be straightforward.
[1] D. M. Berris, A. Veitch, N. Heintze, E. Anderson, and N. Wang, “XRay: A function call tracing system,” Whitepaper, 2016.
[2] S. Kreutzer, C. Iwainsky, M. Garcia-Gasulla, V. Lopez, and C. Bischof, “Runtime-Adaptable Selective Performance Instrumentation,” 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, FL, USA, 2023, pp. 423-432, doi: 10.1109/IPDPSW59300.2023.00073. Pre-print: https://arxiv.org/abs/2303.11110