[RFC] Upstreaming DSO Instrumentation Support for XRay

I’m writing this post to gauge interest in the upstreaming of my recent work on shared library instrumentation with XRay.

Summary

XRay enables the insertion of patchable “NOP sleds” at function entry and exit points [1].
We are currently evaluating the use of XRay in conjunction with established profiling tools from the HPC domain, such as Score-P.
To make XRay viable for this purpose, we implemented support for the instrumentation of dynamic shared objects (DSOs).
This extension is backward compatible with the current XRay API.

Background

Originally created by Dean Michael Berris at Google, XRay hasn’t been in active development in quite some time.
XRay enables patchable instrumentation of the main executable, but currently provides no way of collecting performance data from DSOs.
This issue has been discussed a few times and seems to be of interest to people (see here or here for example).
We have recently developed a prototype that adds DSO instrumentation support and we would like to make this available upstream.

This work was done as part of a research tool for selective instrumentation of HPC applications (https://github.com/tudasc/CaPI).
CaPI uses the instrumentation capabilities of XRay in conjunction with the Score-P profiling infrastructure and the Extrae tracing tool.

Implementation

A prototype implementation of DSO support is available in a fork of LLVM 13: https://github.com/sebastiankreutzer/llvm-project-xray-dso (branch xray_dso).

The following changes were required:

  1. Changes to the main XRay runtime library and API, in order to support multiple patchable objects.
  2. Addition of the xray-dso runtime library, to identify and collect patchable sleds for each individual DSO.
  3. Minor changes to the Clang driver to automatically link in the xray-dso runtime library.

The feature is enabled via the -fxray-enable-shared flag.
During the creation of a shared library, the newly added xray-dso runtime library is statically linked into each DSO alongside local trampoline definitions.
Its purpose is to identify the XRay sled addresses and relay this information to the main XRay runtime at program start.
On the executable side, the XRay runtime maintains a map that keeps track of all currently registered DSOs.
Functions are identified by a packed function ID consisting of the object ID and the object-local function ID (Fig. 2).
Patching of DSO functions is analogous to functions from the main executable, but uses DSO-local relocatable trampoline implementations.
An overview of the interaction between the runtime components is shown in Fig. 1.
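
As a rough illustration of the bookkeeping described above, here is a minimal sketch of a registry that tracks patchable objects. It is not the prototype's code: `PatchableObject`, `ObjectRegistry`, the field names, the limit of 256 objects, and the choice of ID 0 for the main executable are all assumptions made for the example. A real runtime would likely avoid `std::map` and `std::mutex`, since compiler-rt runtimes generally do not depend on the C++ standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>

// Hypothetical constants: the main executable gets ID 0, DSOs get 1..255.
constexpr int32_t kMainExecutableId = 0;
constexpr int32_t kMaxObjects = 256;

// Hypothetical per-object record; field names are illustrative only.
struct PatchableObject {
  const void *sledsBegin = nullptr;      // first sled entry of this object
  const void *sledsEnd = nullptr;        // one past the last sled entry
  const void *entryTrampoline = nullptr; // object-local entry trampoline
  const void *exitTrampoline = nullptr;  // object-local exit trampoline
};

class ObjectRegistry {
public:
  explicit ObjectRegistry(const PatchableObject &mainExecutable) {
    objects_[kMainExecutableId] = mainExecutable;
  }

  // Called on behalf of a DSO when it is loaded.
  // Returns the new object ID, or -1 if no IDs are left.
  int32_t registerObject(const PatchableObject &object) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (nextId_ >= kMaxObjects)
      return -1;
    const int32_t id = nextId_++;
    objects_[id] = object;
    return id;
  }

  // Called when a DSO is unloaded. The main executable cannot be removed.
  bool deregisterObject(int32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (id == kMainExecutableId)
      return false;
    return objects_.erase(id) == 1;
  }

  // Copies the record for `id` into `out`; returns false if it is unknown.
  bool lookup(int32_t id, PatchableObject &out) const {
    std::lock_guard<std::mutex> lock(mutex_);
    const auto it = objects_.find(id);
    if (it == objects_.end())
      return false;
    out = it->second;
    return true;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return objects_.size();
  }

private:
  mutable std::mutex mutex_;
  std::map<int32_t, PatchableObject> objects_;
  int32_t nextId_ = 1; // IDs are not reused in this sketch
};
```

In this sketch, the library statically linked into each DSO would call `registerObject` from a constructor and `deregisterObject` from a destructor, so that the set of patchable objects follows `dlopen`/`dlclose`.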

Figure 1: Overview of runtime interaction

Figure 2: Packed function ID layout
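
To make the packed function ID concrete, here is a minimal sketch of how an object ID and an object-local function ID could share one 32-bit value. The 8/24 bit split, the helper names, and the use of object ID 0 for the main executable are assumptions for illustration; the authoritative layout is the one shown in Fig. 2.

```cpp
#include <cassert>
#include <cstdint>

// Assumed split: upper 8 bits select the object, lower 24 bits the function.
constexpr uint32_t kFunctionBits = 24;
constexpr uint32_t kObjectBits = 8;
constexpr uint32_t kFunctionMask = (1u << kFunctionBits) - 1;
constexpr uint32_t kObjectMask = (1u << kObjectBits) - 1;

// Combine an object ID and an object-local function ID into one value.
inline uint32_t packFunctionId(uint32_t objectId, uint32_t functionId) {
  assert(objectId <= kObjectMask && "object ID does not fit");
  assert(functionId <= kFunctionMask && "function ID does not fit");
  return (objectId << kFunctionBits) | functionId;
}

// Recover the object ID from a packed value.
inline uint32_t objectIdOf(uint32_t packedId) {
  return (packedId >> kFunctionBits) & kObjectMask;
}

// Recover the object-local function ID from a packed value.
inline uint32_t functionIdOf(uint32_t packedId) {
  return packedId & kFunctionMask;
}
```

Under these assumptions, functions in the main executable (object ID 0) keep their plain function IDs, which is one way a packed scheme can stay compatible with existing users of the XRay API.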

For a more in-depth explanation of the required changes, please refer to our recent paper [2].

Prototype limitations

Since our use case employs XRay in conjunction with third-party profiling libraries, we have not worked on XRay’s own tracing/logging capabilities thus far.
As such, the trace analysis features offered by the llvm-xray utility are currently incapable of correctly mapping symbol names for DSOs.
If there is interest in the DSO instrumentation feature, we will extend these tools accordingly.

Furthermore, the relocatable trampoline variants are currently only implemented for X86.
We expect the expansion to other targets to be straightforward.

References

[1] Berris, D. M., Veitch, A., Heintze, N., Anderson, E., & Wang, N. (2016). XRay: A function call tracing system. Whitepaper.

[2] S. Kreutzer, C. Iwainsky, M. Garcia-Gasulla, V. Lopez and C. Bischof, “Runtime-Adaptable Selective Performance Instrumentation,” 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, FL, USA, 2023, pp. 423-432, doi: 10.1109/IPDPSW59300.2023.00073. Pre-print: https://arxiv.org/abs/2303.11110

I’m very much in favor of adding this functionality upstream! Unsure about my technical expertise on every aspect though.

Fuchsia makes heavy use of shared libraries and we’re also trying to support XRay. DSO support is something we already planned to look into and it’s great to see progress in this space. I’d be happy to help with reviews.

Regarding the design, the direction I was considering is similar to the ASan runtime, where the runtime itself is built as a DSO (libclang_rt.xray.so) and every binary, executable or shared library, links the helper library (libclang_rt.xray_static.a). This model supports scenarios where the runtime needs to be loaded even before the main executable, for example when instrumenting libc (including the dynamic linker), which is something we do in Fuchsia.

Hi Petr,
Glad to hear that this is of interest to you!
I’m happy to collaborate on design adjustments.

I’m not familiar with how ASan does these things in detail, but I will check it out.
If I understand the idea correctly, the patching and event handling functionality would be moved to libclang_rt.xray.so and the only purpose of the static library would be to relay sled information and trampoline addresses to that library?

I’m thinking the easiest approach would be to first integrate DSO support based on the existing static XRay library.
In a second step, we could migrate to the shared library approach and make the required adjustments to support the use case you described.
This way, we can integrate the changes incrementally before breaking backwards compatibility with the “old” implementation.

Would that approach work for you?