Hi Steve,
I’m happy to help!
My suggestion for getting started would be to look first at the runtime implementation in compiler-rt, to see how we handle the instrumentation map and how we patch and un-patch the instrumentation points.
I describe some of how this works in the 2017 LLVM Developers’ Meeting talk I gave on the subject (2017 LLVM Developers’ Meeting: D. Michael, “XRay in LLVM: Function Call Tracing and Analysis”, on YouTube).
For shared libraries, we’d have to think about how to augment the instrumentation map that the currently running program sees so that the code associated with the shared library can be patched, and/or whether we put a smaller “core” of the patching/un-patching mechanism in each shared library built with XRay.
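As a rough sketch of the first option (one augmented map for the whole process), here is what a registry of per-object entries could look like. Everything in it is hypothetical and only for illustration: the `ObjectSleds` and `SledRegistry` names, the 8/24-bit split of a 32-bit ID, and the assumption that per-object function IDs start at 1. None of this is the compiler-rt data structure. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-object record; not the compiler-rt representation.
struct ObjectSleds {
  std::string Name;           // e.g. path of the executable or shared library
  std::uintptr_t LoadBase;    // address the object was mapped at
  std::uint32_t NumFunctions; // number of instrumented functions in the object
  bool Live;                  // false once the object has been unloaded
};

class SledRegistry {
public:
  // Assumed split of a 32-bit ID: top 8 bits object, low 24 bits function.
  static constexpr unsigned FnBits = 24;
  static constexpr std::uint32_t FnMask = (1u << FnBits) - 1;
  static constexpr std::uint32_t MaxObjects = 1u << (32 - FnBits);

  // Registers an object and returns its ID, or nullopt if it does not fit.
  std::optional<std::uint32_t> registerObject(std::string Name,
                                              std::uintptr_t Base,
                                              std::uint32_t NumFunctions) {
    if (Objects.size() >= MaxObjects || NumFunctions > FnMask)
      return std::nullopt;
    Objects.push_back({std::move(Name), Base, NumFunctions, true});
    return static_cast<std::uint32_t>(Objects.size() - 1);
  }

  // Marks an object as unloaded; its IDs stop resolving but are not reused.
  void unregisterObject(std::uint32_t ObjId) {
    if (ObjId < Objects.size())
      Objects[ObjId].Live = false;
  }

  static std::uint32_t pack(std::uint32_t ObjId, std::uint32_t LocalFnId) {
    assert(ObjId < MaxObjects && LocalFnId <= FnMask);
    return (ObjId << FnBits) | LocalFnId;
  }
  static std::uint32_t objectOf(std::uint32_t Packed) { return Packed >> FnBits; }
  static std::uint32_t functionOf(std::uint32_t Packed) { return Packed & FnMask; }

  // Resolves a packed ID to its object, or nullptr if the object is unknown,
  // unloaded, or the local function ID is out of range (IDs assumed 1-based).
  // The pointer is only valid until the next registerObject call.
  const ObjectSleds *lookup(std::uint32_t Packed) const {
    const std::uint32_t Obj = objectOf(Packed);
    const std::uint32_t Fn = functionOf(Packed);
    if (Obj >= Objects.size() || !Objects[Obj].Live)
      return nullptr;
    if (Fn == 0 || Fn > Objects[Obj].NumFunctions)
      return nullptr;
    return &Objects[Obj];
  }

private:
  std::vector<ObjectSleds> Objects;
};
```

Note that the object IDs here depend on registration order, so a trace would still need to carry the ID-to-object table with it. A real version would also need locking and signal safety, which this sketch ignores.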
Some things we need to think about:
1) We need to be able to associate the function IDs we synthesise for instrumented functions in a shared library with function addresses and symbols in that shared library (at least function names).
2) At runtime, the shared library functions that have been instrumented need to use the globally defined “handlers” for the currently running binary. While this should just work, we need to ensure that the jump/call to the trampolines stays within the 32-bit relative offset we’ve constrained ourselves to. That will mean each shared library has its own trampolines (as defined in the small “core” patching/un-patching runtime linked into each XRay-instrumented shared library).
3) Dynamic loading and unloading of XRay-instrumented shared libraries should be safe to do while we’re tracing “live”. This might be a bit tricky to get right as this will mean potentially having instrumented XRay code running when `dlclose()` is called, which might be calling into the trampoline. There’s also the issue of being able to handle signals safely while this is happening.
4) The mapping of IDs to function symbols/addresses for shared library functions needs to be exported along with the trace/profile generated with XRay, so that offline processing does not have to reconstruct the instrumentation map from the shared libraries. Note that, depending on how the function ID generation is done, it may not be a stable mapping (i.e. order of loading might change the numbering of the function IDs at runtime).
5) Shared library code is typically placed on memory pages that are shared across the multiple processes that use the same shared library. Patching these pages effectively triggers “copy-on-write” behaviour, which has a multiplicative effect on memory usage when XRay instrumentation is enabled. We’d need to think about whether there are ways to avoid this, or at least communicate it more effectively.
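To make the range constraint in point 2 concrete, here is a minimal sketch of the check a patcher would need before writing a relative call to a trampoline. It assumes an x86-64 style 5-byte `call rel32`, where the displacement is measured from the end of the instruction. The helper names `fitsInRel32` and `rel32For` are made up for this example and are not compiler-rt functions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if a relative call/jump of length InstrLen located at
// InstrAddr can reach Target with a signed 32-bit displacement.
inline bool fitsInRel32(std::uint64_t InstrAddr, std::uint64_t Target,
                        unsigned InstrLen = 5) {
  // The displacement is relative to the address of the next instruction.
  // Subtract in unsigned arithmetic, then reinterpret as signed.
  const auto Disp =
      static_cast<std::int64_t>(Target - (InstrAddr + InstrLen));
  return Disp >= std::numeric_limits<std::int32_t>::min() &&
         Disp <= std::numeric_limits<std::int32_t>::max();
}

// Computes the 32-bit displacement to encode; the caller must have checked
// reachability first.
inline std::int32_t rel32For(std::uint64_t InstrAddr, std::uint64_t Target,
                             unsigned InstrLen = 5) {
  assert(fitsInRel32(InstrAddr, Target, InstrLen) && "trampoline out of range");
  return static_cast<std::int32_t>(
      static_cast<std::int64_t>(Target - (InstrAddr + InstrLen)));
}
```

A shared library mapped more than about 2 GiB away from the main executable's trampolines would fail this check, which is the reason for giving each instrumented library its own copy of the trampolines.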
I’m sure there are more things that I’m missing here from the various discussions that have been had about how to do this with XRay in the past couple of years.
If this is something you’re interested in diving into, please feel free to poke around. I’m happy to have direct correspondence too to thresh out the details. Martin also has a lot of state on this, and he might want to share more as well.
Cheers