TLDR; During my internship at Google I developed a proof-of-concept framework for supporting dynamic behavior injection. It allows users to specify alternate implementations of functions, and dynamically switch between the original and new behavior at runtime. It works by dispatching the original call through a function pointer that can either point to the original function body or the injected version. We would like feedback about our approach, and the community’s interest in adding our framework to LLVM.
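To make the dispatch mechanism concrete, here is a minimal hand-written sketch of what the instrumentation conceptually produces. All names (`read_block`, `read_block_impl`, `read_block_fault`, `read_block_ptr`, `enable_fault`, `disable_fault`) are hypothetical and only for illustration; the actual pass works at the LLVM IR level and its runtime API may look different.

```cpp
#include <cassert>

// Original function body, renamed by the (hypothetical) instrumentation.
int read_block_impl(int block) { return block * 2; }

// Injected payload: an alternate implementation that simulates a fault.
int read_block_fault(int /*block*/) { return -1; }

// Function pointer holding the currently active behavior.
int (*read_block_ptr)(int) = read_block_impl;

// Stub that keeps the original name; every caller goes through the pointer.
int read_block(int block) { return read_block_ptr(block); }

// Hypothetical runtime helpers that switch the behavior while the program runs.
void enable_fault() { read_block_ptr = read_block_fault; }
void disable_fault() { read_block_ptr = read_block_impl; }
```

Callers keep calling `read_block`; only the pointer changes when the behavior is toggled.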
Is this project derived from a real-world use case or a just a good-to-have framework?
We developed the framework to be used as the basis for performing software fault injection in Fuchsia, in particular for injecting dynamic faults. As we started exploring this problem space we recognized that this type of framework is more broadly useful outside of fault injection, and that a method for dynamically enabling new program behaviors seems like a useful feature in and of itself. In some ways this line of thinking has already been proven by Microsoft’s work on Detours, which enables new program behaviors through a different mechanism (trampolines). Hopefully, as part of the compiler toolchain, Syringe can serve as the basis for a variety of other tools and frameworks besides fault injection. A few that come to mind are dependency injection, dynamically enabled security checks, and the selective application of sanitizers.
TLDR; I have been working on the same problem in the past.
I even had a presentation about it at the last LLVM dev conference in
Bristol, 2018 EuroLLVM Developers’ Meeting: G. Márton “Compile-Time Function Call Interception to Mock ... ” (https://www.youtube.com/watch?v=mv60fYkKNHc).
Implementation: https://github.com/martong/finstrument_mock .
White paper: https://martong.github.io/compile-time-fci-to-mock_llvm_2018.pdf .
Seems like your implementation (Syringe) and my realization (fci_mock)
have a lot in common.
I think we should merge our efforts and cooperate to provide an
industrial strength implementation which could land in the future in
LLVM/Clang (**if the community is interested**) and which may
revolutionize testing!
I am really happy and excited to see that an industrial giant, Google,
is interested in having a generic injection framework for C/C++!
First, thanks for your feedback. It’s interesting to see alternative approaches and discuss the trade-offs. It’s unfortunate that I missed your work in my literature review. I focused most of my attention on fault injection, but did a review of call interception techniques as well. My apologies for missing this.
In my implementation (fci_mock) I faced issues similar to yours and I
chose different solutions for some of them. I think this is a good
place to do a comparison between the two.
Fci_mock has a similar architecture to Syringe.
It consists of three parts: a compiler instrumentation module
(backend), a runtime library and a language specific module
(frontend).
The instrumentation module modifies the code to check whether a
function has to be replaced or not.
During the code generation we modify each and every function call
expression to call an auxiliary function.
By instrumenting the call expressions (and not the function body) we
have the convenience and benefit that we do not have to recompile
dependent libraries (e.g. glibc or libstdc++) as long as the call
expression is in code outside of the library.
This is done in the CodeGen of Clang, however it would be better
handled as an LLVM pass.
Having an LLVM pass in Syringe is a great benefit.
In contrast to Syringe, we instrument all call expressions, but this
way we don’t have to modify anything in the production code.
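As a rough illustration of call-site instrumentation, the sketch below routes a call through an auxiliary lookup function that consults a substitution table. The names `substitutions`, `resolve_callee`, and `substitute` are hypothetical stand-ins and are not the actual fci_mock runtime API; the cast between function pointers and `void*` is assumed to be supported, as it is on mainstream platforms.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical runtime table: original function address -> replacement.
std::unordered_map<void*, void*>& substitutions() {
  static std::unordered_map<void*, void*> table;
  return table;
}

// Auxiliary function that every instrumented call expression goes through.
template <typename Fn>
Fn* resolve_callee(Fn* original) {
  auto it = substitutions().find(reinterpret_cast<void*>(original));
  if (it == substitutions().end()) return original;  // no replacement set
  return reinterpret_cast<Fn*>(it->second);
}

// Registers a replacement for `original` (done during test setup).
template <typename Fn>
void substitute(Fn* original, Fn* replacement) {
  substitutions()[reinterpret_cast<void*>(original)] =
      reinterpret_cast<void*>(replacement);
}

int triple(int x) { return 3 * x; }
int fake_triple(int) { return 42; }

// What a plain call `triple(x)` conceptually becomes after instrumentation.
int caller(int x) { return resolve_callee(&triple)(x); }
```

Because the lookup happens at the call site, `triple` itself is never modified, which is why the callee's library does not need to be rebuilt.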
Instrumenting call sites was also something we considered, but ultimately decided against. There is an interesting trade-off between instrumenting the function definition and its call sites. In the end, we felt modifying the function itself made the most sense, and allowed us to only modify targeted portions of the code. That said, there are some compelling use cases that arise should we wish to enable a per call site type of behavior injection, where the behavior at each call site could be modified independently. This of course makes the overall system more complex, but presents some interesting opportunities to consider moving forward.
The runtime library provides functions to set up the replacements
(_substitue_function for C, SUBSTITUTE macro for C++).
This macro uses the new C++ intrinsic (__function_id) which will get a
unique identifier for each C++ function, even if they are virtual.
Here is a simple example to replace a template function (instantiation):
// unit_under_test.hpp
template <typename T>
T FunTemp(T t) {
  return t;
}
// test.cpp
#include "unit_under_test.hpp"
int fake_FunTemp(int p) { return p * 3; }
TEST_F(FooFixture, FunT) {
  // Replace the int instantiation with the fake for this test.
  SUBSTITUTE(FunTemp<int>, fake_FunTemp);
  int p = 13;
  auto res = FunTemp(p);
  EXPECT_EQ(res, 39);
}
The frontend part is actually the implementation of __function_id.
We modified the compiler to parse a new kind of unary expression when
the __function_id literal is given and the test specific
instrumentation is enabled.
In case of free functions and static member functions this unary
expression has the very same type which we would get in case of the
“address of” unary expression:
void foo();
void bar() {
  auto p = &foo; // void (*)()
  auto q = __function_id foo; // void (*)()
}
However in case of non-static member functions the two expressions
yield different types:
struct X { void foo(); virtual void bar(); };
void bar() {
  auto p = &X::foo; // void (X::*)()
  auto q = __function_id X::foo; // void (*)()
  auto r = __function_id X::bar; // void (*)()
}
First, our implementation is almost completely contained in the LLVM backend, and thus has no real understanding of C++. While we can currently use the mangled names of functions to achieve our desired result, this is cumbersome and error-prone. There are additional limitations when considering C++ templates and class hierarchies. Right now, class methods can be instrumented and replaced by payloads in the same class hierarchy. In this case, an injected method must inherit from the target method’s class and override the target function. This has additional challenges if the function is virtual, since our runtime uses function addresses to resolve which function should be modified. As a result, we do not support injection of virtual methods outside of the Itanium ABI, where we can reliably index into the vtable and thus perform the correct behavior in the runtime. We currently consider C++ templates completely out of scope for the current implementation, chiefly because they are too cumbersome to use without support from the frontend.
In fci_mock, we first used the same Itanium ABI dependent solution,
but then we implemented the __function_id intrinsic.
- Alleviate the need to mangle function names
- Add support for C++ Templates
Fci_mock works with direct function pointers and the substitution is
happening at runtime, during the test setup. Thus, there is no need to
use mangled names for the substitution.
It seems, though, that Syringe has the benefit that the replacement
happens at load time, before runtime, am I right?
All of the work done in Syringe is done at compile time, with the exception of runtime initialization and the actual callbacks into the runtime to dynamically enable/disable the active behavior. The only part of the runtime that needs to be bootstrapped is initializing the metadata into a searchable list, which happens before main begins execution. Ideally we can store the metadata in read only data and avoid initializing the runtime at all. This just wasn’t fully implemented in my prototype.
- Directly support C++ class hierarchy
- Add new intrinsics to directly handle runtime lookups (i.e. directly insert real addresses for class methods without (ab)using the Itanium ABI)
I think with the __function_id intrinsic these problems are handled/solved.
Because Clang understands the class hierarchy, we can add a new annotation for class methods that will take the target base class as a parameter. Clang, in Sema, can look up the base class and add the correct payload annotation to the resulting LLVM function. Similarly for templates, any instantiated template function, or dependent method, can have its payload forcibly instantiated, and have the new instantiation correctly tagged. This requires that for templates the target and payload definitions must appear in the same translation unit, so that their instantiations can be correctly resolved. While this forces a change to the actual source code (even if it is only an #include directive) it seems to be a reasonable way to offer support for a core language feature.
Fci_mock call expression instrumentation forcibly fetches the address
of every callee (even of function template instantiations). Thus we
indirectly initiate an instantiation, so this is a non-existent
problem there. This has a severe price, though: we kill inlining
completely; nothing is inlined.
This is a trade-off. Syringe tries to be minimally invasive and have zero impact on uninstrumented code, but using indirect calls means that instrumented functions cannot be inlined in a useful way (inlining a stub isn’t much benefit). As an alternative we could change the way function bodies are modified to try to help the inliner, i.e. avoiding indirect calls and using global booleans instead of function pointers. This is probably a better discussion to have once we know if the community is actually interested in having a framework like this, since this is a performance/design problem rather than a fundamental limitation.
Lastly, calls into the Syringe runtime currently use function addresses as keys to manipulate the target function pointer. It should be possible to use some new intrinsic(s) that can correctly resolve the address of functions and methods without relying on ABI details. Because the compiler will be aware of how Syringe works, it should be possible to have the compiler directly insert the correct address while providing an intuitive API to the user.
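A small sketch of a runtime that uses function addresses as keys might look like the following. The record layout and the names `InjectionRecord`, `registry`, `set_injected`, and `register_get` are assumptions made for illustration and do not reflect the prototype's actual metadata format.

```cpp
#include <cassert>
#include <vector>

using AnyFn = void (*)();  // type-erased function pointer

// Hypothetical metadata record emitted for each instrumented function.
struct InjectionRecord {
  AnyFn stub;      // address the user passes to the runtime (the key)
  AnyFn original;  // original function body
  AnyFn payload;   // injected behavior
  AnyFn* slot;     // pointer the stub dispatches through
};

std::vector<InjectionRecord>& registry() {
  static std::vector<InjectionRecord> records;
  return records;
}

// Looks up the record by function address and flips the active target.
template <typename Fn>
bool set_injected(Fn* stub, bool enabled) {
  for (auto& rec : registry()) {
    if (rec.stub == reinterpret_cast<AnyFn>(stub)) {
      *rec.slot = enabled ? rec.payload : rec.original;
      return true;
    }
  }
  return false;  // function was not instrumented
}

// Hand-written stand-in for one instrumented function.
int get_impl() { return 1; }
int get_payload() { return 2; }
AnyFn get_slot = reinterpret_cast<AnyFn>(get_impl);
int get() { return reinterpret_cast<int (*)()>(get_slot)(); }

// In the prototype this kind of initialization happens before main.
void register_get() {
  registry().push_back({reinterpret_cast<AnyFn>(get),
                        reinterpret_cast<AnyFn>(get_impl),
                        reinterpret_cast<AnyFn>(get_payload), &get_slot});
}
```

Taking `&get` works for free functions, but it is exactly this step that becomes ABI dependent for virtual methods, which is what a dedicated intrinsic would avoid.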
Yes it is possible and implemented: see __function_id
https://github.com/martong/clang/compare/finstrument_mock_0…martong:finstrument_mock
Syringe was designed to help automate behavior injection by understanding a small set of trigger conditions that could be responsible for enabling and disabling the injected behavior. In our initial designs these triggers were often based on profiling counters that could be used to toggle the behavior after some threshold was exceeded. Currently, this is left up to the programmer, but our YAML configuration already supports this sort of annotation. In principle there is no reason why this kind of quality-of-life instrumentation should not be implemented as the use and design of Syringe solidifies.
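A counter-based trigger of the kind described above could be sketched as follows. The threshold, the counter, and all function names are hypothetical; in the prototype this logic is currently left to the programmer.

```cpp
#include <atomic>
#include <cassert>

int send_impl(int n) { return n; }   // original behavior
int send_fault(int) { return -1; }   // injected fault

// Pointer to the active behavior plus a call counter used as the trigger.
std::atomic<int (*)(int)> send_ptr{send_impl};
std::atomic<unsigned> send_calls{0};
constexpr unsigned kThreshold = 3;   // hypothetical trigger value

int send(int n) {
  // Count this call and switch to the payload once the threshold is exceeded.
  if (send_calls.fetch_add(1, std::memory_order_relaxed) + 1 > kThreshold)
    send_ptr.store(send_fault, std::memory_order_relaxed);
  return send_ptr.load(std::memory_order_relaxed)(n);
}
```

With this threshold the first three calls run the original body and every later call runs the fault.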
Fci_mock uses the test file, i.e. a separate translation unit, to set up
the replacement configurations.
As I see, there are a few open questions in both of our solutions:
- How to handle constexpr functions? I have some ideas about that, but
this is not trivial.
I might consider constexpr functions to be out of scope, as Syringe is a framework for dynamically injecting new behaviors rather than modifying compile time results. Syringe can still work for constexpr functions that are evaluated at runtime, but that might be incongruent with compile-time results elsewhere in the program. Compile time evaluation has a lot of sharp edges, so I’m not sure of the best policy here. This is one of the details that I think requires a broader discussion.
- Can we replace a constructor / destructor? Destructors seem easier
and I had some early experiments with that, but getting the address of
a constructor is hard, because of injected class names.
I haven’t given much thought to whether there are subtle issues with our approach to constructors and destructors. This is one reason we wanted to solicit feedback from the community: there are probably several subtle issues that we have failed to fully consider. In our experiments thus far, they worked as expected, but our tests may not be thorough enough to fully exercise this problem space.
My experience with fci_mock shows that it is possible to replace
(almost) every function (function template instantiations or virtual
member functions too), but this had the price of killing inlining. The
overall performance therefore was just slightly better than what we
can have with -finstrument-functions. And also there are the
constructors and destructors.
Also, with these solutions (both fci_mock and Syringe) we can replace
only functions, but I wanted to be able to replace types too. Thus, I
looked for other solutions which work at compile time. One of my
experimental ideas and prototypes reuses the Clang ASTImporter in a
special way: https://martong.github.io/ast-mock_sqamia_2018.pdf
My third idea is based on compile-time reflection, but that is far
from mature (probably will be published in my PhD dissertation).
Cheers,
Gabor
Overall I think this is a nice alternative/complementary approach to behavior injection. Should there be sufficient interest I wouldn’t be opposed to broadening the framework to support call site instrumentation, though I think doing so will be a fairly significant design change. Right now Syringe is tiny (I think the LLVM pass is only ~350 lines plus some boilerplate for adding annotations/pass registry/YAML support/etc.), and the runtime is also very small. To me, this is a selling point, and even after adding frontend support, Syringe should remain a small extension to the existing compiler infrastructure. Adding call site based behavior injection will probably require a redesign to keep Syringe reasonably small and provide a natural interface for controlling instrumentation.
Note that the LOOM Framework (https://github.com/cadets/loom) provides both call site and callee instrumentation and lets you expose these things to arbitrary instrumentation functions, described by a YAML file. This code is a more generic and reusable version of the TESLA (temporal assertion) framework from 2014.
If we’re considering importing something less generic, then it may be worth thinking about whether upstreaming LOOM is worthwhile (it wasn’t upstreamed already because there was no publicly expressed desire from the wider community to have such a thing in mainline LLVM).
David
[I haven’t given this much thought, so forgive me if my questions are naive]
- I think you should state the goals and major use cases more clearly.
E.g. if the goal is to ever have this in production (looks like it’s not) then the design is scary given all the threats associated with indirect calls.
- What are your performance requirements?
If this is not needed for production, then perhaps 5%-10% is tolerable.
- I’d like to understand more about what’s missing for you in XRay.
IIRC, XRay injects ~11 bytes of NOPs into the function prologue, which you can replace with any code you like, any time you want.
You may for example replace those NOPs with a jump to the payload, which will achieve your goal. No? Why?
- You can have a simpler implementation that for every instrumented function does
if (DivertFuncToPayload) return PayloadForFunc(args); // tail call
and thus instead of an indirect call on main path you have a load/cmp on the main path and a direct call on the slow path.
–kcc
[I haven’t given this much thought, so forgive me if my questions are naive]
No. These are great questions. Thank you for the feedback.
- I think you should state the goals and major use cases more clearly.
E.g. if the goal is to ever have this in production (looks like it’s not) then the design is scary given all the threats associated with indirect calls.
The goal of this work is to allow developers to dynamically enable new behaviors as their program runs, for some set of behaviors determined at compile time. I think that any use case where the author may wish to transitively modify the program’s normal behavior is a good candidate. The main use cases I’ve thought about revolve around things that usually fall under the umbrella of testing (fault injection, dependency injection, probabilistic sanitizers, etc.). As I mentioned in the RFC, Syringe takes a general approach, and can be used for a variety of tasks.
As for whether to use Syringe in production, I’d be a bit hesitant to recommend it for deployment. That being said, unlike most indirect call sites, functions modified by Syringe have exactly two valid targets. This is far from the intractable problem facing Control Flow Integrity in the general case. I don’t see a reason why the implementation should not insert forward edge CFI checks. Syringe also brings up some ‘trusting trust’ types of concerns, so I would say my recommendation would be to avoid using it in production, until these shortcomings have been addressed.
In short, while Syringe isn’t currently intended for use in production, I think with some thought and careful design the security weaknesses introduced by our instrumentation can be mitigated so that someday that limitation could be lifted.
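Since each instrumented function has exactly two valid targets, a forward edge check can be very simple. The sketch below is a hand-written illustration under that assumption, with hypothetical names; it is not what Clang's CFI schemes emit, and the prototype does not currently insert such checks.

```cpp
#include <cassert>
#include <cstdlib>

int op_impl(int x) { return x * 2; }     // original body
int op_payload(int x) { return x * 3; }  // injected behavior
int (*op_ptr)(int) = op_impl;            // pointer the stub dispatches through

// Accepts only the two targets that are known at compile time.
bool is_valid_target(int (*target)(int)) {
  return target == op_impl || target == op_payload;
}

int op(int x) {
  int (*target)(int) = op_ptr;                 // read the pointer once
  if (!is_valid_target(target)) std::abort();  // corrupted pointer: stop
  return target(x);
}

// Any other function is rejected even though its signature matches.
int rogue(int) { return 0; }
```

The check costs two comparisons per call, which is far cheaper than a general CFI scheme that must handle an open set of targets.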
- What are your performance requirements?
If this is not needed for production, then perhaps 5%-10% is tolerable.
The overhead from instrumentation depends heavily on which functions are instrumented: changing a call inside a tight loop will have a much greater impact on performance than instrumenting a function that is only called infrequently. Choosing an appropriate benchmark to accurately demonstrate the overhead of our approach would be challenging.
I think the majority of use cases for this type of framework are less concerned with the overhead involved, and more interested in having the ability to dynamically change the program’s behavior. Many of the most obvious use cases fall somewhere roughly under testing. I don’t want to limit what Syringe is intended to do (it’s quite general), but I think many of the most beneficial uses will be out of production, for example fault injection.
Should Syringe move into code review, I would expect a portion of the discussion to center on how best to benchmark the cost of this type of instrumentation.
- I’d like to understand more about what’s missing for you in XRay.
IIRC, XRay injects ~11 bytes of NOPs into the function prologue, which you can replace with any code you like, any time you want.
You may for example replace those NOPs with a jump to the payload, which will achieve your goal. No? Why?
Modifying XRay was something we considered, but ultimately decided against. One reason was that for our initial use case (fault injection) we felt that the overhead of repeatedly writing to code pages may be too expensive if we needed to quickly enable and disable the new behavior. In my understanding, XRay works by making code pages writable, and then updating the NOP sleds of the target functions. The overhead introduced by changing permissions on the code pages may make quickly enabling and disabling new behaviors difficult, if we require fine-grained toggling (i.e. only inject new behavior during a single call). I think in a tight loop, or in a closely interleaved set of threaded calls, the contention to change the code pages might cause our system to lose some of its precision. My understanding here could be incomplete, however, so if I am wrong, please correct me.
The other reason was that our proposed approach was more straightforward, as indirect calls and stubs are easy to understand and allowed us to leverage parts of ORC.
- You can have a simpler implementation that for every instrumented function does
if (DivertFuncToPayload) return PayloadForFunc(args); // tail call
and thus instead of an indirect call on main path you have a load/cmp on the main path and a direct call on the slow path.
This is another alternative that we considered, but again thought that our approach was more straightforward. However, it is worth noting that using direct calls as suggested above may also have the benefit of simplifying some complexities with C++ templates. Should we move forward with upstreaming, this is one aspect of the design I would wish to compare against the alternative.
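For comparison, the flag-based alternative suggested above can be sketched like this. The flag and function names are hypothetical; the point is only that the main path is a load and compare followed by the original body, and the payload is reached through a direct call.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical per-function flag controlled by the runtime.
std::atomic<bool> divert_parse{false};

// Injected payload, reached through a direct call.
int parse_payload(int) { return -1; }

int parse(int x) {
  // Slow path: a direct call to the payload when the flag is set.
  if (divert_parse.load(std::memory_order_relaxed)) return parse_payload(x);
  // Main path: the original body stays in place, so it can still be inlined.
  return x + 1;
}
```

Compared with the function pointer design there is no indirect call at all, at the cost of a branch in every instrumented function.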
Modifying XRay was something we considered, but ultimately decided against.
Not sure it’d even involve modifying XRay - but potentially using XRay as-is and building on top of it (but yeah, I don’t really know where the boundaries are between the generic functionality XRay provides, as Kostya described it, and the features built on top of it - and don’t mean to get into a fussy discussion about the distinctions)
One reason was that for our initial use case (fault injection) we felt that the overhead of repeatedly writing to code pages may be too expensive if we needed to quickly enable and disable the new behavior. In my understanding, XRay works by making code pages writable, and then updating the NOP sleds of the target functions. The overhead introduced by changing permissions on the code pages may make quickly enabling and disabling new behaviors difficult, if we require fine-grained toggling (i.e. only inject new behavior during a single call). I think in a tight loop, or in a closely interleaved set of threaded calls, the contention to change the code pages might cause our system to lose some of its precision. My understanding here could be incomplete, however, so if I am wrong, please correct me.
Seems to me if the overhead was too high, you could avoid it by updating the NOP sled once to point to the “handler” & the handler could make a determination if the original function was to be called & call/jump back into it (“if (disabled) call original”). Admittedly, adding some overhead to the disabled case.
A tool built on top of Detours I’d used before did something like this - allowed filtering based on caller (walking the call stack to see if the caller was a certain function - and only redirecting the functionality if it was), thread id, etc, etc. And it did so by intercepting all calls to the target function (within the scope of the redirector - a C++ RAII style object) & then deciding whether it was an interesting one or not.
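The handler-plus-filter idea can be sketched with an RAII object that installs a filter for the lifetime of a scope. Everything here is hypothetical: the filter inspects the argument as a simple stand-in for the caller or thread id checks mentioned above, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <functional>
#include <utility>

int compute_original(int x) { return x + 1; }
int compute_fake(int) { return 0; }

// Hypothetical hook consulted on every call; empty means "never redirect".
std::function<bool(int)> compute_filter;

// Handler: intercepts every call and decides whether it is interesting.
int compute(int x) {
  if (compute_filter && compute_filter(x)) return compute_fake(x);
  return compute_original(x);  // "if (disabled) call original"
}

// RAII redirector: installs a filter on construction, restores on destruction.
class ScopedRedirect {
 public:
  explicit ScopedRedirect(std::function<bool(int)> filter)
      : previous_(compute_filter) {
    compute_filter = std::move(filter);
  }
  ~ScopedRedirect() { compute_filter = std::move(previous_); }
  ScopedRedirect(const ScopedRedirect&) = delete;
  ScopedRedirect& operator=(const ScopedRedirect&) = delete;

 private:
  std::function<bool(int)> previous_;  // filter that was active before
};
```

Redirection is active only while a `ScopedRedirect` is alive, and uninteresting calls pay for one extra predicate call before reaching the original.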
As for whether to use Syringe in production, I’d be a bit hesitant to recommend it for deployment. That being said, unlike most indirect call sites, functions modified by Syringe have exactly two valid targets.
These are still indirect call instructions, which are a red flag after Spectre.
Doesn’t matter for testing, but if there is a likelihood of someone using this in production,
I’d consider a design w/o indirect calls.
+1 to Dave’s comment about XRay.
For those interested in our current prototype, you can see a set of patches up on Phabricator at: https://reviews.llvm.org/D51962
It has the same shortcomings as listed in the RFC, but I’m working on addressing some of these issues to improve this set of patches.