A long time ago I suggested that we might want to add gmock to complement the facilities provided by gtest in LLVM’s unittests. It didn’t go over well:
There was concern over the benefit vs. the cost.
There was also concern about what the facilities would look like in practice and whether they would actually help.
At the time, I didn’t have good, large examples of what these things might look like or why they might be attractive.
I didn’t provide any real explanation of what gmock did, so the proposal was vague and unclear.
Since then, a lot has changed. We make heavier use of unit testing in the project, with more developers finding benefit from it. And I think I have compelling examples.
To start off, it is important to understand that there are two components to what gmock offers. The first has very little to do with “mocks”. It is actually a matcher language and system for writing test predicates of the form EXPECT_THAT(value, matcher).
This pattern moves the matcher out of the macro, giving it a proper C++ API. With that, we get two huge benefits: extensibility and composability. You can easily write a matcher that concisely summarizes the expectation for custom data types. And you can compose these matchers in powerful ways. I’ll give one example here:
EXPECT_THAT(MyDenseMap, UnorderedElementsAre(Pair(key1, value1), Pair(key2, value2), Pair(key3, value3)));
Here I’m composing equality matchers inside a matcher that can handle unordered container element-wise comparison for generic, arbitrary containers. With a small patch, I’ve even extended it to support arbitrary iterator ranges! Combine this with custom matchers for the elements, and it becomes a very expressive and declarative way to write expectations in tests.
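To show why treating matchers as ordinary C++ values makes them composable, here is a minimal sketch in plain standard C++. It is not gmock’s implementation or API: everything in the `sketch` namespace (including `Matches`, which stands in for EXPECT_THAT) is a hypothetical, hand-rolled analogue, and the brute-force permutation search is only an assumption that keeps the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

namespace sketch { // Hypothetical names; these are NOT gmock's matchers.

// A matcher is an ordinary value: a predicate over T that can be stored,
// passed around, and nested inside other matchers.
template <typename T> using Matcher = std::function<bool(const T &)>;

// Leaf matcher: true when the actual value equals Expected.
template <typename T> Matcher<T> Eq(T Expected) {
  return [Expected](const T &Actual) { return Actual == Expected; };
}

// Composite matcher for map entries: each half must satisfy a sub-matcher.
template <typename K, typename V>
Matcher<std::pair<const K, V>> Pair(Matcher<K> First, Matcher<V> Second) {
  return [First, Second](const std::pair<const K, V> &Entry) {
    return First(Entry.first) && Second(Entry.second);
  };
}

// Composite matcher for containers: every element must satisfy a distinct
// matcher, in any order. Brute force over permutations keeps the sketch
// short; it is only sensible for a handful of elements.
template <typename Container>
Matcher<Container> UnorderedElementsAre(
    std::vector<Matcher<typename Container::value_type>> Elems) {
  return [Elems](const Container &C) {
    if (C.size() != Elems.size())
      return false;
    std::vector<std::size_t> Order(Elems.size());
    std::iota(Order.begin(), Order.end(), std::size_t{0});
    do {
      std::size_t I = 0;
      bool AllMatch = true;
      for (const auto &Elem : C) {
        if (!Elems[Order[I++]](Elem)) {
          AllMatch = false;
          break;
        }
      }
      if (AllMatch)
        return true;
    } while (std::next_permutation(Order.begin(), Order.end()));
    return false;
  };
}

// Stand-in for EXPECT_THAT: without the macro, the core is "apply the matcher".
template <typename T> bool Matches(const T &Actual, const Matcher<T> &M) {
  return M(Actual);
}

} // namespace sketch

// Usage: plain assert stands in for the test framework here.
inline void matcherExample() {
  using Map = std::map<int, std::string>;
  Map M = {{1, "one"}, {2, "two"}};
  // The expected entries are listed in a different order than the map's.
  assert(sketch::Matches(
      M, sketch::UnorderedElementsAre<Map>(
             {sketch::Pair(sketch::Eq(2), sketch::Eq(std::string("two"))),
              sketch::Pair(sketch::Eq(1), sketch::Eq(std::string("one")))})));
}
```

The point of the sketch is only the shape: because each matcher is a value, a container matcher can take element matchers as arguments, and a custom matcher for a project-specific type slots in the same way. gmock’s real matchers additionally describe why a match failed, which this sketch does not attempt.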
I wanted to give a realistic and compelling example, so I rewrote an entire test: https://reviews.llvm.org/D28290
Note that I moved every EXPECT to the new syntax, so this is essentially the worst case. It also involves a non-trivial custom matcher. Despite this, the code is shorter, easier to read, and easier to maintain. It enforces fewer unnecessary orderings. And it is much easier to extend. Also, the error messages when it fails are substantially improved because these composed matchers have logic to carefully explain why they failed to match.
I hope folks find this compelling. I think this alone is worth carrying the gmock code in tree – it is just used by tests and not substantially larger than gtest. Even if we decide we want nothing to do with mocks, I would very much like to have the matchers.
So, now let’s consider mocks. First off, what are mocks? I’ll give a fairly casual definition here: they are test objects which implement some API and allow the test to explicitly set expectations on how that API is used and how it in turn should behave. For a more detailed vocabulary see [1] and for a more lengthy discussion see [2].
As came up in the original discussion, LLVM relatively infrequently has a need to test API interactions in this way. Usually we’re in the business of translating things from format A to B (instructions, metadata, whatever) and can write down one format and write checks against the other format for tests. This is a wonderful world to live in with tests. I never want LLVM to decrease how much we leverage this.
But we do have API interactions that we need to test. We have plugin APIs, and hookable interfaces, ranging from Clang frontend actions to JIT listeners. We also have generic code in ADT that is all about API interactions. Most generic code in fact is – we want it to work for any T that behaves in a certain way, so we need to give it interesting Ts to test it.
My immediate example is the pass manager. We plug in a bunch of passes to it, and expect it to run them in a precise way over specific bits of IR. When testing this, it is extremely cumbersome to write a test pass which does this in interesting and yet controllable and comprehensible ways. Let’s look at a concrete example:
Here we have over 20 lines of code spent testing that the correct set of things happened the correct number of times. I had to write a long comment just to explain what these numbers mean. And I still can’t tell whether a change in the numbers is a good or a bad thing.
Now, we do have detailed logging-based tests that use FileCheck, which is the primary way to avoid this in LLVM. But it isn’t enough. In these tests we want to carefully permute the behavior of very specific runs of individual passes. A simple example of this can be seen here where we have somewhat magical state in a pass to flip-flop its behavior:
And it gets more complicated if you want statefulness like triggering on the 3rd run of the pass.
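To make that pain concrete, here is a minimal sketch of the kind of hand-written, stateful test pass I mean. It is not the code from the tests referenced above: `Unit`, `FlipFlopPass`, and `NthRunPass` are hypothetical stand-ins, and the `bool` result is an assumed simplification of whatever a real pass reports back to the pass manager.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a unit of IR; not an LLVM type.
struct Unit {
  std::string Name;
};

// Hand-written test pass that reports "changed" on every other run.
struct FlipFlopPass {
  int &RunCount;        // Shared counter the test inspects afterwards.
  bool Changed = false; // Hidden state: the result depends on call history.

  explicit FlipFlopPass(int &Count) : RunCount(Count) {}

  bool run(Unit &) {
    ++RunCount;
    Changed = !Changed;
    return Changed;
  }
};

// Variant that only reports "changed" on the Nth run (counted from 1).
struct NthRunPass {
  int &RunCount;
  int TriggerOnRun;

  NthRunPass(int &Count, int Trigger) : RunCount(Count), TriggerOnRun(Trigger) {
    assert(Trigger >= 1 && "runs are counted from 1");
  }

  bool run(Unit &) { return ++RunCount == TriggerOnRun; }
};

// Usage: the test ends up asserting on bare numbers.
inline void statefulPassExample() {
  Unit U{"f"};
  int Runs = 0;
  NthRunPass P(Runs, 3);
  assert(!P.run(U));
  assert(!P.run(U));
  assert(P.run(U));  // The third run is the one that triggers.
  assert(Runs == 3); // Nothing says which units were visited, or why.
}
```

Every new behavior needs another bespoke struct like these, and the only evidence the test gets back is a counter.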
But these are exactly the kinds of scenarios that I needed to write tests for in order to get the code to be correct. I have consistently found and been able to fix bugs throughout the pass manager by writing careful unittests.
Mocks with GoogleMock are, IMO, a tool to create interesting and debuggable test objects. These objects can then be fed into an API to exercise it in ways that are hard or impossible to control from a command line in sufficient granularity and precision. While doing this is never fun and should be avoided where possible, when we need to do this I think it provides a powerful tool for the job.
Here is how it works at the highest level:
Create a class with a MOCK_METHOD*(…) API. This API is then hookable by gmock.
Register default behaviors for the mocked methods (gmock provides ON_CALL for this).
Set up the minimal amount of expected API interactions for a given test. I.e., for this test to pass, X has to happen and in response to that my code needs to do Y.
Feed this class, or a wrapper around it if you need a copyable object, into the system you are testing and run it.
If the expected interactions don’t occur, you get a trace of which ones failed and why. These traces are somewhat verbose and hard to read, but they actually have the information needed to debug the system which saves you from building infrastructure to extract that over and over again.
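The steps above can be sketched without gmock to show the moving parts. This is a hand-rolled analogue under stated assumptions, not gmock’s API: `MockPass`, `onRunByDefault`, `expectRun`, `unmetExpectations`, and `runOverUnits` are all hypothetical names, and gmock generates the equivalent plumbing from the MOCK_METHOD* declarations rather than asking you to write it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Step 1: a class exposing a hookable method, run(UnitName).
class MockPass {
public:
  using Action = std::function<bool(const std::string &)>;

  // Step 2: behavior for calls that no pending expectation claims.
  void onRunByDefault(Action A) { Default = std::move(A); }

  // Step 3: require a run on UnitName; the first such call responds with A.
  void expectRun(std::string UnitName, Action A) {
    Expected.push_back({std::move(UnitName), std::move(A), false});
  }

  // Called by the system under test.
  bool run(const std::string &UnitName) {
    Trace.push_back("run(" + UnitName + ")");
    for (Expectation &E : Expected) {
      if (!E.Satisfied && E.UnitName == UnitName) {
        E.Satisfied = true;
        return E.Respond(UnitName);
      }
    }
    return Default ? Default(UnitName) : false;
  }

  // Step 5: describe every expectation that never fired.
  std::vector<std::string> unmetExpectations() const {
    std::vector<std::string> Unmet;
    for (const Expectation &E : Expected)
      if (!E.Satisfied)
        Unmet.push_back("expected run(" + E.UnitName + ") was never called");
    return Unmet;
  }

  // Every call that actually happened, in order.
  const std::vector<std::string> &trace() const { return Trace; }

private:
  struct Expectation {
    std::string UnitName;
    Action Respond;
    bool Satisfied;
  };
  Action Default;
  std::vector<Expectation> Expected;
  std::vector<std::string> Trace;
};

// Step 4: a hypothetical system under test. It runs the pass over each unit
// and returns how many units the pass reported as changed.
inline int runOverUnits(MockPass &P, const std::vector<std::string> &Units) {
  int Changed = 0;
  for (const std::string &U : Units)
    if (P.run(U))
      ++Changed;
  return Changed;
}

// Usage: plain assert stands in for the test framework here.
inline void mockWorkflowExample() {
  MockPass P;
  P.onRunByDefault([](const std::string &) { return false; });
  P.expectRun("g", [](const std::string &) { return true; });
  P.expectRun("h", [](const std::string &) { return true; });

  int Changed = runOverUnits(P, {"f", "g"});
  assert(Changed == 1);
  // "h" was never visited, so exactly one expectation is reported as unmet.
  assert(P.unmetExpectations().size() == 1);
}
```

Even this toy version needs matching, bookkeeping, and reporting code. That is the infrastructure I would rather not rebuild for every test.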
But a concrete example will likely work better. I’ve used gmock to build the unit tests for a major revision of the LoopPassManager in the new pass manager. This is a substantial redesign that now handles inserting new loops, deleting loops, and invalidating analyses. The tests for it are, IMO, dramatically more readable than the test linked above. And they are substantially more thorough and precise:
I hope this is compelling for folks. Just writing and debugging this one test was extremely compelling for me. I ended up with much better coverage and precision than I would have using any other technique without a tremendous amount of plumbing essentially re-inventing a framework to build test pass objects that work exactly the way these mock pass handles do.
That said, all is not perfect. For instance, gmock suffers from being designed in a C++98 world. It has relatively poor support for move and value semantics, which resulted in my using a wrapper around the mock interfaces in the above patch to let a pimpl idiom provide the value semantics I wanted. However, that idiom works well, and this didn’t substantially impede my use of the infrastructure.
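For readers unfamiliar with that idiom, here is a minimal sketch of one way such a wrapper could look. It is not the code from the patch: `MockHandle`, `PassWrapper`, and `runByValue` are hypothetical names, and the hand-written `run` stands in for what would be a mocked method.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for a mock object. It owns the call records, so it is
// deliberately not copyable (and therefore not movable either).
class MockHandle {
public:
  MockHandle() = default;
  MockHandle(const MockHandle &) = delete;
  MockHandle &operator=(const MockHandle &) = delete;

  // In a real mock this would be the hooked method.
  bool run(int Unit) {
    ++Calls;
    return Unit % 2 == 0;
  }

  int Calls = 0;
};

// Cheap value type handed to the system under test. Every copy forwards to
// the same handle, so the handle observes all calls.
class PassWrapper {
public:
  explicit PassWrapper(MockHandle &H) : Handle(&H) {}
  bool run(int Unit) { return Handle->run(Unit); }

private:
  MockHandle *Handle;
};

static_assert(!std::is_copy_constructible<MockHandle>::value,
              "the handle itself cannot be passed by value");
static_assert(std::is_copy_constructible<PassWrapper>::value,
              "the wrapper provides the value semantics");

// Hypothetical API under test that insists on taking its pass by value.
template <typename PassT> bool runByValue(PassT Pass, int Unit) {
  return Pass.run(Unit);
}

// Usage: plain assert stands in for the test framework here.
inline void wrapperExample() {
  MockHandle H;
  PassWrapper W(H);
  PassWrapper Copy = W;
  assert(runByValue(W, 2));
  assert(!runByValue(Copy, 3));
  assert(H.Calls == 2); // Both copies reached the single handle.
}
```

The test keeps the handle on its own stack and sets expectations there, while the system under test is free to copy the wrapper as often as it likes.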
Also, I remain very sympathetic to the idea that this kind of testing apparatus should be relatively rarely needed. We shouldn’t be writing new complex unit tests for APIs every week. But even a few use cases such as to test ADTs and generic tools like the pass manager seem to justify the cost to me, and I’m happy to help draw up fairly restrictive guidance around mocks for the coding standards.
Thanks, and sorry for the long email, but I wanted to try and lay out the issues in a way folks could understand, and the examples, while hopefully useful, are quite large and complex.
Please don’t hesitate to ask questions if stuff isn’t clear.