RFC: Reconsidering adding gmock to LLVM's unittest utilities

A long time ago I suggested that we might want to add gmock to complement the facilities provided by gtest in LLVM’s unittests. It didn’t go over well:

  1. There was concern over the benefit vs. the cost

  2. Also concern about what the facilities would look like in practice and whether they would actually help

  3. At the time, I didn’t have good, large examples of what these things might look like or why they might be attractive

  4. I didn’t provide any real explanation of what gmock did and so it was vague and unclear.

Since then, a lot has changed. Unit testing is used more heavily in the project, and more developers are finding benefit from it. And I think I have compelling examples.

Matchers

To start off, it is important to understand that there are two components to what gmock offers. The first has very little to do with “mocks”. It is actually a matcher language and system for writing test predicates:

EXPECT_EQ(expected, actual);
EXPECT_NE(something, something);

Become instead:

EXPECT_THAT(actual, Eq(expected));
EXPECT_THAT(actual, Ne(not-expected));

This pattern moves the matcher out of the macro, giving it a proper C++ API. With that, we get two huge benefits: extensibility and composability. You can easily write a matcher that summarizes concisely the expectation for custom data types. And you can compose these matchers in powerful ways. I’ll give one example here:

EXPECT_THAT(MyDenseMap, UnorderedElementsAre(Pair(key1, value1), Pair(key2, value2), Pair(key3, value3)));

Here I’m composing equality matchers inside a matcher that can handle unordered container element-wise comparison for generic, arbitrary containers. With a small patch, I’ve even extended it to support arbitrary iterator ranges! Combine this with custom matchers for the elements, and it becomes a very expressive and declarative way to write expectations in tests.

I wanted to give a realistic and compelling example so I rewrote an entire test: https://reviews.llvm.org/D28290 Note that I moved every EXPECT to the new syntax so this is essentially worst-case. It also involves a non-trivial custom matcher. Despite this, the code is shorter, easier to read and easier to maintain. It has fewer unnecessary orderings enforced. And it is much easier to extend. Also, the error messages when it fails are substantially improved because these composed matchers have logic to carefully explain why they failed to match.

I hope folks find this compelling. I think this alone is worth carrying the gmock code in tree – it is just used by tests and not substantially larger than gtest. Even if we decide we want nothing to do with mocks, I would very much like to have the matchers.

Mocks

So, now let’s consider mocks. First off, what are mocks? I’ll give a fairly casual definition here: they are test objects which implement some API and allow the test to explicitly set expectations on how that API is used and how it in turn should behave. For a more detailed vocabulary see [1] and for a more lengthy discussion see [2].

As came up in the original discussion, LLVM relatively infrequently has a need to test API interactions in this way. Usually we’re in the business of translating things from format A to B (instructions, metadata, whatever) and can write down one format and write checks against the other format for tests. This is a wonderful world to live in with tests. I never want LLVM to decrease how much we leverage this.

But we do have API interactions that we need to test. We have plugin APIs, and hookable interfaces, ranging from Clang frontend actions to JIT listeners. We also have generic code in ADT that is all about API interactions. Most generic code in fact is – we want it to work for any T that behaves in a certain way, so we need to give it interesting Ts to test it.

My immediate example is the pass manager. We plug in a bunch of passes to it, and expect it to run them in a precise way over specific bits of IR. When testing this, it is extremely cumbersome to write a test pass which does this in interesting and yet controllable and comprehensible ways. Let’s look at a concrete example:

https://github.com/llvm-project/llvm-project/blob/master/llvm/unittests/IR/PassManagerTest.cpp#L481-L509

Here we have over 20 lines of code spent testing that the correct set of things happened the correct number of times. I had to write a long comment just to explain what these numbers mean. And I still never understand whether a change in the numbers really means a good or bad thing.

Now, we have detailed logging-based tests that use FileCheck, which is the primary way to avoid this in LLVM. But it isn’t enough. In these tests we want to carefully permute the behavior of very specific runs of individual passes. A simple example of this can be seen here, where we have somewhat magical state in a pass to flip-flop its behavior:
https://github.com/llvm-project/llvm-project/blob/master/llvm/unittests/IR/PassManagerTest.cpp#L138-L139

And it gets more complicated if you want statefulness like triggering on the 3rd run of the pass.

But these are exactly the kinds of scenarios that I needed to write tests for in order to get the code to be correct. I have consistently found and been able to fix bugs throughout the pass manager by writing careful unittests.

Mocks with GoogleMock are, IMO, a tool to create interesting and debuggable test objects. These objects can then be fed into an API to exercise it in ways that are hard or impossible to control from a command line in sufficient granularity and precision. While doing this is never fun and should be avoided where possible, when we need to do this I think it provides a powerful tool for the job.

Here is how it works at the highest level:

  1. Create a class with a MOCK_METHOD*(…) API. This API is then hookable by gmock.

  2. Use some APIs to register default behaviors for the APIs.

  3. Set up the minimal amount of expected API interactions for a given test. I.e., for this test to pass, X has to happen and in response to that my code needs to do Y.

  4. Feed this class, or a wrapper around it if you need a copyable object, into the system you are testing and run it.

If the expected interactions don’t occur, you get a trace of which ones failed and why. These traces are somewhat verbose and hard to read, but they actually have the information needed to debug the system which saves you from building infrastructure to extract that over and over again.

But a concrete example will likely work better. I’ve used gmock to build the unit tests for a major revision of the LoopPassManager in the new pass manager. This is a substantial redesign that now handles inserting new loops, deleting loops, and invalidating analyses. The tests for it are, IMO, dramatically more readable than the test linked above. And they are substantially more thorough and precise:

https://reviews.llvm.org/D28292

I hope this is compelling for folks. Just writing and debugging this one test was extremely compelling for me. I ended up with much better coverage and precision than I would have using any other technique without a tremendous amount of plumbing essentially re-inventing a framework to build test pass objects that work exactly the way these mock pass handles do.

That said, all is not perfect. For instance, gmock suffers from being designed in a C++98 world. It has relatively poor support for move and value semantics, which resulted in my using a wrapper around the mock interfaces in the above patch to let a pimpl idiom provide the value semantics I wanted. However, that idiom works well, and this didn’t substantially impede my use of the infrastructure.

Also, I remain very sympathetic to the idea that this kind of testing apparatus should be relatively rarely needed. We shouldn’t be writing new complex unit tests for APIs every week. But even a few use cases such as to test ADTs and generic tools like the pass manager seem to justify the cost to me, and I’m happy to help draw up fairly restrictive guidance around mocks for the coding standards.

Thanks, and sorry for the long email, but I wanted to try and lay out the issues in a way folks could understand, and the examples, while hopefully useful, are quite large and complex.

Please don’t hesitate to ask questions if stuff isn’t clear.
-Chandler

TL;DR - I want this.

Matchers

To start off, it is important to understand that there are two components to what gmock offers. The first has very little to do with “mocks”. It is actually a matcher language and system for writing test predicates:

EXPECT_EQ(expected, actual);
EXPECT_NE(something, something);

Become instead:

EXPECT_THAT(actual, Eq(expected));
EXPECT_THAT(actual, Ne(not-expected));

This pattern moves the matcher out of the macro, giving it a proper C++ API. With that, we get two huge benefits: extensibility and composability. You can easily write a matcher that summarizes concisely the expectation for custom data types. And you can compose these matchers in powerful ways. I’ll give one example here:

EXPECT_THAT(MyDenseMap, UnorderedElementsAre(Pair(key1, value1), Pair(key2, value2), Pair(key3, value3)));

Here I’m composing equality matchers inside a matcher that can handle unordered container element-wise comparison for generic, arbitrary containers. With a small patch, I’ve even extended it to support arbitrary iterator ranges! Combine this with custom matchers for the elements, and it becomes a very expressive and declarative way to write expectations in tests.

I wanted to give a realistic and compelling example so I rewrote an entire test: https://reviews.llvm.org/D28290 Note that I moved every EXPECT to the new syntax so this is essentially worst-case. It also involves a non-trivial custom matcher. Despite this, the code is shorter, easier to read and easier to maintain. It has fewer unnecessary orderings enforced. And it is much easier to extend. Also, the error messages when it fails are substantially improved because these composed matchers have logic to carefully explain why they failed to match.

I hope folks find this compelling. I think this alone is worth carrying the gmock code in tree – it is just used by tests and not substantially larger than gtest. Even if we decide we want nothing to do with mocks, I would very much like to have the matchers.

+1, these look amazing. Often times I find myself writing many EXPECT statements to test a single logical condition. When you want to do this for many different inputs / outputs of an API it turns into a long list of expect statements that the person reading the test can’t easily grok and see how they’re related. Here’s an example from the formatv tests that I wrote:

Replacements = formatv_object_base::parseFormatString("{0,-3}");
ASSERT_EQ(1u, Replacements.size());
EXPECT_EQ(ReplacementType::Format, Replacements[0].Type);
EXPECT_EQ(0u, Replacements[0].Index);
EXPECT_EQ(3u, Replacements[0].Align);
EXPECT_EQ(AlignStyle::Left, Replacements[0].Where);
EXPECT_EQ("", Replacements[0].Options);

It would be nice if I could write:

EXPECT_THAT(Replacements, ReplacementsAre(Rep(Format, 0, 3, Left, "")));

This isn’t a huge win here, but if you have a longer format string where there are multiple replacements, you end up with 5 lines per replacement, which starts to become very unwieldy and hard to follow. Now multiply that by the number of different edge cases you want to test, and you end up losing test coverage because you have to balance maintainability of the test’s code with test coverage, and adding 100 lines to test one API hurts readability more than it helps test coverage.

Another thing. Often times I find myself writing a function to test a complex condition, like this:

EXPECT_TRUE(ComplexTest(Value));

But then you lose the error message ability to see why the complex test failed. You say this is handled by the matcher infrastructure although I don’t see an example, but I’ll take your word for it. If so, these matchers seem like an across the board win and I hope to be able to use them in-tree soon.

Mocks

So, now let’s consider mocks. First off, what are mocks? I’ll give a fairly casual definition here: they are test objects which implement some API and allow the test to explicitly set expectations on how that API is used and how it in turn should behave. For a more detailed vocabulary see [1] and for a more lengthy discussion see [2].

As came up in the original discussion, LLVM relatively infrequently has a need to test API interactions in this way. Usually we’re in the business of translating things from format A to B (instructions, metadata, whatever) and can write down one format and write checks against the other format for tests. This is a wonderful world to live in with tests. I never want LLVM to decrease how much we leverage this.

You’re forgetting about that troublesome LLVM subproject that nobody wants to think about which does things completely differently: LLDB. ;) LLDB very frequently has a need to test API interactions in this way, and is very infrequently in the business of translating things from format A to format B.

Also, I remain very sympathetic to the idea that this kind of testing apparatus should be relatively rarely needed. We shouldn’t be writing new complex unit tests for APIs every week. But even a few use cases such as to test ADTs and generic tools like the pass manager seem to justify the cost to me, and I’m happy to help draw up fairly restrictive guidance around mocks for the coding standards.

In LLDB, I think this will end up being the most useful kind of unit test. There is so little test coverage right now precisely because certain things in an interactive application are hard/impossible to test with a garbage-in garbage-out model.

Consider me on board.

TL;DR - I want this.

For the most part, +1 from me too. A few comments though.

Matchers

To start off, it is important to understand that there are two components to what gmock offers. The first has very little to do with “mocks”. It is actually a matcher language and system for writing test predicates:

EXPECT_EQ(expected, actual);
EXPECT_NE(something, something);

Become instead:

EXPECT_THAT(actual, Eq(expected));
EXPECT_THAT(actual, Ne(not-expected));

For the cases where you have containers and other non-trivial objects, I completely agree that this is compelling. However, for simple cases like string equality I don’t like the change from EXPECT_EQ(a, b) to EXPECT_THAT(a, Eq(b)).

Which brings me to what I guess is my main question. Are we going to be able to keep using EXPECT_EQ (and others) via gtest? Or are we going to slowly migrate from gtest to gmock?

I don’t think you are suggesting phasing out gtest, but at the same time I’m not really sure why we should have both. It may be easier to move completely to gmock if it’s more powerful, even if the checks are sometimes more verbose for simple cases.

Anyway for at least the subset of cases which need the more powerful forms of testing, this seems like a reasonable thing to add.

Cheers,
Pete

EXPECT_THAT(actual, Eq(expected));
EXPECT_THAT(actual, Ne(not-expected));

For the cases where you have containers and other non-trivial objects, I completely agree that this is compelling. However, for simple cases like string equality I don’t like the change from EXPECT_EQ(a, b) to EXPECT_THAT(a, Eq(b)).

I’d like to understand – why do you not like it?

On one hand, I dislike it because it is longer to type and read.
On the other hand, I like it because it is more consistent and explicit what is being tested and what the expectation is.

Which brings me to what I guess is my main question. Are we going to be able to keep using EXPECT_EQ (and others) via gtest? Or are we going to slowly migrate from gtest to gmock?

Note that gmock is built on top of gtest, relies on it, and can’t work without it. And all of the TEST(…) stuff is strictly gtest.

It is only the EXPECT_* and ASSERT_* bits that gmock provides an alternative for.

I don’t think you are suggesting phasing out gtest, but at the same time I’m not really sure why we should have both. It may be easier to move completely to gmock if it’s more powerful, even if the checks are sometimes more verbose for simple cases.

This is essentially where I am at.

If we didn’t need the more expressive expectations in some cases, I would totally stick with EXPECT_EQ and friends for consistency and simplicity. But if the use cases for richer and more powerful matchers in test EXPECT lines are compelling, I have a mild preference for not having two ways of writing EXPECT_* and ASSERT_* lines, even though the simple cases are a bit more typing. The consistency and predictability for me win out, but not by much. =] I don’t think it’s a huge deal either way.

I’d also be totally willing to have LLVM-specific aliases of:

EXPECT(…) → EXPECT_THAT(…)
ASSERT(…) → ASSERT_THAT(…)

Because these are only inside of tests, and LLVM’s projects are reasonably self contained, if saving the 5 characters helps folks, I’m OK with it.

Anyways, this is something of a detail to sort out if we want the matcher functionality.

-Chandler

Alongside this, can we come up with a place in the codebase to put shared matchers? I almost tried to solve this previously when I wanted to unit test some functions returning llvm::Error and/or llvm::Expected objects. If a function returns an error, you have to consume it before you continue or the unit test asserts, so this is an example of where you might want to have a shared matcher that anyone can use. It seems like there could be other examples of wanting shared matchers as well, so perhaps it’s worth adding a folder somewhere under llvm/unittests where we can provide some support matchers for commonly used things.

Chandler Carruth via llvm-dev <llvm-dev@lists.llvm.org> writes:

EXPECT_THAT(actual, Eq(expected));
EXPECT_THAT(actual, Ne(not-expected));

For the cases where you have containers and other non-trivial objects, I completely agree that this is compelling. However, for simple cases like string equality I don’t like the change from EXPECT_EQ(a, b) to EXPECT_THAT(a, Eq(b)).

I’d like to understand – why do you not like it?

On one hand, I dislike it because it is longer to type and read.

That’s all it is TBH. Seems like we should just do “#define EXPECT_EQ(a, b) EXPECT_THAT(a, Eq(b))” or something similar. Or perhaps it just doesn’t matter that much.

On the other hand, I like it because it is more consistent and explicit what is being tested and what the expectation is.

Which brings me to what I guess is my main question. Are we going to be able to keep using EXPECT_EQ (and others) via gtest? Or are we going to slowly migrate from gtest to gmock?

Note that gmock is built on top of gtest, relies on it, and can’t work without it. And all of the TEST(…) stuff is strictly gtest.

It is only the EXPECT_* and ASSERT_* bits that gmock provides an alternative for.

Ah ok, thanks for clarifying.

I don’t think you are suggesting phasing out gtest, but at the same time I’m not really sure why we should have both. It may be easier to move completely to gmock if it’s more powerful, even if the checks are sometimes more verbose for simple cases.

This is essentially where I am at.

If we didn’t need the more expressive expectations in some cases, I would totally stick with EXPECT_EQ and friends for consistency and simplicity. But if the use cases for richer and more powerful matchers in test EXPECT lines are compelling, I have a mild preference for not having two ways of writing EXPECT_* and ASSERT_* lines, even though the simple cases are a bit more typing. The consistency and predictability for me win out, but not by much. =] I don’t think it’s a huge deal either way.

I’d also be totally willing to have LLVM-specific aliases of:

EXPECT(…) → EXPECT_THAT(…)
ASSERT(…) → ASSERT_THAT(…)

Because these are only inside of tests, and LLVM’s projects are reasonably self contained, if saving the 5 characters helps folks, I’m OK with it.

Yeah, that’s totally fine with me.

Anyways, this is something of a detail to sort out if we want the matcher functionality.

Sounds good. Cheers!

Pete

No strong opinion, but certainly not opposed. If it makes testing pass manager changes easier, SGTM.

Philip

  • Providing some universal helpers for various situations that you want to EXPECT() on sounds great.

  • I can see how the “Mocks” stuff can help in the pass manager case. There is some cost to learning yet another library just to test a feature, so we should keep pushing for simple/well known solutions (mostly thinking of “helper-command | FileCheck”) by default and at least require people to write long justifications like this when they want to use gmock :)

- Matthias

So far I’ve not heard any objections to the core of:

  1. add the utility code
  2. use it in the clear places where it makes a substantial improvement, both matchers and mocks

I’d really like to hear if there are serious concerns here, but so far this looks like pretty strong consensus. If possible I’d like to make progress on landing the actual code Friday, so if you haven’t given a shout yet, please do. Of course, if new concerns come up, we can always revisit this. It’s just internal testing utilities, so it seems especially low-risk.

We still need to sort out several details of course:

a) I will put together some good LLVM-focused primitives (mostly around matchers) in a common location. At the least this will give us a good pattern to follow as new bits of common stuff come up. I’ll get the initial skeleton of this quickly and then everyone should be able to chip in with the bits that they need. I’ll send this out as a relatively small follow-up patch that we can discuss in code review to get the location / pattern right.

b) We will definitely want some guidelines around how and when to use this stuff. I’ll try and distill something more brief than my email and incorporating some of the comments on this thread, and put it up for review as an addition to the coding standards. This will take me a bit more time but I’m happy to make sure this happens. This code review can then serve as a place to discuss the somewhat mechanical bits that are still important such as should we write EXPECT_EQ(b, a), EXPECT_THAT(a, Eq(b)), or (with some custom magic) EXPECT(a, Eq(b)).

c) It might be helpful to have an LLVM-focused explanatory guide to how gtest+gmock work and how to use them effectively. I’m not the best at writing this documentation, so if anyone else wants to take a stab at it, honestly I’d appreciate that. Happy to review of course. But if no one else feels like they can help with this, I can try to pull this together as well. It will definitely take a bit though.

I don’t think any of these really need to be blocking as it seems like the example usages I posted weren’t terribly controversial, and it’ll be easy to update based on any changes in suggested practice from (a) or (b).

Does that sound right? Anything I’m missing? Any concerns with this path forward?

Also, thanks everyone! I know my writeup was a bit long, and I appreciate you taking the time.
-Chandler

1) add the utility code
2) use it in the *clear* places where it makes a substantial improvement,
both matchers and mocks

Sounds like a plan.

I'd really like to hear if there are serious concerns here, but so far this
looks like pretty strong consensus. If possible I'd like to make progress on
landing the actual code Friday, so if you haven't given a shout yet, please
do. Of course, if new concerns come up, we can always revisit this. It's
just internal testing utilities, so it seems especially low-risk.

This is in line with existing infrastructure in the unit-tests,
test-suite and libc++ benchmarking, so I don't think there should be
any contentious issues.

Maintenance would be a problem whether we use a third-party suite or
develop our own (more likely higher in the latter case).

b) We will definitely want some guidelines around *how* and *when* to use
this stuff.

YES! We didn't even know libc++ had a benchmark utility at the libc++ BoF.

Just a mention on how to add tests would go a long way towards more
tests. Pointing to existing docs would be more than ok.

We can start slow in that front, too, but it would be great if we
could get some traction there in the long run.

cheers,
--renato

b) We will definitely want some guidelines around *how* and *when* to
use this stuff. I'll try and distill something more brief than my email
and incorporating some of the comments on this thread, and put it up
for review as an addition to the coding standards. This will take me a
bit more time but I'm happy to make sure this happens. This code review
can then serve as a place to discuss the somewhat mechanical bits that
are still important such as should we write `EXPECT_EQ(b, a)`,
`EXPECT_THAT(a, Eq(b))`, or (with some custom magic) `EXPECT(a, Eq(b))`.

Let me suggest that guidelines specific to test code ought to be
separated from the main coding standard, and probably on their own
webpage. This would help focus the new stuff and avoid cluttering
the existing stuff.
Thanks,
--paulr