[RFC] Updating googletest to non-release tagged version

Hi all,

I’m currently writing some unit tests for some debug line error handling code I’m working on (see e.g. https://reviews.llvm.org/D44382), and I just ran into an annoying disabled feature in gtest, specifically the “Combine” feature for use in combinatorially generating parameterised tests. A FIXME comment in ProfileData\CoverageMappingTest.cpp suggests that I’m not the only one to have tried and discovered that they cannot use this feature. The problem is that the version of googletest (v 1.8.0, released in Aug 2016) in the LLVM tree requires TR1 tuple support for this feature, which is not really supported in recent compilers, and has been explicitly disabled in our googletest CMakeLists.txt, thus disabling “Combine”.

I did a bit of looking around, and v 1.8.0 is indeed the last officially tagged release of googletest. However, there has been a lot of development on the framework since that point, including a fix to enable use of Combine with std::tuple-supporting compilers. There have been a number of issue raised on the googletest issue tracker (see e.g. https://github.com/google/googletest/issues/1467 or https://github.com/google/googletest/issues/1079) asking about a 1.9.0 release, and there has been zero response from anybody answering the query of when/if it will happen. In the meantime, the last release gets older and crustier…

I’d therefore like to propose something that might be seen as slightly controversial: update to use ToT googletest (or at least some reasonably recent version of master), at least until a new release is created.

Thoughts?

James

+Chandler who might have some thoughts on this.

Could you provide an example here of the motivation for the feature you’re missing? Might help motivate the discussion (and/or we’ll end up nitpicking how it could be done differently without that feature… - which is sort of where I’m going with this. Combinatorial test case expansion does seem a bit suspicious to me - I’d hope we could pick a few examples from the various equivalence classes & that would suffice?)

+Chandler who might have some thoughts on this.

Could you provide an example here of the motivation for the feature you're
missing? Might help motivate the discussion (and/or we'll end up nitpicking
how it could be done differently without that feature... - which is sort of
where I'm going with this. Combinatorial test case expansion does seem a bit
suspicious to me - I'd hope we could pick a few examples from the various
equivalence classes & that would suffice?)

Hi all,

I'm currently writing some unit tests for some debug line error handling
code I'm working on (see e.g. https://reviews.llvm.org/D44382), and I just
ran into an annoying disabled feature in gtest, specifically the "Combine"
feature for use in combinatorially generating parameterised tests. A FIXME
comment in ProfileData\CoverageMappingTest.cpp suggests that I'm not the
only one to have tried and discovered that they cannot use this feature. The
problem is that the version of googletest (v 1.8.0, released in Aug 2016) in
the LLVM tree requires TR1 tuple support for this feature, which is not
really supported in recent compilers, and has been explicitly disabled in
our googletest CMakeLists.txt, thus disabling "Combine".

I did a bit of looking around, and v 1.8.0 is indeed the last officially
tagged release of googletest. However, there has been a lot of development
on the framework since that point, including a fix to enable use of Combine
with std::tuple-supporting compilers. There have been a number of issue
raised on the googletest issue tracker (see e.g.
https://github.com/google/googletest/issues/1467 or
Next release? · Issue #1079 · google/googletest · GitHub) asking about a 1.9.0
release, and there has been zero response from anybody answering the query
of when/if it will happen. In the meantime, the last release gets older and
crustier...

I'd therefore like to propose something that might be seen as slightly
controversial: update to use ToT googletest (or at least some reasonably
recent version of master), at least until a new release is created.

Thoughts?

This sounds strange.

I've been using googletest-1.8.0 in my project, with fresh compilers
(clang, gcc - Compiler Explorer - tr1/tuple works)
and not once did i have any issues with unavailability of Combine,
even though i do use it.

Debian sid:
$ dpkg -S tr1/tuple
libstdc++-7-dev:amd64: /usr/include/c++/7/tr1/tuple
libstdc++-5-dev:amd64: /usr/include/c++/5/tr1/tuple
libstdc++-6-dev:amd64: /usr/include/c++/6/tr1/tuple

So "which is not really supported in recent compilers" part is at
least too broad.

Roman.

Thanks. The motivating example can be seen in this review: https://reviews.llvm.org/D44382.

In that review, I am unit testing .debug_line parsing, specifically, the behaviour when the parser is fed a malformed section. Most of the code under test goes through some slight variations in the code path, depending on a) the DWARF version (interesting cases are 3, 4 and 5), and b) whether the DWARF is 32-bit DWARF or 64-bit DWARF. So I have two axes, with multiple values on each. The end “symptoms” being tested aren’t much different in each case, so it makes sense to share the test code as much as possible. Without Combine, I have to do the following:

INSTANTIATE_TEST_CASE_P(
LineTableTestParams, DebugLineParameterisedFixture,
Values(std::make_pair(2, DWARF32), std::make_pair(3, DWARF32),
std::make_pair(4, DWARF32), std::make_pair(5, DWARF32),
std::make_pair(2, DWARF64), std::make_pair(3, DWARF64),
std::make_pair(4, DWARF64), std::make_pair(5, DWARF64)));

I realised whilst typing this out, that I don’t need a distinction between v2 and v3. However, there’s still 6 different combinations, which I have to explicitly list. Combine would, by my understanding, allow me to do something like “Combine(Values(3, 4, 5), Values(DWARF32, DWARF64))”.

As to why it doesn’t work in 1.8, it’s because LLVM has explicitly overridden the GTEST_HAS_TR1_TUPLE define (see utils\unittest\CMakeLists.txt and r316798). The short-story is that it’s because recent MSVC compilers (sorry, I guess I was too broad in my original statement) have started issuing TR1 deprecation warnings, combined with C++11 not being detected properly.

Thanks. The motivating example can be seen in this review: https://reviews.llvm.org/D44382.

In that review, I am unit testing .debug_line parsing, specifically, the behaviour when the parser is fed a malformed section. Most of the code under test goes through some slight variations in the code path, depending on a) the DWARF version (interesting cases are 3, 4 and 5), and b) whether the DWARF is 32-bit DWARF or 64-bit DWARF. So I have two axes, with multiple values on each. The end “symptoms” being tested aren’t much different in each case, so it makes sense to share the test code as much as possible. Without Combine, I have to do the following:

INSTANTIATE_TEST_CASE_P(
LineTableTestParams, DebugLineParameterisedFixture,
Values(std::make_pair(2, DWARF32), std::make_pair(3, DWARF32),
std::make_pair(4, DWARF32), std::make_pair(5, DWARF32),
std::make_pair(2, DWARF64), std::make_pair(3, DWARF64),
std::make_pair(4, DWARF64), std::make_pair(5, DWARF64)));

I realised whilst typing this out, that I don’t need a distinction between v2 and v3. However, there’s still 6 different combinations, which I have to explicitly list. Combine would, by my understanding, allow me to do something like “Combine(Values(3, 4, 5), Values(DWARF32, DWARF64))”.

Is it especially valuable to test all 6? Or would sufficient coverage/confidence be achieved by testing, say, 3+32, 4+64, and 5+64? (ie: there’s at least one test for each of the values, but not for each combination) - or is there interesting logic in the implementation that’s different for each combination/pair?

+Chandler who might have some thoughts on this.

FWIW, I have no concerns about updating to a modern googletest. More modern the better IMO if someone is willing to do the work to make sure it works on all our platforms, etc.

However:

Could you provide an example here of the motivation for the feature you’re missing?

I definitely think that the whole space of combinatorial testing should be carefully discussed. Seems like a high risk of long-running tests w/o providing much incremental value.

+Chandler who might have some thoughts on this.

FWIW, I have no concerns about updating to a modern googletest. More
modern the better IMO if someone is willing to do the work to make sure it
works on all our platforms, etc.

I'd be happy to do some of the work of this, and testing using MSVC14,
perhaps 15 too, on x86_64 Windows 10 and (fairly elderly) GCC, possibly
clang too, and Ubuntu 14.04, but I don't have easy access to other
platforms/toolchains etc, so I'd need some assistance for this.

However:

Could you provide an example here of the motivation for the feature
you're missing?

I definitely think that the whole space of combinatorial testing should be
carefully discussed. Seems like a high risk of long-running tests w/o
providing much incremental value.

These are fair comments, and probably should be judged on a case-by-case
basis.

In my particular case from https://reviews.llvm.org/D44560, I currently
test the following 3 different cases across the full set of DWARF versions
and formats:
  - Parsing a valid line table
  - Emitting an error if the stated prologue length is greater than the
actual length
  - Emitting an error if the stated prologue length is shorter than the
actual length

The first is just testing the positive cases for each possible input. I
guess a single test for DWARF64 could be written for versions 2-4 and
another for version 5 (where there is more stuff between the two length
fields so this becomes interesting), similarly for DWARF32. To summarise, I
think that these cases are interesting: V5 + 32, V5 + 64, V2 + 32/64, V3 +
32/64, V4 + 32/64. The biggest issue I have with cutting down to just this
set is that it makes the tests less specific, e.g. at a glance, what is
important about the test - the fact that it is v4, or DWARF64, or both
independently, or the combination?

Related aside: I've realised since earlier that there is scope for version
2 tests, distinct from version 3: v2 tests test the lower boundary on valid
versions, and v3 the upper boundary on versions without
maximum_operations_per_instruction.

The latter two test cases are important because a) the length field has a
different size for DWARF32/64 and therefore the prologue length needs to be
measured from a different point between the different formats, and b) the
contents of the prologue are different in each of version 3, 4, and 5, and
thus the amount read will be different. We could test each individual
version, independently of the format, but it is theoretically possible for
an error to sneak in whereby the two different failure points cancel each
other out. The benefit is admittedly small, but these tests are fast, so I
don't think it hurts to have them.

James

Still not sure I follow all of this - though perhaps the test design discussion might be better in its own thread. But the broad subject/topic/name of the stuff I’m interested in applying here is equivalence partitioning: https://en.wikipedia.org/wiki/Equivalence_partitioning - helps explain the general idea of reducing “test all combinations/permutations of cases” to “test representative cases from each class”.

  • Dave