[RFC] Updating googletest to non-release tagged version

In my particular case from https://reviews.llvm.org/D44560, I currently
test the following 3 different cases across the full set of DWARF
versions and formats:
- Parsing a valid line table
- Emitting an error if the stated prologue length is greater than the
   actual length
- Emitting an error if the stated prologue length is shorter than the
   actual length
The first is just testing the positive cases for each possible input. I
guess a single test for DWARF64 could be written for versions 2-4 and
another for version 5 (where there is more stuff between the two length
fields so this becomes interesting), similarly for DWARF32. To summarise,
I think that these cases are interesting: V5 + 32, V5 + 64, V2 + 32/64,
V3 + 32/64, V4 + 32/64. The biggest issue I have with cutting down to
just this set is that it makes the tests less specific, e.g. at a glance,
what is important about the test - the fact that it is v4, or DWARF64,
or both independently, or the combination?
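For concreteness, here is the parameter space in question, sketched as
plain C++ (the names are mine and purely illustrative - with an updated
googletest, testing::Combine over the same two axes would instantiate
the full product automatically):

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical helpers sketching the parameter space discussed above;
// these are not LLVM or gtest APIs.
enum class Format { DWARF32, DWARF64 };

static std::string caseName(int Version, Format F) {
  return "v" + std::to_string(Version) +
         (F == Format::DWARF64 ? "_dwarf64" : "_dwarf32");
}

// The full cross product - what gtest's testing::Combine over the two
// axes would generate: 4 versions x 2 formats = 8 cases.
std::vector<std::string> allCases() {
  std::vector<std::string> Cases;
  for (int V : {2, 3, 4, 5})
    for (Format F : {Format::DWARF32, Format::DWARF64})
      Cases.push_back(caseName(V, F));
  return Cases;
}

// The reduced set proposed above: both formats for v5 (where the header
// layout differs most), and one representative format for each of
// v2-v4 (the original says "32/64", i.e. either will do; DWARF32 is an
// arbitrary pick here).
std::vector<std::string> reducedCases() {
  return {caseName(5, Format::DWARF32), caseName(5, Format::DWARF64),
          caseName(2, Format::DWARF32), caseName(3, Format::DWARF32),
          caseName(4, Format::DWARF32)};
}
```

The partitioning argument is that the five-element subset exercises
every interesting class that the eight-element product does - at the
cost, as noted, of the tests being less self-describing.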

Related aside: I've realised since my earlier message that there is
scope for version 2 tests, distinct from version 3: v2 tests exercise
the lower boundary on valid versions, and v3 the upper boundary on
versions without maximum_operations_per_instruction.

The latter two test cases are important because a) the length field has
a different size for DWARF32/64 and therefore the prologue length needs
to be measured from a different point between the different formats, and
b) the contents of the prologue are different in each of version 3, 4,
and 5, and thus the amount read will be different. We could test each
individual version, independently of the format, but it is theoretically
possible for an error to sneak in whereby the two different failure
points cancel each other out. The benefit is admittedly small, but these
tests are fast, so I don't think it hurts to have them.
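To make point (a) concrete: the prologue length is measured from the
byte immediately after the header_length field, whose position depends
on both the format and the version. A minimal sketch (the function name
is mine, not LLVM API; field sizes are per the DWARF specification):

```cpp
#include <cstdint>

// Byte offset, from the start of a .debug_line contribution, of the
// first byte *after* the header_length field - the point the stated
// prologue length is measured from. Field sizes per the DWARF spec:
//   unit_length:   4 bytes (DWARF32) or 4 + 8 bytes (DWARF64 escape
//                  plus 8-byte length)
//   version:       2 bytes
//   v5 only:       address_size (1) + segment_selector_size (1)
//   header_length: 4 bytes (DWARF32) or 8 bytes (DWARF64)
uint64_t prologueLengthOrigin(unsigned Version, bool IsDWARF64) {
  uint64_t Offset = IsDWARF64 ? 12 : 4; // unit_length
  Offset += 2;                          // version
  if (Version >= 5)
    Offset += 2; // address_size + segment_selector_size
  Offset += IsDWARF64 ? 8 : 4;          // header_length field itself
  return Offset;
}
```

So, for example, the v4 measurement origin moves from offset 10
(DWARF32) to 22 (DWARF64), and v5 adds two more bytes on top of either.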

Still not sure I follow all of this - though perhaps the test design
discussion might be better in its own thread. But the broad subject/topic/
name of the stuff I'm interested in applying here is equivalence
partitioning (https://en.wikipedia.org/wiki/Equivalence_partitioning),
which helps explain the general idea of reducing "test all combinations/
permutations of cases" to "test representative cases from each class".

- Dave

Equivalence classes are a fine way to reduce duplication in manually
written tests. They tend to depend on a white-box understanding of the
system-under-test to define the equivalences reasonably. LLVM regression
tests tend to be written very much white-box, as it's the developer who
is writing the test. :) The problem there is that as the code evolves,
the original equivalence classes may change, but that doesn't necessarily
make any tests fail; it just ends up with reduced coverage. Without good
coverage analysis, we can easily end up with tests that no longer verify
everything we wanted them to.

(And in fact back at HP I had an interesting chat with a product owner
who was very much a fan of combinatoric testing; he said every single
time he added a new dimension, he found new bugs, and didn't think he
could have predicted which instances would have found the bugs.)

That said, in the case at hand I think looking at the testing at the
granularity of "parsing a line table" is too coarse. There are several
different bits to it and we can reasonably anticipate that they will
remain relatively independent bits:
- detecting the 32/64 format
- extracting the version-dependent sets of fields accurately
- detecting mismatches between the stated length and the internally
  consistent length (e.g., the header says length X, but the file
  table has N entries which runs it to length > X).
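The first bullet, at least, is simple enough to pin down in isolation:
per the DWARF specification, the format is signalled entirely by the
initial length field (the helper names here are mine):

```cpp
#include <cstdint>

// A 4-byte initial length of 0xffffffff means the unit is DWARF64 and
// the real length follows as an 8-byte value; 0xfffffff0-0xfffffffe
// are reserved; anything else is an ordinary DWARF32 length.
bool isDWARF64(uint32_t InitialLength) {
  return InitialLength == 0xffffffffU;
}

bool isReservedInitialLength(uint32_t InitialLength) {
  return InitialLength >= 0xfffffff0U && InitialLength < 0xffffffffU;
}
```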

To the extent that new DWARF versions have different fields, we should
test that the new fields are extracted and reported (the success cases).
If the new fields are 32/64-sensitive, those should be tested in both
formats to make sure that sensitivity is handled (more success cases).
The notion of running past the stated length (conversely, finishing the
parsing before reaching the stated length) intuitively isn't either
version-sensitive or format-sensitive, i.e. all those cases look like
they should be part of the same equivalence class, so you don't need
to replicate those across all versions and formats. With the caveat
that if white-box examination shows that some case takes a very
different path, it's worth replicating the test. In particular the
v5 directory/file tables are very different from prior versions, so
it's worth having separate bad-length tests for pre-v5 and v5.
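The mismatch check itself is indeed version- and format-agnostic once
the header has been parsed: it is just a comparison between the stated
length and the bytes actually consumed. Something like (a sketch, names
mine):

```cpp
#include <cstdint>

enum class PrologueCheck { OK, StatedTooLong, StatedTooShort };

// Compare the prologue length the header claims against the number of
// bytes the parser actually consumed reading the header fields.
PrologueCheck checkPrologueLength(uint64_t Stated, uint64_t Consumed) {
  if (Consumed < Stated)
    return PrologueCheck::StatedTooLong;  // header claims more than read
  if (Consumed > Stated)
    return PrologueCheck::StatedTooShort; // parsing ran past the claim
  return PrologueCheck::OK;
}
```

What *is* version- and format-sensitive is how those two numbers are
computed, which is exactly why the white-box caveat about the very
different v5 directory/file tables still applies.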

--paulr

Thanks for the comments all. I’ve thought about it and I agree that, in my use case, I can probably reduce the number of test cases so that the set is not combinatorial.

I guess the broader question is still there as to whether a) people would be opposed to updating to a non-official-release version of googletest, and b) whether it would be worthwhile doing so. As I noted previously, there is at least one other place in the tests that says it would like to use Combine, so it still seems like there would be some benefit in it, although I admit that I haven’t looked to make sure that it is a reasonable request there.

James

Doesn’t sound like there’s opposition - can’t say whether or not it’s worth it, but if you’re willing to give it a go/send a patch, it probably can’t hurt. Might be a bit of work to get all the corners right - I think there are a few local patches to the gtest copy in llvm, unfortunately, that you’d have to port over, etc.

Yes, I imagine there will be some interesting issues. I’m away for a couple of weeks around Easter, so it will likely be a few weeks before I or anybody in my team get a chance to look at this further, but we’ll certainly keep people updated on here with how it goes (or via reviews).

James

Hi all,

Just wanted to give a quick update on this: only a few weeks after I started this email thread, the gtest developers announced tentative plans for their next two releases. Full details can be seen on various issues on the gtest github page, but basically the current plan is to create their next release (tentatively called 1.8.1) in the coming months, with another release (tentatively 1.9) some time thereafter, which will migrate to C++11. It’s a little unclear how much of the current master will be in 1.8.1, but it looks like all of it (though I’m not 100% certain). Given that other things have kept me from looking at this, and given the now-planned release, I don’t think there’s any point in updating to a copy from master (except maybe as a local experiment).

James