[RFC] C++ conformance test suite

Summary

I suggest we extend our current test suite with tests that track changes to the C++ Standard, updating them as the Standard text evolves. To make maintenance as incremental as possible, the Git repository with the LaTeX sources of the Standard can be used to version the test suite. With a fixed source of truth for changes to the Standard, a number of opportunities arise. First, it becomes possible to leverage the experience we have with defect reports: breaking the test suite into smaller pieces (e.g. paragraphs) and generating coverage reports (like cxx_dr_status). Second, it gives us more flexibility when referencing things, if it’s deemed useful to use something beyond stable section names.

Even if we don’t have resources for this today, I think it’s useful to have this idea written down for future reference.

Status quo

I’d like to highlight some of the properties of the current test suite under clang/test/CXX that are of significance to this RFC.

First, paragraph numbers are widely used, but sometimes it’s not clear what wording they test. Consider basic/basic.lookup/basic.lookup.unqual/p3.cpp. The latest draft says “Unqualified name lookup from a program point performs an unqualified search in its immediate scope”, which doesn’t seem to match the test. Trying to find the matching paragraph manually won’t get you anywhere, because the entire unqualified lookup section was replaced by P1787. While it’s possible to do such digging for a single test (and git blame is of great help), we should be able to avoid needing to do it at all.

Second, there’s nothing that resembles a coverage report, or a way to generate one automatically (one of the larger-scale opportunities mentioned above). Compare that to cxx_dr_status, which gives visibility into the total scope of work and the implementation status of each DR (unknown, not available, partial, available). It’s also generated automatically, based on the CWG issues list and inline markup left in every test. If we try to put something like this together for the current conformance suite, we face at least two issues: 1) it’s hard to define the total scope when many tests were written a decade ago against C++11 and C++14, but there are, for example, tests for C++20 modules (basic/basic.namespace/basic.namespace.general/p2.cppm); 2) while we can generate a report based on directory names and the presence of tests for a particular paragraph, we’d have to assume those tests are complete, without a way to systematically check this assumption.

The cxx_dr_status page is what made it possible for the Clang C/C++ Language Working Group to suggest new contributors “pick something there” as a starting point. I was one of those newcomers, and here I am, stuck with P1787 :slight_smile:

Third, there are inconsistencies in the directory layout. Above I used a test of C++20 modules as an example. It’s placed under CXX/basic, because the stable name of this section begins with basic, whereas the rest of the tests for [basic.namespace] are placed under CXX/dcl.dcl, following the Table of Contents of recent revisions of the Standard. Another example is CXX/dcl, CXX/dcl.dcl, and CXX/dcl.decl existing simultaneously at the same directory depth, which doesn’t make sense however I look at it. The fact that regular Clang contributors (both authors and reviewers) inadvertently make things worse suggests that our processes around the conformance test suite should be improved.

Versioning

The granularity of Standard text updates could be anything from official publications (once every three years) and draft publications (several times per year) down to individual papers and Git commits to the official repo. I suggest versioning based on commits to the GitHub repository, as not all significant changes to the wording are tracked through papers, e.g. CWG and LWG defect report resolutions, or subclauses being rearranged (note: changes that rearrange the Standard may or may not be significant for testing, though).

While tracking the official repo seems the most obvious choice, for practical purposes I suggest we use the eel.is fork instead: it preserves commit hashes from the official repo, and its pre-rendered HTML is much more accessible than the LaTeX sources.
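
To illustrate what per-commit versioning buys us: commit subjects in the draft repository conventionally start with the affected stable name in brackets (as in the example commit later in this RFC), so the set of commits between two wording versions can be bucketed mechanically. The sketch below is hypothetical; the function name and the idea of feeding it `git log --format=%s old..new` output are my assumptions, not settled tooling:

```python
import re

# Hypothetical triage helper: group draft commit subjects by the stable
# section name they start with, e.g. "[stmt] Fix cross-references for
# condition" goes into the "stmt" bucket. Subjects without such a prefix
# land in "(unclassified)" for manual review.
SUBJECT_RE = re.compile(r"^\[([\w.]+)\]\s*(.*)$")

def bucket_by_section(subjects):
    """Group commit subjects by the stable name in their [brackets] prefix."""
    buckets = {}
    for subject in subjects:
        m = SUBJECT_RE.match(subject)
        section = m.group(1) if m else "(unclassified)"
        buckets.setdefault(section, []).append(subject)
    return buckets

# Example input, as produced by e.g. `git log --format=%s old..new`:
subjects = [
    "[stmt] Fix cross-references for condition",
    "[over.literal] Cross-reference deprecated grammar",
    "Update release notes",
]
print(bucket_by_section(subjects))
```

Each bucket then maps directly onto the tests referencing that stable name, which is part of what makes stable names attractive as the primary key for the suite.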

Scope

I believe that testing [lex] through [cpp], [support], and [meta] covers most of what should be tested within Clang. Notable omissions are:

  • <stacktrace>
  • std::addressof
  • std::launder

Note that libc++ may want to consider a similar approach for testing, but this is out of scope for this RFC.

Coverage reporting

I suggest applying our experience with DRs: we define the total scope using e.g. the Table of Contents from a particular version of the Standard, and use a combination of directory layout and inline markup inside tests to fill that total scope with the actual status of the tests. All of that should be done automatically, like make_cxx_dr_status does. Here is an example of how such a report could look, using colors from cxx_dr_status:

A test file example that corresponds to the table above (I picked a markup syntax that is independent of the directory layout, like the one currently used for DRs, but it’s not the focus here):

// [namespace.udecl]/1: yes
/// test case

// [namespace.udecl]/2: yes
/// test case

// [namespace.udecl]/15: na

// [namespace.udecl]/16: yes
/// test case

// [namespace.udecl]/ex1: no
/// example 1 from the Standard

// [namespace.udecl]/ex2: partial
/// example 2 from the Standard
/// unsupported lines are commented out with a FIXME

/// example 3 is not mentioned anywhere, but present in the wording, so it's considered untested

// [dcl.asm]/1: yes
/// test case
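
A coverage report generator for markup like the above could be quite small. The following sketch is only illustrative: it assumes the exact comment syntax shown in the example (which this RFC explicitly does not insist on) and the DR-style status values yes/no/partial/na:

```python
import re
from collections import Counter

# Parse the illustrative "// [stable.name]/<unit>: <status>" markup from
# the example test file above. Units are paragraphs (1, 2, ...) or
# examples (ex1, ex2, ...); statuses mirror the DR suite.
MARKUP_RE = re.compile(r"^// \[([\w.]+)\]/(\w+): (yes|no|partial|na)$")

def coverage(lines):
    """Tally per-section status counts from markup comments."""
    counts = {}
    for line in lines:
        m = MARKUP_RE.match(line)
        if m:
            section, _unit, status = m.groups()
            counts.setdefault(section, Counter())[status] += 1
    return counts

test_file = [
    "// [namespace.udecl]/1: yes",
    "// [namespace.udecl]/15: na",
    "// [namespace.udecl]/ex2: partial",
    "// [dcl.asm]/1: yes",
]
print(coverage(test_file))
```

Comparing such tallies against the total scope (e.g. the Table of Contents plus a per-section paragraph count) is what would turn them into a cxx_dr_status-style report: anything present in the wording but absent from the markup, like example 3 above, shows up as untested.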

Example commit

[clang] Update conformance test suite to 2023-04-02 draft

Commits applied:
[module.interface] Fix outdated example

Commits silently skipped over:
"[stmt] Fix cross-references for condition" - too big to handle for the sake of this example, even by downgrading existing tests to partial

Commits skipped over as editorial:
[over.literal] Cross-reference deprecated grammar

diff --git a/clang/test/CXX/Notes.rst b/clang/test/CXX/Notes.rst
-Wording version 6c039939693c2e5c22a75ed5a8468e717867a7fb
+Wording version dcac5eaf993a190a1bb1335217779bd9ef13a38e

Commits silently skipped over
=============================
+`[stmt] Fix cross-references for condition <https://github.com/Eelis/draft/commit/39c1510d443b647c46de3e84d49a21d442154795>`_

diff --git a/clang/test/CXX/module.interface.cpp b/clang/test/CXX/module.interface.cpp
// [module.interface]/ex6: yes
export module M;
+int g;
export namespace N {
  int x;                        // OK
-  static_assert(1 == 1);        // error: does not declare a name
+  using ::g;                    // error: ::g has module linkage
}

Note that the example above depicts how an update would look when put together; it does not insist on the exact syntax of the inline markup (// [module.interface]/ex6: yes) or the directory layout (CXX/module.interface.cpp).

Corner cases

Technical Specifications and omnibus papers (like P1787) pose a challenge even with the most granular versioning. I suggest skipping over those and leaving notes, ideally marking the affected tests as incomplete, but even that could be challenging when there are dozens of pages of wording changes. Staying in sync with eel.is and addressing smaller changes seems better to me than being stuck on a big change.

Directory layout

I don’t suggest any particular directory and file layout, as I deem it secondary at this point. The available options depend on other decisions, like the granularity of updates and the inline markup syntax.

Transition

I’d like to lay out a possible scenario for the transition from status quo:

  1. we decide on versioning, coverage report format, inline markup syntax, and directory layout;
  2. we prepare tooling for generating a coverage report, like make_cxx_dr_status, using the scope proposed in the corresponding section;
  3. we decide to take the tip of the eel.is fork as the “reference wording”;
  4. starting with the coverage report all red, we assess existing tests against the reference wording, adding inline markup (and possibly moving them to other files, depending on the decided directory layout);
  5. we keep the new test suite in sync with the eel.is fork while assessing existing tests.

Resources required

Given a full and up-to-date test suite, keeping it that way could easily be a full-time job, I think. I understand that community resources are scarce, and I’m not advocating that executing this plan is a better use of those resources than keeping the status quo and dedicating them to e.g. improved C++20 support.


Thank you for putting together this RFC! Conformance testing of the standards is very important to the health of the project and improving our test coverage here would be fantastic!

CCing some other folks to try to solicit more opinions on the proposed approach: @zygoloid @dblaikie @reinterpretcast @luken-google @cor3ntin

Overall, I like that we can get some automated reporting that helps us tell at a glance where we’re at in terms of test coverage for the standard and this helps us to identify when the standards wording has changed and thus our test coverage may need to be reconsidered. Our current system really doesn’t handle this well.

However, there are some things that aren’t quite clear to me.

How do the test cases/coverage reports distinguish between standards versions? e.g., it may be p1 in C++11, p3 in C++14, moved into an entirely different section as p11 in C++17 and C++20, and then be removed entirely in C++23; how do those different states get tracked and reported? Do we use special comments in the source file, some directory hierarchy structure, something else?

A worry I have is that the C++ standard is modified at a rate that far outpaces our ability to react to it. Between the initial push to add tests and the high rate of change, I worry that our coverage will be around 5-10% “forever” because of struggles to keep up. I’m not certain how to address this concern though, as it’s a question of resources. I suppose one way to address this concern is to see if there are any volunteers within the community who would be willing to help out on this effort, either as patch authors writing new tests or as reviewers willing to review this work. (If any of our corporate contributors are willing to dedicate some of their QA staff to helping with this effort, this would be incredibly helpful – those QA staff may also have good ideas on how to improve this RFC.)

A question I have is what the review workflow should look like when someone is adding support for a new feature. e.g., if someone goes to implement a new feature, do we want reviewers to push on adding tests to this conformance suite vs adding tests scattered throughout the clang/test/ subdirectories as we currently do? How does this test suite relate to things like claiming conformance for a new feature or a DR (if at all)?

Another question is whether we’ll be taking our existing conformance tests (clang/test/CXX) and converting those into this new format or whether we will be “starting from scratch” with the new format/populating it from some other test suite/etc?

I’m sure we’ll come up with more questions as we dig into the idea further, but in general, I’m strongly in favor of getting more conformance testing coverage. Purchasing a conformance test suite has turned out to be an impossibility due to licensing issues, so we’re going to need to do something “in house” to improve this situation.


This is a very complicated topic; I’m not sure I have a well-formed opinion yet.

However, the list of papers plus the list of DRs should be sufficient to track conformance from a good state.
The current problem is the very large number of issues with an unknown state.

I think that regardless of what we do, we should avoid mentioning a paragraph number without also mentioning which version of C++ (or C) it applies to - this is made more complicated by the fact that we develop features based on unreleased standards that are subject to change, but it’s a good starting point.

I have concerns about duplicating our test suite without duplicating our resources. For example, a new test suite seems less immediately useful than trying to resolve DRs. And as noted, the infrastructure for automatic tracking of DR status seems fit for purpose.

I’m glad you like it!

How do the test cases/coverage reports distinguish between standards versions?

They don’t. As shown in the example commit, we have an exact commit hash of the draft sources specified somewhere, and all references to the Standard in test cases and the coverage report refer to it.

e.g., it may be p1 in C++11, p3 in C++14, moved into an entirely different section as p11 in C++17 and C++20, and then be removed entirely in C++23; how do those different states get tracked and reported?

Under this RFC, such changes to the wording would be observed in draft commits while upgrading the wording version, and the corresponding tests would be moved or deleted accordingly. After an upgrade, the coverage report is generated against the new wording.

The silver lining here is that at any given commit of the LLVM repository, we refer to a single wording, specified by the wording version. Generating a coverage report for e.g. every release of the Standard is an explicit non-goal, because, done faithfully, that would require evaluating a single test in more than one context. It would multiply the amount of work to be done by wording experts, which is a scarce resource.

Do we use special comments in the source file, some directory hierarchy structure, something else?

We need to associate test cases with wording (e.g. a paragraph) somehow, and leveraging our DR testing experience seems natural. There we have // dr123: <status> comments and a script that does a full-text search in every .cpp file in a specific directory. This way, interpreting those comments doesn’t require context, specifically the path to the file. I spent some time writing DR tests, and I find this tooling both easy to understand and very robust against moving tests between files (they are grouped by default).
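
The key property is that the comment is self-contained, so the scanner can simply full-text search every file. A minimal sketch of that idea follows; the real script (make_cxx_dr_status in the Clang tree) does considerably more, e.g. cross-checking against the CWG issues list:

```python
import re

# Minimal sketch of DR-comment scanning: find every "// drNNN: <status>"
# marker in a file's text, regardless of where in the file (or in which
# file) it appears. Statuses include "yes", "no", "na", "partial", or a
# Clang version number.
DR_RE = re.compile(r"//\s*dr(\d+):\s*(\S+)")

def scan(source_text):
    """Return {dr_number: status} for every DR comment in the text."""
    return {int(num): status for num, status in DR_RE.findall(source_text)}

example = """
// dr100: yes
namespace dr100 { template<class T> struct S {}; }
// dr101: 3.5
"""
print(scan(example))  # {100: 'yes', 101: '3.5'}
```

Because no path information is involved, tests can move freely between files without confusing the report, which is the robustness property mentioned above.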

Here’s a very different approach to the same problem: we use stable names of sections as file names and put every paragraph in a namespace, e.g. namespace p10 { ... }. This way we don’t have to resort to special comments, and we reduce the amount of noise in test cases, because we don’t have to specify the stable name for every paragraph and example. There are flaws in this approach that could make it a non-starter for us, so don’t take it too seriously.

A worry I have is that the C++ standard is modified at a rate that far outpaces our ability to react to it.

I share this concern, so I reduced the scope to a single wording version and proposed multiple measures to reduce the maintenance load: tooling, stable names, and multiple ways to defer proper work on draft commits while upgrading.

A question I have is what the review workflow should look like when someone is adding support for a new feature. e.g., if someone goes to implement a new feature, do we want reviewers to push on adding tests to this conformance suite vs adding tests scattered throughout the clang/test/ subdirectories as we currently do?

I’d encourage people implementing new features to expand the conformance test suite, but not every test belongs there. Testing a new feature could include both tests written against the wording (which belong in the conformance suite) and tests written against the implementation in order to improve test coverage there (which may not belong in the conformance suite). The approach I’d suggest is to first cover the feature from a conformance standpoint (I expect implementers to be able to reason about the wording of what they implement), and then place the rest of the tests the way we currently do.

How does this test suite relate to things like claiming conformance for a new feature or a DR (if at all)?

Ideally, claims of new-feature conformance should be backed by full coverage of the related paragraphs in the conformance suite, with those tests passing. As for DRs, resolved ones are eventually included in the Standard text, so full coverage of the Standard would implicitly cover DRs as well. From a practical standpoint, I suggest keeping the current DR workflow untouched until we have some solid footing and experience with the conformance test suite.

Another question is whether we’ll be taking our existing conformance tests (clang/test/CXX) and converting those into this new format or whether we will be “starting from scratch” with the new format/populating it from some other test suite/etc?

To the best of my knowledge, a surprising (to me) amount of C++98 is still normative today, not to mention C++11 and C++14, so I expect existing tests to be relevant for the most part. What needs to be done is to evaluate them against the current wording: both what they test and what their coverage is. If we go with DR-style self-sufficient special comments, then commits would largely consist of adding such comments to existing tests, without moving or renaming anything. Does that count as starting from scratch?

There are two major approaches to catching up: 1) evaluate tests against some old wording, and then work through draft commits (only); 2) evaluate against the current wording, and work through both the list of sections (improves coverage) and draft commits (keeps the suite up to date). Ultimately it should be the same amount of work (except for cases where WG21 went back and forth on something; option 1 is at a disadvantage there). Option 1 makes the transition of existing tests easier, because we can pick C++14, but it doesn’t provide any room to test new features until the catch-up is complete. I find that last bit crucial, because it prevents us from involving feature implementers in conformance testing at the best time for that: when features are implemented.

Yeah, at a rough glance I’m not sure how we’d keep up with this - we want to implement unreleased features to get experience and/or get ahead of the next release, but that means adding tests against unreleased wording that is subject to change, and/or updating tests for all the other changes in wording to stay consistent. (What happens when paragraphs get reordered and we’re adding a new test/feature against a new paragraph, but there’s an old test against a previous version of that paragraph by number, when really the old paragraph has moved?)

I’m not sure we could find the resources to even update everything each C++ release (I guess we’d have to keep some tests around in the old version/layout because we still need to conformantly implement the old spec too, so then we’d have different directories per released spec, and /some/ idea that the old directories/specs cover some of the conformance of the newer specs, without duplicating/moving them around?)

Sorry, just sort of rambling, not sure it’s helpful - but I can’t quite picture how this’d work just now.

I wonder if there would be any utility, instead of (or in addition to) a Clang language conformance test suite, in buying/subscribing to a commercial C++ conformance test suite and resolving to test against some representative configurations? This could benefit the C++ libraries in the LLVM project too.

Bound to be some wrinkles: non-conformance due to OS defects or other out-of-llvm-project-scope. But it could probably satisfy “some automated reporting that helps us tell at a glance where we’re at in terms of test coverage for the standard”.

I was told that there are two existing commercially available conformance test suites for C++, but their licenses are not compatible with how Clang would need to use them. This RFC came to light only after I was told that negotiations with the companies offering said test suites went nowhere.


Oh, yes, I should’ve thought of that.

If the reports from the test suites were not public but were shared with specific individuals who work on the clang/libc++/etc projects, would that suffice?

I was told that, at best, only the number of passed and failed tests might be reported.

Oh gee, having perused our suite’s license terms, I see it is incredibly onerous – so now I understand the incentive.

Sony is probably not the only vendor who has licensed a conformance suite and runs it internally. Interesting failures do prompt questions or bug reports upstream. We can’t just copy test source into the report, the license doesn’t permit that, but it’s generally simple to write a separate example.

Aaron makes a relevant point about C++ being a moving target. I would add that it is multiple moving targets, because the C++ committee is continually redefining what past standards say and mean. (Have these people never heard of version control?) :slight_smile: I have heard the argument about “we are fixing bugs in previous versions” which I get, but I mean… if we find a bug in Clang 17.0.0, we don’t revise and re-release Clang 17.0.0. That is effectively what the C++ committee does. (Sorry, end rant.)

On the flip side, we run the upstream libc++ suite against our library. IIRC there are some parts of it that assume a particular implementation, but in general it seems to have been useful.
