A majority of the tests in Semantics uses test_errors.py, which asserts that the errors emitted by flang are the same as those expected from “ERROR:” directives. Missing as well as unexpected errors cause the test to fail, and it is also checked that the errors originate from the source lines following the “ERROR:” directives. This makes the script very useful, achieving similar rigour with FileCheck is both difficult and inconvenient. However, any warnings emitted by flang are ignored by test_errors.py – and some tests appear to try to match expected warnings using a “WARNING:” directive, which is also ignored.
I propose that we patch test_errors.py to take warnings into account, with warnings accounted separately from errors, but with all kinds of warnings lumped together (other bookkeeping polices are of course possible, and up for discussion). We could either
a) test warnings the same way as errors, where both missing and unexpected warnings cause test failure, or
b) only assert that expected warnings are emitted, but let the test pass with unexpected warnings.
Alternative a) has the advantage of being more rigorous. b) sacrifices some rigour in favour of simplicity:
It is less noisy. When a new warning is introduced, a lot of existing tests may require an update.
Introducing it will be easier, as we do not need to add the “WARNING:” directive where needed in the existing tests.
I favour alternative b). It is straight-forward to get in place, while a) will require a fair bit of work now and risk becoming a maintenance burden. We also may not always want to check for unexpected warnings, e.g., at lines which trigger more than one diagnostic message.
Either option is going to be an improvement and will make testing in Flang more rigorous and complete. Long term, we should aim for Option a). Within the compiler, we should be able to test/verify that only the expected diagnostics are emitted.
When a new warning is introduced, a lot of existing tests may require an update.
That’s to be expected and a reasonable price to pay (I don’t expect this to be a huge burden). In return, we will have a mechanism to verify that Flang does not emit unwanted/random/incorrect diagnostics.
It would be really great to aim for something akin Clang’s -verify. In Clang, testing for diagnostics is relatively easy. It’s also very flexible and allows writing very expressive, yet concise tests. And does not require extra scripts. However, implementing this would require introducing a dedicated diagnostics API (see e.g. DiagnosticsEngine). This is probably outside the scope of this proposal. But if people are interested, we could discuss this in one of our calls in more detail.
As for you proposal, I would like us to aim for Option a). However, you could start with Option b) first and see how that goes. To me it feels like Option a) is a natural follow-up/extension of Option b).
Warning are important. We ought to consider testing them with the same rigor as errors. I don’t know why warnings were omitted from the existing tests. There are also “notes” attached to warning and errors, and the format changes occasionally as we try to bring more clarity to the messages.
@ekieri How many warning and other non-error messages do the existing tests produce? From there, maybe we can assess the burden of extending the tests to cover warnings et al. If we move fwd, I will volunteer to help update the expected results.
I do not mind going for a), to not allow unexpected warnings, and this appears to be what everybody else wants. As Andrzej suggests, I think we should land D125804 with b) first. It is an improvement from the current situation, and a step on the way towards a). I am here just for fun and have quite limited bandwidth, getting all tests in shape for a) could take me a while. And of course, any help is much appreciated (thanks, Steve!).
In D125804 portability-level warnings are matched using the same directive as normal warnings. That is, you can assert that an expected portability warning is issued, but you cannot check the severity level of a warning. We can change so as to distinguish them if we wish, no strong opinion from my side.
Regarding introducing new warnings, no, I did not have a particular example in mind. But imagine, just for the sake of argument, that we introduced a warning for closing a subroutine with an end statement which does not name the subroutine in question, i.e., we require the full ‘end subroutine foo’. My not-so-careful greping suggests such a warning would currently be emitted about 2000 times in test/Semantics. So it is imaginable, but will probably not be this impactful very often.
Yes, test_warnings.py is a possibility (for a bridging period until all tests are compliant, you mean?). IMHO the end-goal is to test warnings and errors with the same script, so then perhaps a ‘–ignore-unexpected-warnings’-flag to test_errors.py is more appropriate? A disadvantage with such a procedure (either of them) is that it might end up being not quite so temporary as initially intended.
Sure, discussing ‘-verify’ in a call sounds like a good idea. Unfortunately I participate less often than not, but I think it is a good place to start. Or alternatively in a separate RFC.
Yes, b) is clearly a step towards a). In D125804 I already make the (to me) easy cases a)-compliant.
With main as of yesterday, introducing a) would cause 20 tests to fail (19 for unexpected warnings, 1 for failing to emit expected warnings). D125804 fixes 9 of those failures. The remaining 11 tests emit 37 unexpected warnings (including portability warnings), so that is not too much. But I do not want to just add them to the tests, assuming all are actually desired. I need to understand what is going on first. And thanks again for offering to help with this!
Here is an example of a case I found not straight-forward: Line 31 of resolve37.f90,
integer, parameter :: q = 1+2*(1/0)
emits one error and four warnings. All four warnings are “INTEGER(4) division by zero”, referring to different subsets of the arithmetic expression. Are all of them desired, or would we rather have only one? A user will probably get the point from just one, and it feels a bit silly to add four expected-warning-directives to the test case, but they all do refer to arithmetic expressions containing division by zero.
To proceed, I suggest that I
Update the commit message of D125804 to mark it more of a stepping-stone, and request it reviewed.
Create a new draft implementing a), for now with D125804 as dependency.
Making the tests compliant could be done in separate revisions, I will rebase as needed. I will work on this, but do appreciate any help.
The number of non-compliant tests are now down to 7. For common-blocks.f90 and data06.f90 the emitted warnings seem legitimate to me, I have added the corresponding directives as part of D125804. dosemantics03.f90 and kinds04.f90 required a bit more work, which I did in separate patches (D126176 and D126459).
I have just posted D127337, which treats another three tests which I would like input on, for review.