State of the serialization tests, and plan to fix it

Hi all,

I recently discovered that serialization of the AST is only very lightly
tested. What is tested is mostly that parsing works with an external
source, but that in itself does not guarantee that the round trip
serialization + deserialization is lossless. In addition some nodes are
not tested at all. This can be seen in the coverage report available here:
http://lab.llvm.org:8080/coverage/coverage-reports/clang/index.html.

I would like to fix this and am looking for comments on the following plan:

Plan 1:

Hm, no one wants to reply :slight_smile:
I guess i could try to..

First of all, thank you for looking into it!

Hi all,

I recently discovered that serialization of the AST is only very lightly
tested. What is tested is mostly that parsing works with an external
source, but that in itself does not guarantee that the round trip
serialization + deserialization is lossless. In addition some nodes are
not tested at all. This can be seen in the coverage report available here:
http://lab.llvm.org:8080/coverage/coverage-reports/clang/index.html.

Yes, that seems very unfortunate. Especially since that serialization
logic (i guess?) will be more used in the future due to the Modules.

This also limits cleanup work on serialization - e.g. may be able
to significantly(or not) reduce the size of these dumps if we added
Abbrevs to more AST nodes.
But it's a bad idea to do that without test coverage.

I would like to fix this and am looking for comments on the following plan:

Yay!

Plan 1:
-------
Go through each AST node and ensure that its state is dumped fully.

Yes, please.

To avoid
polluting to output of ast-dump, split the various bits of state into two
parts, the first part being the part generally useful for people working
on clang (ie: mostly what is currently dumped), and the second part being
the rest (which is rather subjective obviously).

(though this can be skipped if the additional data in the output in
the AST dump is judged to be acceptable.)

I guess to answer that one would need to see just how much more output
that adds. Unless that separation already exists, i'd try not to introduce it.

Then use that to check that the serialization works as intended. There is
the possibility of re-using the ast-dump tests (in AST/ast-dump-*) here.

Sounds reasonable. They do basically the same thing, especially since
in both cases you want to check the final AST dump.

Of course this does not prevent someone from adding some data to some
node and forgetting to update both the serialization and the AST dump.

True, but i think this is mostly the problem of code review.

Possible alternative plan:
--------------------------
Superficially it seems like ASTStructuralEquivalence could be used. However
as far as I can tell, as its name implies, it only look at the structure
at the AST and does not look to the various bits of each node. Additionally,
it does not currently handle statements and expression at all.

I am missing something here, and does anyone have a better idea ?

I know i have said that already, but this is a very fundamental problem.
And i think it is *significantly* caused by the lack of tooling.
It is no fun to manually write CHECK lines, *especially* good ones,
**especially** with good coverage.

Fixing the coverage of ASTDumper tests will result in a significant amount
of tests, that should hopefully be really brittle, and break if not updated
to show the changes in the ASTDumper. But that will unquestionably
cause some disdain(FIXME: can't pick better word), since they would
need to be updated. And as i have already said the coverage may be
bad because it is tedious to write these tests.. :slight_smile:

So i would really recommend to approach this from tooling perspective first,
before adding new tests. See:
llvm/utils/update_llc_test_checks.py
llvm/utils/update_mca_test_checks.py
llvm/utils/update_mir_test_checks.py
llvm/utils/update_test_checks.py

I would really recommend to examine them, and come up with a similar
script for -ast-dumper,
so that updating tests will consist *only* of two steps:
1. write the source you'd want to test (+run-line)
2. run the updater tool, which will do the rest
    (drop old check lines, run the runline, sanitize the output, write
new checklines)

Once again, thank you for looking into it!

Bruno

Roman.