Headers from C++ standard

Hi Richard,

Background: I am writing a tool to retrieve all symbols, and information about them, from the C++ standard library for #include_fixer (a tool that can automatically find the right #include headers for unidentified symbols in C++ code). My current solution is to run the FindAllSymbol tool across all C++ standard headers in the system include path, e.g. the header files under the /usr/include/c++/4.8.4/ directory on my own machine. However, this solution can be problematic, since multiple headers can export the same symbol. For example, “vector” can be #include’d via other headers besides <vector> itself.

As an alternative, we are thinking about parsing the header synopses from the C++ standard. So, we are wondering whether there are existing, syntactically correct header synopses in the form of C++ header files somewhere that we can use. If not, Manuel (@klimek) suggests that we could maintain the header synopses as C++ header files and have the standard draft include them, so that their syntax can be checked easily and we can conveniently parse the synopses too.

Regards,
Eric

I don’t know if this is ‘correct’ enough, but the libcxx headers that correspond to standard-defined headers usually (maybe always?) have a comment at the top that is basically the standard synopsis. There are some things in the header comment that aren’t in the standard synopsis, such as ‘// C++14’ and ‘// C++17’ comments. But maybe it’s good enough for what you are looking for?
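A minimal sketch of pulling that comment out of a header, assuming (as in the libc++ headers of this period) that the synopsis is the first /* ... */ block comment in the file. The function name is hypothetical, and the scan is deliberately naive:

```cpp
#include <optional>
#include <string>

// Returns the text between the first "/*" and the following "*/".
// Assumption: in libc++ headers this first block comment is the synopsis.
// Naive: it does not skip "/*" that appears inside // comments or strings.
std::optional<std::string> extractSynopsisComment(const std::string& header) {
  const auto open = header.find("/*");
  if (open == std::string::npos) return std::nullopt;
  const auto close = header.find("*/", open + 2);
  if (close == std::string::npos) return std::nullopt;
  return header.substr(open + 2, close - (open + 2));
}
```

The extracted text would still carry the libc++-specific extras mentioned above (such as the "// C++14" markers), so it would need filtering before it could be treated as the standard's synopsis.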

Hi Ben, thanks for the reply! We’d like to keep the symbol information synchronized with the standard, so synopses from the standard would be preferred.

Hi Eric,

> As an alternative, we are thinking about parsing the header synopses from
> the C++ standard. So, we are wondering whether there are existing,
> syntactically correct header synopses in the form of C++ header files
> somewhere that we can use. If not, Manuel (@klimek) suggests that we could
> maintain the header synopses as C++ header files and have the standard
> draft include them, so that their syntax can be checked easily and we can
> conveniently parse the synopses too.

We've had similar plans for IWYU, and I think we eventually parsed the
TeX source from https://github.com/cplusplus/draft/tree/master/source
with a simple shell script.

I didn't do this myself, but I found a couple of examples in our source here:

  // These headers are defined in C++14 [headers]p3. You can get them with
  // $ sed -n '/begin{floattable}.*{tab:cpp.c.headers}/,/end{floattable}/p' lib-intro.tex | grep tcode | perl -nle 'm/tcode{<c(.*)>}/ && print qq@  { "<$1.h>", kPublic, "<c$1>", kPublic },@' | sort
  // on https://github.com/cplusplus/draft/blob/master/source/lib-intro.tex

  // These headers are defined in C++14 [headers]p2. You can get them with
  // $ sed -n '/begin{floattable}.*{tab:cpp.library.headers}/,/end{floattable}/p' lib-intro.tex | grep tcode | perl -nle 'm/tcode{(.*)}/ && print qq@  "$1",@' | sort
  // on https://github.com/cplusplus/draft/blob/master/source/lib-intro.tex

(from https://github.com/include-what-you-use/include-what-you-use/blob/master/iwyu_include_picker.cc)

Assuming you want this to be portable I'm sure you can do something
similar in C++ relatively easily.
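A rough C++ equivalent of the first pipeline above, as a sketch only: it assumes the table layout of lib-intro.tex at the time (one \tcode{<cNAME>} entry per line inside the floattable labelled tab:cpp.c.headers), and the function name is made up for illustration. Like the sed range, it starts at the table's opening line and stops at end{floattable}; like the perl match, it uses a greedy capture.

```cpp
#include <algorithm>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Mirrors: sed -n '/begin{floattable}.*{tab:cpp.c.headers}/,/end{floattable}/p'
//          | grep tcode | perl -nle 'm/tcode{<c(.*)>}/ && print ...' | sort
// Produces one IWYU-style mapping line per C library header, e.g.
//   { "<stdio.h>", kPublic, "<cstdio>", kPublic },
std::vector<std::string> extractCHeaderMappings(const std::string& texSource) {
  static const std::regex tableStart(R"(begin\{floattable\}.*\{tab:cpp\.c\.headers\})");
  static const std::regex tcode(R"(tcode\{<c(.*)>\})");
  std::istringstream in(texSource);
  std::string line;
  bool inTable = false;
  std::vector<std::string> out;
  while (std::getline(in, line)) {
    if (!inTable) {
      // Skip everything before the table's opening line.
      if (!std::regex_search(line, tableStart)) continue;
      inTable = true;
    }
    std::smatch m;
    if (std::regex_search(line, m, tcode)) {
      const std::string name = m[1].str();
      out.push_back("{ \"<" + name + ".h>\", kPublic, \"<c" + name + ">\", kPublic },");
    }
    // The range ends at the first end{floattable} after the start line.
    if (line.find("end{floattable}") != std::string::npos) break;
  }
  std::sort(out.begin(), out.end());  // the trailing `| sort`
  return out;
}
```

The second pipeline (tab:cpp.library.headers) would follow the same shape with a different table label and capture pattern.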

FWIW,
- Kim

> The below is the sort of thing I’d suggest. You’ll probably need some manual massaging to get the extraction exactly right, though.
>
> Also, the synopses do not typically include class definitions, just a “class X;” style declaration, so if that’s a problem for you then you’ll need to do a bit more work to extract those too.

Bonus points if you can package this in a Clang library we can use from IWYU :)

- Kim

> The below is the sort of thing I’d suggest. You’ll probably need some manual massaging to get the extraction exactly right, though.

Would you be generally open to the idea of having the standard maintain a more parsable form of the definitions, and including them back into the tex file? Or do you think that will increase the amount of maintenance work too much?

> Also, the synopses do not typically include class definitions, just a “class X;” style declaration, so if that’s a problem for you then you’ll need to do a bit more work to extract those too.

Generally, we need all symbols users might want to use, and which headers they are defined in.

> The below is the sort of thing I'd suggest. You'll probably need some
> manual massaging to get the extraction exactly right, though.
>
> Would you be generally open to the idea of having the standard maintain a
> more parsable form of the definitions, and including them back into the tex
> file?

The only supported use of the sources on GitHub is to allow the C++
project editor to produce working drafts and standards. They're on GitHub
as a convenience to contributors to the C++ standardization process, and
could go away or change format or layout or anything else, at any time, if
we find a better process. On that basis, it doesn't make a lot of sense to
me for the layout of the standard's sources to be driven by external
concerns like this one.

If there's some change you want, and you can make that change in a way
that's aligned with making the sources better as a vehicle for maintaining
or producing a standard / working draft, then that's probably the best path
forward. For instance, a change that adds a different kind of LaTeX
environment around synopsis blocks, providing higher-level semantic markup,
would generally be positive for the standard sources and would help you
extract the parts you need.
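To illustrate why such markup would help an extractor, here is a sketch that assumes a hypothetical environment written as \begin{synopsis}{<header>} ... \end{synopsis}. The environment name and its argument form are invented for this example and are not taken from the draft sources; with markup like that, extraction reduces to collecting the lines between the markers:

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// HYPOTHETICAL markup: \begin{synopsis}{<header>} ... \end{synopsis}
// Collects the body of each such block, keyed by header name; several
// blocks for the same header are concatenated in source order.
std::map<std::string, std::string> extractSynopses(const std::string& texSource) {
  static const std::regex openTag(R"(\\begin\{synopsis\}\{(<[^>]+>)\})");
  std::map<std::string, std::string> result;
  std::istringstream in(texSource);
  std::string line, current;
  bool inside = false;
  while (std::getline(in, line)) {
    std::smatch m;
    if (!inside && std::regex_search(line, m, openTag)) {
      current = m[1].str();  // e.g. "<vector>"
      inside = true;
    } else if (inside && line.find("\\end{synopsis}") != std::string::npos) {
      inside = false;
    } else if (inside) {
      result[current] += line + "\n";
    }
  }
  return result;
}
```

The point of the semantic markup is that the header name travels with the block, so the extractor needs no heuristics about which section heading a code block belongs to.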

It's not clear to me what benefit there would be to the standard sources if
we separated out these synopses into separate files, and it would make our
current organizational system (one .tex file per top-level clause)
non-uniform.

> Or do you think that will increase the amount of maintenance work too much?
>
> Also, the synopses do not typically include class definitions, just a
> "class X;" style declaration, so if that's a problem for you then you'll
> need to do a bit more work to extract those too.
>
> Generally, we need all symbols users might want to use, and which headers
> they are defined in.

It might be easier to extract this from the index of library names rather
than from the synopses (although you may need to insert some markers to
indicate which headers contain each library name -- it would be useful to
include that information in the index too, which would justify the cost of
maintaining those markers in the standard text).
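A sketch of the index-based approach, under two assumptions: that index entries take the form \indexlibrary{\idxcode{NAME}}, and that a hypothetical \inheader{<header>} marker is placed before the entries each header declares. The \inheader macro does not exist in the sources; it stands in for the markers suggested above. Entries with nested index keys (for example members, written with "!") are not matched by this pattern.

```cpp
#include <map>
#include <regex>
#include <set>
#include <string>

// Maps each indexed library name to every header marker that precedes it.
// ASSUMPTIONS: entries look like \indexlibrary{\idxcode{NAME}}, and headers
// are announced by a HYPOTHETICAL \inheader{<header>} marker.
std::map<std::string, std::set<std::string>> headersByName(const std::string& tex) {
  static const std::regex token(
      R"(\\inheader\{(<[^>]+>)\}|\\indexlibrary\{\\idxcode\{([^}]+)\}\})");
  std::map<std::string, std::set<std::string>> result;
  std::string currentHeader;  // empty until the first marker is seen
  for (auto it = std::sregex_iterator(tex.begin(), tex.end(), token);
       it != std::sregex_iterator(); ++it) {
    const std::smatch& m = *it;
    if (m[1].matched) {
      currentHeader = m[1].str();
    } else if (m[2].matched && !currentHeader.empty()) {
      result[m[2].str()].insert(currentHeader);
    }
  }
  return result;
}
```

A name such as swap that is declared in several headers simply ends up with several entries in its set, which is the information a tool like #include_fixer would need.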

I think what we really want are synopses that compile (i.e. with all the necessary #include headers) so that we can run Clang AST matchers on them.

> I think what we really want are synopses that compile (i.e. with all the necessary #include headers) so that we can run Clang AST matchers on them.

Exactly. Richard, I thought it might make sense for the C++ standard to keep its code-like snippets in a parsable form for verifiability, but if you don’t think it’d help, we’ll need to look for something else.
Options:
a) we create an ugly little script that extracts the functions from the standard and puts them into a bunch of files with #includes (probably by maintaining a side table) and have people use that script + the standard
b) we just one-time create our own table and try to maintain it (perhaps just as part of the clang repo)
c) … ?
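For option a), the side-table step might look like the following sketch. The table contents and the function name are hypothetical, and whether a generated file then compiles still depends on the synopsis text itself:

```cpp
#include <map>
#include <string>
#include <vector>

// HYPOTHETICAL side table: for each standard header, the #includes that a
// synopsis needs in front of it before clang can parse it.
using SideTable = std::map<std::string, std::vector<std::string>>;

// Builds the text of one translation unit to hand to clang: a banner, the
// #includes listed for `header` (if any), then the extracted synopsis.
std::string makeCheckableFile(const std::string& header,
                              const std::string& synopsis,
                              const SideTable& sideTable) {
  std::string out = "// Generated from the synopsis of " + header + "\n";
  auto it = sideTable.find(header);
  if (it != sideTable.end())
    for (const auto& inc : it->second) out += "#include " + inc + "\n";
  out += "\n" + synopsis;
  return out;
}
```

Keeping the table separate from the extraction means the script can be rerun against each new draft while only the table needs hand maintenance.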

I share Richard’s opinions here.

– Gaby