[RFC] clang support for API information generation in JSON

Hi All!

I’m writing to propose clang-extract-api, a tool to collect and serialize API information from header files, for example, function signatures, Objective-C interfaces and protocols, and inline documentation comments. We hope it will help future tools understand clang-based language APIs without needing to dig into the AST themselves.

Background

Motivation

Library and SDK providers may find it useful to be able to create and inspect a “snapshot” of APIs they expose, for example, to check for API/ABI-breaking changes between two versions, or to automate generating documentation for the APIs. Here is a list of examples of information we want to extract from APIs:

• the name (spelling/mangled) of the symbol;

• the unique identifier of the symbol, for example the Unified Symbol Resolution (USR);

• the source location of the API declaration (file, line, column);

• access control of the API (public/private/protected);

• availability (available/unavailable/deprecated);

• function signatures (return/parameters);

• documentation comments attached to a symbol;

• relations with other symbols (class methods, typedef relations, struct data fields, enum constants, etc.)

Since this API information is available in the header files, which declare and distribute the APIs, we can implement a tool that extracts it without compiling the whole project, giving tooling easy access to the information.
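To make the list above more concrete, here is a minimal sketch of how one extracted record could be modelled by a consuming tool. It is not the proposed output format: every class and field name (`APIRecord`, `SourceLocation`, `describe`, and so on) is hypothetical and simply mirrors the bullets, and the file name `example.h` is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SourceLocation:
    file: str
    line: int
    column: int


@dataclass
class APIRecord:
    # Hypothetical field names; each one corresponds to a bullet in the list.
    name: str                          # spelling of the symbol
    usr: str                           # unique identifier, e.g. a USR
    location: SourceLocation           # where the declaration lives
    access: str = "public"             # public / private / protected
    availability: str = "available"    # available / unavailable / deprecated
    return_type: Optional[str] = None  # None for non-function symbols
    parameters: List[Tuple[str, str]] = field(default_factory=list)  # (type, name)
    doc_comment: Optional[str] = None
    relations: List[Tuple[str, str]] = field(default_factory=list)   # (kind, target USR)


def describe(record: APIRecord) -> str:
    """Render a one-line summary of a record for quick inspection."""
    if record.return_type is not None:
        params = ", ".join(f"{t} {n}" for t, n in record.parameters)
        sig = f"{record.return_type} {record.name}({params})"
    else:
        sig = record.name
    loc = record.location
    return f"{sig} [{record.availability}] at {loc.file}:{loc.line}:{loc.column}"


# A made-up record for a C function `int add(int a, int b);`.
example = APIRecord(
    name="add",
    usr="c:@F@add",
    location=SourceLocation("example.h", 3, 5),
    return_type="int",
    parameters=[("int", "a"), ("int", "b")],
    doc_comment="Adds two integers.",
)
print(describe(example))
```

A flat record like this would be enough for simple listings; relations between symbols are kept as (kind, target) pairs so that they can be resolved by identifier later.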

Existing solutions

While there are some existing solutions in clang to dump symbols or AST information, they either expose unnecessary low-level details or fail to provide enough information about APIs. For example, clang -ast-dump dumps low-level details for all declarations for debugging purposes, and the output is not machine-parsable. Doxygen also extracts documentation comments and other information from API declarations, but its output is rendered documentation in web formats, which is not flexible enough for other uses and tools.

Proposal

We propose to implement this tool as a new frontend action invoked by clang -extract-api, as shown in the example below.

    clang -extract-api
        header.h [more_header.h …] or a filelist
        -isysroot
        -target
        -I
        -isystem

        -o output.json

It takes as input the header file(s), or a filelist file containing paths to the header file(s). The header files will be parsed by clang, and the extract-api action will visit the AST to extract the needed information and serialize it to JSON. Please find an example input and output attached.

The example output is based on the symbol graph format that’s already used by Swift for serializing symbol information and the relations between symbols. This format can represent the required API information and is flexible and extensible, as demonstrated in the example, so we think it’s a good starting point.
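As a rough illustration of why a symbols-plus-relations layout is convenient for tools, here is a small sketch that reads a hand-written fragment and groups members under their parent. The fragment is not taken from the attached test.json; the key names (`symbols`, `relationships`, `identifier.precise`, `names.title`, `memberOf`) are assumptions based on the general shape of Swift's symbol graph and may differ from what the tool would emit. The helper `members_by_parent` is hypothetical.

```python
import json

# Hand-written fragment in the general shape of a symbol graph.
# Key names are assumptions for illustration, not a copy of test.json.
EXAMPLE = """
{
  "symbols": [
    {"identifier": {"precise": "c:@S@Point"},
     "kind": {"identifier": "struct"},
     "names": {"title": "Point"}},
    {"identifier": {"precise": "c:@S@Point@FI@x"},
     "kind": {"identifier": "property"},
     "names": {"title": "x"}}
  ],
  "relationships": [
    {"kind": "memberOf", "source": "c:@S@Point@FI@x", "target": "c:@S@Point"}
  ]
}
"""


def members_by_parent(graph):
    """Map each parent symbol's title to the titles of its members."""
    # Index symbols by their unique identifier so relations can be resolved.
    titles = {s["identifier"]["precise"]: s["names"]["title"]
              for s in graph["symbols"]}
    result = {}
    for rel in graph["relationships"]:
        if rel["kind"] == "memberOf":
            parent = titles[rel["target"]]
            result.setdefault(parent, []).append(titles[rel["source"]])
    return result


graph = json.loads(EXAMPLE)
print(members_by_parent(graph))
```

Because symbols and relations are stored separately and joined by identifier, a consumer can ignore relation kinds it does not understand, which is part of what makes the format easy to extend.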

We are excited about this idea and its potential uses, and we’d love to hear feedback and suggestions!

Test.h (535 Bytes)

test.json (32.8 KB)

Would this tool be able to provide the same functionality as tools
like abi-compliance-checker[1] and libabigail[2], that extract ABI/API
information from debuginfo?

-Tom

[1] https://github.com/lvc/abi-compliance-checker
[2] https://sourceware.org/libabigail/

Hi Tom!

I’m not really familiar with those tools but a brief look seems to suggest that they operate on compiled binaries. We propose clang-extract-api to directly work on and extract information from the parsed AST of the header source files so I’m not quite sure how to compare these tools. Debuginfo lives in the compiled binary so clang-extract-api doesn’t look into it. However, anything about the API described in the header file could potentially be extracted.

Do you have a specific example of information you want to get from abi-compliance-checker and libabigail?

Zixu

Hi Zixu, I just wanted to say that this is of interest to me!

I work on a couple of FFI generation tools, and something like this would make it easier
for us to generate code from headers. The clang AST is pretty scary so a tool like this
would definitely be appreciated.

Best regards
Mats Larsen

We have a CI job[1] in the release branch that runs the abi-compliance-checker tool to
ensure that we don't accidentally change the ABI/API of libclang.so. Is this something
your tool could be used for?

-Tom

[1] https://github.com/llvm/llvm-project/blob/release/13.x/.github/workflows/libclang-abi-tests.yml

I'll point to this tool which already exists:

    https://github.com/CastXML/CastXML

It dumps XML instead of JSON, but it serves the goals at least. Note
that one of the main problems that needs tackling is emulating other
compilers (e.g., seeing the API as MSVC sees it).

--Ben

In Fuchsia, we have been using clang-doc (https://clang.llvm.org/extra/clang-doc.html) for this purpose (using the YAML output format). Would it be possible to use clang-doc for your purposes? You might need to extend the output format to include additional information but that should be quite straightforward.

We have a CI job[1] in the release branch that runs the abi-compliance-checker tool to
ensure that we don’t accidentally change the ABI/API of libclang.so. Is this something
your tool could be used for?

-Tom

[1] https://github.com/llvm/llvm-project/blob/release/13.x/.github/workflows/libclang-abi-tests.yml

That job works on a finished build with binary products to check ABI at the binary level. The proposed tool could potentially be used to compare information extracted at the source level (parsed AST). I can’t say that this tool could replace abi-compliance-checker in libclang-abi-tests, but that’s definitely a direction we would like to explore, or enable such checks at different levels/directions.
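To sketch what a source-level comparison might look like, here is a small example that diffs two extracted snapshots by symbol identifier. It is only an illustration under assumptions: the `symbols` and `identifier.precise` keys follow the general symbol graph shape, the `signature` key and the function names `index_by_usr` and `diff_snapshots` are hypothetical, and the sample data is made up.

```python
def index_by_usr(graph):
    """Map each symbol's unique identifier to a comparable signature string."""
    return {
        s["identifier"]["precise"]: s.get("signature", "")
        for s in graph["symbols"]
    }


def diff_snapshots(old, new):
    """Report identifiers that were removed, added, or changed between snapshots."""
    old_idx, new_idx = index_by_usr(old), index_by_usr(new)
    removed = sorted(set(old_idx) - set(new_idx))
    added = sorted(set(new_idx) - set(old_idx))
    changed = sorted(u for u in set(old_idx) & set(new_idx)
                     if old_idx[u] != new_idx[u])
    return {"removed": removed, "added": added, "changed": changed}


# Made-up snapshots of two versions of a small C API.
old = {"symbols": [
    {"identifier": {"precise": "c:@F@open_db"}, "signature": "int open_db(const char *)"},
    {"identifier": {"precise": "c:@F@close_db"}, "signature": "void close_db(int)"},
]}
new = {"symbols": [
    {"identifier": {"precise": "c:@F@open_db"}, "signature": "int open_db(const char *, int)"},
    {"identifier": {"precise": "c:@F@flush_db"}, "signature": "void flush_db(int)"},
]}

result = diff_snapshots(old, new)
print(result)
```

A check like this sees only what the headers declare, so it would complement rather than replace a binary-level ABI check.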

Zixu

In Fuchsia, we have been using clang-doc (https://clang.llvm.org/extra/clang-doc.html) for this purpose (using the YAML output format). Would it be possible to use clang-doc for your purposes?

We looked into clang-doc when planning and designing clang-extract-api. Our conclusion is that, although clang-doc looks great and works by extracting information from the clang AST, it still does not fit the purpose of clang-extract-api 100%, as its focus is on documentation generation. We think it’s okay to have two separate tools in this case. In the future, more fields might become of interest to one tool but not the other, and it’s good to keep the input/output conventions separate.
Another reason is that we would like to implement the core functionalities of clang-extract-api in libclang so that it could enable more use cases. Clang-doc lives in clang-tools-extra and is more like an endpoint tool. One possibility is that we could refactor clang-doc to re-use the AST visitor implemented by clang-extract-api (in libclang) and have the two tools process and serialize the extracted information in their own ways.

Zixu

Could bindings generators use this?

Sincerely,

Demi Marie Obenour