Update on clang-extract-api: Clang Support for API Information Generation in JSON

Hi All!

About a year ago I sent out an RFC proposing a new Clang tool for collecting and serializing API information from header files. Thanks, everyone, for your interest and for the great feedback and suggestions in the original email thread as well as in the Phabricator code reviews. Today I would like to share an update on the development status of clang-extract-api, along with a simple demo of the current workflow.


Current Status

We’ve implemented all of the core components of clang-extract-api in a new Clang library, ExtractAPI (check out clang/include/clang/ExtractAPI and clang/lib/ExtractAPI):

  • API : This component defines the representations of the API information collected. Individual declarations are captured by records derived from the base APIRecord struct. An APISet holds all the records from the product defined by the input header files.
  • Serialization : Serialization contains the APISerializer interface that can be implemented to serialize an APISet, as well as a SymbolGraphSerializer implementation to serialize in the Symbol Graph format, as proposed.
  • DeclarationFragments : This component defines the Declaration Fragments representation, which is an abstraction of a symbol’s declaration, with language-agnostic annotations about syntactic/semantic properties of the fragments.
  • Finally, ExtractAPIConsumer.cpp glues everything together, defines the ExtractAPIAction frontend action that hooks into the new driver option -extract-api, processes the input header files, and kicks off the ExtractAPIVisitor that visits Decl nodes in the AST and collects API information.
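
To make the Declaration Fragments idea more concrete, here is a minimal Python sketch of how a consumer might work with them once they are serialized. It assumes each fragment is a JSON object with "kind" and "spelling" keys, as in the Symbol Graph format. The fragment list below is hand-written for illustration, and the specific kind assigned to each token is an assumption rather than actual tool output.

```python
# Hypothetical fragments for:
#   void addOpacity(Color *color, uint8_t opacity);
# The fragment kinds chosen for each token are assumptions for illustration.
FRAGMENTS = [
    {"kind": "typeIdentifier", "spelling": "void"},
    {"kind": "text", "spelling": " "},
    {"kind": "identifier", "spelling": "addOpacity"},
    {"kind": "text", "spelling": "("},
    {"kind": "typeIdentifier", "spelling": "Color"},
    {"kind": "text", "spelling": " *"},
    {"kind": "internalParam", "spelling": "color"},
    {"kind": "text", "spelling": ", "},
    {"kind": "typeIdentifier", "spelling": "uint8_t"},
    {"kind": "text", "spelling": " "},
    {"kind": "internalParam", "spelling": "opacity"},
    {"kind": "text", "spelling": ")"},
]


def render_declaration(fragments):
    """Concatenate fragment spellings to rebuild the declaration text."""
    return "".join(fragment["spelling"] for fragment in fragments)


def spellings_of_kind(fragments, kind):
    """Return the spellings of all fragments annotated with the given kind."""
    return [f["spelling"] for f in fragments if f["kind"] == kind]


if __name__ == "__main__":
    print(render_declaration(FRAGMENTS))
    print(spellings_of_kind(FRAGMENTS, "typeIdentifier"))
```

Because the annotations travel with the text, a consumer can rebuild the plain declaration or pick out only the pieces it cares about (for example, the referenced types) without re-parsing C.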

Demo

Here is a simple demo of everything put together in action:

❯ tree
.
└── headers
    ├── anotherCoolAPI.h
    └── coolAPI.h

1 directory, 2 files

We provide some cool APIs in the two headers in the headers directory.

// coolAPI.h
#ifndef COOL_API_H
#define COOL_API_H

#include <stdint.h>

/**
 * Defines 8-bit RGB+alpha colors
 */
typedef struct Color {
  uint8_t red;   ///< Red component.
  uint8_t green; ///< Green component.
  uint8_t blue;  ///< Blue component.
  uint8_t alpha; ///< Opacity component.
} Color;

#define RGB(r,g,b) (Color){ .red=r, .green=g, .blue=b, .alpha=255 }

#endif

// anotherCoolAPI.h
#ifndef ANOTHER_COOL_API_H
#define ANOTHER_COOL_API_H

#include "coolAPI.h"

const Color black = RGB(0, 0, 0);

/// Add opacity to a given color.
///
/// - Parameters:
///   - color: The original color.
///   - opacity: The amount of opacity to be added.
void addOpacity(Color *color, uint8_t opacity);

#endif

Now, if we want to extract structural information about these APIs, we can use the following command line to invoke the extract-api driver:

❯ clang -extract-api \
    -x c-header \
    headers/coolAPI.h \
    headers/anotherCoolAPI.h \
    -isysroot <SDK> \
    -Iheaders \
    --product-name=Demo \
    -o APIInfo.json

Clang will parse the two headers, visit the AST to collect information about the APIs, and finally write out the Symbol Graph output to APIInfo.json (attached: APIInfo.json.txt (15.8 KB)).
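
As a rough illustration of what a downstream consumer could do with this output, the Python sketch below groups symbols by kind. It assumes the Symbol Graph layout of a top-level "symbols" array whose entries carry "kind.identifier" and "names.title". The SAMPLE text is hand-written to imitate that layout and is not copied from the attached file; the kind identifiers in it (such as "c.func") are assumptions.

```python
import json
from collections import defaultdict

# Hand-written stand-in for a Symbol Graph file (an assumption, not the
# real APIInfo.json produced in the demo).
SAMPLE = """
{
  "module": {"name": "Demo"},
  "symbols": [
    {"kind": {"identifier": "c.struct", "displayName": "Structure"},
     "names": {"title": "Color"}},
    {"kind": {"identifier": "c.func", "displayName": "Function"},
     "names": {"title": "addOpacity"}},
    {"kind": {"identifier": "c.macro", "displayName": "Macro"},
     "names": {"title": "RGB"}}
  ]
}
"""


def symbols_by_kind(graph):
    """Group symbol titles by kind identifier, tolerating missing keys."""
    grouped = defaultdict(list)
    for symbol in graph.get("symbols", []):
        kind = symbol.get("kind", {}).get("identifier", "unknown")
        title = symbol.get("names", {}).get("title", "<unnamed>")
        grouped[kind].append(title)
    return dict(grouped)


def load_symbol_graph(path):
    """Read a Symbol Graph JSON file from disk, e.g. APIInfo.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    graph = json.loads(SAMPLE)
    for kind, titles in sorted(symbols_by_kind(graph).items()):
        print(f"{kind}: {', '.join(titles)}")
```

To try it on real output, replace `json.loads(SAMPLE)` with `load_symbol_graph("APIInfo.json")`.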


We’ve had great comments and reviews during the past year, and I’d like to thank you again for your interest and help. I’m looking forward to taking this tool further and making it better together as a community!


Would you expect clang -extract-api -x c++-header headers/cool_cxx_API.h ... to work too?

Unfortunately, C++ support is not there yet in extract-api. The command line you wrote is valid, but the AST consumer won’t visit C++-specific nodes, so information such as templates may be missing. The Symbol Graph serializer won’t be able to handle it either.

Hi Zixu,

This tool can definitely be useful for developers looking to extract and serialize API information, thank you!

I’m curious to know if there is any work being done on adding C++ support for this tool. I believe this would open up a whole new realm of possibilities for creating new tools using the ExtractAPI library.

Additionally, I’m wondering whether the tool supports parsing Doxygen comments, or whether this is something the consumer of the Symbol Graph JSON should handle. It would be great to have some more information on this.

Hi! If you were still interested, I added C++ support for ExtractAPI over the summer.


@evelez @zixu-w If I have a header file with “restrict" is there any way I can get the "” included in the json structured data?

restrict is a supported keyword. If the header is fed into clang then ExtractAPI should serialize it.

Sorry, the formatting did not come through in my last comment. If I had "__restrict" (with a leading double underscore), is there a way for the JSON output to show the double underscore?

I’m not sure if ExtractAPI supports those GCC attributes. @daniel-grumberg

Does ExtractAPI support any words with a double underscore in front of them?

We don’t surface __restrict on pointers in any way, do you have a particular use case for this feature?

@evelez We want to integrate clang-extract-api into the new headergen system. The goal is to call clang-extract-api on a .h file → receive the JSON structure → generate either YAML and then C headers, or C headers directly. However, one issue is that when we generate the C header files we want to include the library’s macros and enums. Is there any way to include those with clang-extract-api?

Right now no attributes are supported. They are not read by ExtractAPI. It does seem like something that could be implemented from a quick glance but I’m not sure if it’s something that’d work downstream right now. @daniel-grumberg