I want to add some custom built-in functions to Clang (functions for which I need some compiler support, for example “get the number of member variables of a class”), and want to do it using a plugin (mainly for iteration time and distribution reasons).
The Clang plugin system currently doesn’t allow this. Today, the only way to add such a built-in is to define a new token in TokenKinds.def and plug custom code into several places in Clang.
So, I would like to add support for it.
I started working on it locally, and currently have a basic working flow. The main steps in it are:
- Adding custom token support
  - Add a new interface class, CustomTokenHandler. It will have a set of virtual functions that serve as entry points from the various places where we handle tokens, plus functions to expose the handled keyword. From my tests so far, I think we need entry points in ParseTopLevelDecl and ParseCastExpression. Others can be added as needed.
  - Add a new registry to register these handlers.
  - Have the Preprocessor list all the plugins for custom tokens, associate each handler with an internal CustomTokenId, and add an entry in the IdentifierTable to insert the new token.
  - Add a CustomTokenId field in IdentifierInfo. (We have 29 bits left; using 13 bits here would allow several thousand custom tokens and leave 16 bits free. It would be the same size as ObjCOrBuiltinID.) This id will be stored when the Preprocessor calls the identifier table, and will be used to retrieve the correct handler.
  - Add a new token type, custom_token. Every custom token will have this type.
  - At the places where we handle tokens, add a new case in the switch statement for custom_token (this means there is no overhead when not using a custom token). In this case, we retrieve the handler from the Preprocessor and call the relevant virtual function on it, instead of doing the static treatment.
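To make the token flow above more concrete, here is a minimal, standalone sketch. It does not use the real Clang classes: Preprocessor, IdentifierInfo and TokenKind are simplified stand-ins, the `__member_count` keyword and the method names (handleCastExpression, registerHandler, lookUp, getHandler) are hypothetical, and reserving id 0 for "not a custom token" is my own assumption.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified stand-ins for Clang types; every name here is hypothetical.
enum class TokenKind { identifier, custom_token };

struct IdentifierInfo {
  TokenKind Kind;
  unsigned CustomTokenId : 13; // 13 bits; 0 is assumed to mean "no handler"
  IdentifierInfo() : Kind(TokenKind::identifier), CustomTokenId(0) {}
};

// Interface a plugin would implement for each custom keyword.
class CustomTokenHandler {
public:
  virtual ~CustomTokenHandler() = default;
  virtual std::string getKeyword() const = 0;
  // Entry points return true when the handler produced a result.
  virtual bool handleTopLevelDecl(std::string &) { return false; }
  virtual bool handleCastExpression(std::string &) { return false; }
};

class Preprocessor {
  std::vector<std::unique_ptr<CustomTokenHandler>> Handlers; // index = id - 1
  std::unordered_map<std::string, IdentifierInfo> Identifiers;

public:
  static constexpr unsigned MaxCustomTokens = (1u << 13) - 1;

  // Associates the handler with a fresh id and marks its keyword as custom.
  bool registerHandler(std::unique_ptr<CustomTokenHandler> H) {
    assert(H && "handler must not be null");
    if (Handlers.size() >= MaxCustomTokens)
      return false;
    IdentifierInfo &II = Identifiers[H->getKeyword()];
    Handlers.push_back(std::move(H));
    II.Kind = TokenKind::custom_token;
    II.CustomTokenId = static_cast<unsigned>(Handlers.size());
    return true;
  }

  const IdentifierInfo &lookUp(const std::string &Name) {
    return Identifiers[Name];
  }

  CustomTokenHandler *getHandler(const IdentifierInfo &II) const {
    return II.CustomTokenId ? Handlers[II.CustomTokenId - 1].get() : nullptr;
  }
};

// Mimics the extra switch case: only custom tokens pay for a virtual call.
std::string parseCastExpression(Preprocessor &PP, const std::string &Tok) {
  const IdentifierInfo &II = PP.lookUp(Tok);
  switch (II.Kind) {
  case TokenKind::custom_token: {
    std::string Result;
    CustomTokenHandler *H = PP.getHandler(II);
    if (H && H->handleCastExpression(Result))
      return Result;
    return "<error>";
  }
  case TokenKind::identifier:
    return "identifier:" + Tok;
  }
  return "<error>";
}

// Example plugin-side handler for a hypothetical keyword.
struct MemberCountHandler : CustomTokenHandler {
  std::string getKeyword() const override { return "__member_count"; }
  bool handleCastExpression(std::string &Out) override {
    Out = "custom:__member_count";
    return true;
  }
};
```

In this sketch the result is just a string. In the actual proposal the entry points would return an ExprResult or a declaration, and the id would live in the spare bits of the real IdentifierInfo.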
With this, we can output a simple ExprResult as the result of our token being parsed. However, this will not be enough in many situations, mostly because of dependent types in templates. An ExprResult is re-evaluated by TreeTransform once the template types are resolved, so we need our own custom expression and handler for this step.
- Adding custom stmt/expr support
  - We add CustomStmt/CustomExpr virtual classes. They’re basically Stmt/Expr classes with an additional CustomId field, meant to be extended by the plugin developer to store additional information.
  - We add CustomStmtHandler/CustomExprHandler interfaces, and a new registry for them. Like CustomTokenHandler, they will have virtual functions. For now, we only need a Transform function, for which the static version is called from a templated TreeTransform function, so…
  - We introduce a new TransformPluginEntryPoint interface. This interface is passed to the Transform function of the CustomStmt/ExprHandler, and exposes several TreeTransform functions that the plugin will probably need. Mainly, these are TransformType, TransformExpr and TransformStmt. The implementation is only a simple wrapper.
  - Have Sema list all the plugins for custom statements, and associate an internal CustomStmtId with each of them.
  - CustomTokenHandlers may return a CustomStmt (or a subclass of it) with the correct CustomStmtId, to have it handled by the corresponding CustomStmtHandler.
  - The TreeTransform, when transforming a CustomStmt/CustomExpr, will retrieve the handler from Sema, create a TransformPluginEntryPoint, and call the Transform function.
  - We add entry points for CustomStmt/CustomExpr where necessary. For now, the relevant places seem to be ASTReader/WriterStmt, StmtPrinter and StmtProfile.
  - Still not 100% sure about this, but the internal CustomStmtId may be a StringRef chosen by the plugin instead of a runtime numeric id. That would give the CustomTokenHandler a straightforward way to know the id of the CustomStmt it wants to create, and would make it possible to serialize/deserialize the statements in ASTReader/WriterStmt, but it adds some overhead for looking up the handler from the id.
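The expression side can be sketched in the same spirit. This is a minimal standalone illustration, not real Clang code: Type, Sema and the transform are simplified stand-ins, the "member_count" id and the MemberCountExpr class are hypothetical, and I assume the string-id variant discussed in the last point. TransformPluginEntryPoint only exposes TransformType here to keep the example short.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-in for a (possibly dependent) type.
struct Type {
  std::string Name;
  bool Dependent;
};

// Thin wrapper exposing a subset of the tree transform to plugins.
class TransformPluginEntryPoint {
public:
  virtual ~TransformPluginEntryPoint() = default;
  virtual Type TransformType(const Type &T) = 0;
};

// Base class for plugin expressions, carrying a plugin-chosen string id.
class CustomExpr {
  std::string CustomId;

public:
  explicit CustomExpr(std::string Id) : CustomId(std::move(Id)) {}
  virtual ~CustomExpr() = default;
  const std::string &getCustomId() const { return CustomId; }
};

class CustomExprHandler {
public:
  virtual ~CustomExprHandler() = default;
  virtual std::unique_ptr<CustomExpr>
  Transform(const CustomExpr &E, TransformPluginEntryPoint &TP) = 0;
};

// Stand-in for Sema: owns the handlers, keyed by their string id.
class Sema {
  std::map<std::string, std::unique_ptr<CustomExprHandler>> Handlers;

public:
  void registerHandler(const std::string &Id,
                       std::unique_ptr<CustomExprHandler> H) {
    assert(H && "handler must not be null");
    Handlers[Id] = std::move(H);
  }
  CustomExprHandler *getHandler(const std::string &Id) const {
    auto It = Handlers.find(Id);
    return It == Handlers.end() ? nullptr : It->second.get();
  }
};

// Stand-in for template instantiation: replaces dependent types.
class SubstitutingTransform : public TransformPluginEntryPoint {
  std::string Replacement;

public:
  explicit SubstitutingTransform(std::string R) : Replacement(std::move(R)) {}
  Type TransformType(const Type &T) override {
    if (T.Dependent)
      return Type{Replacement, false};
    return T;
  }
};

// What the tree transform would do when it meets a CustomExpr.
std::unique_ptr<CustomExpr> transformCustomExpr(const Sema &S,
                                                const CustomExpr &E,
                                                TransformPluginEntryPoint &TP) {
  CustomExprHandler *H = S.getHandler(E.getCustomId());
  return H ? H->Transform(E, TP) : nullptr;
}

// Plugin side: an expression that remembers the queried type.
struct MemberCountExpr : CustomExpr {
  Type Queried;
  explicit MemberCountExpr(Type T)
      : CustomExpr("member_count"), Queried(std::move(T)) {}
};

struct MemberCountExprHandler : CustomExprHandler {
  std::unique_ptr<CustomExpr>
  Transform(const CustomExpr &E, TransformPluginEntryPoint &TP) override {
    // Safe here because the handler is only registered for "member_count".
    const auto &MC = static_cast<const MemberCountExpr &>(E);
    return std::make_unique<MemberCountExpr>(TP.TransformType(MC.Queried));
  }
};
```

The map lookup in getHandler is where the string id costs more than a numeric one. A numeric id would turn it into a vector index, but the plugin would then need some other way to learn its id at parse time and across serialization.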
All this should add no overhead when not using custom tokens. When using them, it will add virtual calls for the custom tokens and statements only.
Do you think adding support for custom built-in functions in plugins is a reasonable objective?
Do you think my approach is viable?
Do you have any comments or advice?
Do you know of a workaround to do this without my modifications?
If it’s ok, I can start sending some small patches for review so we can discuss the details of the implementation.