[RFC] Add custom built-in functions using a plugin

Hello,

I want to add some custom built-in functions to Clang (functions for which I need some compiler support, for example “get the number of member variables of a class”), and want to do it using a plugin (mainly for iteration time and distribution reasons).
Clang's plugin system currently doesn’t allow this: the only way to add such a function is to define a new token in TokenKinds.def and hook custom code into several places in Clang.
So, I would like to add support for it.

I started working on it locally, and currently have a basic working flow. The main steps in it are:

  • Adding custom token support
  • Add a new interface class, CustomTokenHandler. It will have a bunch of virtual functions, which will be entry points from various points where we handle tokens, and functions to expose the handled keyword. For now, from my tests I think we need entry points in ParseTopLevelDecl and ParseCastExpression. Others can be added as needed.
  • Add a new registry to register these handlers.
  • Have Preprocessor list all the plugins for custom tokens, and make an association between handlers and an internal CustomTokenId, and add an entry in the IdentifierTable to insert the new token.
  • Add a CustomTokenId field in IdentifierInfo. (We have 29 bits left; using 13 bits here would allow several thousand custom tokens, 2^13 = 8192 ids, and leave 16 bits free. It would be the same size as ObjCOrBuiltinID.) This id will be stored when the Preprocessor calls the identifier table, and be used to retrieve the correct handler.
  • Add a new token type custom_token. Every custom token will have this type.
  • At places where we handle tokens, we add a new case in the switch statement for custom_token (this means there is no overhead when not using a custom token). In this case, we retrieve the handler from the Preprocessor, and call the relevant virtual function on it, instead of doing the static treatment.
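
To make the flow above concrete, here is a minimal standalone sketch of the registration and dispatch steps. It does not use any Clang headers: PreprocessorSketch, IdentifierInfoSketch, parseCastExpressionSketch, MemberCountHandler and the __member_count keyword are all hypothetical stand-ins, and the handler returns a string where a real one would return an ExprResult. Only the shape of the idea is taken from the proposal (a 13-bit id stored with the identifier, one custom_token kind, one virtual call per custom token).

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins: none of these are real Clang types.
enum class TokenKind { identifier, custom_token };

// 13 bits of id space as proposed; id 0 is reserved for "not a custom token".
constexpr unsigned CustomTokenIdBits = 13;
constexpr std::uint32_t MaxCustomTokenId = (1u << CustomTokenIdBits) - 1;

struct IdentifierInfoSketch {
  TokenKind Kind = TokenKind::identifier;
  std::uint32_t CustomTokenId : CustomTokenIdBits;
  IdentifierInfoSketch() : CustomTokenId(0) {}
};

// Interface implemented by the plugin.
class CustomTokenHandler {
public:
  virtual ~CustomTokenHandler() = default;
  // The keyword this handler wants to own.
  virtual std::string getKeyword() const = 0;
  // Entry point called from expression parsing (string instead of ExprResult).
  virtual std::string parseCastExpression() = 0;
};

class PreprocessorSketch {
  std::vector<std::unique_ptr<CustomTokenHandler>> Handlers; // index = id - 1
  std::unordered_map<std::string, IdentifierInfoSketch> Identifiers;

public:
  // Associates the handler with a fresh id and marks its keyword as custom.
  std::uint32_t registerHandler(std::unique_ptr<CustomTokenHandler> H) {
    assert(Handlers.size() < MaxCustomTokenId && "out of custom token ids");
    std::string Keyword = H->getKeyword();
    Handlers.push_back(std::move(H));
    std::uint32_t Id = static_cast<std::uint32_t>(Handlers.size());
    IdentifierInfoSketch &II = Identifiers[Keyword];
    II.Kind = TokenKind::custom_token;
    II.CustomTokenId = Id;
    return Id;
  }

  // Unknown names are created as plain identifiers.
  const IdentifierInfoSketch &lookup(const std::string &Name) {
    return Identifiers[Name];
  }

  CustomTokenHandler &getHandler(std::uint32_t Id) { return *Handlers[Id - 1]; }
};

// One extra case in the switch; other token kinds take the usual static path.
std::string parseCastExpressionSketch(PreprocessorSketch &PP,
                                      const std::string &Spelling) {
  const IdentifierInfoSketch &II = PP.lookup(Spelling);
  switch (II.Kind) {
  case TokenKind::custom_token:
    return PP.getHandler(II.CustomTokenId).parseCastExpression();
  case TokenKind::identifier:
    return "identifier:" + Spelling;
  }
  return std::string();
}

// Example plugin-side handler for a hypothetical __member_count keyword.
class MemberCountHandler : public CustomTokenHandler {
public:
  std::string getKeyword() const override { return "__member_count"; }
  std::string parseCastExpression() override { return "custom:__member_count"; }
};
```

With a MemberCountHandler registered, parseCastExpressionSketch(PP, "__member_count") goes through the handler, while any other spelling stays on the ordinary identifier path and never touches the handler table.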

With this, we can output some simple ExprResult as a result of our token being parsed. However, this will not be enough in many situations, mostly because of dependent types in templates. ExprResults are re-evaluated by TreeTransform after template type resolution, so we need to use our custom ExprResult and handler for this.

  • Adding custom stmt/expr support
  • We add CustomStmt/CustomExpr virtual classes. They’re basically Stmt/Expr classes, with an additional CustomId field, and meant to be extended by the plugin developer for them to store additional information.
  • We add CustomStmtHandler/CustomExprHandler interfaces, and a new registry for them. Similarly to CustomTokenHandler, it will have virtual functions. For now, we only need Transform function, for which the static version is called from a templated TreeTransform function, so…
  • We introduce a new TransformPluginEntryPoint interface. This interface is passed to the Transform function of the CustomStmt/ExprHandler, and exposes several functions of the TreeTransform which will probably be needed by the plugin. Mainly, these are: TransformType, TransformExpr, TransformStmt. The implementation is only a simple wrapper.
  • Have Sema list all the plugins for custom statements, and associate an internal CustomStmtId with each of them.
  • CustomTokenHandlers may return a CustomStmt (or subclasses of them) with the correct CustomStmtId, to have them handled by the corresponding CustomStmtHandler.
  • The TreeTransform, when transforming CustomStmt/CustomExpr, will retrieve the handler from Sema, create a TransformPluginEntryPoint, and call the Transform function.
  • We add entry points for CustomStmt/CustomExpr where necessary. For now the relevant places seem to be ASTReader/WriterStmt, StmtPrinter and StmtProfile.
  • Still not 100% sure about this, but the internal CustomStmtId may be a StringRef chosen by the plugin instead of a runtime numeric id - it would give the CustomTokenHandler a straightforward way to know the id of the CustomStmt it wants to create, and make it possible to serialize/deserialize the statements in ASTReader/WriterStmt, but it will add some overhead for the id-to-handler lookup.
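
The transform path described above could look roughly like the following standalone sketch. Everything here is a hypothetical stand-in rather than Clang code: types are modelled as plain strings, TreeTransformSketch only substitutes template parameter names, and SemaSketch keys handlers by a plugin-chosen string id (the StringRef variant from the last bullet). The point is only to show the narrow TransformPluginEntryPoint wrapper sitting between the handler and the transform.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical custom expression: a plugin-chosen id plus plugin payload.
struct CustomExprSketch {
  std::string CustomId; // string id chosen by the plugin
  std::string TypeName; // operand type, possibly a template parameter
};

// Narrow view of the transform that is handed to plugins.
class TransformPluginEntryPoint {
public:
  virtual ~TransformPluginEntryPoint() = default;
  virtual std::string TransformType(const std::string &Ty) = 0;
};

// Interface implemented by the plugin for its custom expressions.
class CustomExprHandler {
public:
  virtual ~CustomExprHandler() = default;
  virtual CustomExprSketch Transform(const CustomExprSketch &E,
                                     TransformPluginEntryPoint &EP) = 0;
};

// Stand-in for Sema: owns the id-to-handler association.
class SemaSketch {
  std::map<std::string, std::unique_ptr<CustomExprHandler>> Handlers;

public:
  void registerHandler(const std::string &Id,
                       std::unique_ptr<CustomExprHandler> H) {
    assert(H && "handler must not be null");
    Handlers[Id] = std::move(H);
  }
  CustomExprHandler *findHandler(const std::string &Id) const {
    auto It = Handlers.find(Id);
    return It == Handlers.end() ? nullptr : It->second.get();
  }
};

// Stand-in for TreeTransform: replaces template parameter names by arguments.
class TreeTransformSketch {
  SemaSketch &Sema;
  std::map<std::string, std::string> Substitutions;

public:
  TreeTransformSketch(SemaSketch &S, std::map<std::string, std::string> Subst)
      : Sema(S), Substitutions(std::move(Subst)) {}

  std::string TransformType(const std::string &Ty) const {
    auto It = Substitutions.find(Ty);
    return It == Substitutions.end() ? Ty : It->second;
  }

  // Retrieve the handler, wrap ourselves in an entry point, and delegate.
  CustomExprSketch TransformCustomExpr(const CustomExprSketch &E);
};

// The wrapper only forwards to the transform it was created from.
class EntryPointWrapper : public TransformPluginEntryPoint {
  const TreeTransformSketch &Self;

public:
  explicit EntryPointWrapper(const TreeTransformSketch &T) : Self(T) {}
  std::string TransformType(const std::string &Ty) override {
    return Self.TransformType(Ty);
  }
};

CustomExprSketch
TreeTransformSketch::TransformCustomExpr(const CustomExprSketch &E) {
  CustomExprHandler *H = Sema.findHandler(E.CustomId);
  if (!H)
    return E; // no plugin claims this id: leave the node untouched
  EntryPointWrapper EP(*this);
  return H->Transform(E, EP);
}

// Example plugin handler: rebuilds the node with its operand type transformed.
class MemberCountExprHandler : public CustomExprHandler {
public:
  CustomExprSketch Transform(const CustomExprSketch &E,
                             TransformPluginEntryPoint &EP) override {
    return CustomExprSketch{E.CustomId, EP.TransformType(E.TypeName)};
  }
};
```

Because the handler only sees TransformPluginEntryPoint, the plugin never needs to be compiled against the templated transform itself, which is the reason the proposal introduces the wrapper.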

All this should add no overhead when not using custom tokens. When using them, it will add virtual calls for the custom tokens and statements only.

Do you:

  • Think adding support for custom built-in functions in plugins is a reasonable objective?

  • Think my approach is viable?

  • Have any comment/advice?

  • Have a workaround to do this without my modifications?

If it’s ok, I can start sending some small patches for review with the details of the implementation.

Is adding support for custom tokens really needed for custom builtin functions? If your builtin function behaves like a function call (i.e. like __builtin_expect or whatever) I would expect that you just need to modify Builtin::Context to add a fourth ‘custom’ kind of builtin (add CustomRecords member, have all IDs after those used by AuxTSRecords be used by CustomRecords) and have some way for a plugin to add builtins to that list.
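
As a rough illustration of that ID layout, here is a standalone sketch. It is not Clang's Builtin::Context: BuiltinContextSketch, addCustomBuiltin, isCustom and the __builtin_member_count name are hypothetical, records are reduced to names, and IDs start at 0 for simplicity (the real class differs, for instance by reserving an ID for "not a builtin"). It only shows custom records taking the IDs that follow the auxiliary-target ones.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of a builtin table with a fourth, plugin-filled range.
class BuiltinContextSketch {
  std::vector<std::string> Shared;        // target-independent builtins
  std::vector<std::string> TSRecords;     // main-target builtins
  std::vector<std::string> AuxTSRecords;  // auxiliary-target builtins
  std::vector<std::string> CustomRecords; // builtins added by plugins

public:
  BuiltinContextSketch(std::vector<std::string> SharedNames,
                       std::vector<std::string> TSNames,
                       std::vector<std::string> AuxNames)
      : Shared(std::move(SharedNames)), TSRecords(std::move(TSNames)),
        AuxTSRecords(std::move(AuxNames)) {}

  // First ID after everything used by the three existing ranges.
  unsigned firstCustomID() const {
    return static_cast<unsigned>(Shared.size() + TSRecords.size() +
                                 AuxTSRecords.size());
  }

  // Called on behalf of a plugin; returns the ID assigned to the new builtin.
  unsigned addCustomBuiltin(std::string Name) {
    CustomRecords.push_back(std::move(Name));
    return firstCustomID() + static_cast<unsigned>(CustomRecords.size()) - 1;
  }

  bool isCustom(unsigned ID) const {
    return ID >= firstCustomID() &&
           ID < firstCustomID() + CustomRecords.size();
  }

  // Walks the ranges in order, subtracting each range's size as it goes.
  const std::string &getName(unsigned ID) const {
    std::size_t Index = ID;
    if (Index < Shared.size())
      return Shared[Index];
    Index -= Shared.size();
    if (Index < TSRecords.size())
      return TSRecords[Index];
    Index -= TSRecords.size();
    if (Index < AuxTSRecords.size())
      return AuxTSRecords[Index];
    Index -= AuxTSRecords.size();
    assert(Index < CustomRecords.size() && "invalid builtin ID");
    return CustomRecords[Index];
  }
};
```

Existing IDs keep their meaning in this layout, so code that only knows about the first three ranges is unaffected by plugins registering more builtins.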

When it comes to actually doing something with a use of the builtin, you can already define a PluginASTAction to go before the main AST action in order to modify the AST (by having getActionType return AddBeforeMainAction), and I did an experiment where I had the PluginASTAction use an AST consumer which:

  • Defined a HandleTopLevelDecl method which uses a RecursiveASTVisitor to visit decls

  • The RecursiveASTVisitor defines a VisitStmt method which iterates over the children() of each stmt

  • If that child is a CallExpr replace it with an IntegerLiteral

… and that seemed to work. If you replace that last step with “if that child is a CallExpr which calls my custom builtin, then replace it with the appropriate expression” then maybe that would get you what you want?
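
The replacement step can be sketched on a toy tree as below. This is not the experiment's code and does not use RecursiveASTVisitor or any Clang AST class: StmtSketch, replaceBuiltinCalls and the __builtin_member_count name are hypothetical, and the replacement value is passed in rather than computed. It only illustrates "iterate over the children of each statement and swap matching calls for an integer literal".

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature statement tree standing in for Stmt/CallExpr.
struct StmtSketch {
  enum KindTy { Call, IntegerLiteral, Compound };
  KindTy Kind;
  std::string Callee; // only meaningful for Call
  long long Value;    // only meaningful for IntegerLiteral
  std::vector<std::unique_ptr<StmtSketch>> Children;

  explicit StmtSketch(KindTy K, std::string C = "", long long V = 0)
      : Kind(K), Callee(std::move(C)), Value(V) {}
};

// Visits the children of Parent recursively. A child that is a call to
// BuiltinName is replaced in place by an integer literal; any other child
// is descended into. Returns the number of replacements made.
unsigned replaceBuiltinCalls(StmtSketch &Parent, const std::string &BuiltinName,
                             long long Replacement) {
  unsigned Replaced = 0;
  for (std::unique_ptr<StmtSketch> &Child : Parent.Children) {
    assert(Child && "children must be non-null");
    if (Child->Kind == StmtSketch::Call && Child->Callee == BuiltinName) {
      Child.reset(new StmtSketch(StmtSketch::IntegerLiteral, "", Replacement));
      ++Replaced;
    } else {
      Replaced += replaceBuiltinCalls(*Child, BuiltinName, Replacement);
    }
  }
  return Replaced;
}
```

The replacement happens through the parent's child slot, which mirrors why the experiment looks at the children() of each statement instead of the statement being visited.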

John

Thanks, I missed the builtins because I was looking at how type_traits are implemented.

However, in my example, we would need to call the function on a type - like type_traits. I would need to add a way to pass types to it, check whether CallExpr can handle this, and make sure they are correctly transformed in TreeTransform.

It may be a little more work than adding a custom token kind, but it would probably make it easier to add further custom builtins afterwards, avoid a lot of trouble with possibly missing entry points, and be more consistent with how built-ins are handled.

I’m wondering about the performance impact of using a RecursiveASTVisitor instead of having a CustomExpr handled in a single pass - it may not be that huge, but I will test.

Rudy