I can’t speak for Sam, but I can offer my 2c on your questions:
I would like to have a generic code indexer that could create a database of all entities defined in a codebase, along with backlinks to their points of definition in source code: functions and methods with all their arguments and call sites, structs and classes, all template instantiations, maybe variables, etc.
clangd-indexer seems to already be doing a notable part of this job, but unfortunately for me, not all of it. I’m not at all familiar with clangd-indexer internals, so I’m wondering: is it a good idea to take clangd-indexer as the foundation of such a solution?
I think that depends on how you intend to use the database.
Clangd’s index is fairly specialized for answering the types of queries that come up during clangd’s usage, such as “find all symbols with this name in the project” or “find all references to this symbol in the project”.
One can imagine interesting semantic queries to pose to such a database, such as “find all uses of this type in this project” or “given this template, find all its instantiations in the project”, which clangd’s index is not designed to answer, and likely could not be made to answer without significant changes (like introducing some sort of serialized representation of types and other semantic entities besides symbols).
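To make the distinction concrete, here is a toy, hypothetical model of a symbol-centric index. It is not clangd’s real data layout or API (clangd is C++, and names like `ToySymbolIndex`, `find_by_name` and `refs_of` are invented for illustration); it only assumes that entries are keyed by symbol and that references are source locations attached to a symbol. Under that assumption, the two example queries above are simple lookups:

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy model of a symbol-centric index. This is NOT clangd's
# real data layout; it only mirrors the idea that every entry is keyed by
# a symbol, and references are source locations attached to that symbol.


@dataclass(frozen=True)
class Location:
    file: str
    line: int


@dataclass(frozen=True)
class Symbol:
    id: str               # stand-in for a stable symbol ID (here: qualified name)
    name: str             # unqualified name, e.g. "Widget"
    scope: str            # enclosing scope, e.g. "app::"
    definition: Location  # where the symbol is defined


class ToySymbolIndex:
    def __init__(self):
        self._symbols = {}              # symbol id -> Symbol
        self._refs = defaultdict(list)  # symbol id -> [Location]

    def add_symbol(self, sym):
        self._symbols[sym.id] = sym

    def add_ref(self, sym_id, loc):
        self._refs[sym_id].append(loc)

    # "Find all symbols with this name in the project."
    def find_by_name(self, name):
        return [s for s in self._symbols.values() if s.name == name]

    # "Find all references to this symbol in the project."
    def refs_of(self, sym_id):
        # .get() avoids inserting an empty entry into the defaultdict
        return list(self._refs.get(sym_id, []))


if __name__ == "__main__":
    index = ToySymbolIndex()
    index.add_symbol(Symbol("app::Widget", "Widget", "app::", Location("widget.h", 3)))
    index.add_symbol(Symbol("ui::Widget", "Widget", "ui::", Location("ui.h", 10)))
    index.add_ref("app::Widget", Location("main.cpp", 12))
    index.add_ref("app::Widget", Location("main.cpp", 20))

    print([s.scope for s in index.find_by_name("Widget")])  # ['app::', 'ui::']
    print(index.refs_of("app::Widget"))
```

What the model cannot express is the point: nothing in it records the type of an expression or the template arguments an instantiation was created with, so queries like “find all uses of this type” or “find all instantiations of this template” have no data to look up. Supporting them would mean adding new kinds of entries, not just new query functions.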
So I think it really depends on what missing functionality you’d like the index to provide. If you elaborate on this, I can try to offer further thoughts.
Is it extendable enough?
Depends on how you’re looking to extend it. If you’re hoping to use the upstream indexer code unmodified, and extend it in your own codebase by inheritance or composition without modifying the upstream sources, it’s probably not suitable for that.
If instead you plan to keep a copy or fork of clangd’s code and modify it, that could work (depending on the new functionality you want, you may be able to implement it as incremental modifications to the existing code, without a significant rewrite).
Are its internals sufficiently well documented?
There is some documentation on this page, and comments in the code headers such as this. You can also ask further questions about it here or in the #clangd Discord channel.
If you end up working with the indexer and notice gaps in the documentation, patches that fill them and would aid future efforts along these lines are always welcome.