Clustering source files to work around long clangd indexing time

My code base is huge and takes a day for clangd to index (if clangd survives and doesn’t crash partway through). Using the index server is also not an option, because users work on isolated machines. So I’m trying to be “smart” about it and exploit the fact that users only open a handful of files in the IDE: why not guess which files they will open and index only those, by custom-crafting the CDB (compile_commands.json) accordingly?
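The “custom-crafted CDB” step amounts to a filter over compile_commands.json. A minimal sketch, assuming you already have the set of source files to keep (the function name `subset_cdb` is hypothetical). Per the JSON Compilation Database format, an entry’s `file` may be relative to its `directory`, so both sides are normalized before comparing:

```python
from __future__ import annotations

import os
from typing import Iterable


def subset_cdb(entries: list[dict], keep_files: Iterable[str]) -> list[dict]:
    """Keep only the compile_commands.json entries whose source file is in keep_files.

    keep_files is assumed to hold absolute paths; each entry's "file" is
    resolved against its "directory" (a no-op if "file" is already absolute).
    """
    keep = {os.path.normpath(os.path.abspath(f)) for f in keep_files}
    result = []
    for entry in entries:
        path = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        if path in keep:
            result.append(entry)
    return result
```

Usage would be `json.load` on the full compile_commands.json, `subset_cdb(full, cluster_files)`, and `json.dump` of the result into the directory clangd is pointed at (for example via its `--compile-commands-dir` option).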
This plan will require me to customize the vscode-clangd extension and also to run some clustering algorithm over the dependency tree (the clustering will certainly take a while to finish, but it only has to run once; after that I can just apply the changes).
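The post doesn’t name a clustering algorithm, so here is one hypothetical heuristic, not a recommendation: union source files that share a header, but ignore headers included by more than `max_fanin` sources, since a project-wide header would otherwise merge everything into one cluster. It assumes an `includes` map (source file to the set of headers it includes) obtained from somewhere else, and uses union-find:

```python
from __future__ import annotations

from collections import defaultdict


def cluster_sources(includes: dict[str, set[str]], max_fanin: int) -> list[set[str]]:
    """Group sources that share a header included by at most max_fanin sources."""
    parent = {src: src for src in includes}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Invert the map: header -> sources that include it.
    users: dict[str, list[str]] = defaultdict(list)
    for src, headers in includes.items():
        for h in headers:
            users[h].append(src)

    # Only "narrow" headers link their users together.
    for srcs in users.values():
        if len(srcs) <= max_fanin:
            for other in srcs[1:]:
                union(srcs[0], other)

    groups: dict[str, set[str]] = defaultdict(set)
    for src in includes:
        groups[find(src)].add(src)
    return list(groups.values())
```

`max_fanin` trades cluster size against how often a user will step outside their cluster; it would need tuning against the real include graph.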
Then, each time a user opens a file, I can start from the cluster that contains that file, and only if they later open a header that no .c file in the cluster includes will I restart clangd with a new CDB containing the files from all clusters that depend on that header. Gradually, over time, I can learn from users’ behavior and be smarter about cluster selection…
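The header rule above can be sketched as a small decision function. It assumes a hypothetical `clusters` map (cluster id to source files) from the offline step and an `includes` map (source file to the headers it includes, transitively). It returns `None` when the current CDB already covers the header, otherwise the set of sources for the new CDB:

```python
from __future__ import annotations


def files_for_header(
    header: str,
    active_cluster: int,
    clusters: dict[int, set[str]],
    includes: dict[str, set[str]],
) -> set[str] | None:
    """Decide which sources to index after `header` is opened."""

    def depends(src: str) -> bool:
        return header in includes.get(src, set())

    # Some source in the active cluster already pulls the header in: no restart needed.
    if any(depends(src) for src in clusters[active_cluster]):
        return None
    # Keep the active cluster so files that are already open stay indexed,
    # then add every cluster with at least one source that includes the header.
    needed = set(clusters[active_cluster])
    for members in clusters.values():
        if any(depends(src) for src in members):
            needed |= members
    return needed
```

Keeping the active cluster is a design choice of this sketch; drop that line to follow the description literally (only the clusters that depend on the header).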

Three questions:
1- How can I dump the dependency tree from clangd after it has indexed all of the files?
2- Has anyone done anything like this before? Any pointers?
3- Other than restarting clangd with a new CDB, is there a way to just tell the running instance about an updated CDB?

3- Other than restarting clangd with a new CDB, is there a way to just tell the running instance about an updated CDB?

I haven’t tried it, but my understanding from reading the code is that clangd watches the compile_commands.json file for changes and the background indexer listens for those changes, so it should work automatically.
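If that reading of the code holds, one practical concern is clangd picking up compile_commands.json while it is only half written. A common way to avoid that, offered here as an assumption rather than something verified against clangd’s reload logic, is to write the new CDB to a temporary file in the same directory and rename it over the old one (`replace_cdb` is a hypothetical name):

```python
import json
import os
import tempfile


def replace_cdb(path: str, entries: list) -> None:
    """Atomically replace the compile_commands.json at `path` with `entries`.

    The rename is atomic on POSIX filesystems when source and target are in
    the same directory, so a reader sees either the old or the new file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".cdb-", suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(entries, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the temp file if anything failed before the rename
        raise
```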

1- How can I dump the dependency tree from clangd after it has indexed all of the files?

You would probably need to patch clangd to add this functionality.
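A different technique that avoids patching clangd, swapped in here rather than suggested in the thread: rerun each CDB entry through the compiler’s preprocessor with `-MM` (or `-M` to include system headers), after dropping the entry’s `-o <file>` option, and parse the Make-style dependency rules it prints. This costs a preprocessing pass over every translation unit, which should be considerably cheaper than a full index but is not free. A minimal parser sketch (`parse_depfile` is hypothetical; it handles backslash-newline continuations and escaped spaces only, not other Make escapes or Windows drive-letter paths):

```python
from __future__ import annotations

import re


def parse_depfile(text: str) -> dict[str, list[str]]:
    """Parse `target: dep dep ...` rules into target -> list of prerequisites."""
    # Join continuation lines ("\" at end of line) into one logical line.
    joined = re.sub(r"\\\r?\n", " ", text)
    result: dict[str, list[str]] = {}
    for line in joined.splitlines():
        # Split on whitespace that is not backslash-escaped, then unescape "\ ".
        tokens = [t.replace("\\ ", " ") for t in re.split(r"(?<!\\)\s+", line.strip()) if t]
        if not tokens or not tokens[0].endswith(":"):
            continue
        target = tokens[0][:-1]
        result.setdefault(target, []).extend(tokens[1:])
    return result
```

Since the first prerequisite is the source file itself, `{deps[0]: set(deps[1:]) for deps in parse_depfile(out).values()}` yields the source-to-headers map that a clustering step would need.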