Implementing symbols in document (textDocument/documentSymbol)

Hi,

I’ve been thinking about how to implement the “symbols in document” feature. I think it would make a lot of sense to reuse the in-memory index for this. One problem is that there is no current way to do “file-major” queries, i.e. get all symbols within one file.

I see a few options:

  1. Change the YAML format, in-memory model and index interface to allow file-major queries. This is the most work. On the other hand, I don’t think it’s worth investing a ton of effort in the YAML format.
  2. Change only the “in-memory” index and interface for file-major queries. Not sure how feasible that is, but perhaps a good solution.
  3. Run a new SymbolCollector every time on textDocument/documentSymbol, only on the needed file. That way there is a lot less symbol to filter out (only need the ones in the main file).
  4. Iterate through all symbols, keep only the ones in the correct file. This is basically workspace/symbol with another “file” parameter. This is likely the least amount of work.

I tried #4 and it seems to work fine but is probably not the most scalable so I am not sure how acceptable that is as an interim implementation.

Let me know what you think,
Marc-André

Hi Marc-André,

I find that our index is just not suited for file-centric queries at this point. E.g. it aggregates all redecls and chooses just one when creating the symbol, has the logic to filter out non-local symbols, etc. Moreover, it was design with project-wide queries in mind and changing it is definitely a lot of work.

For file-major queries, we have the AST and I think we should use it instead. So I suggest we:

  1. Traverse the AST of the current file to find all decls inside this file. The results will not contain everything, including local vars(do we need them BTW?), static functions, etc. We can also convert results to something more suitable for filtering, e.g. SymbolSlab. But I won’t put them into the index.
  2. Search through results from step 1 and filter out the ones that don’t match the query.

If step 1 turns out to be slow, we can compute the results once and stash them somewhere as an optimization. But given that the current file is usually small enough, we can probably even get away without it.

Thanks Ilya! I’ll try something like this. That was my initial idea before the index came in but I thought we could maximize code reuse by going the index way. My worry with using the AST approach is that it will duplicate not only some of the symbol collection code but also the some querying parts, i.e. matching prefixes, scoring, etc. Perhaps I’ll be able to extract some common logic for that or perhaps we can do something simpler for this type of query. I’ll let you know how it turns out.

Thanks again,

Marc-André

Hi Marc-André,

The need for common scoring between AST and index-based approaches is real :slight_smile: It shows up in code-completion already, as results come both from Sema/AST and from the index, and we want consistent scoring.

I’ve started to pull out aspects of this into Quality.h.
The idea is that we have a signals struct that describes scoring-relevant attributes, and is mostly agnostic to the source of a symbol. This can be used to produce and debug scores, and can be built by incrementally merging in any number of sources that describe the same symbol. (Currently SymbolQualitySignals accepts CodeCompletionResult and index Symbol as input, but it could/should also accept Decl and much of CodeCompletionResult would be implemented in terms of that).

This work isn’t complete but I think it might help with doc symbols.

Hi Sam,

The scoring changes you mention sound like a good improvement!

What I mentioned about scoring for document symbols turned out to not be relevant. I forgot that for this request, the filtering is done on the client side (there is no “query” string) and symbols should be ordered as seen in the file anyway. The fact that we have to keep symbols as ordered in files would be probably not make a lot of sense for the index. So, this makes me more confident that an AST-only approach is the way to go.

Cheers,

Marc-André