textDocument/didOpen, file tracking, reading files from disk, etc.

Folks,

I'm currently writing a bunch of extensions for Visual Studio 2019 to exercise the client-side functionality against different LSP servers. The VS2019 client code makes a textDocument/documentSymbol request just after initialization, which is perfectly fine according to the LSP spec. However, in Clangd, if it hadn't received the textDocument/didOpen request first, it returns an error because the file is untracked in DraftMgr--which is an equally fine interpretation of the spec. So, the VS2019 client never works with Clangd. I have a patch to Clangd to fix this, reading the file from the file system and then adding the doc so it is tracked. But the question is whether people are going to balk because from a purist point of view, Clangd only gets the file contents from the client. I have no control over the client because it's MS code, and I'd rather not write a client from scratch at this point--I'd rather spend time working on my server for Antlr and other programming languages. What do people think about having Clangd read from disk for any untracked docs?

Note, I'm sending this to the old mail list instead of Discourse or Discord or whatever people call it--not many are using it, the UI isn't easy, and not as simple as sending an email.

Ken Domino

Hi Ken,

The main difficulty here is that clangd builds and holds a clang AST for each file that’s open, and uses it to serve requests.
The C++ AST is often very large and expensive to build, though incremental rebuilds are usually cheap (preamble optimisation).
This means:

  • in practice, requests on non-opened files may be extremely expensive (e.g. 20 seconds CPU-bound)
  • we’d need to work out when the AST gets disposed for non-open files (for open files, this happens on didClose)

Whether we can find appropriate choices here depends a bit on what the workflow is - what does it use the documentSymbol results for? Are the documents actually open?

An alternative would be to build a somewhat ad-hoc solution like using a pseudo-parser to give approximate outlines for non-opened files.

balk because from a purist point of view, Clangd only gets the file contents from the client
That’s not a concern - clangd needs to read from disk e.g. to understand the contents of #includes.

Hi Ken,

In addition to the resource management concerns, I would argue that not requiring didOpen is a bug in the protocol and all the clients should do this.
Filesystem accesses are prone to races and communicating the contents of the open files through the protocol makes the interaction model between editors and language servers much “clearer” from the theoretical perspective.

Sending didOpen and didChange request should also be a very simple change in the LSP client code.

Even though you don’t control the client code, I suggest requesting this change from maintainers of the aforementioned clients.

This also aligns with how VSCode does things, which seems to be the most feature-complete and most widely used LSP client. Aligning with widely used clients ensures more language servers work without extra modifications, which should also be a good thing for any LSP client implementation.

Hi Sam,

Thanks for the reply. I wasn't sure whether the lists.llvm.org Mailing Lists was in use anymore.

There are several problems here. The first problem is the LSP spec itself. When I read the spec, what the open request really is is a lock, where "truth" is code-speak for a write lock. Making this substitution in mind, it starts to be a little clearer. I've been trying to raise awareness of the wording problem in the spec, as well as format issues (e.g., STEM articles usually use section numbers so one can identify concisely things that need to be changed), but who knows.

So, as implemented, Clangd currently requires a write lock from the client. Should Clangd require a write lock just to just use the language feature services like "go to def", "find all refs", "hover", etc? I just want to open a file, and maybe see what is the type of an auto, navigate around to understand how the code works before I start editing it. As you say, #include's are already read out-of-band, which are not locked explicitly by the client. So, I'd like to submit my change request for that, as soon as I can figure out how to refer to a file on disk under clangd/test.

The second issue is memory management. I would recommend a garbage collection scheme of the open files, configurable to the memory size or number of files in the cache. (I'm not sure what multiple opens and closes from the same client means, as the spec doesn't explain this at all. But no matter.) When the ref count is zero, Clangd can keep it around--because as you say, it can take 20 s to reload from scratch--or remove it if another file is opened and hits one of these limits. I can work on this later this week.

Ken

In addition to the resource management concerns, I would argue that not requiring didOpen is a bug in the protocol and all the clients should do this.

Hi Ilya,

I thought this too, and even opened a bug against MS's client. But, after thinking about this for a while, it is clear what the spec is trying to say here: "truth", which is obtained by the client with an open, is a content-wide write lock. If you look at it this way, something we all know very well, then you can see the range of possibilities here on what the protocol means and what it could evolve into.

Ken

I've made an implementation for implicit file opens (client asks for language features, but doen't actually do the open request). Please check out https://llvm.discourse.group/t/clangd-explicit-and-implicit-doc-open/205

--Ken