Q&A: clangd's parser design

Hi All,

We are starting development of our own language server for Fortran, based on our compiler's parser (LFortran), and I was wondering whether I could ask the clangd devs for some insight to better inform our design decisions.

LFortran’s compiler parser can parse standard-conforming code into an AST/ASR, which we can then serialise and use in a language server. However, when the code is incomplete, for example during completion requests while the user is still typing, our parser throws an error, just as a compiler would abort compilation on detecting illegal code. How does clangd’s parser handle such incomplete states? Does it skip any AST nodes it does not recognise?
Does clangd even use the clang(++) compiler parser?
Do you have any document or piece of code to point me towards?

Apologies for asking potentially ignorant questions so directly, but I was hoping to avoid diving into the clangd source code and reverse engineering what is being done.

Any and all help would be appreciated.

Yes, clangd uses the clang compiler’s parser. However, said parser has support for code completion built into it (rather than layered entirely on top). This predates clangd and is used to power e.g. libclang’s code completion capabilities as well.

I believe the way clang’s code completion support works is that the parsing is done in a special mode that’s informed about the position in the file where completion has been invoked, and the lexer injects a special “completion token” at that position. When the parser encounters that token, it stops parsing further and instead drops into completion-related code that gathers completion proposals appropriate to the context in which the completion token appears. For example, if the completion token appears right after a . token in a member reference expression (e.g. object.), then the completion proposals will be the members of the class type of object.

To provide a concrete example, here is a place in clang’s parser code that checks for the next token being a completion token, and then calls into a code completion hook.
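As a rough illustration of the mechanism described above, here is a toy sketch in C++. It is not clang's actual code: the names `lex`, `complete`, `TokKind` and the `membersOf` table are all hypothetical, and the "type lookup" is just a map from a variable name to its member names. The point is only that the lexer injects a completion token at the cursor offset, and the parser reacts to that token based on what precedes it.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class TokKind { Ident, Dot, Completion, End };
struct Token { TokKind kind; std::string text; };

// Toy lexer: recognises identifiers and '.', and injects a Completion
// token at the cursor offset. Nothing after the cursor is lexed.
std::vector<Token> lex(const std::string& src, std::size_t cursor) {
  std::vector<Token> out;
  std::size_t i = 0;
  while (i <= src.size()) {
    if (i == cursor) { out.push_back({TokKind::Completion, ""}); break; }
    if (i == src.size()) break;
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (c == '.') {
      out.push_back({TokKind::Dot, "."});
      ++i;
    } else if (std::isalpha(c) || c == '_') {
      std::size_t start = i;
      while (i < src.size() && i != cursor &&
             (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
        ++i;
      out.push_back({TokKind::Ident, src.substr(start, i - start)});
    } else {
      ++i;  // skip whitespace and anything this toy lexer does not model
    }
  }
  out.push_back({TokKind::End, ""});
  return out;
}

// Toy "parser": on reaching the Completion token it stops and, if the
// token follows `name.`, returns the members recorded for `name`.
// membersOf is a hypothetical stand-in for real type information.
std::vector<std::string> complete(
    const std::string& src, std::size_t cursor,
    const std::map<std::string, std::vector<std::string>>& membersOf) {
  std::vector<Token> toks = lex(src, cursor);
  for (std::size_t i = 0; i < toks.size(); ++i) {
    if (toks[i].kind != TokKind::Completion) continue;
    if (i >= 2 && toks[i - 1].kind == TokKind::Dot &&
        toks[i - 2].kind == TokKind::Ident) {
      auto it = membersOf.find(toks[i - 2].text);
      if (it != membersOf.end()) return it->second;
    }
    return {};  // other completion contexts are not handled in this sketch
  }
  return {};  // the cursor was outside the source text
}
```

Calling `complete("object.", 7, table)` with a table that maps `object` to some member names returns those names, while a cursor placed anywhere not directly after a `.` yields an empty list in this sketch.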

Hope that helps!

Thanks @HighCommander4, your reply was extremely insightful. I think up until now we were going about this the wrong way; what you are suggesting makes substantially more sense. A follow-up question: does clang’s parser do anything in addition to the “completion tokens” to recover and continue parsing when invalid syntax is detected? If so, do you have any idea what it is and where to look? For example, consider the following snippet:

void foo() { invalid syntax }

clang seems to be able to work around the invalid syntax and still parse foo as a function.

Moreover, from what I am observing, the diagnostic messages served by clangd and the compiler errors emitted by clang++ are virtually identical (except for the severity of the last one):

gn@gn % clang++ test.cpp 
test.cpp:1:2: error: invalid preprocessing directive
test.cpp:2:14: error: unknown type name 'invalid'
void foo() { invalid syntax }
test.cpp:2:28: error: expected ';' at end of declaration
void foo() { invalid syntax }
3 errors generated.

Is there anything else that clang’s parser is doing under the hood?

Thanks again for all the help, I really appreciate it!

I know that some efforts have been put into making clang’s parser recover from errors, but I’m not familiar with the details here.

I do know one piece of it is RecoveryExpr, an AST node that can represent invalid code in an expression context, such that the surrounding code can still be parsed / represented in the AST. I assume there’s something similar for declaration contexts but I don’t have a code reference off the top of my head.
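To make the general idea concrete, here is a minimal sketch in C++ of a parser that records an invalid placeholder node instead of aborting. It is only loosely analogous to RecoveryExpr and is not how clang implements it: the toy grammar (statements of the form `name = digits` separated by `;`), the `Stmt` struct, and the `parseBody` function are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// One node per statement. Invalid statements are kept as placeholder
// nodes (valid == false) so the rest of the body can still be parsed.
struct Stmt { bool valid; std::string text; };

static std::string trim(const std::string& s) {
  const char* ws = " \t\n";
  std::size_t b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  std::size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// In this toy grammar a statement is valid iff it looks like `name = digits`.
static bool isAssignment(const std::string& s) {
  std::size_t eq = s.find('=');
  if (eq == std::string::npos) return false;
  std::string lhs = trim(s.substr(0, eq));
  std::string rhs = trim(s.substr(eq + 1));
  if (lhs.empty() || rhs.empty()) return false;
  for (unsigned char c : lhs) if (!std::isalpha(c)) return false;
  for (unsigned char c : rhs) if (!std::isdigit(c)) return false;
  return true;
}

// Parse a statement list, using ';' as the synchronisation point:
// a malformed statement is recorded and parsing resumes after the next ';'.
std::vector<Stmt> parseBody(const std::string& body) {
  std::vector<Stmt> out;
  std::istringstream in(body);
  std::string piece;
  while (std::getline(in, piece, ';')) {
    std::string s = trim(piece);
    if (s.empty()) continue;
    out.push_back({isAssignment(s), s});
  }
  return out;
}
```

With input such as `x = 1; invalid syntax; y = 2;`, this sketch yields three nodes, the middle one flagged as invalid, so a language server built on it could still offer features for the statements on either side of the error.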

Thanks again, I suspected that there might be something special in the AST. If you know anyone else on the clang parser team who might have a better understanding of the functionality under the hood, feel free to ping them in this conversation.

Again, greatly appreciate your help @HighCommander4!