RFC: A proposal for a Clang-based service architecture

Greetings all!

What follows is a fairly lengthy and detailed design document for a proposed persistent Clang server (or clangd in unix-terms) to serve as infrastructure for increasingly advanced and interactive C++ tools. It should generalize and build upon libclang, and will allow us to effectively target Vim, Emacs, and other editors. This is something we’re planning to pursue in the near term, so I’d appreciate any and all feedback.

Here is a Google Docs link you can use to view and comment on the proposal:
https://docs.google.com/document/d/1kNv2jJK0I0JGnxJxU6w5lUOlrIBoecU4fi9d_o5e2-c/edit

Its interim home is on GitHub here, where you can see the history and the actual reST version in all its glory:
https://github.com/chandlerc/llvm-designs/blob/master/ClangService.rst

I’ve also attached the text for email-based comments.

Thanks!
-Chandler

ClangService.rst.txt (17.9 KB)

Hi Chandler,

I really like the proposal. Thanks a lot for going into this direction.

I have one question:

You want to implement a binary compatible interface of libclang, but at the same time you plan to provide only a subset of libclang.
This might cause a conflict, as it will not allow an editor to use both libclang for the cursor interface and the persistent clang server for the services it already supports. Is there a reason people may want to implement something like this and should we make sure it is supported?

Otherwise I really like the direction this is going. Especially continuing the use of libclang and the python interfaces will allow tools like the vim clang_complete to move to this new service easily.
I will point the clang_complete guys to this proposal.

Cheers
Tobi

This is a very interesting proposal. We have been thinking about
something like this for a while.

I would like to recommend the addition of indexing somehow. What I'd
like is, given a file/line/column number, to retrieve the lexical and
static semantics of the token at the location. The
filename/line/column number is essentially the content of the Cursor,
I think.

It might also be interesting to retrieve a comment, if the specified
location lands in a comment, but I'm not sure I know how libclang
handles comments. I haven't seen comment handling in the libclang
documentation.

This is definitely something we want to enable through this service, but currently the service is more about enabling the infrastructure, than about defining exactly what pieces we’ll want to build for it later. Makes sense?

Cheers,
/Manuel

That is completely sensible. I'll be watching this space with great interest.

I'll contribute if I find the time.

Great proposal, I really like the potential of such a project - it's
exactly the kind of service that can start with one thing in mind and
then find itself adapted for various interesting uses. That said, your
initial proposed application is awesome and itself worth the effort -
having real IDE-ish code completion and indexing in Vim & Emacs could
be awesome.

Some minor comments on the design document:

* Goal: "Provide a restartable, long-lived background process which
manages caching, compilation, indexes, and performs the business
logic."

What does "compilation" mean in this context? From the rest of the
document I understand Clang is only used for its analysis features,
not for actual code generation or full compilation.

* "The crazy stretch goal for this is O(1ms) for code-completion with
fully warm and primed caches."

It really sounds overly ambitious, taking IPC and a fairly complex
code-base into account. Why 1ms, though? Since this is a
user-interaction application, isn't 1ms far, far below the human
detection threshold?

* "The communication protocol will take the form of serialized
messages encoded using the LLVM bitcode system"

Why the LLVM bitcode system, and not a library designed for IPC, such
as protobuf? I realize this means less external dependencies, but it
can also be a burden on a subsystem that was designed for a different
purpose. Not to mention that something like protobuf gives you IPC
bindings for other languages for free (Python, Java, etc.)

* s/filei system/file system/

* "Likely only to support Linux and local sockets"

You mean Unix domain sockets or "normal" sockets on localhost?

- Eli

Well, people argue about what is “far below the human detection threshold”. For example, I (personally, no idea how much chandler agrees) want this service to just take my keystrokes and be able to give me contextual information updated while I type. That would require getting close to the 1ms.

Cheers,
/Manuel

Do you really manage to type 1000 characters per second? For the record, with a 60Hz screen, each frame is displayed for about 16ms, so even if you were able to type faster than that and query completion on each keystroke, you would not be able to display all the results.
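
The arithmetic behind that figure can be sketched as follows. The 60Hz refresh rate and the 1000 characters per second come from the discussion above; the function names are only illustrative.

```python
def frame_time_ms(refresh_hz):
    """Time one frame stays on screen, in milliseconds."""
    return 1000.0 / refresh_hz


def keystroke_interval_ms(chars_per_second):
    """Time between keystrokes at a given typing rate, in milliseconds."""
    return 1000.0 / chars_per_second


# A 60Hz display shows each frame for roughly 16.7ms.
print(f"frame time at 60Hz: {frame_time_ms(60):.1f} ms")
# Needing a fresh result every 1ms corresponds to 1000 keystrokes per second.
print(f"keystroke interval at 1000 chars/s: {keystroke_interval_ms(1000):.1f} ms")
```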

– Jean-Daniel

A couple things.

We don’t want .clangrc files just being willy nilly all over the place, and we don’t want them tied at all to a particular person’s machine. The .clangrc files should be a part of the project and checked into VCS. I think that the way .gitignore files work is a good model, e.g. how you have a canonical one in the top-level directory of your source tree, and then you can customize on a per-directory basis. I’m not sure what you expect to be in these files, but the thought of each person having their .clangrc files strewn about their computer frightens me (difficult to set up). Perhaps .clangdconfig or something is a better name for these files, since rc files generally are for personal (per-user) settings (we can have those too, but for options like how to sort search results, which is a user-preference).
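
As a rough sketch of the .gitignore-style lookup described above: walk from the source file's directory up to the project root and collect every config file on the way, with the canonical top-level file first so that per-directory files can override it. The file name `.clangdconfig` is the one floated above; the function name and the override order are assumptions, not anything the proposal specifies.

```python
from pathlib import Path
from typing import List

CONFIG_NAME = ".clangdconfig"  # hypothetical name, taken from the suggestion above


def config_chain(source_file: Path, project_root: Path) -> List[Path]:
    """Return the config files that apply to source_file, top-level first."""
    root = project_root.resolve()
    directory = source_file.resolve().parent
    if directory != root and root not in directory.parents:
        raise ValueError(f"{source_file} is not inside {project_root}")
    # Collect directories from the file's own directory up to the root.
    dirs = [directory]
    while dirs[-1] != root:
        dirs.append(dirs[-1].parent)
    # Reverse so the canonical top-level file comes first and
    # more specific per-directory files come later.
    return [d / CONFIG_NAME for d in reversed(dirs) if (d / CONFIG_NAME).is_file()]
```

Because the chain is derived only from files inside the source tree, it can be checked into VCS and does not depend on any one person's machine.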

A use case that I think is worth considering is the case of completing a TableGen’d diagnostic (e.g. diag::err_missing_typ). By necessity, this service will need to be able to find any generated files so that it can look in them. Thus, clangd has to either

  1. be aware of the build system, or
  2. rely on the build system to generate files that appropriately inform clangd about the setup.

I think that 1 is clearly undesirable since we don’t want to code into clangd the idiosyncrasies of every build system known to man. Therefore, 2 it is.
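
One existing format in the spirit of option 2 is Clang's JSON compilation database (compile_commands.json), which build systems such as CMake can emit. Whether clangd would use it is not something this thread settles; the sketch below only shows how a tool could look up the flags for a file, including an include path for generated headers. The sample entry and the helper name are made up for illustration.

```python
import json

# Made-up sample in the JSON compilation database layout:
# a list of objects with "directory", "command", and "file" keys.
SAMPLE_DB = """
[
  {"directory": "/build",
   "command": "clang++ -I/build/include -c /src/lib/Sema/SemaStmt.cpp",
   "file": "/src/lib/Sema/SemaStmt.cpp"}
]
"""


def find_command(db_text, source_file):
    """Return the recorded compile command for source_file, or None."""
    for entry in json.loads(db_text):
        if entry["file"] == source_file:
            return entry["command"]
    return None


print(find_command(SAMPLE_DB, "/src/lib/Sema/SemaStmt.cpp"))
```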

As for the proposal and implementation strategy, I disagree with the implementation strategy and think that the rest of the proposal is a bit premature (things like settling on using LLVM bitcode format for the protocol). With the current implementation strategy, you have no way of knowing whether you are building “the right thing” until it is too late. For example, there is nothing in the proposal about having this be easy to use and setup.

I think that the following should work before anything else happens (based roughly on how I set up my dev environment):

git clone http://llvm.org/git/llvm.git
cd llvm/tools
git clone http://llvm.org/git/clang.git
cd …/…/
mkdir release
cd release
cmake -G Ninja …/source /* … other flags … */ -DLLVM_ENABLE_CLANGD_INTEGRATION=ON
ninja clangd-init

At this point, I should be able to do something like:
clang-cli complete --file=llvm/tools/clang/lib/Sema/SemaStmt.cpp --line=1302 --column=24
and get completions. Since clangd knows about the project, it knows exactly what project to complete for based on just the filename.

At this point, stable binary interfaces don’t have to happen yet, stable IPC protocols don’t have to happen yet, etc. Just something that works.

Now we’re in business, and can make real progress:

  • Too slow? → Can benchmark and make it faster
  • Wow it would be really useful if we could do X? → Do X, and be able to immediately test it and dogfood it.
  • Oh shit, there are certain kinds of modifications of the source file that cause extremely long delays that are completely unavoidable? → come up with a way to handle this, which might require rethinking the protocol or client API.

Since this is primarily for interactive use, a clang-cli based RESTful interface is all you need (I’d honestly say use JSON). I don’t like the idea of having a persistent client-server connection since I don’t want vim to have a socket open constantly (emacs users probably would be comfortable with that though ;). Also, what if I change directory while in Vim? Or have two files open from different projects? Now vim has to handle the logic of managing N connections and renegotiating sessions? That’s just silly. If the overhead of fork/exec’ing a new process ever becomes significant then we will already be unnoticeably fast (try :%!sort on a large file, fast no?).
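
To make the JSON suggestion concrete, here is a minimal sketch of what one stateless completion request could look like. Every field name is hypothetical, since nothing in this thread defines a schema; the file, line, and column values are the ones from the clang-cli example above.

```python
import json


def build_completion_request(file, line, column):
    """Serialize one self-contained completion query as JSON."""
    # Hypothetical field names; no schema has been agreed on.
    request = {"action": "complete", "file": file, "line": line, "column": column}
    return json.dumps(request, sort_keys=True)


def parse_request(text):
    """Decode a request and check that the fields a server would need are present."""
    request = json.loads(text)
    for key in ("action", "file", "line", "column"):
        if key not in request:
            raise ValueError(f"missing field: {key}")
    return request


wire = build_completion_request("llvm/tools/clang/lib/Sema/SemaStmt.cpp", 1302, 24)
print(wire)
```

Each request carries the file name, so no session has to be kept open or renegotiated when the editor switches projects.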

I also think it is mind-numbingly stupid to add a socket I/O library into LLVM/Clang. There is no reason why the Clang/LLVM part of clangd should ever have to touch the network or have to include request-processing logic. Leave that to node.js (or whatever). Let the people that are good with that kind of thing mash it up and make cool things. We of course can make our own cool things, but don’t decrease the usefulness of LLVM/Clang for this kind of thing by enshrining what will become “that shitty web-server-written-from-scratch-by-compiler-writers-in-C++ that only talks this weird binary format”. Just because clang provides functionality useful for GUI IDEs doesn’t mean that you need to write a GUI library in-tree and enshrine a specific IDE with Clang/LLVM. Similarly, it is braindead to attempt to add a networking library to LLVM/Clang and enshrine a specific server implementation in-tree. The networking code will always be total crap, never production quality, and a huge pain in the ass for anyone that wants to do anything nontrivial with it; for example, if users want to use encryption over the network, are you going to add OpenSSL as a dependency for LLVM? Or write a crypto library from scratch for LLVM? What if a node.js user wants to use this? Are you going to make all the networking code asynchronous? What I’m talking about here of course doesn’t preclude stuff like fixing the fs layer, which is necessary for LLVM/Clang to be useful in this capability (at production quality) in the first place.

To recap:

  1. Get something working ASAP
  2. Iterate, dogfooding it every step of the way.
  3. Don’t add things in-tree that are outside the domain of what LLVM/Clang do (e.g. networking, GUIs, crypto, 3D-rendering, etc).
  4. In-tree changes should focus (as always) on increasing the utility/flexibility of the LLVM/Clang libraries to clients (e.g. can safely be used multithreaded, etc), and should be motivated by compelling use-cases derived from part 2)

–Sean Silva

Sorry for the tone there at the end. Sounds a bit ranty. After a bit of discussion on IRC with chandlerc et al. and some reflection, I guess my opposition to the networking aspect is mostly that it just seems (by gut-feeling) damn odd that LLVM would have a hand-rolled networking library inside of it, but I guess you folks certainly have the seniority w.r.t. gut-feeling for LLVM.

–Sean Silva

Very interesting! A couple of comments:

- Please do make sure that the IPC layer is high-level and general enough to allow implementing it using XPC services on the Mac platform.

  • Clients using only the narrow API should be able to switch trivially between the two libraries to get IPC vs. internal process behavior.

[…]

Wrapping the C++ client libraries will be a API and ABI stable C library. This will very closely resemble (and ideally end up largely source-compatible with) the highest-level libclang APIs

I doubt that being source-compatible between libclang API and clang-server API is a useful enough goal to justify limiting the design of the clang-server API to current libclang APIs.
Clients will end up designed specifically for libclang or the clang-server (e.g. shouldn’t a clang-server client be designed to be fault-tolerant?), and IMHO the primary clang-server API should be designed to be asynchronous (“equivalent” synchronous functions can optionally be added on top).
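
A minimal sketch of that layering, assuming a hypothetical asynchronous completion call as the primitive and a synchronous convenience wrapper built on top of it. The server round trip is simulated, the function names are invented, and returning an empty list on timeout is just one possible fault-tolerance policy.

```python
import asyncio


async def complete_async(file, line, column):
    """Hypothetical asynchronous primitive; the server round trip is simulated."""
    await asyncio.sleep(0)  # stand-in for waiting on the clang server
    return [f"{file}:{line}:{column}"]  # placeholder, not real completions


def complete(file, line, column, timeout=1.0):
    """Synchronous wrapper layered on the asynchronous primitive."""
    try:
        return asyncio.run(
            asyncio.wait_for(complete_async(file, line, column), timeout)
        )
    except asyncio.TimeoutError:
        # Fault tolerance: an unresponsive server yields no results
        # instead of hanging the editor.
        return []


print(complete("SemaStmt.cpp", 1302, 24))
```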