Network RPCs in LLVM projects

Short version: clangd would like to be able to build a client+server that can make RPCs across the internet. An RPC system isn’t a trivial dependency and rolling our own from scratch isn’t appealing.
Have other projects had a need for this? Any advice on how to approach such dependencies?

Longer: clangd (a language server, like an IDE backend) builds an index of the project you’re working on in order to answer queries (go to definition, code completion…). This takes lots of CPU-time to build, and RAM to serve.
For large codebases with many developers, sharing an index across users is a better approach - you spend the CPU in one place, you spend the RAM in a few places, and an RPC is fast enough even for code completion. We have experience with this approach inside Google.

We’d like to build this index server upstream (just a shell around clangd’s current index code) and put the client in clangd. For open-source projects, I imagine the server being publicly accessible over the internet.

This means we care about:

  • latency (this is interactive, every 10ms counts)
  • security
  • proxy traversal, probably
  • sensible behavior under load
  • auth is probably nice-to-have

I don’t think this is something we want to build from scratch; I hear portable networking is hard :slight_smile:

It really isn’t that bad. Just as a note, LLDB does have portable socket communication already, so it could be a refactor and reuse exercise rather than building from scratch.

The most obvious thing is to depend on something like Thrift, gRPC, etc., but these aren’t trivial dependencies to take on. They could probably be structured as an optional CMake dependency, which we’d want to ask distributors to enable.

This is possible, but adding large and non-standard external dependencies has significant drawbacks for distribution.

Have other projects had anything like these requirements? Any solutions, or desire to use such infrastructure? I saw some RPC layer in ORC, but it seems mostly abstract/FD-based IPC.

The ORC RPC layer in-tree runs over sockets, but I’ve implemented it to run over XPC (a Darwin low-latency IPC mechanism). It is actually a really useful abstraction over true remote procedure calls.

-Chris
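A rough sketch of the abstraction idea Chris describes: RPC logic written against an abstract byte channel, with concrete transports (sockets, or an in-process loopback; an XPC-backed channel would slot in the same way) swapped in underneath. Python for brevity; class and method names here are invented for illustration, not ORC's actual API.

```python
import socket

class ByteChannel:
    """Abstract transport the RPC layer is written against."""
    def send(self, data: bytes) -> None:
        raise NotImplementedError
    def recv(self, n: int) -> bytes:
        raise NotImplementedError

class SocketChannel(ByteChannel):
    """Transport over an already-connected socket."""
    def __init__(self, sock):
        self.sock = sock
    def send(self, data):
        self.sock.sendall(data)
    def recv(self, n):
        return self.sock.recv(n)

class InProcessChannel(ByteChannel):
    """Loopback transport, handy for tests; a Darwin XPC-backed
    channel would implement the same two methods."""
    def __init__(self):
        self.buf = bytearray()
    def send(self, data):
        self.buf += data
    def recv(self, n):
        out, self.buf = bytes(self.buf[:n]), self.buf[n:]
        return out
```

The point of the design is that the serialization and call-dispatch code never mentions a concrete transport, so running the same RPCs over sockets, pipes, or XPC is a matter of constructing a different channel.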

The most obvious thing is to depend on something like Thrift, gRPC, etc., but these aren’t trivial dependencies to take on.

I would recommend against using Apache Thrift unless you are able to recruit a larger community for that project. I am on the project management committee of Apache Thrift, and I do not feel that it is organizationally prepared to handle a client like LLVM.

Note that I am specifically referring to Apache Thrift. I take no stance on fbthrift, or any of the other Thrift branches or forks.

It sounds as if the clangd index server is supposed to work across the open internet, which effectively means it needs to speak HTTPS. That’s not really something that you can just write.

I think speaking over the open internet means they need TLS, which OpenSSL should be a sufficient and reasonable dependency for. Are they really talking HTTP? If so, I would have expected different suggestions for open source project dependencies because I’m not sure Thrift builds in HTTP support (gRPC does).

-Chris
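The TLS-without-HTTP option above can be sketched with Python's standard ssl module (a thin wrapper over OpenSSL) rather than OpenSSL's C API; the server name and port below are hypothetical.

```python
import ssl

def make_tls_context():
    """System trust store, certificate verification, and hostname
    checking on: the defaults an index client would want when
    talking across the open internet."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

# Usage (not run here): wrap a connected socket before speaking the
# RPC protocol on top of it. No HTTP anywhere in the stack.
#
#   import socket
#   ctx = make_tls_context()
#   with socket.create_connection(("index.example.org", 50051)) as raw:
#       with ctx.wrap_socket(raw, server_hostname="index.example.org") as tls:
#           ...  # speak the RPC wire format over `tls`
```

Whatever wire format the RPC layer uses rides on top of the encrypted channel unchanged, which is the crux of the TLS-not-HTTP argument.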

Just commenting as a distant observer interested in the high-level topic of serialization and RPCs: does the C++ necessarily have to talk HTTP? For example, if the system can be structured in such a way that a C++ engine talks to a local socket, then one can build an HTTP-capable server in Python to communicate with the C++ engine over the local socket. If structured this way, the server code need not stay in the LLVM tree, and users are free to build the real server in a manner that suits them.
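The split described above can be sketched as follows; the framing (newline-delimited JSON over a local TCP socket) and message shapes are invented for illustration, and the engine stub stands in for the C++ index process.

```python
import json
import socket
import threading

def engine_stub(server_sock):
    """Minimal stand-in for the C++ engine: accepts one connection,
    reads one newline-delimited JSON query, answers it."""
    conn, _ = server_sock.accept()
    with conn:
        request = json.loads(conn.makefile("r").readline())
        reply = {"query": request["query"], "results": []}
        conn.sendall((json.dumps(reply) + "\n").encode())

def forward_query(engine_addr, query):
    """What the HTTP front end would do per request: open the local
    socket, send the query, read one reply. The front end (in Python
    or anything else) handles HTTP/TLS and stays out of the LLVM tree."""
    with socket.create_connection(engine_addr) as s:
        s.sendall((json.dumps({"query": query}) + "\n").encode())
        return json.loads(s.makefile("r").readline())
```

With this split, the only contract the in-tree C++ code must honor is the local-socket protocol; everything internet-facing is replaceable.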

I think you’re making exactly the point I was trying to make. I don’t think clangd needs HTTP. I think they need TLS. I agree TLS isn’t something LLVM should implement from scratch, but there are mature libraries that provide TLS on every platform (like OpenSSL).

-Chris