I’ve been keeping my eye on the cpp2/cppfront project for a year or two now, and it looks like it’s made a lot of cool progress. Some of us contributors on that repo were discussing the idea of writing an LSP server for cpp2 so we can start to see a nicer experience in the IDE.
For those who don’t know, cpp2 is an experiment/project by Herb Sutter to create a new syntax that transpiles down to plain c++ code, similar to the way typescript works with javascript. cppfront is the cpp2 compiler, which does the job of converting a cpp2 file to a normal cpp file.
Almost the main question
After spending some time thinking about it, I think it may be the case that we don’t actually need to create a full-blown LSP server at all. Intuitively (and probably naively), it seems to me like this can be made to work directly in clangd: all we need to do is preprocess cpp2 files with cppfront, and then pipe those results into clangd as if they had been c++ files in the first place.
I know this is probably a drastic simplification of what really needs to happen, but I know clangd supports a couple of different languages, like cuda and objective-c, so I figured maybe I can replicate what’s happening with those to work with cpp2.
TL;DR - The main question
Can someone help provide some direction on how I might add support for cpp2 into clangd?
I’ve cloned the llvm repo and I’ve been reading through the source, but obviously it’s a huge project! In the name of saving time, I thought getting some advice might be a good idea. I know trying to merge changes like this back into clang trunk may not be acceptable at this time since cpp2 is relatively experimental, but I’d like to try it out as a proof of concept.
Any advice or direction anyone could provide would be greatly appreciated.
It’s conceivable for a language server for a language that transpiles to C++ to work this way, but I think that would still involve a nontrivial amount of work.
Let’s consider a specific language server feature like “go to definition” as an example. You invoke it at a source location in your cpp2 source file (where there is a reference to some symbol), and you want to get another source location in a cpp2 source file (where the definition of that symbol is located).
Clangd meanwhile implements a “go to definition” operation that takes a source location in a C++ source file as input, and produces another source location in a C++ source file as output.
So to make this work, you’d need some sort of mapping from the location of every token in the cpp2 source file, to the location of a corresponding token (for some sense of “corresponding”) in the generated C++ source file. You can then:
1. Translate the source from cpp2 to C++ (and presumably generate the mapping at the same time).
2. Do a lookup in the mapping to translate your input cpp2 source location to a C++ source location.
3. Run clangd with the C++ source location as input to give you the C++ source location of the definition.
4. Do a reverse lookup in the mapping to translate the C++ source location of the definition back to a cpp2 source location, which is then your result.
If you’re looking to use cppfront to do step (1), you may need to fork it to produce the needed source location mapping (unless it happens to already do that).
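As a rough illustration of the lookup steps above, here is a minimal Python sketch. Everything in it is hypothetical: the `SourceMap` class, the `(line, column)` tuples, and the `clangd_definition` callable are stand-ins for illustration only, and the mapping is assumed to come out of the translation step. A real implementation would talk to clangd over LSP rather than call a Python function.

```python
# Hypothetical sketch: positions are (line, column) tuples, and the mapping is
# assumed to be produced by the cpp2 -> C++ translation step.

class SourceMap:
    """Bidirectional lookup between cpp2 and generated C++ positions."""

    def __init__(self, pairs):
        # pairs: iterable of (cpp2_position, cpp_position) for corresponding tokens
        self._to_cpp = dict(pairs)
        self._to_cpp2 = {cpp: cpp2 for cpp2, cpp in self._to_cpp.items()}

    def to_cpp(self, cpp2_pos):
        return self._to_cpp.get(cpp2_pos)

    def to_cpp2(self, cpp_pos):
        return self._to_cpp2.get(cpp_pos)


def go_to_definition(source_map, cpp2_pos, clangd_definition):
    """Map in, ask the C++ tooling, map back out.

    clangd_definition is a stand-in callable (C++ position -> C++ position or
    None); a real implementation would send an LSP request to clangd instead.
    """
    cpp_pos = source_map.to_cpp(cpp2_pos)
    if cpp_pos is None:
        return None
    cpp_def = clangd_definition(cpp_pos)
    if cpp_def is None:
        return None
    return source_map.to_cpp2(cpp_def)


# Usage with made-up positions and a fake "clangd" that knows one definition.
source_map = SourceMap([((8, 5), (13, 9)), ((2, 4), (5, 4))])
fake_clangd = {(13, 9): (5, 4)}.get
print(go_to_definition(source_map, (8, 5), fake_clangd))  # prints (2, 4)
```

An exact token-to-token dictionary is the simplest possible choice here; generated code that has no cpp2 counterpart simply has no entry, so the lookup returns `None` instead of guessing.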
Note that CUDA and Objective-C work differently from what we are discussing above: these languages are not transpiled to C++, but parsed directly into an AST by the clang frontend.
That would be the other approach to supporting cpp2 in clangd, to add support for it in the clang frontend first.
Thanks for that thorough response @HighCommander4, I really appreciate it!
cppfront does generate some form of mapping in the cpp files that points back to where things are happening in the cpp2 file. For example, if I have the following cpp2 file:
//cppfront allows c++ syntax to sit alongside cpp2 syntax
int SumNum(int x, int y) {
    return x + y;
}
//cpp2 style main function
main: () -> int = {
    x: int = 12;
    y := SumNum(x, 15);
    std::cout << "Hello world! " << x;
    return 0;
}
cppfront yields the following cpp file:
//=== Cpp2 type declarations ====================================================
#include "cpp2util.h"
#line 1 ".\\src\\test.cpp2"
//=== Cpp2 type definitions and function declarations ===========================
#line 1 ".\\src\\test.cpp2"
//cppfront allows c++ syntax to sit alongside cpp2 syntax
int SumNum(int x, int y) {
    return x + y;
}
//cpp2 style main function
#line 5 ".\\src\\test.cpp2"
[[nodiscard]] auto main() -> int;
//=== Cpp2 function definitions =================================================
#line 1 ".\\src\\test.cpp2"
#line 5 ".\\src\\test.cpp2"
[[nodiscard]] auto main() -> int{
    int x {12};
    auto y {SumNum(x, 15)};
    std::cout << "Hello world! " << cpp2::move(x);
    return 0;
}
Whether that’s good enough to accurately map source code back to cpp2, I’m unsure.
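To get a feel for what those `#line` directives give you, here is a small Python sketch that reads generated C++ text and records, for each generated line, which source file and line the preceding `#line` directive attributes it to. The function name `build_line_map` and the regular expression are my own assumptions, not anything cppfront provides. It works at line granularity only and says nothing about columns, which matters because a line like `int x {12};` is laid out differently from `x: int = 12;`.

```python
import re

# Matches `#line N` with an optional quoted file name (escapes are kept raw).
LINE_DIRECTIVE = re.compile(r'^\s*#\s*line\s+(\d+)(?:\s+"((?:[^"\\]|\\.)*)")?')


def build_line_map(generated_source):
    """Return {generated_line_number: (source_file, source_line)}.

    Lines that appear before any #line directive are left out of the map.
    """
    mapping = {}
    current_file = None
    next_line = None
    for gen_line_no, text in enumerate(generated_source.splitlines(), start=1):
        match = LINE_DIRECTIVE.match(text)
        if match:
            # The line after the directive is treated as line N of the file.
            next_line = int(match.group(1))
            if match.group(2) is not None:
                current_file = match.group(2)
            continue
        if next_line is not None:
            mapping[gen_line_no] = (current_file, next_line)
            next_line += 1
    return mapping


sample = "\n".join([
    '#include "cpp2util.h"',
    '#line 5 "test.cpp2"',
    "[[nodiscard]] auto main() -> int{",
    "int x {12};",
])
print(build_line_map(sample))
# prints {3: ('test.cpp2', 5), 4: ('test.cpp2', 6)}
```

Note that this naive reading also assigns source lines to generated-only text (such as section comments) that happens to follow a directive, so it is probably only a starting point for judging whether the directives are precise enough.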
Sorry, what I meant here was not to actually replicate what they’re doing, but (assuming there are distinct “modules” for each language clangd handles), I figured I could study those to figure out a starting point for implementing cpp2 processing.
Not really. From an AST point of view, Objective-C support basically amounts to some new types of Objective-C specific nodes in the AST (e.g. ObjCMethodDecl). Producing these is handled by the clang frontend.
From clangd’s point of view, things either “just work” (e.g. the AST traversal functionality provided by the clang frontend ensures that statements inside an Objective-C method look the same to clangd as statements inside a C/C++ function), or in some cases where e.g. clangd code has to handle different AST node types, handling is added for the Objective-C node types as well.
I see, gotcha! Seems like I might be barking up the wrong tree from the sound of things then. It’s probably easier to write an LSP server specialized towards cpp2 that just calls into a c++ compiler rather than trying to modify clangd/clang directly.
I’m wondering if we have a bigger question here: how do we want to deal with compatible languages? I’m not seeing this question for just cpp2. I think the same question would hold for Carbon, as well as older languages like lex/flex and yacc/bison.
The big advantage that I see is that clangd is completely able to understand C++ code. As you most likely want to be able to do autocomplete with C++ symbols, it would make sense to delegate that to clangd in some way. The other way around is less critical, as all those languages generate some C++ code that can be parsed by clangd, assuming the user triggers that generation by doing some build and clangd monitors the creation/modification of files. (I have such changes locally that I’m testing on a large codebase. Once I have real usage experience, I can share it, though it’s trivial to implement.)
Back to your original concern, you will need a complete LSP server that can handle the cpp1 code in a cpp2 file. For that, it might make sense to redirect that in some way to clangd, either by extending it, or by running a clangd instance and forwarding a preprocessed/filtered file and commands.
Other requests, like find references, rename …, might benefit as well from some shared approach. Without it, you can have a perfect LSP server for cpp2 code that is incompatible with cpp1 code, which defeats the purpose of cpp2, i.e. being compatible with cpp1.
As such, I feel like adding cpp2 support in the parser might be the best approach, even if you don’t support compilation of it. Though if you want to go that route, I suspect a larger discussion will be needed, for example by posting something on the clang part of the discourse.
If I reflect again on the other languages, lex/bison is almost verbatim copy/paste (except $1…), so there a simple redirect with a few minor mappings would be possible. I don’t have sufficient insight into Carbon to have ideas on implementation, though I wouldn’t be surprised to see a completely separate LSP server as they don’t mix with cpp1 syntax. So for now, it’s safe to assume the cpp2 case is special in that regard due to cpp1 code in the file.
Yeah, it seems to me like it’s a much easier option to have a dedicated LSP server for cpp2 rather than trying to make some deeply integrated change into clang/clangd itself. If clang had some modular type of plugin system where you could define some pipeline that preprocessed code up to a point where clang could just consume it without any issues, then I could see it being a lot more doable, but anything else seems too invasive.
I think it makes more sense to have a cpp2 lsp server in which users can reference/pick the compiler they’re using, and the server then manages the mappings between cpp2 and cpp1 code.
And from what I’ve seen of cppfront, it doesn’t catch every compiler error/diagnostic anyways; it definitely does some amount of deferring to c++ compilers to finish the job, so I think it will need to point to a c++ compiler in any case.
I think that it’s safe to say that clangd is the clang compiler without the codegen. (And clang-format and clang-tidy)
It might be worth testing your idea locally for feasibility. Clangd receives the sources via its commands. So if you write a small Python program that launches clangd and intercepts the requests to do the transformation, you most likely have a POC within the week. Though I think that due to the mix of cpp1/cpp2, you’ll find that you’ll require a lot more than just the preprocessing.
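A minimal sketch of what such a Python interceptor could start from is below. It only covers the message framing used by the LSP base protocol (a `Content-Length` header followed by a JSON body) and rewriting the text of a `textDocument/didOpen` notification. The helper names and the `transpile` callable (for example, a wrapper that runs cppfront) are assumptions for illustration, and `launch_clangd` assumes a `clangd` binary on `PATH`. Whether clangd is happy with a `.cpp2` URI carrying C++ text is something the POC would have to find out.

```python
import json
import subprocess


def encode_message(payload):
    """Frame a JSON-RPC payload with the LSP Content-Length header."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def read_message(stream):
    """Read one framed message from a binary stream; return None at EOF."""
    content_length = None
    while True:
        line = stream.readline()
        if not line:
            return None
        if line == b"\r\n":
            break  # blank line ends the header section
        name, _, value = line.decode("ascii").partition(":")
        if name.strip().lower() == "content-length":
            content_length = int(value.strip())
    if content_length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(stream.read(content_length).decode("utf-8"))


def rewrite_did_open(message, transpile):
    """Swap cpp2 text for generated C++ in a textDocument/didOpen notification.

    transpile is a hypothetical callable (cpp2 text -> C++ text); any other
    message is returned unchanged.
    """
    if message.get("method") != "textDocument/didOpen":
        return message
    document = dict(message["params"]["textDocument"])
    document["text"] = transpile(document["text"])
    document["languageId"] = "cpp"
    return {**message, "params": {**message["params"], "textDocument": document}}


def launch_clangd():
    """Start clangd speaking LSP over stdin/stdout (assumes clangd is on PATH)."""
    return subprocess.Popen(["clangd"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
```

A proxy built on this would read each editor message with `read_message`, pass it through `rewrite_did_open`, and write `encode_message(...)` to clangd’s stdin. Change notifications and every position in requests and responses would still need translating in both directions, which is where most of the extra work is likely to be.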