RFC: Semantic highlighting

Hi,

I am starting to work on semantic highlighting in clangd.

There's currently no mention of that in LSP but it seems like this proposal is on the right path to get accepted:
https://github.com/Microsoft/vscode-languageserver-node/pull/367

TLDR: the proposal introduces new capabilities and server notification.
- New client boolean capability flag.
- New server capability - server shares a lookup table of semantic categories with client in response.
- Semantic categories are defined by TextMate scopes. Scope Naming
- Server sends notifications containing line number, column number, length and semantic category (index in the lookup table).
- Server is responsible for sending only delta information (not full document every time).
- Server is responsible for ignoring trivial shift edits to documents (e. g. newline at the start of the document).

My intention is to implement interface from the proposal.

Does anyone have any thoughts on this?

Thanks.

Jan

Does anyone have any thoughts on this?

Lack of semantic highlighting is the main reason I still use cquery rather than clangd for C++ development, so I'm very excited to see this implemented!

Cheers,
Nate

Sounds great, Jan.
This has been one of those things we’d hoped to get around to, but never quite made it to the top of the pile.
The protocol looks pretty sensible to me. I’m a bit nervous that there doesn’t seem to be a high-quality VSCode implementation to check our work against yet - I guess we can use Theia.
Still, I think the best way to start is go ahead with a minimal implementation (without worrying about deltas etc yet) to demonstrate it’ll work.

Some issues to think about:

What’s the basic implementation idea? Traverse the main-file AST after every edit, collect spans, diff them at the end? Do we do this synchronously as part of the diagnostics cycle? Latency seems likely to be ~1 second with this approach, and never under 500ms (because of the debouncing of continuous AST rebuilds).
I don’t see something fundamentally better, other than tracking dirty regions and only traversing decls that intersect them. My intuition says that wouldn’t save much, as we have to build the whole AST, which should be much slower than traversing it.

What to do before first parse? (In particular when preamble is slow)
I guess the client could fall back to its standard heuristic-parsing-based highlighting, use of TextMate scopes seems to suggest that.
An alternative: for more uniformity/control, we could instead implement such parsing on the server side as a cheap fallback when ASTs aren’t available. There may be other reasons to have such a heuristic parser. Probably doesn’t make much sense to worry about initially.

How are re-highlights around the cursor going to feel in practice? I think there’s a danger that if we highlight highlighting partially-formed expressions from a more-or-less-random point in editing, and the editor fills in the gaps, then we’ll end up with basically random flashing colors on the line being edited.
We should probably wait and see how it feels after a first implementation. If this is an issue, one idea would be to attempt some low-latency highlighting of the edited section (using heuristic parsing and/or the old AST) while we wait for the AST rebuild.

Sounds great, Jan.
This has been one of those things we’d hoped to get around to, but never quite made it to the top of the pile.
The protocol looks pretty sensible to me. I’m a bit nervous that there doesn’t seem to be a high-quality VSCode implementation to check our work against yet - I guess we can use Theia.

Our intended use-case is clangd talking to the recently announced SourceKit-LSP service:
https://github.com/apple/sourcekit-lsp

I haven’t really thought about details yet (including how to approach testing) but I assume we’ll have some integration tests in SourceKit-LSP. But it makes sense to test (at least manually) with some editor client too.

Still, I think the best way to start is go ahead with a minimal implementation (without worrying about deltas etc yet) to demonstrate it’ll work.

I agree.

Some issues to think about:

What’s the basic implementation idea? Traverse the main-file AST after every edit, collect spans, diff them at the end? Do we do this synchronously as part of the diagnostics cycle? Latency seems likely to be ~1 second with this approach, and never under 500ms (because of the debouncing of continuous AST rebuilds).
I don’t see something fundamentally better, other than tracking dirty regions and only traversing decls that intersect them. My intuition says that wouldn’t save much, as we have to build the whole AST, which should be much slower than traversing it.

Haven’t looked into this yet as I wanted to know what people think first. I’ll start by looking at how libclang does this.

What to do before first parse? (In particular when preamble is slow)
I guess the client could fall back to its standard heuristic-parsing-based highlighting, use of TextMate scopes seems to suggest that.
An alternative: for more uniformity/control, we could instead implement such parsing on the server side as a cheap fallback when ASTs aren’t available. There may be other reasons to have such a heuristic parser. Probably doesn’t make much sense to worry about initially.

I agree this is an enhancement which we can implement later.

How are re-highlights around the cursor going to feel in practice? I think there’s a danger that if we highlight highlighting partially-formed expressions from a more-or-less-random point in editing, and the editor fills in the gaps, then we’ll end up with basically random flashing colors on the line being edited.
We should probably wait and see how it feels after a first implementation. If this is an issue, one idea would be to attempt some low-latency highlighting of the edited section (using heuristic parsing and/or the old AST) while we wait for the AST rebuild.

Sounds reasonable - on server side we can either annotate the hot zone quickly or not annotate it at all.

Our intended use-case is clangd talking to the recently announced SourceKit-LSP service:
https://github.com/apple/sourcekit-lsp
I haven’t really thought about details yet (including how to approach testing) but I assume we’ll have some integration tests in SourceKit-LSP. But it makes sense to test (at least manually) with some editor client too.

Understood - I think a lot of the decisions about details will depend on how this feels in practice in an editor (latency, signal-vs-noise) so I think we’ll need it connected to a real editor to really understand. I guess there’s a prototype of XCode that can use this, but the rest of us need to find something to try it out on too :slight_smile:

No, it’s really the server who’s responsible for ignoring these. I guess it makes sense since client can’t just decide to not send 'textDocument/didChange’ to server as they would get out of sync.

https://github.com/Microsoft/vscode-languageserver-node/pull/367/files#diff-0253ecfa5ea7304bd5d168e2665e2fd1R54

Ah right - the server is responsible for deciding when to send a syntax update, and the client is responsible for interpolating in between.
So the server has the responsibility of guessing when the client’s interpolation will do a better job and it shouldn’t waste time calculating and/or sending a new notification.

Just FYI - due to change of plans we decided to postpone this :frowning: The main reason is that we want to have an editor against which we can test the feature first.

Jan

Just FYI - due to change of plans we decided to postpone this :frowning:
The main reason is that we want to have an editor against which
we can test the feature first.

FWIW, Eclipse CDT supports semantic highlighting, and it's hooked up in the LSP configuration to work with with cquery's $cquery/publishSemanticHighlighting message.

It should be straightforward to adapt the implementation to work with a slightly different protocol, such as the one proposed for clangd. I will even volunteer to do the client-side work involved in this adaptation, if that convinces you to continue working on the server-side feature :slight_smile:

Regards,
Nate

Just FYI - due to change of plans we decided to postpone this :frowning:
The main reason is that we want to have an editor against which
we can test the feature first.

FWIW, Eclipse CDT supports semantic highlighting, and it's hooked up in the LSP
configuration to work with with cquery's $cquery/publishSemanticHighlighting
message.

It should be straightforward to adapt the implementation to work with a slightly
different protocol, such as the one proposed for clangd. [...]

Actually, since writing this, I realized that Theia [1] has had an implementation [2] of the semantic highlighting protocol proposed for LSP.

Does that address your need for an editor against which to test the feature?

Regards,
Nate

[1] https://www.theia-ide.org/
[2] https://github.com/theia-ide/theia/issues/1845

Actually, since writing this, I realized that Theia [1] has had an implementation
[2] of the semantic highlighting protocol proposed for LSP.

I meant to say "has had one for a while". It seems to have been implemented concurrently with the LSP proposal.

Regards,
Nate

Thanks for the notice Nathan!

Due to the change of plans I am now busy upstreaming our index-while-building implementation in clang. I don't want to speculate on our plans but having a semantic highlighting in clangd is still really important for us.

Thanks.

Jan

In case anyone's interested, I have a work-in-progress prototype implementation of the LSP semantic highlighting extension:

https://reviews.llvm.org/D61842

I'm not sure when I'll have a chance to resume working on it, so I thought I'd post it in case someone was interested in picking up work on it. Please see the revision summary for its current status if you're interested.

Thanks,
Nate

Hi Nate,

This looks interesting! We're definitely interested in semantic highlighting in clangd so if you don't get back to it first we might pick up your patch.

In the meantime - if it helps I'm willing to review your patches. Do you think it might make sense to land this incrementally before the full feature is finished?

Thanks,

Jan

Hi Jan,

In the meantime - if it helps I'm willing to review your patches.
Do you think it might make sense to land this incrementally before
the full feature is finished?

I appreciate the offer to review! I don't think the patch is quite ready for it yet, but if/when I do end up getting back to it, I hope to take you up on it.

Thanks,
Nate

Hi, Nathan

I think this is a good start! And we’ll have an intern starting from June, and he will work on the semantic highlighting stuff.