Adding a third-party dependency in clang-tools-extra

clangd communicates with an editor via JSON-RPC. It parses JSON with YAMLParser, which is awkward, and generates JSON with printf and friends, which is miserable. Much of LLVM does things this way, but clangd does it a lot.
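To make the "printf and friends" complaint concrete, here is a minimal sketch using only the standard library. It is not clangd's actual code: `naiveReply`, `escapeJSON` and `escapedReply` are hypothetical names, and the reply shape is just an illustrative JSON-RPC 2.0 response. The point is that a format string emits malformed JSON as soon as the payload contains a quote, so every call site has to remember to escape by hand.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: printf-style generation with no escaping.
// Also note the fixed buffer: long messages are silently truncated.
std::string naiveReply(int ID, const std::string &Message) {
  char Buf[256];
  int N = std::snprintf(Buf, sizeof(Buf),
                        "{\"jsonrpc\":\"2.0\",\"id\":%d,\"result\":\"%s\"}",
                        ID, Message.c_str());
  assert(N >= 0 && "snprintf reported an encoding error");
  (void)N; // silence unused-variable warnings when asserts are disabled
  return Buf;
}

// Hypothetical helper: escape the characters JSON forbids inside a string.
std::string escapeJSON(const std::string &In) {
  std::string Out;
  for (unsigned char C : In) {
    switch (C) {
    case '"':  Out += "\\\""; break;
    case '\\': Out += "\\\\"; break;
    case '\n': Out += "\\n";  break;
    case '\r': Out += "\\r";  break;
    case '\t': Out += "\\t";  break;
    default:
      if (C < 0x20) {
        // Remaining control characters need the \u00XX form.
        char Hex[8];
        std::snprintf(Hex, sizeof(Hex), "\\u%04x", static_cast<unsigned>(C));
        Out += Hex;
      } else {
        Out += static_cast<char>(C);
      }
    }
  }
  return Out;
}

// Same reply, but the payload is escaped before being spliced in.
std::string escapedReply(int ID, const std::string &Message) {
  return "{\"jsonrpc\":\"2.0\",\"id\":" + std::to_string(ID) +
         ",\"result\":\"" + escapeJSON(Message) + "\"}";
}
```

With a message such as `say "hi"`, `naiveReply` produces a string that is not valid JSON, while `escapedReply` produces a valid one. A JSON library makes the second behaviour the default instead of something each caller must remember.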

I’d like to try replacing this with a JSON library. nlohmann/json[1] seems like a reasonable fit: C++11 with exceptions optional, simple build, MIT license.

I’d propose vendoring it under tools/clang/tools/extra/clangd/nlohmann-json so there’s no question of it “leaking” into runtimes as described in this thread[2].
This also means it wouldn’t solve LLVM’s general JSON-parser problem :)

Any LLVM-level objections or concerns? (Whether that library is the right technical choice for clangd can be discussed elsewhere, I think)
If anyone wants to argue that we shouldn’t bury it in clang/tools/extra/clangd, that’s fine too!

Cheers, Sam

[1] https://github.com/nlohmann/json/blob/develop/src/json.hpp
[2] https://groups.google.com/d/msg/llvm-dev/2JHX3smXTpE/U-E32Yg0AAAJ

Generally, I feel like we have continued to find JSON and YAML uses in LLVM. I think it would be a shame to have more code in that space in the repository.

But that brings us to another problem: outside of tests, we really shouldn’t add more third-party code with different licenses to LLVM. As a compiler, LLVM has rather unique licensing requirements. So while this might be “fine” inside of clangd, if it moves elsewhere (and in many ways it should!) it would become a problem. Worse, it might be easily missed, or accidentally end up being used elsewhere.

I would personally be fairly reluctant to take this on unless there is a pretty huge reason why it is needed. For example, if we had an absolute need to do proper XML parsing and manipulation, the amount of code required for that would be untenable without using one of the existing XML libraries. LLDB, for example, actually does use an XML library, IIRC.

I’m hoping that the problem domain here is substantially simpler and it is tenable (if never really appealing) to just roll our own…

We already had a JSON parser at some point, but that was deleted as YAML was considered a replacement (https://reviews.llvm.org/rL146735).
Regarding writing out JSON with a library, I’d be curious about the design - I do agree with Chandler that all the designs I’d come up with are rather simple.

That said, I’d be curious to see how code that is written against nlohmann/json would look compared to code that’s currently written.

If compelling, could an alternative be to make it a build time dep for folks wanting to build clangd, as opposed to putting it in svn?

Fair enough - we do need JSON facilities in many places.
I’m both surprised and unsurprised that licensing is a concern here :)

https://reviews.llvm.org/D39098 includes some conversion of serialization code.

Mostly it’s not shorter, just a lot easier and safer.

I’ll convert some parsing code tomorrow, I expect bigger simplifications there.

I do think it’s well-designed. If writing something for LLVM from scratch, it’d likely be similar, just better integrated with ADT, SourceMgr etc.

Maybe - what problem does that solve?

Did you look at the one that I referenced that was already in LLVM at some point?

FWIW, while I don’t like this personally, I also don’t see any real problem with this.

Of course, my first inclination was to start writing one, and I had to restrain myself!

Happy to have a crack at this and start a bikeshed thread over the design.
My main concerns:
- compromising ease-of-use to satisfy every use case in LLVM. In
particular, I really want an eager parser rather than the streaming style
of YAMLParser.

Did you look at the one that I referenced that was already in LLVM at some
point?

Yes - it seems a little easier to use than YAMLParser (forward iterators
rather than input iterators, easy to validate the whole document up front).
But the common things are still awkward, particularly random access of object
properties.
And the ownership model (everything owned by the Parser) means you can't
use and compose JSON objects as value types.
(e.g. RequestContext::reply in https://reviews.llvm.org/D39098, where ID is
an arbitrary subtree of an earlier parsed document, and Result could/should
be passed by value).
Something closer to nlohmann seems worthwhile to me even if allocations
aren't optimal - but that might be a tough sell for a general LLVM support
lib.
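To illustrate what "use and compose JSON objects as value types" could mean, here is a rough standard-library-only sketch. It is neither nlohmann/json's API nor anything in LLVM: `Value`, `find` and `makeReply` are hypothetical names, and arrays, booleans and serialization are left out to keep it short. It shows keyed random access on objects, and a reply that copies the request's ID subtree so it stays valid after the parsed request is gone.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical value-semantic JSON node: copying a Value copies its subtree.
class Value {
public:
  enum Kind { Null, Number, String, Object };

  Value() : K(Null), Num(0) {}
  Value(double N) : K(Number), Num(N) {}
  Value(const char *S) : K(String), Num(0), Str(S) {}
  Value(std::string S) : K(String), Num(0), Str(std::move(S)) {}

  static Value object() {
    Value V;
    V.K = Object;
    return V;
  }

  Kind kind() const { return K; }
  double number() const { assert(K == Number); return Num; }
  const std::string &string() const { assert(K == String); return Str; }

  // Random access by key; inserts a Null member if the key is absent.
  Value &operator[](const std::string &Key) {
    assert(K == Object && "not an object");
    for (auto &M : Members)
      if (M.first == Key)
        return M.second;
    Members.emplace_back(Key, Value());
    return Members.back().second;
  }

  // Non-mutating lookup; returns nullptr if absent or not an object.
  const Value *find(const std::string &Key) const {
    if (K != Object)
      return nullptr;
    for (const auto &M : Members)
      if (M.first == Key)
        return &M.second;
    return nullptr;
  }

private:
  Kind K;
  double Num;
  std::string Str;
  // Linear lookup keeps the sketch small; a real library would use a map.
  std::vector<std::pair<std::string, Value>> Members;
};

// Hypothetical reply builder: Result is passed by value, and the ID is an
// arbitrary subtree copied out of the request.
Value makeReply(const Value &Request, Value Result) {
  Value Reply = Value::object();
  Reply["jsonrpc"] = "2.0";
  if (const Value *ID = Request.find("id"))
    Reply["id"] = *ID; // deep copy; Reply does not point into Request
  Reply["result"] = std::move(Result);
  return Reply;
}
```

The copy of the ID subtree is exactly the kind of allocation a parser-owned model avoids, which is the trade-off mentioned above: less optimal allocation in exchange for objects that can be stored, returned, and composed freely.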

No, it doesn’t address it. It contains it.

As source code, the license issue is hard to contain. But linking against a system library has relatively limited license implications. LLDB does this and we’ve done it elsewhere. It is a relatively constrained and well understood thing compared to embedding source code.

Yeah. It’s something that we avoid for convenience reasons, among other issues.

As for how to weigh it against having to implement something that already exists in the world, dunno.