RFC: Adding a JSON library to LLVM Support

We don’t have a dedicated JSON library in the LLVM tree, and I’d like to add one. The pressing need is for clangd, but we have other tools that read/write JSON too.

I’m proposing we write a new one, rather than importing a third-party library (licensing/integration reasons) or extending YamlParser/YamlIO (usability/flexibility). I lean towards a design that parses a full DOM at once, and provides literal-like syntax for composing documents.

I’ve written a document laying out the reasons for taking this path, and my proposal for a design (with links to a prototype):
https://docs.google.com/document/d/1OEF9IauWwNuSigZzvvbjc1cVS1uGHRyGTXaoy3DjqM4/edit

(Comments are enabled, but high-level discussion probably belongs on the list instead.)

What do you think? I’m particularly interested in which parts of LLVM produce/consume JSON (or might want to), and what they need. I’m mostly familiar with the stuff in clang-tools-extra.

Cheers, Sam

LLDB has a JSON parser/serializer that we'd like to get rid of. The
main thing that's missing from YamlIO for us to do that is the ability
to handle (or at very least, not choke on) unknown keys (for example,
keys that are only generated by an lldb-server from the future --
backwards/forwards compatibility).

LLDB has a JSON parser/serializer that we'd like to get rid of.

Ah, sorry for missing that!
What makes you want to get rid of it? Maintenance/duplication, or things
you don't like about using it?
Do you see it as a suitable thing to "promote" up to Support for wider use,
instead?

The
main thing that's missing from YamlIO for us to do that is the ability
to handle (or at very least, not choke on) unknown keys (for example,
keys that are only generated by an lldb-server from the future --
backwards/forwards compatibility).

Interesting - that certainly sounds doable (not knowing the details).
You'd still need a way to serialize though?

LLDB has a JSON parser/serializer that we'd like to get rid of.

Ah, sorry for missing that!
What makes you want to get rid of it? Maintenance/duplication, or things you
don't like about using it?
Do you see it as a suitable thing to "promote" up to Support for wider use,
instead?

I don't think the json library in lldb is in a shape that can be
easily promoted to llvm/Support (see previous attempt here
<https://reviews.llvm.org/D24369>). I think you'd be better off
writing a new one from scratch (or reusing parts of YamlIO).

The
main thing that's missing from YamlIO for us to do that is the ability
to handle (or at very least, not choke on) unknown keys (for example,
keys that are only generated by an lldb-server from the future --
backwards/forwards compatibility).

Interesting - that certainly sounds doable (not knowing the details).
You'd still need a way to serialize though?

For the lldb <-> lldb-server communication, the ability to ignore the
parts of the message we don't understand would be enough, but I do not
fully understand all the different ways in which it is used. However,
I think that at least in some cases we have a field in a json object,
which is completely opaque to us, but we are expected to take it, and
pass it on to someone. I'm looping in lldb-dev, hopefully someone can
elaborate on the requirements.
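
As a rough illustration of the two requirements described above (skip keys the reader does not know, and carry an opaque field through untouched), here is a minimal sketch. Everything in it is an assumption made for illustration: `RawObject` stands in for an already-parsed JSON object, and `StopInfo`, `readStopInfo` and the key names are hypothetical, not LLDB code.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Stand-in for an already-parsed JSON object: key -> raw JSON text of the
// value. (Assumption: some parser has produced this for us.)
using RawObject = std::map<std::string, std::string>;

// Hypothetical message type; the field and key names are invented.
struct StopInfo {
  std::string Reason;  // required, interpreted by the reader
  std::string Payload; // opaque: stored verbatim so it can be passed on
};

// Reads the keys it knows about and silently ignores everything else, so a
// message from a newer sender with extra keys still parses.
std::optional<StopInfo> readStopInfo(const RawObject &O) {
  auto Reason = O.find("reason");
  if (Reason == O.end())
    return std::nullopt; // a missing required key is still an error

  StopInfo Info;
  Info.Reason = Reason->second;

  // Optional opaque field: kept as raw text, never inspected.
  auto Payload = O.find("payload");
  if (Payload != O.end())
    Info.Payload = Payload->second;

  return Info; // any other keys in O are deliberately not examined
}
```

The point of the sketch is only the policy: unknown keys are not an error, and the opaque field survives a round trip because it is never decoded.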

llvm-cov exports coverage data to JSON. The exporter isn’t a big maintenance burden, but it would be nice not to ‘printf’ the output, and to instead have assurance that something else will properly structure/escape the data. Concretely: we could get rid of some tests that check that the JSON is structurally valid, and only keep tests that check that the output is complete.

vedant
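
To make the escaping concern above concrete, here is a minimal sketch of what a library would do centrally instead of each tool hand-formatting strings. `quoteJSONString` is a hypothetical helper written for this illustration, not code from llvm-cov or from any proposed library, and it assumes the input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: returns S as a quoted JSON string literal.
// Assumes S is valid UTF-8; bytes >= 0x80 are passed through unchanged.
std::string quoteJSONString(const std::string &S) {
  std::string Out = "\"";
  for (unsigned char C : S) {
    switch (C) {
    case '"':  Out += "\\\""; break;
    case '\\': Out += "\\\\"; break;
    case '\n': Out += "\\n";  break;
    case '\r': Out += "\\r";  break;
    case '\t': Out += "\\t";  break;
    default:
      if (C < 0x20) {
        // Remaining control characters must be written as \u00XX.
        char Buf[8];
        std::snprintf(Buf, sizeof(Buf), "\\u%04x", static_cast<unsigned>(C));
        Out += Buf;
      } else {
        Out += static_cast<char>(C);
      }
    }
  }
  Out += '"';
  return Out;
}
```

With something like this behind the serializer, a file name containing a backslash or a quote cannot produce structurally invalid output, which is the property the structural-validity tests currently have to check.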

The technical problems you've listed for YAMLIO aren't fundamental to the
design of the library, and wouldn't be too hard to fix (it already has
output support). However the usability concerns you have are pretty
fundamental to the design, as it's designed for strong typing as opposed to
exposing a DOM.

An alternative I would suggest looking at is adding a JSON mode to
YamlParser and then adding a YAML compatible DOM API similar to what you've
proposed on top of that. I could see this being more complex/less usable
than a dedicated JSON parser, but just want to make sure it's weighed
against the cost of adding yet another parser.

- Michael Spencer

The technical problems you've listed for YAMLIO aren't fundamental to the
design of the library, and wouldn't be too hard to fix (it already has
output support).

Right! Sorry if that was misleading: I meant it can't be used for writing
_JSON_ today (whereas reading does work).

However the usability concerns you have are pretty fundamental to the
design, as it's designed for strong typing as opposed to exposing a DOM.

Yeah. The tagged-union format is unfortunate in its requirement for out of
order parsing (though it is strongly typed!)
The protocol is driven by the fact that everyone *else* uses DOM parsers, I
think.

An alternative I would suggest looking at is adding a JSON mode to
YamlParser and then adding a YAML compatible DOM API similar to what you've
proposed on top of that. I could see this being more complex/less usable
than a dedicated JSON parser, but just want to make sure it's weighed
against the cost of adding yet another parser.

Sharing the parser might make sense. On the other hand the JSON grammar is
so simple that JSON enforcement in YAMLParser may be more complex than a
standalone parser, and would certainly be slower. I'll do some experiments
here.
I'm less convinced about a YAML-compatible DOM: tree vs arbitrary graph is
a pretty big difference, to start.

Polly has JsonCpp (https://github.com/open-source-parsers/jsoncpp, MIT
licensed) in its source tree, or optionally uses the one installed on
the host system.

If we add an "official" one, Polly might use it as well.

Michael

The technical problems you've listed for YAMLIO aren't fundamental to the
design of the library, and wouldn't be too hard to fix (it already has
output support).

Right! Sorry if that was misleading: I meant it can't be used for writing
_JSON_ today (whereas reading does work).

Ah, yeah.

However the usability concerns you have are pretty fundamental to the
design, as it's designed for strong typing as opposed to exposing a DOM.

Yeah. The tagged-union format is unfortunate in its requirement for out of
order parsing (though it is strongly typed!)
The protocol is driven by the fact that everyone *else* uses DOM parsers,
I think.

Tagged union works with YAMLIO, the MachO YAML interface uses it (see
MappingTraits<MachOYAML::LoadCommand>::mapping). By usability I was mainly
referring to the verbosity and non-JSONness of YAMLIO.

An alternative I would suggest looking at is adding a JSON mode to
YamlParser and then adding a YAML compatible DOM API similar to what you've
proposed on top of that. I could see this being more complex/less usable
than a dedicated JSON parser, but just want to make sure it's weighed
against the cost of adding yet another parser.

Sharing the parser might make sense. On the other hand the JSON grammar is
so simple that JSON enforcement in YAMLParser may be more complex than a
standalone parser, and would certainly be slower. I'll do some experiments
here.
I'm less convinced about a YAML-compatible DOM: tree vs arbitrary graph is
a pretty big difference, to start.

Thanks for looking into this.

- Michael Spencer

We don't have a dedicated JSON library in the LLVM tree, I'd like to
add one. The pressing need is for clangd, but we have other tools that
read/write JSON too.

I'm proposing we write a new one, rather than importing a third-party
library (licensing/integration reasons) or extending
YamlParser/YamlIO (usability/flexibility). I lean towards a design
that parses a full DOM at once, and provides literal-like syntax for
composing documents.

Sorry for the late response (too late?).

There is a "modern", header-only, well-tested JSON-library which I'm
using in several projects. It has a very active user- and
contributor-base.

  https://github.com/nlohmann/json

It's awesome! ;) (MIT-licensed)

I've written a document laying out the reasons for taking this path,
and my proposal for a design (with links to a prototype)
https://docs.google.com/document/d/1OEF9IauWwNuSigZzvvbjc1cVS1uGHRyGTXaoy3DjqM4/edit
(Comments are enabled, but high-level discussion probably belongs on
the list instead.)

What do you think? I'm particularly interested in which parts of LLVM
produce/consume JSON (or might want to), and what they need. I'm
mostly familiar with the stuff in clang-tools-extra.

CMake's compile-command output is in JSON. We are using it to quickly
get the whole compile-line (when compiling our project with clang) to
add the flags needed to generate the .ll files.

regards,

Finally following up here…

We ended up writing a JSON parser and checking it in under clangd/. We’ve been using that for a while now without hitting new problems.
The header is here: https://reviews.llvm.org/source/clang-tools-extra/browse/clang-tools-extra/trunk/clangd/JSONExpr.h
It’s a DOM-based approach with objects on the heap. There’s no streaming parser, though one could be bolted on for e.g. long arrays. There are some weak conventions around marshalling to structs with ADL functions. Compared to what was described in my doc, it’s simpler but less efficient, dropping the arena representation.

(The reasons for not reusing an existing library were covered here: https://groups.google.com/forum/#!topic/llvm-dev/5rryBKDY8eY)

There’s at least some desire to reuse this more widely (I just talked to Pavel about LLDB). So I’d propose lifting this up to llvm/Support/JSON.h. (Dependencies are only Support and ADT).

Let me know if anyone has concerns!
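
For readers who have not seen the pattern, here is a minimal sketch of "a DOM with objects on the heap, plus marshalling to structs via ADL functions". It is not the clangd header: the `sketch` and `app` namespaces, the `Value` and `Position` types, and the `parseAs` helper are all hypothetical, and `std::shared_ptr` is used only to keep the toy type copyable.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <variant>

namespace sketch {
struct Value;
using Object = std::map<std::string, Value>;

// A DOM node. Objects are stored on the heap behind a pointer.
struct Value {
  std::variant<std::nullptr_t, double, std::string, std::shared_ptr<Object>>
      Data;

  Value() : Data(nullptr) {}
  Value(double D) : Data(D) {}
  Value(std::string S) : Data(std::move(S)) {}
  Value(Object O); // defined below, once Value is a complete type

  const Object *getAsObject() const {
    auto *P = std::get_if<std::shared_ptr<Object>>(&Data);
    return P ? P->get() : nullptr;
  }
};
inline Value::Value(Object O)
    : Data(std::make_shared<Object>(std::move(O))) {}

// Marshalling for a primitive type, provided alongside the DOM.
inline bool fromJSON(const Value &V, double &Out) {
  if (auto *D = std::get_if<double>(&V.Data)) {
    Out = *D;
    return true;
  }
  return false;
}

// Generic entry point. The unqualified fromJSON call is resolved by
// argument-dependent lookup, so it also finds overloads that live in the
// namespace of T.
template <typename T> std::optional<T> parseAs(const Value &V) {
  T Out{};
  if (fromJSON(V, Out))
    return Out;
  return std::nullopt;
}
} // namespace sketch

namespace app {
struct Position {
  double Line = 0;
  double Character = 0;
};

// The convention: a struct opts in by defining fromJSON next to itself.
inline bool fromJSON(const sketch::Value &V, Position &Out) {
  const sketch::Object *O = V.getAsObject();
  if (!O)
    return false;
  auto Line = O->find("line");
  auto Char = O->find("character");
  if (Line == O->end() || Char == O->end())
    return false;
  return fromJSON(Line->second, Out.Line) &&
         fromJSON(Char->second, Out.Character);
}
} // namespace app
```

A caller would write `sketch::parseAs<app::Position>(V)` and get an empty optional if `V` is not an object with numeric `line` and `character` fields. The convention is "weak" in the sense that nothing enforces it beyond overload resolution.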

Just pinging this as https://reviews.llvm.org/D45753 is now open.
So if anyone has opinions, that’s a good place for them.

(I’m also trying to identify the right reviewer for suitability in llvm/Support.)

And one last followup :)

The JSON library is now available as llvm/Support/JSON.h:
https://reviews.llvm.org/diffusion/L/browse/llvm/trunk/include/llvm/Support/JSON.h

Try it out, happy to answer questions or review patches.

Thanks a lot to everyone for help with review and suggestions!