The proposal: The C++11 Migrator needs to write Replacement data (offset, length, and replacement text) to disk. The replacement data describes changes made to a header while transforming one or more TUs. After an entire code base has been transformed, a separate tool would gather up all the replacement data and merge it to produce the actual changes to headers. So the point is to serialize Replacement data as a form of inter-process communication, using the file system as the communication link. Real inter-process communication is a possibility, but it is not portable.
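To make the data concrete, here is a minimal sketch of a replacement record and of one plausible merge step. It assumes the post-processor collapses identical replacements (the same header transformed from several TUs) and rejects overlapping ones. The `Replacement` struct is a hypothetical stand-in that only mirrors the fields of clang::tooling::Replacement, and `mergeAndApply` is illustrative, not the post-processor's actual logic.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for clang::tooling::Replacement: the file it
// applies to, a byte offset, a length to remove, and the text to insert.
struct Replacement {
  std::string FilePath;
  std::size_t Offset;
  std::size_t Length;
  std::string Text;

  bool operator<(const Replacement &O) const {
    return std::tie(FilePath, Offset, Length, Text) <
           std::tie(O.FilePath, O.Offset, O.Length, O.Text);
  }
  bool operator==(const Replacement &O) const {
    return std::tie(FilePath, Offset, Length, Text) ==
           std::tie(O.FilePath, O.Offset, O.Length, O.Text);
  }
};

// Merge replacements gathered from several TUs for ONE header (all Reps are
// assumed to target the same file) and apply them to its contents.
// Identical replacements are collapsed; overlapping, non-identical ones are
// treated as a conflict and nothing is applied.
std::optional<std::string> mergeAndApply(std::string Buffer,
                                         std::vector<Replacement> Reps) {
  std::sort(Reps.begin(), Reps.end());
  Reps.erase(std::unique(Reps.begin(), Reps.end()), Reps.end());

  for (std::size_t I = 0; I < Reps.size(); ++I) {
    if (Reps[I].Offset + Reps[I].Length > Buffer.size())
      return std::nullopt; // Replacement runs past the end of the file.
    if (I > 0 && Reps[I - 1].Offset + Reps[I - 1].Length > Reps[I].Offset)
      return std::nullopt; // Overlaps the previous replacement: conflict.
  }

  // Apply from the highest offset down so earlier offsets stay valid.
  for (auto It = Reps.rbegin(); It != Reps.rend(); ++It)
    Buffer.replace(It->Offset, It->Length, It->Text);
  return Buffer;
}
```

Applying from the end works because each edit only shifts the text after it, so offsets recorded against the original file remain correct for every replacement not yet applied.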
YAML was suggested off-hand in some past patch discussion. Support for reading/writing YAML exists in LLVM. YAML is also meant for human-readable data serialization. It seems like a good choice.
It's worth noting that we don't intend to make these change description files full of serialized replacements usable by general-purpose diff/merge tools. The only intended consumer is the C++11 Migrator Post-Processor. A prototype of this tool exists at https://github.com/revane/migmerge_git but it will soon be replaced by a tool with no third-party dependencies, so that it can live in clang-tools-extra alongside the migrator.
Are there any other suggestions to address the flow of data between migrator and post-processor? Comments?
YAML was added solely as a test format for lld. It was sold as "better JSON" in that context, in that it had comments and references. I agree with those properties.
Personally, after decades of working with Make, I have decided that I am allergic to significant whitespace. YAML has this misfeature. Yes, this is a bikeshed.
I have wanted for some time for rewriter tools to use diff output that can be used with existing review tools. If a merge tool is created that generates diff, I'd probably be less concerned, but I'd still want a way to handle this self-contained in the LLVM/Clang frameworks without a separate merge process.
From: Alex Rosenberg [mailto:alexr@ohmantics.com]
Sent: Thursday, August 01, 2013 1:32 AM
To: Vane, Edwin
Cc: Clang Dev List (cfe-dev@cs.uiuc.edu)
Subject: Re: [cfe-dev] RFC: YAML as an intermediate format for
clang::tooling::Replacement data on disk
...
I have wanted for some time for rewriter tools to use diff output that can be
used with existing review tools. If a merge tool is created that generates diff, I'd
probably be less concerned, but I'd still want a way to handle this self-contained
in the LLVM/Clang frameworks without a separate merge process.
This can still happen. What you're describing is orthogonal to this proposal about using YAML as an intermediate representation of serialized data between the migrator and replacement coalescing tool. I don't think anything here will block your desire from coming true.
Speaking of JSON, this was suggested on IRC. It seems like it would be just as fine as YAML for this situation since there's no need for references or comments (yet, anyway). However, LLVM has no generic JSON parser, just a specific implementation for compilation databases. The YAML reader/writer that LLVM provides is completely generic and available right now even if it provides some features we don't need.
It would be nice if we had some generic object serialisation infrastructure in LLVM that could output multiple different representations, including something human-readable, something standard (e.g. JSON, XML), and something cheaply machine-parseable. We already have Clang AST serialisation, LLVM IR serialisation, and a number of other things. Some generic framework that would be used by all of these and by tooling things would be great.
...
> I have wanted for some time for rewriter tools to use diff output that can be
> used with existing review tools. If a merge tool is created that generates diff, I'd
> probably be less concerned, but I'd still want a way to handle this self-contained
> in the LLVM/Clang frameworks without a separate merge process.
>
> This can still happen. What you're describing is orthogonal to this
> proposal about using YAML as an intermediate representation of serialized
> data between the migrator and replacement coalescing tool. I don't think
> anything here will block your desire from coming true.
>
> Speaking of JSON, this was suggested on IRC. It seems like it would be
> just as fine as YAML for this situation since there's no need for
> references or comments (yet, anyway). However, LLVM has no generic JSON
> parser, just a specific implementation for compilation databases. The YAML
> reader/writer that LLVM provides is completely generic and available right
> now even if it provides some features we don't need.
Note that the compilation databases use the YAML parser. JSON is a subset
of YAML. I'd also (slightly) prefer to use JSON over YAML for the
intermediate representation.
From: Manuel Klimek [mailto:klimek@google.com]
Sent: Thursday, August 01, 2013 9:40 AM
To: Vane, Edwin
Cc: Alex Rosenberg; Clang Dev List (cfe-dev@cs.uiuc.edu)
Subject: Re: [cfe-dev] RFC: YAML as an intermediate format for
clang::tooling::Replacement data on disk
...
Note that the compilation databases use the YAML parser. JSON is a subset of
YAML. I'd also (slightly) prefer to use JSON over YAML for the intermediate
representation.
Is it enough to hard-code the output into JSON format and just use the YAML parser for reading? Or should we aim for a general purpose JSON reader/writer as with YAML in LLVM? Either way, I think the general purpose parser is beyond scope for what we want to achieve at this time with the migrator.
I'm opposed to any "general purpose" parsing. Whether JSON is sufficient
here is a question of definition: I'd define the interface to be JSON
and, like you suggest, output JSON and use the YAML parser for parsing.
JSON is significantly simpler, and there's only "one way" to express
something. That will be all I contribute to the bike-shedding - I also
think it doesn't really matter much ...
>>
>> ...
>>
>>> I have wanted for some time for rewriter tools to use diff output that can be
>>> used with existing review tools. If a merge tool is created that generates diff, I'd
>>> probably be less concerned, but I'd still want a way to handle this self-contained
>>> in the LLVM/Clang frameworks without a separate merge process.
>>
>> This can still happen. What you're describing is orthogonal to this
>> proposal about using YAML as an intermediate representation of serialized
>> data between the migrator and replacement coalescing tool. I don't think
>> anything here will block your desire from coming true.
>>
>> Speaking of JSON, this was suggested on IRC. It seems like it would be
>> just as fine as YAML for this situation since there's no need for
>> references or comments (yet, anyway). However, LLVM has no generic JSON
>> parser, just a specific implementation for compilation databases. The YAML
>> reader/writer that LLVM provides is completely generic and available right
>> now even if it provides some features we don't need.
>
> Note that the compilation databases use the YAML parser. JSON is a
> subset of YAML. I'd also (slightly) prefer to use JSON over YAML for the
> intermediate representation.
Why? YAML has less syntactic overhead.
JSON is significantly simpler, and there's only "one way" to express
something. That will be all I contribute to the bike-shedding - I also
think it doesn't really matter much ...
Also, more languages have built-in json parsers. Built-in yaml parsers are
less common.
...
> Note that the compilation databases use the YAML parser. JSON is a subset of
> YAML. I'd also (slightly) prefer to use JSON over YAML for the intermediate
> representation.

> Is it enough to hard-code the output into JSON format
Unfortunately YAMLIO doesn't support JSON-formatted output. I had a brief
interchange with Nick Kledzik about this a while back, and he seemed to
have a pretty clear idea of how to cleanly and "properly" implement this
within the YAMLIO framework, but it seemed like it would still require some
dedicated work before becoming a reality.
TBH, outputting JSON isn't that hard really. I believe we already have
code somewhere in LLVM that does this. The most complicated part is probably
escaping strings, which is basically just the "usual" (C-like) escapes.
Probably a single function `writeAsEscapedJSONString(raw_ostream &OS,
StringRef Str)` would be all the "abstraction" you need.
> and just use the YAML parser for reading? Or should we aim for a general
> purpose JSON reader/writer as with YAML in LLVM? Either way, I think the
> general purpose parser is beyond scope for what we want to achieve at this
> time with the migrator.
As Manuel already pointed out, YAML is a superset of JSON, so the YAML
parser serves as a perfectly adequate JSON parser (YAMLIO should work
fine for the reading side as well).
> Speaking of JSON, this was suggested on IRC. It seems like it
> would be just as fine as YAML for this situation since there's no
> need for references or comments (yet, anyway).
We're not going to want comments for llvm-lit RUN lines?