Hi all,
I’d like to extend Clang’s serialized-diagnostics binary format with the ability to provide the contents of source files that are referenced by diagnostics. This can have a few uses:
- Make serialized diagnostics files fully self-contained, so you don’t need to have the original source files around to reproduce caret diagnostics. This is particularly good if source files are generated as part of the build and might be hard to get access to in some build environments.
- Allow source code that the compiler generates internally (e.g., macro expansions) to be emitted into the serialized diagnostics files, so we can reproduce caret diagnostics for it and allow users to inspect the code after the fact.
The extension I’m proposing is to add a single new record kind, RECORD_SOURCE_FILE_CONTENTS, to the serialized diagnostics bitstream format. This record contains:
- The ID of the file for which it is providing contents (i.e., the same ID that will occur in a RECORD_FILENAME).
- The “original source range” for this buffer, which is where this source file logically resides in another source file. It’s somewhat tied to the macro-expansion use case, where it indicates the place that the macro was expanded, but it can be left as an empty range for other use cases.
- A blob containing the actual source text.
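To make the three fields concrete, here is a rough sketch of how a reader might hold one decoded record in memory. None of these type or function names come from the proposal; they are hypothetical. The (file ID, line, column, offset) shape of a location and the use of file ID 0 to mean "no file" are assumptions about how locations appear elsewhere in the format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one source location (assumed layout). */
typedef struct {
  uint32_t file_id; /* ID matching a RECORD_FILENAME entry; 0 assumed to mean "no file" */
  uint32_t line;    /* 1-based */
  uint32_t column;  /* 1-based */
  uint32_t offset;  /* byte offset into the file */
} DiagLocation;

typedef struct {
  DiagLocation begin;
  DiagLocation end;
} DiagRange;

/* Hypothetical decoded form of one RECORD_SOURCE_FILE_CONTENTS record. */
typedef struct {
  uint32_t file_id;         /* file whose contents are being provided */
  DiagRange original_range; /* where the buffer logically resides; may be empty */
  const char *contents;     /* blob; not assumed to be NUL-terminated */
  size_t contents_size;     /* size of the blob in bytes */
} SourceFileContents;

/* Assumption: a range whose endpoints name no file counts as empty. */
static bool diag_range_is_empty(const DiagRange *r) {
  return r->begin.file_id == 0 && r->end.file_id == 0;
}

/* True if the buffer conceptually replaces a range in another file,
 * as in the macro-expansion use case. */
static bool is_expansion_buffer(const SourceFileContents *rec) {
  return !diag_range_is_empty(&rec->original_range);
}
```

A generated file that simply isn't available on disk would carry an empty range, while a macro expansion buffer would point back at the expansion site.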
Serialized diagnostic files are generally read by libclang’s clang_loadDiagnostics, although other implementations exist. We can extend libclang with two additional APIs to get those bits of information from a loaded diagnostic set:
/**
* Get the contents of the given file that was provided via diagnostics.
*
* \param diags the diagnostics set to query for the contents of the file.
* \param file the file to get the contents of.
* \param outFileSize if non-null, set to the file size on success.
* \returns on success, a pointer to the file contents. Otherwise, NULL.
*/
CINDEX_LINKAGE const char *clang_getDiagnosticFileContents(
CXDiagnosticSet diags, CXFile file, size_t *outFileSize);
/**
* Retrieve the original source range if the given file was provided via
* diagnostics and is conceptually a replacement for the original source range.
*
* \param diags the diagnostics set to query for the original source range.
* \param file the file to get the original source range of.
* \returns on success, the source range (into another file) that is
* conceptually replaced by the contents of the given file (available via
* \c clang_getDiagnosticFileContents).
*/
CINDEX_LINKAGE CXSourceRange clang_getDiagnosticFileOriginalSourceRange(
CXDiagnosticSet diags, CXFile file);
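For a sense of what a client would do with the returned buffer, here is a minimal sketch of reproducing a caret line from a (pointer, size) pair plus a 1-based line and column. The helper names find_line and render_caret are hypothetical and not part of libclang. In a real client, the buffer would presumably come from the proposed clang_getDiagnosticFileContents and the line and column from something like clang_getSpellingLocation. The sketch treats columns as byte counts and does not expand tabs.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Find the 1-based `line` in a buffer of `size` bytes (not assumed to be
 * NUL-terminated). On success, stores the line length (excluding the
 * newline) in *out_len and returns a pointer to the start of the line.
 * Returns NULL if the buffer has no such line.
 */
static const char *find_line(const char *buf, size_t size, unsigned line,
                             size_t *out_len) {
  if (buf == NULL || line == 0)
    return NULL;
  size_t start = 0;
  for (unsigned cur = 1; cur < line; ++cur) {
    while (start < size && buf[start] != '\n')
      ++start;
    if (start == size)
      return NULL; /* ran out of lines */
    ++start;       /* step past the newline */
  }
  if (start >= size)
    return NULL; /* nothing after the final newline */
  size_t end = start;
  while (end < size && buf[end] != '\n')
    ++end;
  *out_len = end - start;
  return buf + start;
}

/*
 * Write the source line followed by a caret under 1-based `column` into
 * `out`. Returns 0 on success, -1 if the location is outside the buffer
 * or `out` is too small.
 */
static int render_caret(const char *buf, size_t size, unsigned line,
                        unsigned column, char *out, size_t out_size) {
  size_t len = 0;
  const char *text = find_line(buf, size, line, &len);
  if (text == NULL || column == 0 || column > len + 1)
    return -1;
  int n = snprintf(out, out_size, "%.*s\n%*s^\n", (int)len, text,
                   (int)(column - 1), "");
  return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Because the buffer is passed with an explicit size, this works the same whether or not the contents happen to be NUL-terminated.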
The serialized diagnostic format has been stable for nearly a decade. Fortunately, libclang will ignore any records it does not know about, so we can add this new record kind without breaking existing implementations, and without a version bump. Old clients will see filenames in diagnostics that they can’t find on disk, but that’s no worse than what we have today if the source is missing or moved. Once clients are updated, they’ll get the source file contents.
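The compatibility argument can be illustrated with a toy reader loop. This is only a sketch of the "skip what you don't recognize" behavior, not libclang's actual reader: the Record and ReaderStats types, the SKETCH_ constants, and their numeric values are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder record kinds; the numeric values are NOT the real ones. */
enum {
  SKETCH_RECORD_FILENAME = 1,
  SKETCH_RECORD_SOURCE_FILE_CONTENTS = 2
};

/* Hypothetical, already-decoded record header. */
typedef struct {
  uint32_t kind;
  uint32_t file_id;
} Record;

typedef struct {
  size_t filenames_seen;
  size_t contents_seen;
  size_t skipped;
} ReaderStats;

/*
 * Walk a list of records. `knows_contents` distinguishes an updated reader
 * (nonzero) from an old one (zero). An old reader handles the new record
 * kind exactly like any other unknown kind: it skips it and keeps going.
 */
static ReaderStats read_records(const Record *records, size_t count,
                                int knows_contents) {
  ReaderStats stats = {0, 0, 0};
  for (size_t i = 0; i < count; ++i) {
    uint32_t kind = records[i].kind;
    if (kind == SKETCH_RECORD_FILENAME) {
      ++stats.filenames_seen;
    } else if (kind == SKETCH_RECORD_SOURCE_FILE_CONTENTS && knows_contents) {
      ++stats.contents_seen;
    } else {
      ++stats.skipped; /* unknown record: ignore, do not fail */
    }
  }
  return stats;
}
```

Both readers see the same filenames; only the updated one picks up the contents, which matches the behavior described above.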
I have an implementation of this change in this pull request, which pairs with a Swift compiler change to emit macro expansion buffer contents using this mechanism.
Thoughts? Additional use cases?
Doug