Hello Everyone,
Below is an RFC on extending clang's -fdiagnostics-format option to
let clang emit machine-readable JSON diagnostics. Feedback is highly appreciated!
Why
Machine-consumable diagnostics are important for writing generic static
analysis wrappers and harnesses that want to interact with code bases through
clang. There are two options to consider for the diagnostic format to use in
clang:
- Mimic gcc-9 -fdiagnostics-format=json, covered in the Previous Work section
- Emit SARIF diagnostic information, a cross-language standardized format
that is already supported in clang/lib/StaticAnalyzer (through --analyzer-output=sarif)
We propose the second option (SARIF), as it is a standardized format, which should make it easier for tools to
implement support for it.
Previous Work
gcc-9 -fdiagnostics-format=json
GCC recently implemented serializing diagnostics to JSON. This option
could be implemented as -fdiagnostics-format=json-gcc in clang to signal
to users its intended interoperability with the corresponding gcc option.
The schema for this format may be inferred from current gcc code.
While not a community standard, it can be expected to be reasonably stable, as the
original patch states that the flag emits machine-readable diagnostics.
SARIF diagnostics in LLVM
SARIF (Static Analysis Results Interchange Format) is a standard format
for the output of static analysis tools.
Clang StaticAnalyzer already implements a SARIF diagnostic consumer in
D53814. This should allow us to add any extra fields that turn out to be
necessary to the diagnostics output.
Mapping clang diagnostics to SARIF
This section assumes the typical compiler diagnostic, which looks like the
examples on the expressive diagnostics page.
In SARIF, the attributes can be mapped to the results property as follows:
- File name where the diagnostic occurs maps to the physicalLocation property
- Line/Column of the caret marking the error can be stored in the region property; this can also encode the source range to which an error corresponds
- The error message can be transferred to the message property
- Each of the locations can store the rendered caret & snippet from clang using the snippet property for that region
- Nested diagnostics (typically note level items) can be represented using the locationRelationship object
- Fixit hints can be communicated through the fixes property
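As a rough illustration of the mapping above, here is a minimal Python sketch that turns a simplified diagnostic into a SARIF-style result object. It is not how clang would implement this. The input dictionary layout and the names make_location, diagnostic_to_result and EXAMPLE_DIAGNOSTIC are hypothetical, and the property names assume SARIF 2.1.0, which may differ from the SARIF revision the StaticAnalyzer consumer targets.

```python
import json


def make_location(file, line, column, end_column=None, snippet=None):
    """Build a SARIF-style location: file -> physicalLocation, caret -> region."""
    region = {"startLine": line, "startColumn": column}
    if end_column is not None:
        region["endColumn"] = end_column  # exclusive end of the source range
    if snippet is not None:
        region["snippet"] = {"text": snippet}
    return {
        "physicalLocation": {
            "artifactLocation": {"uri": file},
            "region": region,
        }
    }


def diagnostic_to_result(diag):
    """Map a simplified (hypothetical) diagnostic dict onto a result object."""
    primary = make_location(diag["file"], diag["line"], diag["column"],
                            diag.get("end_column"), diag.get("snippet"))
    result = {
        "level": diag["level"],
        "message": {"text": diag["message"]},
        "locations": [primary],
    }

    # Notes become related locations, linked from the primary location
    # through locationRelationship objects that point at their ids.
    related = []
    for index, note in enumerate(diag.get("notes", [])):
        loc = make_location(note["file"], note["line"], note["column"])
        loc["id"] = index
        loc["message"] = {"text": note["message"]}
        related.append(loc)
    if related:
        result["relatedLocations"] = related
        primary["relationships"] = [{"target": loc["id"]} for loc in related]

    # Fixit hints become replacements inside a single fix object.
    replacements = [
        {
            "deletedRegion": {
                "startLine": fixit["line"],
                "startColumn": fixit["column"],
                "endColumn": fixit["end_column"],
            },
            "insertedContent": {"text": fixit["text"]},
        }
        for fixit in diag.get("fixits", [])
    ]
    if replacements:
        result["fixes"] = [{
            "artifactChanges": [{
                "artifactLocation": {"uri": diag["file"]},
                "replacements": replacements,
            }]
        }]
    return result


# Illustrative values only, written by hand rather than captured from clang.
EXAMPLE_DIAGNOSTIC = {
    "file": "t.c", "line": 4, "column": 10, "end_column": 15,
    "level": "error",
    "message": "use of undeclared identifier 'coutn'; did you mean 'count'?",
    "snippet": "  return coutn;",
    "notes": [{"file": "t.c", "line": 1, "column": 5,
               "message": "'count' declared here"}],
    "fixits": [{"line": 4, "column": 10, "end_column": 15, "text": "count"}],
}
EXAMPLE_RESULT = diagnostic_to_result(EXAMPLE_DIAGNOSTIC)

if __name__ == "__main__":
    print(json.dumps(EXAMPLE_RESULT, indent=2))
```

Keeping the note as a related location with its own id lets the primary location refer to it by target, which is one way to preserve the nesting of note-level items.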
Interface Changes
We propose the following interface changes:
- Input: Extend the -fdiagnostics-format flag to recognize: -fdiagnostics-format=sarif
- Output: Clang will emit SARIF formatted diagnostics when -fdiagnostics-format=sarif is provided.
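To give a feel for what a wrapper on the consuming side might look like, here is a small Python sketch that walks a SARIF log and yields one tuple per reported location. It assumes SARIF 2.1.0 property names and assumes the text emitted under the proposed flag has already been captured as a string. The names iter_sarif_results and SAMPLE_LOG are hypothetical, and the sample log is hand-written rather than real clang output.

```python
import json


def iter_sarif_results(sarif_text):
    """Yield (uri, line, column, level, message) for each result location."""
    log = json.loads(sarif_text)
    for run in log.get("runs", []):
        for result in run.get("results", []):
            message = result.get("message", {}).get("text", "")
            # Fall back to "warning" when a result carries no explicit level.
            level = result.get("level", "warning")
            for location in result.get("locations", []):
                physical = location.get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri")
                region = physical.get("region", {})
                yield (uri, region.get("startLine"),
                       region.get("startColumn"), level, message)


# Hand-written sample log for illustration; not captured from clang.
SAMPLE_LOG = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "clang"}},
        "results": [{
            "level": "error",
            "message": {"text": "expected ';' after expression"},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": "t.c"},
                    "region": {"startLine": 3, "startColumn": 12},
                }
            }],
        }],
    }],
})

if __name__ == "__main__":
    for uri, line, column, level, message in iter_sarif_results(SAMPLE_LOG):
        print(f"{uri}:{line}:{column}: {level}: {message}")
```

Because the log is plain JSON, a harness written this way needs no clang-specific parsing and could read output from any other SARIF-producing tool in the same loop.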
Diagnostic Examples
Various examples are available in this GitHub gist (which also renders this message in Markdown): https://gist.github.com/envp/3a5fdd33115b91c391c22e5e8a5210f4#diagnostic-examples