Proposal: Add diagnostic enumeration to libclang

Currently, there is no easy way for libclang consumers to infer the underlying type of an individual diagnostic. Operating within the bounds of the existing API, we have the following choices:

  • Parse text from clang_getDiagnosticSpelling().
  • Switch off text from clang_getDiagnosticOption() and hope there is a 1:1 mapping between options and diagnostic types and hope that options don’t change over time.
  • Perhaps utilize clang_getDiagnosticCategory() to aid in above

The most robust choice is probably the text parsing one. And, that is ugly and prone to string changes and possibly translations.

I’m proposing exposing the unique per-diagnostic numeric enumeration and/or string value to libclang. For example:

clang_getDiagnosticID(CXDiagnostic) → diag::warn_unused_variable (3301) – Implemented through Diagnostic.getID()
clang_getDiagnosticName(CXDiagnostic) → CXString(“warn_unused_variable”) – Implemented through DiagnosticIDs::getName()

The benefit of this exposure is that consumers can identify specific diagnostic classes easily and with confidence.

The downside to providing this info related to the mutability of these values over time. How constant are the enumerations? It has an impact for compatibility of downstream applications over time, as they would have to ensure enumeration changes are reflected. But, they would need to do this today, albeit in a hackier way, so I think the change would be a net win. There is also a concern for serialized diagnostics. AFAICT the diagnostic enumeration isn’t serialized in SerializedDiagnosticPrinter.cpp today. Was this intentional over concerns of enumeration mutability over time? If so, is the reason still valid, or can the enumeration be added to the serialized diagnostic representation [to support this feature]?

The impetus behind this proposal is a tool I’m writing that can automatically fix (simple) compiler warnings, such as unused parameters, variables, etc. I’m writing this in Python, which means going through libclang. Yes, I could write my own tool hooking into the C++ directly, but I like the ease and convenience of a higher-level language. And, I think other tool writers would appreciate the ability to easily distinguish diagnostic types as well.

If there is interest, I can code up a patch.

Greg

I appreciate people working on libclang and the Python bindings in general, since it means that it makes it easier for everyone to access a quality C++ parser instead of hacking around…

However in your particular case (fixing simple warnings) Clang already has such fixes available (through Fix Hints) for many trivial warnings, so I am surprised that you would need to completely reimplement them. Perhaps it is those Fix Hints that should be exposed through the Python API ?

I hope better informed people will chime in…

–Matthieu

The Python API does expose the Fix-it Hints currently. However, there are no Fix-it Hints for some warnings that I wish to correct. You can make the argument that Clang should provide Fix-it Hints for more warnings. But, there are plenty of scenarios where there are multiple remedies to a warning. Which one do you suggest? Do you emit multiple hints? How do you control the application of these? Maybe your remedy for an unused function parameter is to remove the parameter itself and change the function signature (a valid choice in some scenarios). Does Clang then do the refactoring of callers for you? Probably not: Clang is a compiler, not an IDE. But, where do you draw the line? Providing more Fix-it Hints is a deep rabbit hole. And, there will always be areas that Clang doesn’t want to tread, necessitating the existence of tools.

This proposal aims to empower those tools with knowledge they don’t have today. And, I think there are use cases outside of my tool. A sign of a good API is it enables usage beyond what you had originally conceived. Exposing diagnostic enumerations would enable such usage for anyone dealing with diagnostics.

Ah indeed, you might wish to provide corrections beyond what Clang does (and can) propose.

Indeed one of the driving philosophy is that Clang should only provide Fix Hints when there is a single possible correction, lest it broke the code further.

I was just trying to made sure you had not overlooked this existing facility.

– Matthieu

But that's technically impossible; there's never a single possible
correction.

I agree - just because there is one *obvious* correction does not mean that
it is the correct correction or what the user intended.

The rules are there to protect -fixit behavior - a fixit hint on an
error or warning cannot change the semantics of the program (imho,
though it'd be substantially more expensive to implement - it probably
should even produce identical ASTs, in fact, to be sure of exact error
recovery. Of course that's harder for warnings since they'd have to
conditionally modify the AST only if the warning was currently an
error) because they may be automatically applied by clang -fixit.

So the only fixit a warning should have is the fixit necessary to
suppress the warning without changing the program's behavior. Any
fixits that might address a mistake made by the user must be presented
in notes.

- David

What are the semantics of a program that doesn't compile?

The relevant rule for fixits on warnings and errors is that the AST must
reflect the syntax of the fixed code. For warnings, this means that the fixit
must not change semantics. For errors, it means that if all errors have
fixits, we can (hopefully reliably) assume that all (error and optionally
warning) fixits can be simultaneously applied and all errors will be fixed. In
principle, we could even go on to produce object code in the same compilation
pass if all errors have fixits -- but sadly many of the fixits predate this
rule and do not yet recover correctly.

Fixits on notes don't have these restrictions, since they're not automatically
applied by -fixit.

- Richard

The fixit conversation is interesting. But, I don't think I ever got a
response to the proposal part of this email. Anybody care to weigh in on
that?