I've been pondering on diagnostics a bit lately, and reviewing one of Doug's patches got me to thinking about this more intently. Here are three problems I've noticed with Clang's current diagnostics:
1) We create a ton of std::strings when emitting Diagnostics. We do this any time a substitution happens (e.g. %1) like when printing integers (utostr), types, identifiers, and soon DeclarationNames. This generally doesn't matter, as the actual I/O and other work to print out the diagnostic dwarfs the cost of doing the string manipulations. However, there are plenty of diagnostics for warnings and extensions that end up *not* being printed, and we end up doing a ton of string manipulation (yay heap traffic) to create strings that are ultimately never printed. I believe, but haven't proven, that this also makes the code for "Diag" call sites large (in terms of code size), as they have to make lots of temporary strings and manage their lifetime.
2) There are cases when we want to format some things more richly than we're already doing. One simple example of this is with types. For example, I've seen diagnostics complaining that "there is some problem with type 'foo'". The problem here is that 'foo' is a typedef, and I have no idea what it is a typedef of - this is the amusing inverse of the problem most compilers have where they don't track typedefs at all. In an IDE, we'd like to be able to have more rich information about the 'foo' in the diagnostic, potentially allowing the user to click through it to dive into the structure of the type (e.g. click once and 'foo' turns into 'bar*', click again and it turns into "void (*)()"). In a CLI, there might be an -funravel-types=2 option or something. There is obviously no way to do this if the type is turned into a string before the DiagnosticClient gets it.
You could imagine doing this for things like template instantiation issues etc as well of course.
3) Poor Doug is having to write code like this:
if (NumParams == 1)
DK = diag::err_operator_overload_must_be_binary;
else
DK = diag::err_operator_overload_must_be_binary_plural;
The difference is a single 's':
DIAG(err_operator_overload_must_be_binary, ERROR,
"overloaded operator '%0' must be a binary operator (has %1 parameter)")
DIAG(err_operator_overload_must_be_binary_plural, ERROR,
"overloaded operator '%0' must be a binary operator (has %1 parameters)")
and:
diag::kind DK;
if (Op == OO_PlusPlus)
DK = diag::err_operator_overload_post_inc_must_be_int;
else
DK = diag::err_operator_overload_post_dec_must_be_int;
The difference is increment vs decrement:
DIAG(err_operator_overload_post_inc_must_be_int_member, ERROR,
"parameter of overloaded post-increment operator must have type 'int' (not '%0')")
DIAG(err_operator_overload_post_dec_must_be_int_member, ERROR,
"parameter of overloaded post-decrement operator must have type 'int' (not '%0')")
Having tons of variants of diagnostics like this is currently necessary for localization purposes, but it is ugly and painful to write.
I think it is time to stop the madness. Currently, diagnostics are passed down into DiagnosticClient with an array of strings that are referenced by substitutions, and turned into the final string by DiagnosticClient::FormatDiagnostic. I think it would be straight-forward to change this to be an array of void*, where (at the beginning) all of these are pointers to std::string. Once that was done, we can start introducing some more rich formatting characters like "%T0" instead of "%0" and making the corresponding pointers be to richer thing than strings. For example, "%T0" format the 0th element of the array as a QualType. We could also introduce things like %u0 (unsigned int) and %i0 (signed int).
Doing this would directly solve #1 above, because 1) diagnostics that are squelched never make it to FormatDiagnostic and 2) those that are could be printed into one big SmallString. This would also solve #2, because FormatDiagnostic would be given the QualType, so it could do any crazy AST-level groveling that it wants to format it based on what the DiagClient implementation wants. This also is a big step to solving #3, because it would mean that we could write things like:
DIAG(err_operator_overload_must_be_binary, ERROR,
"overloaded '%0' must be a binary operator (has %u1 parameter%up1)")
Where %u1 prints the unsigned, and %up1 (p=plural, or pick a better letter) prints a "s" if the unsigned is != 1, or nothing if it is.
Does this seem like a reasonable thing to do? Does anyone volunteer to start tackling it?
-Chris