In particular, if all you want to do is support __attribute__((format(printf, x, y))) on function parameters that happen to be of type char8_t*, char16_t*, or char32_t*, that should be trivial. Just look at how Clang works for arguments of type wchar_t* and copy that.
…Oh wait, it looks like neither GCC nor Clang actually implements format-string checking for wchar_t format strings!
https://godbolt.org/z/Tk9YCA
std::wprintf(L"%s", 42); // no diagnostic emitted
So that would be a very good place to start, IMO. Once the code is in place to format-check wide string literals, it should be trivial to extend it to also format-check char{8,16,32}_t literals.
Here’s the existing bug report: https://bugs.llvm.org/show_bug.cgi?id=16810
Orthogonally, you seem to be proposing that there should be some new printf format specifiers besides %s %c %[ (for char) and %ls %lc %l[ (for wchar_t). This is not a Clang issue; this is a library-design issue that you should think about as you write your library function that takes a format string (you know, the one you want to apply __attribute__((format)) to). If you are not writing a library function, then you have nothing to apply the attribute to, and therefore there’s no reason for you to need anything changed.
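For concreteness, here is a minimal sketch of the kind of library function the attribute applies to. The names MY_PRINTF_FORMAT and my_format are hypothetical, made up for illustration; the sketch assumes GCC or Clang and an ordinary char format string, which is the case both compilers check today.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical macro: expands to the format attribute on GCC/Clang only.
#if defined(__GNUC__) || defined(__clang__)
#define MY_PRINTF_FORMAT(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define MY_PRINTF_FORMAT(fmt_index, first_arg)
#endif

// Hypothetical printf-like function. Parameter 1 is the format string and
// the variadic arguments to check start at parameter 2.
std::string my_format(const char* fmt, ...) MY_PRINTF_FORMAT(1, 2);

std::string my_format(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    va_list args_copy;
    va_copy(args_copy, args);

    // First pass: measure how many characters the result needs.
    int needed = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);

    std::string out;
    if (needed > 0) {
        // Second pass: format into a buffer with room for the terminator.
        out.resize(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(&out[0], out.size(), fmt, args_copy);
        out.resize(static_cast<std::size_t>(needed));
    }
    va_end(args_copy);
    return out;
}
```

With the attribute in place, a call such as my_format("%d items", 3) is accepted quietly, while a mismatched call such as my_format("%s", 42) should draw a -Wformat diagnostic at the call site.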
You throw out the ideas of %us for char16_t, %Us for char32_t, and have no suggestion for char8_t. However, you cannot use %us as a format specifier, because printf already gives that sequence a valid meaning:
printf("hello %us world", 42u); // prints "hello 42s world"
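A small sketch that captures the formatted text makes the parse easy to verify. The helper name hello_us is hypothetical; it only wraps std::snprintf so the result can be inspected as a string.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats the same string as the printf call above,
// but into a buffer so the result can be compared.
std::string hello_us() {
    char buf[64];
    // "%u" consumes the unsigned argument; the 's' that follows is
    // ordinary literal text, not part of the conversion specifier.
    std::snprintf(buf, sizeof buf, "hello %us world", 42u);
    return buf;
}
```

Because "%u" is already a complete conversion, any new specifier spelled "%us" would change the meaning of existing, valid format strings.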
My off-the-top-of-my-head idea is that you should take a hint from MSVC; they provide %I32d, %I64d, etc., for integer types, so how about %C8s, %C16s, and %C32s for Unicode character string types? However, again, this is an issue to think about as you design your MyPrintfLikeFunction within your own codebase. Maybe you’ll find that you don’t even need a format specifier for those types.
(FWIW, the C and C++ party line seems to be that no “%C16s” or “%C32s” is needed, because the modern approach is to separate transcoding from output. You shouldn’t be printf’ing Unicode strings directly; you should be first transcoding them into char or wchar_t strings, and then printf’ing or wprintf’ing those strings. Personally I don’t think that approach is very helpful in practice, though.)
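To make the transcode-then-print approach concrete, here is a minimal sketch for char16_t input. The helpers utf16_to_utf8 and print_utf16 are hypothetical, not standard library functions, and the sketch assumes the output encoding is UTF-8. Replacing unpaired surrogates with U+FFFD is just one possible policy.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: transcodes UTF-16 to UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with a following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate with no preceding high surrogate
        }

        // Encode the code point as one to four UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Transcode first, then hand an ordinary char string to printf's "%s".
int print_utf16(const std::u16string& s) {
    return std::printf("%s\n", utf16_to_utf8(s).c_str());
}
```

The cost of this style is visible in the sketch: every call site that prints a char16_t string pays for a temporary std::string, which is part of why a dedicated format specifier can look attractive.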
–Arthur