A few years ago, I worked to apply __attribute__((format))
to non-variadic functions, plugging the most significant hole that was left at the time in the -Wformat-nonliteral story. I’m coming back today to get some feedback on improving format checking to make -Wformat-nonliteral yet more useful: accepting format strings given types that are not represented in the function signature.
-Wformat-nonliteral warns when Clang is unable to infer the format specifiers used in a format string. For instance:
func myprintf(const char *str, ...) {
va_list ap; va_start(ap, str);
vprintf(str, ap); // warning: format string is not a string literal
va_end(ap);
}
We warn in this instance because we don’t know how to guarantee that str
is a valid format string for the argument list. This specific instance is resolved by annotating myprintf
:
__attribute__((format(printf, 1, 2)))
func myprintf(const char *str, ...) {
va_list ap; va_start(ap, str);
vprintf(str, ap); // no diagnostic
va_end(ap);
}
Since call sites of myprintf
will warn for incorrect format strings, the implementation of myprintf
does not introduce format string unsafety and the diagnostic is not emitted.
-Wformat-nonliteral is (was?) Clang community’s pet example of a diagnostic that wouldn’t be accepted by today’s standards. This is disappointing because format string problems are security problems, continue to surface every now and then, and they are preventable. A few years ago, I surveyed the reasons -Wformat-nonliteral triggers in our code bases and found the following categories, roughly in order of most prevalent to least prevalent:
- a format function is missing
__attribute__((format))
- the format string is localized
- the format string is unknown at the point of passing it, but it could be verified
- the format string formats formal arguments instead of variable arguments (for instance,
void error(const char *fmt, const char *file, int line)
) - the format string is constructed at runtime to specialized precision fields
- the format string is a constant but it is held in a variable
These are now all addressable:
- add
__attribute__((format))
- use
__attribute__((format_arg))
- verify the format string with
fmtcheck
(or your format string flavor’s equivalent), which can be implemented with__attribute__((format_arg))
- as of clang-15,
__attribute__((format))
applies to functions without variadic arguments or fall back to solution #3 - use variable precision flags such as
printf("%*.*f", leading, decimals, value)
or fall back to solution #3 const
ify the variable or fall back to solution #3
(Note that although I’m only listing the busy work, as a result of these changes, we often find format string problems.)
As you see, solution #3 can carry a lot of water. There’s one use case that it can cover but that I think we can have a better solution for: when a function accepts a format string with fixed arguments, but those arguments are not passed to the function. For instance:
const char *firstname;
double temperature;
void print_temperature(const char *format) {
printf(format, firstname, temperature);
}
This can be addressed today by using printf(fmtcheck(format, "Hi %s, it's %g degrees outside"), temperature)
, but it defers a preventable error to runtime.
I would like to propose a new attribute, __attribute__((format_like))
, to check at compile-time that a format string is appropriate for a list of argument types. The syntax would be like __attribute__((format))
's, except that instead of indicating the first format argument, you would write out a format string that the argument should be compatible with:
__attribute__((format_like(
flavor,
format_string_param_index,
format_string_literal)))
Used on print_temperature
, it would look like this:
__attribute__((format_like(printf, 1, "%s %g")))
void print_temperature(const char *format);
With -Wformat, clang will then verify that callers of print_temperature
pass a format string with specifiers compatible with “%s %g”. Inside the body of print_temperature
, format
will be assumed to have %s %g
parameters for the purposes of verification.
As an alternative to a format string literal, I thought of accepting a list of argument types. This makes it potentially more useful in the context of C++ callers that might want to create that list by template expansion. On the other hand, I don’t believe that this is supported at all today, and if this is the only motivation, it will never happen. Additionally, it creates substantially longer attribute spellings in the regular case, and it causes problems with -Wformat that assumes there’s a source location for the specifier to point to and propose fixits for. It’s also more likely to be something that other compilers can support without massively going out of their way.
The bar to new attributes in clang is relatively low, but the area of format attributes in particular is already fairly crowded, so I wanted to preflight this with an RFC. In particular, there is partial overlap between __attribute__((format))
used on non-variadic functions and __attribute__((format_like))
. For instance, although they have slightly different failure modes for code that doesn’t work, they are equivalent in code that does work:
__attribute__((format(printf, 1, 2)))
void print_windspeed(const char *fmt, const char *firstname, double speed);
__attribute__((format_like(printf, 1, "%s %g")))
void print_windspeed(const char *fmt, const char *firstname, double speed);
However, there are still cases that can only realistically be served by format
applying to non-variadic functions, such as C++ variadic template functions:
template<typename... Args>
__attribute__((format(printf, 1, 2)))
void print(const char *fmt, Args ...args);
In the end, I don’t think that it’s a problem that in some cases, you can get away with either. Also, GCC (as of 14.2) still does not accept the format
attribute on non-variadic functions and makes it an error; folks that could get away with either could find it easier to work with format_like
since you can either test for its existence with __has_attribute(format_like)
or let GCC ignore it as unknown.
Thoughts?
Clang consensus called in this message.