vprintf(3) and "format string is not a string literal"

I don't understand the following warning:

$ cat a.c
#include <stdarg.h>
#include <stdio.h>

int logmessage(int loglevel, char const *fmt, ...) {
     int ret = 0;
     va_list ap;

     if (loglevel > 1) {
         va_start(ap, fmt);
         ret = vprintf(fmt, ap);
         va_end(ap);
     }
     return ret;
}
$ clang -std=c99 a.c
a.c:10:23: warning: format string is not a string literal (potentially insecure)
         ret = vprintf(fmt, ap);
               ~~~~~~~ ^
1 diagnostic generated.

This seems counter-intuitive to the point of the vprintf(3) API, which is to pass the format string and arguments from its caller (logmessage()) in this case. When would vprintf(3) ever realistically be called with a string literal? There seems to be test cases and explicit code for this, so I'm guessing this is intentional, but I don't quite understand why...

Shantonu Sen
ssen@apple.com

Hi Shantonu,

We probably should make these kinds of warnings optional. In case you are not aware, format-string attacks are an extremely serious and often exploited attack vector for a hacker to compromise a system, and have been the source of much grief. Here are a couple references on the issue:

http://citeseer.ist.psu.edu/lhee02buffer.html (see the PDF link)

Essentially the idea is that if the format string is a value whose contents can be dynamically controlled at runtime (and not statically specified via a string literal), if a hacker can control the contents of the format string they can induce a buffer overrun, inject shellcode into a process, and hijack control of the process by jumping to that shellcode. The can happen because certain format specifiers actually allow you to write data back to a buffer (which is extremely dangerous in any context). Consequently, casual use of format strings specified via a function variable are generally discouraged by security experts, and this is the motivation behind this warning. The warning, however, doesn't always indicate an actually security hole or exploit; it just warns of a "deprecated" API.

Clearly the logmessage function in your example code snippet is a possible exception because it serves as a wrapper to vprintf (so this warning could be considered a false positive). If the format string checker was more powerful (i.e., it performed an inter-procedural program analysis), it should similarly flag casual uses of the logmessage function where the format string is not specified with a string constant.

Your question specifically regarding vprintf is an interesting one. Clearly vprintf was designed with the idea of passing in arguments from some other source, be it via parameters from a caller of the current function or pulled from a data structure. Depending on how careful you are, this can be okay (such as with the logmessage function), but in general it's just a bad idea. The va_list argument to vprintf (and friends) allows a tremendous amount of flexibility in how these functions are used (specifically be a dynamically specified argument list almost always implies a dynamically specified format string); the horribly consequence is that this code is very difficult to check statically for correctness, and can be the source of awful security holes and other bugs when these functions aren't used properly. Unfortunately, people often underestimate how easy it is to screw up how these functions are used, either when they are called directly or called indirectly via wrappers.

Ultimately, we should probably make warnings like these an option. People can then decide their own policy on when such warnings are emitted.

Ted

Hi Shantonu,

We probably should make these kinds of warnings optional.

See below

In case you are not aware, format-string attacks are an extremely serious and often exploited attack vector for a hacker to compromise a system, and have been the source of much grief. Clearly the logmessage function in your example code snippet is a possible exception because it serves as a wrapper to vprintf (so this warning could be considered a false positive). If the format string checker was more powerful (i.e., it performed an inter-procedural program analysis), it should similarly flag casual uses of the logmessage function where the format string is not specified with a string constant.

There would still be cases where you register a function like logmessage() with a system library API as a callback, so LLVM would not know either at compile time or runtime what the callers are.

Your question specifically regarding vprintf is an interesting one. Clearly vprintf was designed with the idea of passing in arguments from some other source, be it via parameters from a caller of the current function or pulled from a data structure. Depending on how careful you are, this can be okay (such as with the logmessage function), but in general it's just a bad idea. The va_list argument to vprintf (and friends) allows a tremendous amount of flexibility in how these functions are used (specifically be a dynamically specified argument list almost always implies a dynamically specified format string); the horribly consequence is that this code is very difficult to check statically for correctness, and can be the source of awful security holes and other bugs when these functions aren't used properly. Unfortunately, people often underestimate how easy it is to screw up how these functions are used, either when they are called directly or called indirectly via wrappers.

I think what I'd like the eventual behavior to be is:
1) Functions that call printf(3) directly should have the warning enabled
2) Functions that call vprintf(3) and all callers are known, enable the warning
3) If the function calling vprintf(3) is itself annotated with something like GCC's __attribute__((__format__ ... )), make it the responsibility of the second-order caller to verify arguments
4) Otherwise disable it.

For comparison, GCC has:
        -Wformat-nonliteral
            If -Wformat is specified, also warn if the format string is not a
            string literal and so cannot be checked, unless the format function
            takes its format arguments as a "va_list".

Ultimately, we should probably make warnings like these an option. People can then decide their own policy on when such warnings are emitted.

How about this:
$ clang -std=c99 a.c
a.c:10:23: warning: format string is not a string literal (potentially insecure)
         ret = vprintf(fmt, ap);
               ~~~~~~~ ^
1 diagnostic generated.
$ clang -std=c99 a.c -Wno-format-nonliteral
$

clang.cpp.diff (1.08 KB)

In case you are not aware, format-string attacks are an extremely serious and often exploited attack vector for a hacker to compromise a system, and have been the source of much grief. Clearly the logmessage function in your example code snippet is a possible exception because it serves as a wrapper to vprintf (so this warning could be considered a false positive). If the format string checker was more powerful (i.e., it performed an inter-procedural program analysis), it should similarly flag casual uses of the logmessage function where the format string is not specified with a string constant.

There would still be cases where you register a function like logmessage() with a system library API as a callback, so LLVM would not know either at compile time or runtime what the callers are.

That's a good point. What you suggest essentially requires a whole program analysis, and the issue that comes to my mind is whether or not doing a whole program analysis to perform a simple check like this makes it less practical in the typical compile/debug cycle and more of a check you perform once and a while. Moreover, the issue of whether or not registering a function like logmessage in a callback is a good idea is ultimately up to the programmer (or whoever is setting the security policies for a code base). One option is that we could use an attribute to silence such warnings (especially if they are indeed false positives). This would require people to at least think about what they are doing before they silence the warning.

Your question specifically regarding vprintf is an interesting one. Clearly vprintf was designed with the idea of passing in arguments from some other source, be it via parameters from a caller of the current function or pulled from a data structure. Depending on how careful you are, this can be okay (such as with the logmessage function), but in general it's just a bad idea. The va_list argument to vprintf (and friends) allows a tremendous amount of flexibility in how these functions are used (specifically be a dynamically specified argument list almost always implies a dynamically specified format string); the horribly consequence is that this code is very difficult to check statically for correctness, and can be the source of awful security holes and other bugs when these functions aren't used properly. Unfortunately, people often underestimate how easy it is to screw up how these functions are used, either when they are called directly or called indirectly via wrappers.

I think what I'd like the eventual behavior to be is:
1) Functions that call printf(3) directly should have the warning enabled

Sounds good.

2) Functions that call vprintf(3) and all callers are known, enable the warning

Again this ties in with my previous point. Knowing all callers requires a whole-program analysis, and defeats the notion of making this a quick check. Building an accurate callgraph statically is not always possible, so there is always a chance of emitting a false positive. Further, once you are doing whole-program analysis you can do an entirely different kind of checking; e.g. actually track tainted data to see if they are ever used as a format string (this applies to both a static and dynamic analysis of the program). The other problem is that often programs are "open"; e.g. what policy should we set when analyzing library code?

One other suggestion: functions like vprintf are not used frequently. Perhaps requiring the user to add an attribute to silence a warning (see below) is the simplest solution.

3) If the function calling vprintf(3) is itself annotated with something like GCC's __attribute__((__format__ ... )), make it the responsibility of the second-order caller to verify arguments

Again, this requires a whole-program analysis to relax the constraints to the second-order caller. It's almost better to add an attribute to the first-order caller of vprintf. Then the local checker would see that vprintf is called by such a function (and not emit a warning for the direct call to vprintf), and when checking the second-order callers they can consult the attribute of the wrapper function (assuming the attribute is in the header file). This kind of attribute/annotation trick has been used before in other systems to bubble up through an API where the actual checking takes place.

Just to be clear, I have no aversion against doing a whole-program analysis, and if this is what makes the most amount of sense to do this kind of checking the "right way" then that's what should be done. As I have already mentioned, doing a whole-program analysis is a different beast altogether, and would complicate issuing warnings.

4) Otherwise disable it.

For comparison, GCC has:
      -Wformat-nonliteral
          If -Wformat is specified, also warn if the format string is not a
          string literal and so cannot be checked, unless the format function
          takes its format arguments as a "va_list".

Understood. The issue is whether or not GCC has the best policy. I think some of the ideas we have tossed around in this thread point to some better solutions in how to handle format functions that take its formal argument as a va_list.

Ultimately, we should probably make warnings like these an option. People can then decide their own policy on when such warnings are emitted.

How about this:
$ clang -std=c99 a.c
a.c:10:23: warning: format string is not a string literal (potentially insecure)
       ret = vprintf(fmt, ap);
             ~~~~~~~ ^
1 diagnostic generated.
$ clang -std=c99 a.c -Wno-format-nonliteral
$

Sounds reasonable. I'll look at the patch today. Thanks!

Ted

This looks great to me. I'll go and apply the patch.