Hi all,
LLVM’s formatv() library for type safe formatting currently allows the number of “replacement parameters” in the format string to mismatch with the actual number of arguments provided to formatv(). If the format string references an index that is not provided, that replacement is skipped, and if the number of arguments > number of replacement parameters in the format string, the extra arguments are ignored. When formatv() is used with a literal format string, these issues are easy to fix, whereas with a dynamic format string, it may not be as simple.
Towards that, I have prototyped a change to split formatv() into 2 variants. Current formatv() will be repurposed to only work with literal strings, and a new formatvv() variant will be added to support “variable” format strings. Within the llvm repo, I had to change ~80 instances for formatv() to formatvv() to get things building, and in several MLIR files modify the literal format string from const char *fmt= to const char fmt[] = to that the code continues works with formatv(). It also identified a couple of bad uses in MLIR emitters, which are fixed as well in the change.
I realize that this particular check should/could be enabled in both cases, but as I said above, fixing dynamic format string cases less straightforward. However, if we can distinguish between literal vs non-literal format strings (with ~97% uses in LLVM being literal) we could further extend formatv() to do this check at compile time. And in addition, other checks as well (like rejecting format strings with holes: “{0}{2}”) and may be the other contents of the format string (escaped {{). So, it seems having literal vs non-literal variants might be advantageous.
I wanted to see if this is something to pursue (starting with the PR to split formatv into 2). Even that PR is large and can be split/staged into 3:
- NFC changes that will be compatible with current formatv()
- Split formatv into 2 variants, but with no error checking.
- Add argument count error checking (runtime) to formatv
And then we can extend this with:
-
[Optional] Runtime argument error checking for formatvv() or decide if we want to keep the flexibility of formatvv() as is. Or we could add a formatvv() variant that is unchecked formatvv_unchecked() assuming it will be used sparingly, so the long and explicit name is ok.
-
Add compile time argument and format string checking for formatv (replacement indexes should have no holes and then arg count == number of replacements). I am assuming this involves implementing a compile time constexpr parser for format strings, which should be possible?
Please let me know your thoughts.
Thanks,
Rahul