Llvm::format is dangerous

efriedma-quic · July 12, 2023, 7:41pm

Recently, while implementing a new optimization, we ran into undefined behavior in LLVM code: some code was passing a 64-bit integer to a 32-bit format specifier in llvm::format. (See ⚙ D155093 [DWARF] Fix undefined behaviour in dwarf type printer .) printf-style format strings are generally sort of dangerous, but the danger is normally mitigated by -Wformat. However, llvm::format is not checked by -Wformat, so it’s actually more dangerous than a normal snprintf.

If you do actually enable format warnings for llvm::format, the compiler produces a bunch of warnings. Most of them looked like they weren’t actually undefined behavior, but still, a lot of mistakes.

I see the following paths forward to mitigate the issues here:

We use the functionality in new versions of clang to allow marking llvm::format with a “format” attribute even though it’s not varargs (⚙ D112579 Allow non-variadic functions to be attributed with `__attribute__((format))`). Only people using clang 16 or newer as a host compiler would get the checks, but that would at least give us some coverage on buildbots.
Make llvm::format varargs, at the cost of some efficiency, which would allow using a “format” attribute on all compilers that support such attributes. (Essentially making it equivalent to asprintf.)
Discourage use of llvm::format, in favor of other formatting functions like llvm::formatv and llvm::format_hex.

Suggestions welcome.

tschuett · July 12, 2023, 7:50pm

I am not going to implement it, but Clang depends on LLVM. Why cannot Clang Sema learn what llvm::format is and check it accordingly. Without these strange workarounds using attributes.

jyknight · July 12, 2023, 8:58pm

Huh. llvm::format seems like a weird combination of C and C++ idioms. In C++, we can actually propagate the type information in a way such that a proper implementation of llvm::format wouldn’t require users to worry about specifying size-modifiers – or even types – in the format string. (See also, absl::StrFormat which is designed to be basically a drop-in replacement for printf, but using the C++ type-system to avoid having any format-string-mismatch issues.)

smeenai · July 13, 2023, 1:27am

I imagine we’re still a while away from being able to upgrade LLVM to C++20, but at that point, could we just switch to std::format? It also gets us compile-time format string checking while having a similar-ish API to llvm:format (at first glance, at least).

hansw2000 · July 13, 2023, 9:21am

This sounds like a good starting point at least. It looks like @fay59’s ⚙ D132413 [NFC] Make format() more amenable to format attributes set us up for this already.

unterumarmung · July 13, 2023, 9:43am

Another possible solution is adopting GitHub - fmtlib/fmt: A modern formatting library as a backbone for the llvm::format. But I guess it’s rather impossible.

fay59 · July 13, 2023, 4:04pm

My original plan was to solve all format issues in LLVM, but buildbots ran into issues that I couldn’t reproduce myself and I put the work aside. I do suspect that my change, if it landed, would have solved @efriedma-quic’s problem, as the list of affected modules does include llvm/DebugInfo.

kuhar · July 13, 2023, 5:25pm

Is this roughly what llvm:formatv does? Are there some implementation details / properties of fmt that we would want to potentially incorporate into formatv?

unterumarmung · July 13, 2023, 6:12pm

To be honest, I’m not sure
This library is resulted in std::format and std::print capabilities in C++20 and C++23 respectively. By default it has a python-like syntax for the formatting. But in the library there are printf-like functions also.
Additionally, there is a compile-time checking for the arguments to match the format string and the author claims it is the fastest formatting library out there, even faster than the printf.

efriedma-quic · July 13, 2023, 6:18pm

llvm::formatv format strings are similar to std::format. I think the std::format APIs are less efficient writing to a raw_ostream, though. (Switching from OS << llvm::formatv("{}", x); to OS << std::format("{}", x); means allocating memory for an std::string.)

Either way, though, the format strings supported by those APIs are very different from printf format strings, so porting from llvm::format to llvm::formatv/std::fomat is non-trivial. And there are currently around 600 uses of llvm::format in tree.

unterumarmung · July 13, 2023, 6:27pm

Actually, there is a clang-tidy check that coverts printf-like functions to std::print: modernize-use-std-print. So I guess it can be applied to llvm::format → std::format with some tweaks.

jwakely · July 18, 2023, 1:13pm

Only if the result doesn’t fit in the string’s internal buffer, which is 22 chars for libc++ and 15 chars for libstdc++. From a quick grep, it looks like many uses of llvm::format are for short strings that would not allocate.

mehdi_amini · July 18, 2023, 8:18pm

What’s the difference between std::format and llvm::formatv (other than the std::string conversion)?

It’s not clear to me how switching from llvm::formatv("{}", x); to OS << std::format("{}", x); brings any safety?

tschuett · July 18, 2023, 8:23pm

I am happy to read your RFC to update the toolchain requirements to C++20. std::format is C++20

danilaml · July 20, 2023, 6:40am

I don’t think anyone is proposing to move to std::format (and std::print when we allow C++23) immediately. Jus that it should be easier to replace all llvm::formatv uses with std ones once we do, so it should be favoured (in addition to be less dangerous).

MaskRay · February 2, 2024, 9:22pm

@bulbazord has a PR to add a llvm::formatv variant of createStringError: [Support] Introduce formatv variant of createStringError by bulbazord · Pull Request #80493 · llvm/llvm-project · GitHub

It looks good to me. Do folks have opinions?

Since the signature inline Error createStringError(std::error_code EC, char const *Fmt, const Ts &... Vals) has been taken by the dangerous llvm::format, I think using a different name is a correct move.

If we able to remove const Ts &... Vals from the signature, it’d be nice as well.

Topic		Replies	Views
Using `__attribute__((format))` on the `format` function LLVM Project	1	639	November 9, 2022
warning for passing NULL as format arg Clang Frontend	4	79	April 10, 2019
Relaxing format specifier checks Clang Frontend	29	104	May 22, 2018
proposed patch: detecting non-constant format strings in printf calls Clang Frontend	1	81	August 10, 2007
Overzealousness of of -Wformat causes problems for LLDB, others Clang Frontend	1	82	May 21, 2012

Llvm::format is dangerous

Related Topics