RFC: General purpose type-safe formatting library

A while back llvm::format() was introduced that made it possible to combine printf-style formatting with llvm streams. However, this still comes with all the risks and pitfalls of printf. Everyone is no-doubt familiar with these problems, but here are just a few anyway:

  1. Not type-safe. Not all compilers warn when you mess up the format specifier. And when you’re writing your own printf-like functions, you need to tag them with __attribute__((format(printf, ...))), which again not all compilers have. If you change a const char * to a StringRef, it can silently succeed while passing your StringRef object to printf. It should fail to compile!

  2. Not security safe. Functions like sprintf() will happily smash your stack for you if you’re not careful.

  3. Not portable (well kinda). Quick, how do you print a size_t? You probably said %z. Well MSVC didn’t even support %z until 2015, which we aren’t even officially requiring yet. So you’ve gotta write (uint64_t)x and then use PRIx64. Ugh.

  4. Redundant. If you’re giving it an integer, why do you need to specify %d? It’s an integer! We should be able to use the type system to our advantage.

  5. Not flexible. How do you print a std::chrono::time_point with llvm::format()? You can’t. You have to resort to providing an overloaded streaming operator or formatting it some other way.

So I’ve been working on a library that will solve all of these problems and more.

The high level design of my library is borrowed heavily from C#. But if you’re not familiar with C#, I believe boost has something similar in spirit. The best way to show it off is with some examples:

  1. os << format_string("Test"); // writes "Test"
  2. os << format_string("{0}", 7); // writes "7"

Immediately we can see one big difference between this and llvm::format() / printf. You don’t have to specify the type. If you pass in an int, it formats it as an int.

  3. os << format_string("{0} {0}", 7); // writes "7 7"

#3 is an example of something that cannot be done elegantly with printf. Sure, you can pass it in twice, but if it’s expensive to compute, this means you have to save it into a temporary.
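To make the index reuse concrete, here is a minimal standalone sketch of positional substitution. It is not the proposed implementation: `format_positional` and `render_value` are hypothetical names, it uses `std::ostringstream` in place of `raw_ostream`, and it only understands plain `{N}` fields with no alignment or options.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper for illustration; this is not the proposed LLVM API.
template <typename T> std::string render_value(const T &V) {
  std::ostringstream OS;
  OS << V;
  return OS.str();
}

template <typename... Ts>
std::string format_positional(const std::string &Fmt, const Ts &...Args) {
  // Each argument is rendered exactly once, however often it is referenced.
  const std::vector<std::string> Rendered{render_value(Args)...};
  std::string Out;
  for (std::size_t I = 0; I < Fmt.size(); ++I) {
    if (Fmt[I] != '{') {
      Out += Fmt[I];
      continue;
    }
    const std::size_t Close = Fmt.find('}', I);
    if (Close == std::string::npos) {
      Out.append(Fmt, I, std::string::npos); // unterminated field: copy the rest
      break;
    }
    const std::string Field = Fmt.substr(I + 1, Close - I - 1);
    const bool IsIndex =
        !Field.empty() && Field.size() <= 4 &&
        Field.find_first_not_of("0123456789") == std::string::npos;
    if (IsIndex && std::stoul(Field) < Rendered.size())
      Out += Rendered[std::stoul(Field)];
    else
      Out.append(Fmt, I, Close - I + 1); // not a usable index: keep the text
    I = Close;
  }
  return Out;
}
```

With this sketch, `format_positional("{0} {0}", 7)` yields "7 7" while evaluating the argument expression only once at the call site.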

  4. os << format_string("{0:X}", 255); // writes "0xFF"
  5. os << format_string("{0:X7}", 255); // writes "0x000FF"
  6. os << format_string("{0}", foo_object); // fails to compile!

Here is another example of an improvement over traditional formatting mechanisms. If you pass an object for which it cannot find a formatter, it fails to compile.

However, you can always define custom formatters for your own types. If you write:

namespace llvm {
template <>
struct format_provider<Foo> {
  static void format(raw_ostream &S, const Foo &F, int Align, StringRef Options) {
    // Write F to S here, honoring Align and Options as appropriate.
  }
};
} // namespace llvm

Then #6 will magically compile, and invoke the function above to do the formatting. There are other ways to customize the formatting behavior, but I’ll keep going with some more examples:

  7. os << format_string("{0:N}", -1234567); // Writes "-1,234,567". Note the commas.
  8. os << format_string("{0:P}", 0.76); // Writes "76.00%"

You can also left justify and right justify. For example:

  9. os << format_string("{0,8:P}", 0.76); // Writes "  76.00%"
  10. os << format_string("{0,-8:P}", 0.76); // Writes "76.00%  "
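As a rough illustration of the alignment field, the sketch below pads an already-formatted string to a signed width: positive right-justifies, negative left-justifies. `apply_alignment` is a hypothetical name, and padding with spaces is an assumption based on the two examples above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: Align > 0 right-justifies, Align < 0 left-justifies,
// and text that is already wide enough is returned unchanged.
std::string apply_alignment(const std::string &Text, int Align) {
  const std::size_t Width = static_cast<std::size_t>(std::abs(Align));
  if (Text.size() >= Width)
    return Text;
  const std::string Pad(Width - Text.size(), ' ');
  return Align > 0 ? Pad + Text : Text + Pad;
}
```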

And you can also format complicated types. For example:

  11. os << format_string("{0:DD/MM/YYYY hh:mm:ss}", std::chrono::system_clock::now()); // writes "10/11/2016 18:19:11"

I already have a working proof of concept that supports most of the fundamental data types and formatting options such as percents, exponents, comma grouping, fixed point, hex, etc.

To summarize, the advantages of this approach are:

  1. Safe. If it can’t format your type, it won’t even compile.
  2. Concise. You can re-use parameters multiple times without re-specifying them.
  3. Simple. You don’t have to remember whether to use %llu or PRIx64 or %z, because format specifiers don’t exist!
  4. Flexible. You can format types in a multitude of different ways while still having the nice format-string style syntax.
  5. Extensible. If you don’t like the behavior of a built-in formatter, you can override it with your own. If you have your own type which you’d like to be able to format, you can add formatting support for it in multiple different ways.

I am hoping to have something ready for submitting later this week. If this interests you, please help me out by reviewing my patch! And if you think this would not be helpful for LLVM and I should not worry about this, let me know as well!

Thanks,
Zach

Hi,

A while back llvm::format() was introduced that made it possible to combine printf-style formatting with llvm streams. However, this still comes with all the risks and pitfalls of printf. Everyone is no-doubt familiar with these problems, but here are just a few anyway:

  1. Not type-safe. Not all compilers warn when you mess up the format specifier. And when you’re writing your own printf-like functions, you need to tag them with __attribute__((format(printf, ...))), which again not all compilers have.

I’m not very sensitive to the “not all compilers have” argument; however, it is worth mentioning that the format may not be a string literal, which defeats the “sanitizer”.

If you change a const char * to a StringRef, it can silently succeed while passing your StringRef object to printf. It should fail to compile!

llvm::format now fails to compile as well :)

However, this does not address other issues, like: format("%d", float_var)

  2. Not security safe. Functions like sprintf() will happily smash your stack for you if you’re not careful.

  3. Not portable (well kinda). Quick, how do you print a size_t? You probably said %z. Well MSVC didn’t even support %z until 2015, which we aren’t even officially requiring yet. So you’ve gotta write (uint64_t)x and then use PRIx64. Ugh.

  4. Redundant. If you’re giving it an integer, why do you need to specify %d? It’s an integer! We should be able to use the type system to our advantage.

  5. Not flexible. How do you print a std::chrono::time_point with llvm::format()? You can’t. You have to resort to providing an overloaded streaming operator or formatting it some other way.

It seems to me that there is no silver bullet for that: whether for llvm::format() or your new proposal, some sort of glue/helpers needs to be provided for each and every non-standard type.

So I’ve been working on a library that will solve all of these problems and more.

Great! I appreciate the effort. Talking about this with Duncan last week, he mentioned that we should do it :)

The high level design of my library is borrowed heavily from C#. But if you’re not familiar with C#, I believe boost has something similar in spirit. The best way to show it off is with some examples:

  1. os << format_string("Test"); // writes "Test"
  2. os << format_string("{0}", 7); // writes "7"

Immediately we can see one big difference between this and llvm::format() / printf. You don’t have to specify the type. If you pass in an int, it formats it as an int.

  3. os << format_string("{0} {0}", 7); // writes "7 7"

#3 is an example of something that cannot be done elegantly with printf. Sure, you can pass it in twice, but if it’s expensive to compute, this means you have to save it into a temporary.

What about: printf("%0$ %0$", 7);

  4. os << format_string("{0:X}", 255); // writes "0xFF"
  5. os << format_string("{0:X7}", 255); // writes "0x000FF"
  6. os << format_string("{0}", foo_object); // fails to compile!

Here is another example of an improvement over traditional formatting mechanisms. If you pass an object for which it cannot find a formatter, it fails to compile.

However, you can always define custom formatters for your own types. If you write:

namespace llvm {
template <>
struct format_provider<Foo> {
  static void format(raw_ostream &S, const Foo &F, int Align, StringRef Options) {
    // Write F to S here, honoring Align and Options as appropriate.
  }
};
} // namespace llvm

Then #6 will magically compile, and invoke the function above to do the formatting. There are other ways to customize the formatting behavior, but I’ll keep going with some more examples:

  7. os << format_string("{0:N}", -1234567); // Writes "-1,234,567". Note the commas.

Why add commas? Because of the “:N”?
This seems localization-dependent: how do you handle that?

What happens with the following?

os << format_string("{0:N}", -123.455);

  8. os << format_string("{0:P}", 0.76); // Writes "76.00%"

You can also left justify and right justify. For example:

  9. os << format_string("{0,8:P}", 0.76); // Writes "  76.00%"
  10. os << format_string("{0,-8:P}", 0.76); // Writes "76.00%  "

And you can also format complicated types. For example:

  11. os << format_string("{0:DD/MM/YYYY hh:mm:ss}", std::chrono::system_clock::now()); // writes "10/11/2016 18:19:11"

#11 looks pretty cool in terms of flexibility :)

I already have a working proof of concept that supports most of the fundamental data types and formatting options such as percents, exponents, comma grouping, fixed point, hex, etc.

To summarize, the advantages of this approach are:

  1. Safe. If it can’t format your type, it won’t even compile.
  2. Concise. You can re-use parameters multiple times without re-specifying them.
  3. Simple. You don’t have to remember whether to use %llu or PRIx64 or %z, because format specifiers don’t exist!
  4. Flexible. You can format types in a multitude of different ways while still having the nice format-string style syntax.
  5. Extensible. If you don’t like the behavior of a built-in formatter, you can override it with your own. If you have your own type which you’d like to be able to format, you can add formatting support for it in multiple different ways.

I am hoping to have something ready for submitting later this week. If this interests you, please help me out by reviewing my patch! And if you think this would not be helpful for LLVM and I should not worry about this, let me know as well!

Feel free to add me as a reviewer!

What about: printf("%0$ %0$", 7);

Well, umm… I didn’t even know about that. And I wonder how many others also don’t. How does it choose the type? It seems there is no d in there.

  7. os << format_string("{0:N}", -1234567); // Writes "-1,234,567". Note the commas.

Why add commas? Because of the “:N”?
This seems localization-dependent: how do you handle that?

Yes, it is localization dependent. That being said, llvm has 0 existing support for localization. We already print floating point numbers with decimals, messages in English, etc.

The purpose of this example was to illustrate that each formatter can have its own custom set of options. For the case of integral arithmetic types, those would be:

X : Uppercase hex
X- : Uppercase hex without the 0x prefix
x : Lowercase hex
x- : Lowercase hex without the 0x prefix
N : comma grouped digits
E : scientific notation with uppercase E
e : scientific notation with lowercase e
P : percent
F : fixed point

But for floating point types, a different set of format specifiers would be valid (for example, it doesn’t make sense to print a floating point number as hex).

If you wrote your own formatter (as described earlier in #6), the field following the : would be passed in as the Options parameter, and the implementation is free to use it however it wants. The std::chrono formatter takes strings similar to those described in #11, for example.
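Here is a small self-contained sketch of that provider pattern, under some loud assumptions: it lives in a made-up `sketch` namespace, uses `std::ostream` and `std::string` instead of `raw_ostream` and `StringRef`, drops the `Align` parameter, and the `Point` type, its "p" option, and `format_one` are all invented for illustration. The primary template is left undefined so that a type without a provider fails to compile.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace sketch {
// Only declared, never defined: using a type that has no specialization is a
// compile error (incomplete type) instead of a runtime failure.
template <typename T> struct format_provider;

struct Point {
  int X, Y;
};

template <> struct format_provider<Point> {
  // Options is whatever followed ':' in the replacement field. This provider
  // understands "p" (parenthesized) and treats anything else as the default.
  static void format(std::ostream &S, const Point &P,
                     const std::string &Options) {
    if (Options == "p")
      S << '(' << P.X << ", " << P.Y << ')';
    else
      S << P.X << ',' << P.Y;
  }
};

// Dispatches one value to its provider and returns the formatted text.
template <typename T>
std::string format_one(const T &V, const std::string &Options) {
  std::ostringstream OS;
  format_provider<T>::format(OS, V, Options);
  return OS.str();
}
} // namespace sketch

// sketch::format_one(3.5, "");  // would not compile: no format_provider<double>
```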

What happens with the following?

os << format_string("{0:N}", -123.455);

You would get “-123.46” (default precision of floating point types is 2 decimal places). If you had -1234.566 it would print “-1,234.57” (you could change the precision by specifying an integer after the N. So {0:N3} would print “-1,234.566”). For integral types the “precision” is the number of digits, so if it’s greater than the length of the number it would pad left with 0s. For floating point types it’s the number of decimal places, so it would pad right with 0s.

Of course, all these details are open for debate, that’s just my initial plan.
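For what it's worth, the N behavior for floating point values can be sketched in a few lines on top of snprintf. `format_grouped` is a hypothetical name, the default precision of 2 comes from the description above, and the exact rounding of borderline inputs (such as -123.455) depends on snprintf and on the value's binary representation.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: fixed-point text with comma-grouped integer digits.
// Very large magnitudes would be truncated by the fixed-size buffer.
std::string format_grouped(double Value, int Precision = 2) {
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), "%.*f", Precision, std::fabs(Value));
  std::string Digits(Buf);
  const std::size_t Dot = Digits.find('.');
  const std::size_t IntEnd = Dot == std::string::npos ? Digits.size() : Dot;
  // Walk left from the end of the integer part, inserting a comma before
  // each complete group of three digits.
  for (std::size_t I = IntEnd; I > 3; I -= 3)
    Digits.insert(I - 3, ",");
  return (std::signbit(Value) ? "-" : "") + Digits;
}
```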

I only half agree that there is no silver bullet here. For llvm::format() there is no glue or helpers that can fit into the existing model. It’s a wrapper around snprintf, so you get what snprintf gives you. You can go around llvm::format() and overload an operator to print your std::chrono::time_point, but there’s no way to integrate it into llvm::format. So with my proposed library you could write:

os << format_string("Start: {0}, End: {1}, Elapsed: {2:ms}", start, end, end - start);

Or you could write:

os << "Start: " << format_time_point(start) << ", End: " << format_time_point(end) << ", Elapsed: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();

What about: printf("%0$ %0$", 7);

Well, umm… I didn’t even know about that. And I wonder how many others also don’t. How does it choose the type? It seems there is no d in there.

Sorry, I meant printf("%0$d %0$d", 7);
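For reference, a small sketch of the POSIX positional-argument form. POSIX numbers arguments from 1, so the spelling that works is %1$d. This is an extension to ISO C, so availability depends on the platform's C library. The wrapper name is made up for the example.

```cpp
#include <cstdio>
#include <string>

// Hypothetical wrapper: prints the same int twice using POSIX "%n$" argument
// numbering (1-based). Not ISO C; requires a C library that supports it.
std::string repeat_with_posix_positions(int Value) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%1$d %1$d", Value);
  return Buf;
}
```

Note that the type still has to be spelled in every use of the argument, which is the redundancy the proposal is trying to remove.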

But for floating point types, a different set of format specifiers would be valid (for example, it doesn’t make sense to print a floating point number as hex)

Not sure if it is the best example: hexadecimal is the default format for printing float literals in the IR, I believe. But OK, I see how it works!

Ok, well another example would be if you pass a pointer. The only valid options are various flavors of hex. You wouldn’t want to print a pointer in scientific notation, for example.

Sure, I got the point, this is great! (I should have made it more clear earlier).

This is awesome. +1

Copying a time-tested design like C#'s (and which also Python uses) seems like a really sound approach.

Do you have any particular plans w.r.t. converting existing uses of the other formatting constructs? At the very least we can hopefully get rid of format_hex/format_hex_no_prefix since I don’t think there are too many uses of those functions.

Also, since the format string can already embed the surrounding literal strings, do you anticipate use cases where you would want to write OS << format_string(...) << ...something else...?
Would print(OS, "....", ....) make more sense?

– Sean Silva

I’m generally favorable on the core idea of having a type-safe and friendly format-string-like formatting utility. Somewhat minor comments below:

The high level design of my library is borrowed heavily from C#.

My only big hesitation here is that the substitution specifier seems heavily influenced by C#. I’d prefer to model this after a format string syntax folks are fairly familiar with. IMO, Python’s is probably the best bet here and has had a lot of hammering on it over the years. So I’d suggest that the pattern syntax be mapped to be as similar to Python’s as possible or at least built on top of it.

  1. os << format_string("Test"); // writes "Test"
  2. os << format_string("{0}", 7); // writes "7"

The “<< format_string(…” is … really verbose for me. It also makes me strongly feel like this produces a string rather than a streamable entity.

I’m not a huge fan of streaming, but if we want to go this route, I’d very much like to keep the syntax short and sweet. “format” is pretty great for that. If this is going to fully subsume its use cases, can we eventually get that to be the name?

(While I don’t like streaming, I’m not trying to fight that battle here…)

Also, you should probably look at what is quickly becoming a popular C++ library in this space: https://github.com/fmtlib/fmt

I’m also generally in favour, but I wonder what the key motivations are for designing our own, rather than importing something like FastFormat, fmtlib, or one of the other tried-and-tested C++ typesafe I/O libraries. Has someone done an analysis of why these designs are a bad fit for LLVM, or are we just reinventing the wheel because we feel like it?

David

(this keeps coming up in various contexts, so a somewhat longer/in-depth post than I originally intended. If folks want to discuss this further should probably fork a new thread)

Given the tendency of utilities like this to become used pervasively in the project, it would seem a fairly heavy weight dependency to grow.

I understand that LLVM’s refusal to depend on and re-use existing open source code is frustrating, I’m actually rather frustrated as well at times by the NIH-like pattern. But I think there are good reasons for LLVM to eschew third party libraries in its core utilities, not the least of which are the inherent licensing complications.

LLVM faces a somewhat unique challenge when it comes to licensing compared to most other open source software: parts of LLVM are embedded into the binaries we build. This makes finding a “compatibly licensed” existing project … unlikely. ;]

I don’t want to spin off on a debate here about which license LLVM should use or not use or what all it needs to say. We have a separate thread about that. But one hope I have of any resolution to that thread is that perhaps more open source projects will use exactly the same license. If they do, we might finally be able to have more reuse of existing open source code.

Either way, rolling our own has some advantages: LLVM may be able to make simplifying tradeoffs other libraries cannot realistically make due to narrower use cases and needs.

Provided we’re only talking about very low level utilities like this, the cost doesn’t seem terribly high to rolling our own, so I’m generally comfortable doing it.

Doesn’t mean we shouldn’t look at all the existing ones and learn everything we can from them.

-Chandler


> I'm generally favorable on the core idea of having a type-safe and friendly format-string-like formatting utility

I’m also generally in favour, but I wonder what the key motivations are for designing our own, rather than importing something like FastFormat, fmtlib, or one of the other tried-and-tested C++ typesafe I/O libraries. Has someone done an analysis of why these designs are a bad fit for LLVM, or are we just reinventing the wheel because we feel like it?

(this keeps coming up in various contexts, so a somewhat longer/in-depth post than I originally intended. If folks want to discuss this further should probably fork a new thread)

Given the tendency of utilities like this to become used pervasively in the project, it would seem a fairly heavy weight dependency to grow.

A reimplementation is likely to be no less complex than any of the originals. Both fmtlib and FastFormat are under BSD / MIT-style licenses and are both small enough that it would be possible to embed copies of either in the LLVM tree if eliminating a dependency were desired.

Even if the implementation is not usable, adopting similar interfaces to an existing C++ solution is likely to be more friendly to C++ developers than designing something based on C# or Python.

Either way, rolling our own has some advantages: LLVM may be able to make simplifying tradeoffs other libraries cannot realistically make due to narrower use cases and needs.

If that is the case, I would be totally in favour of rolling our own, but it seems that rolling our own was a decision made before investigating the alternatives.

Provided we're only talking about very low level utilities like this, the cost doesn't seem terribly high to rolling our own, so I'm generally comfortable doing it.

Doesn't mean we shouldn't look at all the existing ones and learn everything we can from them.

Completely agreed.

David

Sorry, by heavyweight I meant more that everything in LLVM would end up using it, and so any potential license incompatibility would be a serious issue.

And “BSD / MIT-style licenses” specifically don’t address a number of the issues raised in the licensing thread. I don’t want to try to rehash it here, but if we as a community think those issues are worth addressing, that precludes depending on existing code carrying these licenses.

As a specific issue: if this code ends up transitively used in runtime libraries, we would have binary attribution problems. So adding a dependency on code under some other license is, IMO, problematic from a very basic pragmatic perspective. It would move us back into having a weird partition through the LLVM project of some code that could go into runtimes but other code that could not go into runtimes. I don’t want to go back to that point.

2-clause BSD and MIT licenses (the relevant ones here) do address this. They are as permissive as the most permissive license used in LLVM (and far more permissive than the proposed new license) and carry no binary attribution clauses.

David

What would happen if I accidentally type "ps" instead of "ms" (I am
assuming we will not support picoseconds here)?

Will this abort at runtime?

I would prefer if *all* arguments to the format were checkable at compile time:
I.e. something like:
os << "blah blah" << format<std::milli>(end-start) << "blah blah";

I understand this may clash a bit with the desire for a compact
representation, but maybe with some clever design we could achieve
both?
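A rough sketch of what such a compile-time-checked unit could look like, assuming a hypothetical `format_duration` helper (the name and the choice to return only the tick count are illustrative). The unit is a `std::ratio` template argument, so a misspelled unit simply does not compile.

```cpp
#include <chrono>
#include <ratio>
#include <sstream>
#include <string>

// Hypothetical helper: converts D to the requested period (truncating toward
// zero, as duration_cast does) and returns the tick count as text.
template <typename Period, typename Rep, typename FromPeriod>
std::string format_duration(std::chrono::duration<Rep, FromPeriod> D) {
  using Target = std::chrono::duration<long long, Period>;
  std::ostringstream OS;
  OS << std::chrono::duration_cast<Target>(D).count();
  return OS.str();
}
```

For example, `format_duration<std::milli>(end - start)` compiles, while a typo such as `format_duration<std::mili>(end - start)` is rejected by the compiler.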

pl

2-clause BSD and MIT licenses (the relevant ones here) do address this.

The second clause here:
https://github.com/fmtlib/fmt/blob/master/LICENSE.rst

States:
"""
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
"""

IANAL and all that, but I do not think this addresses the binary distribution issues as effectively as what is being proposed for the LLVM license.

Even if it does, it would become that much harder to understand and convince everyone that it sufficiently addresses it.

However, if everything goes under the single LLVM license being proposed, we get to deal with that exactly once rather than having to evaluate N different licenses.

Anyways, we’re pretty far afield here. My main point was that reusing existing libraries in LLVM at this low level has a surprising additional cost beyond any technical cost of tracking dependencies due to the surprising nature of runtime libraries reusing parts of the LLVM project. It is still a cost that should be traded off carefully against the cost of re-implementing something. And none of it should cause us to not examine the alternatives and learn from and match their API ideas where reasonable.

They are as permissive as the most permissive license used in LLVM (and far more permissive than the proposed new license) and carry no binary attribution clauses.

While I don’t quite agree with every aspect of your claim here, I don’t want to debate on a mailing list which license is more or less permissive (ones currently in use, ones proposed, etc.). Not sure anything good comes of that.

In my current implementation, it’s up to the format provider. If you have an illegal format spec (e.g. {0;0}), it ignores it and prints the format spec as a literal. We could also add an assert here in theory.

If/when we move to c++14, a constexpr StringRef implementation would allow us to parse and validate the entire format string at compile time.
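As a very small illustration of the idea, here is a C++14 constexpr check over a string literal. It only verifies that braces pair up without nesting. It takes a plain `const char *` rather than a constexpr StringRef, knows nothing about escaped braces, and `braces_balanced` is a made-up name.

```cpp
#include <cstddef>

// Returns true when every '{' is closed by a later '}' with no nesting and
// there is no stray '}'. Requires C++14 for the loop in a constexpr function.
constexpr bool braces_balanced(const char *Fmt) {
  bool Open = false;
  for (std::size_t I = 0; Fmt[I] != '\0'; ++I) {
    if (Fmt[I] == '{') {
      if (Open)
        return false;
      Open = true;
    } else if (Fmt[I] == '}') {
      if (!Open)
        return false;
      Open = false;
    }
  }
  return !Open;
}

// Evaluated entirely at compile time for string literals.
static_assert(braces_balanced("{0} {1}"), "well-formed fields are accepted");
static_assert(!braces_balanced("{0{"), "an unterminated field is rejected");
```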

Since conciseness is one of the main goals of a library such as this, I would hate to actively hamper this in exchange for more compile-time checking. If you write an invalid format spec, presumably your test will fail since it will ignore it.

Actually I should elaborate because this is a tad misleading.

If the syntax is illegal, it is ignored. Like if you write {0;} or {0{. In this case the entire thing is pasted into the output and no replacement happens.

If it can successfully parse into X, Y, and Z where X is the index, Y is the alignment, and Z is the option string, then what happens depends on which of X, Y, and Z are illegal.

If X is empty, the sequence is replaced with an empty string. Otherwise, if X is not a non-negative integer, the sequence is pasted into the output and everything else is ignored.

If Y is illegal, Y is ignored as if it wasn’t specified.

Whether Z is illegal is up to the format provider, so each one decides how to react to an invalid string.

Note that at any point here we can assert. This would allow us to catch these in debug builds while silently doing the best we can in non-debug builds.
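The rules above could be sketched roughly as follows. This parses only the text between the braces, and `FieldSpec` / `parse_field` are hypothetical names, not what the patch uses. Splitting on the first ':' before looking for ',' is an assumption that keeps option strings such as date patterns intact.

```cpp
#include <cstddef>
#include <string>

struct FieldSpec {
  bool Valid = false;  // false: paste the whole field into the output verbatim
  bool Empty = false;  // true: X was empty, so the field becomes ""
  unsigned Index = 0;  // X
  int Align = 0;       // Y, left at 0 when absent or illegal
  std::string Options; // Z, interpreted by the format provider
};

// Parses "X", "X,Y", "X:Z" or "X,Y:Z" (the text between '{' and '}').
FieldSpec parse_field(const std::string &Inner) {
  FieldSpec Spec;
  const std::size_t Colon = Inner.find(':');
  const std::string Head = Inner.substr(0, Colon);
  if (Colon != std::string::npos)
    Spec.Options = Inner.substr(Colon + 1);
  const std::size_t Comma = Head.find(',');
  const std::string X = Head.substr(0, Comma);
  const std::string Y =
      Comma == std::string::npos ? std::string() : Head.substr(Comma + 1);

  if (X.empty()) {
    Spec.Valid = Spec.Empty = true;
    return Spec;
  }
  if (X.size() > 4 || X.find_first_not_of("0123456789") != std::string::npos)
    return Spec; // illegal index: caller pastes the field as a literal
  Spec.Valid = true;
  Spec.Index = static_cast<unsigned>(std::stoul(X));

  // Y is an optional '-' followed by digits; anything else is ignored.
  const std::size_t DigitsStart = (!Y.empty() && Y[0] == '-') ? 1 : 0;
  const bool YLegal =
      Y.size() > DigitsStart && Y.size() <= 5 &&
      Y.find_first_not_of("0123456789", DigitsStart) == std::string::npos;
  if (YLegal)
    Spec.Align = std::stoi(Y);
  return Spec;
}
```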

1. os << format_string("Test"); // writes "Test"
2. os << format_string("{0}", 7); // writes "7"

The "<< format_string(..." is ... really verbose for me. It also makes me
strongly feel like this produces a string rather than a streamable entity.

I wonder if we could use UDLs instead?

os << "Test" << "{0}"_fs << 7;

~Aaron

I’m not sure that would work well. The implementation relies on being able to index into the parameter pack. How would you do that if each parameter is streamed in?

"{0} {1}"_fs(1, 2)

Could perhaps work, but it looks a little strange to me.
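For the curious, that spelling can be made to work with a literal operator that returns a callable object. This is only a sketch under assumptions: `FormatLiteral` and `render_arg` are invented names, it returns a `std::string` rather than something streamable, and it recognizes only single-digit `{d}` fields.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

template <typename T> std::string render_arg(const T &V) {
  std::ostringstream OS;
  OS << V;
  return OS.str();
}

// Hypothetical callable produced by the _fs literal; calling it with the
// arguments performs the substitution, so the whole pack is available.
struct FormatLiteral {
  std::string Fmt;

  template <typename... Ts> std::string operator()(const Ts &...Args) const {
    const std::vector<std::string> Rendered{render_arg(Args)...};
    std::string Out;
    for (std::size_t I = 0; I < Fmt.size(); ++I) {
      // Recognize only "{d}" where d is one decimal digit naming an argument.
      const bool IsField = Fmt[I] == '{' && I + 2 < Fmt.size() &&
                           Fmt[I + 2] == '}' && Fmt[I + 1] >= '0' &&
                           Fmt[I + 1] <= '9' &&
                           static_cast<std::size_t>(Fmt[I + 1] - '0') <
                               Rendered.size();
      if (IsField) {
        Out += Rendered[static_cast<std::size_t>(Fmt[I + 1] - '0')];
        I += 2;
      } else {
        Out += Fmt[I];
      }
    }
    return Out;
  }
};

inline FormatLiteral operator""_fs(const char *Str, std::size_t Len) {
  return FormatLiteral{std::string(Str, Len)};
}
```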

FWIW, I agree format_string is long. Ideally it would be called format, but that’s taken.

Another option is os.format("{0}", 7), and have format_string("{0}", 7) return a std::string.