Generating printf format specifiers

Hey list,

As a C/Obj-C developer, I’ve often thought that it would be useful to be able to do this:

NSString *username = NSUserName();
uid_t userID = 5;
NSLog(@"Hello, " nsfmt(username) @"! This is your user id: " nsfmt(userID), username, userID);

I imagine nsfmt would work similarly to typeof, but rather than substituting the type, the compiler would substitute the most natural printf format specifier for the given type. There would also be a standard-C fmt that would generate constant C-strings rather than constant NSStrings. Is there any reason this doesn’t already exist? (Or perhaps I’ve overlooked it if it does?)

Not having any experience with compiler implementation, I was hoping someone could give me a few pointers before I try implementing such a feature?

Thanks!

David

It depends on what you mean by "substitute". __typeof__ is not a macro; it doesn't substitute tokens or anything, it actually gets parsed and interpreted by the semantic layer as a special kind of type-sugaring. It looks like you want __fmt__ to work much more like a macro, since string-literal concatenation is still happening. That's somewhat trickier and raises some interesting questions about behavior.
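To make the distinction concrete, here is a minimal sketch of __typeof__ standing in a type position. It assumes GCC or Clang, since __typeof__ is a GNU extension, and copy_size is a name made up for illustration.

```c
#include <stddef.h>

/* __typeof__ stands where a type name would stand, so the compiler resolves
 * it while parsing the declaration; no tokens are pasted in beforehand. */
static size_t copy_size(void) {
    unsigned short original = 7;
    __typeof__(original) copy = original;   /* same as: unsigned short copy */
    return sizeof copy;
}
```

A __fmt__ that takes part in string-literal concatenation would instead have to yield something the parser treats as a literal, which is the harder case described above.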

Also, we try to be very cautious about accepting new language enhancements, even in Objective C. We'll really need to talk about whether we want this before we can accept it.

John.

Thanks for the reply!

> It depends on what you mean by "substitute". __typeof__ is not a macro; it doesn't substitute tokens or anything, it actually gets parsed and interpreted by the semantic layer as a special kind of type-sugaring. It looks like you want __fmt__ to work much more like a macro, since string-literal concatenation is still happening. That's somewhat trickier and raises some interesting questions about behavior.

Ah yes - please excuse my loose language.

> Also, we try to be very cautious about accepting new language enhancements, even in Objective C. We'll really need to talk about whether we want this before we can accept it.

I see; I just wanted to inquire to see if I could get any hints of how
I might do it myself, more as an exercise than anything else. Official
support would be great, though!

David

> Thanks for the reply!

>> It depends on what you mean by "substitute". __typeof__ is not a macro; it doesn't substitute tokens or anything, it actually gets parsed and interpreted by the semantic layer as a special kind of type-sugaring. It looks like you want __fmt__ to work much more like a macro, since string-literal concatenation is still happening. That's somewhat trickier and raises some interesting questions about behavior.

> Ah yes - please excuse my loose language.

Well, I'm not really asking about language, I'm asking about semantics. It looks like you want this to work in all cases as if the user had typed a string literal instead of __fmt__(foo). That's going to complicate the implementation, because the parser has to be prepared to accept __fmt__ everywhere it would accept a string literal. That might be very straightforward, it might not be.

>> Also, we try to be very cautious about accepting new language enhancements, even in Objective C. We'll really need to talk about whether we want this before we can accept it.

> I see; I just wanted to inquire to see if I could get any hints of how
> I might do it myself, more as an exercise than anything else. Official
> support would be great, though!

If you just want it in private code, of course that's fine. You'll need to recognize __fmt__ as a new type of token, parse the following tokens appropriately (presumably as an arbitrary expression), and add an API call to Sema to turn the expression into a string literal (by looking at its type). If you do this in ParseStringLiteralExpression and fake up a string-literal token appropriately, most everything else should fall out.

John.

> Well, I'm not really asking about language, I'm asking about semantics. It looks like you want this to work in all cases as if the user had typed a string literal instead of __fmt__(foo). That's going to complicate the implementation, because the parser has to be prepared to accept __fmt__ everywhere it would accept a string literal. That might be very straightforward, it might not be.

Indeed - to make it truly useful, __fmt__ would have to be allowed
wherever string literals are (the most common case I imagine being
string literal concatenation.)

> If you just want it in private code, of course that's fine. You'll need to recognize __fmt__ as a new type of token, parse the following tokens appropriately (presumably as an arbitrary expression), and add an API call to Sema to turn the expression into a string literal (by looking at its type). If you do this in ParseStringLiteralExpression and fake up a string-literal token appropriately, most everything else should fall out.

Exactly what I was looking for to get me started, thanks very much!

David

As a side note, it wouldn't actually be necessary to make this distinction if __fmt__() was semantically treated as a string literal, since you can concatenate C string literals to ObjC ones:

@"Hello, " "World"
@"Hello, " __fmt__(userName) "!"
#define __nsfmt__(x) @__fmt__(x)

This works because, from the parser's point of view, an Objective-C string is an @ token followed by a C-string token. In fact, you are not concatenating C strings with Objective-C strings at all. The preprocessor is concatenating two C strings then the parser is constructing an Objective-C string from an @ and a C string.
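For comparison, standard C already has macros that behave this way when the type is named rather than inferred: the PRI macros in <inttypes.h> expand to string literals, so they can sit between other literals. The sketch below is plain C only; USER_ID_FORMAT and format_user_id are names invented for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Adjacent string literals are joined into a single literal at compile time. */
static const char greeting[] = "Hello, " "World";

/* PRIu32 expands to a string literal, so it concatenates like any other. */
#define USER_ID_FORMAT "This is your user id: %" PRIu32

/* Formats id into buf and returns what snprintf returns. */
static int format_user_id(char *buf, size_t size, uint32_t id) {
    return snprintf(buf, size, USER_ID_FORMAT, id);
}
```

The difference from the proposed __fmt__() is that the programmer picks the macro by type name, so nothing has to be known about the type of an expression.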

I like the idea of __fmt__(), but it will be tricky to implement. The string concatenation happens in the preprocessor, but the type information isn't available until much later. This is why you can't do things like #if sizeof(int) == 4 in preprocessor macros, even though everyone wants to. With clang, in normal operation, the preprocessor isn't quite such a separate step as in a traditional C compiler (where it is an entirely separate program), so this might be possible.
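A small sketch of that limitation, assuming a C11 compiler: the preprocessor can compare the constants that <limits.h> provides, while anything that needs sizeof has to wait for a later stage such as _Static_assert. INT_IS_32_BIT and int_is_32_bit are illustrative names.

```c
#include <limits.h>

/* The preprocessor cannot evaluate sizeof, so this would not compile:
 *     #if sizeof(int) == 4
 * It can, however, test macros that <limits.h> defines as constants. */
#if INT_MAX == 2147483647
#define INT_IS_32_BIT 1
#else
#define INT_IS_32_BIT 0
#endif

/* Checks that need real type information run after preprocessing (C11). */
_Static_assert(sizeof(int) * CHAR_BIT >= 16, "int must be at least 16 bits");

/* Exposes the preprocessor's answer so it can be inspected at run time. */
static int int_is_32_bit(void) {
    return INT_IS_32_BIT;
}
```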

If you're happy with adding incompatible language extensions, maybe you should consider, as an alternative, adding a new type specifier that's understood in format strings to functions marked with the printf attribute. That way, you'd do something like this:

NSLog(@"Hello, %!! This is your user id: %!", username, userID);

The printf format string checker in clang already has code to see what the format string for the specified type should be, so you can just have it replace the ! with the correct value in the string.

Note that this has some problems, however. If the argument is an int, for example, would you insert %x or %d? This applies even with the original __fmt__() pseudo-macro idea. There is not an injective mapping from types to format strings. This is especially true if you consider modifiers. With __fmt__(), I'm not sure how you would specify whether to show the sign, the precision, or the width, for example.
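As a rough illustration of both points, the sketch below uses the existing format attribute (a GCC and Clang extension) and then prints one int three ways. It does not implement the proposed %! conversion; format_checked and render_three_ways are hypothetical helpers.

```c
#include <stdarg.h>
#include <stdio.h>

/* The attribute marks parameter 3 as a printf-style format whose arguments
 * start at parameter 4, which lets the compiler's format checker compare
 * each conversion against the type of its argument. */
__attribute__((format(printf, 3, 4)))
static int format_checked(char *buf, size_t size, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int written = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return written;
}

/* Renders one int three ways. All three formats are valid, so the type of
 * the argument alone does not determine the conversion or its modifiers.
 * %x formally takes unsigned int, hence the cast. */
static void render_three_ways(int value, char plain[16], char hex[16],
                              char padded[16]) {
    format_checked(plain, 16, "%d", value);
    format_checked(hex, 16, "%x", (unsigned int)value);
    format_checked(padded, 16, "%+6d", value);
}
```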

David

-- Sent from my brain

> If you're happy with adding incompatible language extensions, maybe you should consider, as an alternative, adding a new type specifier that's understood in format strings to functions marked with the printf attribute. That way, you'd do something like this:

> NSLog(@"Hello, %!! This is your user id: %!", username, userID);

Wow - I like that a lot more; it's much more concise and less
confusing to the eye.

> The printf format string checker in clang already has code to see what the format string for the specified type should be, so you can just have it replace the ! with the correct value in the string.

> Note that this has some problems, however. If the argument is an int, for example, would you insert %x or %d? This applies even with the original __fmt__() pseudo-macro idea. There is not an injective mapping from types to format strings. This is especially true if you consider modifiers. With __fmt__(), I'm not sure how you would specify whether to show the sign, the precision, or the width, for example.

I was thinking of keeping it simple to start:

    integer types => %d, %u (and variants depending on width of course)
    char[], char * => %s
    all other pointers => %p
    float, double => %f

which would cover the vast majority of my uses, and perhaps adding
more options once the groundwork is laid.
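A mapping like the one above can be approximated in standard C11 with _Generic, without touching the compiler. This is only a sketch: FMT_FOR and FORMAT_VALUE are made-up names, the result is an ordinary expression rather than a string literal (so it cannot be concatenated with neighbouring literals), and pointers other than char * must be cast to void * before use.

```c
#include <stdio.h>

/* Selects a format string from the static type of x. Types that are not
 * listed are rejected at compile time instead of being guessed. */
#define FMT_FOR(x) _Generic((x), \
    short: "%hd", \
    unsigned short: "%hu", \
    int: "%d", \
    unsigned int: "%u", \
    long: "%ld", \
    unsigned long: "%lu", \
    long long: "%lld", \
    unsigned long long: "%llu", \
    float: "%f", \
    double: "%f", \
    char *: "%s", \
    const char *: "%s", \
    void *: "%p", \
    const void *: "%p")

/* Formats a single value into buf using the format chosen for its type. */
#define FORMAT_VALUE(buf, size, x) snprintf((buf), (size), FMT_FOR(x), (x))
```

With an unsigned int such as the userID from the first message, FORMAT_VALUE(buf, sizeof buf, userID) selects "%u". The choice between %d and %x, and any width or precision, is still fixed by the table rather than by the caller.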

David