Posix format strings

Hi all,

One of the things that currently annoys me, to a certain extent, is that
Clang can’t be configured to accept Posix-compliant format strings that
extend the standard C formats. There are some applications that use Posix
extensions so it seems only right to at least try to support Posix format
strings as this is both an ISO and IEEE standard, after all.

It looks like some of this is done, but isn’t correctly attributed. For
instance, in ParsePrintfSpecifier there is:

    // Mac OS X (unicode) specific
    case 'C': k = ConversionSpecifier::CArg; break;
    case 'S': k = ConversionSpecifier::SArg; break;

The %C and %S formats are, in fact, defined in Posix, so more accurately
this could be commented // ISO/IEC 9945, IEEE Std 1003.1 aka POSIX.1.

The ' (tick) to introduce thousands grouping looks like low-hanging fruit
and should be easy to implement.
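
To make the "low-hanging fruit" concrete, here is a minimal sketch of a flag scanner that accepts the apostrophe alongside the ISO C flags. This is not the code in ParsePrintfSpecifier; PrintfFlags, scanFlags, and the allowPosix switch are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical flag set; the field names are illustrative, not Clang's.
struct PrintfFlags {
    bool leftJustify = false;  // '-'
    bool plusSign = false;     // '+'
    bool spacePrefix = false;  // ' '
    bool alternate = false;    // '#'
    bool zeroPad = false;      // '0'
    bool thousands = false;    // '\'' (POSIX extension to ISO C)
};

// Consumes flag characters starting at spec[pos] (the text after '%') and
// returns the index of the first character that is not an accepted flag.
// When allowPosix is false the apostrophe is left unconsumed, so a caller
// could diagnose it.
std::size_t scanFlags(std::string_view spec, std::size_t pos,
                      PrintfFlags &flags, bool allowPosix) {
    assert(pos <= spec.size());
    for (; pos < spec.size(); ++pos) {
        switch (spec[pos]) {
        case '-': flags.leftJustify = true; break;
        case '+': flags.plusSign = true; break;
        case ' ': flags.spacePrefix = true; break;
        case '#': flags.alternate = true; break;
        case '0': flags.zeroPad = true; break;
        case '\'':
            if (!allowPosix)
                return pos;
            flags.thousands = true;
            break;
        default:
            return pos;
        }
    }
    return pos;
}
```

With allowPosix set, scanning the text `'-10d` stops at the `1` with both the thousands and left-justify flags recorded; without it, scanning stops at the apostrophe itself.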

The %n$ (positional) version is a bit more of a challenge.

So, some questions:

(1) Is anybody other than me supportive of adding this feature to Clang?
(2) If so, should Posix format strings be accepted as default?
(3) Should there be a front-end flag to select Posix format support?

I would say that the %C, %D, %@, and %m extensions to ISO C are accepted
silently by Clang. That is, there is no -W option to turn on diagnosis of
these extensions, which feels wrong.

Thanks Paul. Comments inline.

> Hi all,
>
> One of the things that currently annoys me, to a certain extent, is that
> Clang can’t be configured to accept Posix-compliant format strings that
> extend the standard C formats. There are some applications that use Posix
> extensions so it seems only right to at least try to support Posix format
> strings as this is both an ISO and IEEE standard, after all.
>
> It looks like some of this is done, but isn’t correctly attributed. For
> instance, in ParsePrintfSpecifier there is:
>
>    // Mac OS X (unicode) specific
>    case 'C': k = ConversionSpecifier::CArg; break;
>    case 'S': k = ConversionSpecifier::SArg; break;

Do you have a handy link to the Posix specification of printf? That would be helpful, as I only saw %C and %S in the Mac OS X documentation.

> The %C and %S formats are, in fact, defined in Posix, so more accurately
> this could be commented // ISO/IEC 9945, IEEE Std 1003.1 aka POSIX.1.

Makes sense.

> The ' (tick) to introduce thousands grouping looks like low-hanging fruit
> and should be easy to implement.

I was unaware of the ` (tick) grouping.

> The %n$ (positional) version is a bit more of a challenge.

Positional arguments are already implemented.

> So, some questions:
>
> (1) Is anybody other than me supportive of adding this feature to Clang?

Specifically, what are you referring to by "this feature"?

> (2) If so, should Posix format strings be accepted as default?

For these warnings to be useful, I think they should most closely match reality for the intended target. Specifically, if the target supports Posix format strings then I think they should be accepted with no extra effort from the developer. If Posix format strings are not accepted, then we should issue a warning.

In many ways this is no different than the various flags we have in LangOptions to control the behavior of Sema, particularly when handling different dialects of C and C++ (e.g., c99 versus c89, etc.).

> (3) Should there be a front-end flag to select Posix format support?

I think a low-level -cc1 driver option might be appropriate to control this behavior (just as we do with other features such as "blocks"), but ideally the logic that decides whether Posix format strings are supported would live in the high-level driver. With the low-level driver option, users have the ability to override the default or to explicitly invoke the compiler in a specific configuration.

> I would say that the %C, %D, %@, and %m extensions to ISO C are accepted
> silently by Clang. That is, there is no -W option to turn on diagnosis of
> these extensions, which feels wrong.

I agree that we should have a separate -cc1 flag, but I do think they should be silently accepted if the target supports them. Requiring that the user specify an extra command line option all the time to accept them also feels wrong when that is something the compiler can be made aware of in the most common cases.

I'm skeptical that it should be under a separate -W flag. -W flags are useful for turning warnings on/off simply by silencing them, but the option we are talking about would actually impact how a format string is parsed. No -W flag actually influences the compiler in this way, but other command line options, e.g., -fblocks, actually do change the dialect/semantics of the parsed source file.
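
The split described above, where the high-level driver picks a default from what it knows about the target and a low-level option can override it, could be sketched roughly as follows. Every name here (TargetFacts, FormatOptions, computeFormatOptions, shouldDiagnosePosixExtension) is hypothetical and does not correspond to an existing Clang class or flag.

```cpp
#include <cassert>
#include <optional>

// Hypothetical summary of what the high-level driver knows about the target.
struct TargetFacts {
    bool libcHasPosixPrintf;
};

// Hypothetical options consumed by the format-string checker.
struct FormatOptions {
    bool posixFormatStrings = false;
};

// overrideFlag models a low-level command-line option: std::nullopt means the
// user did not pass it, so the target default applies.
FormatOptions computeFormatOptions(const TargetFacts &target,
                                   std::optional<bool> overrideFlag) {
    FormatOptions opts;
    opts.posixFormatStrings = overrideFlag.value_or(target.libcHasPosixPrintf);
    return opts;
}

// True when the checker should warn about a POSIX-only construct.
bool shouldDiagnosePosixExtension(const FormatOptions &opts) {
    return !opts.posixFormatStrings;
}

// Usage sketch: silent by default on a POSIX target, but an explicit
// override can still turn the diagnosis back on.
void exampleUsage() {
    TargetFacts posixTarget{true};
    assert(!shouldDiagnosePosixExtension(
        computeFormatOptions(posixTarget, std::nullopt)));
    assert(shouldDiagnosePosixExtension(
        computeFormatOptions(posixTarget, std::optional<bool>(false))));
}
```

The point of the sketch is only that the decision changes how the format string is interpreted, which is why it is modelled as an option rather than as a warning toggle.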

Hi Ted,

[ snip ]

> Do you have a handy link to the Posix specification of printf? That
> would be helpful, as I only saw %C and %S in the Mac OS X
> documentation.

These printf format strings are described in IEEE 1003.1-2008 online:

http://www.opengroup.org/onlinepubs/9699919799/functions/fprintf.html

This is inherited from X/Open and SUS. However, these are not ISO standards
as far as I can tell, and pinning the definition to an ISO standard is far
preferable to pinning it to an X/Open spec.

>> The ' (tick) to introduce thousands grouping looks like low-hanging
>> fruit and should be easy to implement.

> I was unaware of the ` (tick) grouping.

Well, it's an apostrophe. It's also part of Posix.
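
For anyone else who has not met this flag, here is a minimal sketch of what it does at run time, assuming a libc that implements the POSIX ' flag (glibc does). formatGrouped is a hypothetical helper name, and the buffer size is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats value using the POSIX ' flag. The grouping character and group
// sizes come from the LC_NUMERIC category of the current locale; this helper
// deliberately does not call setlocale itself.
std::string formatGrouped(long value) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%'ld", value);
    // A long always fits in 64 bytes even with separators; check anyway.
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}
```

A program starts in the "C" locale, which defines no grouping character, so formatGrouped(1234567) yields the digits with no separators. After a successful setlocale(LC_NUMERIC, ...) to a locale that does define one, the same call inserts it.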

>> The %n$ (positional) version is a bit more of a challenge.

> Positional arguments are already implemented.

A-ha! clang::analyze_format_string::ParseArgPosition. Sorted. :-)
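
For reference, the %n$ form is a run of decimal digits followed by a dollar sign, with positions starting at 1. The following is a minimal sketch of recognizing it, not the code in ParseArgPosition; ArgPosition and parseArgPosition are hypothetical names, and overflow of very long digit runs is not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type for a parsed "n$" prefix.
struct ArgPosition {
    unsigned index;    // 1-based argument number taken from "n$"
    std::size_t next;  // index of the first character after the '$'
};

// Tries to read "n$" at spec[pos], where spec is the text after '%'.
// Returns std::nullopt when there is no positional prefix, in which case the
// caller would go on to parse flags, width, and so on from the same position.
std::optional<ArgPosition> parseArgPosition(std::string_view spec,
                                            std::size_t pos) {
    assert(pos <= spec.size());
    unsigned value = 0;
    std::size_t i = pos;
    while (i < spec.size() &&
           std::isdigit(static_cast<unsigned char>(spec[i]))) {
        value = value * 10 + static_cast<unsigned>(spec[i] - '0');
        ++i;
    }
    // Require at least one digit, a terminating '$', and a nonzero position.
    if (i == pos || i == spec.size() || spec[i] != '$' || value == 0)
        return std::nullopt;
    return ArgPosition{value, i + 1};
}
```

Digits that are not followed by `$`, as in `10d`, are rejected here so that they can be re-read as a field width.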

>> (1) Is anybody other than me supportive of adding this feature to
>> Clang?

> Specifically, what are you referring to by "this feature"?

As I have source code that uses Posix formats, specifically the ' modifier,
I would like to be able to tell Clang to shut up about *this* particular
extension, just as it shuts up about %m and the other extensions to ISO C.
The ' modifier probably has more right to be there than the 'm' format, as
it is defined by an ISO standard rather than a de facto one. Contentious, I
know... ;-)

>> (2) If so, should Posix format strings be accepted as default?

> For these warnings to be useful, I think they should most closely match
> reality for the intended target. Specifically, if the target supports
> Posix format strings then I think they should be accepted with no extra
> effort from the developer. If Posix format strings are not accepted,
> then we should issue a warning.

Well, simply accepting %' is a start; doing what you suggest would mean
retrofitting warnings for %S, %C, %m, and %@ for targets that don't expect
or support them. This seems like it could be wasted effort, since you soon
see problems at runtime.

> In many ways this is no different than the various flags we have in
> LangOptions to control the behavior of Sema, particularly when handling
> different dialects of C and C++ (e.g., c99 versus c89, etc.).

>> (3) Should there be a front-end flag to select Posix format support?

> I think a low-level -cc1 driver option might be appropriate to control
> this behavior (just as we do with other features such as "blocks"), but
> ideally the logic that decides whether Posix format strings are
> supported would live in the high-level driver. With the low-level
> driver option, users have the ability to override the default or to
> explicitly invoke the compiler in a specific configuration.

Ok.

>> I would say that the %C, %D, %@, and %m extensions to ISO C are
>> accepted silently by Clang. That is, there is no -W option to turn on
>> diagnosis of these extensions, which feels wrong.

> I agree that we should have a separate -cc1 flag, but I do think they
> should be silently accepted if the target supports them. Requiring
> that the user specify an extra command line option all the time to
> accept them also feels wrong when that is something the compiler can be
> made aware of in the most common cases.
>
> I'm skeptical that it should be under a separate -W flag. -W flags are
> useful for turning warnings on/off simply by silencing them, but the
> option we are talking about would actually impact how a format string
> is parsed. No -W flag actually influences the compiler in this way,
> but other command line options, e.g., -fblocks, actually do change the
> dialect/semantics of the parsed source file.

OK, in that case, is there a suggestion? ;-)

For now, if I cook up a patch to allow %' in all the places where Posix
allows it with some test cases, could we agree to apply that? And then we
can have a little more discussion on how to deal with the way that
"unexpected" formats or modifiers are diagnosed or not?

Rgds,

I think this is a good start. We can resolve the policy issues with Posix extensions to printf afterwards.