std::ctype, std::numpunct base templates

Hi,

I’m considering submitting a bug report against libc++, but thought I would check here first.

The standard gives a synopsis for the locale facet std::ctype in Section 25.4.1 (C++17, but it is also in C++11), and std::numpunct in Section 25.4.3. Under libc++ specialisations of these templates are defined for char and wchar_t (in header ‘__locale’) but the base template definition is omitted. This means in particular that I cannot define:

class my_ctype : public std::ctype<char16_t> { … };

class my_numpunct : public std::numpunct<char16_t> { … };

This seems to go against the intent of these classes, which include abstract virtual function declarations. Although clause 30.2.2 restricts the support for stream-based I/O that an implementation must provide to the types char and wchar_t, it is not clear to me that this means the definitions of the above templates can be omitted.

I would appreciate advice as to: a. whether this is actually an issue or my misreading of the spec; and b. if so, if it is a known issue.

Thanks a lot!
James

Hi,

I'm considering submitting a bug report against libc++, but thought I would check here first.

The standard gives a synopsis for the locale facet std::ctype<CharT> in Section 25.4.1 (C++17, but it is also in C++11), and std::numpunct<CharT> in Section 25.4.3. Under libc++ specialisations of these templates are defined for char and wchar_t (in header '__locale') but the base template definition is omitted. This means in particular that I cannot define:

class my_ctype : public std::ctype<char16_t> { ... };

class my_numpunct : public std::numpunct<char16_t> { ... };

Yes, I believe that this is correct.

This seems to go against the intent of these classes, which include abstract virtual function declarations. Although clause 30.2.2 restricts the support for stream-based I/O that an implementation must provide to the types char and wchar_t, it is not clear to me that this means the definitions of the above templates can be omitted.

I would appreciate advice as to: a. whether this is actually an issue or my misreading of the spec; and b. if so, if it is a known issue.

[locale.category] lists the required specializations for numpunct and ctype.

It doesn’t say anything about the general template.

So, I would say this is not a bug.
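
For reference, the two specializations that are required can be used portably. A minimal sketch, assuming the classic "C" locale; the function names are made up for illustration and are not part of any library:

```cpp
#include <locale>

// Widen a narrow character through the required std::ctype<wchar_t> facet
// of the classic "C" locale.
inline wchar_t widen_with_required_facet(char c) {
    const std::locale& loc = std::locale::classic();
    return std::use_facet<std::ctype<wchar_t>>(loc).widen(c);
}

// Upper-case a narrow character through the required std::ctype<char> facet.
inline char upper_with_required_facet(char c) {
    const std::locale& loc = std::locale::classic();
    return std::use_facet<std::ctype<char>>(loc).toupper(c);
}
```

Replacing wchar_t with char16_t in the first function is the case under discussion: whether that compiles depends on whether the implementation defines the primary template.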

From an implementation point of view, the implementations of ctype<char> and ctype<wchar_t> are fairly different, and have a lot of system-specific knowledge baked into them. I don’t think I really know enough about charXX_t (8/16/32) on various systems to provide high-quality implementations (and in the case of char32_t, in a reasonable size of code/data). ICU does much of this, but it’s quite large.

— Marshall

Hi Marshall,

Thanks for your prompt response. I have some follow-ups below. I hope you don’t object to me pursuing this a little bit.

Hi,

I’m considering submitting a bug report against libc++, but thought I would check here first.

The standard gives a synopsis for the locale facet std::ctype in Section 25.4.1 (C++17, but it is also in C++11), and std::numpunct in Section 25.4.3. Under libc++ specialisations of these templates are defined for char and wchar_t (in header ‘__locale’) but the base template definition is omitted. This means in particular that I cannot define:

class my_ctype : public std::ctype<char16_t> { … };

class my_numpunct : public std::numpunct<char16_t> { … };

Yes, I believe that this is correct.

This seems to go against the intent of these classes, which include abstract virtual function declarations. Although clause 30.2.2 restricts the support for stream-based I/O that an implementation must provide to the types char and wchar_t, it is not clear to me that this means the definitions of the above templates can be omitted.

I would appreciate advice as to: a. whether this is actually an issue or my misreading of the spec; and b. if so, if it is a known issue.

https://wg21.link/locale.category lists the required specializations for numpunct and ctype.

It doesn’t say anything about the general template.

So, I would say this is not a bug.

Does it, anywhere in the specification, state that where required specialisations are listed, the need to implement the general template is removed?

There seems to be no such statement in Section 20.5.5.2 Conforming Implementations. On the other hand 20.5.2.2.1 states:

A C++ header shall provide the declarations and definitions that appear in its synopsis.

From an implementation point of view, the implementations of ctype<char> and ctype<wchar_t> are fairly different, and have a lot of system-specific knowledge baked into them. I don’t think I really know enough about charXX_t (8/16/32) on various systems to provide high-quality implementations (and in the case of char32_t, in a reasonable size of code/data). ICU does much of this, but it’s quite large.

I think there is a problem with the way the specification is written in this instance. The virtual functions in the base class should really be pure virtual and the descriptions given of their (general) behaviour should be requirements on subclasses or specialisations. GCC and Visual Studio both implement these functions by casting (or trying to narrow) their character arguments to char. GCC’s inline documentation states ‘implementations are provided for all the protected virtual functions, but will likely not be useful’. That said, bearing in mind that the ctype functionality only applies to characters in the basic execution character set, which are usually 7-bit ASCII (invariant) characters, this is probably a reasonable choice. Another choice, and perhaps no more nonstandard than simply omitting the base class, would be to implement these functions as assertion failures.

I agree that, notwithstanding the way the specification is written, it is unreasonable to expect implementers to provide useful implementations for more than the required specialisations. But actually I do not want a useful implementation, just the opportunity to override a virtual function.
As it happens my goal is specifically to extend ctype, numpunct, and the other facets to implement the functionality using ICU. I would prefer this work to be portable to clang.

Thanks again,
James

The general templates would be declared as per the spec. The member functions would be defined something like this:

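template <class charT>
charT
ctype<charT>::do_widen(char c) const {
    // Compare as unsigned so that bytes above 0x7f are not seen as negative.
    return static_cast<unsigned char>(c) <= 0x7f ?
        static_cast<charT>(c) :
        static_cast<charT>('?');
}

template <class charT>
const char*
ctype<charT>::do_widen(const char* low, const char* high, charT* dest) const {
    for (; low < high; low++)
        *dest++ = widen(*low);
    return high;
}

template <class charT>
typename numpunct<charT>::char_type
numpunct<charT>::do_decimal_point() const {
    return use_facet<ctype<charT>>(locale()).widen('.');
}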
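
The same fallback can be tried outside namespace std. The following is only a sketch: ascii_widen is a hypothetical free-function stand-in for the member functions above, and it assumes that mapping 7-bit ASCII and replacing everything else with '?' is acceptable.

```cpp
// Hypothetical free-function version of the fallback do_widen shown above.
// Assumption: only 7-bit ASCII is mapped; every other byte becomes '?'.
template <class CharT>
CharT ascii_widen(char c) {
    const unsigned char uc = static_cast<unsigned char>(c);
    return uc <= 0x7f ? static_cast<CharT>(uc) : static_cast<CharT>('?');
}

// Range version: widens [low, high) into dest and returns high,
// mirroring the contract of ctype::do_widen for ranges.
template <class CharT>
const char* ascii_widen(const char* low, const char* high, CharT* dest) {
    for (; low < high; ++low)
        *dest++ = ascii_widen<CharT>(*low);
    return high;
}
```

Nothing here depends on the platform or the locale, which is the point being made about the effort involved.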

James

And that comes down to the interpretation of the specification. Your position, as I understand it, is that the specification does not require the general template to be defined. But I don’t see the justification for that in the letter of the specification. And neither is this the position that GCC or Visual Studio take. So what am I failing to understand?

J.

Well, I have already more or less explicitly given three reasons why I think they are required, and they are good enough reasons that I expect the maintainers of VS and libstdc++ probably agree with me:

  1. It looks to me that the standard says they are required (20.5.2.2.1) and nowhere does it say they are not required;

  2. Other implementers provide them;

  3. Implementing them actually doesn’t require any major effort of implementation or the introduction of platform or locale dependent code into the library, so we can’t assume that the standard means for them to be excluded on that basis.

To this list I will add:

  4. The classes of the locale and Input/Output library are everywhere class templates parameterised by character type, and the possibility of extending support for other character types is explicitly extended to library implementers at least; nor is it explicitly denied to user code, and furthermore the standard library anticipates user extensions, and gives rules for how the standard library can be extended in user code. But if those two class templates, ctype and numpunct, are omitted, then the formatted output operators for std::basic_ostream cannot be reused for char16_t and char32_t (or any other primitive type) by user code, because user code cannot define and add those facets for the different character type to their locale. User code is explicitly prevented from defining just those facets by the injunction (20.5.4.2.1.1) ‘A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the library requirements for the original template and is not explicitly prohibited.’ The specialisations std::ctype<char16_t>, std::numpunct<char16_t>, std::ctype<char32_t>, std::numpunct<char32_t> do not depend on any user-defined type and so cannot be defined in user code.
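
For contrast, what user code can do portably is add a facet of its own type to a locale. A minimal sketch follows; u16_classifier, is_digit and make_u16_locale are hypothetical names, and the ASCII-only digit test is an assumption made purely for illustration. As far as I can tell such a facet does not help with the stream operators, since they look up std::ctype<charT> and std::numpunct<charT> by those exact types.

```cpp
#include <cstddef>
#include <locale>

// Hypothetical program-defined facet; it does not derive from std::ctype.
class u16_classifier : public std::locale::facet {
public:
    static std::locale::id id;  // each facet type needs its own id object

    explicit u16_classifier(std::size_t refs = 0) : std::locale::facet(refs) {}

    // Assumption for illustration: only ASCII digits are recognised.
    bool is_digit(char16_t c) const { return c >= u'0' && c <= u'9'; }
};

std::locale::id u16_classifier::id;

// Build a locale that carries the extra facet alongside the classic ones.
// The locale takes ownership of the facet because refs == 0.
inline std::locale make_u16_locale() {
    return std::locale(std::locale::classic(), new u16_classifier);
}
```

A caller would retrieve it with std::use_facet<u16_classifier>(loc), exactly as for the standard facets.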

So did the designers of the C++ language and standard library, a group which you may count yourself a member of, Marshall, really provide their users with a library that seems extensible, and character types that seem perfect to use to extend it, but with the intent that this should not in fact be done and that nobody should be silly enough to try? And to make sure that nobody tries, they make the effort impossible due to a technicality involving two obscure aspects of the localisation library, and don’t even bother to include a note to that effect in the specification?

Also, leaving aside the question of whether excluding these templates is an error, you should want to include them anyway because doing so significantly improves the utility of your implementation. Under GNU and VS the following works:

std::basic_stringstream<char16_t> s;
s << u"Hello world";

The same code fails to compile with libc++, with the error ‘Implicit instantiation of undefined template 'std::__1::ctype<char16_t>'’. Note that neither the VS nor the GNU implementation depends on any explicitly specialised templates or char16_t-specific code to make the above work, so they are not actually providing anything beyond the specification - at least if you believe, as I do, that the base template of ctype should be included in the implementation.

So putting to one side the state of my knowledge, what do you know that convinces you that all these reasons are wrong and these templates are not required?

Cheers,
James

The general templates would be declared as per the spec. The member functions would be defined something like this:

Sure, they could be defined something like that.
But my point is that a conforming implementation need not have numpunct<char16_t> at all.

— Marshall

And that comes down to the interpretation of the specification. Your position, as I understand it, is that the specification does not require the general template to be defined.

My position is that if you attempt to instantiate numpunct<char16_t>, that is not required to succeed.

But I don’t see the justification for that in the letter of the specification. And neither is this the position that GCC or Visual Studio take. So what am I failing to understand?

What makes you think that the Visual Studio maintainers (or the libstdc++ maintainers) believe that those are required?

I see that:

  1. The standard requires that some specializations exist.
  2. libstdc++ and Visual Studio provide those, and more.

They’re allowed to do that.
That’s different from believing that they are required.

— Marshall

Well, I have already more or less explicitly given three reasons why I think they are required, and they are good enough reasons that I expect the maintainers of VS and libstdc++ probably agree with me:

  1. It looks to me that the standard says they are required (20.5.2.2.1) and nowhere does it say they are not required;

  2. Other implementers provide them;

  3. Implementing them actually doesn’t require any major effort of implementation or the introduction of platform or locale dependent code into the library, so we can’t assume that the standard means for them to be excluded on that basis.

To this list I will add:

  4. The classes of the locale and Input/Output library are everywhere class templates parameterised by character type, and the possibility of extending support for other character types is explicitly extended to library implementers at least; nor is it explicitly denied to user code, and furthermore the standard library anticipates user extensions, and gives rules for how the standard library can be extended in user code. But if those two class templates, ctype and numpunct, are omitted, then the formatted output operators for std::basic_ostream cannot be reused for char16_t and char32_t (or any other primitive type) by user code, because user code cannot define and add those facets for the different character type to their locale. User code is explicitly prevented from defining just those facets by the injunction (20.5.4.2.1.1) ‘A program may add a template specialization for any standard library template to namespace std only if the declaration depends on a user-defined type and the specialization meets the library requirements for the original template and is not explicitly prohibited.’ The specialisations std::ctype<char16_t>, std::numpunct<char16_t>, std::ctype<char32_t>, std::numpunct<char32_t> do not depend on any user-defined type and so cannot be defined in user code.

Agreed.

So did the designers of the C++ language and standard library, a group which you may count yourself a member of, Marshall, really provide their users with a library that seems extensible, and character types that seem perfect to use to extend it, but with the intent that this should not in fact be done and that nobody should be silly enough to try? And to make sure that nobody tries, they make the effort impossible due to a technicality involving two obscure aspects of the localisation library, and don’t even bother to include a note to that effect in the specification?

Yes. Part of this is historical; the iostreams classes were designed (back in the late 1980s/early 1990s) long before char8/16/32_t were added to the language. They were specified to work with char/wchar_t. There have been several proposals made to update iostreams to support these additional character types, but the committee has chosen to work on other things instead.

No. I still don’t believe the possibility of future extension was deliberately sandbagged via the localisation library.

Also, leaving aside the question of whether excluding these templates is an error, you should want to include them anyway because doing so significantly improves the utility of your implementation. Under GNU and VS the following works:

std::basic_stringstream<char16_t> s;
s << u"Hello world";

The same code fails to compile with libc++, with the error ‘Implicit instantiation of undefined template 'std::__1::ctype<char16_t>'’. Note that neither the VS nor the GNU implementation depends on any explicitly specialised templates or char16_t-specific code to make the above work, so they are not actually providing anything beyond the specification - at least if you believe, as I do, that the base template of ctype should be included in the implementation.

And both of those behaviors are perfectly fine w.r.t the wording in the standard.

An implementation is neither required to nor prohibited from providing an implementation of ctype<char16_t>, and that’s what is happening here.

That isn’t what I wrote though. I wrote ‘leaving aside the question of whether excluding these templates is an error, you should want to include them anyway’. Including them mitigates quite a bit of the harm that the limitation of iostreams to char/wchar_t implies. Not including them, for the sake of saving having to write a couple of hundred lines of code, most of which is already provided in the specification, makes a few people’s lives easier at the expense of making a lot of people’s lives more difficult, which is more or less exactly the opposite of what libraries are intended to achieve.

— Marshall

P.S. I checked your claim that “this the position that GCC or Visual Studio take” with the libstdc++ maintainer. He says “I do not believe that. If we happen to provide that …. well we just happen to provide it”.

But you did not address my point that the way the standard is written does not seem to support leaving out any of the definitions that it provides. It is possible that both you and he are wrong.

Well, that is all I have to say on this issue and I should really get on with improving my own code rather than trying to improve yours. Thanks Marshall.

Cheers,
James