Confusion with character types

Hi!

I'd like to share some thoughts with you about the character types (char, signed char, unsigned char).
First, I think it is a useful feature of modern languages such as Java that they have separate
types for characters and small integers (e.g. char, byte, short).
In C++ we have the following problem:

  char a = '0';
  unsigned char b = 48;
  signed char c = 48;
  std::stringstream s; s << a << b << c;

The result is "000".

Now I have "discovered" that it is possible to overload for all three char types:

  void foo(char);
  void foo(unsigned char);
  void foo(signed char);

Therefore C++ qualifies as a modern language :wink:
because we can distinguish between characters and small integer types.
It is possible to implement a stream like this:

  MyStringStream s; s << a << b << c;

The result is "04848".
Of course, usually we would use int8_t and uint8_t instead of signed char and unsigned char.

So my question: is there a particular reason why the standard stream interprets signed char and
unsigned char as characters instead of numbers?

Another question: why does OpenCL use char and uchar as 8-bit number types? Would it not be
better to use e.g. byte and ubyte as 8-bit number types? Maybe one day OpenCL will be used to
process strings, and then we would need a separate char again. Since OpenCL is a C dialect
it should be possible to typedef signed char to byte and unsigned char to ubyte.
This should be proposed to the Khronos Group, but I don't know how to do that and I doubt
they will listen to me :wink:

-Jochen

Jochen Wilhelmy <j.wilhelmy-KvP5wT2u2U0@public.gmane.org> writes:

> [...]
>
> So my question: is there a particular reason why the standard stream
> interprets signed char and unsigned char as characters instead of numbers?

That seems to me to be the better interpretation. Specifically, it
would be surprising (to me) if

  std::cout << "Hello world" << '\n';

caused "Hello world10" to be displayed.

> [...]
>
> That seems to me to be the better interpretation. Specifically, it
> would be surprising (to me) if
>
>   std::cout << "Hello world" << '\n';
>
> caused "Hello world10" to be displayed.

Your reply shows that the topic is indeed confusing.
As '\n' in your example is of type char and neither of
type signed char nor of type unsigned char, it would
result in the correct output ("Hello world\n")
even if signed char (int8_t) and unsigned char (uint8_t)
are treated as numbers.

-Jochen

Jochen Wilhelmy <j.wilhelmy-KvP5wT2u2U0@public.gmane.org> writes:

>> That seems to me to be the better interpretation. Specifically, it
>> would be surprising (to me) if
>>
>>   std::cout << "Hello world" << '\n';
>>
>> caused "Hello world10" to be displayed.
>
> Your reply shows that the topic is indeed confusing. As '\n' in your
> example is of type char and neither of type signed char nor of type
> unsigned char, it would result in the correct output ("Hello world\n")
> even if signed char (int8_t) and unsigned char (uint8_t) are treated
> as numbers.

But would it be better if this gave a different result?

  signed char n = '\n';
  std::cout << "Hello world" << n;

(or s/signed/unsigned/)

That seems just as peculiar, though I guess I could go for "unsigned
char" being different; actually it would be quite convenient for my code
if that displayed as hex.

> But would it be better if this gave a different result?
>
>   signed char n = '\n';
>   std::cout << "Hello world" << n;

Of course :wink:
These are equivalent:

  signed char n = '\n';
  int8_t n = '\n';
  int8_t n = 10;

Therefore I would expect "Hello world10".

It's analogous to this giving a different result:

  float n = '\n';
  std::cout << "Hello world" << n;

Jochen Wilhelmy <j.wilhelmy-KvP5wT2u2U0@public.gmane.org> writes:

>> But would it be better if this gave a different result?
>>
>>   signed char n = '\n';
>>   std::cout << "Hello world" << n;
>
> Of course :wink:
> These are equivalent:
>
>   signed char n = '\n';
>   int8_t n = '\n';
>   int8_t n = 10;
>
> Therefore I would expect "Hello world10".
>
> It's analogous to this giving a different result:
>
>   float n = '\n';
>   std::cout << "Hello world" << n;

Perhaps, though I tend to think (apparently not entirely correctly) of
"char" as being equivalent to either "signed char" or "unsigned char".

In any case there's nothing that clang/libcxx ought to do. The standard
is what it is. (Perhaps a stream which handles signed and/or unsigned
chars differently would be useful and that seems like a plausible
extension, but I doubt it's really worth adding.)

> Perhaps, though I tend to think (apparently not entirely correctly) of
> "char" as being equivalent to either "signed char" or "unsigned char".

This was the same for me until I "discovered" that

  char != signed char
  char != unsigned char

The grandfathers of C would have done better to have
char, byte and unsigned byte instead (for example).

> In any case there's nothing that clang/libcxx ought to do. The standard
> is what it is. (Perhaps a stream which handles signed and/or unsigned
> chars differently would be useful and that seems like a plausible
> extension, but I doubt it's really worth adding.)
No, but since compiler and libcxx writers read this list, I decided to discuss such
a subtle detail here.
And perhaps it is possible to influence the handling of char types for
standard streams via some traits.

Jochen Wilhelmy <j.wilhelmy-KvP5wT2u2U0@public.gmane.org> writes:

> [...]
>
> The grandfathers of C would have done better to have
> char, byte and unsigned byte instead (for example).

Probably, though actually I find it most annoying that different
libraries use char and unsigned char to represent a raw byte, so I'm not
sure that having a separate byte and unsigned byte would simplify
things. I agree there's possibly a case for 8-bit integral types
("short short", maybe).

[...]

> And perhaps it is possible to influence the handling of char types for
> standard streams via some traits.

Possibly.

Actually I get what he is saying: '\n' is a char, but unsigned char
and signed char are different types from char, and they should be
distinct from char and displayed as integers, whereas char should be
displayed as a character. I always have to remember to display my byte
arrays as hex with:

  std::cout << std::hex << static_cast<int>(bytearray[i]) << ' ';

Way too freaking verbose when bytearray is an array of unsigned chars
(typedef'd as byte).
