On sizeof char, bytes, and bits in the C99 standard

Hi,

I always considered sizeof(char) = one byte = 8 bits.
However, reading the C99 standard (N1256.pdf), and especially the C99
rationale (C99RationaleV5.10.pdf), I see that the intent is to allow
for platforms where one byte != 8 bits.

For example:
"(Thus, for instance, on a machine with 36-bit words, a byte can be
defined to consist or 36 bits, these numbers being all the exact
divisors of 36 which are not less than 8.)"

So I read several sections of the C99 standard and the rationale, and
if you combine the standard with the rationale, the only way to satisfy
all the rules is to have one byte = 8 bits. So why all these careful,
generic formulations to avoid defining one byte == 8 bits, when in fact
you can't have a conforming implementation where one byte != 8 bits?

Section 3.7.1 defines "character" (single-byte character) as a "〈C〉 bit
representation that fits in a byte", which is further strengthened by
the C99 Rationale V5.10: "A char whether signed or unsigned, occupies
exactly one byte."
Thus there is no doubt that one character = one byte.
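
For reference, here is a minimal program showing the two quantities
involved (sizeof(char) is 1 by definition, while CHAR_BIT from
<limits.h> is the implementation-defined number of bits per byte,
required to be at least 8):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* sizeof(char) is 1 by definition (6.5.3.4p3); CHAR_BIT (5.2.4.2.1)
     * is the number of bits in a byte and must be at least 8. */
    printf("sizeof(char) = %zu\n", sizeof(char)); /* always 1 */
    printf("CHAR_BIT     = %d\n", CHAR_BIT);      /* >= 8 */
    return 0;
}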

Section 3.6 defines byte: "NOTE 2 A byte is composed of a contiguous
sequence of bits, the number of which is implementation-defined. The
least significant bit is called the low-order bit; the most significant
bit is called the high-order bit."

Section 7.18.1.1 defines int8_t: "Thus, int8_t denotes a signed integer
type with a width of exactly 8 bits."

This quote from the C99 Rationale V5.10, "(Thus, for instance, on a
machine with 36-bit words, a byte can be defined to consist of 9, 12,
18, or 36 bits, these numbers being all the exact divisors of 36 which
are not less than 8.)", shows that the intent was to allow for a
definition of byte that doesn't necessarily have 8 bits.

However, according to this quote, "These strictures codify the
widespread presumption that any object can be treated as an array of
characters, the size of which is given by the sizeof operator with that
object’s type as its operand.", I should be able to treat any object
(thus including int8_t objects) as an array of characters.
This implies that there exists a positive integer N such that
number_of_bits(char)*N = number_of_bits(int8_t). Given what we know
about char and int8_t, this means there exists a positive integer N
such that number_of_bits(byte)*N = 8, which implies
number_of_bits(byte) <= 8.

Now, according to the C99 Rationale V5.10 ("All objects in C must be
representable as a contiguous sequence of bytes, each of which is at
least 8 bits wide."), we have
number_of_bits(byte) >= 8.

Thus number_of_bits(byte) = 8.

Am I right, or am I wrong?

Best regards,
--Edwin

Török Edwin wrote:

Section 7.18.1.1 defines int8_t: "Thus, int8_t denotes a signed integer
type with a width of exactly 8 bits."

This quote from the C99 Rationale V5.10, "(Thus, for instance, on a
machine with 36-bit words, a byte can be defined to consist of 9, 12,
18, or 36 bits, these numbers being all the exact divisors of 36 which
are not less than 8.)", shows that the intent was to allow for a
definition of byte that doesn't necessarily have 8 bits.

However, according to this quote, "These strictures codify the
widespread presumption that any object can be treated as an array of
characters, the size of which is given by the sizeof operator with that
object’s type as its operand.", I should be able to treat any object
(thus including int8_t objects) as an array of characters.
This implies that there exists a positive integer N such that
number_of_bits(char)*N = number_of_bits(int8_t). Given what we know
about char and int8_t, this means there exists a positive integer N
such that number_of_bits(byte)*N = 8, which implies
number_of_bits(byte) <= 8.

Now, according to the C99 Rationale V5.10 ("All objects in C must be
representable as a contiguous sequence of bytes, each of which is at
least 8 bits wide."), we have
number_of_bits(byte) >= 8.

Thus number_of_bits(byte) = 8.

Am I right, or am I wrong?
  
You're wrong. 7.18.1.1p3 says that the exact-width types are optional.
An implementation where CHAR_BIT is > 8 cannot provide (u)int8_t, but
is nevertheless conforming.
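
A minimal compile-time sketch of that point (just an illustration,
using only <limits.h> and <stdint.h>; 7.18.2 makes INT8_MAX available
exactly when int8_t is provided):

#include <limits.h>
#include <stdint.h>

/* INT8_MAX is defined exactly when int8_t is provided (7.18.2), and an
 * exact-width 8-bit type with no padding bits can only exist when
 * CHAR_BIT == 8, so this #error can never fire on a conforming
 * implementation. */
#ifdef INT8_MAX
#if CHAR_BIT != 8
#error "int8_t is provided although CHAR_BIT != 8"
#endif
#endif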

Sebastian

OK, so can I assume that on any POSIX-compliant platform CHAR_BIT is 8?
(POSIX requires (u)int8_t in <stdint.h>.)

Best regards,
--Edwin

Török Edwin wrote:

Török Edwin wrote:

Hi,

I always considered sizeof(char) = one byte = 8 bits.
However, reading the C99 standard (N1256.pdf), and especially the C99
rationale (C99RationaleV5.10.pdf), I see that the intent is to allow
for platforms where one byte != 8 bits.

For example:
"(Thus, for instance, on a machine with 36-bit words, a byte can be
defined to consist or 36 bits, these numbers being all the exact
divisors of 36 which are not less than 8.)"
  

These machines are not hypothetical, although of the historical conventions the standard only admits the Multics one (4 9-bit logical chars packed into a 36-bit physical word).

....
Section 3.6 defines byte: "NOTE 2 A byte is composed of a contiguous
sequence of bits, the number of which is implementation-defined. The
least significant bit is called the low-order bit; the most significant
bit is called the high-order bit."

Section 7.18.1.1 defines int8_t: "Thus, int8_t denotes a signed integer
type with a width of exactly 8 bits."
  

Right -- when the typedef exists at all.

This quote from the C99 Rationale V5.10, "(Thus, for instance, on a
machine with 36-bit words, a byte can be defined to consist of 9, 12,
18, or 36 bits, these numbers being all the exact divisors of 36 which
are not less than 8.)", shows that the intent was to allow for a
definition of byte that doesn't necessarily have 8 bits.

However, according to this quote, "These strictures codify the
widespread presumption that any object can be treated as an array of
characters, the size of which is given by the sizeof operator with that
object’s type as its operand.", I should be able to treat any object
(thus including int8_t objects) as an array of characters.
  

Yes, but int8_t is only guaranteed to exist on CHAR_BIT == 8 machines that use two's complement integers. Neither int8_t nor uint8_t is allowed to exist on machines where CHAR_BIT != 8, due to the no-padding-bits requirement and a rote calculation that the practical minimum possibly compliant CHAR_BIT is 7.

In particular, C99 7.18.1.1p3:
"These types are optional. However, if an implementation provides integer types with
widths of 8, 16, 32, or 64 bits, no padding bits, and (for the signed types) that have a
two's complement representation, it shall define the corresponding typedef names."
[int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, uint64_t]

On a machine with CHAR_BIT 9, a conforming implementation can, but need not, provide uint9_t (but I would expect it to as a quality of implementation issue).
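
If code has to stay portable to such a target, a common fallback is to
use the least-width types from 7.18.1.2, which are required to exist
even where the exact-width ones are not. A small sketch (low_octet is
just an illustrative helper name, nothing standard):

#include <stdint.h>
#include <stdio.h>

/* uint_least8_t must exist on every implementation (7.18.1.2), even
 * when CHAR_BIT != 8 and uint8_t is absent. */
static uint_least8_t low_octet(unsigned int v)
{
    return (uint_least8_t)(v & 0xFFu);  /* keep only the low 8 bits */
}

int main(void)
{
    /* prints 255 on any conforming implementation, whatever CHAR_BIT is */
    printf("%u\n", (unsigned)low_octet(0x1FFu));
    return 0;
}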

Kenneth Boyd

I like this definition of Byte from Wikipedia:

“A contiguous sequence of bits within a binary computer that comprises the smallest addressable sub-field of the computer’s natural word-size.”

Makslane

I think that it is safe to say that Clang should only care about 8-bit bytes until someone comes along with a machine that has a non-8-bit byte and is willing to do the work to enhance it...

-Chris

I recall reading that rather than require complicated semantics for
files, sockets, and such that would allow CHAR_BIT > 8, POSIX decided
just to require CHAR_BIT == 8 and be done with it. (Though all the
RFC Internet Standards are specified in terms of "octets", not
"bytes".)

This seems like a reasonable strategy, but I wanted to add: 24-bit-byte processors are not rare in the DSP (audio/video processing) arena. Typically everything (char, short, int, long) is 1 byte / 24 bits on such a platform.
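
For illustration only (a sketch; the sizes are assumptions about such a
hypothetical 24-bit part, not something any particular toolchain
promises, and long is left out because its required 32-bit range
usually makes it wider than one such byte), a program like this would
be expected to report:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Assumed output on a hypothetical CHAR_BIT == 24 DSP target where
     * char, short and int each occupy a single 24-bit byte. */
    printf("CHAR_BIT      = %d\n", CHAR_BIT);        /* 24 on such a part */
    printf("sizeof(char)  = %zu\n", sizeof(char));   /* always 1 */
    printf("sizeof(short) = %zu\n", sizeof(short));  /* 1 on such a part */
    printf("sizeof(int)   = %zu\n", sizeof(int));    /* 1 on such a part */
    return 0;
}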

-Howard