RFC: On non 8-bit bytes and the target for it

Could someone please give an update on the progress of this issue? DSP compilers really do need non-8-bit bytes and the related support.

As I understand it, there are too many hard-coded 8s in the code to change all at once, so I think changing them progressively is the practical approach. We need to provide some configuration items in DataLayout or clang TargetInfo.

I’d like to discuss David_Chisnall3’s ‘byte’ concept:

  • The smallest unit that can be loaded / stored at a time.

I think this is mostly a matter of efficiency: smaller or larger units can still be loaded and stored using instruction sequences.
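To illustrate what such an instruction sequence looks like, here is a minimal sketch that emulates an 8-bit load/store on a memory that can only be accessed in 16-bit words. Everything in it (the Word type, loadByte, storeByte, and the low-half-first packing) is a hypothetical model for this post, not anything taken from LLVM or a real target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: the machine can only load/store whole 16-bit words.
using Word = std::uint16_t;

// Load the 8-bit unit at byteIndex. Two units are packed per word, low half
// first (an arbitrary choice made for this sketch).
inline std::uint8_t loadByte(const std::vector<Word> &mem,
                             std::size_t byteIndex) {
  assert(byteIndex / 2 < mem.size());
  const unsigned shift = static_cast<unsigned>(byteIndex % 2) * 8u;
  return static_cast<std::uint8_t>((mem[byteIndex / 2] >> shift) & 0xFFu);
}

// Store an 8-bit unit with a read-modify-write sequence: load the containing
// word, clear the target half, merge the new value, store the word back.
inline void storeByte(std::vector<Word> &mem, std::size_t byteIndex,
                      std::uint8_t value) {
  assert(byteIndex / 2 < mem.size());
  const unsigned shift = static_cast<unsigned>(byteIndex % 2) * 8u;
  Word word = mem[byteIndex / 2];
  word = static_cast<Word>(word & ~(0xFFu << shift));
  word = static_cast<Word>(word | (static_cast<unsigned>(value) << shift));
  mem[byteIndex / 2] = word;
}
```

The store costs a load, two logical operations and a store instead of one instruction, which is why I see this as an efficiency question rather than a correctness one.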

  • The smallest unit that can be addressed with a raw pointer in a specific address space.

This is important because it relates to pointer arithmetic in C. Incrementing a char * pointer (or a void * pointer, with the GNU extension) should make it point to the next memory unit. There are also casts between integers and pointers.

  • The largest unit whose encoding is opaque to anything above the ISA.

I do not quite get this one. My guess is that it is the unit that is not affected by endianness.

  • The type used to represent char in C.

This is the char size in bits.

  • The type that has a size that all other types are a multiple of.

This should also be the char size. C says that sizeof(char) is 1 and that the result of sizeof is an integer, so the size of every type is a whole number of chars.
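The relationship can be written down with only standard facilities. A small sketch follows; storageBits is a helper name I made up for illustration, while CHAR_BIT and sizeof are standard.

```cpp
#include <climits>
#include <cstddef>

// sizeof(char) is 1 by definition, whatever the value of CHAR_BIT.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

// The standard only guarantees a lower bound for the char width.
static_assert(CHAR_BIT >= 8, "a char has at least 8 bits");

// Storage size of T in bits (padding included). Because sizeof counts chars,
// the result is always a whole multiple of CHAR_BIT.
template <typename T>
constexpr std::size_t storageBits() {
  return sizeof(T) * CHAR_BIT;
}
```

On a target with 16-bit chars, CHAR_BIT would be 16 and a 32-bit int would have sizeof(int) == 2.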

I think the following should be separated.

  • CharSize: the number of bits in a char variable. There is already a getCharWidth() in clang TargetInfo, but it is hard-coded to 8.

  • AddressUnitSize: the number of bits in the memory unit that a single address refers to. This could be added to DataLayout, and GEP-related calculations should respect this configuration.
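A rough sketch of how the two values would be used separately is below. The TargetUnits struct and both helper functions are hypothetical names for this discussion; none of them exist in DataLayout or TargetInfo today, and a real GEP computation handles much more (struct fields, alignment, index widths).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical configuration record, made up for this sketch.
struct TargetUnits {
  unsigned CharSizeInBits;        // bits in a C char
  unsigned AddressUnitSizeInBits; // bits covered by one address increment
};

// GEP-like array indexing: how far the raw address moves when stepping
// index elements of elemSizeInBits bits each.
inline std::int64_t elementOffsetInAddressUnits(const TargetUnits &t,
                                                std::uint64_t elemSizeInBits,
                                                std::int64_t index) {
  assert(t.AddressUnitSizeInBits != 0);
  assert(elemSizeInBits % t.AddressUnitSizeInBits == 0 &&
         "element must cover a whole number of address units");
  return index *
         static_cast<std::int64_t>(elemSizeInBits / t.AddressUnitSizeInBits);
}

// sizeof-like query: the size of an object measured in chars.
inline std::uint64_t sizeInChars(const TargetUnits &t,
                                 std::uint64_t sizeInBits) {
  assert(t.CharSizeInBits != 0);
  assert(sizeInBits % t.CharSizeInBits == 0 &&
         "object must be a whole number of chars");
  return sizeInBits / t.CharSizeInBits;
}
```

For example, with a 16-bit char and 16-bit address unit, the third element of an array of 32-bit integers sits 6 address units from the base, where a conventional 8/8 target would give 12. Keeping the two fields apart also allows combinations such as an 8-bit char on word-addressed memory.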

The string char type of the source code (which clang consumes) and the string char type of the compiled code (which clang generates) should also be separated.
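As a rough illustration of that last point, assume the host reads source text as 8-bit chars while the target's char is 16 bits wide. TargetChar and lowerStringLiteral are hypothetical names for this sketch, and one source char mapping to one target char is my assumption, not something clang does today.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical target with a 16-bit char.
using TargetChar = std::uint16_t;

// Turn a string literal read from the source file (host chars) into the
// sequence of target chars the compiler would emit, including the
// terminating null. Each source char becomes exactly one target char.
inline std::vector<TargetChar> lowerStringLiteral(const std::string &source) {
  std::vector<TargetChar> out;
  out.reserve(source.size() + 1);
  for (unsigned char c : source)
    out.push_back(static_cast<TargetChar>(c));
  out.push_back(0); // null terminator, one target char wide
  return out;
}
```

The emitted literal for "ab" is then three 16-bit units long, even though the source file stored it in 8-bit units.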