RFC: On non 8-bit bytes and the target for it

I wouldn’t mind providing (or helping to review) patches upstream to reduce the number of downstream changes needed by people whose targets have non-8-bit chars. But the community has been reluctant to accept such patches in the past.

It is understood that, since there is no in-tree target with (for example) a 16-bit char, such patches are hard to justify when no test case can exercise them. Still, I don’t really see why the community should be opposed to minor refactorings (ones that neither affect in-tree performance nor hurt readability much) such as:

  1. More consistently using CharTy instead of Int8Ty in the frontend when dealing with things that are chars.
  2. Adding assertions in code paths that have been identified as assuming that a byte/char is 8 bits (this would probably require adding some “bits-per-byte” or “addressable-unit-in-bits” notion in DataLayout).
  3. Replacing some “8” constants (and shifts by 3, etc.) with named constants to help identify places where the char/byte size is assumed to be eight. Or, when possible, making sure we use “getSizeInBits()” or “getSizeInBytes()” etc. properly. Also, in some situations we might want to talk about “octets” rather than “bytes”.
  4. Making the val argument in the llvm.memset intrinsic overloadable (not fixed at i8).
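
As a rough illustration of items 2 and 3, here is a minimal standalone C++ sketch. It does not use LLVM at all: ToyDataLayout, BitsPerByte, sizeInBytes and octetOnlyPath are hypothetical names made up for this example, not existing LLVM APIs, and a real patch would likely look different.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a DataLayout-like query (assumption: not an
// existing LLVM class or field).
struct ToyDataLayout {
  unsigned BitsPerByte; // width of the smallest addressable unit, in bits
};

// "Before": the byte width is baked in as the magic numbers 7 and 3,
// silently assuming 8-bit bytes.
inline uint64_t sizeInBytesHardcoded(uint64_t SizeInBits) {
  return (SizeInBits + 7) >> 3;
}

// "After" (item 3): the same rounding-up division, with the byte width
// taken from a named value instead of a literal 8.
inline uint64_t sizeInBytes(const ToyDataLayout &DL, uint64_t SizeInBits) {
  assert(DL.BitsPerByte != 0 && "byte width must be non-zero");
  return (SizeInBits + DL.BitsPerByte - 1) / DL.BitsPerByte;
}

// Item 2: a code path known to assume octets states that assumption
// explicitly, so a non-8-bit target fails loudly instead of miscompiling.
inline uint64_t octetOnlyPath(const ToyDataLayout &DL, uint64_t SizeInBits) {
  assert(DL.BitsPerByte == 8 && "this path assumes 8-bit bytes");
  return sizeInBytesHardcoded(SizeInBits);
}
```

For an 8-bit layout both versions agree, so in-tree targets would see no behavioural change. A downstream target with 16-bit bytes would get the right answer from sizeInBytes and an assertion failure from octetOnlyPath.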