RFC: On non 8-bit bytes and the target for it

bjope · August 8, 2024, 9:42am

I wouldn’t mind providing (or helping to review) patches upstream to reduce the amount of changes needed downstream for people with non-8-bit chars. But the community has been reluctant to accept such patches in the past.

It is understood that since there is no in-tree target with, for example, a 16-bit char, it is hard to motivate patches that can’t be backed by any test cases. Still, I don’t really see why the community should be opposed to minor refactorings (ones that impact neither in-tree performance nor readability much) such as:

More consistently using CharTy instead of Int8Ty in the frontend when dealing with things that are chars.
Adding assertions in code paths that have been identified as assuming that a byte/char is 8 bits (this would probably require adding some “bits-per-byte” or “addressable-unit-in-bits” notion in DataLayout).
Replacing some “8” constants (and shifts by 3 etc.) with named constants to help identify situations where the char/byte size is assumed to be eight. Or, when possible, making sure we use “getSizeInBits()” or “getSizeInBytes()” etc. properly. Also, in some situations we might want to talk about “octets” rather than “bytes”.
Making the val argument in the llvm.memset intrinsic overloadable (not fixed at i8).
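
To make the second and third items a bit more concrete, here is a minimal standalone sketch. It is not LLVM code: LayoutSketch, BitsPerByte, bitsToBytes, bytesToBits and octetOnlyBitsToBytes are all hypothetical names invented for illustration, standing in for whatever “bits-per-byte” notion DataLayout might eventually expose. It only shows the shape of the refactoring: route conversions through a named quantity, and assert in paths that are known to assume octets.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a DataLayout-like query. Not an LLVM API.
struct LayoutSketch {
  unsigned BitsPerByte; // assumed name: size of the addressable unit in bits

  // Number of addressable units needed to hold Bits bits (rounds up).
  std::uint64_t bitsToBytes(std::uint64_t Bits) const {
    return (Bits + BitsPerByte - 1) / BitsPerByte;
  }

  // Number of bits covered by Bytes addressable units.
  std::uint64_t bytesToBits(std::uint64_t Bytes) const {
    return Bytes * BitsPerByte;
  }
};

// A code path that has been identified as octet-only. Instead of silently
// computing a wrong answer on other targets, it asserts its assumption.
inline std::uint64_t octetOnlyBitsToBytes(const LayoutSketch &L,
                                          std::uint64_t Bits) {
  assert(L.BitsPerByte == 8 && "this path assumes 8-bit bytes");
  return Bits >> 3; // the shift by 3 is valid only under the assertion above
}
```

With BitsPerByte set to 8 the helpers behave exactly like the hard-coded versions, so in-tree targets would see no change, while a downstream target with 16-bit chars would get 2 units for a 32-bit value instead of 4.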
