Hello experts,
I am new to Clang. I would like to support a system-on-chip where the smallest addressable data type is 16 bits; in other words, sizeof(char) == 1 byte == 16 bits. My understanding is that C/C++ only requires that a byte be at least 8 bits and that sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long).
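For concreteness, this is roughly what I would expect to hold when compiling for this target (the exact widths of short, int, and long here are my own assumptions about the chip, not anything Clang produces today):

#include <climits>

// My target's assumptions: a 16-bit byte, with short and int each
// occupying a single 16-bit byte and long taking two of them.
static_assert(CHAR_BIT == 16, "smallest addressable unit is 16 bits");
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(sizeof(short) == 1, "short fits in one 16-bit byte");
static_assert(sizeof(int) == 1, "int fits in one 16-bit byte");
static_assert(sizeof(long) == 2, "long is two 16-bit bytes, i.e. 32 bits");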
In clang/TargetInfo.h:
unsigned getBoolWidth(bool isWide = false) const { return 8; } // FIXME
unsigned getBoolAlign(bool isWide = false) const { return 8; } // FIXME
unsigned getCharWidth() const { return 8; } // FIXME
unsigned getCharAlign() const { return 8; } // FIXME
:
unsigned getShortWidth() const { return 16; } // FIXME
unsigned getShortAlign() const { return 16; } // FIXME
These are easy enough to fix and to make configurable in the same way that IntWidth and IntAlign are.
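Concretely, what I have in mind is roughly the following (the member names are my own guesses by analogy with IntWidth/IntAlign, not something that exists in TargetInfo today):

class TargetInfo {
protected:
  // New members, analogous to IntWidth/IntAlign, defaulted to 8 in the
  // TargetInfo constructor so existing targets are unaffected.
  unsigned char CharWidth, CharAlign;
public:
  unsigned getCharWidth() const { return CharWidth; }
  unsigned getCharAlign() const { return CharAlign; }
  // ... and the same treatment for the Bool and Short accessors ...
};

A 16-bit-byte target would then simply set CharWidth = CharAlign = 16 (and likewise for the Bool and Short members) in its constructor.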
There are two consequences of this change that I am aware of.
The first is in preprocessor initialization. InitPreprocessor defines INT8_TYPE, INT16_TYPE, INT32_TYPE, and sometimes INT64_TYPE. It only defines INT64_TYPE if long long is 64 bits wide, which seems odd to me.
// 16-bit targets doesn't necessarily have a 64-bit type.
if (TI.getLongLongWidth() == 64)
  DefineType("INT64_TYPE", TI.getInt64Type(), Buf);
In my case, 8-bit and 64-bit types don't exist, so it doesn't really make sense to define INT8_TYPE or INT64_TYPE.
I think a better way of generating these definitions would be something like the following (pseudo-code; it doesn't actually compile):
// Define exact-width type macros for char, short, int, long, long long.
DefineType("INT" + TI.getCharWidth() + "_TYPE", TI.getCharWidth());
if (TI.getShortWidth() > TI.getCharWidth())
  DefineType("INT" + TI.getShortWidth() + "_TYPE", TI.getShortWidth());
if (TI.getIntWidth() > TI.getShortWidth())
  DefineType("INT" + TI.getIntWidth() + "_TYPE", TI.getIntWidth());
if (TI.getLongWidth() > TI.getIntWidth())
  DefineType("INT" + TI.getLongWidth() + "_TYPE", TI.getLongWidth());
if (TI.getLongLongWidth() > TI.getLongWidth())
  DefineType("INT" + TI.getLongLongWidth() + "_TYPE", TI.getLongLongWidth());
This would result in the definition of INT8_TYPE, INT16_TYPE, INT32_TYPE, and INT64_TYPE for most platforms. For my platform it would only create INT16_TYPE and INT32_TYPE. It would also work for wacky 9-bit machines, where INT8s don't make much sense, and for architectures where long long is 128 bits.
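If it helps to see the idea in isolation, here is a tiny standalone version of the same logic. DefineIntType is just a stand-in I made up; the real code would go through TargetInfo and the existing DefineType helper, which takes a type rather than a width, and the widths below are the ones I am assuming for my target.

#include <cstdio>
#include <set>

// Stand-in for the InitPreprocessor helper; prints instead of defining.
static void DefineIntType(unsigned Width) {
  std::printf("#define INT%u_TYPE ...\n", Width);
}

int main() {
  // Assumed widths for my target: char/short/int are 16 bits,
  // long and long long are 32 bits.
  unsigned Widths[] = { 16, 16, 16, 32, 32 };
  // A std::set collapses duplicate widths, so each INT<N>_TYPE macro is
  // defined exactly once, which is the same effect as the '>' checks above.
  std::set<unsigned> Unique(Widths, Widths + 5);
  for (std::set<unsigned>::iterator I = Unique.begin(), E = Unique.end();
       I != E; ++I)
    DefineIntType(*I);
  return 0;
}

This prints only INT16_TYPE and INT32_TYPE for my widths, and the usual four macros for a typical 8-bit-byte target.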
The other place I am aware of (thanks to a useful assertion) where this makes a difference is the character literal parser in Lex/LiteralSupport.cpp. I am still wrapping my head around it, but I think fixing it for arbitrary char sizes is doable. (As a newcomer, I also need to figure out a good way to test it.)
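For example, I imagine the kind of input the literal parser would need to accept looks something like this once chars are 16 bits wide (just my sketch of a test case; I have not yet worked out how the test suite selects a particular target):

char a = 'A';       // ordinary literal, unchanged
char b = '\xABCD';  // hex escape wider than 8 bits: fits when CHAR_BIT
                    // is 16, but must still be diagnosed when it is 8
char c = '\777';    // octal 511 likewise only fits with 16-bit chars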
Do these changes seem reasonable to pursue? What other things are broken in Clang and LLVM by changing the assumption about 8-bit bytes?
Your advice is appreciated,
Ray