Dear community,
I have developed a backend for a new 32-bit RISC ISA, which does not have
unaligned memory access instructions (e.g., LWL, LWR, SWL, and SWR in
MIPS).
Since char and short variables are not necessarily 32-bit aligned, these
variables cannot be accessed correctly.
Therefore, the alignment member functions, especially getCharAlign() and
getShortAlign() of the TargetInfo class in
clang/include/clang/Basic/TargetInfo.h, should be virtual, so that an
ISA-specific subclass of TargetInfo can provide ISA-specific alignments.
In my subclass, both functions return 32.
Sincerely,
Hiroyuki Chishiro
Even if you fix TargetInfo, the assumption that getCharSize() == getCharAlign() == 8 is hardcoded all over the place; you'll have a very hard time making that work. And even if you do make it work, almost all software written in C assumes CHAR_BIT == 8, so you'll have a compiler which can't actually build any existing software.
-Eli
Dear Eli,
Thank you for your reply.
My backend is based on and extends the MIPS ISA, and it disables the
generation of unaligned memory access instructions because my new 32-bit
RISC ISA does not have them.
In this case, a char string (i.e., its beginning address) is not
necessarily 32-bit aligned.
However, aligned memory access instructions (such as 32-bit LW) are
still used, and hence my compiler cannot correctly access char strings
written in C.
For example, suppose a char string written in C begins at address 0x11
and its contents are "abcd".
Because of the MIPS backend implementation, a 32-bit LW instruction of
my new 32-bit RISC ISA is used to read the char string.
In addition, the lower 2 bits of the LW address are always treated as 0,
so an unaligned memory read is not possible.
In this case, the 32-bit LW tries to read addresses [0x11, 0x14] ("abcd")
but actually reads [0x10, 0x13] ("*abc"), where '*' means an
uninitialized value at address 0x10.
If the byte at address 0x10 is NUL ('\0'), the char string is
misinterpreted as an empty string.
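
The failure described above can be reproduced with a small model. This is
only a sketch: the memory array, its size, and the function names are
hypothetical and are not part of LLVM or of the ISA in question. It assumes
that LW simply clears the low 2 bits of the address before reading.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a small byte-addressed memory (not LLVM code). */
enum { MEM_SIZE = 64 };

/* Model of an LW that ignores the low 2 address bits: it copies the
   4 bytes starting at (addr & ~3) into out. */
static void lw_forced_aligned(const uint8_t *mem, uint32_t addr,
                              uint8_t out[4])
{
    uint32_t base = addr & ~(uint32_t)3;
    assert(base + 4 <= MEM_SIZE);   /* stay inside the modelled memory */
    memcpy(out, mem + base, 4);
}

/* Places "abcd" at 0x11, with '*' standing in for the unrelated byte at
   0x10, then issues the modelled LW at address 0x11. */
static void demo_lw_at_0x11(uint8_t out[4])
{
    uint8_t mem[MEM_SIZE];
    memset(mem, 0, sizeof mem);
    mem[0x10] = '*';
    memcpy(mem + 0x11, "abcd", 5);  /* includes the terminating NUL */
    lw_forced_aligned(mem, 0x11, out);
}
```

Calling demo_lw_at_0x11() fills out with the four bytes '*', 'a', 'b', 'c'
rather than 'a', 'b', 'c', 'd', which matches the "*abc" result above.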
I have not tried an array of short, but the same problem may occur.
For this reason, the alignment member functions, especially getCharAlign()
and getShortAlign(), should be virtual.
Sincerely,
Hiroyuki Chishiro
This hardware situation was reasonably common in the past, but code
still accessed objects not aligned to 32 bits. The compiler just had
to use multiple aligned loads to access data crossing a word boundary
and combine the data.
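
As a rough illustration of that approach, here is a minimal sketch in C.
It is not what any particular compiler emits. The function names are
hypothetical, memory is modelled as an array of 32-bit words, and a
little-endian byte order within each word is assumed (a big-endian target
would shift in the opposite direction).

```c
#include <stdint.h>

/* words[i] is assumed to hold bytes 4*i .. 4*i+3, little-endian. */

/* Reads one byte at byte address addr using a single aligned word load,
   followed by a shift and a mask. */
static uint8_t load8(const uint32_t *words, uint32_t addr)
{
    uint32_t shift = (addr & 3u) * 8u;
    return (uint8_t)((words[addr >> 2] >> shift) & 0xFFu);
}

/* Reads a 32-bit value at an arbitrary byte address using at most two
   aligned word loads, then combines the two parts. */
static uint32_t load32_unaligned(const uint32_t *words, uint32_t addr)
{
    uint32_t index = addr >> 2;
    uint32_t shift = (addr & 3u) * 8u;
    uint32_t lo = words[index];
    if (shift == 0)
        return lo;                  /* already aligned: one load suffices */
    uint32_t hi = words[index + 1]; /* second aligned load */
    return (lo >> shift) | (hi << (32u - shift));
}
```

The aligned case is handled separately because shifting a 32-bit value by
32 is undefined in C. Note that a char access needs only one aligned load
in this scheme, since a single byte never crosses a word boundary.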
You'll probably find that path simpler than inventing a pathological C
dialect. As Eli said, changing getCharAlign is likely to have huge
knock-on consequences that the rest of Clang just isn't ready for.
Cheers.
Tim.
Dear Tim,
Thank you for your reply.
However, multiple aligned loads (such as 8-bit LB and/or 16-bit LH) to
access data have more time overhead than a single aligned load
(such as 32-bit LW).
Therefore, my backend still uses 32-bit LW, and arrays of both char and
short are 32-bit aligned despite the extra memory overhead (in many
cases).
If any problems occur, I will report them on this mailing list.
Sincerely,
Hiroyuki Chishiro
This thread is about clang and should really be on cfe-dev.
John.