Alignment Member Functions should be Virtual

Hiroyuki_Chishiro · May 1, 2018, 11:35am

Dear community,

I have developed a backend for a new 32-bit RISC ISA, which does not have
unaligned memory access instructions (e.g., LWL, LWR, SWL, and SWR in
MIPS).
Since char and short variables are not 32-bit aligned, these
variables cannot be accessed correctly.
Therefore, the alignment member functions, especially getCharAlign() and
getShortAlign() of the TargetInfo class in
clang/include/clang/Basic/TargetInfo.h, should be virtual, so that an
ISA-specific subclass of TargetInfo can provide ISA-specific
alignment.
In my subclass, these functions return 32.

Sincerely,
Hiroyuki Chishiro

Eli_Friedman · May 2, 2018, 9:07pm

Even if you fix TargetInfo, the assumption that getCharSize() == getCharAlign() == 8 is hardcoded all over the place; you'll have a very hard time making that work. And even if you do make it work, almost all software written in C assumes CHAR_BIT == 8, so you'll have a compiler which can't actually build any existing software.

-Eli

Hiroyuki_Chishiro · May 3, 2018, 2:23am

Dear Eli,

Thank you for your reply.

My backend is based on and extends the MIPS ISA, and it disables the
generation of unaligned memory access instructions because my new
32-bit RISC ISA does not have them.
In this case, a char array holding a string (i.e., its beginning
address) is not necessarily 32-bit aligned.
However, aligned memory access instructions (such as 32-bit LW) are
still used to access it, and hence my compiler cannot correctly access
char strings written in C.

For example, suppose the beginning address of a char string written in
C is 0x11 and its contents are "abcd".
The 32-bit LW instruction of my new 32-bit RISC ISA is used to
read the string because of the backend implementation inherited from
the MIPS ISA.
In addition, the lower 2 bits of the LW address are always treated as
0, and hence unaligned memory reads are impossible.
In this case, the 32-bit LW tries to read addresses [0x11, 0x14]
("abcd") but actually reads [0x10, 0x13] ("*abc"), where '*' means an
uninitialized value at address 0x10.
If the byte at address 0x10 is NUL ('\0'), the string is
misinterpreted as an empty string.
I have not tried an array of short, but the same problem may occur.
For this reason, the alignment member functions, especially
getCharAlign() and getShortAlign(), should be virtual.
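
The misread described above can be reproduced with a small C model. This
is only a sketch under stated assumptions: the 32-byte memory array, the
helper names lw_aligned_only() and demo_misread(), and the '*' filler
byte are hypothetical and are not part of any real backend.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte memory whose first byte is address 0x00
   (an assumption made only for this sketch). */
static uint8_t mem[32];

/* Model of an aligned-only LW: the low 2 address bits are forced to 0.
   The four bytes are returned in memory order, so the sketch does not
   depend on the endianness of the host. */
static void lw_aligned_only(uint32_t addr, uint8_t out[4])
{
    uint32_t base = addr & ~(uint32_t)3;   /* 0x11 becomes 0x10 */
    memcpy(out, &mem[base], 4);
}

/* Places "abcd" at 0x11, with '*' standing in for the unrelated byte
   at 0x10, then loads from 0x11 as in the example above. */
static void demo_misread(uint8_t out[4])
{
    memset(mem, 0, sizeof mem);
    mem[0x10] = '*';
    memcpy(&mem[0x11], "abcd", 5);         /* includes the trailing NUL */
    lw_aligned_only(0x11, out);
}
```

Calling demo_misread() fills the output buffer with the bytes at
[0x10, 0x13] rather than the bytes at [0x11, 0x14].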

Sincerely,
Hiroyuki Chishiro

TNorthover · May 3, 2018, 7:19am

This hardware situation was reasonably common in the past, but code
still accessed objects not aligned to 32 bits. The compiler just had
to use multiple aligned loads to access data crossing a word boundary
and combine the data.
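
As an illustration of that technique, here is a minimal C sketch of
combining two aligned word loads into one unaligned 32-bit value. It
assumes little-endian byte order within each word and a word array
whose element 0 holds byte addresses 0 to 3; the function name
load_unaligned_u32() is hypothetical and is not taken from any
existing backend.

```c
#include <assert.h>
#include <stdint.h>

/* Reads a 32-bit value at an arbitrary byte address using only aligned
   word loads. Assumes little-endian byte order within each word. */
static uint32_t load_unaligned_u32(const uint32_t *words, uint32_t byte_addr)
{
    uint32_t idx   = byte_addr >> 2;          /* index of the word holding the first byte */
    uint32_t shift = (byte_addr & 3u) * 8u;   /* bit offset inside that word */
    uint32_t lo    = words[idx];              /* first aligned load */

    if (shift == 0)
        return lo;   /* already aligned; also avoids a shift by 32, which is undefined in C */

    uint32_t hi = words[idx + 1];             /* second aligned load */
    return (lo >> shift) | (hi << (32u - shift));
}
```

With the words 0x6362612A and 0x00000064 (the bytes '*', 'a', 'b', 'c',
'd', then zeros), a read at byte address 1 yields 0x64636261, which is
"abcd" in little-endian order. The cost is two loads, two shifts, and
an OR in place of a single load.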

You'll probably find that path simpler than inventing a pathological C
dialect. As Eli said, changing getCharAlign is likely to have huge
knock-on consequences that the rest of Clang just isn't ready for.

Cheers.

Tim.

Hiroyuki_Chishiro · May 3, 2018, 8:37am

Dear Tim,

Thank you for your reply.

However, multiple aligned loads (such as 8-bit LB and/or 16-bit LH) to
access data incur more time overhead than a single aligned load
(such as 32-bit LW).
Therefore, my backend still uses 32-bit LW, and arrays of both char and
short are 32-bit aligned, despite the additional memory overhead (in
many cases).
If any problems occur, I will report them to this mailing list.

Sincerely,
Hiroyuki Chishiro

John_McCall1 · May 3, 2018, 5:47pm

This thread is about clang and should really be on cfe-dev.

John.
