[ARM] Is code generation implemented for __unaligned?

Dear all, Dear Roger Ferrer Ibáñez,

at my company we have a lot of code that uses the keyword "__unaligned" [1],
which is a Microsoft extension. I have read an RFC called "Implement
code generation for __unaligned" on the mailing list (from February 2017) [2],
but I could not find out whether it was actually implemented. Is it implemented?
If it is, it does not work for ARM (tested with ARMv7-M).

Consider the following code:

struct GUID {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[8];
};

void load_and_store_uGUID(__unaligned GUID* in, __unaligned GUID* out){
        *out = *in;
}

It produces the following assembly:

        ldm.w   r0, {r2, r3, r12}
        ldr     r0, [r0, #12]
        stm.w   r1, {r2, r3, r12}
        str     r0, [r1, #12]
        bx      lr

I used this command line:
C:\Program Files\LLVM\bin\clang++.exe -O2 -S --target=armv7m-none-windows-eabihf -munaligned-access -x c++ -mcpu=cortex-M7 -fms-compatibility-version=19.16.27027.1 -fms-compatibility -fms-extensions

The problem is that LDM and STM (load and store multiple) are used, which
cannot access byte-aligned (i.e. unaligned) addresses. LDR and STR, however,
can be used for unaligned access.

The following C++ code translates to code that can access unaligned memory:

extern "C" void* memcpy( void* dest, const void* src, decltype(sizeof(0)) count );

void load_and_store_memcpy(__unaligned GUID* in, __unaligned GUID* out){
        memcpy(reinterpret_cast<char*>(out), reinterpret_cast<char*>(in),
                sizeof(__unaligned GUID));
}

The cast to a byte pointer is necessary.

Here is the assembly code:

        ldr.w   r12, [r0]
        ldr     r3, [r0, #4]
        ldr     r2, [r0, #8]
        ldr     r0, [r0, #12]
        str     r0, [r1, #12]
        str     r2, [r1, #8]
        str     r3, [r1, #4]
        str.w   r12, [r1]
        bx      lr
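
The memcpy approach above could be wrapped in small helpers so that call
sites do not need the casts. This is only a minimal sketch: the names
load_unaligned, store_unaligned and copy_guid are hypothetical (they do not
come from clang or any library), and whether the copy is lowered to plain
LDR/STR depends on the compiler and optimization level.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct GUID {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[8];
};

// Hypothetical helper: read a T from an address with no alignment guarantee.
// memcpy makes no alignment assumption about either pointer.
template <typename T>
T load_unaligned(const void* src) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy-based access needs a trivially copyable type");
  assert(src != nullptr);
  T value;
  std::memcpy(&value, src, sizeof(T));
  return value;
}

// Hypothetical helper: write a T to an address with no alignment guarantee.
template <typename T>
void store_unaligned(void* dst, const T& value) {
  static_assert(std::is_trivially_copyable<T>::value,
                "memcpy-based access needs a trivially copyable type");
  assert(dst != nullptr);
  std::memcpy(dst, &value, sizeof(T));
}

// Same job as load_and_store_memcpy, expressed through the helpers.
void copy_guid(const void* in, void* out) {
  store_unaligned(out, load_unaligned<GUID>(in));
}
```

The pointers are taken as void* on purpose, so the compiler never sees a
GUID* that it could assume to be naturally aligned.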

I could rewrite all that old code to use memcpy or use structs with
__attribute__((packed)). But maybe (I hope) there is some old patch for
clang that didn't get into trunk but could help me out. Or maybe it is easy
to implement. @Roger, is there a patch somewhere?

Side note:
The problem with __attribute__((packed)) is that it cannot be added to an
existing type (for example: "typedef __attribute__((packed)) GUID pGUID;");
the definition itself has to be changed. Another problem is that it changes
the offsets of the members in the struct. Is that called "Record Layout"?
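
To illustrate the two points about __attribute__((packed)), here is a small
sketch, assuming GCC or Clang. The types Plain, Packed and PackedGUID and the
function copy_packed_guid are hypothetical names made up for this example.
Packing a definition moves its members, whereas a packed wrapper around an
unchanged GUID should only lower the alignment of the wrapper and leave the
inner member offsets alone. Whether this is an acceptable substitute for
__unaligned in existing code is a separate question.

```cpp
#include <cstddef>

struct GUID {
  unsigned long  Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char  Data4[8];
};

// Packing a definition directly changes member offsets.
struct Plain {
  char         tag;
  unsigned int value;   // normally padded up to alignof(unsigned int)
};
struct __attribute__((packed)) Packed {
  char         tag;
  unsigned int value;   // placed directly after tag, at offset 1
};

static_assert(offsetof(Plain, value) == alignof(unsigned int),
              "unpacked member sits at its natural alignment");
static_assert(offsetof(Packed, value) == 1,
              "packed member directly follows the char");

// Hypothetical wrapper: GUID's own definition is untouched, so its internal
// layout stays the same, but the wrapper only requires byte alignment.
struct __attribute__((packed)) PackedGUID {
  GUID value;
};

static_assert(alignof(PackedGUID) == 1, "wrapper is byte-aligned");
static_assert(sizeof(PackedGUID) == sizeof(GUID), "no size change");

// Copy through the wrapper type; the compiler may only assume alignment 1.
void copy_packed_guid(const PackedGUID* in, PackedGUID* out) {
  *out = *in;
}
```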

Kind regards,

[1] https://docs.microsoft.com/en-us/cpp/cpp/unaligned
[2] http://lists.llvm.org/pipermail/cfe-dev/2017-February/052739.html
Beckhoff Automation GmbH & Co. KG | Managing Director: Dipl. Phys. Hans Beckhoff
Registered office: Verl, Germany | Register court: Guetersloh HRA 7075

There seems to be something implemented, but it looks buggy. Compiling the example below (https://godbolt.org/z/7rcErj) with “-O2 -fms-extensions -emit-llvm” emits:
store i64 5, i64* %2, align 1, !tbaa !2

in C, but
store i64 5, i64* %2, align 8, !tbaa !2

in C++.

struct Foo {
  unsigned long x;
};

void foo(__unaligned struct Foo* out){
  out->x = 5;
}

Hi all,

@James thanks a lot for the prompt diagnosis.

@Jan: I raised https://bugs.llvm.org/show_bug.cgi?id=47499 and I’ll look into it ASAP.

I think we want to keep both languages in sync, so bring the C behaviour to C++.

C++ users who might be impacted by this (clang on Windows, I presume) might want to weigh in, just in case this is a bad idea.

My understanding is that __unaligned is a specifier that came from the Windows Itanium era. A quick check (using James’ testcase) shows that x86 assembly output is not impacted. I assume that the X86 backend can be lenient when it comes to alignment due to what x86 has historically done there, correct me if I’m wrong.

Kind regards,

Message from James Y Knight <jyknight@google.com> on Fri, 11 Sep 2020 at 19:50: