Code generation problem with inline assembly on ARM


I'm trying to use an inline assembly block to process 16 bytes of data.
I pass the pointer to the block into the inline assembly via an input
"m" operand, and return the result via an output "r" operand. For some
reason, however, clang schedules my asm *before* the block is initialized,
and my asm hence reads bogus values.

Basically, I do

const short V1[32] = {…};
union ptr_t {
  short e;
  struct { char bytes[64]; } data;
};
int r;
asm ( "@use 64 bytes at %1, result in %0"
      : "=r" (r)
      : "m" (((const ptr_t*)V1)->data));

The complete test case is attached.

The union+struct stuff is supposed to inform clang about the size of
the referenced memory region, and should be valid under strict aliasing
rules (since ptr_t contains a member of type short). Note, however, that
-fno-strict-aliasing does *not* fix the issue, and neither do variations
on the theme above, such as using just a struct or the may_alias attribute.
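
To make the size argument concrete, here is a minimal sketch of the same
idiom with the sizes checked at compile time. It is not the attached test
case: the helper names operand_bytes and use_block are made up for
illustration, and the empty asm template stands in for the real
instructions. The static_assert only holds where short is 2 bytes wide,
which is true on ARM but not guaranteed by the language.

```cpp
#include <cstddef>

// View type from the snippet above: 'e' matches the array's element type,
// 'data' spans the bytes the asm is declared to read.
union ptr_t {
  short e;
  struct { char bytes[64]; } data;
};

// Hypothetical helper: number of bytes covered by the "m" operand.
constexpr std::size_t operand_bytes() { return sizeof(ptr_t::data); }

// Hypothetical wrapper. Taking the array by reference keeps its length in
// the type, so a size mismatch is a compile error.
inline int use_block(const short (&v)[32]) {
  static_assert(sizeof(v) == sizeof(ptr_t::data),
                "the memory operand must cover the whole array");
  int r = 0;
  asm(""                                  // placeholder for the real instructions
      : "+r"(r)                           // r passes through unchanged here
      : "m"(((const ptr_t *)v)->data));   // declares a read of all 64 bytes
  return r;
}
```

Because use_block takes a reference to short[32], passing an array of
another length fails to compile instead of silently declaring too small
a read.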

Whatever I do, clang moves the inline asm into the middle of the
initialization of V1, as can be seen in the following assembly output,
generated with "-arch armv7 -mthumb -O3 -fstrict-aliasing":

  push {r4, r7, lr}
  add r7, sp, #4
  sub sp, #68
  mov r4, sp
  bic r4, r4, #15
  mov sp, r4
  mov.w r0, <1st element of V1>
  strh.w r0, [sp]
  mov.w r0, <27th element of V1>
  strh.w r0, [sp, #52]
  @ InlineAsm Start
  @use 64 bytes at [r1], result in r1
  @ InlineAsm End
  mov.w r0, <28th element of V1>
  strh.w r0, [sp, #54]
  mov.w r0, <32nd element of V1>
  strh.w r0, [sp, #62]

The only way I found to avoid this is to mark my asm as both
clobbers-memory *and* volatile. Since that causes all variables
to be reloaded after my asm block, I'd like to find a way to avoid that.
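
For reference, the workaround has roughly the following shape. This is
a sketch rather than the attached test case: first_plus_last is a made-up
helper, the empty template stands in for the real instructions, and the
C++ reads after the asm are only there so the result can be checked.

```cpp
// Same view type as in the snippet above.
union ptr_t {
  short e;
  struct { char bytes[64]; } data;
};

// Hypothetical wrapper showing volatile plus a "memory" clobber.
inline int first_plus_last(const short *v) {
  const short *p = v;
  asm volatile(""                               // placeholder for the real instructions
               : "+r"(p)                        // pointer goes in and comes out unchanged
               : "m"(((const ptr_t *)v)->data)  // declares a read of 64 bytes at v
               : "memory");                     // asm may touch memory beyond its operands
  // Reads happen after the asm, through the pointer it handed back.
  return p[0] + p[31];
}
```

The "memory" clobber tells the compiler that the asm may read or write
memory other than its listed operands, so pending stores are completed
before the asm. That is also why values cached in registers get reloaded
afterwards.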

Am I doing something wrong, or is this a bug in clang?

best regards,
Florian Pflug

bug.cpp (1023 Bytes)

It's a bug, specifically .