Use of 'ldrd' instructions with unaligned addresses on armv7 (Major bug in LLVM optimizer?)

Hi,

I think I discovered a major armv7 optimization bug in Clang. I create a simple test case which exhibits the issue.
When you compile the attached file for armv7 with optimizations turned on (O2, O3 or Os), the binary generated led to a crash.
The issue can't be reproduced when using GCC 4.2. It can't be reproduced with Clang when the optimization is turned off (O0).
This issue can be reproduced with Xcode 4.2.1 and Xcode 4.3 Developer Preview 3 (4E71d).

The LLVM optimizer coalesces two loads into an ARM 'ldrd' double-load instruction. Unfortunately, some ARM hardware platforms (including all of Apple's) support misaligned accesses for single-register loads and stores ('ldr' and 'str' instructions), but not for double-register loads and stores ('ldrd' and 'strd' instructions.).
On those platforms, the attached code triggers an address error when passed a misaligned address if it's compiled by LLVM with the optimizer enabled, but not if it's compiled by gcc, or by LLVM with the optimizer disabled.
At a minimum, there should be an option to disable the generation of 'ldrd' and 'strd', or the automatic coalescing of two loads that just happen to be nearby.

I haven't found any bugs similar in the LLVM bug database. It is a known issue? This looks like a major optimization issue.

Thanks,
Alexandre

main.m (1.8 KB)

The problem is in your code, not the compiler. You're casting an unaligned char* to an int*, even though an int* pointer must be 4-byte aligned in every ARM ABI that I've ever seen.

In practice all Apple hardwares support misaligned accesses for single-register loads and stores.
If a pointer is not aligned, LLVM should not use the double-register loads and stores. It should keep the two single-register loads instead of trying to optimize them as one unsupported double-register load.
Note that this code compiled with GCC 4.2 runs perfectly whereas LLVM will produce a binary that crashes: LLVM breaks existing source code.

Alexandre

Hi Alexandre,

The platform ABIs usually state that a pointer must be aligned to the same
alignment as its pointee type requires.

If you need it I can look it up and quote it for you, but I'm pretty sure
this is the case.

Cheers,

James

Behalf Of Alexandre Colucci

Note that this code compiled with GCC 4.2 runs perfectly whereas LLVM will produce a binary that crashes: LLVM breaks existing source code.

On this point:
This is not uncommon - and the very nature of "Undefined Behaviour".
This reason alone is not enough to justify a change to Clang. We/you
would need to show that the behaviour is defined & Clang is violating
that definition.

Chris has posted about the general principle at length starting here:

No one is arguing that there aren't ABI specs or LLVM design
guidelines that say that unaligned accesses "should not", "could not"
or "aren't guaranteed to" work, because it's besides the point.

The point is that unaligned 32-bit loads and stores *work in practice*
on every single ARM device Apple has ever manufactured. I'm not a
hardware person, but I'm guessing it takes a non-negligible amount of
silicon to support them. With Xcode's switch to LLVM, this deployed
silicon has suddenly become off-limits because of a single overzealous
optimization. The only possible workarounds are assembly code and
turning the optimizer off altogether.

It would be one thing if the optimizer generated ldrd/strd for 64-bit
loads and stores only. But it actually goes as far as taking two
separate 32-bit accesses and merging them into one
silently-incompatible 64-bit access. These two accesses could be
unrelated to one another in the context of the code at hand, and even
syntactically distant.

This introduces a stealth crash into Apple-only code that was bug-free
under gcc. And realizing what's going on requires familiarity with ARM
assembly. With a bit of googling you'll find other support board posts
asking "Why is my code suddenly crashing after upgrading Xcode?"

Unfortunately, there is no way to turn this optimization off. Not to
mention that it's not much of an optimization at all. I'd be surprised
if you could measure a performance improvement on a real-world program
with less than a million iterations, maybe orders more.

You can measure its damage pretty significantly, though. We've already
spent a lot of hours tracking down unaligned accesses and wrapping
them in assembly macros. Which of course ends up disabling other,
actually useful optimizations. And with a large codebase, we can't be
sure we've found every last one yet.

Whether this optimization is academically acceptable or not, its net
impact in real-world terms is exceedingly negative.

EdE

No one is arguing that there aren't ABI specs or LLVM design
guidelines that say that unaligned accesses "should not", "could not"
or "aren't guaranteed to" work, because it's besides the point.

No, it is the core of the issue. Standard C gives the compiler certain
garantees and one of them is correct alignment of pointers to whatever
the platform wants. For some architectures, this is normally not enforced
(x86, ppc), on some violating results in traps (SPARC), on some it
results in unexpected behavior. Early ARM generations for example fall
into the last category.

This introduces a stealth crash into Apple-only code that was bug-free
under gcc. And realizing what's going on requires familiarity with ARM
assembly. With a bit of googling you'll find other support board posts
asking "Why is my code suddenly crashing after upgrading Xcode?"

Your code is buggy. Stop justifying it by saying that GCC doesn't
utilise optimisation potential. It has been proven over and over again,
that half of the cases where GCC misoptimises code turned out to be
completely broken assumptions by the code in question. Never versions of
GCC tended to expose new bugs. It's not a problem in the compiler. I
wouldn't be surprised if you get the same behavior with a recent GCC,
too.

You can measure its damage pretty significantly, though. We've already
spent a lot of hours tracking down unaligned accesses and wrapping
them in assembly macros. Which of course ends up disabling other,
actually useful optimizations. And with a large codebase, we can't be
sure we've found every last one yet.

There is a warning about casts that change the alignment for a reason.
Sorry, but I have absolute no sympathy for case.

Joerg

BTW as the question was asked on IRC: it is possible to force LLVM to
forget about the natural alignment of pointers. Examples are the packed
attribute on structures or using the align attribute on pointers.
consider the attached example for the latter. Notice the significant
difference in code size...

Joerg

unalignedptr.c (106 Bytes)

unalignedptr.s (684 Bytes)

Hi Esperanza,

No one is arguing that there aren't ABI specs or LLVM design
guidelines that say that unaligned accesses "should not", "could not"
or "aren't guaranteed to" work, because it's besides the point.

The C standard, 6.3.2.3 clause 7 states:

"A pointer to an object or incomplete type may be converted to a pointer to
a different object or incomplete type. If the resulting pointer is not
correctly aligned for the pointed-to type, the behaviour is undefined."

Your program is exhibiting undefined behaviour, and just because others have
also written code that exhibits the same undefined behaviour does not make
yours right.

It would be one thing if the optimizer generated ldrd/strd for 64-bit
loads and stores only. But it actually goes as far as taking two
separate 32-bit accesses and merging them into one
silently-incompatible 64-bit access.

That's two accesses *to the same array*, right? Which has incorrect
alignment. If you really want to force this behaviour, you can possibly mark
the array as an array of volatile ints. But it's still undefined and has no
guarantee.

The point is that unaligned 32-bit loads and stores *work in practice*
on every single ARM device Apple has ever manufactured.

They're slow and can cover multiple cache lines. They require two memory
accesses in the worst cast for a word access. If you don't care about having
optimised code, why not turn optimisations off?

TL;DR: Don't blindly cast between char*/void* and int*. Just because it
works on x86 and recent ARM hardware supports unaligned access does not make
it standards compliant.

Cheers,

James

Behalf Of Esperanza de Escobar

Instead of resorting to assembler, have you tried __attribute__((align(n))) ?

-Joe