Cortex-M4F Assembly Code Generation Incorrect for float casting, causing HardFaults

Basically the following code can repro the issue:

Compile flags are:

Wall -c -fshort-enums -ggdb3 -std=c++14 -fno-rtti -fno-exceptions -mfpu=fpv4-sp-d16 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -mfloat-abi=hard -ffunction-sections -fdata-sections -O1 -target arm-none-eabi -mcpu=cortex-m4 -mthumb

code to reproduce is:

template <typename T> inline T Read (uint8_t* data)
auto valuePtr = reinterpret_cast<T*> (data);

auto value = *valuePtr;

return value;

static float test = 180.0f;

int main (void)
uint8_t* data = new uint8_t[8];

memcpy (data, &test, sizeof (float));

auto value = Read<float> (data);

memcpy (data + 1, &test, sizeof (float));

value = Read<float> (data + 1); // hard fault here!

return value;

My diagnosis is that when it dereferences the float pointer it uses the VLDR instruction,

This instruction can only be used on word aligned boundaries.

So It works for first call to my Read method, but the second call which is not aligned, it crashes.

Not sure who is the right person to fix this.

I will reply with a bug, once I manage to login to the bug database.



If have filed a bug here:

Does the C and C++ standards not state that loading a value from an unaligned addres (e.g. float from an address not aligned to 4) is undefined behaviour?

I’m not saying SO is the definitive reference for this, but it would seem like it says so, with a reference to the spec:

Whilst it may well be that this “works” on some archiectures (particularly x86), I do believe it’s perfectly valid to produce code that assumes aligned data.

Or am I missing something on the subject?

Well I can workaround by marking the method with:

attribute ((optnone))

I would expect code that compiles with no optimizations to work with optimizations on clang. (if it were bug free :wink:)

Also this bug also shows up if you use uint32 so not limited to floats.

So its not currently possible to read a uint32 value from a buffer at an unaligned position. This should work shouldn’t it?

The issue is with using VLDR instruction.




This seems to be as expected. This pair of lines:

    auto valuePtr = reinterpret_cast<T*> (data);
    auto value = *valuePtr;

Is asserting to the compiler that data is correctly aligned for T, which in your case is a float. Dereferencing this pointer is therefore allowed to assume alignment. If you have an underaligned pointer, then C++ provides the alignas keyword that allows you to specify an under-aligned type. If the VLDR instruction is still generated with an alignas(1) type, then this is a bug, otherwise this is the compiler generating code that does what you instructed it to do.


Formally, alignas cannot be used to relax alignment:

10.6.2 p5:
"The combined effect of all alignment-specifiers in a declaration shall not specify an alignment that is less strict than the alignment that would be required for the entity being declared if all alignment-specifiers appertaining to that entity were omitted.”

Clang and recent GCC both support relaxing alignment via the `__attribute__((aligned(N)))` extension; older GCCs are documented as not supporting relaxing alignment, though it may work in practice.

My usual advice for misaligned load/stores, however, is "just use memcpy”. Clang and GCC are both quite good at lowering the call and doing what you want:

– Steve

People have already pointed out that you're violating alignment
requirements which makes the code wrong. There's actually a second
issue called "strict aliasing". Essentially, if you have a buffer or
other object of one type it's invalid to cast a pointer to a different
type and access it even if the alignment requirements are met.

The portable way to write your Read function is:

    template <typename T> inline T Read (uint8_t* data)
        T value;
        memcpy(&value, data, sizeof(Val));
        return Val;

because "memcpy" is one of the very few ways of legitimately playing
those kinds of tricks. It'll also completely avoid the alignment
problems you were having and the compiler should optimize this to the
best permitted code sequence (for example it might use an unaligned
load followed by an aligned store if it thinks that's profitable).



Thanks Guys,

using memcpy resolves this.