SSE instructions and alignment of the return value of 'new'

Some of my programs started crashing after I upgraded from clang 3.9.1 to clang 4.0.1.

Debugging shows that the crash happens in the following assembly fragment, which allocates a class object (size: 24 bytes) with operator new and then initializes it:

   0x00002aaaafc145f3 <+35>: callq 0x2aaaafdf5f90 <operator new(unsigned long)>
   0x00002aaaafc145f8 <+40>: mov %rax,%r13
   0x00002aaaafc145fb <+43>: xorps %xmm0,%xmm0
=> 0x00002aaaafc145fe <+46>: movaps %xmm0,0x0(%r13)

The value in %r13 (the return value of operator new) is not 16-byte aligned, which movaps requires for its memory operand, and that causes the crash. The memory allocation is done by a custom memory allocator that returns 8-byte aligned blocks. The memory allocator has not changed between the two versions of the program (the one built with clang 3.9.1 and the one built with clang 4.0.1). The version of libstdc++ is also the same. The command line options to clang are unchanged (-msse2 is specified in both cases). However, clang 3.9.1 does not generate SSE instructions for the code above, while clang 4.0.1 does.
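A minimal sketch of the alignment check involved, in case it helps when reproducing this. The helper names (is_aligned, movaps_would_fault) are made up for illustration and are not part of clang, libstdc++, or the allocator in question; the only fact assumed is that movaps needs a 16-byte aligned memory operand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when p is a multiple of `alignment` (which must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// movaps requires its memory operand to be 16-byte aligned.
constexpr std::size_t kMovapsAlignment = 16;

// Hypothetical helper: would an aligned SSE store to p fault?
inline bool movaps_would_fault(const void* p) {
    return !is_aligned(p, kMovapsAlignment);
}
```

An allocator that only guarantees 8-byte alignment will hand out some blocks for which movaps_would_fault returns true, which matches the intermittent crash described above.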

The fix in our code is to make an API call that configures the custom allocator to always return appropriately aligned memory. But I would like to know whether there is a known change in LLVM or clang that makes it assume malloc returns memory aligned to more than 8 bytes based on the allocation size, or whether this has always been the case. I want to know if my program compiled with 3.9.1 also has a problem that was just not exposed in testing.
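For readers whose allocator has no such configuration call, one common workaround is to over-allocate and round the pointer up. The sketch below is only an illustration of that technique, not the allocator discussed in this thread: raw_alloc and raw_free are hypothetical stand-ins (implemented here with malloc/free), and aligned_alloc16/aligned_free16 are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-ins for a custom allocator with weaker alignment.
inline void* raw_alloc(std::size_t n) { return std::malloc(n); }
inline void raw_free(void* p) { std::free(p); }

constexpr std::size_t kAlign = 16;

// Returns a 16-byte aligned block of at least n bytes, or nullptr.
inline void* aligned_alloc16(std::size_t n) {
    // Room for the payload, worst-case padding, and the saved raw pointer.
    void* raw = raw_alloc(n + kAlign - 1 + sizeof(void*));
    if (!raw) return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned = (start + kAlign - 1) & ~(std::uintptr_t(kAlign) - 1);
    // Remember the raw pointer just below the aligned block for the free path.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Releases a block obtained from aligned_alloc16.
inline void aligned_free16(void* p) {
    if (p) raw_free(static_cast<void**>(p)[-1]);
}
```

This costs up to 15 bytes of padding plus one pointer per allocation, so aligning inside the allocator itself is usually the better long-term fix.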

Thanks in advance.


Does the crash happen if you compile with -fnew-alignment=8? That’s supposed to change what clang assumes the alignment of memory allocated with new will be.
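One way to see what the compiler assumes is the __STDCPP_DEFAULT_NEW_ALIGNMENT__ macro. The sketch below is hedged: it assumes the macro is defined by the compiler in use (hence the #ifdef fallback) and that, with clang, -fnew-alignment= is reflected in its value, which I believe is the case but have not verified for every release. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Alignment (in bytes) the compiler assumes for plain `new`.
// Falls back to alignof(std::max_align_t) if the macro is not defined.
constexpr std::size_t assumed_new_alignment() {
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
    return __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
    return alignof(std::max_align_t);
#endif
}

constexpr bool is_power_of_two(std::size_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical check: does an allocator guaranteeing `allocator_alignment`
// bytes meet what the compiler assumes for operator new?
inline bool allocator_satisfies_compiler(std::size_t allocator_alignment) {
    assert(is_power_of_two(allocator_alignment));
    return allocator_alignment >= assumed_new_alignment();
}
```

Printing assumed_new_alignment() from builds with and without -fnew-alignment=8 should show whether the option took effect for a given toolchain.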

I think, as you alluded to, movaps with a memory operand requires that operand to be 16-byte aligned. In your fragment it is storing the 16 bytes of xmm0 (4 single-precision floats) to the address in r13.

Glibc had a bug open for not supporting variable alignment in malloc/new as the standard mandates, but they decided not to fix it, according to this bug

The issue is ‘resolved wontfix’, but I’m not exactly sure what they decided to do. It seems they tried to put a workaround in gcc because glibc won’t budge. If the workaround is in gcc, clang would surely need it too.


We started optimizing global operator new more aggressively in . -Eli

-fnew-alignment=8 makes the crash go away. Can you point me to the documentation for this option? I couldn’t find it.

/ Riyaz

We started optimizing global operator new more aggressively in .

Thank you. It is good that it is being more aggressive. However, I plan to use -fnew-alignment=8 until we can configure our memory allocator to align based on the size (this is a more disruptive change which will take some time to stabilize).

The only published documentation is here, but it doesn’t say much

I only know about it because someone else asked almost exactly the same question as you last week.