SSE instructions and alignment of the return value of 'new'

Riyaz_Puthiyapurayil · October 2, 2017, 10:11pm

Some of my programs started crashing after I upgraded from clang 3.9.1 to clang 4.0.1.

Debugging this I found the reason for the crash. This is happening in the following assembly fragment for a piece of code allocating a class object (size: 24 bytes) using operator new and then initializing it:

0x00002aaaafc145f3 <+35>: callq 0x2aaaafdf5f90 <operator new(unsigned long)>

0x00002aaaafc145f8 <+40>: mov %rax,%r13

0x00002aaaafc145fb <+43>: xorps %xmm0,%xmm0

=> 0x00002aaaafc145fe <+46>: movaps %xmm0,0x0(%r13)

The value in %r13 (the return value of operator new) is not 16-byte aligned, which movaps requires, and that causes the crash. The memory allocation is done by a custom memory allocator that returns 8-byte aligned blocks. The memory allocator has not changed between the two versions of the program (the one built with clang 3.9.1 versus the one built with clang 4.0.1). The version of libstdc++ is also the same. The command line options to clang are unchanged (-msse2 is specified in both cases). But I found that clang 3.9.1 does not generate SSE instructions in the above case, while clang 4.0.1 does.
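To make the failure mode concrete, here is a minimal sketch of an allocator that only guarantees 8-byte alignment. BumpAllocator8 and is_aligned are hypothetical names made up for illustration; this is not the custom allocator from the post, only a stand-in with the same alignment guarantee.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true when p is a multiple of `alignment` (which must be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical stand-in for an allocator that hands out 8-byte-aligned blocks
// from a fixed pool. It never promises 16-byte alignment.
class BumpAllocator8 {
public:
    void* allocate(std::size_t size) {
        // Round the request up to a multiple of 8 so every block stays 8-byte aligned.
        const std::size_t rounded = (size + 7) & ~std::size_t{7};
        if (offset_ + rounded > sizeof(pool_)) return nullptr;
        void* p = pool_ + offset_;
        offset_ += rounded;
        return p;
    }

private:
    alignas(16) unsigned char pool_[256];
    std::size_t offset_ = 8;  // start 8 bytes past a 16-byte boundary
};
```

With this layout, the first 24-byte block is 8-byte aligned but not 16-byte aligned, while the second one happens to land on a 16-byte boundary. That is the kind of behaviour that can hide the problem in testing: an aligned 16-byte store into such a block faults only for some allocations.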

The fix in our code is to make an API call to configure the custom allocator to always return appropriately aligned memory. But I would like to know if there is a known change in LLVM or clang to assume that malloc will return > 8 byte aligned memory based on the allocation size or if this has always been the case. I want to know if my program compiled with 3.9.1 also has a problem that was just not exposed in testing.

Thanks in advance.

/Riyaz

topperc · October 2, 2017, 10:44pm

Does the crash happen if you compile with -fnew-alignment=8? That’s supposed to change what clang assumes the alignment of memory allocated with new will be.
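One way to see what alignment the compiler assumes for plain operator new is the C++17 macro __STDCPP_DEFAULT_NEW_ALIGNMENT__. The sketch below is hedged: the macro is only defined when aligned allocation support is enabled (for example with -std=c++17), and as far as I know -fnew-alignment changes its value in clang. The helper name assumed_new_alignment and the max_align_t fallback are my own assumptions, not anything clang provides.

```cpp
#include <cstddef>

// Reports the alignment the compiler assumes for memory returned by
// ::operator new(size_t), when the C++17 macro is available.
constexpr std::size_t assumed_new_alignment() {
#ifdef __STDCPP_DEFAULT_NEW_ALIGNMENT__
    return __STDCPP_DEFAULT_NEW_ALIGNMENT__;
#else
    // Assumption for illustration only: fall back to alignof(std::max_align_t)
    // when the macro is missing (older language modes).
    return alignof(std::max_align_t);
#endif
}
```

Comparing the returned value for builds with and without -fnew-alignment=8 is a quick check that the option is actually reaching the compiler.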

choikwa · October 2, 2017, 11:16pm

I think as you alluded to, movaps xmm, m128 requires m128 to be 16 byte aligned to load 4 single precision fp into xmm.
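The difference between the aligned and unaligned forms can be sketched with the SSE intrinsics from xmmintrin.h. The helper zero16 is a hypothetical name; compilers typically lower _mm_store_ps to movaps and _mm_storeu_ps to movups, but the exact instruction selection is up to the compiler. On targets without SSE the sketch falls back to memset.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Zeroes 16 bytes at dst, picking the store based on the actual alignment of dst.
inline void zero16(void* dst) {
#if defined(__SSE__)
    const __m128 zero = _mm_setzero_ps();
    if ((reinterpret_cast<std::uintptr_t>(dst) & 15) == 0) {
        // Aligned store: requires a 16-byte-aligned address, faults otherwise.
        _mm_store_ps(static_cast<float*>(dst), zero);
    } else {
        // Unaligned store: valid for any address.
        _mm_storeu_ps(static_cast<float*>(dst), zero);
    }
#else
    std::memset(dst, 0, 16);
#endif
}
```

The compiler does not emit a runtime check like this. It picks the aligned form whenever it believes the pointer is 16-byte aligned, which is why an allocator that breaks that belief ends in a crash.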

Glibc had a bug open for malloc/new not providing the alignment the standard mandates, but they decided not to fix it, according to this bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=15795

The issue is ‘resolved wontfix’, but I’m not exactly sure what they decided to do; it seems they tried to put a workaround in gcc since glibc won’t budge. If the workaround is in gcc, clang would surely need it too.

-Kevin

Eli_Friedman · October 3, 2017, 12:15am

We started optimizing global operator new more aggressively in . -Eli

Riyaz_Puthiyapurayil · October 3, 2017, 1:27am

-fnew-alignment=8 makes the crash go away. Can you point me to the documentation for this option? I couldn’t find it.

/ Riyaz

Riyaz_Puthiyapurayil · October 3, 2017, 1:32am

We started optimizing global operator new more aggressively in .

Thank you. It is good that it is being more aggressive. However, I plan to use -fnew-alignment=8 until we can configure our memory allocator to align based on the size (this is a more disruptive change which will take some time to stabilize).

topperc · October 3, 2017, 1:34am

The only published documentation is here, but it doesn’t say much:

https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fnew-alignment

I only know about it because someone else asked almost exactly the same question last week: http://lists.llvm.org/pipermail/cfe-dev/2017-September/055635.html
