libc++ std::cout alignment trouble (was: Re: [llvm] r240144 - [SLP] Vectorize for all-constant entries.)

Hi,

tl;dr: after some research, I found that this commit appears to change the behavior of llvm in such a way as to cause trouble for libc++ users on FreeBSD, due to alignment changes.

Recently, I upgraded llvm and clang to 3.7.0 in the FreeBSD base system. Shortly afterwards, I received reports from users that some of their previously compiled C++ applications (dynamically linked to libc++.so.1) were crashing with SIGBUS errors. The stack traces always looked like this:

    #0 0x0000000800c95cfd in std::__1::basic_ostream<char, std::__1::char_traits<char> >::basic_ostream (this=<optimized out>, __sb=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280
    #1 std::__1::ios_base::Init::Init (this=<optimized out>) at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53
    #2 0x0000000800c96029 in ?? () at /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:44 from /usr/obj
    /usr/src/lib/libc++/libc++.so.1
    #3 0x0000000800cef352 in ?? () from /usr/obj/usr/src/lib/libc++/libc++.so.1
    #4 0x0000000800628c00 in ?? ()
    #5 0x00007fffffffdf18 in ?? ()
    #6 0x00007fffffffdea0 in ?? ()
    #7 0x0000000800c925c6 in _init () from /usr/obj/usr/src/lib/libc++/libc++.so.1
    #8 0x00007fffffffdea0 in ?? ()

E.g. it crashed in ios_base::Init::Init(), at this line:

    ostream* cout_ptr = ::new(cout) ostream(::new(__cout) __stdoutbuf<char>(stdout, &mb_cout));

and the relevant disassembly is:

    0x0000000800c95ccf <+255>: callq 0x800c96210 <std::__1::__stdoutbuf<char>::__stdoutbuf(__sFILE*, __mbstate_t*)>
    0x0000000800c95cd4 <+260>: mov 0x27d565(%rip),%rax # 0x800f13240
    0x0000000800c95cdb <+267>: lea 0x40(%rax),%rcx
    0x0000000800c95cdf <+271>: movq %rcx,%xmm0
    0x0000000800c95ce4 <+276>: lea 0x18(%rax),%rcx
    0x0000000800c95ce8 <+280>: movq %rcx,%xmm1
    0x0000000800c95ced <+285>: punpcklqdq %xmm0,%xmm1
    0x0000000800c95cf1 <+289>: movdqa %xmm1,-0x40(%rbp)
    0x0000000800c95cf6 <+294>: mov 0x27d86b(%rip),%rcx # 0x800f13568
=> 0x0000000800c95cfd <+301>: movdqa %xmm1,(%rcx)
    0x0000000800c95d01 <+305>: mov (%rax),%r14
    0x0000000800c95d04 <+308>: lea (%rcx,%r14,1),%rbx
    0x0000000800c95d08 <+312>: mov %rbx,%rdi
    0x0000000800c95d0b <+315>: mov %r15,%rsi
    0x0000000800c95d0e <+318>: callq 0x800c92c4c <_ZNSt3__18ios_base4initEPv@plt>

In this case, %rcx was the address of std::cout, 0x609068 (or some other address that was aligned to 8 bytes, *not* 16 bytes), which causes movdqa to result in a SIGBUS.

These crashing executables were compiled by clang 3.6.1 (or earlier versions), and it turns out that all of them had a global symbol for std::cout, which was aligned to 8 bytes. For example, in case of the original report, one of the executables showed the following in readelf output:

    Relocation section '.rela.dyn' at offset 0x2070 contains 6 entries:
        Offset Info Type Symbol's Value Symbol's Name + Addend
    [...]
    0000000000609068 0000003c00000005 R_X86_64_COPY 0000000000609068 _ZNSt3__14coutE + 0

and further down:

    Symbol table '.dynsym' contains 87 entries:
       Num: Value Size Type Bind Vis Ndx Name
    [...]
        60: 0000000000609068 160 OBJECT GLOBAL DEFAULT 25 _ZNSt3__14coutE

(Note: the executable gets a copy of std::cout, because the linker finds a global data object with copy relocations.)

The std::cout symbol is explicitly declared with an alignment in iostream.cpp, as follows:

    _ALIGNAS_TYPE (ostream) _LIBCPP_FUNC_VIS char cout[sizeof(ostream)];

The alignment of ostream is 8 bytes, therefore the alignment of cout is also 8 bytes.

When libc++ was previously compiled by clang 3.6.1, the assembly generated from ios_base::Init::Init() looked different than above, and std::cout was explicitly aligned to 8 bytes, in the .bss section:

        #DEBUG_VALUE: Init:this <- RDI
        .loc 3 669 5 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:5
        movq $0, 136(%r15,%rbx)
        .loc 3 670 5 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:5
        movl $-1, 144(%r15,%rbx)
.Ltmp44:
        .loc 2 53 77 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
        movq __stdoutp@GOTPCREL(%rip), %r15
        movq (%r15), %rsi
        .loc 2 53 45 is_stmt 0 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:45
        leaq _ZNSt3__1L6__coutE(%rip), %r12
        leaq _ZNSt3__1L7mb_coutE(%rip), %rdx
        movq %r12, %rdi
.Ltmp45:
        callq _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
        .loc 20 280 1 is_stmt 1 # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
.Ltmp46:
        movq _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE@GOTPCREL(%rip), %rax
        leaq 24(%rax), %rcx
        movq %rcx, -64(%rbp) # 8-byte Spill
        movq _ZNSt3__14coutE@GOTPCREL(%rip), %rbx
        movq %rcx, (%rbx)
        leaq 64(%rax), %rcx
        movq %rcx, -48(%rbp) # 8-byte Spill
        movq %rcx, 8(%rbx)
        .loc 20 281 5 # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
.Ltmp47:
        movq (%rax), %r14
        leaq (%rbx,%r14), %rdi
        .loc 3 668 15 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:15
.Ltmp48:
.Ltmp6:
        movq %r12, %rsi
        callq _ZNSt3__18ios_base4initEPv@PLT
[... skip to .bss ...]
        .type _ZNSt3__14coutE,@object # @_ZNSt3__14coutE
        .globl _ZNSt3__14coutE
        .align 8
_ZNSt3__14coutE:
        .zero 160
        .size _ZNSt3__14coutE, 160

In contrast, when you compile the same file with clang 3.7.0, the assembly becomes similar to the crashing case:

        #DEBUG_VALUE: Init:this <- RDI
        .loc 3 669 12 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:669:12
        movq $0, 136(%rbx)
        .loc 3 670 13 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:670:13
        movl $-1, 144(%rbx)
.Ltmp44:
        .loc 2 53 77 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:77
        movq __stdoutp@GOTPCREL(%rip), %r12
        movq (%r12), %rsi
        .loc 2 53 59 is_stmt 0 # /usr/src/lib/libc++/../../contrib/libc++/src/iostream.cpp:53:59
        leaq _ZNSt3__1L6__coutE(%rip), %r15
        leaq _ZNSt3__1L7mb_coutE(%rip), %rdx
        movq %r15, %rdi
.Ltmp45:
        callq _ZNSt3__111__stdoutbufIcEC2EP7__sFILEP11__mbstate_t
        .loc 20 280 1 is_stmt 1 # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:280:1
.Ltmp46:
        movq _ZTVNSt3__113basic_ostreamIcNS_11char_traitsIcEEEE@GOTPCREL(%rip), %rax
        leaq 64(%rax), %rcx
        movd %rcx, %xmm0
        leaq 24(%rax), %rcx
        movd %rcx, %xmm1
        punpcklqdq %xmm0, %xmm1 # xmm1 = xmm1[0],xmm0[0]
        movdqa %xmm1, -64(%rbp) # 16-byte Spill
        movq _ZNSt3__14coutE@GOTPCREL(%rip), %rcx
        movdqa %xmm1, (%rcx)
.Ltmp47:
        .loc 20 281 5 # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
        movq (%rax), %r14
.Ltmp48:
        .loc 20 281 5 is_stmt 0 # /usr/src/lib/libc++/../../contrib/libc++/include/ostream:281:5
        leaq (%rcx,%r14), %rbx
        .loc 3 668 5 is_stmt 1 # /usr/src/lib/libc++/../../contrib/libc++/include/ios:668:5
.Ltmp49:
.Ltmp6:
        movq %rbx, %rdi
        movq %r15, %rsi
        callq _ZNSt3__18ios_base4initEPv@PLT

and the definition of of std::cout is now with 16 bytes alignment instead:

        .type _ZNSt3__14coutE,@object # @_ZNSt3__14coutE
        .globl _ZNSt3__14coutE
        .align 16
_ZNSt3__14coutE:
        .zero 160
        .size _ZNSt3__14coutE, 160

Bisecting has shown that r240144 is the commit where this behavior changed, i.e. before this commit, std::cout is aligned at 8 bytes, and the instructions to access it take this into account. After this commit, it becomes aligned at 16 bytes, and it is then accessed using SSE (movdqa, etc). (Note that it is certainly possible that r240144 is just exposing a deeper issue, instead of being the root cause!)

So unfortunately, this new alignment makes a dynamic libc++ library, compiled by clang after this commit, incompatible with previously built applications. This is a big problem for FreeBSD, since we rather value backwards compatibility. :slight_smile:

I have had several discussions with people who indicate that 16 byte alignment is an x86_64 ABI requirement, at least for "large enough" objects. This may very well be true, but on the other hand, if the program authors explicitly specify that their objects must be aligned to X bytes, where X < 16, then the compiler should obey this, right? And changing existing behavior is incompatible with previously compiled programs.

As a temporary workaround, I have now reverted r240144 in FreeBSD, which restores the previous behavior, and aligns std::cout to 8 bytes again. But I would like to bring this to your attention, in the hope that we can find out what the real fix should be.

And to finish this way too long mail, I am attaching a minimized test case, which shows the behavior, and which should be appropriate to attach to a PR, if that is needed later on.

This test case should be compiled with -O2, to see the effects. Clang 3.6.x will result in the following IR for std::cout:

    @cout = global [24 x i8] zeroinitializer, align 8

while clang 3.7.0 will result in:

    @cout = global [24 x i8] zeroinitializer, align 16

-Dimitry

cout-align.cpp (402 Bytes)

Hi Dimitry,

Thanks for the report! I took a quick look at the test and found the following:

  1. Before SLP we have this:

@cout = global [24 x i8] zeroinitializer, align 8

define void @_Z7do_initv() #0 {

entry:
store i32 (…)** bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (…)), i32 (…)* bitcast ([24 x i8]* @cout to i32 (…)), align 8, !tbaa !2
store i32 (…)
bitcast (i8
* getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (…)), i32 (…)* bitcast (i8* getelementptr inbounds ([24 x i8], [24 x i8]* @cout, i64 0, i64 8) to i32 (…)***), align 8, !tbaa !2
ret void
}

  1. After SLP we get this:

@cout = global [24 x i8] zeroinitializer, align 8

define void @_Z7do_initv() #2 {
entry:
store <2 x i32 (…)> <i32 (…) bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (…)), i32 (…) bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (…))>, <2 x i32 (…)>* bitcast ([24 x i8]* @cout to <2 x i32 (…)**>*), align 8, !tbaa !2
ret void
}

  1. And then, after instcobmine, we get this:

@cout = global [24 x i8] zeroinitializer, align 16


; Function Attrs: nounwind ssp uwtable
define void @_Z7do_initv() #2 {
entry:
store <2 x i32 (…)> <i32 (…) bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 3) to i32 (…)), i32 (…) bitcast (i8** getelementptr inbounds ([10 x i8*], [10 x i8*]* @_ZTV1BIciE, i64 0, i64 8) to i32 (…))>, <2 x i32 (…)>* bitcast ([24 x i8]* @cout to <2 x i32 (…)**>*), align 16, !tbaa !2
ret void
}

I’ll take a look at instombine to understand why it replaces “align 8” with "align 16” in this case. Maybe it’s just a bug, or maybe we shouldn’t vectorize this case for some reason.

Thanks,
Michael

Hi,

This is a smaller reproducer of the issue:

$ cat test.ll

target datalayout = “e-m:o-i64:64-f80:128-n8:16:32:64-S128”
target triple = “x86_64-apple-macosx10.10.0”

@cout = global [24 x i8] zeroinitializer, align 8

define void @foo() {
entry:
store i64 100, i64* bitcast ([24 x i8]* @cout to i64*), align 8
store i64 200, i64* bitcast (i8* getelementptr inbounds ([24 x i8], [24 x i8]* @cout, i64 0, i64 8) to i64*), align 8
ret void
}

$ opt -slp-vectorizer < test.ll -S
; ModuleID = ‘’
target datalayout = “e-m:o-i64:64-f80:128-n8:16:32:64-S128”
target triple = “x86_64-apple-macosx10.10.0”

@cout = global [24 x i8] zeroinitializer, align 8

define void @foo() {
entry:
store <2 x i64> <i64 100, i64 200>, <2 x i64>* bitcast ([24 x i8]* @cout to <2 x i64>*), align 8
ret void
}

$ opt -slp-vectorizer < test.ll | opt -instcombine -S
; ModuleID = ‘’
target datalayout = “e-m:o-i64:64-f80:128-n8:16:32:64-S128”
target triple = “x86_64-apple-macosx10.10.0”

@cout = global [24 x i8] zeroinitializer, align 16

define void @foo() {
entry:
store <2 x i64> <i64 100, i64 200>, <2 x i64>* bitcast ([24 x i8]* @cout to <2 x i64>*), align 16
ret void
}

It looks like instcombine intentionally rewrites the alignment. It happens in getOrEnforceKnownAlignment called by InstCombiner::visitStoreInst. Then compiler performs several checks and decides that it can bump up the alignment, which is presumable illegal in this case.

Reid,
it looks like you relatively recently touched code in enforceKnownAlignment. Could you please comment on whether it’s correct behavior or not?

Thanks,
Michael

This looks entirely correct to me as long as PrefAlign <= 16 for AMD64.

Joerg

I'm not so sure, especially when we get to other targets that don't
have the special AMD64 rule about objects > 16 bytes in size getting
that alignment.

It seems that the way ELF's R_..._COPY relocations work make an
externally visible object's alignment part of a .so's ABI whether we
like it or not. I just don't see a way around that which allows us to
increase it at the moment.

Tim.

>> it looks like you relatively recently touched code in
>> enforceKnownAlignment. Could you please comment on whether it’s correct
>> behavior or not?
>
> This looks entirely correct to me as long as PrefAlign <= 16 for AMD64.

I'm not so sure, especially when we get to other targets that don't
have the special AMD64 rule about objects > 16 bytes in size getting
that alignment.

So there are two slightly different things here. I think we should be
able to raise the alignment of global objects, independently of what the
platform ABI says. Now on the specific case of AMD64, there is a
separate ABI rule which makes this a correct transform.

It seems that the way ELF's R_..._COPY relocations work make an
externally visible object's alignment part of a .so's ABI whether we
like it or not. I just don't see a way around that which allows us to
increase it at the moment.

The strange part is that noone seems to be observing this outside
FreeBSD, which has a pretty ancient ld. What I don't know is where ld
*is* picking up the 64bit alignment from.

Joerg

So there are two slightly different things here. I think we should be
able to raise the alignment of global objects, independently of what the
platform ABI says.

I'd like that to be the case, but it clearly is visible at shared
object boundaries so I don't think it's something we can change at
will.

Now on the specific case of AMD64, there is a
separate ABI rule which makes this a correct transform.

I don't think it does quite that. If you're applying that rule it
makes it a *necessary* transformation (in this view LLVM enforces ABI
alignments whether you want them or not for your language, LLVM 3.6
had a bug by not doing so, and trunk still has a bug for not always
doing so).

The strange part is that noone seems to be observing this outside
FreeBSD, which has a pretty ancient ld. What I don't know is where ld
*is* picking up the 64bit alignment from.

ELF files don't actually contain an alignment field for symbols.
Gold's algorithm appears to start with the section's alignment, keep
reducing that until the symbol's address is properly aligned and then
decide that was "obviously" the intended alignment. Not ideal (objects
can easily get over-aligned by accident), but about the only sensible
way to do it given the data available.

So the 64-bit alignment ultimately came from LLVM only aligning the
section containing std::cout to 64-bits.

Tim.

Any chance this is because libc++ gives these objects different types in
the declaration versus the definition?

Any chance this is because libc++ gives these objects different types in the
declaration versus the definition?

I think it’s unlikely. The decision to increase alignment doesn’t seem to be based on anything you’d expect to be affected by that (underlying global type, possibly TBAA info). It’s linkages and sizes that actually get used.

Tim.

Yes, this is the important part. There will be existing executables out
there, compiled by previous versions of clang, or even gcc, that have
this 'bad' alignment. Increasing the alignment now, with no way to undo
it, will break backwards compatibility.

Another interesting data point is that in the following one-liner:

__attribute__((__aligned__(8))) char cout[160];

cout will *not* be aligned to 16 bytes, whereas the sample I posted
earlier, and the code in iostream.cpp, *will* be aligned to 16 bytes,
even though the actual declarations look quite the same.

So there is something going on in llvm that bumps up the alignment, for
some reason, related to r240144. I would like to either revert that
behavior, or try to find a way to work around it (except from reverting
r240144, that is).

Another reason to not bump the alignment is that all versions of gcc
that I tried (up to the latest 6.0.0 trunk) emit .align 8 for std::cout.
Of course we don't want to be bug-for-bug compatible with *everything*
gcc does, but this one seems a no-brainer to me. :slight_smile:

-Dimitry

P.S.: In hindsight, specifying an explicit alignment for std::cout etc
was maybe not so handy. But it's too late to change now, without
breaking the libc++ ABI...

cout will *not* be aligned to 16 bytes, whereas the sample I posted
earlier, and the code in iostream.cpp, *will* be aligned to 16 bytes,
even though the actual declarations look quite the same.

It's particular operations on the variable that trigger the bump,
rather than just the declaration itself.

So there is something going on in llvm that bumps up the alignment, for
some reason, related to r240144. I would like to either revert that
behavior, or try to find a way to work around it (except from reverting
r240144, that is).

Reverting that revision would just be hiding the underlying issue;
completely the wrong thing to do on trunk.

P.S.: In hindsight, specifying an explicit alignment for std::cout etc
was maybe not so handy. But it's too late to change now, without
breaking the libc++ ABI...

It's impossible to do otherwise (and maintain our cunning lazy
initialization). Other places are using std::cout as an ostream, which
has an ABI-required alignment of (at least) 8.

Cheers.

Tim.

So there is something going on in llvm that bumps up the alignment, for
some reason, related to r240144. I would like to either revert that
behavior, or try to find a way to work around it (except from reverting
r240144, that is).

Reverting that revision would just be hiding the underlying issue;
completely the wrong thing to do on trunk.

Right, we need to find the root cause. For now Dimitry has reverted
r240144 in the FreeBSD tree to work around the issue, but we need to
replace it with a real fix.

P.S.: In hindsight, specifying an explicit alignment for std::cout etc
was maybe not so handy. But it's too late to change now, without
breaking the libc++ ABI...

It's impossible to do otherwise (and maintain our cunning lazy
initialization). Other places are using std::cout as an ostream, which
has an ABI-required alignment of (at least) 8.

I'm a bit confused about where the bug actually is though. As I
understand it, the alignment attribute specifies the minimum
alignment, and the cunning char array trick actually imposes a 16-byte
ABI alignment. If so then I'd say the bug is libc++'s.

It's impossible to do otherwise (and maintain our cunning lazy
initialization). Other places are using std::cout as an ostream, which
has an ABI-required alignment of (at least) 8.

First, Joerg pointed out on IRC that my statement above was in error
and the ABI-alignment of "char cout[LOTS]" is actually 16, which means
that on AMD64 only we could have skipped the aligned attribute. I
don't think that affects what we do here though.

I'm a bit confused about where the bug actually is though. As I
understand it, the alignment attribute specifies the minimum
alignment, and the cunning char array trick actually imposes a 16-byte
ABI alignment.

So the AMD64 ABI alignment should override the attribute? I think
Dmitri has a good point that that's not how GCC interprets it (even
the documentation seems to allow under-aligning non-struct variables).
But even if so the bug would be in Clang 3.6[*] for emitting that
declaration into libc++.so with only 8 bytes alignment.

We'd also still have the issue that we probably shouldn't overalign
externally visible globals on ELF, even if the AMD64 ABI came to the
rescue in this one case.

My summary position is:
1. We need to stop over-aligning externally visible globals on ELF.
Ideally only if they're going into a shared library, but I don't think
that info is available.
2. "align 8" is correct for a "__attribute__((aligned(8))) char
arr[LOTS]" declaration.

Cheers.

Tim.

[*] Or LLVM. Or the linker. Depending on who you ask.

So the AMD64 ABI alignment should override the attribute? I think
Dmitri has a good point that that's not how GCC interprets it (even
the documentation seems to allow under-aligning non-struct variables).
But even if so the bug would be in Clang 3.6[*] for emitting that
declaration into libc++.so with only 8 bytes alignment.

GCC says "This attribute specifies a minimum alignment (in bytes) for
variables of the specified type." which implies to me the 16-byte ABI
alignment should apply. However, that's not true for all GCC versions
we've examined and Clang 3.6 and earlier, and it seems to be true on
Clang 3.7 only by accident.

Perhaps this issue is _also_ a Clang bug. But if we assume the
alignment attribute is a minimum, and we want the natural 8-byte
alignment for ostream, libc++ is also in error.

My summary position is:
1. We need to stop over-aligning externally visible globals on ELF.
Ideally only if they're going into a shared library, but I don't think
that info is available.

I'm not sure how we avoid overalignment though?

2. "align 8" is correct for a "__attribute__((aligned(8))) char
arr[LOTS]" declaration.

I don't believe this is true according to GCC's documentation. I agree
that if we expect aligned(8) to specify exact instead of at least
8-byte alignment then this is a Clang bug.

GCC says "This attribute specifies a minimum alignment (in bytes) for
variables of the specified type." which implies to me the 16-byte ABI
alignment should apply.

I think there might be two places discussing the attribute. One
applying to types, the other to variables (and typedefs, strangely).

From Common Variable Attributes (Using the GNU Compiler Collection (GCC)),

I'm basing my thoughts on:

"This attribute specifies a minimum alignment for the variable or
structure field, measured in bytes. [...] The default alignment is
fixed for a particular target ABI. [...] When used on a struct, or
struct member, the aligned attribute can only increase the alignment."

To me those 3 imply that attribute((aligned(N))) takes precedence over
the ABI and can reduce the alignment below ABI as long as it's not
applied to a struct. I take the "minimum" to mean that the compiler
isn't going to deliberately misalign a variable for you to guarantee
that "addr % (2*N) != 0".

Perhaps this issue is _also_ a Clang bug. But if we assume the
alignment attribute is a minimum, and we want the natural 8-byte
alignment for ostream, libc++ is also in error.

I followed your reasoning on the aligned attribute (though still don't
agree), but I'm afraid I don't see this one. Are you saying that
libc++ should have used some other means to make sure that cout was
aligned to precisely 8 bytes, rather than 16? Should have known all
such attempts were futile and not tried, or what?

And how does this reasoning apply to other platforms where without the
attribute the ABI alignment would only have been 1?

Cheers.

Tim.

Ok, now I see what you mean. An alignment is inherently a minimum,
since a symbol may always end up with a wider alignment than required.
So the attribute overrides the "minimum alignment," not sets a minimum
value for the alignment (that may end up higher due to ABI).

This thread unfortunately died out without any resolution, so we are now
nearing 3.8 branching, and the problem is still not fixed. E.g., the
only custom patch we have now left in FreeBSD's version of llvm (which
we would really like to get rid of!) is the reversal of r240144: it is a
pretty gross workaround, but it works for now.

But is there any way that we can get this resolved before 3.8.0 ships,
please? :slight_smile:

-Dimitry

I believe the summary of the thread is that LLVM has a fundamentally unsound transformation: automatically increasing the required alignment of an externally visible global variable.

In enforceKnownAlignment in lib/Transforms/Utils/Local.cpp, it checks isStrongDefinitionForLinker() purportedly to ensure that the “memory we set aside for the global” is “the memory used by the final program.”. However, even if said predicate returns true, that condition is NOT guaranteed, at least on ELF platforms, because of copy relocations.

(There’s also very similar-looking alignment-fixing code in CodeGenPrepare::optimizeCallInst, too, I’m not sure why it’s duplicated there.)

Basically, we need to not modify the alignment if:

if (Subtarget->isTargetELF() &&
TM.getRelocationModel() == Reloc::PIC_ &&
GV->hasDefaultVisibility() && !GV->hasLocalLinkage())

The above code snippet appears in X86FastISel.cpp, and equivalent-but-not-identical code appears in several other places, to determine whether a PLT load for the variable is required.

I’m not sure if isStrongDefinitionForLinker itself ought to be taking that into account, or if there should be a different new predicate for that check. isReallyActuallyTrulyStrongDefinitionForLinker()? :slight_smile:

I believe the pending change at http://reviews.llvm.org/D16145 will solve this issue.