After a very brief thought, I'd still go for GlobalMerge now, in
conjunction with an enhanced "alias" so that you could emit something
like:
@g1 = hidden alias [100 x i32]* bitcast(i32* getelementptr([300 x i32]* @Merged, i32 0, i32 0) to [100 x i32]*)
We certainly don't seem to handle this alias properly now though, and
it may violate the intended uses. Rafael's doing some thinking about
"alias" at the moment, so I've CCed him.
Would that be a horrific abuse of the poor alias system?
I think it would. Folding objects like this prevents the linker
from deleting one of them if it is unused, for example.
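To spell out that trade-off, here is a rough C model of what merging amounts to. It is only a sketch: the struct and the addr_* helpers are hypothetical names for illustration, and it assumes the three arrays are packed back to back in one object the way the [300 x i32] @Merged above suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the merged global @Merged: the three
   arrays become fields of a single object, so the linker sees one
   symbol and can no longer drop an unused member on its own. */
struct merged {
    int32_t g0[100];
    int32_t g1[100];
    int32_t g2[100];
};

static struct merged Merged;

/* After merging, each original global is the base address plus a
   compile-time constant offset (0, 400 and 800 bytes here). */
static int32_t *addr_g0(void) { return Merged.g0; }
static int32_t *addr_g1(void) { return Merged.g1; }
static int32_t *addr_g2(void) { return Merged.g2; }
```

The upside is that only one base address has to be materialized; the downside is exactly the one above, since the members live or die together.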
I think this is just a missing optimization in the ARM backend. If the
backend knows multiple objects are in the same DSO, it can use the
address of one to find the other.
Given:
@g0 = hidden global [100 x i32] zeroinitializer, align 4
@g1 = hidden global [100 x i32] zeroinitializer, align 4
define void @foo() {
tail call void @bar(i8* bitcast ([100 x i32]* @g0 to i8*))
tail call void @bar(i8* bitcast ([100 x i32]* @g1 to i8*))
ret void
}
declare void @bar(i8*)
The command "llc -mtriple=i686-pc-linux -relocation-model=pic" produces
calll .L0$pb
.L0$pb:
popl %ebx
.Ltmp3:
addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp3-.L0$pb), %ebx
leal g0@GOTOFF(%ebx), %eax
movl %eax, (%esp)
calll bar@PLT
leal g1@GOTOFF(%ebx), %eax
movl %eax, (%esp)
calll bar@PLT
Which is OK, since the add of ebx is folded and the constant is an
immediate in x86.
On ARM, that is not the case. We produce
ldr r0, .LCPI0_0
add r4, pc, r0 // r4 is the equivalent of ebx in the x86 case.
ldr r0, .LCPI0_1 // r0 is the constant that is an immediate in x86.
add r0, r0, r4 // that is the add that is folded in x86
...
.LCPI0_0:
.long _GLOBAL_OFFSET_TABLE_-(.LPC0_0+8)
.LCPI0_1:
.long g0(GOTOFF)
For ARM, codegen already keeps track of offsets so it can implement
the constant islands, so it should be able to see that the two globals
are close enough that the offset between them fits in an immediate.
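As a sanity check on "fits in an immediate", here is a small C sketch. It assumes ARM mode (not Thumb), where a data-processing immediate is an 8-bit value rotated right by an even amount, and it assumes the globals end up back to back, so g1 would sit 400 bytes after g0 and g2 800 bytes after. The function names are made up for illustration and are not LLVM APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is reduced modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned n) {
    n &= 31u;
    return n == 0 ? v : (v << n) | (v >> (32u - n));
}

/* Returns true if v can be written as an 8-bit value rotated right
   by an even amount, i.e. the ARM-mode "modified immediate" form
   used by instructions such as ADD. */
static bool fits_arm_modified_imm(uint32_t v) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by `rot` with a rotate-left and see
           whether what remains fits in 8 bits. */
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}
```

Under those assumptions both 400 (0x19 shifted left by 4) and 800 (0x32 shifted left by 4) are encodable, so a single add from the address of g0 would be enough to reach g1 or g2.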
Nick, will this work on MachO or can ld64 move _g0, _g1 and _g2 too far apart?
BTW, what will gcc produce for
void init(void *);
extern int g0[100] __attribute__((visibility("hidden")));
extern int g1[100] __attribute__((visibility("hidden")));
extern int g2[100] __attribute__((visibility("hidden")));
void foo() {
init(&g0);
init(&g1);
init(&g2);
}
Cheers,
Rafael