Possible bug in CLANG/LLVM

Hi,

Taking a sample program:
/* A very simple function to test memory stores. */
static int mem1 = 0x123;
int *test99() {
        return &mem1; // Return a 64-bit pointer to mem1, which lives in .data (not the heap).
}

clang-10 -c -o test99.o -O1 test99.c

objdump -drt test99.o
test99.o: file format elf64-x86-64
SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 test99.c
0000000000000000 l O .data 0000000000000004 mem1
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g F .text 0000000000000006 test99
Disassembly of section .text:
0000000000000000 <test99>:
   0: b8 00 00 00 00 mov $0x0,%eax <---32 bit HERE
                        1: R_X86_64_32 .data
   5: c3 retq

In this case, the return value in %rax is supposed to be a 64-bit pointer
on x86-64, while mov $0x0,%eax only writes 32 bits and zero-extends them
to fill %rax. Can there be a situation where, when the program is linked
and loaded as part of a larger program, or dynamically loaded as a .so
library, the address of .data does not fit in 32 bits?

What is it that forces the .data segment to be loaded at an address that
fits in 32 bits? I see that the Linux ELF loader will fail to load the
program if the address is above that, and that compiling with -fPIC gets
round the problem to an extent.

Kind Regards

James

James Courtier-Dutton via llvm-dev <llvm-dev@lists.llvm.org> writes:

> What is it that forces the .data segment to be loaded at an address that
> fits in 32 bits?

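Nothing. By default clang compiles for the x86-64 small code model, which
on SysV systems means the program's code and statically allocated data are
assumed to be linked in the low 2 GB of the address space, so their
addresses fit in a 32-bit immediate (the psABI gives the range as 0 to
0x7effffff; see the SysV x86-64 ABI document for details). Heap and stack
pointers are not affected; the assumption only covers symbol addresses the
compiler encodes directly into instructions.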
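To make the small-model assumption concrete, here is a minimal sketch that reuses mem1 and test99 from the original program and adds a hypothetical helper, in_small_model_range, which checks an address against the psABI small-model range quoted above. The helper name is illustrative only; neither the compiler nor the loader provides such a check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same object and function as in the original test99.c. */
static int mem1 = 0x123;

int *test99(void) {
    return &mem1; /* address of a statically allocated object in .data */
}

/* Hypothetical helper: true if addr lies in the range the SysV x86-64
   psABI small code model assumes for symbols (0 .. 0x7effffff).
   Nothing checks this at run time; the linker checks it when it
   resolves the 32-bit relocations emitted under the small model. */
bool in_small_model_range(uintptr_t addr) {
    return addr <= UINT32_C(0x7effffff);
}
```

If test99.c is built without -fPIC and linked with -no-pie, in_small_model_range((uintptr_t)test99()) would typically be true; built with -fPIE and linked as a PIE, the address is usually far above 4 GiB.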

> I see that the Linux ELF loader will fail to load the program if the
> address is above that, and that compiling with -fPIC gets round the
> problem to an extent.

If you link the object and these addresses exceed the small-model
assumptions, the linker reports truncated relocations (GNU ld prints
"relocation truncated to fit"). Even if an executable is produced, it
will probably not run correctly.
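The overflow the linker complains about can be sketched as follows. This is an illustration of the range rules for two psABI relocation types, not code from any linker; the function names are hypothetical. R_X86_64_32 (used for the mov $0x0,%eax above) is zero-extended by the CPU, while R_X86_64_32S is sign-extended.

```c
#include <stdbool.h>
#include <stdint.h>

/* R_X86_64_32: the 32-bit field is zero-extended to 64 bits
   (mov $imm32,%eax clears the upper half of %rax), so the resolved
   value must fit in an unsigned 32-bit integer. */
bool reloc_32_fits(uint64_t value) {
    return value <= UINT32_MAX;
}

/* R_X86_64_32S: the field is sign-extended, so the resolved value,
   viewed as a signed 64-bit number, must fit in int32_t. */
bool reloc_32s_fits(uint64_t value) {
    int64_t v = (int64_t)value;
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* What %rax would hold if an R_X86_64_32 value were truncated:
   only the low 32 bits survive, zero-extended. */
uint64_t after_zext32(uint64_t value) {
    return (uint64_t)(uint32_t)value;
}
```

An address such as 0x7f1234567890 fails both checks, and after truncation test99 would return 0x34567890, a pointer to unrelated memory.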

You can force the large memory model with -mcmodel=large:

clang -mcmodel=large -c -O1 test.c
objdump -d test.o

0000000000000000 <foo>:
   0: 48 b8 00 00 00 00 00 movabs $0x0,%rax
   7: 00 00 00

The medium memory model is a compromise between the small and large
models: code and "small" data objects are assumed to have addresses that
fit in 32 bits, while "large" data objects (placed in separate sections
such as .ldata and .lbss) can have addresses above that.

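The reason -fPIC works is that position-independent code never encodes
an absolute address in a 32-bit field. A file-local object such as mem1
is addressed relative to %rip, as in the lea below, so only its distance
from the code has to fit in a signed 32-bit displacement, not its
absolute address. Addresses of global symbols that may be preempted are
loaded from the GOT, whose entries are full 64-bit pointers.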

clang -fPIC -c -O1 test.c
objdump -d test.o

0000000000000000 <foo>:
   0: 48 8d 05 00 00 00 00 lea 0x0(%rip),%rax # 7 <foo+0x7>
   7: c3 retq
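A sketch of why the lea form works at any load address: the CPU adds a sign-extended 32-bit displacement to the address of the next instruction, and the linker fills that displacement from an R_X86_64_PC32 relocation. The function names below are hypothetical and only model the arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* For lea disp32(%rip),%rax the CPU computes
   target = address of the next instruction + sign-extended disp32. */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp32) {
    return next_insn_addr + (uint64_t)(int64_t)disp32;
}

/* The linker can encode the reference only if the distance from the
   next instruction to the target fits in a signed 32-bit value.
   The absolute addresses themselves may be anywhere. */
bool pc32_fits(uint64_t target, uint64_t next_insn_addr) {
    int64_t delta = (int64_t)(target - next_insn_addr);
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

For example, code at 0x7f0000001000 can reach data at 0x7f0000201000 even though both addresses are far above 4 GiB, because the two are only about 2 MB apart.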

There is also -fpic, which on some targets assumes the GOT itself does
not exceed a machine-specific maximum size. On x86-64 there is no such
limit, so the two options are equivalent on that platform.

                 -David