PIC and mcmodel=large on x86 doesn't use any relocations

We're at the point in our port of OpenVMS to x86 using LLVM where we need to
choose an mcmodel. Given OpenVMS's history, our linker will allocate static data
(i.e., .data, .bss, .plt, GOT, etc.) in the bottom 32 bits of the address space
(i.e., 00000000.xxxxxxxx). However, we support code anywhere in the 64-bit
address space as PIC code (we do this on Itanium today using our own
code-generator and linker). Given this requirement, I'm looking at the
support for -fPIC and -mcmodel=large. Either I'm missing something or there
is something broken (and has been for quite a while).
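
To make the constraint concrete, here is a minimal sketch of why the small
and medium models are not enough for us. The helper name reachable_rel32 and
the addresses in the note after it are made up for illustration; it only
checks whether a signed 32-bit RIP-relative displacement can span the
distance from an instruction to a data item.

```c
#include <stdint.h>

/* Hypothetical helper: can a signed 32-bit RIP-relative displacement
   reach `target` from the instruction whose next-instruction address
   is `next_ip`? Assumes the usual two's-complement conversion when the
   unsigned difference is reinterpreted as signed. */
static int reachable_rel32(uint64_t next_ip, uint64_t target)
{
    int64_t disp = (int64_t)(target - next_ip);
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With static data at 0x601000, code at 0x400000 can reach it (the helper
returns 1), but code at 0x0000123400000000 cannot (it returns 0), so such
code needs full 64-bit offsets or addresses.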

Using the code samples in the AMD64 ABI document, I wrote a little abi.c
program to look at the generated code. The code from gcc matches almost
exactly what is listed in the ABI document. However, LLVM seems very
different. I don't see -fPIC as having any impact with -mcmodel=large.



For example,

static int src; // Lsrc: .long
static int dst; // Ldst: .long
extern int *dptr; // .extern dptr
void DataLoadAndStore() {

  // Large Memory Model code sequences from AMD64 abi

  // Figure 3.22: Position-Independent Global Data Load and Store
  // Assume that %r15 has been loaded with GOT address by
  // function prologue.

  // movabs $Lsrc@GOTOFF,%rax ; R_X86_64_GOTOFF64
  // movabs $Ldst@GOTOFF,%rdx ; R_X86_64_GOTOFF64
  // movl (%rax,%r15),%ecx
  // movl %ecx,(%rdx,%r15)
  dst = src;

  // movabs $dptr@GOT,%rax ; R_X86_64_GOT64
  // movabs $Ldst@GOTOFF,%rdx ; R_X86_64_GOTOFF64
  // movq (%rax,%r15),%rax
  // leaq (%rdx,%r15),%rcx
  // movq %rcx,(%rax)
  dptr = &dst;

  // movabs $Lsrc@GOTOFF,%rax ; R_X86_64_GOTOFF64
  // movabs $dptr@GOT,%rdx ; R_X86_64_GOT64
  // movl (%rax,%r15),%ecx
  // movq (%rdx,%r15),%rdx
  // movl %ecx,(%rdx)
  *dptr = src;
}

generates (using 'clang -c -S -fPIC -mcmodel=large'):

DataLoadAndStore: # @DataLoadAndStore
# BB#0:
        pushq %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
        .cfi_def_cfa_register %rbp
        movabsq $src, %rax
        movl (%rax), %ecx
        movabsq $dst, %rdx
        movl %ecx, (%rdx)
        movabsq $dptr, %rsi
        movq %rdx, (%rsi)
        movl (%rax), %ecx
        movl %ecx, (%rdx)
        movl (%rax), %ecx
        movq (%rsi), %rax
        movl %ecx, (%rax)
        popq %rbp

Where are the GOT accesses?
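
For contrast, here is a rough C model of the addressing that the first two
ABI sequences perform. It is only a sketch: the flat mem array, the helper
names, and the offsets used with it are all assumptions for illustration,
not anything clang or gcc emits. The point is that every access goes
through the GOT base (%r15 in the ABI text) plus a 64-bit offset, rather
than through an absolute address as in the clang output.

```c
#include <stdint.h>
#include <string.h>

/* Toy flat memory standing in for a loaded image (assumption: 4 KiB is
   enough for this model). */
static unsigned char mem[4096];

static uint32_t load32(uint64_t a) { uint32_t v; memcpy(&v, mem + a, sizeof v); return v; }
static uint64_t load64(uint64_t a) { uint64_t v; memcpy(&v, mem + a, sizeof v); return v; }
static void store32(uint64_t a, uint32_t v) { memcpy(mem + a, &v, sizeof v); }
static void store64(uint64_t a, uint64_t v) { memcpy(mem + a, &v, sizeof v); }

/* dst = src with both symbols addressed as sym@GOTOFF:
   effective address = GOT base + signed 64-bit offset. */
static void copy_via_gotoff(uint64_t got, int64_t src_off, int64_t dst_off)
{
    store32(got + (uint64_t)dst_off, load32(got + (uint64_t)src_off));
}

/* dptr = &dst with dptr addressed as dptr@GOT:
   the GOT slot at got + slot_off holds the address of dptr itself. */
static void store_addr_via_got(uint64_t got, int64_t slot_off, int64_t dst_off)
{
    uint64_t dptr_addr = load64(got + (uint64_t)slot_off);
    store64(dptr_addr, got + (uint64_t)dst_off);
}
```

Nothing in this model depends on where the code itself lives, which is the
property I expected -fPIC to give me.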

Where is the computation of the GOT address? Since it is more than 2GB away
from the code, the ABI says to generate:

  // ABI document suggests:
  // pushq %r15
  // leaq 1f(%rip),%r11
  // 1:
  // movabs $_GLOBAL_OFFSET_TABLE_,%r15
  // leaq (%r11,%r15),%r15
  // gcc generates:
  // .L2:
  // leaq .L2(%rip), %rax
  // movabsq $_GLOBAL_OFFSET_TABLE_-.L2, %r11
  // addq %r11, %rax
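
The arithmetic behind the gcc sequence can be sketched in C as below. This
assumes, as ELF does, that the code and the GOT are loaded with a fixed
distance between them; the function names and the offsets in the note after
it are hypothetical.

```c
#include <stdint.h>

/* Link-time constant used by the gcc sequence:
   _GLOBAL_OFFSET_TABLE_ - .L2, computed from image-relative offsets. */
static int64_t got_delta(uint64_t got_off, uint64_t label_off)
{
    return (int64_t)(got_off - label_off);
}

/* Run time: leaq .L2(%rip) yields the loaded address of .L2, and adding
   the 64-bit constant yields the loaded address of the GOT. */
static uint64_t got_base(uint64_t label_addr, int64_t delta)
{
    return label_addr + (uint64_t)delta;
}
```

For example, with .L2 at image offset 0x1000 and the GOT at image offset
0x300000000, the delta is 0x2FFFFF000, which does not fit in 32 bits. The
same constant gives the right GOT address whether the image is loaded at
0x10000000 or at 0x7f0000000000, with no absolute address in the code.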

I don't think we support the large code model on amd64 for anything but
JIT use. I'm generally not sure how much point there really is. Do you
actually have individual shared objects larger than 2GB? It's not a
problem to have multiple DSOs that span much more than 2GB, but doing
that inside a single object is very expensive.