PIC preferred too strongly, even at CodeModel::Large?

Hi,

We were just debugging a sporadic crash the other day, when we noticed
that RIP-relative addressing was being used in a JumpTable, even when
code and data were well over 4G apart. This is confusing, because we
picked CodeModel::Large, and expected this to be taken care of. Isn't
that what gcc would do given a Large CodeModel?

The default Relocation Model, Reloc::Default, folds into Reloc::PIC_
in most cases. However, if we explicitly specify Reloc::Static, our
program should work on all platforms except Darwin:

  // If we are on Darwin, disallow static relocation model in X86-64 mode, since
  // the Mach-O file format doesn't support it.
  if (RM == Reloc::Static && TT.isOSDarwin() && is64Bit)
    RM = Reloc::PIC_;

(from X86MCTargetDesc.cpp)
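
To make the concern about the silent fold concrete, here is a minimal standalone sketch of the same rule that also reports when it overrode an explicit request. It is not LLVM code: the enum, the struct, and `resolveRelocModel` are hypothetical names made up for illustration, and only the Static-on-64-bit-Darwin case quoted above is modelled.

```cpp
// Standalone model. The names only echo the LLVM ones used in this mail and
// are not the real LLVM declarations.
enum class RelocModel { Static, PIC };

struct ResolvedReloc {
  RelocModel Model;  // relocation model that will actually be used
  bool Overridden;   // true when an explicit request was replaced
};

// Same condition as the quoted snippet, but the override is visible to the
// caller instead of being applied silently.
constexpr ResolvedReloc resolveRelocModel(RelocModel Requested, bool IsDarwin,
                                          bool Is64Bit) {
  return (Requested == RelocModel::Static && IsDarwin && Is64Bit)
             ? ResolvedReloc{RelocModel::PIC, true}
             : ResolvedReloc{Requested, false};
}
```

With something along these lines, a JIT client could check `Overridden` and print a warning rather than finding out through a crash.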

First, is the Mach-O limitation still there? Second, is it okay to
silently fold into Reloc::PIC_ in this case and leave the user with
sporadic crashes? Finally, can we bypass this limitation by simply
appending "-elf" to our Target Triple, forcing ELF generation on all
three platforms?

Ram

Hi,

> We were just debugging a sporadic crash the other day, when we noticed
> that RIP-relative addressing was being used in a JumpTable, even when
> code and data were well over 4G apart. This is confusing, because we
> picked CodeModel::Large, and expected this to be taken care of. Isn't
> that what gcc would do given a Large CodeModel?

This sounds like a bug, but I can't reproduce it. Testcase?

> The default Relocation Model, Reloc::Default, folds into Reloc::PIC_
> in most cases. However, if we explicitly specify Reloc::Static, our
> program should work on all platforms except Darwin:
>
>   // If we are on Darwin, disallow static relocation model in X86-64 mode, since
>   // the Mach-O file format doesn't support it.
>   if (RM == Reloc::Static && TT.isOSDarwin() && is64Bit)
>     RM = Reloc::PIC_;
>
> (from X86MCTargetDesc.cpp)
>
> First, is the Mach-O limitation still there?

Yes, the Mach-O limitation still exists as far as I know.

> Second, is it okay to
> silently fold into Reloc::PIC_ in this case and leave the user with
> sporadic crashes?

Large code model and PIC should be compatible.

> Finally, can we bypass this limitation by simply
> appending "-elf" to our Target Triple, forcing ELF generation on all
> three platforms?

What exactly are you planning to do with an ELF object file on OS X?

-Eli

Eli Friedman wrote:

>> We were just debugging a sporadic crash the other day, when we noticed
>> that RIP-relative addressing was being used in a JumpTable, even when
>> code and data were well over 4G apart. This is confusing, because we
>> picked CodeModel::Large, and expected this to be taken care of. Isn't
>> that what gcc would do given a Large CodeModel?
>
> This sounds like a bug, but I can't reproduce it. Testcase?

I've attached an example with a standard switch instruction, compiled
with `llc -code-model=large`. It produces:

  movslq (%rax,%rdi,4), %rsi
  addq %rax, %rsi
  jmpq *%rsi

>> Second, is it okay to
>> silently fold into Reloc::PIC_ in this case and leave the user with
>> sporadic crashes?
>
> Large code model and PIC should be compatible.

Technically, yes. My understanding is that, instead of a cheap
implicit $rip offset, you have to materialize the value in a register
and do the `add`. I don't think LLVM is doing it correctly.
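
As a rough illustration of why this breaks for us, here is a small standalone sketch (not LLVM code; the function names are made up) of the arithmetic the three instructions above perform. It assumes that %rax holds the jump table address and that each entry was produced by truncating `target - table` to 32 bits.

```cpp
#include <cstdint>

// Encode an entry as the 32-bit difference between the target block and the
// table base. The truncation to 32 bits is where information gets lost.
constexpr std::int32_t encodeEntry(std::uint64_t tableBase,
                                   std::uint64_t target) {
  return static_cast<std::int32_t>(
      static_cast<std::uint32_t>(target - tableBase));
}

// Mirror of `movslq (%rax,%rdi,4), %rsi` followed by `addq %rax, %rsi`:
// sign-extend the entry to 64 bits and add the table base.
constexpr std::uint64_t decodeEntry(std::uint64_t tableBase,
                                    std::int32_t entry) {
  return tableBase +
         static_cast<std::uint64_t>(static_cast<std::int64_t>(entry));
}

// True when the table-relative encoding round-trips for these addresses.
constexpr bool entryReachesTarget(std::uint64_t tableBase,
                                  std::uint64_t target) {
  return decodeEntry(tableBase, encodeEntry(tableBase, target)) == target;
}
```

Under these assumptions the round trip only holds while the block is within the signed 32-bit range of the table, so a table placed several gigabytes away from the code decodes to the wrong address.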

>> Finally, can we bypass this limitation by simply
>> appending "-elf" to our Target Triple, forcing ELF generation on all
>> three platforms?
>
> What exactly are you planning to do with an ELF object file on OS X?

I forgot to mention: we're JIT'ting. In any case, isOSDarwin() isn't
influenced by an extra "-elf" to the target triple.

Ram

jumptable_bug.ll (633 Bytes)

jumptable_bug.s (1.03 KB)

Eli Friedman wrote:
>> We were just debugging a sporadic crash the other day, when we noticed
>> that RIP-relative addressing was being used in a JumpTable, even when
>> code and data were well over 4G apart. This is confusing, because we
>> picked CodeModel::Large, and expected this to be taken care of. Isn't
>> that what gcc would do given a Large CodeModel?
>
>
> This sounds like a bug, but I can't reproduce it. Testcase?

> I've attached an example with a standard switch instruction, compiled
> with `llc -code-model=large`. It produces:
>
> movslq (%rax,%rdi,4), %rsi
> addq %rax, %rsi
> jmpq *%rsi

Ah; I guess I actually should have spotted that.

See
<https://github.com/llvm-mirror/llvm/blob/f79c57a412cf8ba35884c1d4e011e07baad334d9/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L281>.
I think you can just force it to use EK_BlockAddress for the large code
model.
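
Roughly what I have in mind, as a standalone model rather than a patch: the enums below only mirror the names used in this thread and are not the real LLVM declarations, and `pickJumpTableEncoding` is a made-up name. I'm assuming the selection currently depends only on the relocation model.

```cpp
// Standalone model of the suggested rule; not the actual LLVM code.
enum class RelocModel { Static, PIC };
enum class CodeModel { Small, Large };
enum class JumpTableEncoding { BlockAddress, LabelDifference32 };

// PIC normally gets compact 32-bit label differences, but the large code
// model is forced to full-width block addresses.
constexpr JumpTableEncoding pickJumpTableEncoding(RelocModel RM,
                                                  CodeModel CM) {
  return (RM != RelocModel::PIC || CM == CodeModel::Large)
             ? JumpTableEncoding::BlockAddress
             : JumpTableEncoding::LabelDifference32;
}

// Entry size in bytes on a 64-bit target under this model.
constexpr unsigned entrySizeInBytes(JumpTableEncoding E) {
  return E == JumpTableEncoding::BlockAddress ? 8u : 4u;
}
```

The cost is that the table entries become absolute addresses, so they double in size and need to be fixed up when the code is loaded.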

>> Second, is it okay to
>> silently fold into Reloc::PIC_ in this case and leave the user with
>> sporadic crashes?
>
>
> Large code model and PIC should be compatible.

> Technically, yes. My understanding is that, instead of a cheap
> implicit $rip offset, you have to materialize the value in a register
> and do the `add`. I don't think LLVM is doing it correctly.

>> Finally, can we bypass this limitation by simply
>> appending "-elf" to our Target Triple, forcing ELF generation on all
>> three platforms?
>
>
> What exactly are you planning to do with an ELF object file on OS X?

> I forgot to mention: we're JIT'ting. In any case, isOSDarwin() isn't
> influenced by an extra "-elf" to the target triple.

Ah... I think it might have been possible to use ELF on OS X with MCJIT at
some point, but it's not really supported in any case.

-Eli

Hi All,

> Ah... I think it might have been possible to use ELF on OS X with MCJIT at
> some point, but it's not really supported in any case.

ELF on Darwin in the JIT may work, especially for simple cases, but definitely isn’t supported.

- Lang.

Just to avoid any confusion, your code is well within 4GB, but the
location of the jump table can be more than 4GB away? I'm asking as one
of the open issues for me in the PowerPC target is supporting
function-relative jump tables, so that the jump table needs only 32bit
entries as long as a single function is not longer than 4GB.
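
To spell out what I mean by function-relative, here is a minimal standalone sketch. The function names are hypothetical and the unsigned 32-bit offset from the function start is my assumption about the encoding, not an existing LLVM scheme.

```cpp
#include <cstdint>

// A block can be encoded only if it lies at or after the function start and
// within 4GB of it.
constexpr bool fitsFunctionRelative(std::uint64_t funcStart,
                                    std::uint64_t target) {
  return target >= funcStart && target - funcStart <= UINT32_MAX;
}

// Each entry is the unsigned 32-bit offset of the block from the start of
// its function. Callers are expected to check fitsFunctionRelative first.
constexpr std::uint32_t encodeFunctionRelative(std::uint64_t funcStart,
                                               std::uint64_t target) {
  return static_cast<std::uint32_t>(target - funcStart);
}

// Decoding needs only the function start. The address of the table itself
// never enters the computation, so the table may be arbitrarily far away.
constexpr std::uint64_t decodeFunctionRelative(std::uint64_t funcStart,
                                               std::uint32_t entry) {
  return funcStart + entry;
}
```

The function start still has to be available as a full 64-bit value at the dispatch site, which is the part that differs from a table-relative scheme.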

Joerg

Joerg Sonnenberger wrote:

> Just to avoid any confusion, your code is well within 4GB, but the
> location of the jump table can be more than 4GB away? I'm asking as one
> of the open issues for me in the PowerPC target is supporting
> function-relative jump tables, so that the jump table needs only 32bit
> entries as long as a single function is not longer than 4GB.

Yes, that's right.

Ram