.globl

I need to be able to emit .globl for the soft float routines used by mips16.
The routines are called but there is no .globl directive emitted for them.

How can I do this?

Background:

I have a strange issue that I encountered with mips16 hard float.

Part of mips16 hard float is to emit calls to runtime routines with the same signature as usual soft float routines, except that they are implemented using mips32 code which uses real floating point instructions (mips16 processor mode has no hardware floating point instructions).

These routines have the same names as the corresponding softfloat routines, except with the additional prefix __mips16_ . So for example, __mips16_floatsidf.

For these intrinsics (and not others), gcc mips16 emits a .globl.

Without this .globl (which llvm does not emit), the program runs very slowly if compiled with -fPIC and linked as C++. It seems to be stuck in the loader (probably doing dynamic binding over and over again).

I'm trying to understand why this happens, but for now that's not important because it just works that way.

TIA.

Reed

Hi Reed,

Still catching up on email, so hope this isn't already covered...

reed kotler <rkotler@mips.com> writes:

> I have a strange issue that I encountered with mips16 hard float.
>
> Part of mips16 hard float is to emit calls to runtime routines with the
> same signature as usual soft float routines, except that they are
> implemented using mips32 code which uses real floating point
> instructions (mips16 processor mode has no hardware floating point
> instructions).
>
> These routines have the same names as the corresponding softfloat
> routines, except with the additional prefix __mips16_ . So for example,
> __mips16_floatsidf.
>
> For these intrinsics, (and not others), gcc mips16 emits a .globl.
>
> Without this .globl ( which llvm does not emit), then the program will
> run very slow if compiled in -fPIC and linked as C++. It seems to be
> stuck in the loader (probably doing dynamic binding over and over again).

This might or might not be related, but I notice that for the attached
testcase, LLVM emits:

  lui $2, %hi(_gp_disp)
  addiu $2, $2, %lo(_gp_disp)
  addiu $sp, $sp, -32
$tmp2:
  .cfi_def_cfa_offset 32
  sw $ra, 28($sp) # 4-byte Folded Spill
  sw $18, 24($sp) # 4-byte Folded Spill
  sw $17, 20($sp) # 4-byte Folded Spill
  sw $16, 16($sp) # 4-byte Folded Spill
$tmp3:
  .cfi_offset 31, -4
$tmp4:
  .cfi_offset 18, -8
$tmp5:
  .cfi_offset 17, -12
$tmp6:
  .cfi_offset 16, -16
  addu $16, $2, $25
  move $17, $4
  lw $18, %call16(foo)($16)
$BB0_1: # %loop
                                        # =>This Inner Loop Header: Depth=1
  move $25, $18
  jalr $25
  move $gp, $16
  addiu $17, $17, -1
  bnez $17, $BB0_1
  nop
# BB#2: # %exit
  lw $16, 16($sp) # 4-byte Folded Reload
  lw $17, 20($sp) # 4-byte Folded Reload
  lw $18, 24($sp) # 4-byte Folded Reload
  lw $ra, 28($sp) # 4-byte Folded Reload
  jr $ra
  addiu $sp, $sp, 32

where the %call16 is hoisted out of the loop. It really needs to be
kept inside the loop and loaded for each iteration. The same goes for
consecutive calls to the same function; the second call needs to load
%call16 separately, after the first call has finished.

As things stand, if foo() hasn't been bound by the time the function
above is entered, $18 will contain the address of the lazy binding stub,
and so the loop will try to resolve foo on every iteration. That's
usually what's happened for me when a testcase gets bogged down in
the dynamic linker.

Maybe the lack of .globl is preventing the function from being resolved
lazily, and so avoids this kind of problem?

Does removing the .globls from the GCC asm output make any difference?
Or is it just that adding them to LLVM output makes a difference?

Thanks,
Richard

foo.ll (305 Bytes)

You the man!

Nice catch.

That makes total sense.

As you said, .globl might prevent the symbol from participating in lazy binding, but I need to investigate this issue thoroughly.

http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00975.html

Reed