LLD symbol types for defsym

I noticed that LLD doesn’t preserve the symbol type for a defsym directive. For example:

$ cat f.c
void f() {}

$ clang -c f.c
$ ld.lld -shared --defsym=g=f f.o
$ objdump -T a.out
DYNAMIC SYMBOL TABLE:
00000000000012a0 g DF .text 0000000000000006 f
00000000000012a0 g D .text 0000000000000000 g

f is marked as a function symbol, but g is not.
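
For reference, the F flag in the objdump output corresponds to the type field packed into the low four bits of each ELF symbol's st_info byte. The following is a minimal sketch of that decoding, assuming the standard ELF constant values (STB_GLOBAL = 1, STT_NOTYPE = 0, STT_FUNC = 2). The EX_ prefixed names and the helpers make_st_info, sym_bind, sym_type and is_function are made up for illustration and are not taken from LLD or binutils.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard ELF symbol constants, prefixed with EX_ so the sketch does not
   clash with <elf.h> if that header is also included. */
enum { EX_STB_GLOBAL = 1 };
enum { EX_STT_NOTYPE = 0, EX_STT_FUNC = 2 };

/* st_info packs the binding in the high nibble and the type in the low one. */
static inline uint8_t make_st_info(uint8_t bind, uint8_t type) {
    assert(bind <= 0xf && type <= 0xf);
    return (uint8_t)((bind << 4) | (type & 0xf));
}

static inline uint8_t sym_bind(uint8_t st_info) { return st_info >> 4; }
static inline uint8_t sym_type(uint8_t st_info) { return st_info & 0xf; }

/* True only for symbols that objdump would flag with F. */
static inline bool is_function(uint8_t st_info) {
    return sym_type(st_info) == EX_STT_FUNC;
}
```

With these definitions, f above would carry st_info 0x12 (global, function), while the defsym alias g would presumably carry 0x10 (global, no type), so is_function() returns false for g.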

I recognize this is hard to do in the general case, where e.g. arithmetic can be performed in the defsym expression, but in this particular case it would seem desirable for the alias symbol to have the same type as its target. My question is whether this will make any difference in practice. The case I'm particularly concerned about is ARM-Thumb interworking, where I believe some logic depends on symbol types. Could that logic misbehave because the alias is not marked as a function symbol?

Thanks for pointing that out. There can be a problem on Arm, as no interworking will be performed for symbols that are not STT_FUNC. Given that ld.bfd does preserve the symbol type for aliases, I think this is worth raising a PR.

To extend your example:
$ cat h.c
extern void f();
extern void g();

void h() { f(); g(); }

$ clang --target=armv7a-none-eabi -c f.c
$ clang --target=armv7a-none-eabi -c h.c -mthumb
$ ld.lld f.o h.o --defsym g=f # No --shared to prevent a PLT entry.
$ objdump -d a.out

000200e4 <f>:
   200e4: e12fff1e bx lr

000200e8 <h>:
   200e8: b580 push {r7, lr}
   200ea: 466f mov r7, sp
   200ec: f7ff effa blx 200e4 <f>
   200f0: f7ff fff8 bl 200e4 <f>
   200f4: bd80 pop {r7, pc}

The blx for the call to f() is correct, as a state change is required. The bl for the call to g(), which resolves to the same address as f, does not change state and will likely crash the program.

ld.bfd correctly marks g as STT_FUNC, so it gets the state change correct for both calls:
00008000 <f>:
    8000: e12fff1e bx lr

00008004 <h>:
    8004: b580 push {r7, lr}
    8006: 466f mov r7, sp
    8008: f7ff effa blx 8000 <f>
    800c: f7ff eff8 blx 8000 <f>
    8010: bd80 pop {r7, pc}
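
The difference between the two links comes down to how the linker resolves the Thumb call for g. The following sketch models that decision under the assumption stated above, namely that interworking is applied only when the target symbol is STT_FUNC. The names call_insn, target_sym and resolve_thumb_call are hypothetical and do not reflect the actual code in either linker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum call_insn { CALL_BL, CALL_BLX };

/* Hypothetical view of a call target as the linker sees it. */
struct target_sym {
    bool is_func;    /* symbol type is STT_FUNC */
    uint32_t value;  /* st_value; for STT_FUNC symbols bit 0 set means Thumb */
};

/* Choose the instruction for a call made from Thumb state.
   A non-function symbol gets no interworking, so the BL is kept as is. */
static inline enum call_insn resolve_thumb_call(const struct target_sym *t) {
    assert(t != NULL);
    if (!t->is_func)
        return CALL_BL;                /* no type info: no state change */
    bool target_is_thumb = (t->value & 1u) != 0;
    return target_is_thumb ? CALL_BL   /* Thumb to Thumb */
                           : CALL_BLX; /* Thumb to Arm: switch state */
}
```

For the ld.lld example, f is an Arm-state function at 0x200e4, so a target of {true, 0x200e4} yields BLX, whereas the untyped alias g, {false, 0x200e4}, stays a BL and would branch into Arm code while still in Thumb state.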

Thanks! Filed https://llvm.org/PR46790
