What should IRObjectFile expose?

Hi Rafael,

There’s a source file in Chromium that does something like this:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

module asm ".text"
module asm "foo: ret"

declare void @foo()

define void @_start() {
call void @foo()
ret void
}

Currently the llvm-nm output for that looks like this:

---------------- T _start
                 U foo
---------------- t foo

That second entry is a bug, right? I just wanted to confirm before I go ahead and fix it, since the fix seems like it would be rather involved.

It depends on how much you want it to do I guess.

Given the bugs people find when trying to use LTO, I am pretty sure that
parsing global assembly to detect symbols is necessary. For another
recent report see https://llvm.org/bugs/show_bug.cgi?id=26745.

Normally having a U and T just looks silly in llvm-nm, but as you
noticed that breaks down when the definition is not marked global.
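
As a rough illustration of what "parsing global assembly to detect symbols" involves, here is a minimal Python sketch. It is not how LLVM does it: LLVM runs its real assembly parser over the module asm, while this only pattern-matches labels and .global/.globl directives. The function name scan_module_asm is made up for the example, and every label is assumed to live in .text.

```python
import re

# Hypothetical approximation: a real implementation would use the target's
# assembly parser rather than regular expressions.
LABEL_RE = re.compile(r'^\s*([A-Za-z_.$][\w.$]*):')
GLOBAL_RE = re.compile(r'^\s*\.(?:globl|global)\s+([A-Za-z_.$][\w.$]*)')


def scan_module_asm(lines):
    """Return {symbol: 'T' or 't'} for labels defined in module asm strings.

    'T' marks a label that some line declares global, 't' a local label.
    All labels are assumed to be in the text section.
    """
    defined = []
    global_names = set()
    for line in lines:
        match = GLOBAL_RE.match(line)
        if match:
            global_names.add(match.group(1))
            continue
        match = LABEL_RE.match(line)
        if match:
            defined.append(match.group(1))
    return {name: ('T' if name in global_names else 't') for name in defined}


# The test case from this thread: foo is defined but not marked global.
print(scan_module_asm(['.text', 'foo: ret']))
# {'foo': 't'}

# Same module with a .global directive added.
print(scan_module_asm(['.text', '.global foo', 'foo: ret']))
# {'foo': 'T'}
```

The difference between the two results is the 't' versus 'T' that shows up in the llvm-nm output.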

(Very?) long term, my idea is to add a proper symbol table to the
bitcode file. It would have the final word on which symbols are
defined in a given .bc. In particular:

* A @foo would show up as "foo" or "_foo" or "_foo@some_windows_thing"
in the symbol table.
* There would be entries for symbols declared as inline assembly.
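
To make the first bullet concrete, here is a small Python sketch of the name mangling such a symbol table would have to record. The function mangle and its parameters are hypothetical. The mode letters follow the "m:" field of the datalayout string, as in the "m:e" of the test case above. Private-symbol prefixes and names that opt out of mangling are ignored, and the "@N" suffix assumes that the "some_windows_thing" is the x86 stdcall argument byte count.

```python
def mangle(name, mode, stdcall_arg_bytes=None):
    """Map an IR global name to the name a symbol table would store.

    mode is the datalayout mangling letter:
      'e' ELF             -> name unchanged
      'o' Mach-O          -> leading underscore
      'x' Windows x86 COFF -> leading underscore, plus '@N' for stdcall
    stdcall_arg_bytes is an illustrative parameter, not an LLVM API.
    """
    if mode == 'e':
        return name
    if mode == 'o':
        return '_' + name
    if mode == 'x':
        mangled = '_' + name
        if stdcall_arg_bytes is not None:
            mangled += '@%d' % stdcall_arg_bytes
        return mangled
    raise ValueError('unsupported mangling mode: %r' % mode)


print(mangle('foo', 'e'))     # foo
print(mangle('foo', 'o'))     # _foo
print(mangle('foo', 'x', 8))  # _foo@8
```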

In that universe IRObjectFile would be a lot more like any other
object file implementation and not depend on MC :-)

The flip side is that llvm-as would be doing mangling and either we
would require asm symbol declarations in .ll or it would also parse
assembly.

Cheers,
Rafael

Thanks Rafael, that all makes sense. I think the first step would be to add
some logic to IRObjectFile to have it compute a symbol table that's good
enough to handle cases like this. Later we can perhaps consider moving some
of that logic to somewhere like the bitcode writer and make IRObjectFile
rely on that symbol table.
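
A minimal Python sketch of what "good enough" could mean here, assuming the intended behaviour is that a definition found in module asm satisfies an undefined IR declaration of the same name. The names build_symbol_table and format_nm are hypothetical, and the inputs are plain dictionaries rather than anything IRObjectFile actually provides.

```python
def build_symbol_table(ir_symbols, asm_symbols):
    """Merge IR-level and module-asm symbols into one table.

    Both arguments map a symbol name to an nm-style code ('T', 't', 'U').
    An asm definition replaces an undefined IR entry of the same name.
    If the IR already defines the name, the IR entry is kept (a real
    implementation would probably diagnose that conflict).
    """
    table = dict(ir_symbols)
    for name, code in asm_symbols.items():
        if table.get(name, 'U') == 'U':
            table[name] = code
    return table


def format_nm(table):
    """Render the table roughly the way llvm-nm prints bitcode symbols."""
    lines = []
    for name, code in table.items():
        address = ' ' * 16 if code == 'U' else '-' * 16
        lines.append('%s %s %s' % (address, code, name))
    return lines


# The test case from this thread: @_start defined, @foo declared in IR,
# foo defined as a local label in module asm.
merged = build_symbol_table({'_start': 'T', 'foo': 'U'}, {'foo': 't'})
print('\n'.join(format_nm(merged)))
# ---------------- T _start
# ---------------- t foo
```

With this merge the "U foo" entry disappears, because the local asm definition resolves the declaration.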

Thanks,

I am glad to read this as I have the exact same (vague) plan!

> Thanks Rafael, that all makes sense. I think the first step would be to add
> some logic to IRObjectFile to have it compute a symbol table that's good
> enough to handle cases like this. Later we can perhaps consider moving some
> of that logic to somewhere like the bitcode writer and make IRObjectFile
> rely on that symbol table.

To fix the bug, probably.

But one question: For the time being, can you work around the bug by
making the symbol global? If I modify your test case to include

module asm ".global foo"

llvm-nm will print

---------------- T _start
                 U foo
---------------- T foo

Which looks silly, but will work since the linker will resolve "U foo"
to "T foo".

Cheers,
Rafael