Embedding LLD version to executables

I’d like to make LLD embed version information so that we can determine if an executable was created by LLD and if that’s the case which version of LLD.

ld.bfd doesn’t seem to embed any information, so we cannot tell whether an executable was linked by ld.bfd or not easily.

ld.gold embeds a string “GNU gold <version>” as “.note.gnu.gold-version” section contents.

I’m wondering what we should do for LLD. The gold’s way seems almost right, but I think we don’t want to use the same section name because it contains “gold” as part of the section name. However, at the same time, I don’t believe we want to create “.note.gnu.lld-version”, because if we do, all programs that determine linker version need to look at “.note.gnu.<linker-name>-version” for all known linkers. That’s absurd. (Also, our product is not GNU, so “.gnu” part is probably irrelevant.)

I’m leaning towards “.note.linker-version” in hope that other linkers follow it.

Any opinions?

> I'd like to make LLD embed version information so that we can determine if an executable was created by LLD and if that's the case which version of LLD.

Pardon my ignorance, but what’s the motivation?

We don’t embed the clang version in every binary clang generates for instance.

> ld.bfd doesn't seem to embed any information, so we cannot tell whether an executable was linked by ld.bfd or not easily.
>
> ld.gold embeds a string "GNU gold <version>" as ".note.gnu.gold-version" section contents.
>
> I'm wondering what we should do for LLD. The gold's way seems almost right, but I think we don't want to use the same section name because it contains "gold" as part of the section name. However, at the same time, I don't believe we want to create ".note.gnu.lld-version", because if we do, all programs that determine linker version need to look at ".note.gnu.<linker-name>-version" for all known linkers. That's absurd. (Also, our product is not GNU, so ".gnu" part is probably irrelevant.)
>
> I'm leaning towards ".note.linker-version" in hope that other linkers follow it.

At least that would look much better than a .gnu.xxxx

From: "Mehdi Amini via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Rui Ueyama" <ruiu@google.com>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Tuesday, October 18, 2016 10:22:00 PM
Subject: Re: [llvm-dev] Embedding LLD version to executables

>
> I'd like to make LLD embed version information so that we can
> determine if an executable was created by LLD and if that's the
> case which version of LLD.

Pardon my ignorance, but what’s the motivation?

We don’t embed the clang version in every binary clang generates for
instance.

We do. Clang outputs an "ident" comment with its version information, and that ends up in the object files (at least on Linux).

>
> ld.bfd doesn't seem to embed any information, so we cannot tell
> whether an executable was linked by ld.bfd or not easily.
>
> ld.gold embeds a string "GNU gold <version>" as
> ".note.gnu.gold-version" section contents.
>
> I'm wondering what we should do for LLD. The gold's way seems
> almost right, but I think we don't want to use the same section
> name because it contains "gold" as part of the section name.
> However, at the same time, I don't believe we want to create
> ".note.gnu.lld-version", because if we do, all programs that
> determine linker version need to look at
> ".note.gnu.<linker-name>-version" for all known linkers. That's
> absurd. (Also, our product is not GNU, so ".gnu" part is probably
> irrelevant.)
>
> I'm leaning towards ".note.linker-version" in hope that other
> linkers follow it.

At least that would look much better than a .gnu.xxxx

I agree.

-Hal

Interesting, we don’t on Darwin:

$ echo "int main() {}" | clang -x c -o - - -S | grep clang
$ echo "int main() {}" | clang -x c -o - - -S -target x86_64-pc-linux-gnu | grep clang
.ident "Apple LLVM version 8.0.0 (clang-800.0.42.1)"

Does it show up in the final binary? If yes, how does it behave when you mix-and-match versions in different .o?

Also, I’m still not sure why we are doing this.

> > I'd like to make LLD embed version information so that we can determine
> > if an executable was created by LLD and if that's the case which version of
> > LLD.
>
> Pardon my ignorance, but what’s the motivation?

It should make troubleshooting linker issues easier. Here are a few
scenarios.

Scenario 1: you added -fuse-ld=lld to your command line, but you are
suspecting that the option is ignored. Currently, it's actually
surprisingly hard to tell if an output was created by LLD.

Scenario 2: someone reported an issue about an executable linked by LLD,
and we know that version X has a similar bug, then the first thing we want
to do is to check if the executable was created with version X.

I can see how it's useful during a switch (from one linker to another).
Generally the linker is never invoked directly so there's still a
certain amount of uncertainty (did I use the correct one or not?), in
particular when lld is used as part of a bigger build system. YMMV.

From: "Mehdi Amini" <mehdi.amini@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>, "Rui Ueyama"
<ruiu@google.com>
Sent: Tuesday, October 18, 2016 10:38:57 PM
Subject: Re: [llvm-dev] Embedding LLD version to executables

> > From: "Mehdi Amini via llvm-dev" <llvm-dev@lists.llvm.org>
> > To: "Rui Ueyama" <ruiu@google.com>
> > Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
> > Sent: Tuesday, October 18, 2016 10:22:00 PM
> > Subject: Re: [llvm-dev] Embedding LLD version to executables
> >
> > > I'd like to make LLD embed version information so that we can
> > > determine if an executable was created by LLD and if that's the
> > > case which version of LLD.
> >
> > Pardon my ignorance, but what’s the motivation?
> >
> > We don’t embed the clang version in every binary clang generates
> > for instance.
>
> We do. Clang outputs an "ident" comment with its version
> information, and that ends up in the object files (at least on Linux).

Interesting, we don’t on Darwin:

$ echo "int main() {}" | clang -x c -o - - -S | grep clang
$ echo "int main() {}" | clang -x c -o - - -S -target x86_64-pc-linux-gnu | grep clang
.ident "Apple LLVM version 8.0.0 (clang-800.0.42.1)"

Does it show up in the final binary?

Yes

If yes, how does it behave when you mix-and-match versions in
different .o?

I see both versions in the final binary.

-Hal

Pretty cool! Thanks for looking.

Is there a trace in the binary of the list of object files for each version? I guess that’d be more costly to store.

From: "Mehdi Amini" <mehdi.amini@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>, "Rui Ueyama"
<ruiu@google.com>
Sent: Tuesday, October 18, 2016 11:35:02 PM
Subject: Re: [llvm-dev] Embedding LLD version to executables

> > From: "Mehdi Amini" <mehdi.amini@apple.com>
> > To: "Hal Finkel" <hfinkel@anl.gov>
> > Cc: "llvm-dev" <llvm-dev@lists.llvm.org>, "Rui Ueyama" <ruiu@google.com>
> > Sent: Tuesday, October 18, 2016 10:38:57 PM
> > Subject: Re: [llvm-dev] Embedding LLD version to executables
> >
> > > > From: "Mehdi Amini via llvm-dev" <llvm-dev@lists.llvm.org>
> > > > To: "Rui Ueyama" <ruiu@google.com>
> > > > Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
> > > > Sent: Tuesday, October 18, 2016 10:22:00 PM
> > > > Subject: Re: [llvm-dev] Embedding LLD version to executables
> > > >
> > > > > On Oct 18, 2016, at 6:44 PM, Rui Ueyama via llvm-dev
> > > > >
> > > > > I'd like to make LLD embed version information so that we can
> > > > > determine if an executable was created by LLD and if that's the
> > > > > case which version of LLD.
> > > >
> > > > Pardon my ignorance, but what’s the motivation?
> > > >
> > > > We don’t embed the clang version in every binary clang generates
> > > > for instance.
> > >
> > > We do. Clang outputs an "ident" comment with its version
> > > information, and that ends up in the object files (at least on Linux).
> >
> > Interesting, we don’t on Darwin:
> >
> > $ echo "int main() {}" | clang -x c -o - - -S | grep clang
> > $ echo "int main() {}" | clang -x c -o - - -S -target x86_64-pc-linux-gnu | grep clang
> > .ident "Apple LLVM version 8.0.0 (clang-800.0.42.1)"
> >
> > Does it show up in the final binary?
>
> Yes
>
> > If yes, how does it behave when you mix-and-match versions in
> > different .o?
>
> I see both versions in the final binary.

Pretty cool! Thanks for looking.

Is there a trace in the binary of the list of object files for each
version? I guess that’d be more costly to store.

As far as I can tell, the version strings all get concatenated in the .comment section of the resulting executable. The strings are null-terminated, and so there is a null byte separating the strings. For example, objdump -x -s shows this from my test:

Contents of section .comment:
0000 4743433a 2028474e 55292034 2e382e35 GCC: (GNU) 4.8.5
0010 20323031 35303632 33202852 65642048 20150623 (Red H
0020 61742034 2e382e35 2d342900 636c616e at 4.8.5-4).clan
0030 67207665 7273696f 6e20342e 302e3020 g version 4.0.0
0040 28687474 703a2f2f 6c6c766d 2e6f7267 (http://llvm.org
0050 2f676974 2f636c61 6e672e67 69742036 /git/clang.git 6
0060 31653732 36613362 63633664 34633262 1e726a3bcc6d4c2b
0070 35346330 30366337 61623762 31316133 54c006c7ab7b11a3
0080 33613034 32613629 20286874 74703a2f 3a042a6) (http:/
0090 2f6c6c76 6d2e6f72 672f6769 742f6c6c /llvm.org/git/ll
00a0 766d2e67 69742061 62383338 37303961 vm.git ab838709a
00b0 63356330 35623262 36393031 65366630 c5c05b2b6901e6f0
00c0 65303964 65613561 35356636 36333729 e09dea5a55f6637)
00d0 00636c61 6e672076 65727369 6f6e2033 .clang version 3
00e0 2e392e30 20286874 74703a2f 2f6c6c76 .9.0 (http://llv
00f0 6d2e6f72 672f6769 742f636c 616e672e m.org/git/clang.
0100 67697420 30373330 37663935 64356338 git 07307f95d5c8
0110 32643435 33636463 35633233 66396363 2d453cdc5c23f9cc
0120 64353364 35666637 35343236 29202868 d53d5ff75426) (h
0130 7474703a 2f2f6c6c 766d2e6f 72672f67 ttp://llvm.org/g
0140 69742f6c 6c766d2e 67697420 30333136 it/llvm.git 0316
0150 66303235 64616234 36643737 36646565 f025dab46d776dee
0160 37306433 62363935 65316130 37646235 70d3b695e1a07db5
0170 33373163 2900 371c).

-Hal
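
To illustrate the layout described above (null-terminated version strings concatenated in .comment), here is a minimal Python sketch that splits the raw section bytes back into individual entries. The function name and the sample bytes are hypothetical; the sample is an abbreviated stand-in for the dump above, not its exact contents.

```python
from typing import List


def split_comment_section(data: bytes) -> List[str]:
    """Split raw .comment bytes into the individual ident strings."""
    # Each entry is NUL-terminated, so splitting on NUL recovers them.
    # Empty pieces (from a leading or trailing NUL) are dropped.
    return [piece.decode("utf-8", errors="replace")
            for piece in data.split(b"\x00") if piece]


# Abbreviated, hypothetical stand-in for the section contents shown above.
sample = (b"GCC: (GNU) 4.8.5 20150623 (Red Hat 4.8.5-4)\x00"
          b"clang version 4.0.0\x00"
          b"clang version 3.9.0\x00")

for entry in split_comment_section(sample):
    print(entry)
```

Note that this recovers which producer versions contributed to the binary, but not which object file each string came from.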

My understanding is that a note section is primarily for recording
information needed for conformance rather than communicating the
version of the tool. Although I think the distinction is somewhat
arbitrary in practice.

I believe many compilers put version information in the .comment
section already. For example, the Linaro GCC puts its version string
there:

GCC: (Linaro GCC 5.3-2016.02) 5.3.1 20160113

Speaking from experience, in ARM's proprietary toolchain we used the
.comment section to record information including the version number
and the command line used from all the tools. Curiously enough it was
often the compiler/assembler information that had been combined into
the final .comment section that was most useful in troubleshooting.
The disadvantage of recording so much information in the .comment
sections was that the linker had to do some kind of processing of the
.comment section contents to common up the information.

To summarise I think a note section would be most useful if the format
of the version string never changed and was read and checked by other
tools. A .comment section would be more suitable for free form
information.

Peter

http://www.sco.com/developers/gabi/latest/ch4.sheader.html
http://www.sco.com/developers/gabi/latest/ch5.pheader.html#note_section
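
For contrast with the free-form .comment strings, a note section has a fixed record layout: three 4-byte words (namesz, descsz, type) followed by the name and the descriptor. The following Python sketch parses one such record. It assumes little-endian data and 4-byte padding, which is what common toolchains emit in practice; the sample record is hypothetical, modelled on a gold-style note with owner "GNU" and type 4 (which I believe is NT_GNU_GOLD_VERSION in the binutils headers).

```python
import struct


def parse_note(data: bytes, little_endian: bool = True):
    """Parse the first record of an ELF note section.

    Returns (owner name, note type, raw descriptor bytes).
    """
    endian = "<" if little_endian else ">"
    # Header: namesz, descsz, type, each a 4-byte word.
    namesz, descsz, note_type = struct.unpack_from(endian + "III", data, 0)
    offset = 12
    name = data[offset:offset + namesz].rstrip(b"\x00")
    # Assumption: the name field is padded to a 4-byte boundary.
    offset += (namesz + 3) & ~3
    desc = data[offset:offset + descsz]
    return name.decode("ascii"), note_type, desc


# Hypothetical record resembling a gold version note.
desc = b"gold 1.11\x00"
raw = struct.pack("<III", 4, len(desc), 4) + b"GNU\x00" + desc
print(parse_note(raw))
```

A consumer of such a note would typically key on the (name, type) pair, which is why a stable, machine-checked format suits a note better than a free-form string.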

Hi,

> Scenario 1: you added -fuse-ld=lld to your command line, but you are suspecting that the option is ignored. Currently, it’s actually surprisingly hard to tell if an output was created by LLD.

Can’t you use -v to see what linker the compiler driver invoked?

I am ok with both ".note.linker-version" and adding an entry to .comment.

I guess I have a small preference for .comment if this is a free format string.

Cheers,
Rafael

I’ll rework this after linker-synthesized input sections become a thing. Currently I cannot easily append a string piece to an existing .comment section.

There is one important difference between the two -- strip normally has
to preserve the former, but not the latter. As such, I'd quite a bit
prefer using .comment.

Joerg

> Hi,
>
> > Scenario 1: you added -fuse-ld=lld to your command line, but you are
> > suspecting that the option is ignored. Currently, it's actually
> > surprisingly hard to tell if an output was created by LLD.
>
> Can't you use -v to see what linker the compiler driver invoked?

Sometimes, things are buried very deep in a build system, such that
extracting the clang invocation is difficult.

-- Sean Silva

> > I am ok with both ".note.linker-version" and adding an entry to .comment.
>
> There is one important difference between the two -- strip normally has
> to preserve the former, but not the latter. As such, I'd quite a bit
> prefer using .comment.

+1 for .comment; we should be consistent with how clang handles .ident,
since this is conceptually the same information.

In fact, during LTO, LLD should probably add an ident entry to any module
that it codegens to indicate that it was codegenerated through LLD; but
that's a separate discussion.

-- Sean Silva

For the record, this feature is implemented in r286496 (https://reviews.llvm.org/D26487). Now the LLD version string is embedded in the output’s .comment section. You can use objdump to see it.

$ objdump -s -j .comment foo

foo: file format elf64-x86-64

Contents of section .comment:
0000 00474343 3a202855 62756e74 7520342e .GCC: (Ubuntu 4.
0010 382e342d 32756275 6e747531 7e31342e 8.4-2ubuntu1~14.

00c0 766d2f74 72756e6b 20323835 38343629 vm/trunk 285846)
00d0 004c696e 6b65723a 204c4c44 20342e30 .Linker: LLD 4.0
00e0 2e302028 7472756e 6b203238 36343036 .0 (trunk 286406
00f0 2900 ).
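
For scripted checks, the same information can be pulled out of the raw .comment bytes. This is a minimal Python sketch, assuming the "Linker: LLD " prefix visible in the dump above; nothing in this thread promises that the prefix is a stable format, and the function name is hypothetical.

```python
import re
from typing import Optional

# Assumption: the entry looks like "Linker: LLD <version>" followed by a NUL,
# as in the dump above.
LLD_ENTRY = re.compile(rb"Linker: LLD ([^\x00]*)")


def find_lld_version(comment: bytes) -> Optional[str]:
    """Return the LLD version string found in .comment bytes, or None."""
    match = LLD_ENTRY.search(comment)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace")


# Bytes taken from the tail of the dump above (offsets 00d0 onwards).
sample = b"\x00Linker: LLD 4.0.0 (trunk 286406)\x00"
print(find_lld_version(sample))  # prints: 4.0.0 (trunk 286406)
```

A result of None only means no such entry was found; the section may have been stripped, so it does not prove that another linker was used.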

Awesome. Thanks so much for doing this.