Linking the FreeBSD base system with lld -- status update

As I've mentioned before[1] I've been regularly attempting to build
the FreeBSD/amd64 base system with lld, in order to keep track of
progress and identify issues on an ongoing basis. As of last November
a 'buildworld' (i.e., userland build) ran to completion, with several
workarounds applied to the FreeBSD base system. However, the result
did not actually work.

I'm pleased to report that I can now build a runnable FreeBSD system
using lld as the linker (for buildworld), with a few workarounds and
work-in-progress patches. I have not yet extensively tested the result
but it is possible to log in to the resulting system, and basic sanity
tests I've tried are successful. Note that the kernel is still linked
with ld.bfd.

I have tracking PR[2] 23214 open for lld issues affecting the FreeBSD
base system use case, and I'll briefly summarize the outstanding
issues. Unless otherwise specified my workaround is to use ld.bfd to
link the affected program.

1. Symbol version support (PR 23231)

FreeBSD relies on symbol version support in libc, libthr, and rtld for
backwards compatibility and symbol visibility control.

For testing I'm willing to build a system without symbol versioning.
FreeBSD has a WITHOUT_SYMVER knob intended to enable this. It is
currently broken, and I have patches[3][4] in review to fix it.

I also added a "--hide-symbol" command to elfcopy, ELF Tool Chain's
objcopy-equivalent. I use this to change the visibility of all symbols
in libc_pic.a to hidden so that they do not end up incorrectly
exported by ld-elf.so.1.

2. Linker script expression support (PR 26731)

The FreeBSD kernel linker scripts contain expressions not supported by
lld - for example, ". = ALIGN(. != 0 ? 64 / 8 : 1);". I'm using ld.bfd
to link the kernel for now.
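
As a rough illustration of what that expression computes (this is only a
sketch of the arithmetic, not code from lld or ld.bfd), the snippet below
models the location counter as an integer. The function names align_up and
kernel_align are hypothetical, and it assumes the usual GNU ld meaning of
single-argument ALIGN: round the location counter up to the next multiple
of the given power-of-two alignment.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


def kernel_align(dot: int) -> int:
    """Model of '. = ALIGN(. != 0 ? 64 / 8 : 1);' for a location counter."""
    # The ternary picks 8-byte alignment unless the counter is still zero,
    # in which case alignment 1 leaves it unchanged.
    alignment = 64 // 8 if dot != 0 else 1
    return align_up(dot, alignment)


if __name__ == "__main__":
    for dot in (0, 1, 8, 13):
        print(hex(dot), "->", hex(kernel_align(dot)))
```

The point is that the linker has to evaluate a comparison, a ternary, and a
division inside the ALIGN argument before it can assign the new location.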

3. Library search paths

In FreeBSD /usr/lib/libc.so is a linker script that contains "GROUP (
libc.so.7 libc_nonshared.a libssp_nonshared.a )". ld.bfd includes a
built-in /lib search path and finds /lib/libc.so.7 there. lld relies
only on the -L paths specified on the command line, and cannot locate
libc.so.7. As a workaround I've changed /usr/lib/libc.so to include
the full path.

4. -N/--omagic option

-N makes the text and data sections RW and does not page-align data.
It is used by boot loader components.

5. -dc option

-dc assigns space to common symbols when producing relocatable output
(-r). It is used by the /rescue build, which is a single binary
assembled from a collection of individual tools (sh, ls, fsck, ...)

6. -Y option

-Y adds a path to the default library search path. It is used by the
lib32 build, which provides i386 builds of the system libraries for
compatibility with i386 applications.

7. Use of -r to convert a binary file into an ELF object

A tool for loading firmware into a wireless USB device includes a
built-in copy of the firmware image, and the image is converted to an
ELF file using ld -r.

The first two issues above are significant. The others could be
addressed relatively easily, but they have simple workarounds and
aren't holding up further progress.

Thanks to all who have contributed to LLD's impressive progress over
the last three months or so!

[1] http://lists.llvm.org/pipermail/llvm-dev/2015-November/092572.html
[2] http://llvm.org/pr23214
[3] D5571 libc/{i386,amd64}: Do not export .cerror when building WITHOUT_SYMVER
[4] D5572 RFC: Fix WITHOUT_SYMVER buildworld

Ed,

Thank you for the update! That is very informative, and it’s very exciting to hear that LLD can now build a runnable FreeBSD system. It’s a great milestone. I personally want to build the entire FreeBSD system, including the kernel, within six months or at least within this year, and I believe that is a reasonable target based on the progress we’ve made on the ELF linker over the last six months.

3. Library search paths

In FreeBSD /usr/lib/libc.so is a linker script that contains "GROUP (
libc.so.7 libc_nonshared.a libssp_nonshared.a )". ld.bfd includes a
built-in /lib search path and finds /lib/libc.so.7 there. lld relies
only on the -L paths specified on the command line, and cannot locate
libc.so.7. As a workaround I've changed /usr/lib/libc.so to include
the full path.

"/lib" and other system-default directories are added to the search path by
the default linker script in GNU ld.

I don't know how gold works. Does gold have the same issue as LLD?

I haven't tried a buildworld with gold, but a quick search suggests
gold does have /lib and /usr/lib paths baked-in. I linked a hello
world with gold, and it does not show the problem mentioned above.

4. -N/--omagic option

-N makes the text and data sections RW and does not page-align data.
It is used by boot loader components.

It is probably better to update FreeBSD boot loader rather than adding this
option (which is probably for '80s Unix compatibles).

Indeed. For those boot components that do need a single RW segment I
think it would be better accomplished with a linker script anyway.

5. -dc option

-dc assigns space to common symbols when producing relocatable output
(-r). It is used by the /rescue build, which is a single binary
assembled from a collection of individual tools (sh, ls, fsck, ...)

Why does it need -dc option?

The "crunchide" tool used in building the rescue binary segfaults when
operating on an object with common symbols, and a comment in the
source claims:

This program relies on the use of the linker's -dc flag to actually
put global bss data into the file's bss segment (rather than leaving
it as undefined "common" data).

but that still does not explain why it's necessary. I believe it could
be modified to avoid the requirement for now.

However, there is a todo item in the source: arrange that all the BSS
segments start at the same address, so that the final crunched binary
BSS size is the max of all the component programs' BSS sizes, rather
than their sum. It seems this would require -dc along with some linker
script magic.
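
To make the size difference in that todo item concrete, here is a small
sketch of the arithmetic. The function name crunched_bss_size and the
per-tool BSS sizes are made up for illustration; they are not measurements
from the rescue build.

```python
def crunched_bss_size(component_bss_sizes, overlap=False):
    """Total BSS for a crunched binary built from several components.

    overlap=False models the current layout (each tool gets its own BSS);
    overlap=True models the todo item, where every tool's BSS starts at
    the same address and only the largest one determines the size.
    """
    sizes = list(component_bss_sizes)
    if not sizes:
        return 0
    return max(sizes) if overlap else sum(sizes)


if __name__ == "__main__":
    # Hypothetical BSS sizes in bytes for three component tools.
    sizes = [4096, 16384, 8192]
    print("separate:", crunched_bss_size(sizes))
    print("overlapped:", crunched_bss_size(sizes, overlap=True))
```

Overlapping is only safe because one tool runs per invocation of the
crunched binary, so the components never need their BSS at the same time.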

6. -Y option

-Y adds a path to the default library search path. It is used by the
lib32 build, which provides i386 builds of the system libraries for
compatibility with i386 applications.

This option seems like an alias to -L. Why are they still using -Y?

It's not exactly an alias: -Y adds the path to the list of default
search paths, which are searched after any -L paths. I suspect it's
done this way to simplify setting up the correct path ordering despite
building up the compiler invocation through several levels of make
variables. Anyway, I believe we can address the requirement using only
-L, and this one is a low priority.
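
The ordering described here, and the libc.so.7 lookup failure from item 3,
can be sketched as a simple search over two lists. This is a simplified
model, not lld or ld.bfd code: find_library and its exists parameter are
hypothetical names, and real linkers also handle sysroots, -l name
expansion, and so on.

```python
import os
import posixpath


def find_library(name, l_paths, default_paths, exists=os.path.isfile):
    """Return the first match for name, trying -L paths before defaults."""
    # -L directories are searched first, in command-line order; default
    # directories (built in, or added with -Y) are only tried afterwards.
    for directory in list(l_paths) + list(default_paths):
        candidate = posixpath.join(directory, name)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Pretend filesystem: libc.so.7 lives in /lib only.
    files = {"/lib/libc.so.7"}
    present = files.__contains__
    # No default paths, as with lld at the time: the lookup fails.
    print(find_library("libc.so.7", ["/usr/lib"], [], exists=present))
    # With /lib as a default path, as with ld.bfd: the lookup succeeds.
    print(find_library("libc.so.7", ["/usr/lib"], ["/lib"], exists=present))
```

Because the two lists are simply concatenated, appending the -Y directory
to the end of the -L list gives the same result in this model.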

7. Use of -r to convert a binary file into an ELF object

A tool for loading firmware into a wireless USB device includes a
built-in copy of the firmware image, and the image is converted to an
ELF file using ld -r.

After George's commit to support -r, does it work now?

George's work addressed the other uses of -r in the FreeBSD tree, but
not this one.

The invocation here is:
ld -b binary -d -warn-common -r -d -o ar5523.o ar5523.bin
the resulting ar5523.o is then linked into the executable, and
accessed using the _binary_ar5523_bin_start and _binary_ar5523_bin_end
symbols created by the linker.

I don't think this is a reasonable case to support in lld. In FreeBSD
we can achieve the same result with something like:
objcopy -I binary -O elf64-x86-64 ar5523.bin ${.TARGET}
or just create a .c source file with the firmware contents as an array.
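
For the second alternative, a build-time script along these lines could
generate the .c file. This is only a sketch: the function name
firmware_to_c_array, the symbol name, and the output file name are
hypothetical, and nothing like it exists in the FreeBSD tree as far as this
thread shows.

```python
def firmware_to_c_array(data: bytes, symbol: str, per_line: int = 12) -> str:
    """Render a firmware image as C source defining a byte array."""
    if not data:
        # An empty initializer list is not valid in pre-C23 C.
        raise ValueError("firmware image is empty")
    lines = [f"const unsigned char {symbol}[] = {{"]
    for offset in range(0, len(data), per_line):
        chunk = data[offset:offset + per_line]
        lines.append("\t" + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    # A length variable replaces the _start/_end symbol pair from ld -r.
    lines.append(f"const unsigned int {symbol}_len = {len(data)};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("ar5523.bin", "rb") as src:
        image = src.read()
    with open("ar5523_fw.c", "w") as dst:
        dst.write(firmware_to_c_array(image, "ar5523_fw"))
```

The loader would then refer to the array and its length instead of the
_binary_ar5523_bin_start and _binary_ar5523_bin_end symbols.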

5. -dc option

-dc assigns space to common symbols when producing relocatable output
(-r). It is used by the /rescue build, which is a single binary
assembled from a collection of individual tools (sh, ls, fsck, ...)

Why does it need -dc option?

The "crunchide" tool used in building the rescue binary segfaults when
operating on an object with common symbols, and a comment in the
source claims:

This program relies on the use of the linker's -dc flag to actually
put global bss data into the file's bss segment (rather than leaving
it as undefined "common" data).

but that still does not explain why it's necessary. I believe it could
be modified to avoid the requirement for now.

However, there is a todo item in the source: arrange that all the BSS
segments start at the same address, so that the final crunched binary
BSS size is the max of all the component programs' BSS sizes, rather
than their sum. It seems this would require -dc along with some linker
script magic.

What are the input .o files? If you are compiling just for this you
should be able to use -fno-common.

Cheers,
Rafael

>
>> 3. Library search paths
>>
>> In FreeBSD /usr/lib/libc.so is a linker script that contains "GROUP (
>> libc.so.7 libc_nonshared.a libssp_nonshared.a )". ld.bfd includes a
>> built-in /lib search path and finds /lib/libc.so.7 there. lld relies
>> only on the -L paths specified on the command line, and cannot locate
>> libc.so.7. As a workaround I've changed /usr/lib/libc.so to include
>> the full path.
>
> "/lib" and other system-default directories are added to the search path by
> the default linker script in GNU ld.
>
> I don't know how gold works. Does gold have the same issue as LLD?

I haven't tried a buildworld with gold, but a quick search suggests
gold does have /lib and /usr/lib paths baked-in. I linked a hello
world with gold, and it does not show the problem mentioned above.

This should be addressed in r262910 that I've just submitted.

>> 4. -N/--omagic option
>>
>> -N makes the text and data sections RW and does not page-align data.
>> It is used by boot loader components.
>
> It is probably better to update FreeBSD boot loader rather than adding this
> option (which is probably for '80s Unix compatibles).

Indeed. For those boot components that do need a single RW segment I
think it would be better accomplished with a linker script anyway.

>> 5. -dc option
>>
>> -dc assigns space to common symbols when producing relocatable output
>> (-r). It is used by the /rescue build, which is a single binary
>> assembled from a collection of individual tools (sh, ls, fsck, ...)
>
>
> Why does it need -dc option?

The "crunchide" tool used in building the rescue binary segfaults when
operating on an object with common symbols, and a comment in the
source claims:

This program relies on the use of the linker's -dc flag to actually
put global bss data into the file's bss segment (rather than leaving
it as undefined "common" data).

but that still does not explain why it's necessary. I believe it could
be modified to avoid the requirement for now.

However, there is a todo item in the source: arrange that all the BSS
segments start at the same address, so that the final crunched binary
BSS size is the max of all the component programs' BSS sizes, rather
than their sum. It seems this would require -dc along with some linker
script magic.

>> 6. -Y option
>>
>> -Y adds a path to the default library search path. It is used by the
>> lib32 build, which provides i386 builds of the system libraries for
>> compatibility with i386 applications.
>
> This option seems like an alias to -L. Why are they still using -Y?

It's not exactly an alias: -Y adds the path to the list of default
search paths, which are searched after any -L paths. I suspect it's
done this way to simplify setting up the correct path ordering despite
building up the compiler invocation through several levels of make
variables. Anyway, I believe we can address the requirement using only
-L, and this one is a low priority.

Agreed. This should probably be addressed by updating the program that
needs -Y. GNU ld's manual is terse about this option and says that this is
for Solaris compatibility. It makes much less sense for us to add this
option for Solaris-compatibility-through-a-GNU-tool.