Linker Option support for ELF

Hello all,

A few people have expressed interest in adding support for embedded linker options to ELF. This would be an extension that requires linker support to actually work, but it has significant prior art: both PE/COFF and MachO already support this.

The goal here is to add support to LLVM for passing the necessary information along into the object file. To keep the discussion focused, this thread is specifically about the backend. We are not discussing how to get the information to the backend here at all; assume that the information is present in the same LLVM IR encoding (the llvm.linker.options module metadata string).

In order to have compatibility with existing linkers, I am suggesting the use of an ELF note. These are implicitly dropped by the linker so we can be certain that the options will not end up in the final binary even if the extension is not supported. The payload would be a 4-byte version identifier (to allow future enhancements) and a null-terminated string of options.
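To make the proposed payload concrete, here is a minimal Python sketch of packing and unpacking it. The little-endian byte order, the version value 1, and the helper names are assumptions for illustration only; the proposal does not fix them, and the surrounding ELF note header (name, type, descriptor size) is left out.

```python
import struct


def pack_linker_options(options, version=1):
    """Pack a 4-byte version word followed by a null-terminated option string."""
    # "<I" is a little-endian unsigned 32-bit integer; the byte order is an
    # assumption, not something the proposal specifies.
    return struct.pack("<I", version) + options.encode("utf-8") + b"\x00"


def unpack_linker_options(payload):
    """Return (version, options) from a payload built by pack_linker_options."""
    (version,) = struct.unpack_from("<I", payload, 0)
    body = payload[4:]
    end = body.index(b"\x00")  # raises ValueError if the terminator is missing
    return version, body[:end].decode("utf-8")
```

The version word comes first so that a consumer can reject a payload it does not understand before looking at the string.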

This allows the backend to remain entirely oblivious to the data, like the other backends, and allows for future extensions without having to teach the backend anything about the new functionality (again, something which both of the other file formats support).

As an example of how this can be useful, it would help with Swift support on Linux where currently the linker options are pushed into a custom section, and a secondary program is used to extract the options from this section prior to the linker being invoked. This adds a lot of complexity to the driver as well as additional tools being invoked in the build chain slowing down the build.

Saleem Abdulrasool via llvm-dev <llvm-dev@lists.llvm.org> writes:

Hello all,

There was some interest from a number of a few people about adding support
for embedded linker options to ELF. This would be an extension that
requires linker support to actually work, but has significant prior art
with PE/COFF as well as MachO both having support for this.

The desire here is to actually add support to LLVM to pass along the
necessary information into the object file. In order to keep this focused
on that, this thread is specifically for the *backend*, we are not
discussing how to get the information to the backend here at all, but
assuming that the information is present in the same LLVM IR encoding
(llvm.linker-options module metadata string).

In order to have compatibility with existing linkers, I am suggesting the
use of an ELF note. These are implicitly dropped by the linker so we can
be certain that the options will not end up in the final binary even if the
extension is not supported.

So far LGTM.

The payload would be a 4-byte version
identifier (to allow future enhancements) and a null-terminated string of
options.

This allows for the backend to be entirely oblivious to the data as the
other backends and allows for extensions in the future without having to
teach the backend anything about the new functionality (again, something
which both of the other file formats support).

That is the part I have issues with.

A linker has a lot of command line options, we should not make them part
of the object format. What happens if a file has a -strip-all,
--gc-sections or -pie?

I think the supported features should be explicitly documented and be
encoded as an (option-number, argument) pair.
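A rough Python sketch of what such an encoding could look like is below. The option numbers, the enum names, and the layout (a 32-bit little-endian option number followed by a null-terminated argument) are all hypothetical; nothing in this thread defines them.

```python
import struct
from enum import IntEnum


class LinkerOption(IntEnum):
    # Hypothetical option numbers, chosen only for this example.
    ADD_LIBRARY = 1
    ADD_SEARCH_PATH = 2


def encode_pair(option, argument):
    """Encode one (option-number, argument) pair."""
    return struct.pack("<I", int(option)) + argument.encode("utf-8") + b"\x00"


def decode_pairs(data):
    """Decode a run of pairs; an undocumented option number raises ValueError."""
    pairs = []
    offset = 0
    while offset < len(data):
        (number,) = struct.unpack_from("<I", data, offset)
        end = data.index(b"\x00", offset + 4)  # terminator of the argument
        argument = data[offset + 4:end].decode("utf-8")
        pairs.append((LinkerOption(number), argument))
        offset = end + 1
    return pairs
```

With this shape, a consumer rejects any option number that is not in the documented set rather than having to interpret a free-form string.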

The format should also be discussed at least with the gnu linker
maintainers and ideally on generic-abi@googlegroups.com.

Cheers,
Rafael

Saleem Abdulrasool via llvm-dev <llvm-dev@lists.llvm.org> writes:

> Hello all,
>
> There was some interest from a number of a few people about adding support
> for embedded linker options to ELF. This would be an extension that
> requires linker support to actually work, but has significant prior art
> with PE/COFF as well as MachO both having support for this.
>
> The desire here is to actually add support to LLVM to pass along the
> necessary information into the object file. In order to keep this focused
> on that, this thread is specifically for the *backend*, we are not
> discussing how to get the information to the backend here at all, but
> assuming that the information is present in the same LLVM IR encoding
> (llvm.linker-options module metadata string).
>
> In order to have compatibility with existing linkers, I am suggesting the
> use of an ELF note. These are implicitly dropped by the linker so we can
> be certain that the options will not end up in the final binary even if the
> extension is not supported.

So far LGTM.

> The payload would be a 4-byte version
> identifier (to allow future enhancements) and a null-terminated string of
> options.
>
> This allows for the backend to be entirely oblivious to the data as the
> other backends and allows for extensions in the future without having to
> teach the backend anything about the new functionality (again, something
> which both of the other file formats support).

That is the part I have issues with.

A linker has a lot of command line options, we should not make them part
of the object format. What happens if a file has a -strip-all,
--gc-sections or -pie?

I think the supported features should be explicitly documented and be
encoded a option-number,argument pair.

So you are suggesting that the backend take the opaque blob, peer through
it, map it to something else and then encode that? This means that every
single new flag (also consider vendor extensions and non-GNU linkers) would
need their own mapping and would need additional support for every single
variant of an option. This makes adding support for a flag extremely
expensive IMO. Filtering options or only supporting a subset of them in a
particular linker is still viable in the approach I have proposed.
Emitting a warning on an unsupported flag and dropping it would be a way to
handle this without adding the complexity in the backend and frontend and
so tightly coupling all the various pieces of the toolchain.

The format should also be discussed at least with the gnu linker
maintainers and ideally on generic-abi@googlegroups.com.

Sure, sending that along to them is perfectly reasonable.

Cheers,

Saleem Abdulrasool <compnerd@compnerd.org> writes:

So you are suggesting that the backend take the opaque blob, peer through
it, map it to something else and then encode that?

The llvm backend? No, it should probably be done by whatever produced
the IR. If we view this as part of the file format, having the FE create
a metadata asking for (add_lib_enum_value, "foo.a") is not too different
than asking for a particular visibility or dll import.

This means that every
single new flag (also consider vendor extensions and non-GNU linkers) would
need their own mapping and would need additional support for every single
variant of an option. This makes adding support for a flag extremely
expensive IMO.

In some sense that is the idea: forcing each feature to be documented
and discussed. What feature other than "add that lib" do you have in mind?

Cheers,
Rafael

Hello all,

There was some interest from a number of a few people about adding support
for embedded linker options to ELF. This would be an extension that
requires linker support to actually work, but has significant prior art
with PE/COFF as well as MachO both having support for this.

The desire here is to actually add support to LLVM to pass along the
necessary information into the object file. In order to keep this focused
on that, this thread is specifically for the *backend*, we are not
discussing how to get the information to the backend here at all, but
assuming that the information is present in the same LLVM IR encoding
(llvm.linker-options module metadata string).

Do we have agreement about this assumption? I think one main point of
disagreement is how open-ended we want things, and llvm.linker.options
implies, at least at first glance, a very open-ended approach. Can you
describe how open-ended llvm.linker.options is, in fact/in practice? I.e.
what subset of linker options do the COFF/MachO targets actually support?

In order to have compatibility with existing linkers, I am suggesting the
use of an ELF note. These are implicitly dropped by the linker so we can
be certain that the options will not end up in the final binary even if the
extension is not supported. The payload would be a 4-byte version
identifier (to allow future enhancements) and a null-terminated string of
options.

One thing that is on my mind is that the fact that llvm.linker.options is
metadata means it can be dropped. So, playing devil's advocate here, it is
correct for ELF targets to just ignore it (as they currently do AFAIK). If
your intended use case does not actually behave correctly with the
llvm.linker.options dropped, then that suggests that something is fishy.

I guess the fact that llvm.linker.options is currently used for COFF/MachO
suggests that it is not dropped in practice in the situations that matter,
but it does provide some evidence that we may want to move away from
llvm.linker.options. For example, we could let frontends parse legacy
open-ended linker pragmas and emit a new IR format with constrained
semantics.

(also, do you know if the COFF/MachO representation for the linker options
in the .o file can/cannot be dropped? AFAIK, SHT_NOTE can be ignored)

I just find it very fishy to consider linker options as advisory.
Presumably if a user is passing them, then they are required for
correctness.

-- Sean Silva

We should do this. ELF is the odd duck out that lacks this capability.

I agree with Rafael we should have a whitelist of flags that we support, but I’d rather leave the syntax as more or less just a response file. That’s basically what’s implemented for COFF.

Hello all,

There was some interest from a number of a few people about adding
support for embedded linker options to ELF. This would be an extension
that requires linker support to actually work, but has significant prior
art with PE/COFF as well as MachO both having support for this.

The desire here is to actually add support to LLVM to pass along the
necessary information into the object file. In order to keep this focused
on that, this thread is specifically for the *backend*, we are not
discussing how to get the information to the backend here at all, but
assuming that the information is present in the same LLVM IR encoding
(llvm.linker-options module metadata string).

Do we have agreement about this assumption? I think one main point of
disagreement is how open-ended we want things, and llvm.linker.options
implies, at least at first glance, a very open-ended approach. Can you
describe how open-ended llvm.linker.options is, in fact/in practice? I.e.
what subset of linker options do the COFF/MachO targets actually support?

This is something which is already well established. I'm not proposing to
change that. llvm.linker.options is something which is a string passed by
the frontend. There is nothing that is interpreted by either side. This
is not new metadata that I am introducing, it is the existing
infrastructure. Now, if you like, a newer more restrictive mechanism could
be introduced, but that would be beyond the scope of this change IMO. Both
of those do not have any restrictions AFAIK; and any control of what they
permit is from the *frontend* side.

In order to have compatibility with existing linkers, I am suggesting the
use of an ELF note. These are implicitly dropped by the linker so we can
be certain that the options will not end up in the final binary even if the
extension is not supported. The payload would be a 4-byte version
identifier (to allow future enhancements) and a null-terminated string of
options.

One thing that is on my mind is that the fact that llvm.linker.options is
metadata means it can be dropped. So, playing devil's advocate here, it is
correct for ELF targets to just ignore it (as they currently do AFAIK). if
your intended use case does not actually behave correctly with the
llvm.linker.options dropped, then that suggests that something is fishy.

Yeah, the metadata can be dropped. If the metadata is dropped, then
nothing gets embedded into the object file. The changes being discussed
would be embedding additional metadata into the object file. A separate
change would be needed to actually process that in the linker as well as
one to the frontend to actually emit the metadata.

I guess the fact that llvm.linker.options is currently used for COFF/MachO
suggests that it is not dropped in practice in the situations that matter,
but it does provide some evidence that we may want to move away from
llvm.linker.options. For example, we could let frontends parse legacy
open-ended linker pragmas and emit a new IR format with constrained
semantics.

Right, this is in line with what I was suggesting as a second mechanism for
this that could be designed. But, again, that is beyond the scope of the
changes that I am proposing.

(also, do you know if the COFF/MachO representation for the linker
options in the .o file can/cannot be dropped? AFAIK, SHT_NOTE can be
ignored)

Well, the content is only in the object files. The final linked binary
does not contain it (which is why I'm abusing the SHT_NOTE). Do you mean
does the linker ignore it? Well, if the linker doesn't support the
feature, it would. In PE/COFF, it is encoded as a special section
(.drectve). In fact, GNU ld doesn't have as complete an implementation
as lld/link and does ignore a bunch of options. MachO has a special load
command (LC_LINKER_OPTION) that encodes this. But, in both cases, it requires
the linker to interpret it, and if it does not, then the same behavior
would be observed.

I just find it very fishy to consider linker options as advisory.
Presumably if a user is passing them, then they are required for
correctness.

Sure, but that failure would generally be pretty obvious: linking would
fail.

We should do this. ELF is the odd duck out that lacks this capability.

Exactly, and the amount of work that swift goes through to accommodate this
is silly.

I agree with Rafael we should have a whitelist of flags that we support,
but I'd rather leave the syntax as more or less just a response file.
That's basically what's implemented for COFF.

This is precisely what I have been proposing: we pass through everything as
if it were a response file, and the linker can whitelist the flags.
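As a sketch of the linker-side behaviour described here, the following Python treats the embedded blob like a response file and keeps only whitelisted flags. The allowlist contents and the function name are hypothetical, and `shlex.split` (POSIX shell-style tokenization) merely stands in for whatever response-file parsing a real linker uses.

```python
import shlex

# Hypothetical allowlist: only library and search-path options are accepted.
ALLOWED_PREFIXES = ("-l", "-L")


def filter_embedded_options(blob):
    """Split a response-file-like string and partition it by the allowlist."""
    accepted, rejected = [], []
    for token in shlex.split(blob):
        if token.startswith(ALLOWED_PREFIXES):
            accepted.append(token)
        else:
            rejected.append(token)  # a linker could warn or error on these
    return accepted, rejected
```

Prefix matching is deliberately naive: it would also accept any other flag that happens to begin with `-l`, and it does not handle the separated form `-l m`. A real implementation would match against its own option table.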

Saleem Abdulrasool <compnerd@compnerd.org> writes:

> So you are suggesting that the backend take the opaque blob, peer through
> it, map it to something else and then encode that?

The llvm backend? No, it should probably be done by whatever produced
the IR. If viewing this a part of the file format, having the FE create
a metadata asking for (add_lib_enum_value, "foo.a") is not too different
than than asking for a particular visibility or dll import.

Okay, so, that is a different conversation. I'm discussing purely the
encoding from LLVM IR -> object file. The frontend changes would still
need to be made, and those can be discussed at that point. I'm suggesting
that we treat the current behavior similar to the PE/COFF and MachO
mechanism: it is effectively a response file.

> This means that every
> single new flag (also consider vendor extensions and non-GNU linkers) would
> need their own mapping and would need additional support for every single
> variant of an option. This makes adding support for a flag extremely
> expensive IMO.

Is some sense that is the idea: forcing each feature to be documented
and discussed. What feature other than "add that lib" do you have in mind?

What about trying to do something like the "-framework" option (thinking
along the various things that swift has to deal with, which don't exist
yet) or the embedded AST modules (for swift debugging) or additional search
paths.

Hello all,

There was some interest from a number of a few people about adding support
for embedded linker options to ELF. This would be an extension that
requires linker support to actually work, but has significant prior art
with PE/COFF as well as MachO both having support for this.

The desire here is to actually add support to LLVM to pass along the
necessary information into the object file. In order to keep this focused
on that, this thread is specifically for the *backend*, we are not
discussing how to get the information to the backend here at all, but
assuming that the information is present in the same LLVM IR encoding
(llvm.linker-options module metadata string).

In order to have compatibility with existing linkers, I am suggesting the
use of an ELF note. These are implicitly dropped by the linker so we can
be certain that the options will not end up in the final binary even if the
extension is not supported. The payload would be a 4-byte version
identifier (to allow future enhancements) and a null-terminated string of
options.

This allows for the backend to be entirely oblivious to the data as the
other backends and allows for extensions in the future without having to
teach the backend anything about the new functionality (again, something
which both of the other file formats support).

As an example of how this can be useful, it would help with Swift support
on Linux where currently the linker options are pushed into a custom
section, and a secondary program is used to extract the options from this
section prior to the linker being invoked. This adds a lot of complexity
to the driver as well as additional tools being invoked in the build chain
slowing down the build.

I realized that I had missed the link to the change that I had put up due
to the original conversation. It is available as D40849 (CodeGen: support
an extension to pass linker options on ELF) for those who are interested.

Hello all,

There was some interest from a number of a few people about adding
support for embedded linker options to ELF. This would be an extension
that requires linker support to actually work, but has significant prior
art with PE/COFF as well as MachO both having support for this.

The desire here is to actually add support to LLVM to pass along the
necessary information into the object file. In order to keep this focused
on that, this thread is specifically for the *backend*, we are not
discussing how to get the information to the backend here at all, but
assuming that the information is present in the same LLVM IR encoding
(llvm.linker-options module metadata string).

Do we have agreement about this assumption? I think one main point of
disagreement is how open-ended we want things, and llvm.linker.options
implies, at least at first glance, a very open-ended approach. Can you
describe how open-ended llvm.linker.options is, in fact/in practice? I.e.
what subset of linker options do the COFF/MachO targets actually support?

This is something which is already well established. I'm not proposing to
change that. llvm.linker.options is something which is a string passed by
the frontend. There is nothing that is interpreted by either side. This
is not new metadata that I am introducing, it is the existing
infrastructure. Now, if you like, a newer more restrictive mechanism could
be introduced, but that would be beyond the scope of this change IMO. Both
of those do not have any restrictions AFAIK; and any control of what they
permit is from the *frontend* side.

I agree that introducing a new format with constrained semantics is
probably beyond the scope of what you're trying to do now.

Personally, I think that if we can get ELF linker developers to support the
open-ended format similar to COFF/MachO then we should move forward with
this. Otherwise we are just implementing dead code.

I'm especially interested in getting Rui's SGTM on this since his
experience with COFF should be very informative.

-- Sean Silva

Thank you for starting the discussion thread.

In general I’m in favor of the proposal. Defining a generic way to convey some information from the compiler to the linker is useful, and it looks like it is just a historical reason that the ELF lacks the feature at the moment.

This is a scenario in which the feature is useful: when you include math.h, a compiler (driven by some pragma) could add -lm to the note section so that a linker automatically links libm.

I think I’m also in favor of the format, which is essentially runs of null-terminated strings (*1) that are basically opaque to compilers.

However, you should define as a spec what options are allowed and what their semantics are. We should not accept arbitrary linker options because semantics of some linker options cannot be clearly defined when they appear as embedded options. Just saying “this feature allows you to embed linker options to object files” is too weak as a specification. You need to clearly define a list of options that will be supported by linkers with their clear semantics.

(*1) One of the big annoyances that I noticed when I was implementing the same feature for COFF is that COFF’s .drectve section, which contains the linker options, has to be tokenized in the same way as the Windows command line. So it needs to interpret double quotes and backslashes correctly, especially when handling pathnames that contain spaces. It is a design failure that a COFF file contains just a single string instead of runs of strings that have already been tokenized.
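To illustrate the difference, here is a small Python sketch of reading a section made of already-tokenized, null-terminated strings. The function name and the sample bytes are made up for the example; the point is only that no quote or backslash handling is needed.

```python
def split_nul_terminated(section):
    """Split a run of null-terminated strings into a list of options."""
    if not section:
        return []
    if not section.endswith(b"\x00"):
        raise ValueError("section is not null-terminated")
    # Drop the final terminator so the split does not yield a trailing "".
    return [s.decode("utf-8") for s in section[:-1].split(b"\x00")]


# A path containing a space survives intact because it is its own string.
pre_tokenized = b"-L/opt/my libs\x00-lm\x00"

# The same options as one flat string fall apart under naive splitting,
# which is why the single-string form needs quote-aware tokenization.
flat = "-L/opt/my libs -lm"
```

Here `split_nul_terminated(pre_tokenized)` yields two options, while `flat.split()` yields three tokens and breaks the path in half.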

Thank you for starting the discussion thread.

In general I’m in favor of the proposal. Defining a generic way to convey some information from the compiler to the linker is useful, and it looks like it is just a historical reason that the ELF lacks the feature at the moment.

This is a scenario in which the feature is useful: when you include math.h, a compiler (which is driven by some pragma) could added -lm to the note section so that a linker automatically links libm.

I think I’m also in favor of the format, which is essentially runs of null-terminated strings (*1) that are basically opaque to compilers.

Yes. However, I want to clarify that we want this to be completely opaque to the backend. The front end could possibly have some enhancements to make this better. But that will be a separate change, and that discussion should take place then. We shouldn’t paint ourselves into a corner. Basically, I think that there are some legitimate concerns here, but they would not be handled at this layer, but above.

However, you should define as a spec what options are allowed and what their semantics are. We should not accept arbitrary linker options because semantics of some linker options cannot be clearly defined when they appear as embedded options. Just saying “this feature allows you to embed linker options to object files” is too weak as a specification. You need to clearly define a list of options that will be supported by linkers with their clear semantics.

Personally, I would like to see the ability to add support for additional options without having to modify the compiler. That said, I think that there are options which can be scary (e.g. -nopie). I think that the linker should make the decision of what it supports and error out on others. This allows for us to enhance the support over time without a huge overhead. As a starting point, I think that -l and -L are two that would be interesting. I can see -u being useful as well, but the point is that we can slowly grow the support after consideration by delaying the validation of the options.

(*1) One of the big annoyances that I noticed when I was implementing the same feature for COFF is that the COFF’s .drctve section that contains linker options have to be tokenized in the same way as the Windows command line does. So it needs to interpret double quotes and backslashes correctly especially when handling space-containing pathnames. This is a design failure that a COFF file contains just a single string instead of runs of strings that have already been tokenized.

I think that there is room for refinement on this. The best part is that the refinement for that is delayed! It would be best done in the front end IMO, and we can actually further discuss and design the feature there. I don’t think that we need to be completely blind to the issues we have seen, but we shouldn’t over constrain either.

In general I'm in favor of the proposal. Defining a generic way to convey
some information from the compiler to the linker is useful, and it looks
like it is just a historical reason that the ELF lacks the feature at the
moment.

This is a scenario in which the feature is useful: when you include
math.h, a compiler (which is driven by some pragma) could added `-lm` to the
note section so that a linker automatically links libm.

I agree that this would be a very useful addition to ELF. I've always
wanted to reach the point where you could just type "ld main.o" and
have all the dependencies automatically linked in. (Go kind of
achieves this, I think.)

I'm not in favor of using yet another note section, however. SHT_NOTE
sections are intended for the use of "off-axis" tools, not for
something the linker would need to look for. I don't want to have the
linker parsing individual note entries looking for notes of interest
to itself, and then having to decide whether to edit those entries out
of the larger section, or merge them together. And I also don't want
to key off of individual section names -- the linker is not supposed
to have to care about the section name. There should be a new section
type for this feature. This is the kind of extension that ELF was
designed for.

I think I'm also in favor of the format, which is essentially runs of
null-terminated strings (*1) that are basically opaque to compilers.

Yes. However, I think I want to clarify that we want this to be completely
opaque to the backend. The front end could possibly have some enhancements
to make this better. But, that will be a separate change, and that
discussion should take place then. We shouldn’t paint ourselves into a
corner. Basically, I think that there is some legitimate concerns here, but
they would not be handled at this layer, but above.

However, you should define as a spec what options are allowed and what
their semantics are. We should not accept arbitrary linker options because
semantics of some linker options cannot be clearly defined when they appear
as embedded options. Just saying "this feature allows you to embed linker
options to object files" is too weak as a specification. You need to clearly
define a list of options that will be supported by linkers with their clear
semantics.

Personally, I would like to see the ability to add support for additional
options without having to modify the compiler. That said, I think that
there are options which can be scary (e.g. -nopie). I think that the linker
should make the decision of what it supports and error out on others. This
allows for us to enhance the support over time without a huge overhead. As
a starting point, I think that -l and -L are two that would be interesting.
I can see -u being useful as well, but the point is that we can slowly grow
the support after consideration by delaying the validation of the options.

(*1) One of the big annoyances that I noticed when I was implementing the
same feature for COFF is that the COFF's .drctve section that contains
linker options have to be tokenized in the same way as the Windows command
line does. So it needs to interpret double quotes and backslashes correctly
especially when handling space-containing pathnames. This is a design
failure that a COFF file contains just a single string instead of runs of
strings that have already been tokenized.

I too would like to keep the linker from having to tokenize the
strings. I kind of agree with Rafael that there should be defined tags
and values, much like a .dynamic section, but I wouldn't want to have
values pointing to strings in yet another section, so I'd prefer
something in between that and a free-form null-terminated string. I
also wouldn't want to open up the complete list of linker options, so
I'd prefer a defined list of tags in string form that could easily be
augmented without additional backend support. We could start with,
perhaps, "lib" to inject a library (a la "-l"), "file" to inject an
object file by full name, and "path" to provide a search path (a la
"-L"). I don't think an equivalent for "-u" would be needed, since the
compiler can simply generate an undef symbol for that case. For the
section format, I'd suggest a series of null-terminated strings,
alternating between tags and values, so that no quote or escape
parsing is necessary.

For the header files, a simple syntax like

   #pragma linker_directive "lib" "m"

would provide the extensibility needed to add new tags with no
additional support in the front end or back end.
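The suggested section format, null-terminated strings alternating between tags and values, could be sketched in Python roughly as follows. The function names are hypothetical and the tags are just the ones floated above; no section type or tag list has actually been defined.

```python
def encode_directives(pairs):
    """Encode (tag, value) pairs as alternating null-terminated strings."""
    out = bytearray()
    for tag, value in pairs:
        for field in (tag, value):
            if "\x00" in field:
                raise ValueError("fields must not contain a null byte")
            out += field.encode("utf-8") + b"\x00"
    return bytes(out)


def decode_directives(section):
    """Decode a section back into (tag, value) pairs without any tokenizing."""
    if not section:
        return []
    if not section.endswith(b"\x00"):
        raise ValueError("section is not null-terminated")
    fields = [f.decode("utf-8") for f in section[:-1].split(b"\x00")]
    if len(fields) % 2 != 0:
        raise ValueError("tag without a matching value")
    return list(zip(fields[0::2], fields[1::2]))
```

Because tags are plain strings, a linker can ignore or reject tags it does not know, and new tags need no changes to this encoding.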

-cary

In general I’m in favor of the proposal. Defining a generic way to convey
some information from the compiler to the linker is useful, and it looks
like it is just a historical reason that the ELF lacks the feature at the
moment.

This is a scenario in which the feature is useful: when you include
math.h, a compiler (which is driven by some pragma) could added -lm to the
note section so that a linker automatically links libm.

Glad to have you chime in; I know that you have quite a bit of experience from binutils and gold. I really would love to see this support implemented there too, and having your input is certainly valuable.

I agree that this would be a very useful addition to ELF. I’ve always
wanted to reach the point where you could just type “ld main.o” and
have all the dependencies automatically linked in. (Go kind of
achieves this, I think.)

Excellent! I think that everyone agrees that this is a useful extension to add.

I’m not in favor of using yet another note section, however. SHT_NOTE
sections are intended for the use of “off-axis” tools, not for
something the linker would need to look for. I don’t want to have the
linker parsing individual note entries looking for notes of interest
to itself, and then having to decide whether to edit those entries out
of the larger section, or merge them together. And I also don’t want
to key off of individual section names – the linker is not supposed
to have to care about the section name. There should be a new section
type for this feature. This is the kind of extension that ELF was
designed for.

I’m really not tied to the note approach for implementing this. I am (admittedly) abusing the notes because of a couple of their behavioral aspects. The main thing to realize is that this information is embedded into the object files that are built. The information should be processed by the linker and then discarded; none of it should be in the final binary (unless it is a relocatable link). I’m concerned about linkers which do not support this feature preserving the contents. Now, this could very well be a misconception on my part. If that is the case, then I would say that this needs to be entirely reworked, because then adding the section sounds much nicer.

I think I’m also in favor of the format, which is essentially runs of
null-terminated strings (*1) that are basically opaque to compilers.

Yes. However, I think I want to clarify that we want this to be completely
opaque to the backend. The front end could possibly have some enhancements
to make this better. But, that will be a separate change, and that
discussion should take place then. We shouldn’t paint ourselves into a
corner. Basically, I think that there is some legitimate concerns here, but
they would not be handled at this layer, but above.

However, you should define as a spec what options are allowed and what
their semantics are. We should not accept arbitrary linker options because
semantics of some linker options cannot be clearly defined when they appear
as embedded options. Just saying “this feature allows you to embed linker
options to object files” is too weak as a specification. You need to clearly
define a list of options that will be supported by linkers with their clear
semantics.

Personally, I would like to see the ability to add support for additional
options without having to modify the compiler. That said, I think that
there are options which can be scary (e.g. -nopie). I think that the linker
should make the decision of what it supports and error out on others. This
allows for us to enhance the support over time without a huge overhead. As
a starting point, I think that -l and -L are two that would be interesting.
I can see -u being useful as well, but the point is that we can slowly grow
the support after consideration by delaying the validation of the options.

(*1) One of the big annoyances that I noticed when I was implementing the
same feature for COFF is that COFF’s .drctve section, which contains the
linker options, has to be tokenized the same way the Windows command line
is. So the linker needs to interpret double quotes and backslashes correctly,
especially when handling space-containing pathnames. This is a design
failure that a COFF file contains just a single string instead of runs of
strings that have already been tokenized.

I too would like to keep the linker from having to tokenize the
strings. I kind of agree with Rafael that there should be defined tags
and values, much like a .dynamic section, but I wouldn’t want to have
values pointing to strings in yet another section, so I’d prefer
something in between that and a free-form null-terminated string. I
also wouldn’t want to open up the complete list of linker options, so
I’d prefer a defined list of tags in string form that could easily be
augmented without additional backend support. We could start with,
perhaps, “lib” to inject a library (a la “-l”), “file” to inject an
object file by full name, and “path” to provide a search path (a la
“-L”). I don’t think an equivalent for “-u” would be needed, since the
compiler can simply generate an undef symbol for that case. For the
section format, I’d suggest a series of null-terminated strings,
alternating between tags and values, so that no quote or escape
parsing is necessary.
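
To make the alternating tag/value layout concrete, here is a minimal sketch of how a consumer might walk such a section payload. The function name `parseDirectives` and the `Directive` alias are hypothetical, and the tags used in the example ("lib", "path") are only the ones floated above, not a settled list.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (tag, value), e.g. ("lib", "m"). Hypothetical type for illustration.
using Directive = std::pair<std::string, std::string>;

// Hypothetical helper: splits a payload made of null-terminated strings
// into (tag, value) pairs. Returns false if the payload is malformed,
// i.e. the last string is unterminated or a tag has no value.
bool parseDirectives(const std::string &payload, std::vector<Directive> &out) {
  std::vector<std::string> strings;
  std::size_t pos = 0;
  while (pos < payload.size()) {
    std::size_t end = payload.find('\0', pos);
    if (end == std::string::npos)
      return false; // last string is missing its terminator
    strings.push_back(payload.substr(pos, end - pos));
    pos = end + 1;
  }
  if (strings.size() % 2 != 0)
    return false; // a tag without a matching value
  out.clear();
  for (std::size_t i = 0; i < strings.size(); i += 2)
    out.emplace_back(strings[i], strings[i + 1]);
  return true;
}
```

Because every string is already delimited by its terminator, a value such as "/opt/my libs" survives intact without any quote or escape handling.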

Sounds like we agree on the direction: we don’t want the backend to be involved in adding new options, and we don’t think that all options make sense, but we still want to be able to add options. As to the -u option, I’m thinking about cases where an unreferenced symbol needs to be preserved under --gc-sections when building with -ffunction-sections and/or -fdata-sections.

So, after discussing some of the items, we ended up somewhere in between. My current proposal is a semi-pre-tokenized linker response file. Basically, each option/parameter “pair” would be a single string entry in an array of string values. The only difference is that, instead of TLV entries, each entry is simply the raw option string. My resistance to the TLV really is driven more by the LLVM IR (which I suppose is possible to alter):

https://llvm.org/docs/LangRef.html#automatic-linker-flags-named-metadata

For the header files, a simple syntax like

#pragma linker_directive "lib" "m"

would provide the extensibility needed to add new tags with no
additional support in the front end or back end.

I really wish to avoid this discussion right now. I am happy to loop you into a subsequent thread discussing that. I figure that this will be a much more contentious issue as syntax is something everyone has differing opinions on. I’m trying to split this work into three distinct pieces: the frontend support to emit the information, the backend to emit this into the object, and the linker to use it.

As an aside, personally, I was thinking more along the lines of #pragma comment(lib, "m").

Wouldn’t a special section type trigger an “unrecognized section type” error for linkers that don’t support it?

– Sean Silva

Thank you for starting the discussion thread.

In general I'm in favor of the proposal. Defining a generic way to convey
some information from the compiler to the linker is useful, and it looks
like ELF lacks the feature at the moment only for historical reasons.

This is a scenario in which the feature is useful: when you include
math.h, a compiler (which is driven by some pragma) could add `-lm` to
the note section so that a linker automatically links libm.

I think I'm also in favor of the format, which is essentially runs of
null-terminated strings (*1) that are basically opaque to compilers.

Yes. However, I think I want to clarify that we want this to be
completely opaque to the backend. The front end could possibly have some
enhancements to make this better. But, that will be a separate change, and
that discussion should take place then. We shouldn’t paint ourselves into
a corner. Basically, I think that there are some legitimate concerns here,
but they would be handled not at this layer, but above.

However, you should define as a spec what options are allowed and what
their semantics are. We should not accept arbitrary linker options because
semantics of some linker options cannot be clearly defined when they appear
as embedded options. Just saying "this feature allows you to embed linker
options to object files" is too weak as a specification. You need to
clearly define a list of options that will be supported by linkers with
their clear semantics.

Personally, I would like to see the ability to add support for additional
options without having to modify the compiler. That said, I think that
there are options which can be scary (e.g. -nopie). I think that the
linker should make the decision of what it supports and error out on
others. This allows for us to enhance the support over time without a huge
overhead. As a starting point, I think that -l and -L are two that would
be interesting. I can see -u being useful as well, but the point is that
we can slowly grow the support after consideration by delaying the
validation of the options.

I think no one including me opposed the idea of designing the feature so
that new options can be added without having to modify the compiler. That's
fine with me.

What I said is that you still have to specify a list of options that are
allowed in the "linker options" section in your *specification*. Your
specification is incomplete and underspecified if it doesn't explicitly
list linker options with their semantics. If you don't do that, people
will start using arbitrary linker options that are vague in their
semantics, dangerous, or simply wrong (e.g. an unclosed `-whole-archive`
option) once you implement the feature. We really shouldn't let that mess
happen.

Concretely, which options do you want to use? I could imagine everyone
wants to use `-l`. `-L` is arguable because I do not see an obvious reason
to add a search path from the compiler. You mentioned `-u`. Is there any
other flag that you have in mind?

(*1) One of the big annoyances that I noticed when I was implementing the

Excellent. I think that the TLV approach requires that to a certain extent (the enumeration would be hardcoded).

I think we all agree that blindly allowing the linker to honor the options would be scary. I agree that we should whitelist the options, and am of the opinion that we should force validation on the linker side (use of any option which the linker doesn’t support in this form can be fatal). Starting small is the best way, with -l and -L as a starting point. I want to retain the ability to add additional options which may not be available in all linkers. However, whitelisting obviously requires working with the linker as would adding such options, so that could be handled at that time.
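
As a rough illustration of linker-side validation, the sketch below accepts only -l and -L (the starting point suggested above) and reports the first option it does not recognize, so that the caller can turn it into a fatal error. The function names are hypothetical, and the allow-list is an assumption rather than an agreed specification.

```cpp
#include <string>
#include <vector>

// Assumption: the initial allow-list is just -l<name> and -L<dir>.
// A real linker would define and document its own list.
bool isAllowedEmbeddedOption(const std::string &opt) {
  // Require a non-empty argument after the two-character flag.
  return opt.size() > 2 &&
         (opt.compare(0, 2, "-l") == 0 || opt.compare(0, 2, "-L") == 0);
}

// Returns true if every option is allowed. Otherwise stores the first
// rejected option in `rejected` so the caller can report it and stop.
bool validateEmbeddedOptions(const std::vector<std::string> &opts,
                             std::string &rejected) {
  for (const std::string &opt : opts) {
    if (!isAllowedEmbeddedOption(opt)) {
      rejected = opt;
      return false;
    }
  }
  return true;
}
```

Keeping the check in the linker means the compiler can emit new options without changes, while each linker decides what it is willing to honor.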

Furthermore, note that the LLVM side of the implementation does not provide a way to add anything from the user. That is a change to clang, and is something that I intend to be a follow-up change. As I mentioned in a previous email, I have my own preferences for how to handle those, but I think the right time to discuss the syntax is when we start adding support for it to the frontend. If we go with the approach of tokenizing the options and emitting one C-string per option, I think it leaves the frontend enough flexibility to control exactly which arguments are emitted, and the linker does not have to worry about quoting and space handling when parsing the options. I believe that is a good compromise on all sides.
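
For reference, the producer side of the one-C-string-per-option idea could look roughly like the sketch below. `encodeOptions` is a hypothetical name, and the sketch covers only the string run itself, not any header or version field that the final format might carry.

```cpp
#include <string>
#include <vector>

// Hypothetical encoder: emits each pre-tokenized option followed by a
// NUL byte. Options are assumed not to contain embedded NUL bytes.
std::string encodeOptions(const std::vector<std::string> &opts) {
  std::string payload;
  for (const std::string &opt : opts) {
    payload += opt;
    payload.push_back('\0'); // terminator doubles as the separator
  }
  return payload;
}
```

An option such as "-L/opt/my libs" is written out byte for byte, so the space in the path needs no quoting on either side.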

I’m thinking about future enhancements. MachO does actually provide something like -L -l in a single go via -framework. But, no such option exists for ELF since it doesn’t have the concept of framework bundles (but the layout itself is interesting), and I just want to try to keep the door open for such features.

Wouldn’t a special section type trigger an “unrecognized section type” error for linkers that don’t support it?

Yeah, that is possible. Compatibility problems exist with ld64, and they are handled by means of -mlinker-version. I don’t know how others feel about bringing that flag over to other platforms. We could force the use of a new section and bring along -mlinker-version and base it on that, or silently drop the flags (which sounds slightly unexpected). Or we could abuse the notes and not have to worry about the compatibility problem (which is why my initial work went that route). Again, I’m not tied to the exact mechanism we use to provide compatibility, though my personal preference tends to be to go with the nicer solution for the longer term (which does feel like -mlinker-version plus a custom section, but I’m worried about the silent dropping of flags). Perhaps there is a better solution that I haven’t considered.

Btw, Apple has support for “Linker Options” as module flags. Is there a reason not to use a similar mechanism?

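#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"

// Appends one linker option to the "Linker Options" module flag.
// Each option is an MDNode of MDStrings, wrapped in an outer list node.
void addLinkerOptions(llvm::Module &M) {
#ifdef __APPLE__
  M.addModuleFlag(
      llvm::Module::AppendUnique, "Linker Options",
      llvm::MDNode::get(M.getContext(),
                        llvm::MDNode::get(M.getContext(),
                                          llvm::MDString::get(M.getContext(),
                                                              "-lcvrfi-rt"))));
#endif
}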