[lld] Verifying the Architecture of files read

Hi,

lld needs to verify the input files it is given.

For example, an x86 ELF file can be given to lld when the target is x86_64. The same applies to the other flavors.

I was thinking of having a varargs function in the LinkingContext that each of the LinkingContexts would override to verify files after they are read.

The reader would call the varargs function in the LinkingContext and raise an error if the input is not compatible with the current link mode.

Thanks

Shankar Easwaran

Why would it need to be varargs? Also, parse can just return an error that
specifies the format is wrong.

Specifically this would be a good place to use the user data part of
ErrorOr to specify what was expected and what was received.

- Michael Spencer

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
by the Linux Foundation

Why would it need to be varargs?

The LinkingContext for ELF would need fields from the ELF header to verify whether the file being read belongs to the current target architecture.

I am not sure how many fields would be needed to determine the same for Darwin and COFF.

This is why I thought it should be varargs.

Also, parse can just return an error that
specifies the format is wrong.

It would still need to call the LinkingContext to figure out if the format is associated with the target.

Specifically this would be a good place to use the user data part of
ErrorOr to specify what was expected and what was received.

I couldn't follow this. Can you elaborate?

Thanks

Shankar Easwaran

There are two different situations that need to be handled differently.
If the user specifies a path to an incompatible file, it should create
an error. If the search for a library finds an incompatible file, it
should be skipped and a non-fatal warning might be emitted.
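
A minimal sketch of the two situations described above, assuming hypothetical names (`InputOrigin`, `Disposition`, `classifyInput`) that are not part of lld, and architectures reduced to plain strings:

```cpp
#include <string>

// Hypothetical names for illustration only; none of these are lld APIs.
enum class InputOrigin { CommandLine, LibrarySearch };
enum class Disposition { Accept, SkipWithWarning, Error };

// Decide what to do with an input file given how the linker came across it.
inline Disposition classifyInput(InputOrigin origin,
                                 const std::string &fileArch,
                                 const std::string &targetArch) {
  if (fileArch == targetArch)
    return Disposition::Accept;
  // The user named this file explicitly, so a mismatch is a hard error.
  if (origin == InputOrigin::CommandLine)
    return Disposition::Error;
  // The file was found while searching library paths: skip it and keep
  // searching, possibly after emitting a non-fatal warning.
  return Disposition::SkipWithWarning;
}
```

The caller would have to record how each file was found (explicit path or library search) for this to work.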

Joerg

Yes. We need a way to error out if there is an architecture mismatch. But there are some interesting scenarios we need to support.

* If linking with a static library, you may not know until you actually need to load one of the members if the architecture is wrong, and it may not be an error if the architecture is wrong, but nothing is loaded.

* It might be a warning instead of an error to link against a shared library of the wrong architecture. That is, the linker may need to ignore (and warn) but continue and try to complete the link without it.

* The mach-o linker also allows you to not specify the architecture on the command line. Instead the linker infers the architecture by looking at the first object file. This is mostly used in -r mode. So the place where the arch is checked may actually be the place where the architecture in the LinkingContext gets set.

* mach-o also has “fat” files which can contain multiple architectures. So, the reader needs to know the arch to even try to parse. In other words, if the Reader is told the intended arch, the Reader could error out if the file is not of that arch (and for mach-o the Reader would select the right slice in a fat file).
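
The last point, picking the right slice out of a fat file, could look roughly like the sketch below. The `Slice` struct and `selectSlice` function are hypothetical and heavily simplified. A real reader would decode the fat header rather than receive a ready-made list.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of a fat (universal) file:
// one entry per architecture slice.
struct Slice {
  std::string arch; // e.g. "i386", "x86_64"
  uint64_t offset;  // where the slice starts inside the fat file
  uint64_t size;    // length of the slice in bytes
};

// Returns the slice for the intended architecture, or nullptr if the fat
// file does not contain one (the caller can then warn or error out).
inline const Slice *selectSlice(const std::vector<Slice> &slices,
                                const std::string &wantedArch) {
  for (const Slice &s : slices)
    if (s.arch == wantedArch)
      return &s;
  return nullptr;
}
```

The function cannot be called at all without the intended arch, which is why the Reader has to be told the arch before parsing.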

-Nick

Yes. We need a way to error out if there is an architecture mismatch. But there are some interesting scenarios we need to support.

OK, I will create a varargs function (verifyArch?).

I am also looking at whether variadic functions would be another alternative.

* If linking with a static library, you may not know until you actually need to load one of the members if the architecture is wrong, and it may not be an error if the architecture is wrong, but nothing is loaded.

* It might be a warning instead of an error to link against a shared library of the wrong architecture. That is, the linker may need to ignore (and warn) but continue and try to complete the link without it.

* The mach-o linker also allows you to not specify the architecture on the command line. Instead the linker infers the architecture by looking at the first object file. This is mostly used in -r mode. So the place where the arch is checked may actually be the place where the architecture in the LinkingContext gets set.

For lld, I think the flavor would also need to be inferred from the first object, wouldn't it?

* mach-o also has “fat” files which can contain multiple architectures. So, the reader needs to know the arch to even try to parse. In other words, if the Reader is told the intended arch, the Reader could error out if the file is not of that arch (and for mach-o the Reader would select the right slice in a fat file).

Since all of this code ends up within the parseFile function in the Reader, we should be able to query the LinkingContext and return an actual error or warning as needed, and only in the scenarios where it applies.

Thanks

Shankar Easwaran

Why would it need to be varargs?

The LinkingContext for ELF would need fields from the ELF header to verify
whether the file being read belongs to the current target architecture.

I am not sure of how many fields would determine the same for Darwin and
COFF.

This is the reason I thought it should be varargs.

The reader has an {ELF,COFF,Darwin}LinkingContext. It can just ask the
context.

Also, parse can just return an error that
specifies the format is wrong.

It would still need to call the LinkingContext to figure out if the format
is associated with the target.

Exactly.

Specifically this would be a good place to use the user data part of
ErrorOr to specify what was expected and what was received.

I couldn't follow this. Can you elaborate?

ErrorOr supports user data. See unittests/Support/ErrorOrTest.cpp.

It would simply be: return ArchMismatch(expected, actual);

Then an error handler higher up can extract that to form a proper error
message.
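
To make the idea concrete, here is a minimal sketch of an error value that carries the expected and actual architecture. It deliberately does not use `llvm::ErrorOr`. `ParseResult`, `parseFile` and `describe` are hypothetical stand-ins, and only the name `ArchMismatch` comes from the message above.

```cpp
#include <string>

// Payload attached to the error: what the link expected and what the file was.
struct ArchMismatch {
  std::string expected;
  std::string actual;
};

// Hypothetical stand-in for an error-or-value result. It is NOT llvm::ErrorOr;
// it only mimics the idea of an error that carries user data.
struct ParseResult {
  bool ok;
  ArchMismatch mismatch; // meaningful only when ok == false
};

// Reader side: report the mismatch instead of a bare "wrong format" code.
inline ParseResult parseFile(const std::string &fileArch,
                             const std::string &targetArch) {
  if (fileArch != targetArch)
    return ParseResult{false, ArchMismatch{targetArch, fileArch}};
  return ParseResult{true, ArchMismatch{}};
}

// Handler higher up: extract the payload to form a proper error message.
inline std::string describe(const std::string &path, const ParseResult &r) {
  if (r.ok)
    return "";
  return path + ": architecture mismatch: expected " + r.mismatch.expected +
         ", got " + r.mismatch.actual;
}
```

In this sketch the reader never formats a message itself, so the wording and the error-versus-warning decision stay with the caller.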

- Michael Spencer

The reader has an {ELF,COFF,Darwin}LinkingContext. It can just ask the
context.

That would mean it is not part of the LinkingContext interface. I was thinking of a pure virtual function declared in LinkingContext, with every flavor's LinkingContext implementing it.
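
One way a pure virtual check could avoid varargs is to hand each context the raw header bytes and let it decode the fields it cares about. The sketch below assumes that approach. The class names mirror lld's, but the members and the `isCompatible` function are hypothetical. The ELF offsets (EI_CLASS at byte 4, EI_DATA at byte 5, e_machine at bytes 18-19) follow the standard ELF header layout.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical interface: every flavor implements the same check.
class LinkingContext {
public:
  virtual ~LinkingContext() {}
  // True if a file starting with these header bytes fits the current link.
  virtual bool isCompatible(const std::vector<uint8_t> &header) const = 0;
};

// ELF flavor: compares the file class and e_machine against the target.
class ELFLinkingContext : public LinkingContext {
public:
  ELFLinkingContext(uint8_t elfClass, uint16_t machine)
      : elfClass_(elfClass), machine_(machine) {}

  bool isCompatible(const std::vector<uint8_t> &h) const override {
    if (h.size() < 20 || h[0] != 0x7f || h[1] != 'E' || h[2] != 'L' ||
        h[3] != 'F')
      return false;                  // not an ELF file at all
    bool littleEndian = (h[5] == 1); // EI_DATA: 1 = LSB, 2 = MSB
    uint16_t machine = littleEndian ? uint16_t(h[18] | (h[19] << 8))
                                    : uint16_t((h[18] << 8) | h[19]);
    return h[4] == elfClass_ && machine == machine_;
  }

private:
  uint8_t elfClass_; // 1 = ELFCLASS32, 2 = ELFCLASS64
  uint16_t machine_; // e.g. 3 = EM_386, 62 = EM_X86_64
};
```

A COFF or Darwin context would implement the same function with its own header decoding, so the number of fields involved never shows up in the interface.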

ErrorOr supports user data. See unittests/Support/ErrorOrTest.cpp.

It would simply be: return ArchMismatch(expected, actual);

Then an error handler higher up can extract that to form a proper error
message.

Ok.

Thanks

Shankar Easwaran

Hi Nick, Bigcheese,

Resurrecting an old thread.

Now that we have a Registry that models Readers, do we want a function in the Registry that decides whether a file should be parsed into atoms or whether an appropriate error should be raised?

I would think the output architecture could be chosen from the first file that is parsed. Each flavor's LinkingContext should then store a field recording the architecture of the first input file that the linker tried to parse.

Thanks

Shankar Easwaran

Could you elaborate a bit about the issue that you are trying to solve with this suggestion?

Ruiu,

I am not sure if you looked at this thread (http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066155.html)

Let me know if you still have questions.

As a short summary, we don't verify the architecture of the files being read. We could very well be passed a Hexagon input file while the specified target is x86_64. We have to reject the input file, because the user has chosen x86_64 as the architecture.

Thanks

Shankar Easwaran

Moreover, within the MIPS architecture there are some incompatible
"sub-architectures". It would be nice if we could check input files and
reject incorrect combinations.

For a simple design, I was thinking that each flavor's LinkingContext could store input file information as the linker processes it, and choose to accept or reject files.

The only problem is that the input files are parsed in parallel. How do we want to deal with that?
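
One possible answer to the parallel-parsing question is a first-writer-wins record guarded by an atomic. The sketch below is only an illustration. `ArchRegistry` and `recordOrCheck` are hypothetical names, and architectures are reduced to non-zero integers.

```cpp
#include <atomic>

// Hypothetical sketch: the value 0 means "architecture not decided yet".
class ArchRegistry {
public:
  // Safe to call from several parser threads at once.
  // Returns true if `arch` is compatible with what has been recorded so far.
  bool recordOrCheck(int arch) {
    int expected = 0;
    // If nothing is recorded yet, this file fixes the architecture.
    if (arch_.compare_exchange_strong(expected, arch))
      return true;
    // Otherwise `expected` now holds the recorded value; compare against it.
    return expected == arch;
  }

  int arch() const { return arch_.load(); }

private:
  std::atomic<int> arch_{0};
};
```

Note that "first" here means the first thread to arrive, not the first file on the command line, so the outcome is only deterministic if the architecture is seeded from the command line before parsing starts.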

Thanks

Shankar Easwaran

The same goes for PE32 and PE32+. They cannot be mixed because they are for x86 and x86-64, respectively.

I’d think we can simply wait for all files to be parsed and pass them to a LinkingContext to ask whether or not the input file set seems OK.
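
A minimal sketch of this "check after everything is parsed" approach, assuming a hypothetical `ParsedFile` record and `findMismatches` helper (neither exists in lld):

```cpp
#include <string>
#include <vector>

// Hypothetical record kept for each input once parsing has finished.
struct ParsedFile {
  std::string path;
  std::string arch;
};

// Returns the paths of all files whose architecture differs from the target.
// An empty result means the input file set looks OK.
inline std::vector<std::string>
findMismatches(const std::vector<ParsedFile> &files,
               const std::string &targetArch) {
  std::vector<std::string> bad;
  for (const ParsedFile &f : files)
    if (f.arch != targetArch)
      bad.push_back(f.path);
  return bad;
}
```

Checking at the end allows every offending file to be reported in one run, at the cost of not stopping early.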

There are pros and cons to your approach:

*pros*
a) easier to implement
b) in most of the use cases the linker deals with, users specify the right architecture

*cons*
a) the input files will need to be looked at again: either the ELF header has to be re-read, or it has to be stored in the LinkingContext (increasing the memory footprint).
b) there might be an issue with just one input file, and the user has to wait until all files have been parsed to get the actual error.
c) some applications are linked with a huge command line (for example, building clang), and the cost of revisiting all the input files may be huge.

I would prefer that the architecture be read and verified at the time of reading, and that we stop reading the rest of the files as soon as we see a discrepancy.

If I misunderstood your suggestion, please let me know.

Thanks

Shankar Easwaran

I'd think my question is whether or not we want a sophisticated
approach for the situation where a user accidentally gives a wrong object
file to LLD. If that really happens frequently, and if we really don't want
users to wait for the parser to parse all files, we need some solution that
works progressively as we read files.

I think, in such a situation, the most important thing is to raise an error
correctly. The response time for raising the error is not that important.


There are pros and cons to your approach:

*pros*
a) easier to implement
b) in most of the use cases the linker deals with, users specify the
right architecture

*cons*
a) the input files will need to be looked at again: either the ELF header
has to be re-read, or it has to be stored in the LinkingContext (increasing
the memory footprint).
b) there might be an issue with just one input file, and the user has to
wait until all files have been parsed to get the actual error.
c) some applications are linked with a huge command line (for example,
building clang), and the cost of revisiting all the input files may be huge.

I don't get (a). Why do you have to scan a file again?

It is more than “verify”. Mach-o has “fat” (aka “universal”) files which contain multiple “slices”. Each slice is for a different arch. The Reader needs to know the intended arch up front to pick the right slice.

It is more than “verify”. Mach-o has “fat” (aka “universal”) files which
contain multiple “slices”. Each slice is for a different arch. The Reader
needs to know the intended arch up front to pick the right slice.

That's true. Does the linker on Mac know which architecture it is trying
to link before handling input files (from a command line flag, environment
variable, etc.), or does it have to decide by reading a few files
(e.g. by setting the target architecture to match the first file's
magic)?

99% of the time there is a -arch option on the command line which forces which architecture is to be linked. If there is no -arch on the command line, the linker sniffs the input files in order until it finds the first non-fat object file and then forces the architecture to match that. This sniffing is done independently (and prior to) actual input file processing.
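
The rule described above can be sketched as follows. `InputFile` and `chooseArch` are hypothetical, and an empty string stands for "no -arch option" on input and "could not be determined" on output.

```cpp
#include <string>
#include <vector>

// Hypothetical summary of what a cheap sniff of each input would yield.
struct InputFile {
  std::string path;
  bool isObject; // false for archives, dylibs, and so on
  bool isFat;    // true for fat (universal) files
  std::string arch;
};

// Runs before, and independently of, the actual input file processing.
inline std::string chooseArch(const std::string &explicitArch,
                              const std::vector<InputFile> &inputs) {
  // An explicit -arch on the command line always wins.
  if (!explicitArch.empty())
    return explicitArch;
  // Otherwise take the first non-fat object file, in command-line order.
  for (const InputFile &f : inputs)
    if (f.isObject && !f.isFat)
      return f.arch;
  return ""; // nothing to infer from
}
```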

-Nick

Currently the GnuLdDriver creates a LinkingContext (X86_64, X86, …) from the -target option in the Gnu flavor. So here, I think the driver has to sniff the first file if the -target option is not given on the command line. This would create more problems if the first input file is just a linker script.

Are you thinking of lld::File returning an llvm::Triple, as in:

    llvm::Triple triple() const = 0;

Agree.

Thanks

Shankar Easwaran
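
A rough sketch of how a `triple()` query on the file could cope with a linker script as the first input. Everything here is hypothetical. `Triple` is a tiny stand-in for `llvm::Triple`, and `File`, `ObjectFile`, `LinkerScript` and `inferTarget` are illustrative classes, not lld's.

```cpp
#include <memory>
#include <string>
#include <vector>

// Tiny stand-in for llvm::Triple; an empty arch means "unknown".
struct Triple {
  std::string arch;
  bool isKnown() const { return !arch.empty(); }
};

// Hypothetical file interface with the query suggested above.
class File {
public:
  virtual ~File() {}
  virtual Triple triple() const = 0;
};

class ObjectFile : public File {
public:
  explicit ObjectFile(const std::string &arch) : arch_(arch) {}
  Triple triple() const override { return Triple{arch_}; }

private:
  std::string arch_;
};

// A linker script carries no architecture of its own.
class LinkerScript : public File {
public:
  Triple triple() const override { return Triple{}; }
};

// Driver side: use the first input that actually reports an architecture.
inline Triple inferTarget(const std::vector<std::unique_ptr<File>> &files) {
  for (const std::unique_ptr<File> &f : files) {
    Triple t = f->triple();
    if (t.isKnown())
      return t;
  }
  return Triple{};
}
```

With this shape, a linker script in first position is skipped rather than treated as a special case in the driver.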