defining symbols with lld

Hi Nick,

I am planning to work on adding support for defining symbols with expressions in the GNU flavor.

GNU ld currently supports the option --defsym symbol=expression. The expression may be composed of other symbols.
Any symbol that appears in the expression takes its value from the output symbol value (the address of the symbol in the output file).

In addition, the symbol gets defined if and only if there is a relocation that refers to it.

I was wondering how to accomplish this with lld. One thought I had was to take the list of user-defined symbols and their associated expressions and evaluate them at the time of writing the output file.
The problems with that are:

a) If the expression uses a symbol that is not referenced anywhere except in the defsym expression, that symbol would get garbage collected.
b) Supporting linker scripts that create new symbols will be an issue.

What are your thoughts?

Thanks

Shankar Easwaran

Linker scripts have the same need. My idea for this was to allow atoms to
have an associated expression tree, with references to all symbols in that
tree. The resolver wouldn't need any special handling for this, and the
backend would just need to evaluate the expression at the end.

- Michael Spencer

Agreed.

On one hand, this sounds like an AbsoluteAtom because it has no content and no section, but has an address. On the other hand, it needs References to the symbols used in its expression, and only DefinedAtoms can have content. But DefinedAtoms are normally laid out and assigned addresses. One way to work this in would be to make them DefinedAtoms of size zero, with a new ContentType that the ELF Writer knows does not need an address assigned. The References to each symbol used in its expression ensure that all needed symbols exist and are not dead stripped. The References will need some Kind that the ELF Writer knows is not a relocation, but is there so the expression evaluator can find the addresses of the symbols used.
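
To make the dead-stripping argument concrete, here is a minimal Python sketch (not lld code). The class names, the "expression" content type and the "expr-operand" reference kind are all hypothetical stand-ins for the ideas above. It only shows that a zero-size atom carrying References keeps the symbols in its expression alive, and that a writer could skip address assignment for it.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    kind: str    # "reloc", or the hypothetical non-relocation kind "expr-operand"
    target: str  # name of the atom being referred to


@dataclass
class Atom:
    name: str
    content_type: str = "code"  # "expression" is the hypothetical new ContentType
    size: int = 0
    references: list = field(default_factory=list)


def live_atoms(atoms, roots):
    """Mark phase of dead stripping: follow every Reference from the roots."""
    by_name = {a.name: a for a in atoms}
    live, worklist = set(), list(roots)
    while worklist:
        name = worklist.pop()
        if name in live or name not in by_name:
            continue
        live.add(name)
        worklist.extend(r.target for r in by_name[name].references)
    return live


def needs_address(atom):
    """A writer would skip layout for expression atoms (assumed behaviour)."""
    return atom.content_type != "expression"


# Example: main refers to foo, and foo is defined as an expression over bar.
atoms = [
    Atom("main", size=16, references=[Reference("reloc", "foo")]),
    Atom("foo", content_type="expression",
         references=[Reference("expr-operand", "bar")]),
    Atom("bar", size=8),
    Atom("unused", size=4),
]
```

With "main" as the only root, "bar" survives because the expression atom "foo" refers to it, while "unused" is stripped.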

-Nick

These are the changes I plan to make, and some questions that I have:

a) Define a new contentType for DefinedAtoms to say 'Expression'.
b) Create a new class ExprnAtom derived from DefinedAtom.
c) The expression could also contain various functions. How should those be represented?
d) The actual content of the Atom would be a string representation of the expression, which can be used to emit YAML information.
e) The expression tree needs to be stored in the Native intermediate representation too, right? Store them as atoms? How should constants and functions be represented?
f) What about lld core?
g) Create a new reference type. How does (ExpressionAtom, ExpressionFunction, ExpressionConstant) sound?
h) I still need to figure out in what ways this symbol can be overridden. If the same symbol is defined in a file, does it override? (The Resolver may need to handle it.)

Thanks

Shankar Easwaran

c) The expression could also contain various functions. How should those be represented?

I don't understand this. I thought expressions were like "_foo + 10". What do you mean by functions set in an expression?

d) The actual content of the Atom would be a string representation of the expression, which can be used to emit YAML information.

Or a parse tree of the expression.

e) The expression tree needs to be stored in the Native intermediate representation too, right? Store them as atoms? How should constants and functions be represented?

Well, technically the only places these expressions come from are the command line and linker scripts, so we don't *have* to have a way to externalize the atoms in YAML or native format. But it would be nice to allow that, so that some future C or asm extension could let you create these.

f) What about lld core?
g) Create a new reference type. How does (ExpressionAtom, ExpressionFunction, ExpressionConstant) sound?

The expression could be an opaque string, except that we need to validate it and we need the resolver to find the symbol names referenced in the expression. The data structure lld provides is a sequence of References. The normal data structure for an expression is an (expression) tree. We can fit the square peg in the round hole by converting the expression to postfix and making the Reference order be the evaluation order. So the expression "A + B * 2" would become the Reference sequence:
  kind=push-sym target=B
  kind=push-const addend=2
  kind=multiply
  kind=push-sym target=A
  kind=add
With this sequence of References we have references to the symbols we need and a simple way to evaluate the expression. It is also easy to write in the native file format and in YAML (just dump the references as shown above). The original expression string is lost, but it could be recreated if we wanted to write a postfix-to-infix converter.
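
As a rough illustration of how such a sequence could be consumed, here is a small Python sketch (not lld code) of a stack-based evaluator and a postfix-to-infix converter. References are modelled as plain dictionaries, the multiplication kind is spelled "multiply", and the function names and the symbol addresses in the example are assumptions made up for the sketch.

```python
def evaluate(references, symbol_addresses):
    """Evaluate a postfix Reference sequence with a value stack."""
    stack = []
    for ref in references:
        kind = ref["kind"]
        if kind == "push-sym":
            stack.append(symbol_addresses[ref["target"]])
        elif kind == "push-const":
            stack.append(ref["addend"])
        elif kind in ("add", "multiply"):
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(lhs + rhs if kind == "add" else lhs * rhs)
        else:
            raise ValueError(f"unknown reference kind: {kind}")
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


def to_infix(references):
    """Rebuild a fully parenthesized infix string from the postfix sequence."""
    operators = {"add": "+", "multiply": "*"}
    stack = []
    for ref in references:
        kind = ref["kind"]
        if kind == "push-sym":
            stack.append(ref["target"])
        elif kind == "push-const":
            stack.append(str(ref["addend"]))
        else:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(f"({lhs} {operators[kind]} {rhs})")
    return stack[0]


# The Reference sequence for "A + B * 2" from the listing above.
refs = [
    {"kind": "push-sym", "target": "B"},
    {"kind": "push-const", "addend": 2},
    {"kind": "multiply"},
    {"kind": "push-sym", "target": "A"},
    {"kind": "add"},
]
```

With A at 0x1000 and B at 0x20, evaluate(refs, ...) yields 0x1040, and to_infix(refs) gives "((B * 2) + A)", which is equivalent to the original expression but not textually identical to it.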

h) I still need to figure out in what ways this symbol can be overridden. If the same symbol is defined in a file, does it override? (The Resolver may need to handle it.)

Oh my, weak aliases and expression symbols combined!

-Nick

Hi Nick,

Thanks for your reply.

c) The expression could also contain various functions. How should those be represented?
I don't understand this. I thought expressions were like "_foo + 10". What do you mean by functions set in an expression?

Linker scripts can define symbols using expressions that call built-in functions. Some examples that I have seen:

foo=SIZEOF(.text)
foo=14+ADDR(.data)
foo=ALIGN(4096)
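
For illustration, here is a Python sketch of how an evaluator might dispatch these three built-ins once section layout is known. The Section record, the call_builtin helper and the section addresses are hypothetical. ALIGN with one argument is treated as rounding the current location counter up to the given boundary, which is how the GNU ld documentation describes it.

```python
from dataclasses import dataclass


@dataclass
class Section:
    address: int
    size: int


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def call_builtin(name, arg, sections, location_counter):
    """Evaluate one built-in; needs final section addresses and sizes."""
    if name == "SIZEOF":
        return sections[arg].size
    if name == "ADDR":
        return sections[arg].address
    if name == "ALIGN":
        return align_up(location_counter, arg)
    raise ValueError(f"unsupported built-in: {name}")


# Made-up layout for the example.
sections = {
    ".text": Section(address=0x400000, size=0x1234),
    ".data": Section(address=0x601000, size=0x80),
}
```

Under this layout, foo=SIZEOF(.text) would be 0x1234, foo=14+ADDR(.data) would be 0x60100e, and foo=ALIGN(4096) with the location counter at 0x401234 would be 0x402000. Since every one of these depends on layout, they can only be evaluated late, after sections have been placed.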

Thanks

Shankar Easwaran

So you are saying there are some built-in functions in addition to operators like + and -.

-Nick

Yes.

Thanks

Shankar Easwaran

g) Create a new reference type. How does (ExpressionAtom,
ExpressionFunction, ExpressionConstant) sound?

We only need one reference type, as the resolver only cares that it is
referenced.

h) I still need to figure out in what ways this symbol can be
overridden. If the same symbol is defined in a file, does it override?
(The Resolver may need to handle it.)

It would also be useful to keep source location information about
expressions so we can give nice diagnostics for things that can only be
detected by the backend.
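
One way this could look, sketched in Python with hypothetical names (SourceLocation, ExpressionError, check_fits): each expression carries the place it came from, and a late check that only the backend can perform reports that place in its message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceLocation:
    origin: str      # e.g. "<command line>" or a linker script path
    line: int = 0    # 0 when the origin has no line numbers
    column: int = 0

    def __str__(self):
        if self.line:
            return f"{self.origin}:{self.line}:{self.column}"
        return self.origin


class ExpressionError(Exception):
    pass


def check_fits(value, location, address_bits=32):
    """Backend-only check: the final value is known only after layout."""
    if not 0 <= value < (1 << address_bits):
        raise ExpressionError(
            f"{location}: error: expression value {value:#x} "
            f"does not fit in {address_bits} bits"
        )
    return value
```

For an expression from line 12, column 7 of script.ld that evaluates to 0x100000000 on a 32-bit target, the message would start with "script.ld:12:7: error:". An expression given with --defsym would just report "<command line>".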

I'm not completely convinced that DefinedAtom is the correct thing to use
for this case, as it follows completely different layout rules.

- Michael Spencer

However, the kinds of expressions allowed in --defsym are much more
restricted. From the GNU ld manpage:

       --defsym=symbol=expression
           Create a global symbol in the output file, containing the
           absolute address given by expression. You may use this option
           as many times as necessary to define multiple symbols in the
           command line. *A limited form of arithmetic is supported for
           the expression in this context: you may give a hexadecimal
           constant or the name of an existing symbol, or use "+" and "-"
           to add or subtract hexadecimal constants or symbols.* If you
           need more elaborate expressions, consider using the linker
           command language from a script. Note: there should be no white
           space between symbol, the equals sign ("="), and expression.

The full linker script expression language is another whole can of worms,
and we probably only want to implement a judiciously chosen subset of the
functionality. If you haven't already, check out
<http://sourceware.org/binutils/docs/ld/Expressions.html>. It is
well-written.
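
For the restricted --defsym form quoted above, a parser can be quite small. This is a Python sketch under stated assumptions rather than a description of what GNU ld actually does: constants are accepted only with a 0x prefix, symbol names follow a simple identifier pattern, and parse_defsym and evaluate_defsym are made-up names.

```python
import re

# A term is either a 0x-prefixed hexadecimal constant or a symbol name.
_TERM = re.compile(r"0[xX][0-9a-fA-F]+|[A-Za-z_.$][A-Za-z0-9_.$]*")


def parse_defsym(arg):
    """Split 'symbol=expression' into (symbol, [(sign, term), ...])."""
    symbol, sep, expr = arg.partition("=")
    if not sep or not symbol or not expr:
        raise ValueError(f"expected symbol=expression, got {arg!r}")
    terms, pos, sign = [], 0, +1
    while True:
        match = _TERM.match(expr, pos)
        if not match:
            raise ValueError(f"bad term at offset {pos} in {expr!r}")
        text = match.group()
        value = int(text, 16) if text[:2].lower() == "0x" else text
        terms.append((sign, value))
        pos = match.end()
        if pos == len(expr):
            return symbol, terms
        if expr[pos] not in "+-":
            raise ValueError(f"expected '+' or '-' at offset {pos} in {expr!r}")
        sign = +1 if expr[pos] == "+" else -1
        pos += 1


def evaluate_defsym(terms, symbol_addresses):
    """Sum the signed terms, looking symbol names up in symbol_addresses."""
    total = 0
    for sign, value in terms:
        if isinstance(value, str):
            value = symbol_addresses[value]
        total += sign * value
    return total
```

For example, parse_defsym("foo=bar+0x10-baz") returns ("foo", [(1, "bar"), (1, 16), (-1, "baz")]). The symbol terms are exactly the names that would need References so that they are not dead stripped.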

-- Sean Silva

I agree, but I think we need to have a single expression evaluator in lld. Expressions defined in linker scripts fall into the same bucket
as expressions given on the command line.

Thanks

Shankar Easwaran

We only need one reference type, as the resolver only cares that it is
referenced.

We would need multiple reference kinds, one for each operator that needs to be supported, as Nick mentioned.

That also makes it easier to store them in native format, and lets us test reading/writing from core.

It would also be useful to keep source location information about
expressions so we can give nice diagnostics for things that can only be
detected by the backend.

For expressions, do you mean recording whether the expression was defined on the command line
or in a linker script?

If we want to keep source location information, would it be useful to add it to all atoms,
so that diagnostics can use it when the source code was compiled with debug information?

I'm not completely convinced that DefinedAtom is the correct thing to use
for this case, as it follows completely different layout rules.

AbsoluteAtom is the only other thing that comes to mind, but it does not have content or a reference list,
as Nick's reply mentioned.

Thanks

Shankar Easwaran