Authors: Hongyu Chen, Daniel Thornburgh, Prabhu Karthikeyan Rajasekaran
Reviewers: Fangrui Song, Paul Kirth, Petr Hosek
Background
In contrast to high-level programming languages, linker scripts suffer from inadequate support and tooling, which makes writing correct linker scripts and debugging them quite challenging. To improve the status quo, the linker script lexer and parser in LLD must account for the needs of embedded development: better diagnostics, debuggability of LLD's internal state, support for LTO code generation, and possibly even an intermediate representation for linker scripts within LLD. As a first step toward these long-term goals, we propose a well-contained improvement to the underlying data structures used by the linker script parser.
Replace use of Expr lambda
Expressions in linker scripts are represented as lambdas within LLD. The current `Expr` type declaration is:
```cpp
using Expr = std::function<ExprValue()>;
```
Using lambdas to construct and evaluate expressions was a practical choice. It offers a straightforward way to implement expression handling in the current parser, since each lambda can capture whatever parser or linker state it needs. However, this ties every expression to the state it captured, which prevents us from decoupling expression evaluation from the parser.
We propose replacing these lambdas with new, well-defined expression types (a.k.a. `ScriptExpr`) to streamline and clarify how state is managed within the linker script parser. The challenge today is that the lambdas gather state from various parts of the linker, which makes the process opaque and difficult to manage. Our goal is to make this state systematically accessible to the expression evaluator through a well-defined context, whether that means creating a new context object or adapting an existing one. Beyond improving the organization of the code, this sets the stage for more maintainable and extensible future development.
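A minimal sketch of what "state accessible through a well-defined context" could mean in practice follows. It is self-contained and illustrative only: `EvalContext`, `Node`, `evaluate`, and the helper constructors are hypothetical names, not LLD APIs; values are plain `uint64_t` instead of LLD's `ExprValue`; and a real context would also need sections, memory regions, and more.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical evaluation context: everything an expression may read lives
// here instead of being captured by individual lambdas.
struct EvalContext {
  uint64_t dot = 0;                        // location counter (".")
  std::map<std::string, uint64_t> symbols; // symbol name -> address
};

// Minimal illustrative expression node (not LLD's ScriptExpr).
struct Node {
  enum Kind { Constant, Symbol, Dot, Add } kind = Constant;
  uint64_t value = 0;                   // used by Constant
  std::string name;                     // used by Symbol
  std::shared_ptr<const Node> lhs, rhs; // used by Add
};
using NodePtr = std::shared_ptr<const Node>;

NodePtr constant(uint64_t v) {
  auto e = std::make_shared<Node>();
  e->kind = Node::Constant;
  e->value = v;
  return e;
}

NodePtr symbol(std::string name) {
  auto e = std::make_shared<Node>();
  e->kind = Node::Symbol;
  e->name = std::move(name);
  return e;
}

NodePtr dot() {
  auto e = std::make_shared<Node>();
  e->kind = Node::Dot;
  return e;
}

NodePtr add(NodePtr l, NodePtr r) {
  auto e = std::make_shared<Node>();
  e->kind = Node::Add;
  e->lhs = std::move(l);
  e->rhs = std::move(r);
  return e;
}

// All state comes from ctx; the tree itself holds only syntax.
uint64_t evaluate(const Node &e, const EvalContext &ctx) {
  switch (e.kind) {
  case Node::Constant:
    return e.value;
  case Node::Symbol:
    return ctx.symbols.at(e.name); // throws std::out_of_range if undefined
  case Node::Dot:
    return ctx.dot;
  case Node::Add:
    return evaluate(*e.lhs, ctx) + evaluate(*e.rhs, ctx);
  }
  assert(false && "unknown node kind");
  return 0;
}
```

Because the tree captures no state of its own, the same expression can be evaluated against different contexts: for example, `add(dot(), constant(0x100))` yields `0x1100` when `dot` is `0x1000` and `0x2100` when it is `0x2000`.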
Why?
- Improved debuggability. The current use of callbacks makes it nearly impossible to dump linker scripts in a human-readable format, which is one of our long-term goals.
- Flexibility. A `std::function` can only be invoked; it cannot be inspected, printed, or transformed, which limits how flexibly and efficiently expressions can be handled.
- Performance.
  - The current lambda approach incurs a performance cost: each expression can require its own heap allocation (for example, whenever a lambda's captures exceed `std::function`'s small inline buffer). By replacing `Expr` lambdas with syntactic types, we can use LLD's `make` bump pointer allocation, which allocates efficiently in bulk and requires no individual deallocations.
  - Evaluation time could also improve by eliminating the indirect calls inherent in `std::function`, potentially replacing them with an inlined evaluator structured as a single large `switch` statement.
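To illustrate the debuggability and allocation points, the sketch below prints an expression tree back in linker script syntax, something an opaque `std::function` cannot support. It is a hypothetical, self-contained example: `Node`, `Arena`, `toHex`, and `dump` are illustrative names, and `Arena` is a simple `std::deque`-backed pool standing in for LLD's `make` (it is not a bump pointer allocator, but like `make` it releases all nodes together rather than one by one).

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <sstream>
#include <string>
#include <utility>

// Illustrative node layout (not LLD's ScriptExpr).
struct Node {
  enum Kind { Constant, Symbol, Dot, Binary, Call } kind;
  uint64_t value = 0;        // Constant
  std::string text;          // symbol name, operator, or function name
  const Node *lhs = nullptr; // Binary operand / first Call argument
  const Node *rhs = nullptr; // Binary operand / second Call argument
};

// Stand-in for LLD's make<T>(): nodes are owned by one container and freed
// together. std::deque::push_back never moves existing elements, so the
// returned pointers stay valid.
class Arena {
public:
  const Node *make(Node n) {
    nodes_.push_back(std::move(n));
    return &nodes_.back();
  }

private:
  std::deque<Node> nodes_;
};

std::string toHex(uint64_t v) {
  std::ostringstream os;
  os << "0x" << std::hex << v;
  return os.str();
}

// Walks the tree and prints it in linker script syntax.
std::string dump(const Node &n) {
  switch (n.kind) {
  case Node::Constant:
    return toHex(n.value);
  case Node::Symbol:
    return n.text;
  case Node::Dot:
    return ".";
  case Node::Binary:
    return "(" + dump(*n.lhs) + " " + n.text + " " + dump(*n.rhs) + ")";
  case Node::Call:
    return n.text + "(" + dump(*n.lhs) + ", " + dump(*n.rhs) + ")";
  }
  assert(false && "unknown node kind");
  return "";
}
```

For instance, the tree for `MAX(. + 0x100, _end)` dumps as `MAX((. + 0x100), _end)`. Wrapping every binary node in parentheses keeps this printer trivially correct; a real dumper would consult operator precedence to omit redundant parentheses.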
Example use of Expr lambda within LLD
```cpp
// Parser functions in lld/ELF/ScriptParser.cpp that build Expr lambdas.
Expr combine(StringRef op, Expr l, Expr r);
Expr readExpr();
Expr readExpr1(Expr lhs, int minPrec);
Expr readPrimary();
Expr readTernary(Expr cond);
Expr readParenExpr();
// ...

// https://github.com/llvm/llvm-project/blob/f86594788ce93b696675c94f54016d27a6c21d18/lld/ELF/ScriptParser.cpp#L1474
// Each branch returns a new closure that captures a and b (themselves
// std::function objects) by value.
if (tok == "MAX" || tok == "MIN") {
  expect("(");
  Expr a = readExpr();
  expect(",");
  Expr b = readExpr();
  expect(")");
  if (tok == "MIN")
    return [=] { return std::min(a().getValue(), b().getValue()); };
  return [=] { return std::max(a().getValue(), b().getValue()); };
}
```
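For comparison, here is a hedged sketch of how the `MIN`/`MAX` branch could build a node instead of a closure, paired with a single `switch`-based evaluator of the kind suggested under Why?. `BinaryOp`, `ExprNode`, `makeConstant`, `makeMinMax`, and `eval` are hypothetical; values are `uint64_t` rather than `ExprValue`, `std::string` replaces `StringRef`, and the token handling (`expect`, `readExpr`) is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

enum class BinaryOp : uint8_t { Add, Sub, Min, Max };

// Illustrative node: either a constant leaf or a binary operation.
struct ExprNode {
  enum class Kind : uint8_t { Constant, Binary } kind = Kind::Constant;
  uint64_t value = 0;          // Kind::Constant
  BinaryOp op = BinaryOp::Add; // Kind::Binary
  std::unique_ptr<ExprNode> lhs, rhs;
};
using NodePtr = std::unique_ptr<ExprNode>;

NodePtr makeConstant(uint64_t v) {
  auto n = std::make_unique<ExprNode>();
  n->kind = ExprNode::Kind::Constant;
  n->value = v;
  return n;
}

// Replaces: return [=] { return std::min(a().getValue(), b().getValue()); };
NodePtr makeMinMax(const std::string &tok, NodePtr a, NodePtr b) {
  auto n = std::make_unique<ExprNode>();
  n->kind = ExprNode::Kind::Binary;
  n->op = tok == "MIN" ? BinaryOp::Min : BinaryOp::Max;
  n->lhs = std::move(a);
  n->rhs = std::move(b);
  return n;
}

// One switch handles every operator; no std::function calls involved.
uint64_t eval(const ExprNode &n) {
  if (n.kind == ExprNode::Kind::Constant)
    return n.value;
  uint64_t l = eval(*n.lhs), r = eval(*n.rhs);
  switch (n.op) {
  case BinaryOp::Add: return l + r;
  case BinaryOp::Sub: return l - r;
  case BinaryOp::Min: return std::min(l, r);
  case BinaryOp::Max: return std::max(l, r);
  }
  assert(false && "unknown operator");
  return 0;
}
```

Unlike the closure, the resulting node can be inspected (for example, `mx->op == BinaryOp::Max`), printed, or constant-folded before it is ever evaluated.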
Proposed new types
Our proposed direction is inspired by `MCExpr` in LLVM: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/MC/MCExpr.h.
```cpp
#include <cstdint>
#include <functional>
#include <utility>
// ExprValue is LLD's existing result type for linker script expressions.

// Base class for the linker script expressions produced by the parser.
class ScriptExpr {
public:
  enum class ExprKind : uint8_t {
    Constant,
    Dynamic,
    Unary,
    Binary
  };

private:
  ExprKind kind_;

protected:
  explicit ScriptExpr(ExprKind kind) : kind_(kind) {}

public:
  ExprKind getKind() const { return kind_; }
};

// Represents a constant integer expression.
class ConstantExpr : public ScriptExpr {
public:
  ConstantExpr(ExprValue val)
      : ScriptExpr(ExprKind::Constant), val_(val) {}
  ConstantExpr(uint64_t val)
      : ScriptExpr(ExprKind::Constant), val_(ExprValue(val)) {}
  ExprValue getVal() const { return val_; }

private:
  ExprValue val_;
};

// Wraps an existing Expr-style callback as a ScriptExpr.
class DynamicExpr : public ScriptExpr {
public:
  DynamicExpr(std::function<ExprValue()> impl)
      : ScriptExpr(ExprKind::Dynamic), impl_(std::move(impl)) {}
  const std::function<ExprValue()> &getImpl() const { return impl_; }

private:
  std::function<ExprValue()> impl_;
};

// An expression with a single operand.
class UnaryExpr : public ScriptExpr {
public:
  UnaryExpr(const ScriptExpr *operand)
      : ScriptExpr(ExprKind::Unary), operand_(operand) {}

private:
  const ScriptExpr *operand_;
};

// An expression with left- and right-hand operands.
class BinaryExpr : public ScriptExpr {
public:
  BinaryExpr(const ScriptExpr *LHS, const ScriptExpr *RHS)
      : ScriptExpr(ExprKind::Binary), LHS(LHS), RHS(RHS) {}

private:
  const ScriptExpr *LHS, *RHS;
};
```
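To show how such a hierarchy is typically consumed in LLVM, the sketch below follows the `MCExpr` pattern: evaluation switches on `getKind()`, and each subclass provides a static `classof()` so that LLVM's `isa<>`/`dyn_cast<>` (from `llvm/Support/Casting.h`) can be used. To stay self-contained it makes several assumptions beyond the proposal: values are `uint64_t` instead of `ExprValue`, `UnaryExpr` is omitted, and the `Opcode` field, the accessors, `dynCast` (a minimal stand-in for `llvm::dyn_cast`), and `evaluate` are hypothetical additions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Simplified variant of the proposed hierarchy (uint64_t instead of ExprValue).
class ScriptExpr {
public:
  enum class ExprKind : uint8_t { Constant, Dynamic, Unary, Binary };
  ExprKind getKind() const { return kind_; }

protected:
  explicit ScriptExpr(ExprKind kind) : kind_(kind) {}

private:
  ExprKind kind_;
};

class ConstantExpr : public ScriptExpr {
public:
  explicit ConstantExpr(uint64_t val) : ScriptExpr(ExprKind::Constant), val_(val) {}
  uint64_t getVal() const { return val_; }
  // LLVM-style RTTI hook, as used by isa<>/dyn_cast<>.
  static bool classof(const ScriptExpr *e) { return e->getKind() == ExprKind::Constant; }

private:
  uint64_t val_;
};

// Fallback for expressions not yet converted from lambdas.
class DynamicExpr : public ScriptExpr {
public:
  explicit DynamicExpr(std::function<uint64_t()> impl)
      : ScriptExpr(ExprKind::Dynamic), impl_(std::move(impl)) {}
  uint64_t call() const { return impl_(); }
  static bool classof(const ScriptExpr *e) { return e->getKind() == ExprKind::Dynamic; }

private:
  std::function<uint64_t()> impl_;
};

class BinaryExpr : public ScriptExpr {
public:
  enum class Opcode : uint8_t { Add, Mul }; // hypothetical operator field
  BinaryExpr(Opcode op, const ScriptExpr *lhs, const ScriptExpr *rhs)
      : ScriptExpr(ExprKind::Binary), op_(op), lhs_(lhs), rhs_(rhs) {}
  Opcode getOpcode() const { return op_; }
  const ScriptExpr *getLHS() const { return lhs_; }
  const ScriptExpr *getRHS() const { return rhs_; }
  static bool classof(const ScriptExpr *e) { return e->getKind() == ExprKind::Binary; }

private:
  Opcode op_;
  const ScriptExpr *lhs_, *rhs_;
};

// Minimal stand-in for llvm::dyn_cast, relying on the same classof() convention.
template <typename To> const To *dynCast(const ScriptExpr *e) {
  return To::classof(e) ? static_cast<const To *>(e) : nullptr;
}

// MCExpr-style evaluator: dispatch on the kind, then downcast.
uint64_t evaluate(const ScriptExpr *e) {
  switch (e->getKind()) {
  case ScriptExpr::ExprKind::Constant:
    return static_cast<const ConstantExpr *>(e)->getVal();
  case ScriptExpr::ExprKind::Dynamic:
    return static_cast<const DynamicExpr *>(e)->call();
  case ScriptExpr::ExprKind::Binary: {
    auto *b = static_cast<const BinaryExpr *>(e);
    uint64_t l = evaluate(b->getLHS()), r = evaluate(b->getRHS());
    // Only two opcodes exist in this sketch: Add or Mul.
    return b->getOpcode() == BinaryExpr::Opcode::Add ? l + r : l * r;
  }
  case ScriptExpr::ExprKind::Unary:
    break; // UnaryExpr is omitted from this sketch
  }
  assert(false && "unhandled expression kind");
  return 0;
}
```

Note that the proposed `BinaryExpr` does not yet record which operator it represents; some field like the hypothetical `Opcode` here would be needed before an evaluator can distinguish, say, `+` from `*`.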
Evaluation
The following are preconditions we would like to meet before landing these changes.
- All existing LLD tests pass.
- New LLD tests are added to cover our changes.
- Performance.
  - While proposing this new direction for handling linker script expressions, we recognize that its ultimate impact on performance remains uncertain until the implementation is complete. To address this, we plan to profile LLD thoroughly to ensure these changes do not adversely affect performance. Since this project is experimental in nature, we also expect these measurements to provide useful insights for future development.
Prototype implementation
Here is a link to our work-in-progress implementation.
Branch: https://github.com/yugier/llvm-project/tree/lld-elf-script-expr
Please share your thoughts on the proposed changes to LLD. We welcome ideas for improving our direction, as well as strategies for understanding the performance impact of these changes.