Intro
In this RFC I introduce a Data Inspection Language (DIL) to improve stability and performance of variable introspection and address the inconsistencies in existing implementations to improve the debugger user experience. The names in this document (e.g. DIL) are placeholders and might change. If you have any suggestions, please leave comments.
DIL is an expression language designed to inspect the data (e.g. variables) in the program. It looks and feels like a source language of the program (e.g. C++), but may deviate from it in certain aspects and can support extra features like dynamic type resolution and synthetic children providers. Quick example:
(lldb) inspect (char*)(&foo->bar + 64)
"hello world"
(Note: the inspect command doesn't exist in LLDB)
LLDB already has two mechanisms for data inspection: expr and frame variable. The first one, expr, uses a compiler-powered expression evaluator and can execute almost any valid C++ code. It's very powerful, but among its downsides are instability, poor performance, and lack of support for dynamic types and synthetic children. The second one, frame variable, uses a very simple interpreter that supports a limited number of operations. However, it does dynamic type resolution and can follow fields generated by synthetic field providers. Developers often use these two commands interchangeably, but the results may be different for the same input. Many people don't realize that there are differences and just use whatever they're more used to.
The eventual goal for DIL is to completely replace the implementation of the frame variable command (GetValueForExpressionPath()) and to be used by default for the p/print command. The expr command (EvaluateExpression()) will continue to use the compiler-based expression evaluation.
Motivation
A big motivator for introducing DIL is debugger stability. The expression evaluation in LLDB is fragile and can often fail even for simple expressions depending on the complexity of the program and current context. The expression evaluator is used by default in the print command, which many (most) command-line users use for basic data inspection purposes. Simply doing print x might crash the debugger if the program is currently stopped in some tricky context. We hope DIL will eventually be used by default in both print and frame variable (or at least can be aliased by the interested users), which will improve the overall reliability.
Note: compiler-based expression evaluation is not fundamentally unstable; it's just very hard to get the implementation right. It relies on ClangAST, which has a lot of internal invariants and is tricky to construct.
Inconsistency in behaviour is another important factor. Having print and frame variable produce different results is a source of frustration for many users. Educating users on which command to use under which circumstances can only do so much, so removing the inconsistencies between these commands is important for improving the user experience.
Another motivator is performance. EvaluateExpression() can be pretty slow depending on the circumstances and DIL can be made significantly faster. Performance was the primary motivation for creating lldb-eval, which was used for implementing NatVis in the Stadia debugger. The gdb-to-lldb pretty-printer adapter (gala) also relies on expression evaluation to achieve compatibility with GDB.
However, using DIL in data formatters is explicitly out of scope for the current proposal. Expression evaluation is already highly discouraged from use in data formatters and DIL won't challenge that. It will be accessible via the SB API though, and it can evolve as we progress through the implementation.
Detailed Design
The idea is that DIL would be a reasonable subset of the source language of the target. This way the users don't need to worry much about the syntax differences, although I acknowledge that there might be some inconsistencies. Here I am focusing on C++ and will use it as an example. Later I'll outline the ideas and solutions for supporting other languages.
DIL will support the following operations:
- Basic arithmetic: addition, subtraction, multiplication, division, modulo
- Bitwise operations: and, or, xor, negation, left/right shifts
- Member access: foo->bar and foo.bar
- Array subscript (foo[bar])
- Dereference and address-of (*foo, &foo)
- Type casts: C-style casts ((int)x) and C++-style casts (static_cast, reinterpret_cast, etc)
- Simple function calls (foo(), foo->bar()) (see "Function calls" section below)
Some operations here are language-specific (and might have language-specific semantics) and more operations may be added in the future. Let me know if you think something should be added to or removed from this list.
DIL is to be implemented as a parser + interpreter. It will use the information about types and values provided by LLDB to resolve identifiers, perform type checking and compute the result. This approach avoids using ASTImporter, so it doesn't depend on the Clang compiler.
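To make the "parser + interpreter" shape concrete, here is a minimal sketch of a recursive-descent evaluator for integer arithmetic. It is not the proposed implementation: `ToyEvaluator` and `Scope` are hypothetical names, and the `std::map` lookup stands in for the type and value information that LLDB would provide. Overflow handling and type checking are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the identifier lookup that LLDB would provide.
using Scope = std::map<std::string, std::int64_t>;

// Toy recursive-descent parser that evaluates while it parses.
class ToyEvaluator {
public:
  ToyEvaluator(std::string text, const Scope &scope)
      : text_(std::move(text)), scope_(scope) {}

  // Returns true and stores the value in `out` on success. Returns false on a
  // syntax error, an unknown identifier, or a division by zero.
  bool Evaluate(std::int64_t &out) {
    std::int64_t value = ParseAdditive();
    SkipSpaces();
    if (!ok_ || pos_ != text_.size()) return false;
    out = value;
    return true;
  }

private:
  static bool IsWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
  }
  void SkipSpaces() {
    while (pos_ < text_.size() &&
           std::isspace(static_cast<unsigned char>(text_[pos_])))
      ++pos_;
  }
  bool Consume(char c) {
    SkipSpaces();
    if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
    return false;
  }

  // additive := multiplicative (('+' | '-') multiplicative)*
  std::int64_t ParseAdditive() {
    std::int64_t lhs = ParseMultiplicative();
    while (ok_) {
      if (Consume('+')) lhs += ParseMultiplicative();
      else if (Consume('-')) lhs -= ParseMultiplicative();
      else break;
    }
    return lhs;
  }

  // multiplicative := primary (('*' | '/' | '%') primary)*
  std::int64_t ParseMultiplicative() {
    std::int64_t lhs = ParsePrimary();
    while (ok_) {
      if (Consume('*')) {
        lhs *= ParsePrimary();
      } else if (Consume('/') || Consume('%')) {
        char op = text_[pos_ - 1];  // the operator that was just consumed
        std::int64_t rhs = ParsePrimary();
        if (!ok_ || rhs == 0) { ok_ = false; break; }
        lhs = (op == '/') ? lhs / rhs : lhs % rhs;
      } else {
        break;
      }
    }
    return lhs;
  }

  // primary := number | identifier | '(' additive ')' | '-' primary
  std::int64_t ParsePrimary() {
    if (Consume('-')) return -ParsePrimary();
    if (Consume('(')) {
      std::int64_t value = ParseAdditive();
      if (!Consume(')')) ok_ = false;
      return value;
    }
    std::size_t start = pos_;
    while (pos_ < text_.size() && IsWordChar(text_[pos_])) ++pos_;
    std::string token = text_.substr(start, pos_ - start);
    if (token.empty()) { ok_ = false; return 0; }
    if (std::isdigit(static_cast<unsigned char>(token[0]))) {
      std::size_t used = 0;
      std::int64_t value = std::stoll(token, &used);  // throws if out of range
      if (used != token.size()) ok_ = false;          // e.g. "12ab"
      return value;
    }
    auto it = scope_.find(token);  // identifier: resolve via the scope
    if (it == scope_.end()) { ok_ = false; return 0; }
    return it->second;
  }

  std::string text_;
  const Scope &scope_;
  std::size_t pos_ = 0;
  bool ok_ = true;
};
```

With a scope holding `a = 1` and `b = 3`, evaluating `a + b * 2` gives 7 and `(a + b) * 2` gives 8, so the grammar layering alone yields the usual operator precedence. No compiler or AST import is involved.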
Language semantics
DIL doesn't aim to have the exact same semantics as the program source language, but it will follow the "basic" semantics like operator precedence and overflow/underflow rules. It can deviate from the source language if that makes sense from the perspective of user-experience. DIL can provide extra convenience features, e.g. builtin intrinsic functions or something for common complex types.
Dynamic typing
Users often expect the member access operator to work "correctly" even if the static type of the variable doesn't have the requested field. Consider the following example:
struct Base {
// some virtual methods
};
struct Deriv : Base {
int foo;
};
Base* base = new Deriv{};
// Debugger session
(lldb) print base->foo
Base type doesn't have the field foo, however the actual type of the variable is Deriv. If the user were to print the whole base variable (via print base) they would see something like this:
(lldb) print base
(Deriv*) {
(int) foo = 42
}
It only makes sense for print base->foo to print 42 as well.
The pure compiler-based approach (i.e. the current implementation of EvaluateExpression()) is not able to resolve this, as the compiler always uses the static type information. See another proposal for a hybrid expression evaluation implementation: DWIM Print Command.
DIL will resolve the dynamic types as it parses and evaluates the expression (where possible) and therefore can choose and use the correct field. If necessary, this can be enabled/disabled via a setting.
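The difference between static and dynamic lookup can be sketched with a toy model. Everything here (`ToyType`, `ToyObject`, `ReadMember`) is hypothetical and much simpler than LLDB's real type system; the `use_dynamic_type` flag plays the role of the setting mentioned above.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified type description; LLDB's real classes differ.
struct ToyType {
  std::string name;
  const ToyType *base;           // nullptr when there is no base class
  std::set<std::string> fields;  // fields declared directly in this type
};

// Walks the inheritance chain looking for a field declaration.
bool HasField(const ToyType *type, const std::string &field) {
  for (; type != nullptr; type = type->base)
    if (type->fields.count(field) != 0)
      return true;
  return false;
}

// A toy object that knows both its declared type and its runtime type.
struct ToyObject {
  const ToyType *static_type;
  const ToyType *dynamic_type;
  std::map<std::string, int> memory;  // field name -> stored value
};

// Resolves `obj->field`. With use_dynamic_type == false this behaves like a
// compiler and only sees fields of the static type.
bool ReadMember(const ToyObject &obj, const std::string &field,
                bool use_dynamic_type, int &out) {
  const ToyType *type = use_dynamic_type ? obj.dynamic_type : obj.static_type;
  if (!HasField(type, field))
    return false;
  auto it = obj.memory.find(field);
  if (it == obj.memory.end())
    return false;
  out = it->second;
  return true;
}
```

For an object declared as `Base*` that actually holds a `Deriv` with `foo = 42`, `ReadMember(obj, "foo", false, v)` fails while `ReadMember(obj, "foo", true, v)` yields 42, which matches what `print base` already shows.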
Synthetic children providers
Data formatters in LLDB can provide synthetic children for existing objects, which is often used for complex data structures. When the user prints the variable, they see the members generated by the data formatters. This is very handy for visualizing complex objects. The current implementation of EvaluateExpression() does not support synthetic children, which means the user can't use synthetic fields in the expressions. GetValueForExpressionPath() does support them and the user can access generated fields in frame variable expressions.
DIL will be data-formatter-aware and will support synthetic fields. The current synthetic children API can be used to resolve the children by name and index (Variable Formatting - LLDB), so the interpreter can support both foo->synth_child and foo[3], which can be handy for things like containers (e.g. vectors). It will also respect the synthetic dereference child (defined via $$dereference$$).
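As a rough sketch of how name and index lookups could share one path, consider the toy provider below. `ToySyntheticChildren` and both resolver functions are hypothetical; the method names only loosely mirror the `num_children` / `get_child_index` methods of LLDB's Python synthetic providers.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a synthetic children provider.
struct ToySyntheticChildren {
  std::vector<std::pair<std::string, int>> children;  // (name, value) pairs

  std::size_t NumChildren() const { return children.size(); }

  // Returns the index of the child called `name`, or -1 if there is none.
  int IndexOfChild(const std::string &name) const {
    for (std::size_t i = 0; i < children.size(); ++i)
      if (children[i].first == name)
        return static_cast<int>(i);
    return -1;
  }
};

// Resolves `foo[index]` against the synthetic children.
bool ResolveByIndex(const ToySyntheticChildren &synth, std::size_t index,
                    int &out) {
  if (index >= synth.NumChildren())
    return false;
  out = synth.children[index].second;
  return true;
}

// Resolves `foo->name` (or `foo.name`) by mapping the name to an index first.
bool ResolveByName(const ToySyntheticChildren &synth, const std::string &name,
                   int &out) {
  int index = synth.IndexOfChild(name);
  return index >= 0 &&
         ResolveByIndex(synth, static_cast<std::size_t>(index), out);
}
```

Routing the name lookup through the index lookup keeps the two access forms consistent: a child reachable as `foo->second` is the same child as `foo[1]`.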
Properties
Some languages make heavy use of properties (or computed fields), e.g. Swift or Objective-C. Generally accessing properties requires executing code, because they're essentially function calls. In theory it's possible to call functions, however it requires knowledge of the ABI and is generally non-trivial. The current expression evaluator solves this problem by compiling and executing the whole expression.
DIL needs to support properties for languages that have them. A proper implementation would rely on calling the backing accessor function, however it's not trivial because of the ABI and is even further complicated by language-specific logic (e.g. in Objective-C we need to deal with the ObjC dispatch machinery).
A simpler (temporary) approach might be to use an expression evaluator for accessing properties. The DIL parser can recognize that foo.bar is a property access and can call EvaluateExpression() on this specific field. This may impact the performance and introduce instability, but it will work. It could be disabled by default, e.g. something like this:
(lldb) print foo->bar
Error: Foo::bar is a property and accessing properties requires executing code in the process. This is disabled by default, but you can enable it via "settings set dil.allow-property-access true".
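The gating logic could look roughly like the sketch below. All types and functions are hypothetical: `ToySettings::allow_property_access` models the `dil.allow-property-access` setting from the example, and `EvaluateViaExpressionEvaluator` is only a placeholder for the fallback to EvaluateExpression().

```cpp
#include <string>

// Models the proposed setting; disabled by default.
struct ToySettings {
  bool allow_property_access = false;
};

// Hypothetical description of a member that DIL has already resolved.
struct ToyMember {
  std::string qualified_name;  // e.g. "Foo::bar"
  bool is_property;            // true if reading it requires running code
  int stored_value;            // toy payload
};

struct ToyResult {
  bool ok;
  int value;
  std::string error;
};

// Placeholder for the expensive path that would run code in the process.
int EvaluateViaExpressionEvaluator(const ToyMember &member) {
  return member.stored_value;
}

ToyResult AccessMember(const ToyMember &member, const ToySettings &settings) {
  // Plain fields are read directly and never need the fallback.
  if (!member.is_property)
    return {true, member.stored_value, ""};
  // Properties are refused unless the user opted in.
  if (!settings.allow_property_access)
    return {false, 0,
            member.qualified_name +
                " is a property and accessing properties requires executing "
                "code in the process."};
  return {true, EvaluateViaExpressionEvaluator(member), ""};
}
```

Only the property path pays for (and risks) code execution; ordinary field reads are unaffected by the setting.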
Function calls
Calling functions is very useful in many situations. Similar to properties, in the general case calling a function requires complete knowledge of the ABI. The expression evaluator solves this problem by using a compiler.
However it's possible to call some functions without a "complete" knowledge of the ABI. For example, functions with no arguments or functions with primitive arguments. LLDB already has a capability to invoke such "simple" functions and it can be leveraged by DIL. It won't cover all use cases, but it's better than nothing and hopefully can cover many simple cases (e.g. calling methods like v->size()).
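One way to decide whether a call counts as "simple" is to inspect the argument kinds, as in this sketch. The RFC does not define "primitive" precisely, so treating integers, floating-point values and pointers as primitive is an assumption made here for illustration; `ToyTypeKind`, `ToySignature` and `CanCallWithoutCompiler` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, coarse classification of argument types.
enum class ToyTypeKind { Integer, FloatingPoint, Pointer, Struct, Class };

// Assumption: these kinds can be passed without full ABI knowledge.
bool IsPrimitive(ToyTypeKind kind) {
  return kind == ToyTypeKind::Integer || kind == ToyTypeKind::FloatingPoint ||
         kind == ToyTypeKind::Pointer;
}

struct ToySignature {
  // For a method call like v->size(), the implicit `this` pointer would be
  // the first entry.
  std::vector<ToyTypeKind> arguments;
};

// True for functions with no arguments or with only primitive arguments.
bool CanCallWithoutCompiler(const ToySignature &signature) {
  return std::all_of(signature.arguments.begin(), signature.arguments.end(),
                     IsPrimitive);
}
```

Under this assumption `v->size()` qualifies (its only argument is the `this` pointer), while a function taking a struct by value would be rejected and left to the expr command.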
Supporting different source languages
Everything above kind of assumes C++ as the source language of the target. However LLDB supports other languages like Swift and Objective-C (and ~Rust). Moreover, the target program can have modules of different languages, e.g. C++ code calling into Swift or the other way around.
DIL can have flavors (or dialects), which implement language-specific features or semantics. By default, the interpreter would pick the flavor of the current frame, but it can be overridden. Implementation-wise DIL can have one parser that can branch based on the flavor or completely different parsers for different flavors. Since it's not a goal to implement the whole source language, I believe the DIL parser can be relatively simple and easy to maintain.
Another option could be to incorporate features from different languages into one "true" implementation of DIL. This might be a good option from the user perspective (i.e. always works, don't need to care about flavors). However I'm not sure whether it would be possible to properly differentiate between language-specific features in a single parser.
Comparison to other debuggers
GDB
GDB's primary mechanism for data inspection is the print/inspect command. It uses an interpreter under the hood, which supports arithmetic operations, type casts, function calls. This is similar to the proposal in this RFC.
GDB also has a capability to compile and execute arbitrary code. Here it uses gcc under the hood.
Visual Studio Debugger
Visual Studio debugger supports evaluating expressions in the Immediate and Watch windows. It uses an interpreter-based approach, supports many C++ features like arithmetic, casts, function calls and provides many builtin intrinsics (like strlen or __findNonNull functions). This expression evaluator is extensively used in NatVis, which is a framework for defining custom data visualizers.
Milestones
Here's the approximate implementation plan for DIL:
- Implement the same set of operations supported by the frame variable command
  - This includes dynamic type resolution and synthetic children support
- Replace the implementation of GetValueForExpressionPath() with DIL
  - This should be a no-op change from the user perspective
- Implement type casts (e.g. (int)foo or static_cast<int>(foo))
- Implement arithmetic operations (addition/subtraction, multiplication, bitwise operations, etc)
- Implement properties
- Implement basic function calls
- Disable the fallback to expr by default in the p/print command
  - At the moment p is aliased to dwim-print (commit)
Thanks @jingham @labath @cmtice @kastiglione @dblaikie for discussing and refining this proposal.