What do you plan to do about operator overloads in the DIL? In the course of debugging a library that implements overloaded operators, you sometimes want to see the result of the overload (particularly for ->) but other times you want to see an actual field and NOT run the overload. This was handled in the current scheme because expr would always show the overloaded operator result, and frame var would always show you the memory dereference. So provided you understand how the system works, you can always get what you want.
Are we going to preserve this distinction, or do we need a way to tell the DIL which one we want?
Jim
On Apr 11, 2023, at 2:15 AM, Andy Hippo via LLVM Discussion Forums notifications@llvm.discoursemail.com wrote:
werat
April 11
Intro
In this RFC I introduce a Data Inspection Language (DIL) to improve stability and performance of variable introspection and address the inconsistencies in existing implementations to improve the debugger user experience. The names in this document (e.g. DIL) are placeholders and might change. If you have any suggestions, please leave comments.
DIL is an expression language designed to inspect the data (e.g. variables) in the program. It looks and feels like a source language of the program (e.g. C++), but may deviate from it in certain aspects and can support extra features like dynamic type resolution and synthetic children providers. Quick example:
(lldb) inspect (char*)(&foo->bar + 64)
"hello world"
(Note: the inspect command doesn’t exist in LLDB)
LLDB already has two mechanisms for data inspection: expr and frame variable. The first one, expr, uses a compiler-powered expression evaluator and can execute almost any valid C++ code. It’s very powerful, but among its downsides are instability, poor performance, and lack of support for dynamic types and synthetic children. The second one, frame variable, uses a very simple interpreter that supports a limited number of operations. However, it does dynamic type resolution and can follow fields generated by synthetic field providers. Developers often use these two commands interchangeably, but the results may be different for the same input. Many people don’t realize that there are differences and just use whatever they’re more used to.
The eventual goal for DIL is to completely replace the implementation of the frame variable command (GetValueForExpressionPath()) and to be used by default for the p/print command. The expr command (EvaluateExpression()) will continue to use the compiler-based expression evaluation.
Motivation
A big motivator for introducing DIL is debugger stability. The expression evaluation in LLDB is fragile and can often fail even for simple expressions depending on the complexity of the program and current context. The expression evaluator is used by default in the p/print command, so even a simple print x might crash the debugger if the program is currently stopped in some tricky context. We hope DIL will eventually be used by default in both frame variable and p/print (or at least can be aliased by the interested users), which will improve the overall reliability.
Note: compiler-based expression evaluation is not fundamentally unstable, it’s just very hard to get the implementation right; it relies on ClangAST which has a lot of internal invariants and is tricky to construct.
Inconsistency in behaviour is another important factor. Having expr and frame variable produce different results is a source of frustration for many users. Educating users on which command to use under which circumstances can only do so much, so removing the inconsistencies between these commands is important for improving the user experience.
Another motivator is performance. EvaluateExpression() can be pretty slow depending on the circumstances and DIL can be made significantly faster. Performance was the primary motivation for creating lldb-eval, which was used for implementing NatVis in the Stadia debugger. The gdb-to-lldb pretty-printer adapter (gala) also relies on expression evaluation to achieve compatibility with GDB.
However, using DIL in data formatters is explicitly out of scope for the current proposal. Expression evaluation is already highly discouraged from use in data formatters and DIL won’t challenge that. It will be accessible via SB API though and it can evolve as we progress through the implementation.
Detailed Design
The idea is that DIL would be a reasonable subset of the source language of the target. This way the users don’t need to worry much about the syntax differences, although I acknowledge that there might be some inconsistencies. Here I am focusing on C++ and will use it as an example. Later I’ll outline the ideas and solutions for supporting other languages.
DIL will support the following operations:
- Basic arithmetic – addition, subtraction, multiplication, division, modulo
- Bitwise operations – and, or, xor, negation, left/right shifts
- Member access – foo->bar and foo.bar
- Array subscript (foo[bar])
- Dereference and address-of (*foo, &foo)
- Type casts – C-style casts ((int)x) and C++-style casts (static_cast, reinterpret_cast, etc.)
- Simple function calls (foo(), foo->bar()) (see “Function calls” section below)
Some operations here are language-specific (and might have language-specific semantics) and more operations may be added in the future. Let me know if you think something should be added to or removed from this list.
DIL is to be implemented as a parser + interpreter. It will use the information about types and values provided by LLDB to resolve identifiers, perform type checking and compute the result. This approach avoids using ASTImporter, so it doesn’t depend on the Clang compiler.
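To make the "parser + interpreter" idea concrete, here is a minimal sketch of a recursive-descent evaluator for a tiny arithmetic subset. It is not the proposed DIL implementation: the names MiniEvaluator and Variables are hypothetical, and the variable table is a stand-in for the type and value information that LLDB would supply. It only handles non-negative integer literals, identifiers, parentheses and the + - * / % operators.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the debugger's view of the program: a table of
// integer variables. A real implementation would ask LLDB for values and types.
using Variables = std::map<std::string, std::int64_t>;

class MiniEvaluator {
public:
  MiniEvaluator(std::string text, Variables vars)
      : text_(std::move(text)), vars_(std::move(vars)) {}

  // Parses and evaluates the whole input in a single pass.
  std::int64_t Evaluate() {
    std::int64_t result = ParseAdditive();
    SkipSpaces();
    if (pos_ != text_.size())
      throw std::runtime_error("unexpected trailing input");
    return result;
  }

private:
  // additive := multiplicative (('+' | '-') multiplicative)*
  std::int64_t ParseAdditive() {
    std::int64_t lhs = ParseMultiplicative();
    for (;;) {
      if (Consume('+')) lhs += ParseMultiplicative();
      else if (Consume('-')) lhs -= ParseMultiplicative();
      else return lhs;
    }
  }

  // multiplicative := primary (('*' | '/' | '%') primary)*
  std::int64_t ParseMultiplicative() {
    std::int64_t lhs = ParsePrimary();
    for (;;) {
      if (Consume('*')) lhs *= ParsePrimary();
      else if (Consume('/')) lhs /= NonZero(ParsePrimary());
      else if (Consume('%')) lhs %= NonZero(ParsePrimary());
      else return lhs;
    }
  }

  // primary := '(' additive ')' | integer literal | identifier
  std::int64_t ParsePrimary() {
    if (Consume('(')) {
      std::int64_t value = ParseAdditive();
      if (!Consume(')')) throw std::runtime_error("expected ')'");
      return value;
    }
    if (pos_ < text_.size() && IsDigit(text_[pos_])) {
      std::int64_t value = 0;
      while (pos_ < text_.size() && IsDigit(text_[pos_]))
        value = value * 10 + (text_[pos_++] - '0');
      return value;
    }
    std::string name;
    while (pos_ < text_.size() && (IsAlnum(text_[pos_]) || text_[pos_] == '_'))
      name += text_[pos_++];
    auto it = vars_.find(name);
    if (name.empty() || it == vars_.end())
      throw std::runtime_error("unknown identifier: '" + name + "'");
    return it->second;
  }

  static bool IsDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
  static bool IsAlnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

  static std::int64_t NonZero(std::int64_t v) {
    if (v == 0) throw std::runtime_error("division by zero");
    return v;
  }

  void SkipSpaces() {
    while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_;
  }

  // Skips leading spaces, then consumes `c` if it is the next character.
  bool Consume(char c) {
    SkipSpaces();
    if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
    return false;
  }

  std::string text_;
  Variables vars_;
  std::size_t pos_ = 0;
};
```

With a table containing x = 10 and y = 3, evaluating "x + y * 2" applies the usual precedence and multiplies before adding. Nothing here touches Clang or ASTImporter; identifier resolution is a plain table lookup, which is the part a real implementation would route through LLDB.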
Language semantics
DIL doesn’t aim to have the exact same semantics as the program source language, but it will follow the “basic” semantics like operator precedence and overflow/underflow rules. It can deviate from the source language if that makes sense from the perspective of user-experience. DIL can provide extra convenience features, e.g. builtin intrinsic functions or something for common complex types.
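As one illustration of what following the overflow/underflow rules could mean for an interpreter, the sketch below emulates C++ unsigned wraparound for a type of a given bit width. The helper names (WrapUnsigned, AddUnsigned, SubUnsigned) are hypothetical and the RFC does not prescribe this approach; it assumes the interpreter stores raw values in 64 bits and knows the width of the operand type.

```cpp
#include <cstdint>

// Truncate a raw 64-bit result to an unsigned type that is `bit_width` bits
// wide, which is how unsigned arithmetic wraps in C++ (modulo 2^bit_width).
std::uint64_t WrapUnsigned(std::uint64_t value, unsigned bit_width) {
  if (bit_width >= 64) return value;
  return value & ((std::uint64_t{1} << bit_width) - 1);
}

// Addition on an unsigned type of the given width.
std::uint64_t AddUnsigned(std::uint64_t lhs, std::uint64_t rhs, unsigned bit_width) {
  return WrapUnsigned(lhs + rhs, bit_width);
}

// Subtraction on an unsigned type of the given width.
std::uint64_t SubUnsigned(std::uint64_t lhs, std::uint64_t rhs, unsigned bit_width) {
  return WrapUnsigned(lhs - rhs, bit_width);
}
```

For example, adding 1 to the largest 32-bit unsigned value wraps to 0, and subtracting 1 from 0 in an 8-bit unsigned type gives 255, which matches what the program itself would compute.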
Dynamic typing
Users often expect the member access operator to work “correctly” even if the static type of the variable doesn’t have the requested field. Consider the following example:
struct Base {
  // some virtual methods
};
struct Deriv : Base {
  int foo;
};
Base* base = new Deriv{};
// Debugger session
(lldb) print base->foo
The Base type doesn’t have the field foo, however the actual type of the variable is Deriv. If the user were to print the whole base variable (via print base) they would see something like this:
(lldb) print base
(Deriv*) { (int) foo = 42 }
It only makes sense for print base->foo to print 42 as well.
The pure compiler-based approach (i.e. the current implementation of EvaluateExpression()) is not able to resolve this, as the compiler always uses the static type information. See another proposal for a hybrid expression evaluation implementation – DWIM Print Command.
DIL will resolve the dynamic types as it parses and evaluates the expression (where possible) and therefore can choose and use the correct field. If necessary, this can be enabled/disabled via a setting.
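The lookup described above can be sketched as follows. This is only a model under stated assumptions: TypeDesc, ValueDesc and ResolveMemberType are hypothetical names, types are reduced to a name, a single base and a list of field names, and the use_dynamic flag stands in for the setting mentioned above. It is not LLDB's API.

```cpp
#include <string>
#include <vector>

// Hypothetical, heavily simplified type description.
struct TypeDesc {
  std::string name;
  const TypeDesc *base;             // single base class, or nullptr
  std::vector<std::string> fields;  // fields declared directly in this type
};

// True if `field` is declared in `type` or in any of its bases.
bool HasField(const TypeDesc &type, const std::string &field) {
  for (const TypeDesc *t = &type; t != nullptr; t = t->base)
    for (const std::string &f : t->fields)
      if (f == field) return true;
  return false;
}

// A value as the interpreter might see it: the declared (static) type plus
// the dynamic type discovered at runtime (nullptr if it is unknown).
struct ValueDesc {
  const TypeDesc *static_type;
  const TypeDesc *dynamic_type;
};

// Returns the type through which `field` can be reached, or nullptr.
// The static type is tried first; the dynamic type is only consulted when
// dynamic resolution is enabled.
const TypeDesc *ResolveMemberType(const ValueDesc &value, const std::string &field,
                                  bool use_dynamic) {
  if (HasField(*value.static_type, field)) return value.static_type;
  if (use_dynamic && value.dynamic_type != nullptr &&
      HasField(*value.dynamic_type, field))
    return value.dynamic_type;
  return nullptr;
}
```

For the Base/Deriv example, a value with static type Base and dynamic type Deriv resolves foo through Deriv when dynamic resolution is on, and fails to resolve it when the flag is off, which is the behaviour a purely static evaluator has today.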
Synthetic children providers
Data formatters in LLDB can provide synthetic children for existing objects, which is often used for complex data structures. When the user prints the variable, they see the members generated by the data formatters – this is very handy for visualizing complex objects. The current implementation of EvaluateExpression() does not support synthetic children, which means the user can’t use synthetic fields in the expressions. GetValueForExpressionPath() does support them and the user can access generated fields in frame variable expressions.
DIL will be data-formatter-aware and will support synthetic fields. The current synthetic children API can be used to resolve the children by name and index (see “Variable Formatting” in the LLDB documentation), so the interpreter can support both foo->synth_child and foo[3], which can be handy for things like containers (e.g. vectors). It will also respect dereference (defined via $$dereference$$).
Properties
Some languages make heavy use of properties (or computed fields), e.g. Swift or Objective-C. Generally accessing properties requires executing code, because they’re essentially function calls. In theory it’s possible to call functions, however it requires knowledge of the ABI and is generally non-trivial. The current expression evaluator solves this problem by compiling and executing the whole expression.
DIL needs to support properties for languages that have them. A proper implementation would rely on calling the backing accessor function, however it’s not trivial because of the ABI and is even further complicated by language specific logic (e.g. in Objective-C we need to deal with the ObjC dispatch machinery).
A simpler (temporary) approach might be to use an expression evaluator for accessing properties. The DIL parser can recognize that foo.bar is a property access and can call EvaluateExpression() on this specific field. This may impact the performance and introduce instability, but it will work. It could be disabled by default, e.g. something like this:
(lldb) print foo->bar
Error: Foo::bar is a property and accessing properties requires executing code in the process. This is disabled by default, but you can enable it via "settings set dil.allow-property-access true".
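A rough sketch of that opt-in gate is shown below. Everything in it is an assumption made for illustration: DilSettings, PropertyResult and AccessProperty are hypothetical names, and the run_getter callback stands in for whatever would actually execute the accessor (for example a call into the expression evaluator).

```cpp
#include <functional>
#include <string>

// Hypothetical settings object; the field mirrors the
// "dil.allow-property-access" setting floated in the example above.
struct DilSettings {
  bool allow_property_access = false;
};

struct PropertyResult {
  bool ok;
  std::string text;  // the value on success, the error message otherwise
};

// Runs the property getter only if the user has opted in. `run_getter` is a
// placeholder for the code that would execute the accessor in the process.
PropertyResult AccessProperty(const DilSettings &settings,
                              const std::string &qualified_name,
                              const std::function<std::string()> &run_getter) {
  if (!settings.allow_property_access) {
    return {false, "Error: " + qualified_name +
                       " is a property and accessing properties requires "
                       "executing code in the process."};
  }
  return {true, run_getter()};
}
```

The point of the design is that the getter is never invoked while the setting is off, so plain data inspection cannot run code in the process by accident.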
Function calls
Calling functions is very useful in many situations. Similar to properties, in the general case calling a function requires complete knowledge of the ABI. The expression evaluator solves this problem by using a compiler.
However, it’s possible to call some functions without “complete” knowledge of the ABI, for example functions with no arguments or functions with primitive arguments. LLDB already has a capability to invoke such “simple” functions and it can be leveraged by DIL. It won’t cover all use cases, but it’s better than nothing and hopefully can cover many simple cases (e.g. calling methods like v->size()).
Supporting different source languages
Everything above kind of assumes C++ as the source language of the target. However LLDB supports other languages like Swift and Objective-C (and ~Rust). Moreover, the target program can have modules of different languages, e.g. C++ code calling into Swift or the other way around.
DIL can have flavors (or dialects), which implement language specific features or semantics. By default, the interpreter would pick the flavor of the current frame, but it can be overridden. Implementation-wise DIL can have one parser that can branch based on the flavor or completely different parsers for different flavors. Since it’s not a goal to implement the whole source language, I believe the DIL parser can be relatively simple and easy to maintain.
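The "one parser that branches on the flavor" option could look roughly like the sketch below. The Flavor enum, SelectFlavor and AcceptsArrowOperator are hypothetical names, and the choice of which flavors accept the -> operator is only an assumption made for the sake of the example.

```cpp
// Hypothetical set of DIL flavors, one per supported source language.
enum class Flavor { Cpp, Swift, ObjC };

// By default the flavor follows the language of the current frame; a
// non-null `user_override` takes precedence.
Flavor SelectFlavor(Flavor frame_language, const Flavor *user_override) {
  return user_override != nullptr ? *user_override : frame_language;
}

// A single parser can branch on the flavor when it meets language-specific
// syntax. Assumption for this sketch: `->` is accepted for C++ and
// Objective-C but rejected for Swift.
bool AcceptsArrowOperator(Flavor flavor) {
  switch (flavor) {
  case Flavor::Cpp:
  case Flavor::ObjC:
    return true;
  case Flavor::Swift:
    return false;
  }
  return false;
}
```

With this shape, a frame in Swift code would reject foo->bar unless the user explicitly overrides the flavor to C++.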
Another option could be to incorporate features from different languages into one “true” implementation of DIL. This might be a good option from the user perspective (i.e. always works, don’t need to care about flavors). However I’m not sure whether it would be possible to properly differentiate between language-specific features in a single parser.
Comparison to other debuggers
GDB
GDB’s primary mechanism for data inspection is the print/inspect command. It uses an interpreter under the hood, which supports arithmetic operations, type casts, function calls. This is similar to the proposal in this RFC.
GDB also has a capability to compile and execute arbitrary code. Here it uses gcc under the hood.
Visual Studio Debugger
Visual Studio debugger supports evaluating expressions in the Immediate and Watch windows. It uses an interpreter-based approach, supports many C++ features like arithmetic, casts, function calls and provides many builtin intrinsics (like strlen or __findNonNull functions). This expression evaluator is extensively used in NatVis, which is a framework for defining custom data visualizers.
Milestones
Here’s the approximate implementation plan for DIL:
- Implement the same set of operations supported by the frame variable command
  - This includes dynamic type resolution and synthetic children support
- Replace the implementation of GetValueForExpressionPath() with DIL
  - This should be a no-op change from the user perspective
- Implement type casts (e.g. (int)foo or static_cast<int>(foo))
- Implement arithmetic operations (addition/subtraction, multiplication, bitwise operations, etc.)
- Implement properties
- Implement basic function calls
- Disable the fallback to expr by default in the p/print command
  - At the moment p is aliased to dwim-print (commit)
Thanks @jingham @labath @cmtice @kastiglione @dblaikie for discussing and refining this proposal.