Synthetic frame variables

Hello! I was putzing around with the new {Scripted/Synthetic}FrameProvider support and came across the get_variables callback in the ScriptedFrame example. I was wondering how much y’all are allowed to talk about how far along this is/how it’s doing.

For context, I have a local assembly/interpreter register machine thing I use for playing around, and I want to be able to debug it (because why not). I’m sure there are other people who also want to debug interpreted languages :slight_smile:

I have a patch that I’ve been working on locally that allowed me to use a subclass of ScriptedFrame to list variables (caveat: 1), as well as run expressions, which is how I read the current interpreter state and print things like registers. Running expressions is a little interesting, but it’s nice to have the ability for my custom frame to have complete control over the expressions so I can filter a little bit.
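
To make the shape of this concrete, here is a rough standalone sketch (it does not import lldb). `ToyMachine`, `ToyFrame` and `evaluate` are hypothetical names; only the `get_variables` callback name comes from the ScriptedFrame example, and the real callback's return type is not modelled here.

```python
# Standalone model of the idea; nothing here is the real lldb API.
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Interpreter state for a tiny register machine (hypothetical)."""
    registers: list = field(default_factory=lambda: [0] * 4)


class ToyFrame:
    """Stands in for a ScriptedFrame subclass; not the real base class."""

    def __init__(self, machine):
        self.machine = machine

    def get_variables(self):
        # One (name, value) pair per register, in register order.
        return [(f"r{i}", v) for i, v in enumerate(self.machine.registers)]

    def evaluate(self, expr):
        # The frame decides which expressions it accepts: here, only
        # register names of the form r<N>.
        if not (expr.startswith("r") and expr[1:].isdecimal()):
            raise ValueError(f"unsupported expression: {expr!r}")
        index = int(expr[1:])
        if index >= len(self.machine.registers):
            raise IndexError(
                f"register r{index} is out of bounds "
                f"(machine has {len(self.machine.registers)} registers)"
            )
        return self.machine.registers[index]


frame = ToyFrame(ToyMachine(registers=[7, 0, 42, 1]))
```

The point of routing expressions through the frame is the filtering step in `evaluate`: anything the interpreter does not understand is rejected before it reaches any other evaluator.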

For now, the major changes are (in increasing order of controversy/feedback required):

  • Adding SBValueList to swig
  • Allowing SBInterpreter to get a ValueObjectSP from an SBValue and associated conversions to/from python for ValueObjectSP and ValueObjectListSP
  • Adding std::optional<lldb::ValueObjectListSP> GetVariables() override; and lldb::ValueObjectSP GetValueObjectForExpression(llvm::StringRef expr, Status &status) override; to ScriptedFramePythonInterface
  • Adding implementations for GetVariableList, GetInScopeVariableList, GetValueObjectForFrameVariable, and GetValueForVariableExpressionPath to ScriptedFrame
  • Adding a patch to CommandObjectDWIMPrint that essentially delegates printing frame variables to the synthetic frame (if we have a frame, and it’s synthetic).

I would be more than happy to contribute these changes, and if y’all have thoughts/advice/comments happy to go through that as well. Basically, I want to make sure I’m not stepping on any toes, or if y’all are close to having something ready then I can just use that :slight_smile: I don’t have a PR yet, but I can put one up if y’all are interested.

I was also thinking a little about how to support synthetic frame variables (like fr v on synthetic frames) and wasn’t really sure - something like an SBVariable could be interesting, but then it really would need to be a separate command that the scripted interface completely handles. Another thing that I’m also trying to avoid is too much reliance on Python - for reasons I am going to try and port this over to C++ when I have time, so I don’t want to end up with something that only works through the python API.

(1) The caveat is that it looks like it has to be a C++ variable object, because GetVariableList requires VariableSP, which is hard to construct synthetically. It’d be really cool to make that easier, but not strictly necessary as long as the caveat is documented somewhere.

cc:
@JDevlieghere @jingham and @mib

Hi Aman,

Glad to see your interest in contributing to LLDB, and more specifically to frame providers and scripted frames :slight_smile:

We’re not actively working on this quite yet, but it’s definitely on our list of things to do, so any contribution is welcome! Please feel free to add @jingham and me as reviewers.

  • Adding SBValueList to swig

This should already be the case. Did you have to do something specific?

  • Allowing SBInterpreter to get a ValueObjectSP from an SBValue and associated conversions to/from python for ValueObjectSP and ValueObjectListSP
  • Adding std::optional<lldb::ValueObjectListSP> GetVariables() override; and lldb::ValueObjectSP GetValueObjectForExpression(llvm::StringRef expr, Status &status) override; to ScriptedFramePythonInterface
  • Adding implementations for GetVariableList, GetInScopeVariableList, GetValueObjectForFrameVariable, and GetValueForVariableExpressionPath to ScriptedFrame

This sounds right, please make sure to split it into different PRs to make it easier to review.

  • Adding a patch to CommandObjectDWIMPrint that essentially delegates printing frame variables to the synthetic frame (if we have a frame, and it’s synthetic).

At first glance, I’m not sure what would need to change here, since the CommandObject’s ExecutionContext should hold the synthetic frame, so it should just dispatch the frame->GetValueForVariableExpressionPath call to python implementation (mentioned in the previous bullet).

I was also thinking a little about how to support synthetic frame variables (like fr v on synthetic frames) and wasn’t really sure.

Is the goal here to separate the synthetic variables from the real local variables?
If that’s the case, I know frame var (and its respective python API) has a frame variable --dynamic-type option that can be set. Maybe we could extend that to add a synthetic option, @jingham might have a good idea where this should go.

This should already be the case. Did you have to do something specific ?

Yeah - I needed conversions for Python → SBValueList, and a few other ValueObject helpers in ScriptedPythonInterface.h. I’ll put up the PRs, hopefully they’ll be nice and easy :slight_smile:

This sounds right, please make sure to split it into different PRs to make it easier to review.

Absolutely, will do. Might take a while but I’ll do it when I have time.

At first glance, I’m not sure what would need to change here, since the CommandObject’s ExecutionContext should hold the synthetic frame, so it should just dispatch the frame->GetValueForVariableExpressionPath call to python implementation (mentioned in the previous bullet).

So you’re right - it does. The main issue here is actually the side effect of using the current path, which is that the printer tries again and swallows errors (which makes sense, given how it’s implemented). But, if you’re handling expressions at your scripting layer, you want to customize the errors. For example - if I don’t modify the printer, I have a situation where I try to print an out-of-bounds register and the error is “unknown identifier rMN” - my provider knows that MN is out of bounds and so could provide a more descriptive error, but that error gets swallowed if I bubble it back up the SBError, or if I print it, then it’s displayed alongside the confusing error.
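
A small standalone model of the problem, in case it helps (no lldb involved). All four function names are hypothetical, and the fallback order is my reading of the behaviour described above, not the actual printer code.

```python
def provider_lookup(name, register_count=4):
    """Hypothetical scripted-frame lookup; returns (value, error)."""
    index = int(name[1:])
    if index >= register_count:
        return None, f"register {name} is out of bounds (max r{register_count - 1})"
    return index * 10, None  # dummy register contents


def fallback_expression(name):
    # Stands in for the expression evaluator, which knows nothing
    # about the interpreter's registers.
    return None, f"unknown identifier {name}"


def print_swallowing(name):
    # Try the provider, drop its error, then retry as an expression.
    value, _ignored = provider_lookup(name)
    if value is not None:
        return str(value)
    value, error = fallback_expression(name)
    return str(value) if value is not None else f"error: {error}"


def print_delegating(name):
    # Let the provider's own error reach the user.
    value, error = provider_lookup(name)
    return str(value) if error is None else f"error: {error}"
```

With these definitions `print_swallowing("r9")` reports the generic "unknown identifier" message, while `print_delegating("r9")` reports the out-of-bounds message the provider actually produced.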

Is the goal here to separate the synthetic variables from the real local variables?
If that’s the case, I know frame var (and its respective python API) has a frame variable --dynamic-type option that can be set. Maybe we could extend that to add a synthetic option, @jingham might have a good idea where this should go.

Sort of - I want to have frame variable show synthetic frame variables, but right now they’re tied to having a VariableSP which is difficult to construct. By difficult here, I mean “you need a lot of information that assumes things are compiled to something that looks like machine code with dwarf”. I was thinking that having a new command (or something, I like your idea of a flag) might mean I could provide just ValueObjectSP, which are possible to construct through SBValue without messing up the frame commands.

We already make a distinction between showing “locals” and showing “arguments” when printing variables from the stack (the -an and -l flags to frame var.) So adding another qualifier for “extended” variables would fit nicely.

What did you have in mind? Also, just to be clear, when you say ‘extended’ variables, you mean for example ‘interpreter-materialized variables’?

Yes, pretty much, though I was being intentionally abstract.

I was following what we did for thread backtrace where if there are “backtraces related to this backtrace but not a natural part of it” we will print them with the `--extended` flag. There’s a python callback that can be implemented to provide these “extended backtrace threads” to lldb. We don’t have to make any claims about who produces these extended backtraces, presumably the producer will name them in a way that makes that clear.

At the implementation layer, I don’t think we need to do more than provide a way to add a callback to SBFrame (and probably SBTarget which manages globals) that returns an SBValueList, and some API to return all the results from the providers attached to the frame or target, with some SBFrame equivalent API. Then if you did frame var -e we would print that result. It would probably be convenient to add a “category” string to SBValueList, so that the provider can annotate what they are providing. If we wanted extra credit, we could do: frame var -e <category> to select one of these categories.
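
A sketch of what that callback-plus-category arrangement might look like, modelled in plain Python rather than against SBFrame. `ExtendedVariableRegistry`, `add_provider` and `collect` are hypothetical names; none of this exists in lldb today.

```python
from collections import defaultdict


class ExtendedVariableRegistry:
    """Hypothetical stand-in for providers attached to a frame or target."""

    def __init__(self):
        self._providers = []  # list of (category, callback) pairs

    def add_provider(self, category, callback):
        # callback(frame) returns a list of (name, value) pairs.
        self._providers.append((category, callback))

    def collect(self, frame, category=None):
        # category=None models `frame var -e`; a string models
        # `frame var -e <category>`.
        results = defaultdict(list)
        for cat, callback in self._providers:
            if category is None or cat == category:
                results[cat].extend(callback(frame))
        return dict(results)


registry = ExtendedVariableRegistry()
registry.add_provider("interpreter", lambda frame: [("r0", frame["r0"])])
registry.add_provider("signals", lambda frame: [("depth", frame["depth"])])

frame_state = {"r0": 7, "depth": 3}  # placeholder for a real frame object
```

Keeping the category next to the callback, rather than inside each value, means the filter can skip whole providers without running them.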

Stack of PRs - I haven’t addressed Jim’s comments about the implementation for the extended variable support, but the rest are up.

Stack here: [lldb] Move ValueImpl and ValueLocker to ValueObject, NFC. by bzcheeseman · Pull Request #178573 · llvm/llvm-project · GitHub

The stack has landed at this point, I wanted to move discussion back from the PR (link here: [lldb] Make `print` delegate to synthetic frames. by bzcheeseman · Pull Request #178602 · llvm/llvm-project · GitHub) to this thread:

This doesn’t seem like the right way to do this to me. What should be happening fundamentally is that when anybody asks “can you find the variable foo” for a synthetic frame, the frame’s synthetic provider’s variable list should be queried, rather than (or along with depending on what policy we’re going to choose for access to synthetic variables) the “native” variable list from the frame. You would want that to work not just in dwim-print but anywhere that variables get looked up in a frame.

Yes that does seem much more correct. I think the issue I was running into was understanding where that change should be made!

Once you’ve handed out the right root variable, the parsing of child nodes for an expression will happen naturally since that’s just querying the root’s type, so if you get the right root variable, you get the appropriate children. That shows that properly the task is “hand out the right variables from the frame when asked”.

That makes sense. I hope that works with Values as well, since we can’t produce variables at this point :sweat_smile: My reading of the code is that you’re right, it should, but I’m not as confident as I’d like to be.

The one thing that dwim-print and expr should probably do is warn or error out if the user tries to run expressions in a frame with synthetic variables. Even that wouldn’t be necessary if we take the extra step of materializing the synthetic values into the inferior when the frame makes them. But that’s probably going to be tricky and seems like a “version 2” kind of effort. In the case of StackFrame::EvaluateExpression and expr you should be able to do this at the top of the command (except expr --top-level which explicitly forces a different scope). For dwim-print, you would error out if the frame var lookup failed and before it falls back to running expressions.

Do you mean in the context of something like “I have provided a synthetic variable from my interpreter, and now I want to run an expression on it”? Wouldn’t you just have the interpreter itself run that expression? Or not, I guess, depending on the thing being interpreted.

But note, if you said frame var (Foo *) synthesized_var_name with the new DIL support, you would first want the DIL parser to parse this to the point where it sees a name it has to ask for. If you handed out the right variable when the DIL asked for it by name, then the rest of this would all fall out naturally.

Sorry what’s the DIL here? That sounds promising, but I don’t know enough to evaluate. The main blocker I encountered was finding things/methods that required VariableSP (hard to create) vs ValueObjectSP (easy to create).

So while we probably want to put some “can’t run expressions that have to run in the target in a frame with synthesized variables” warning when in a frame with synthesized variables, the actual lookup into the synthesized variables list needs to happen at a lower layer in the lookup.

Got it, makes sense. Do you have a sense for where something like that should be done? It kind of sounds like you’re shooting for putting some of this stuff into the expression evaluation machinery? I’m happy to try and put some more time into it, I will just need your guidance :slight_smile: I would also like to see if there’s a good way for the plugin to propagate errors back up to the user through the print/expression evaluation.

I also wonder how much overlap there is between this and the extended frame variables you mentioned above @jingham - it sounds like there is, but I am not seeing it so was hoping you could point me in the right direction.

The “don’t evaluate expressions on synthetic frames” should be handled at the StackFrame::EvaluateExpression layer. For now you can check whether the stack frame is a synthetic frame and return an appropriate error from StackFrame::EvaluateExpression. When somebody gets super-ambitious, they could implement an evaluate_expression affordance that the Synthetic StackFrames can supply, so for instance in the case where we’re emulating a Python call from some C stack frame sequence, the synthetic python frame can try running the expression as a python expression.
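
Roughly, the check could be modelled like this (plain Python, not the real C++). `FrameModel`, `native_evaluate` and `evaluate_hook` are hypothetical names; `evaluate_hook` stands in for the optional evaluate_expression affordance.

```python
class FrameModel:
    """Hypothetical frame: optionally synthetic, optionally with a hook."""

    def __init__(self, is_synthetic=False, evaluate_hook=None):
        self.is_synthetic = is_synthetic
        self.evaluate_hook = evaluate_hook


def native_evaluate(expr):
    # Stand-in for the compiler-backed expression evaluator.
    return f"native result of {expr}"


def evaluate_expression(frame, expr):
    """Return (result, error); exactly one of the two is None."""
    if not frame.is_synthetic:
        return native_evaluate(expr), None
    if frame.evaluate_hook is not None:
        # The synthetic frame supplied its own evaluator.
        return frame.evaluate_hook(expr), None
    return None, "cannot evaluate expressions in a synthetic frame"


plain_frame = FrameModel()
synthetic_frame = FrameModel(is_synthetic=True)
python_frame = FrameModel(
    is_synthetic=True,
    evaluate_hook=lambda expr: f"python result of {expr}",
)
```

The first version would only need the final error branch; the hook branch is the "super-ambitious" follow-up.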

The other issue is how frame var and SBFrame::FindVariable and the like should work with scripted frames. Do we want to have “arguments”, “locals” and “extended” variables? That would be the most reasonable model if the added variables were extra info that you want to tell some user about this frame. But if we’re trying to use the scripted frame to re-present the work of the frame in some more understandable way, then you would want to replace the variable list in the frame altogether.

I think the simplest rule is that the scripted frame entirely takes over the variables lists in the frame. And then there should be a way to get at the “unscripted” frame’s variables (or if the scripted frames stack, the previous frame in the scripted stack’s variables). That way if the script writer wants to make up some new variables, and forward others, they could do either as they wished.

If I’m reading it aright, all the code in StackFrame that looks over the list of variables uses the list returned by StackFrame::GetVariableList. So that seems like a good place to check whether the current StackFrame has a variables provider, and if it does, return that provider’s list of variables.

So long as the scripted stack frame has a handle on the frame(s) it is the presentation of, it can query that if it wants to access or present the ‘real’ frame variables.
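
The "provider takes over, but can forward" rule could be sketched like this. It is a plain Python model, not the C++ StackFrame; `StackFrameModel`, `provider` and `underlying` are hypothetical names.

```python
class StackFrameModel:
    """Hypothetical frame whose variable list may be taken over."""

    def __init__(self, native_variables, provider=None, underlying=None):
        self.native_variables = list(native_variables)
        self.provider = provider      # callable(underlying) -> list, or None
        self.underlying = underlying  # the frame this one re-presents

    def get_variable_list(self):
        # Single choke point: if a provider exists, its list wins.
        if self.provider is not None:
            return self.provider(self.underlying)
        return list(self.native_variables)


def interpreter_provider(underlying):
    # Forward one 'real' variable and make up two new ones.
    forwarded = [v for v in underlying.get_variable_list() if v[0] == "argc"]
    return forwarded + [("r0", 7), ("r1", 42)]


real = StackFrameModel([("argc", 1), ("interp_state", "0x1000")])
scripted = StackFrameModel([], provider=interpreter_provider, underlying=real)
```

Here `scripted.get_variable_list()` hides `interp_state`, forwards `argc`, and adds the two registers, while `real.get_variable_list()` is unchanged and still reachable through `underlying`.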

I don’t see this code on StackFrame - the closest thing I see is GetValueForVariableExpressionPath, which is already forwarded. Did you mean something else?

That’s how it works today. The script writer is expected to provide all variables, whether they forward the ‘real’ ones or not is up to them.

The issue with that (and above) is that producing new Variables is dramatically harder than producing new ValueObjects. I don’t disagree though, and maybe the answer is just to make it simpler to produce new Variable objects? Or maybe I’m misunderstanding your statements?

I think this is the crux - I want to replace the variable list with synthetic variables, but it’s difficult to do that because lldb_private::Variable objects are hard to produce. I think the thing I was trying to do previously was basically, is there a world in which frame var et al call a different function on synthetic frames so that they can look through the ValueObject list, rather than the Variable list. Or like I said above, if we can make Variables easier to produce I’m all ears :slight_smile:

Just to be clear, nobody but the producer should care about the difference between ValueObjectVariables and ValueObjects you make with CreateValueObjectFromData, etc. And often, the StackFrames don’t actually hand out the ValueObjectVariable even when we start with one.

For instance, if the actual ValueObjectVariable is a pointer to a C++ base class, then we don’t hand out the ValueObjectVariable but rather the ValueObject we make to reflect the dynamic type of the object rather than its base-class static type. Similarly, if there’s a synthetic child provider, what you get from FindVariables is not the original ValueObjectVariable but the ValueObjectSynthetic that generates the synthetic children.

So the term “Variable” in SBFrame::FindVariable and so forth does not mean “this returns the ValueObjectVariable’s made directly from the DWARF DW_TAG_variable’s in this frame”. It means “this returns what would be meaningful as the list of variables in this frame.”

They have to be ValueObjects, but that’s about the only requirement.

W.R.T. StackFrame::EvaluateExpression: I was writing too loosely. Internally we chose to implement “Evaluate an Expression in the context of a StackFrame” by calling Target::EvaluateExpression, but passing the stack frame in the execution context. That’s where you would check “is this frame showing synthetic variables” and error out because you don’t actually know how to run expressions using these synthetic variables.

You should really never have to worry about GetValueForExpressionPath. That’s just about finding a root ValueObject in the Variable’s list from the StackFrame, and then navigating its children. The latter is done by following the child structure you made in the ValueObjects, so you influence that by making the child structure correctly. And the finding part we do by returning the synthetic rather than the “natural” variable object list.
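
The two steps (find the root, then walk the children) can be modelled in a few lines. This is a simplified sketch that only handles dotted paths; `Value` and `resolve_path` are hypothetical names, not lldb classes.

```python
from dataclasses import dataclass, field


@dataclass
class Value:
    """Hypothetical stand-in for a ValueObject with named children."""
    name: str
    value: object = None
    children: list = field(default_factory=list)

    def child(self, name):
        for candidate in self.children:
            if candidate.name == name:
                return candidate
        return None


def resolve_path(variables, path):
    # Step 1: find the root in whatever list the frame hands out.
    root_name, *rest = path.split(".")
    node = next((v for v in variables if v.name == root_name), None)
    if node is None:
        raise KeyError(f"no variable named {root_name}")
    # Step 2: follow the child structure the producer built.
    for part in rest:
        found = node.child(part)
        if found is None:
            raise KeyError(f"{node.name} has no child named {part}")
        node = found
    return node


synthetic_variables = [
    Value("regs", children=[Value("r0", 7), Value("r1", 5)]),
    Value("pc", 128),
]
```

Nothing in `resolve_path` cares where the list came from, which is why swapping in the synthetic list at the frame level is enough for child lookups such as `regs.r1` to work.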

The reason I like to consider the “add extended variables model” as a counterpart to the “replace all the variables” is that it gives you a more flexible Pythonic way to do one fairly common IDE debugger trick. If you are in some frame and want to keep your eye on some signal that’s not just the value of a single variable, but rather the result of an expression, you can usually add that expression in the Locals view in your debugger, and then as you step along, it’s easy to follow the result of the expression.

If we added “extended” variables that key off a stack frame in some way, then instead of having to do this manually every time you stopped in that frame, you could provide the result of this evaluation as a pythonic “extended variable” in the frame. Once that’s in place, every time you stopped in this frame, the result of the signal is automatically in front of you.

For that, it would be annoying to have to write “forward everything and add one thing”; it would be more fitting to the purpose to be able to “add one more”.

Note, this is another project, and I’m certainly not saying both have to be done together in the same PR. This is just “why I think extended frame variables programmed in Python might be a useful feature.”