When is it safe to cache `SyntheticProvider.update()`?

The documentation of SyntheticProvider.update() mentions that you can return true or false to control the caching, but that doing this incorrectly might lead to incorrect output. It’s not super clear what the correct instance to cache is. There are a couple of other relevant questions:

  • What is the lifecycle of a valobj? Does it persist when stepping to the next line?
  • What exactly is being cached? Synthetic children can be created via raw memory addresses; is it caching the memory address or the data stored at the memory address?
  • Does it matter if the language has dynamic values(?) (e.g. objc) or not?

Looking through the code that uses eRefetch and eReuse isn’t super clear either. Some sort of minimal example of when to use it and/or a description of the “intended” use case (e.g. non-relocatable objects? non-resizable objects? effectively-const objects?) would help a lot.

There are two reasons why you might need to refetch children from a synthetic child provider.

For the first reason, the salient question is “is the data source for the Synthetic children I have produced at stop id 5 the same as it is at stop id 6?” If the data source is the same, then you don’t need to remake the value object provided by the Synthetic Child provider. When someone asks the child what its value is, it will tell its data source to sync up with the process state to fetch the new value and since the data source is the same, the new value from the original child will be correct.

Suppose, for instance, I have a ValueObject where the Synthetic Child Provider is providing children by forwarding the ValueObjects of two of the original ValueObject’s ivars as the Synthetic Children.

On update, provided the address of the non-synthetic ValueObject has not changed since the last time it was updated, the two ValueObjects you made from the ivars will still fetch their data from the same memory location using the same type as they did when they were made.

So those ValueObjects don’t need to be reconstructed, they just need to have UpdateValue called on them to sync them up with the current memory contents they are supposed to be reporting. And since we always call UpdateValueIfNeeded before we ask questions of a ValueObject, that update will happen naturally. Their data source remained constant, so they didn’t need to be recreated.

You only need to tell lldb to flush its cache when the data source for one of the children you produced is no longer valid on the new stop. For instance, suppose I have a class that offers a dynamically resizable store, but for efficiency stores “small” data in a fixed sized ivar and then when the data gets bigger than some limit, it moves the data from a fixed size ivar to a dynamically resizable one. In that case, you would always want to refetch, because you originally handed out a child that pointed to the fixed ivar, and you need to replace it with one that points to the resizable store.
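To make the small-storage case concrete, here is a rough sketch of an update() that only asks for a refetch when the data source has moved. The field names is_inline, inline_buf and heap_ptr are made up for illustration and do not come from any real type; the SBValue calls used (GetChildMemberWithName, GetValueAsUnsigned, GetLoadAddress) are the ordinary ones. A real provider would also need num_children and get_child_at_index.

```python
class SmallBufferProvider:
    """Sketch of update() for a hypothetical type with the (assumed) fields
    `is_inline`, `inline_buf` and `heap_ptr`."""

    def __init__(self, valobj, internal_dict):
        self.valobj = valobj
        # (kind, address) of the storage the current children were built from.
        self.source = None

    def _current_source(self):
        is_inline = self.valobj.GetChildMemberWithName("is_inline").GetValueAsUnsigned()
        if is_inline:
            buf = self.valobj.GetChildMemberWithName("inline_buf")
            return ("inline", buf.GetLoadAddress())
        ptr = self.valobj.GetChildMemberWithName("heap_ptr")
        return ("heap", ptr.GetValueAsUnsigned())

    def update(self):
        new_source = self._current_source()
        # True  -> children still read from the same place, keep them.
        # False -> the storage moved, so lldb should drop and refetch children.
        reuse = new_source == self.source
        self.source = new_source
        return reuse
```

The decision is made again on every call, so a provider like this returns False on the stop where the data moves to the heap and True on later stops where nothing has moved.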

The other reason for using eRefetch is when the number of children for a container class changes. In that case, when the number of children changes, you need to throw them all away and recreate them. We could have added something like eRefetchNew in case you knew that all the ones you had already produced had the same data source.

But that’s not necessary because you don’t ACTUALLY have to destroy the individual VO’s when the number of children changes. The synthetic provider can hold onto them internally as well as provide them as children. Then if it knows that the first N synthetic VO’s still have the same data source, it can tell the ValueObjectSynthetic to discard its ValueObjectSP’s and refetch them, and can reuse all the ones that have the same data source, and add to them as needed. But from the standpoint of the ValueObjectSynthetic, it’s throwing them all away and getting new ones.

Note, this is actually a good way to do it, since the ValueObjects hold onto their “IsChanged” flag. So if you provide the same value object for a given slot in the ValueObjectSynthetic, it will calculate IsChanged correctly for you.
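A rough sketch of that idea, not taken from the lldb sources: the provider keeps its own index-to-child map, asks lldb to refetch whenever the base pointer or the count changes, and hands back the same child objects for slots whose data source is unchanged. The field names data_ptr and length are assumptions about a slice-like type, and the sketch assumes the element type does not change between stops.

```python
class SliceProvider:
    """Sketch for a slice-like type with (assumed) `data_ptr` and `length` fields."""

    def __init__(self, valobj, internal_dict):
        self.valobj = valobj
        self.base = None
        self.length = 0
        self.children = {}  # index -> SBValue, kept across refetches

    def update(self):
        data_ptr = self.valobj.GetChildMemberWithName("data_ptr")
        self.element_type = data_ptr.GetType().GetPointeeType()
        self.element_size = self.element_type.GetByteSize()
        new_base = data_ptr.GetValueAsUnsigned()
        new_length = self.valobj.GetChildMemberWithName("length").GetValueAsUnsigned()

        if new_base != self.base:
            # Storage moved: every child we made points at stale memory.
            self.children.clear()
        else:
            # Same storage: only drop children past the new end.
            for index in [i for i in self.children if i >= new_length]:
                del self.children[index]

        unchanged = new_base == self.base and new_length == self.length
        self.base, self.length = new_base, new_length
        # False tells lldb to discard its child list and ask us again.
        return unchanged

    def num_children(self):
        return self.length

    def get_child_at_index(self, index):
        if index < 0 or index >= self.length:
            return None
        child = self.children.get(index)
        if child is None:
            address = self.base + index * self.element_size
            child = self.valobj.CreateValueFromAddress(
                f"[{index}]", address, self.element_type
            )
            self.children[index] = child
        return child
```

When only the length grows, lldb throws away its own list, but slots 0..N-1 get the same objects back, so their IsChanged state carries over.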

For a bit of context, I’m maintaining the visualizer scripts for Rust. Some of this might have gone a bit over my head, so let me know if I’m misunderstanding anything:

  • LLDB will retain a valobj if it has the same address as it did on the prior step
  • On the new step, it will read from that address to update its value

For example, given:

let mut x: u8 = 5;
x += 10;

x will have the same ValueObject on both steps, but a different value. The same is true for synthetic children of ValueObjects.

  • This update process is distinct from the caching process (and thus happens whether you return True or False from SyntheticProvider.update()), since the caching process completely yeets the old ValueObject and replaces it with a new one

To sum all that up, when you tell LLDB to cache update, the intention is to store the total number of children and the address of each child (via the child SBValue’s).


There’s a few more questions I had if the logic works like that, but I did some manual testing and either I’m drastically misunderstanding or something is broken.

To quote the full text of the footnote for update()

This method is optional. Also, a boolean value must be returned (since lldb 3.1.0). If False is returned, then whenever the process reaches a new stop, this method will be invoked again to generate an updated list of the children for a given variable. Otherwise, if True is returned, then the value is cached and this method won’t be called again, effectively freezing the state of the value in subsequent stops. Beware that returning True incorrectly could show misleading information to the user.

I don’t know if there’s a single statement in that that is correct at the moment.

  1. a boolean value isn’t required to be returned. Returning None works fine (though this may just be None being coerced to False?)

  2. update is not invoked at a breakpoint. It’s invoked (twice for some reason) when you request the variable (e.g. frame var x), but only the first time at each breakpoint, regardless of whether update returns True or False

  3. At subsequent breakpoints, update is called again (twice) for the value regardless of whether True or False is returned. This appears to be because the underlying ValueObject is not retained between breakpoints.

Here’s a quick demonstration of this behavior. Here is the update function, it prints out every time it is run, and includes the SBValue.GetID() and SBValue.changed values:

def update(self):
    print(f"update for var '{self.valobj.name}' ({self.valobj.GetID()=}) ({self.valobj.changed=})")

    self.length = self.valobj.GetChildMemberWithName("length").GetValueAsUnsigned()
    self.data_ptr: SBValue = self.valobj.GetChildMemberWithName("data_ptr")

    self.element_type = self.data_ptr.GetType().GetPointeeType()
    self.element_size = self.element_type.GetByteSize()
    return True

And the output from LLDB

Process 31272 stopped
* thread #1, name = 'main', stop reason = breakpoint 1.1
    frame #0: 0x00007ff6e2b11a2d sample.exe`sample::main::hdd4c521338427624 at main.rs:53:9
   50
   51   fn main() {
   52       let mut x = [1, 2, 3, 4].as_slice();
-> 53       x = x;
   54       let y = [5, 6, 7, 8, 9, 10].as_slice();
   55       x = y;
   56
(lldb) v 
update for var 'x' (self.valobj.GetID()=1) (self.valobj.changed=False)
update for var 'x' (self.valobj.GetID()=1) (self.valobj.changed=False)
(&[i32]) x = size=4 {
  [0] = 1
  [1] = 2
  [2] = 3
  [3] = 4
}
(lldb) v
(&[i32]) x = size=4 {
  [0] = 1
  [1] = 2
  [2] = 3
  [3] = 4
}
(lldb) s   
Process 31272 stopped
* thread #1, name = 'main', stop reason = step in
    frame #0: 0x00007ff6e2b11a41 sample.exe`sample::main::hdd4c521338427624 at main.rs:54:33
   51   fn main() {
   52       let mut x = [1, 2, 3, 4].as_slice();
   53       x = x;
-> 54       let y = [5, 6, 7, 8, 9, 10].as_slice();
   55       x = y;
   56
   57       dbg!(x);
(lldb) v
update for var 'x' (self.valobj.GetID()=20) (self.valobj.changed=False)
update for var 'x' (self.valobj.GetID()=20) (self.valobj.changed=False)
(&[i32]) x = size=4 {
  [0] = 1
  [1] = 2
  [2] = 3
  [3] = 4
}
(lldb) thread step-over
Process 31272 stopped
* thread #1, name = 'main', stop reason = step over
    frame #0: 0x00007ff6e2b11a5d sample.exe`sample::main::hdd4c521338427624 at main.rs:55:5
   52       let mut x = [1, 2, 3, 4].as_slice();
   53       x = x;
   54       let y = [5, 6, 7, 8, 9, 10].as_slice();
-> 55       x = y;
   56
   57       dbg!(x);
   58       // let signed = -1i8;
(lldb) s
Process 31272 stopped
* thread #1, name = 'main', stop reason = step in
    frame #0: 0x00007ff6e2b11a67 sample.exe`sample::main::hdd4c521338427624 at main.rs:57:5
   54       let y = [5, 6, 7, 8, 9, 10].as_slice();
   55       x = y;
   56
-> 57       dbg!(x);
   58       // let signed = -1i8;
   59       // let chr = 'c';
   60       // let val1 = 1u8;
(lldb) v
update for var 'x' (self.valobj.GetID()=39) (self.valobj.changed=False)
update for var 'x' (self.valobj.GetID()=39) (self.valobj.changed=False)
update for var 'y' (self.valobj.GetID()=61) (self.valobj.changed=False)
update for var 'y' (self.valobj.GetID()=61) (self.valobj.changed=False)
(&[i32]) x = size=6 {
  [0] = 5
  [1] = 6
  [2] = 7
  [3] = 8
  [4] = 9
  [5] = 10
}
(&[i32]) y = size=6 {
  [0] = 5
  [1] = 6
  [2] = 7
  [3] = 8
  [4] = 9
  [5] = 10
}

I think you correctly understood what I was describing.

I’m not sure about the comment you cited, I agree it seems wrong. Calling update is supposed to be cheap, because it’s neither required nor recommended to produce all the synthetic children for a value when update is called. After all, UpdateValue will get called when I ask the Type of a ValueObject, and you certainly don’t want to realize 10000 array children to answer struct foo[10000]. So it would make no sense not to call update every time you stop. And the synthetic child provider can do a much more efficient job of handling its children if it gets called every time.

We do tend to convert None returns from Python to whatever the default behavior for that API should be. I’m not sure whether that’s a great idea or not, but we do do that in a lot of places…

Note also, there’s a separate GetIsConstant API that gets checked before any of the updating is done to bypass calling UpdateValue, so the UpdateValue return is not about handling constant ValueObjects, it is for ValueObjects where we expect the underlying entity to change…

But there’s another wrinkle that’s getting in the way of your experiments:

There’s two cases to consider here.

The simple one is where you get a ValueObject yourself, for instance with SBFrame.FindVariable and hold onto it over a continue and then ask it for its value on the next stop. That’s probably the easiest case to use to examine the actual behavior of ValueObject’s.

The second case is, call frame var x then step, call frame var x again and see what happens. That’s a little more complicated. The problem is, if you continue and stop again, how does lldb know whether the ValueObject for x which you realized in a frame on one stop is the right ValueObject for the local variable in any given frame on the next stop?

That would be a simple question to answer if we didn’t build frames lazily; you’d just make sure that all the stack frames older than and up to the one where you realized x had the same StackID’s and then you’d know it was a pretty safe bet you should reuse it.

But lldb works quite hard to make sure that it doesn’t build the entire stack frame on every stop since that can be time consuming for deep stacks.

So if for instance you only ever unwind the stack to the first frame, how do we know whether we should reuse x for the first frame? After all, you could have run from breakpoint to breakpoint, and when you hit the breakpoint the second time, the stack above the 0th frame is completely different. It wouldn’t be correct to reuse the ValueObject in that case.

It’s been a while since I’ve gone looking at the details, but lldb does this rather crudely right now. IIRC, the way it works is:

  1. right before a Resume, lldb stores the old StackFrameList (including the StackFrame objects that hold the ValueObjects for the locals) in an “old stack frame list” in the Thread.

  2. After the stop, at the point where you have fully unwound the new stack frame list, lldb will compare the old and new stack frame lists and copy over the StackFrames with their ValueObjects from the old list to the new list. Then when you ask for a local variable from the frame, it will be represented by the ValueObject made in the previous stop.

This can fail in two ways.

  1. If you haven’t fully unwound the old stack frame we’ll never believe the two have equivalent frames and won’t do the copy.

  2. If you ask for a local variable from a frame before you’ve fully unwound the new stack, then at that point we have to make a new StackFrame object to store the ValueObject we return to represent that local variable. So after that point, even if you do fully unwind the stack we can’t copy over the old to new StackFrame since we’ve already handed out ValueObjects from the one we made to service the locals request.

I suspect that we’re being too conservative about not wrongly reporting “IsChanged” for local variables and we could come up with better heuristics that would result in preserving the locals ValueObjects more often. For instance, we could take into account how we got from the old to the new stop. If it was a continue all bets are off. But if it was a step or a next then it’s highly unlikely that a frame with the same StackID between the two stops represents a different StackFrame. But I haven’t had time to revisit that code to see what can be done.

Anyway, so if you are primarily interested in following the handling of ValueObjects over resumes, using the script interpreter to get the SBValue for some variable and explicitly fetching the value from that SBValue after a continue will be a more useful experiment. This works because the ValueObject’s for local variables record the StackID they were realized at, and if you ask them to update themselves, they will unwind the StackFrames up to that StackID and if it still exists unquestioningly refresh themselves in the context of that stack frame.
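That experiment might look something like the sketch below. The helper name is made up. It assumes the debugger is in synchronous mode, so that SBProcess.Continue() only returns once the process has stopped again, and that the variable is a scalar, since GetValue() returns None for aggregates.

```python
def value_before_and_after_continue(frame, process, name):
    """Hold one SBValue across a resume and read it at both stops.

    `frame` is expected to be an SBFrame and `process` an SBProcess.
    """
    value = frame.FindVariable(name)  # realize the ValueObject once
    before = value.GetValue()         # string form at the current stop
    process.Continue()                # assumes synchronous mode (see above)
    after = value.GetValue()          # same SBValue, asked again at the new stop
    return value.GetID(), before, after
```

From the interactive script interpreter this could be called as value_before_and_after_continue(lldb.frame, lldb.process, "x"). Because the same SBValue is used for both reads, the ID stays the same, unlike the frame var runs above where a new ValueObject showed up on each stop.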

OH, I’m a dummy. Between some fiddling with this and looking at the C++ synthetics, I think I get it. I thought that update() was a 1-time choice (i.e. at some point you say “okay, no more updates anymore”). I was really confused at why it bothered to keep calling update() even after it had returned True before lol

Instead, it looks like it’s a choice you make at each step (i.e. “there’s no need to update it right now”). That makes a LOT more sense.

Thanks for all the info =)