Custom formatter request

Hello everyone,

I need a little help with a custom formatter for the class below; I'd appreciate it if someone could throw together a skeleton I can start from.

Thanks,

Akos

Here's the class layout:

class UniString {

private:
  struct SharedBuffer;

  SharedBuffer* content; // shared buffer storing content of the string

  // Implementation classes and structures

  struct SharedBuffer {
    USize length; // length of the string in UniChar::Layout units
    USize capacity; // capacity of the allocated buffer in UniChar::Layout units
    SInt32 refCounter; // stores number of references to this shared buffer
    char reserved[2]; // padding reserved for future use
    unichar string[1]; // buffer storing the content of the string (extends beyond the end of the SharedBuffer)

    inline SharedBuffer ();
    inline SharedBuffer (USize initialLength, USize initialCapacity, Int32 initialRefCounter);
  };

};
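To see where the character data actually sits in memory, here is a minimal sketch that mirrors SharedBuffer with Python's ctypes. It assumes USize is a 64-bit unsigned integer, SInt32 a 32-bit signed integer and unichar a 16-bit UTF-16 code unit; those typedefs are not shown in the class above, so adjust them to your platform. The helper char_offset is hypothetical, for illustration only.

```python
import ctypes

# Hypothetical mirror of GS::UniString::SharedBuffer, ASSUMING:
#   USize   -> 64-bit unsigned integer
#   SInt32  -> 32-bit signed integer
#   unichar -> 16-bit UTF-16 code unit
class SharedBuffer(ctypes.Structure):
    _fields_ = [
        ("length", ctypes.c_uint64),      # string length in unichar units
        ("capacity", ctypes.c_uint64),    # allocated capacity in unichar units
        ("refCounter", ctypes.c_int32),   # number of references to the buffer
        ("reserved", ctypes.c_char * 2),  # padding reserved for future use
        ("string", ctypes.c_uint16 * 1),  # first code unit; the rest follow it in memory
    ]


def char_offset(index):
    """Byte offset of string[index] from the start of the SharedBuffer."""
    return SharedBuffer.string.offset + index * ctypes.sizeof(ctypes.c_uint16)


print("string starts at byte", SharedBuffer.string.offset)   # 22 under these assumptions
print("sizeof(SharedBuffer) =", ctypes.sizeof(SharedBuffer))  # 24 under these assumptions
```

Under these assumptions the characters start 22 bytes into the buffer, and string[i] lives 2 * i bytes further on, which is why a formatter has to read past the declared size of string[1].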

Hi Akos, have you looked at http://lldb.llvm.org/varformats.html?

The page has lots of information about defining custom formatters in Python and with LLDB commands.

Dan

Hi.
It looks like your task should be relatively easy since you have the full definition of the UniString and the SharedBuffer.
I am assuming you will want to show the string content as your summary (as it happens for std::string and NSString).
If so, how are you going to use the string buffer? Is it going to contain a pointer to the real data?
I am probably slightly confused by your comment next to it.
Could you explain the logic for storing the string data or provide an example of code that handles e.g. the string allocation, or access to it?
Given that understanding, it should be relatively straightforward to accomplish your task.

Enrico Granata
:email: egranata@.com
✆ (408) 972-7683

Hi Enrico & Daniel,
Thanks all, I’m browsing through the formatters page. I managed to throw together a skeleton, but it doesn’t seem to work too well. Maybe I failed to grasp something:

import lldb
def UniString_SummaryProvider(valobj, dict):
    e = lldb.SBError()
    s = u'"'
    if valobj.GetValue() != 0:
        content = valobj.GetChildMemberWithName('content')
        length = content.GetPointeeData(0,1).GetChildMemberWithName('length')
        string = content.GetPointeeData(0,1).GetChildMemberWithName('string')
        i = 0
        newchar = -1
        while newchar != 0 and i < length:
            # read next wchar character out of memory
            data_val = string.GetPointeeData(i, 1)
            newchar = data_val.GetUnsignedInt16(e, 0) # utf-16
            if e.fail:
                return ''
            i = i + 1
            # add the character to our string 's'
            if newchar != 0:
                s = s + unichr(newchar)
    s = s + u'"'
    return s.encode('utf-8')

def __lldb_init_module(debugger,dict):
    debugger.HandleCommand("type summary add -F UniString.UniString_SummaryProvider GS::UniString")

Yes, I'd like to show the string member (it's in UTF-16) in the summary. I was looking at the CFString.py formatter, but that seems to be overkill in our case.
The string[1] member is an 'old' C trick; here's the allocation routine. It allocates a raw buffer large enough for the SharedBuffer header plus the characters, then uses placement new to construct the SharedBuffer at the start of that buffer, so string[] simply continues into the rest of the allocation.

inline GS::UniString::SharedBuffer::SharedBuffer (USize initialLength, USize initialCapacity, Int32 initialRefCounter):
    length (initialLength),
    capacity (initialCapacity),
    refCounter (initialRefCounter)
{
}

inline USize GS::UniString::CapacityToBufferSize (USize capacity)
{
    return (sizeof (SharedBuffer) + (capacity - 1) * sizeof (unichar));
}

inline USize GS::UniString::BufferSizeToCapacity (USize bufferSize)
{
    return ((bufferSize - sizeof (SharedBuffer)) / sizeof (unichar) + 1);
}

GS::UniString::SharedBuffer* GS::UniString::AllocateBuffer (USize capacity)
{
    if (capacity == 0)
        return ShareEmptyBuffer ();

    capacity++; // to ensure capacity for the closing 0 (used in conversion)

    USize bufferSize = CapacityToBufferSize (capacity);
    if (bufferSize < MinBufferSize4) {
        if (bufferSize < MinBufferSize1)
            bufferSize = MinBufferSize1;
        else if (bufferSize < MinBufferSize2)
            bufferSize = MinBufferSize2;
        else if (bufferSize < MinBufferSize3)
            bufferSize = MinBufferSize3;
        else
            bufferSize = MinBufferSize4;
    }

    void* buffer = cachedAllocator.Allocate (bufferSize, &bufferSize);
    capacity = BufferSizeToCapacity (bufferSize);

    return new (buffer) SharedBuffer (0, capacity, 1);
}

Best, Akos
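The size arithmetic in CapacityToBufferSize and BufferSizeToCapacity can be checked in isolation. The sketch below re-implements both in Python, assuming a 24-byte SharedBuffer (64-bit USize) and a 2-byte unichar; the real constants depend on the typedefs. Because AllocateBuffer adds one unit for the closing 0, rounds small requests up to MinBufferSize1..4 and lets cachedAllocator report the size it actually handed out, the stored capacity can exceed what was requested, so a formatter should bound its read by length rather than capacity.

```python
# Sketch of the UniString buffer-size arithmetic, ASSUMING a 24-byte
# SharedBuffer header (64-bit USize) and 2-byte unichar code units.
HEADER_SIZE = 24   # assumed sizeof(SharedBuffer); already includes string[0]
UNICHAR_SIZE = 2   # assumed sizeof(unichar)


def capacity_to_buffer_size(capacity):
    # The header already holds one unichar, so only capacity - 1 more are needed.
    return HEADER_SIZE + (capacity - 1) * UNICHAR_SIZE


def buffer_size_to_capacity(buffer_size):
    # Inverse of the above; integer division drops an odd trailing byte.
    return (buffer_size - HEADER_SIZE) // UNICHAR_SIZE + 1


# A capacity survives the round trip to bytes and back.
for cap in range(1, 100):
    assert buffer_size_to_capacity(capacity_to_buffer_size(cap)) == cap

# If the allocator returns more bytes than requested (e.g. 64 instead of 44
# for capacity 11), the usable capacity grows accordingly.
print(capacity_to_buffer_size(11))   # 44
print(buffer_size_to_capacity(64))   # 21
```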

Hi Akos,
what do you mean by “it doesn’t work too well”? Do you get any diagnostics?
A good debugging technique is to use the LLDB interactive script interpreter to repeat the steps in your script and check where you get issues such as empty or invalid objects (all our SB API objects have an IsValid() call on them).
To bootstrap yourself, you can use:
valobj = lldb.frame.FindVariable("name of the variable to format here")
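Following that approach, a small helper can walk a chain of member names starting from the bootstrapped valobj and report the first step whose IsValid() is false. The name find_broken_step is hypothetical (it is not part of LLDB); the helper relies only on SBValue.GetChildMemberWithName() and SBValue.IsValid().

```python
def find_broken_step(valobj, member_path):
    """Follow member_path (e.g. ["content", "length"]) from valobj using
    GetChildMemberWithName() and return the first name whose SBValue is
    invalid, "<variable>" if valobj itself is invalid, or None if every
    step is valid.

    Hypothetical debugging helper, not an LLDB API.
    """
    if not valobj.IsValid():
        return "<variable>"
    current = valobj
    for name in member_path:
        current = current.GetChildMemberWithName(name)
        if not current.IsValid():
            return name
    return None
```

Inside the interactive interpreter (`script` at the LLDB prompt) you could then paste the function and run `print(find_broken_step(valobj, ["content", "length"]))` to see which lookup fails.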

At a glance I can see two issues here:

if valobj.GetValue() != 0:

If your valobj is not a pointer, but a plain record, its value is going to be None (only scalars have values).
You might want to rephrase your check along the lines of:

if not valobj.TypeIsPointerType() or valobj.GetValueAsUnsigned(0) != 0:

Also, I am pretty sure you do not need the GetPointeeData() calls here:

content = valobj.GetChildMemberWithName('content')
length = content.GetPointeeData(0,1).GetChildMemberWithName('length')
string = content.GetPointeeData(0,1).GetChildMemberWithName('string')

You should be able to dereference the children of content by name even if content is a pointer.
GetPointeeData() is used when you have a pointer or array and you want to retrieve the bytes it points to or refers to.
In your case, you want to retrieve a child of a variable and keep what you get back as a value (an SBValue), so the SBValue-returning calls are what you want/need.

As an aside,

data_val = string.GetPointeeData(i, 1)
newchar = data_val.GetUnsignedInt16(e, 0) # utf-16

You should be able to get away with something along the lines of:

data_val = string.GetPointeeData(0, length) once, outside the while loop (with length as a plain integer, e.g. obtained via GetValueAsUnsigned()), and then access the individual uint16 values as data_val.uint16[index]

This way you avoid generating a bunch of one-off SBData objects and keep the entire buffer around (of course, if your buffers are huge, the former solution might still be preferable).
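Putting these suggestions together, a revised provider might look like the sketch below. It assumes Python 3, a module file named UniString.py (as in the original registration command), and that GetPointeeData() on the string array reads past its declared size of 1, as suggested above; the "<null>" and "<unreadable>" summaries are illustrative choices, not LLDB conventions. The decoding is split into a pure helper, decode_utf16_units (a hypothetical name), so it can be checked outside LLDB; it decodes the whole run at once, so UTF-16 surrogate pairs become single characters instead of being passed one by one to unichr().

```python
import struct

try:
    import lldb  # only available inside LLDB's embedded Python
except ImportError:
    lldb = None  # lets the pure decoding helper be imported and tested outside LLDB


def decode_utf16_units(units, length):
    """Turn UTF-16 code units into text, stopping at length or the first NUL."""
    kept = []
    for unit in units[:length]:
        if unit == 0:
            break
        kept.append(unit)
    raw = struct.pack("<%dH" % len(kept), *kept)
    # Decoding the whole run (instead of one chr() per unit) joins surrogate pairs.
    return raw.decode("utf-16-le", errors="replace")


def UniString_SummaryProvider(valobj, internal_dict):
    content = valobj.GetChildMemberWithName("content")
    # 'content' is a pointer: test its numeric address, not GetValue()'s text.
    if not content.IsValid() or content.GetValueAsUnsigned(0) == 0:
        return "<null>"
    # Children of a pointer-to-struct can be fetched by name directly.
    length = content.GetChildMemberWithName("length").GetValueAsUnsigned(0)
    string = content.GetChildMemberWithName("string")
    if length == 0:
        return '""'
    # One SBData for the whole buffer instead of one per character.
    data = string.GetPointeeData(0, length)
    error = lldb.SBError()
    units = []
    for i in range(length):
        units.append(data.GetUnsignedInt16(error, 2 * i))  # byte offset of unit i
        if error.Fail():
            return "<unreadable>"
    return '"%s"' % decode_utf16_units(units, length)


def __lldb_init_module(debugger, internal_dict):
    debugger.HandleCommand(
        "type summary add -F UniString.UniString_SummaryProvider GS::UniString")
```

Once the file is loaded with `command script import /path/to/UniString.py`, `frame variable` should show the decoded text as the summary of GS::UniString values; if it does not, the find-the-invalid-step approach described earlier is the quickest way to locate the failing lookup.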

Let me know if I can help you more with the script!

Enrico Granata
:email: egranata@.com
✆ (408) 972-7683