Help with C++ API

Hello,

I have got some questions regarding the LLDB C++ API which I haven't
found answers to in the documentation or by investigating the source
code so far. If there is a better place than this mailing list to
direct questions of this kind to, please let me know!

All the questions below are related to using the API to debug a
program compiled from C++ source.

1. When trying to read data from a dynamically allocated array of
"doubles" I fail to get all the data correctly by using
SBValue::GetPointeeData() and then SBData::ReadRawData(), despite the
fact that no call reports any error, and all the involved SBxxx
objects report as valid. However, if I use SBValue::GetChildAtIndex() for
each array member, each member is read correctly. What's the difference?
I guess this has something to do with data caching (or lack thereof)?

2. When reading data from a dynamically allocated data structure by
using SBValue::GetChildMemberWithName() etc. on an SBValue representing
a pointer to the (complex) object, and obtained in a *different*
breakpoint callback than the current one, again reading does not fail
in any way, but the read data of any field will be equivalent to
whatever was last read in the *originating* breakpoint callback.
Probably again, only data cached when reading with the SBValue in
scope is returned. I understand that an SBValue is meant to represent a
value in a particular execution context, and as such does not need to
support reading data outside of that context. But is there any other
API that would facilitate reading fields of a complex-type object on
the heap at any point in program execution, regardless of the current
execution context? It seems logical that this should be possible.

3. Despite reading the source code, I have not managed to grasp the
exact meaning of the words "dynamic" and "synthetic", whenever they
occur in the API. For example, what precisely do the
lldb::DynamicValueType flags mean? Is this related to dynamically
typed languages, or does it have something to do with caching of data
exchanged between debugger and debuggee? What exactly does it imply if
I set an SBValue's "preference" to use dynamic or synthetic value?

Thank you for any answers, or directions that would help me find the
answers myself.

Best regards,
Jakob Leben

  1. When trying to read data from a dynamically allocated array of
    “doubles” I fail to get all the data correctly by using
    SBValue::GetPointeeData() and then SBData::ReadRawData(), despite the
    fact that no call reports any error, and all the involved SBxxx
    objects report as valid. However, if I use SBValue::GetChildAtIndex for each
    array member, each member is read correctly. What’s the difference? I
    guess this has something to do with data caching (or lack thereof)?

Do you have a test case for your issue? I would like to take a look.
The fact that GetChildAtIndex() works is not unexpected. I would like to figure out why GetPointeeData() & ReadRawData() are failing.

  3. Despite reading the source code, I have not managed to grasp the
    exact meaning of the words “dynamic” and “synthetic”, whenever they
    occur in the API. For example, what precisely do the
    lldb::DynamicValueType flags mean? Is this related to dynamically
    typed languages, or does it have something to do with caching of data
    exchanged between debugger and debuggee? What exactly does it imply if
    I set an SBValue’s “preference” to use dynamic or synthetic value?

Dynamic means “is LLDB going to be allowed to do something to figure out the runtime type of this object”?
Imagine the following C++:

class Foo { public: virtual int DoSomething(); … };
class Bar : public Foo { public: virtual int DoSomething(); … };

int TakeAFoo (Foo* f) { return f->DoSomething(); }

TakeAFoo(new Bar());

Now if you stop in TakeAFoo(), the “formal” (or static) type of f is Foo*, but its actual (or dynamic) type is Bar*. Can LLDB figure it out for you, or not?
There are three options, which are the values of lldb::DynamicValueType: No (eNoDynamicValues), Without Running Code (eDynamicDontRunTarget), By Running Code When Needed (eDynamicCanRunTarget).
There needs to be a ternary value because at some point in the future we might need to run some code in the inferior process in order to figure out this information. Currently, either of the two dynamic settings works just fine.

Synthetic means “is LLDB going to be allowed to apply synthetic children to this object”?
Imagine an STL vector: usually it is implemented in terms of three pointers: begin, end, end of storage.
This view is not very useful when you see it in the debugger, e.g.

std::vector<int> V;
V.push_back(1); V.push_back(2);

(lldb) frame variable V
V = {
  begin = 0x1234
  end = 0x123C
  end of storage = 0x1248
}

You would much rather see
V = {
  [0] = 1
  [1] = 2
}
(unless you are the implementor of the C++ standard library, that is).

Synthetic children are the LLDB name for the feature that allows the debugger to show logically useful child information. The way it is implemented is to have a new ValueObject on top of the “real” one that vends an aptly crafted set of children.
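To make the three-pointer picture concrete, here is a minimal sketch in plain Python (it does not call LLDB) of how a provider could map a begin/end pointer pair to the [0], [1], … children. The function name `vector_children` and the idea of passing the element size explicitly are illustrative assumptions, not LLDB's actual implementation.

```python
def vector_children(begin, end, element_size):
    """Map a (begin, end) pointer pair to a list of (child name, address) pairs.

    Hypothetical helper: begin/end are the raw pointer values read from the
    vector object, element_size is the byte size of one element.
    """
    # Reject layouts that cannot describe a valid vector, which is common
    # when the variable has not been initialized yet.
    if element_size <= 0 or end < begin or (end - begin) % element_size != 0:
        return []
    count = (end - begin) // element_size
    return [("[%d]" % i, begin + i * element_size) for i in range(count)]


# Two 4-byte elements stored back to back starting at a made-up address.
children = vector_children(0x1000, 0x1008, 4)
```

Each returned address is where a synthetic child would read its element from; the end-of-storage pointer is not needed for this view.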

When you set dynamic and synthetic preferences on an SBValue you are essentially telling LLDB whether you want your SBValue to be backed by a static vs. a dynamic value, and whether you want to see real children or crafted ones, when available.

Hope this helps.

lldb-dev mailing list
lldb-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev

Enrico Granata
:envelope_with_arrow: egranata@.com
:phone: 27683

Enrico answered questions 1 & 3.

For question 2, that's really a bit of missing API. There's an SBValue::CreateValueFromAddress, which is called on an existing SBValue and takes a name, an address, and an SBType to make a new value object that is just tied to that address and type. That was an SBValue method because that's what was convenient to implement the Synthetic types. But since all that it uses from the original SBValue is the process, it really should just be a method on SBProcess. That would be perfect for you. It would be pretty easy to add that API, if you want to give it a try.

Otherwise, you can cons up a new expression from the type and address of the original SBValue, and use SBTarget::EvaluateExpression to make a free-standing SBValue from that.
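A rough sketch of the "cons up an expression" idea, in plain Python so it can be tried without a debugger session: it only builds the expression text from a type name and an address. The helper name `deref_expression` is hypothetical, and it assumes the type name is spelled the way the expression parser accepts it.

```python
def deref_expression(pointee_type_name, address):
    """Build a C++ expression that reads an object of the given type at address.

    Hypothetical helper: pointee_type_name is the name of the pointed-to type
    (for example taken from the original value's type), address is the pointer
    value saved from the originating breakpoint callback.
    """
    return "*(({0} *){1:#x})".format(pointee_type_name, address)


expression = deref_expression("Foo", 0x1234)
```

The resulting string is what would be handed to SBTarget::EvaluateExpression to get a free-standing value for the current program state.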

Jim

Hi Enrico,

Do you have a test case for your issue? I would like to take a look.
The fact that GetChildAtIndex() works is not unexpected. I would like to
figure out why GetPointeeData() & ReadRawData() are failing.

Erm, sorry, that was my mistake! While cooking up a simple test case,
I realized I was passing size to ReadRawData as element count instead
of count * sizeof(type).
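For anyone hitting the same thing, here is a small stand-alone illustration of the mistake in plain Python. It does not use LLDB: a packed byte string stands in for the debuggee's memory, and 8-byte little-endian doubles are assumed. The names `decode_doubles` and `DOUBLE_SIZE` are made up for the example.

```python
import struct

DOUBLE_SIZE = 8  # assumption: IEEE-754 double, 8 bytes


def decode_doubles(buffer):
    """Decode as many complete little-endian doubles as the buffer holds."""
    count = len(buffer) // DOUBLE_SIZE
    return list(struct.unpack("<%dd" % count, buffer[:count * DOUBLE_SIZE]))


values = [1.5, 2.5, 3.5, 4.5]
memory = struct.pack("<4d", *values)  # stands in for the heap array

# Wrong: treating the element count as the byte size reads only 4 bytes,
# which is less than a single double.
wrong = decode_doubles(memory[:len(values)])

# Right: the byte size is count * sizeof(double).
right = decode_doubles(memory[:len(values) * DOUBLE_SIZE])
```

Neither read reports an error; the short one simply yields less data, which matches the symptom of every call succeeding while the array comes back incomplete.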

Thanks for other answers, that makes things much clearer!

Further observations that I would like to clarify:

1. Reading elements of std::list works just great using
GetNumChildren() and GetChildAtIndex() on the container, when the
container is set to use synthetic value. However the
"can_create_synthetic" argument of GetChildAtIndex() does not seem to
have any effect, contrary to what documentation suggests. Am I
misreading something or is documentation wrong? Perhaps the argument
only controls the preference of children with regard to grandchildren,
and not how children are obtained.

2. For std::vector, GetNumChildren() returns -1 (with preference for
synthetic)! However, reading children using GetChildAtIndex() etc.
still succeeds.

3. For both std::vector and std::list, I would then expect
container.GetData().GetSignedXxx(error, index) to work as well. It
fails in either case, and regardless of preference of container with
regard to synthetic values.

4. For std::string, GetNumChildren() always returns 1 regardless of
preference for synthetic values, and I haven't found any way to read
elements successfully.

Hi Jim,

For question 2, that's really a bit of missing API. There's an SBValue::CreateValueFromAddress, which is called on an existing SBValue and takes a name, an address, and an SBType to make a new value object that is just tied to that address and type. That was an SBValue method because that's what was convenient to implement the Synthetic types. But since all that it uses from the original SBValue is the process, it really should just be a method on SBProcess. That would be perfect for you. It would be pretty easy to add that API, if you want to give it a try.

That's exactly what I was thinking when looking at
SBValue::CreateValueFromAddress! If you'd make this method independent
of a SBValue instance - that would be fabulous!

Otherwise, you can cons up a new expression from the type and address of the original SBValue, and use SBTarget::EvaluateExpression to make a free-standing SBValue from that.

Good idea, thanks!

Cheers,
Jakob

Further observations that I would like to clarify:

  1. Reading elements of std::list works just great using
    GetNumChildren() and GetChildAtIndex() on the container, when the
    container is set to use synthetic value. However the
    “can_create_synthetic” argument of GetChildAtIndex() does not seem to
    have any effect, contrary to what documentation suggests. Am I
    misreading something or is documentation wrong? Perhaps the argument
    only controls the preference of children with regard to grandchildren,
    and not how children are obtained.

That is because historically LLDB called “synthetic” something entirely different from “synthetic children”.
Assume you have

Foo* aFoo = new Foo[3];

The previous notion of synthetic was “LLDB, can you make me aFoo[2] as if aFoo was a large enough array of Foo”?

That is what can_create_synthetic controls in GetChildAtIndex().
It is an unfortunate legacy naming that I am aware of and just have never had time to fix.

  2. For std::vector, GetNumChildren() returns -1 (with preference for
    synthetic)! However, reading children using GetChildAtIndex() etc.
    still succeeds.

Interesting. Test case?

  3. For both std::vector and std::list, I would then expect
    container.GetData().GetSignedXxx(error, index) to work as well. It
    fails in either case, and regardless of preference of container with
    regard to synthetic values.

That is probably because you are trying to access the synthetic children by reading into raw memory?
That won’t work. When you try getting data you are reading the raw object, not the synthetic data.
The synthetic data is only accessible by poking at the children.

  4. For std::string, GetNumChildren() always returns 1 regardless of
    preference for synthetic values, and I haven’t found any way to read
    elements successfully.

That makes sense. A std::string only has one child, which is a struct that then contains the pointer.
Also, std::string has no synthetic children.
It only has a summary.
To figure out which types have which formatters, the “type xxxx list” commands (where xxxx can be format, summary, synthetic or filter[1]) are your friends.

[1] filters are a lightweight type of synthetic children which you can use to only show a subset of the real member variables

Enrico Granata

See attached files for debugger and debuggee programs.

The output of debugger is always:
count = -1
val = 11
val = 12
val = 13

This is when both debugger and debuggee are built with either clang or gcc.

debugee.cpp (80 Bytes)

debugger.cpp (2.44 KB)

Well, we have established that "elements" of a std::list are only
called so in a "synthetic" sense. I think characters are elements of a
std::string in exactly the same sense, so I think the same API should
support accessing both.

Moreover, the "summary" has the notion of pretty-printing, and does
not seem logically suitable for actual-data access.

What do you think of this?

If you feel the need for a synthetic children provider for std::string that vends the actual characters as items, then by all means feel free to make one.

I think such a visualization would be highly inconvenient for most users most of the time. Just as you usually want to see a char* as a logical “string” rather than as an array of chars, the same is true of std::string.

Also, if you know the layout of the string class, you can directly access the data buffer and read the individual bytes out of memory. That is another argument against going down the synthetic children route: the added value compared to the actual type layout is quite low.

This is why LLDB vends a summary instead of synthetic children.

Nothing says that you cannot vend both of course (a vector vends a summary, the count, and then the individual items as children), so you could keep the builtin summary and then plug your own synthetic children provider for the purpose of fetching the individual bytes if you need this level of access.
If you need support to figure out the moving parts in this process, feel free to ping back :-)

Enrico Granata

I see your point.

I am approaching the issue from another perspective though: I am
building an application with the concept of "programming-model-aware
debugging", specifically data-flow programming model. I found the LLDB
C++ API valuable, because it allows me to easily build an application
that attaches to another process which performs processing of a
data-flow graph (think multimedia processing) and displays the
processing behavior in a graphical way. So I consider the LLDB C++ API
as a convenient utility and abstraction on top of
operating-system-provided debugging facilities, but for a purpose other
than implementing a classical command-line debugger. Instead, it is
used as a utility to examine another program's state and data in a
convenient way.

Therefore, in my use case I need to access the *actual* data to
display it to the end-user in any arbitrary way that LLDB API should
never need to assume in itself. Providing a summary/value of a data
structure's contents as a string (returned by GetSummary() or
GetValue()) is an example of such an assumption.

This is what data formatters are for.
There is an underlying ground truth which is what the DWARF type information vends.
On top of that, the data formatters enable you to vend a different reality of your own making. Whether that reality is that a vector is a container of items, or a string is a container of bytes, that is up to you.
The debugger vends two things in this area:
a) the data formatters subsystem (SBType*.h at the API level and the DataFormatters/ folder in the internals)
b) data formatters for types of interest

The second item is obviously tailored to common expectations and driven by what a user debugging an app expects to see.
These formatters are pretty much constrained to live within this space of convenience and usability, or people would complain fairly loudly about it.

This is where the first item comes into the picture. We cannot (or do not want to) vend every possible data view through the builtin formatters, but we vend a formatters model through which you can vend whatever formatting suits your needs.
Yes, unfortunately that means that you need to know a little more about your types, but somebody needs to “bake the knowledge” in LLDB [1], whether it’s me writing a builtin formatter, or you writing your own.

[1] DWARF is still the source of ground truth, anything else you see is custom knowledge that someone implemented on top of the ground truth

What you are trying to access is really not “raw data”. Raw data for a string looks like this:

(std::__1::string) X = {
  __r_ = {
    std::__1::__libcpp_compressed_pair_imp<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__rep, std::__1::allocator<char> > = {
      __first_ = {
         = {
          __l = {
            __cap_ = 139639644119302
            __size_ = 4294967296
            __data_ = 0x00007fff5fbffb70
          }
          __s = {
             = {
              __size_ = '\x06'
              __lx = '\x06'
            }
            __data_ = {
              [0] = 'a'
              [1] = 'b'
              [2] = 'c'
              [3] = '\0'
              [4] = '\x7f'
              [5] = '\0'
              [6] = '\0'
              [7] = '\0'
              [8] = '\0'
              [9] = '\0'
              [10] = '\0'
              [11] = '\x01'
              [12] = '\0'
              [13] = '\0'
              [14] = '\0'
              [15] = 'p'
              [16] = '?'
              [17] = '?'
              [18] = '_'
              [19] = '?'
              [20] = '\x7f'
              [21] = '\0'
              [22] = '\0'
            }
          }
          __r = {
            __words = {
              [0] = 139639644119302
              [1] = 4294967296
              [2] = 140734799805296
            }
          }
        }
      }
    }
  }
}

Your desired view of the world is:

(std::__1::string) X = {
  [0] = 'a'
  [1] = 'b'
  [2] = 'c'
  [3] = '\0'
}

This is far from raw data. And to get anything other than raw data, you need a formatter.

Enrico Granata

We cannot (or do not want to) vend every possible data view through the
builtin formatters, but we vend a formatters model through which you can
vend whatever formatting suits your needs
Yes, unfortunately that means that you need to know a little more about
your types, but somebody needs to “bake the knowledge” in LLDB [1], whether
it’s me writing a builtin formatter, or you writing your own

What you are trying to access is really not “raw data”.

[...]

You are right, I should be corrected that what I want to access is not
really raw data, but data formatted in some sense.

However, my point was that we could distinguish (at least) two kinds of
formatters:
a) pretty-printing formatters: they format data for display as strings for
humans to read - the user is the human eye.
b) programming-access formatters: they format data for convenient use in
code (e.g. C++) - the user is the code.

You could see a C++ programming interface as a type b) formatter: the C++
STL specification practically defines how data structures should be viewed,
interacted with, by user-code, regardless of how they are implemented (in
memory and implementation-code). So for example the std::string API is
already a formatter in itself, but the issue in debugging another process
is of course that we can't simply use the address of a std::string in
target program's memory as a std::string pointer in the host program and
use it just as any other std::string.

Now, obviously LLDB offers a generic way to implement formatters of both
types. Naturally, it offers access to raw data, so one can do anything they
please with it. But then again, you can access raw data even without the
LLDB C++ API! So from the point of view of a C++ API user, what's valuable
in LLDB library is not the fact that they can build formatters of their own
from raw data, but rather the already built-in formatters.

Secondly, from the point of view of a LLDB C++ API user, if any built-in
formatters are useful, those would be formatters of type b). Arguments:
1. Primarily, the data accessed through a C++ API should be formatted for
use by client C++ code, and not for use by the human eye.
2. Formatting of type a) can be built on-top-of formatting of type b), but
not the other way around.
3. Consequently, formatting of type a) is most probably implicit within the
implementation of formatting of type b); type a) will arguably often
correspond to a layer within the implementation of type b). So why not
expose this layer through the API?

But since formatters are specific to any C++ library and machine
architecture (and are in this sense a Simple-Matter-of-Programming [1]) I
can accept the answer that this is just not implemented, and I can go and
do it myself if I want. But instead, we are discussing here the semantics
of the generic LLDB C++ API, what interpretation of it makes sense and what
doesn't, and what functionality should be offered behind the API.

So in this regard I argue that it makes sense for characters within a
std::string to be offered as "children" in the context of the SBValue API.
To repeat, I think this sense is exactly the same as the sense of
"children" with regard to std::vector and std::list. This argument can
further be supported by the fact that the STL specification offers the same
mechanisms to iterate through elements of a std::string, std::vector or
std::list. It can further be argued that a SBValue should offer as children
*anything* that is offered by *any* STL container as its elements, and
furthermore that this becomes a de-facto simple and elegant definition of
SBValue semantics with regard to children and with regard to STL containers.

[1] http://en.wikipedia.org/wiki/Simple_matter_of_programming

Erm, typo, a) and b) should be swapped in the above sentences.

After all, it would actually make a lot of sense to me to have a separate
API (C++ class) for "synthetic" (better named "formatted") observation of
complex data structures, as opposed to having a single boolean flag in
SBValue control such a drastic distinction between SBValue API semantics
with regard to the concept of "children".

The nice thing about doing it the way the SBValues do it now is that you can implement the policy for how variables are going to be displayed at the time you create the values, and then you can just hand the values out to whatever viewers you have, and if they use the simple GetChildAtIndex APIs they will obey that policy automatically. I don't think in normal debugger usage you would want to display BOTH the synthetic view of a type AND the raw view, so the current implementation nicely encapsulates the most common usages.

You can always dial up the specific view options you want later on (for instance by calling the GetChildAtIndex that takes use_dynamic and can_create_synthetic.) You can also fetch the synthetic or dynamic SBValue by hand if you want to ensure you are working with the synthetic children. So you can choose to present data any way you want. We just make it easy to move the choice to value creation time.

Note that the "synthetic" children have other uses besides just emulating STL containers. For instance, suppose you have a complex data structure where purpose A uses only the first 5 fields and purpose B uses fields 1, 5, 7 and 9. You would make two simple synthetic child providers, one returning the first 5 fields and the other returning fields 1, 5, 7 and 9, and then you could turn these synthetic child providers on and off according to the problem you are currently debugging. In lldb, command-line access to this feature is through a type "filter", but under the covers it is just another synthetic child provider.

So it would be narrowing to view the "synthetic children" as only a framework for viewing container types.

Jim

Well, I said "definition of SBValue semantics with regard to children
and with regard to STL containers", in other words - only as far as
STL containers are concerned. I'm just promoting consistency across
STL containers.

Perhaps what I've missed is that there can be several different
synthetic child providers for any parent data type. Even in that case,
it seems there is one such provider that's installed on an SBValue
provided via C++ API by default. So then my proposal applies to these
default providers.

Besides, it's not that SBValue for std::string provides synthetic
children in a different way than I would like. The issue is that it
doesn't provide synthetic children at all! And so far I simply haven't
heard any good reason why it shouldn't by default provide characters
as children.

My writing in this thread has actually gone well beyond the topic of
getting help, so I don't actually expect any reaction to my proposals
here. I'm happy to have provided my suggestions based on using the
API...

So it would be narrowing to view the “synthetic children” as only a framework for viewing container types.

Perhaps what I’ve missed is that there can be several different
synthetic child providers for any parent data type. Even in that case,
it seems there is one such provider that’s installed on an SBValue
provided via C++ API by default. So then my proposal applies to these
default providers.

There can only be one synthetic provider per type that is active.
If you end up installing multiple, through regexp or whatnot, the order of categories determines which one wins.
Technically, a category can only ever contain one provider per type. With regular expressions it is fairly easy to violate this requirement. What happens then is undefined (i.e. it depends on the order in which they were added or in which the iterators we use provide them to us for inspection, …). Long story short: don’t rely on any such tricks.

Besides, it’s not that SBValue for std::string provides synthetic
children in a different way than I would like. The issue is that it
doesn’t provide synthetic children at all! And so far I simply haven’t
heard any good reason why it shouldn’t by default provide characters
as children.

It is a largely uninteresting view for most people. The majority of people using LLDB have never expressed the desire to twist their strings open and see an array of characters. Actually, there has been an opposite drive: even a char[] should be displayed as a string.
This is really the only reason why that was not implemented.

Enrico, I’d be grateful if you could help me with this!

Assuming you work with libstdc++, your string data is located at ._M_dataplus._M_p

You are going to have to implement a Python class, as described at: http://lldb.llvm.org/varformats.html

class SyntheticChildrenProvider:
    def __init__(self, valobj, internal_dict):
        # this call should initialize the Python object using valobj as the variable to provide synthetic children for
        pass
    def num_children(self):
        # this call should return the number of children that you want your object to have
        pass
    def get_child_index(self, name):
        # this call should return the index of the synthetic child whose name is given as argument
        pass
    def get_child_at_index(self, index):
        # this call should return a new LLDB SBValue object representing the child at the index given as argument
        pass
    def update(self):
        # this call should be used to update the internal state of this Python object whenever the state of the variables in LLDB changes.[1]
        pass
    def has_children(self):
        # this call should return True if this object might have children, and False if this object can be guaranteed not to have children.[2]
        pass

What you are probably going to do is:

  • save the valobj as an ivar of the children provider in __init__
  • in update, you should grab the value of _M_p (a pointer) and save it somewhere. That is going to be your real data source. Return None from update. Really. It’s much safer :-)

    def __init__(self, valobj, internal_dict):
        self.value = valobj
    def update(self):
        # _M_p points at the character buffer; cache its value as the data source
        self.ptr_value = self.value.GetChildMemberWithName("_M_dataplus").GetChildMemberWithName("_M_p").GetValueAsUnsigned(0)
        return None

To actually compute your number of children (len of string), you can do one of two things:

  • call strlen() as an expression; it is unsafe
  • read chunks of data until a \0 is encountered, or you realize you are reading way too much and you should bail; this latter case can be fairly common with uninitialized data, and anything that deals with potentially bogus data needs to be hardened against such events, or suffer great pain down the road
    What you technically want is a strlen-with-bounds that will fail if the length is waaaaay beyond reasonable. We can work out the details here.
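One possible shape for such a strlen-with-bounds, sketched in plain Python. The memory access is abstracted behind a `read_memory(address, size)` callback, which is a hypothetical stand-in (inside LLDB it could wrap SBProcess::ReadMemory); the 4096-byte limit and 64-byte chunk size are arbitrary assumptions.

```python
def bounded_strlen(read_memory, address, max_length=4096, chunk_size=64):
    """Return the number of bytes before the first NUL at address.

    Returns None if no NUL is found within max_length bytes or if a read
    fails (read_memory returning None or an empty bytes object).
    """
    length = 0
    while length < max_length:
        want = min(chunk_size, max_length - length)
        chunk = read_memory(address + length, want)
        if not chunk:
            return None  # unreadable memory: treat the string as bogus
        nul = chunk.find(b"\0")
        if nul != -1:
            return length + nul
        length += len(chunk)
    return None  # way too long: most likely uninitialized data


# Fake debuggee memory for trying the function out without a process.
FAKE_BASE = 0x1000
FAKE_MEMORY = b"abc\0leftover bytes"


def fake_read(address, size):
    offset = address - FAKE_BASE
    return FAKE_MEMORY[offset:offset + size]


length = bounded_strlen(fake_read, FAKE_BASE)
```

Returning None rather than a clamped length lets num_children decide what to report for a string that looks corrupt.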

As for get_child_index, I am assuming you will want to call your children [0], [1], …
It is a simple matter of rejecting any name other than those formed as [number], and for those that are well formed, returning the number token.
Internally, LLDB likes to deal with children by index. But for us humans, it is highly convenient to name children, since we are better at memorizing names than indexes. This call allows LLDB to answer the question “when I am asked for child foo, where do I find it?”. Natural ordering of the DWARF debug information would answer this in the “raw data” case.
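The name check could look roughly like the following. The function name `child_index_from_name` is made up, and returning -1 for a rejected name is an assumption about what the caller expects, so adjust it to whatever your provider needs.

```python
import re

# Accept only names of the exact form "[<decimal digits>]".
_INDEX_NAME = re.compile(r"\[(\d+)\]")


def child_index_from_name(name):
    """Return the index encoded in a child name like "[3]", or -1 if malformed."""
    match = _INDEX_NAME.fullmatch(name)
    if match is None:
        return -1
    return int(match.group(1))
```

In the provider, get_child_index would simply return `child_index_from_name(name)`.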

Now, the real deal. In get_child_at_index you are going to extract the individual byte.
You can fairly simply go to the process, read a byte at ptr_value+index, and use an expression to make that into a char.
There are more interesting/efficient ways to get the same result, which would involve retrieving the pointee type for _M_p and using that to build an SBValue. Feel free to ask if you want to delve deeper.
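A sketch of the pointee-type route, assuming the libstdc++ layout mentioned above (`_M_dataplus._M_p`). The function name `make_char_child` is hypothetical; the SBValue and SBType methods it calls (GetChildMemberWithName, GetType, GetPointeeType, GetByteSize, GetValueAsUnsigned, CreateValueFromAddress) are used as I understand them from the SB API, so treat this as untested against a live debugger.

```python
def make_char_child(valobj, index):
    """Build a child value for the character at the given index.

    valobj is the SBValue for the std::string; the child is created at the
    address of the index-th element of the buffer that _M_p points to.
    """
    m_p = valobj.GetChildMemberWithName("_M_dataplus").GetChildMemberWithName("_M_p")
    char_type = m_p.GetType().GetPointeeType()   # element type, e.g. char
    base = m_p.GetValueAsUnsigned(0)             # start of the character buffer
    address = base + index * char_type.GetByteSize()
    return valobj.CreateValueFromAddress("[%d]" % index, address, char_type)
```

In a provider, get_child_at_index would first check that the index is within the bounded length and then return `make_char_child(self.value, index)`.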

has_children is fairly easy: return True and be done with it.

Hope these pointers help you get started!

Enrico Granata

As a debugger user I definitely want things (even a char[]) displayed as strings most of the time; however, I do occasionally wish to open the string up and see the raw characters.

For example, yesterday I was troubleshooting a bug with a spurious newline character... being able to see the raw character data in an unambiguous way would have been useful in that case and other cases involving characters that don't display in an "obvious" way (other non-printing characters, combining characters, etc.)

On the other hand, if it's going to introduce a 10% performance penalty in the display as string case, I'll do without.

So while it is rare enough that it hasn't driven me to try and fix it, I would have occasional use for the feature.

(I did implement a summary provider for std::wstring before lldb provided one out of the box because that pain was huge for me.)

Thanks,

Joseph

PS. I may not qualify as most people either :-)