Frame documentation

Hello,

Continuing with my GUI program, where can I find information or
documentation about the frame format and what everything represents?

The code lacks documentation and makes a lot of
assumptions about what everything means. I wonder where these
assumptions come from.

For instance, what does SBLineEntry mean? The documentation lacks detail.

If I do:
line_entry = frame.GetLineEntry();

then:
uint32_t line = line_entry.GetLine();
SBFileSpec source = line_entry.GetFileSpec();

I suppose line_entry represents the line entry of the frame in the
source code, and source is the original source file.

I could follow all this code with a script and guess most of the
meaning, but it is a lot of work to manually traverse all possible
combinations just to learn what everything means.

Is there any better way?

I want to travel around functions instead of code. I mean I want to
display just a function and the line offset of the current program
counter in the source code, with the start of the function as reference;
pc-offset probably works for this.

Also I need to display all the variables of the frame, and to know the
type of every variable, the addresses that pointers point to, and a fast
way to access those pointers (I mean getting the real address, not the
ASCII representation).

I have the feeling that there is something obvious I don't know that
makes my work unnecessarily slow.

Bye

Hi Jose,

There is API documentation that I think would help answer some of these questions. From within lldb you can do

(lldb) script help (lldb.SBLineEntry)

There's a copy of the documentation on the LLDB Python API page, although the one in lldb is the most up to date.

The methods on the SBFrame object should do everything you want.

Process 98511 stopped
* thread #1: tid = 0x1b8a20, 0x0000000100000eba a.out`main(argc=1, argv=0x00007fff5fbffb40) + 122 at a.c:11, queue = 'com.apple.main-thread', stop reason = step over
    #0: 0x0000000100000eba a.out`main(argc=1, argv=0x00007fff5fbffb40) + 122 at a.c:11
   8 for (int i = 0; i < arrsize; i++)
   9 for (int j = 0; j < arrsize; j++)
   10 {
-> 11 buf[i*j] = accum++;
   12 }
   13
   14 return buf[(arrsize * arrsize) - 2] + printf ("%d\n", buf[(arrsize * arrsize) - 3]);
(lldb) fr v
(int) argc = 1
(char **) argv = 0x00007fff5fbffb40
(const int) arrsize = 10
(int) accum = 1
(int) i = 0
(int) j = 0
(lldb) script
>>> print(lldb.frame.GetLineEntry().IsValid())
True
>>> print(lldb.frame.GetLineEntry().GetFileSpec().GetFilename())
a.c
>>> print(lldb.frame.GetLineEntry().GetLine())
11
>>> print(lldb.frame.GetFunction().IsValid())
True
>>> print(lldb.frame.GetFunction().GetName())
main
>>> print(lldb.frame.GetFunctionName())
main
>>> print(lldb.frame.GetPCAddress().GetLoadAddress(lldb.target) - lldb.frame.GetFunction().GetStartAddress().GetLoadAddress(lldb.target))
122
>>> vars = lldb.frame.GetVariables(True, True, False, True)
>>> print(vars.IsValid())
True
>>> print(vars.GetSize())
8
>>> print(vars.GetValueAtIndex(0).GetName())
argc

Note that I'm using "lldb.frame" in these examples. This is only defined in interactive scripting mode, along with lldb.thread, lldb.process, lldb.target. For Python commands that you're writing, you'll be passed in a 'debugger' object and you can do things like

  if debugger and debugger.GetSelectedTarget() and debugger.GetSelectedTarget().GetProcess():
    process = debugger.GetSelectedTarget().GetProcess()
    if process and process.GetSelectedThread().IsValid():
      thread = process.GetSelectedThread()
      frame = thread.GetFrameAtIndex(0)

There's a newer alternate format for python commands where it gets passed in a symbol context (which would give you the frame, thread, process, target all in one) but I don't remember the format for that off-hand.

J

Thank you Jason.

I looked at the online documentation of lldb, which looks like it is
generated with doxygen or something similar from comments inside the code.

I assumed (without thinking) that it was better than the Python help,
so I did not even look at the latter.

What is the difference between GetVariables and get_all_variables?

I am a little anxious to have something minimally working that I
could publish.

SBFrame::get_all_variables is a shortcut provided for Python scripts (but is not available when using the C++ SB API).

The SBFrame::GetVariables() call I used in my example allows you to specify whether you want argument variables, static variables (I don't know if this is function-static or file-static ...), or local variables. I used it in my example so it would be language-agnostic - valid for either C++ or Python scripting.

If you're writing your script in Python (instead of C++), get_all_variables is fine. There is also get_locals, get_arguments, and get_statics if you want to pick out some of those variables.

J

(I checked, the "statics" bool arg in SBFrame::GetVariables refers to static / global variables that are defined in this compilation unit -- the most likely behavior for that argument.)

Oh, I see, thank you again for the information, Jason.

From its name I thought "all" meant getting all the
variables in the stack, not just the ones in the frame, so the expected
behavior did not agree with reality.

I believe that with this info I could start having something that at
least works minimally. I already have all the user interface working
in cocoa, so we will see how much this thing takes...

Just being curious (and not loving the idea of using something without
understanding it): what does the last argument of frame.GetVariables do?

It says:
DynamicValueType use_dynamic

The C++ documentation says I can use these values:
eNoDynamicValues, eDynamicCanRunTarget,
eDynamicDontRunTarget

What does "dynamic value type" mean? Does it mean C++ auto, or something
like a template value?

How can someone help improve the lldb documentation, so the same
questions do not need to be answered again? I mean, how could I help
complete the documentation with the answers you have provided,
adding some visuals?

This area needs diagrams and pictures to make it easier to get the big
picture fast.

I’ve been running into similar issues as you regarding documentation. What I’ve been doing is asking here, as you’re doing, and then submitting patches to improve comments. That said, I generally prefer in-code comments over external documentation, because external documentation is almost guaranteed to get out of date sooner or later.

Same ‘issue’ here, although I found most things to be intuitive by checking the core classes used by the public API. The only thing that I think needs more attention is the event mechanism: how to use it, what not to do, etc.

I’d volunteer to submit doxygen patches for the public API if this becomes a coordinated effort.

Imagine this situation:

class Foo { public: virtual void something() … };
class Bar: public Foo { … };

Foo *aFoo = new Bar();

The declared type of “aFoo” is Foo*, but its actual type at runtime is Bar*.
This “actual type at runtime” notion is what LLDB calls the “dynamic type” (vs. the declared, aka “static”, type).

You can tell LLDB whether you want the static or the dynamic type when resolving a variable, and you can control whether LLDB is allowed to run code in the inferior process to perform this resolution.
As far as I know, there are no longer any cases in which LLDB actually needs to run code to perform this resolution (there used to be, historically), except that we cheat: we will run code even if you say “Don’t run” in order to fetch a table of classes from the ObjC runtime.
We are aware that this is bending the rules a little, of course.

Thanks,
- Enrico

Thank You Enrico, very clear explanation.

Hello,

Continuing with my GUI program, where can I find information or
documentation about the frame format and what everything represents?

The code lacks documentation and makes a lot of
assumptions about what everything means. I wonder where these
assumptions come from.

For instance, what does SBLineEntry mean? The documentation lacks detail.

If I do:
line_entry = frame.GetLineEntry();

then:
uint32_t line = line_entry.GetLine();
SBFileSpec source = line_entry.GetFileSpec();

I suppose line_entry represents the line entry of the frame in the
source code, and source is the original source file.

When you take an address, like the frame's PC value, you can look up its symbol context. A symbol context is represented by an SBSymbolContext object.

So you can get a frame:

SBFrame frame = thread.GetFrameAtIndex(0);

Now you can get its symbol context:

SBSymbolContext sc = frame.GetSymbolContext(what_to_get);

A symbol context is a context within the symbol and executable files.

The symbol will come from a module:

lldb::SBModule module = sc.GetModule ();

The following information will be valid if you have debug info (DWARF) and you have debug info for the address that you are stopped at:

lldb::SBCompileUnit cu = sc.GetCompileUnit ();
lldb::SBFunction func = sc.GetFunction ();
lldb::SBBlock block = sc.GetBlock ();
lldb::SBLineEntry line_entry = sc.GetLineEntry ();

The following info will be valid if you have a symbol table in your executable:

lldb::SBSymbol symbol = sc.GetSymbol ();

The symbol context holds all of the objects that describe the address you looked up; in this case they describe where you stopped (the frame PC). Each item may or may not be valid, and each can be checked with a <obj>.IsValid() call. For example, if you don't have debug info for libc.so, you might end up with an SBSymbol object for "malloc" and an SBModule for "libc.so", while cu, func, block and line_entry will be invalid.

When getting a symbol context from a frame you can specify exactly what you would like to get from the symbol context by filling in "what_to_get" with a mask from the following bits:

    typedef enum SymbolContextItem
    {
        eSymbolContextTarget = (1u << 0), ///< Set when \a target is requested from a query, or was located in query results
        eSymbolContextModule = (1u << 1), ///< Set when \a module is requested from a query, or was located in query results
        eSymbolContextCompUnit = (1u << 2), ///< Set when \a comp_unit is requested from a query, or was located in query results
        eSymbolContextFunction = (1u << 3), ///< Set when \a function is requested from a query, or was located in query results
        eSymbolContextBlock = (1u << 4), ///< Set when the deepest \a block is requested from a query, or was located in query results
        eSymbolContextLineEntry = (1u << 5), ///< Set when \a line_entry is requested from a query, or was located in query results
        eSymbolContextSymbol = (1u << 6), ///< Set when \a symbol is requested from a query, or was located in query results
        eSymbolContextEverything = ((eSymbolContextSymbol << 1) - 1u) ///< Indicates to try and lookup everything up during a query.
    } SymbolContextItem;

LLDB will only fetch what you ask it to fetch and it will lazily parse debug info in order to achieve the results.

Most people just get everything from a frame:

SBSymbolContext sc = frame.GetSymbolContext(lldb::eSymbolContextEverything);
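
As a rough illustration of how the mask bits in the enum combine, here is a small Python sketch. It uses plain integers that mirror the enum values quoted in this message instead of importing the lldb module (an assumption, so that it runs anywhere), and describe_mask is a hypothetical helper, not part of the SB API.

```python
# Plain-integer mirror of the SymbolContextItem bits quoted in this message.
# These are stand-ins so the sketch runs without the lldb module.
SYMBOL_CONTEXT_BITS = {
    "target": 1 << 0,
    "module": 1 << 1,
    "comp_unit": 1 << 2,
    "function": 1 << 3,
    "block": 1 << 4,
    "line_entry": 1 << 5,
    "symbol": 1 << 6,
}

# Same arithmetic as eSymbolContextEverything: every bit up to "symbol" set.
EVERYTHING = (SYMBOL_CONTEXT_BITS["symbol"] << 1) - 1


def describe_mask(mask):
    """Hypothetical helper: list the item names requested by a mask."""
    return [name for name, bit in SYMBOL_CONTEXT_BITS.items() if mask & bit]


# Ask only for the function and the line entry by OR-ing two bits together.
function_and_line = SYMBOL_CONTEXT_BITS["function"] | SYMBOL_CONTEXT_BITS["line_entry"]

print(describe_mask(function_and_line))
print(hex(EVERYTHING))
```

A narrower mask like function_and_line is what you would pass as "what_to_get" when you only need those two items.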

The compile unit represents the source file that the frame is in. The function represents the concrete function (like "main"). The block is the deepest lexical block that the frame's PC is stopped within. These lexical blocks all contain variables and the SBBlock might represent an inlined function. Inlined functions are just blocks that provide additional info:

const char *inlined_name = block.GetInlinedName ();

You can also find the call site file and line:

    lldb::SBFileSpec
    SBBlock::GetInlinedCallSiteFile () const;

    uint32_t
    SBBlock::GetInlinedCallSiteLine () const;

    uint32_t
    SBBlock::GetInlinedCallSiteColumn () const;

The block you are stopped in might be a lexical block within a function.

Let's say you are in a block that is inside an inlined version of std::vector<int>::push_back() within the function "main":

Something like:

/tmp/main.c:

1 int main()
2 {
3 std::vector<int> ints;
4 ints.push_back(12);
5 }

/usr/include/vector:

200 std::vector<T> {
201 void push_back(const T &t)
202 {
203 int x;
204 { <<< frame's block points to this lexical block
205 int y;
206 ... <<< PC is stopped here
207 }
208 }
209 }

Your "block" would have no name and would represent the lexical block at line 204 in /usr/include/vector.
The line table entry in "line_entry" would point to line 206 of /usr/include/vector.
The "func" would represent the "main" function.

So if you ask a frame what function name should represent it you would need to ask "block" to get its containing inlined block. So to find the actual name you want to show to the user, you can use code like:

SBBlock inline_block = block.GetContainingInlinedBlock();

const char *function_name = NULL;
if (inline_block.IsValid())
{
    // We have an inlined function
    function_name = inline_block.GetInlinedName();
}
else if (func.IsValid())
{
    // The frame is a concrete function
    function_name = func.GetName();
}
else if (symbol.IsValid())
{
    // No debug info, fall back to the symbol table entry
    function_name = symbol.GetName();
}

Of course this is a very common thing to do to a frame, so this is already done for you with a helper function:

const char *function_name = frame.GetFunctionName();

The SBFrame will do all the work for you.

Inlined frames are all represented by new SBFrame objects, so in the above example, if we get the next frame:

SBFrame frame2 = thread.GetFrameAtIndex(1);

frame2 will have:

function = "main"
block = block from main.c:2 and this block will have no containing inlined block
line_entry = main.c:4

I could follow all this code with a script and guess most of the
meaning, but it is a lot of work to manually traverse all possible
combinations just to learn what everything means.

Is there any better way?

Ask here, and we need to improve the documentation, that is for sure.

I want to travel around functions instead of code. I mean I want to
display just a function and the line offset of the current program
counter in the source code, with the start of the function as reference;
pc-offset probably works for this.

For this you will do the following for each frame:

lldb::DynamicValueType use_dynamic = lldb::eNoDynamicValues; // for example; any lldb::DynamicValueType value works
uint32_t num_frames = thread.GetNumFrames();
for (uint32_t i = 0; i < num_frames; ++i)
{
    SBFrame frame = thread.GetFrameAtIndex(i);
    const char *name = frame.GetFunctionName();
    SBLineEntry line_entry = frame.GetLineEntry();
    // Display info from name and line_entry
    SBValueList variables = frame.GetVariables (true,         // arguments
                                                true,         // locals
                                                true,         // static variables
                                                true,         // only get variables that are in scope
                                                use_dynamic); // one of the values from lldb::DynamicValueType

}

Also I need to display all the variables of the frame, and to know the
type of every variable, the addresses that pointers point to, and a fast
way to access those pointers (I mean getting the real address, not the
ASCII representation).

These are all represented by the values in the "variables" list which is of type SBValueList.

for (uint32_t v = 0; v < variables.GetSize(); ++v)
{
    SBValue var = variables.GetValueAtIndex(v);
    const char *type_name = var.GetDisplayTypeName();
    const char *name = var.GetName();
    const char *value = var.GetValue();
    const char *summary = var.GetSummary();
    const char *location = var.GetLocation();
    // You will put these into a tree view of course and allow the entry to be expanded if the value has children
    const bool might_have_children = var.MightHaveChildren();

}

Later, you can get all child values of "var" by calling:
    for (uint32_t c = 0; c < var.GetNumChildren(); ++c)
    {
       SBValue child = var.GetChildAtIndex(c);
       ....
    }

Hopefully this explains all you need to do. Let me know if anything needs clarification.

Greg Clayton

Wow!! Greg, this is simply an incredible answer. Very useful
information, thank you again.

I was scraping together little bits of the information you have posted
in your mail on my own, but it is way more clear now. It will take a while
for me to process all this information.

One of the things I realized about variables is that they are not a
hierarchy but a network. So in order to make a graphical tree view that
is independent of the back-end code, it seems I need to do little
tricks.

Imagine we have a structure parent. The parent references its children,
but each child also references its parent, so we have circular
references.

The same could happen with a doubly linked list, with each
node pointing to the next node but also to the previous one.

Now I want to represent the hierarchical structure of the variables in
a DataSource (the model of the data in a Model-View-Controller design)
that a view controller is going to represent in the GUI. The
view controller knows nothing about SBData or any other lldb code,
which makes porting to (and maintaining for) other GUIs like Qt much
easier later.

So to generate this data model we load the frame variable parent*
as an SBValue and dereference the pointers to each of its
children. Then we continue with each child and find the reference
to the parent again, so if we continue it will never end, and the
tree will be infinite.

So for every reference I need some way of looking at the other
references already in the in-progress tree, and stopping if it is already there.

But I suppose the people of lldb faced this problem before I did and
you have already solved it somehow. Did you?

Anyway, like with Frankenstein, the thing is finally taking shape and
will soon come alive. The possibilities lldb brings look out of this
world. Dreams like creating a video of the evolution of data
structures while the program runs become possible; seeing
your program from other perspectives, and using parts of your brain
that monotonous text atrophies, is within reach now, something you can
almost touch.

If I understand your concern, you’re asking how we deal with situations like this:

class Foo {
public:
Foo* next;
};
// … code …
Foo foo;
foo.next = &foo;

where there is a back-reference chain, and you would essentially end up navigating forever.

The solution that LLDB uses is actually quite simple. We have a concept of “pointer depth”, i.e. you can tell the debugger how many levels of pointers one should traverse. If, for example, you set it to 2, you would get

foo
foo.next
foo.next
unexpanded pointer
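
A minimal Python sketch of the pointer-depth idea, assuming a hypothetical Node class as a stand-in for a debugger value. This is not LLDB's implementation, only the shape of the technique: each pointer followed uses up one level of depth, and when the budget is gone the pointer is reported but not followed.

```python
class Node:
    """Hypothetical stand-in for a debugger value with one pointer member."""

    def __init__(self, name):
        self.name = name
        self.next = None  # points at another Node, or None


def expand(node, pointer_depth, path=None):
    """Return display lines, following at most pointer_depth pointers."""
    path = path or node.name
    lines = [path]
    if node.next is None:
        return lines
    if pointer_depth == 0:
        # Budget exhausted: show that a pointer exists, but do not follow it.
        lines.append(path + ".next (unexpanded pointer)")
        return lines
    lines.extend(expand(node.next, pointer_depth - 1, path + ".next"))
    return lines


# Self-referencing value, as in the Foo example above.
foo = Node("foo")
foo.next = foo

lines = expand(foo, 2)
print("\n".join(lines))
```

With a depth of 2 the walk terminates after two hops even though foo points at itself.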

In a graphical environment, you could only turn down pointers at explicit user request (imagine a turndown like Xcode’s Variables View has). Then, sure, the user can keep turning down, but it’s not an infinite self-hanging loop, it’s an action-reaction feedback with constantly identical-yet-deeper reactions.

Or, of course, you could get really fancy, and store pointers as you encounter them, and refuse to expand anything you expanded, maybe providing a visual “back up” arrow that tells you you’re in a reference loop.
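
The "store pointers as you encounter them" variant could look roughly like the sketch below. It assumes each value is described by a plain dict with hypothetical "name", "address" and "children" keys (stand-ins for whatever the debugger returns), and it marks a node as a back reference instead of expanding it a second time.

```python
def build_tree(node, seen=None):
    """Copy a possibly cyclic graph into a finite tree.

    Any address that was already expanded anywhere in the tree is emitted
    as a leaf flagged with back_reference=True instead of being expanded again.
    """
    if seen is None:
        seen = set()
    if node["address"] in seen:
        return {"name": node["name"], "back_reference": True, "children": []}
    seen.add(node["address"])
    return {
        "name": node["name"],
        "back_reference": False,
        "children": [build_tree(child, seen) for child in node["children"]],
    }


# A parent that references its child, and a child that references its parent.
parent = {"name": "parent", "address": 0x1000, "children": []}
child = {"name": "child", "address": 0x2000, "children": [parent]}
parent["children"].append(child)

tree = build_tree(parent)
print(tree)
```

Because the seen set is shared across the whole tree, a value reachable through two different paths is also expanded only once. That is stricter than pure cycle detection, and a GUI could use the flag to draw the “back up” arrow.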

Thanks,
- Enrico

Yes, exactly Enrico, this is what I suspected after navigating
manually forever (for a limited time) in Xcode.

But I need a finite data model. I could store a tree of SBData objects and
tell the ViewController how to traverse the nodes "just in time".
At first it looks like a good solution, but after adding all the type
complexity, handling all the combinations of 8-, 16-, 32- and 64-bit
data and all the integer (signed and unsigned), float... types, it is
going to be fairly complex code.

Mixing complex code with already complex lldb and GUI code is not a good idea.

With a finite data model, we just add a view controller for Qt and most
of the code and complexity is common among platforms and can be used
in different OSes, and people don't need to understand Cocoa (or the
given GUI) and lldb and Objective-C and C++ at the same time to
understand the code, as it is modularized.

One of the things I realized about variables is that they are not a
hierarchy but a network. So in order to make a graphical tree view that
is independent of the back-end code, it seems I need to do little
tricks.

Imagine we have a structure parent. The parent references its children,
but each child also references its parent, so we have circular
references.

That is fine. You don’t auto expand anything in your view (or you can, but you should stop at pointers or references). If you don’t auto expand anything, then your users will do so by clicking to expand the tree view item and they can expand as much as they want to.

“frame variable” will print the children of structs, unions and classes, but pointers and references don’t get expanded for this very reason.

The same could happen with a doubly linked list, with each
node pointing to the next node but also to the previous one.

Again, don’t expand pointers and references and you are ok.

Now I want to represent the hierarchical structure of the variables in
a DataSource (the model of the data in a Model-View-Controller design)
that a view controller is going to represent in the GUI. The
view controller knows nothing about SBData or any other lldb code,
which makes porting to (and maintaining for) other GUIs like Qt much
easier later.

So to generate this data model we load the frame variable parent*
as an SBValue and dereference the pointers to each of its
children. Then we continue with each child and find the reference
to the parent again, so if we continue it will never end, and the
tree will be infinite.

Just don’t auto expand ptrs and refs.

So for every reference I need some way of looking at the other
references already in the in-progress tree, and stopping if it is already there.

Most GUIs will just show the top level variables and not even expand structs, unions and classes unless the user clicks on a disclosure triangle. Try loading the lldb/examples/python/lldbtk.py file:

(lldb) file a.out
(lldb) b main
(lldb) run
(lldb) command script import /users/me/lldb/examples/python/lldbtk.py
(lldb) tk-variables

Note the pointers haven’t been expanded. You can probably loot the Python code in lldbtk.py and make it work with your GUI framework.

But I suppose the people of lldb faced this problem before I did and
you have already solved it somehow. Did you?

Yep, GUIs don’t expand anything unless the user clicks to expand it. Below I clicked on the triangle before “path” to expand it:

Anyway, like with Frankenstein, the thing is finally taking shape and
will soon come alive. The possibilities lldb brings look out of this
world. Dreams like creating a video of the evolution of data
structures while the program runs become possible; seeing
your program from other perspectives, and using parts of your brain
that monotonous text atrophies, is within reach now, something you can
almost touch.

Hopefully my comments shed some light on things above?

That is fine. You don't auto expand anything in your view (or you can, but you should stop at pointers or references). If you don't auto expand anything, then your users will do so by clicking to expand the tree view item and they can expand as much as they want to.

Oh, as the first user I want to expand things a lot. :)
But I am not auto expanding all the nodes for the user. I am creating
a hierarchical structure that is finished when the user clicks on it.
This way every time the debugger stops I can do:

1-Create Data Model(translated from lldb)
2-Display Data Model(Not auto expanded, but the user can navigate the
already created model WITHOUT access to lldb code, only with access to
an lldb free Data Model).

If I check the already stored nodes and do not repeat them, I will be fine too.

Most GUIs will just show the top level variables and not even expand structs, unions and classes unless the user clicks on a disclosure triangle. Try loading the lldb/examples/python/lldbtk.py file:

Sure. That is what most GUIs do, and that is the reason I am writing my
own: I want to do things that those GUIs don't let me do. :)
The tree view is just the middle step in a working (proof of concept)
prototype. In the future I plan on using something more similar to
this:

http://www.nicholaschristakis.net/images/research/images/network-images/1.jpg

A node graph with an aerial view of the variables of my program, that
could display graphically how they evolve over time.

(lldb) file a.out
(lldb) b main
(lldb) run
(lldb) command script import /users/me/lldb/examples/python/lldbtk.py
(lldb) tk-variables
Note the pointers haven't been expanded. You can probably loot the Python code in lldbtk.py and make it work with your GUI framework.

I will give it a look. thanks.

Yep, GUIs don't expand anything unless the user clicks to expand it. Below I clicked on the triangle before "path" to expand it:
Hopefully my comments shed some light on things above?

All the feedback on the list has been of great help, but there are
limitations to any online communication. Sometimes the best
communication is through working code; even offline I have seen people
understand what I mean only after testing the program itself.

Another thing is that I need to access big amounts of memory at the same time.

The intended use of this program is for accessing 5,000, 10,000, or a
million elements at the same time, for displaying them in a graph, for
creating a difference graph, or for showing a little image. That kind
of thing.

So accessing just one element at a time using a function for each
child element of an array is very inconvenient and inefficient.

So the ideal thing should be to access memory directly but with a
mutex or something for avoiding conflict as the GUI lives on a thread,
the debugger on another and (I suppose) the debugged process in
another one.

What is the best way to access memory of a process directly?

lldb uses a cache when it reads memory from the process it is debugging. So when any of the FindVariable type calls need to read memory, they will read in page sized chunks and cache it. Another read from the same page won't require a second call into the target to read the same memory, but will use the cache. I think you want to let the cache handle ganging memory reads in some efficient way.
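
To make the idea concrete, here is a toy page cache in Python. It is only a sketch of the general technique (read whole pages once, serve later reads from the cached copy), not lldb's actual cache; PAGE_SIZE, PageCache and read_page are all hypothetical names, and a bytes object stands in for the target's memory.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageCache:
    """Serve arbitrary reads from whole pages fetched once from a backend."""

    def __init__(self, read_page):
        self._read_page = read_page  # callable(page_base) -> bytes
        self._pages = {}
        self.backend_reads = 0  # how many times the backend was actually hit

    def read(self, addr, size):
        out = bytearray()
        while size > 0:
            base = addr - (addr % PAGE_SIZE)
            if base not in self._pages:
                self._pages[base] = self._read_page(base)
                self.backend_reads += 1
            offset = addr - base
            chunk = self._pages[base][offset:offset + size]
            if not chunk:
                break  # backend returned a short page; nothing more to read
            out += chunk
            addr += len(chunk)
            size -= len(chunk)
        return bytes(out)


# Fake "process memory": two pages of repeating byte values.
memory = bytes(range(256)) * 32


def read_page(base):
    return memory[base:base + PAGE_SIZE]


cache = PageCache(read_page)
first = cache.read(10, 4)       # fetches page 0 from the backend
second = cache.read(20, 4)      # served from the cached page 0
spanning = cache.read(4094, 4)  # crosses into page 1, one more backend read
print(cache.backend_reads)
```

Many small reads that land on the same page cost a single trip to the target, which is why reading neighbouring values one at a time is less expensive than it first appears.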

Jim

Another thing is that I need to access big amounts of memory at the same time.

The intended use of this program is for accessing 5,000, 10,000, or a
million elements at the same time, for displaying them in a graph, for
creating a difference graph, or for showing a little image. That kind
of thing.

So accessing just one element at a time using a function for each
child element of an array is very inconvenient and inefficient.

Not to mention if you have:

uint32_t ints[10000000];

If you read the memory for "ints" you will need to read "10000000 * sizeof(uint32_t)" bytes. If you start accessing children by expanding "ints" you will get:

ints[0]
ints[1]
...
ints[9999999]

You don't want to read the bytes for any children of "ints" because they will just duplicate what you read for "ints". So you can either read the bytes once for the aggregate and decode the elements yourself, or skip reading bytes for aggregate types (structs, unions, classes, arrays) and only read them for basic types (ints, floats, pointers, bools, etc).
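
As a sketch of reading the aggregate once and decoding it yourself, the Python snippet below unpacks a raw byte buffer into integers with the standard struct module. The buffer here is fabricated with struct.pack purely for the demonstration; in a real session it would come from one bulk memory read. decode_uint32_array is a hypothetical helper, and little-endian byte order is an assumption.

```python
import struct


def decode_uint32_array(raw, count, little_endian=True):
    """Decode `count` 32-bit unsigned integers from a raw byte buffer."""
    byte_order = "<" if little_endian else ">"
    fmt = "%s%dI" % (byte_order, count)
    return list(struct.unpack(fmt, raw[:count * 4]))


# Stand-in for the bytes of a small "uint32_t ints[5]" read in one go.
raw = struct.pack("<5I", 10, 20, 30, 40, 50)

values = decode_uint32_array(raw, 5)
print(values)
```

One read plus one unpack replaces a separate child lookup for every element, which matters when the array has millions of entries.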

So the ideal thing should be to access memory directly but with a
mutex or something for avoiding conflict as the GUI lives on a thread,
the debugger on another and (I suppose) the debugged process in
another one.

What happens when your variable is in a register like rax? You can't assume all variables have addresses.

What is the best way to access memory of a process directly?

size_t
SBProcess::ReadMemory (addr_t addr, void *buf, size_t size, lldb::SBError &error);

Again, remember, not everything has an address. You can check with your SBValue by calling:

SBValue value = ...;

lldb::addr_t load_addr = value.GetLoadAddress();

if (load_addr != LLDB_INVALID_ADDRESS)
{
    // Item is in memory
}
else
{
    // Not in memory, or in memory but section isn't loaded (like global variable before or after running your process)
}