lldb showing wrong type structure for virtual pointer type

Hi,

I'm having a problem where lldb will resolve the wrong type for virtual pointers, showing incorrect data for variables. This makes debugging our project very hard.

In our project, we commonly have the following structure:

class Transform : SomeParentClass
{
   virtual void foo();
   void bar();
   /*...*/
};

namespace Scripting {
class Transform : SomeScriptingParentClass
{
   /*...*/
};
}

I.e., we have native internal classes (like "Transform" in this case) and wrapper classes with identical names but in a different namespace. These wrappers link the native class to a user-facing API of the same name.

Now, when I put a breakpoint into "Transform::bar" and try to inspect the "this" variable, lldb shows "this" as a pointer to "Scripting::Transform" instead of as a pointer to "::Transform", thus showing the wrong data and making it impossible to inspect its member variables. Since this is a common structure in our code, it makes debugging very hard.

Now, it seems that this is caused by Transform having virtual functions, so lldb tries to derive its type at runtime by looking at the symbol name of the vtable (which is "__ZTV9Transform"). It then incorrectly maps that to "Scripting::Transform" instead of "::Transform", which seems to be a bug in lldb.

I can work around the problem by patching the mach-o binary to remove the name of the vtable ("__ZTV9Transform") from the symbol table. Then lldb will be unable to look up the type dynamically at runtime, and use the dwarf info of the "bar" function, which specifies "this" to be a pointer to "::Transform". Obviously, this is a rather inconvenient workaround.

I guess I could rename the scripting representations of all our classes to use a different naming scheme (like "Scripting::_Transform"), but I'd only like to do that as a last resort.

I'm using lldb-900.0.64.
Unfortunately, I have not yet succeeded in coming up with a small, independent repro case which shows this problem.

So I'm wondering:
- Is this a known issue?
- Is there a fix?
- Any ideas for a better workaround?

Thanks for any help!

jonas

> I'm using lldb-900.0.64.
            ^^^^^^^^^^^^^^
            ???
Latest official release is 5.0.1; also there are 6.0.0 (at -rc3, the next release)
and 7.0.0 (a.k.a SVN trunk). What's the 'version' output of your LLDB prompt?

> Unfortunately, I have not yet succeeded in coming up with a small, independent repro case which shows this problem.

IIUC this is it:

struct A {
   int id0;
   A () { id0 = 111; }
   virtual int f (int x) { return x + 1; }
   int g (int x) { return x + 11; }
};

struct B: A {
   int id1;
   B () { id1 = 222; }
   virtual int f (int x) { return x + 2; }
   int g (int x) { return x + 12; }
};

namespace S {
   struct AS {
     int id0;
     AS () { id0 = 333; }
     virtual int f (int x) { return x + 3; }
     int g (int x) { return x + 13; }
   };
   struct B: AS {
     int id1;
     B () { id1 = 444; }
     virtual int f (int x) { return x + 4; }
     int g (int x) { return x + 14; }
   };
}

int main (int argc, char *argv[])
{
   B obj1;
   S::B obj2;
   return
     obj1.f (argc) +
     obj2.f (argc) +
     obj1.g (argc) +
     obj2.g (argc);
}

And in gdb, it is:

$ gdb -q t-class2
Reading symbols from t-class2...done.
(gdb) b S::B::f
Breakpoint 1 at 0x400775: file t-class2.cc, line 25.
(gdb) b S::B::g
Breakpoint 2 at 0x400789: file t-class2.cc, line 26.
(gdb) r
Starting program: /home/dantipov/tmp/t-class2

Breakpoint 1, S::B::f (this=0x7fffffffdb50, x=1) at t-class2.cc:25
25 virtual int f (int x) { return x + 4; }
(gdb) bt
#0 S::B::f (this=0x7fffffffdb50, x=1) at t-class2.cc:25
#1 0x0000000000400643 in main (argc=1, argv=0x7fffffffdc68) at t-class2.cc:36
(gdb) p this
$1 = (S::B * const) 0x7fffffffdb50
(gdb) p *this
$2 = {<S::AS> = {_vptr.AS = 0x400840 <vtable for S::B+16>, id0 = 333}, id1 = 444}
(gdb) c
Continuing.

Breakpoint 2, S::B::g (this=0x7fffffffdb50, x=1) at t-class2.cc:26
26 int g (int x) { return x + 14; }
(gdb) bt
#0 S::B::g (this=0x7fffffffdb50, x=1) at t-class2.cc:26
#1 0x0000000000400669 in main (argc=1, argv=0x7fffffffdc68) at t-class2.cc:38
(gdb) p this
$3 = (S::B * const) 0x7fffffffdb50
(gdb) p *this
$4 = {<S::AS> = {_vptr.AS = 0x400840 <vtable for S::B+16>, id0 = 333}, id1 = 444}

I.e., in the calls to obj2.f () and obj2.g (), 'this' is 0x7fffffffdb50 both times, and the object
itself is {333, 444}.

With lldb, it is:

$ /home/dantipov/.local/llvm-6.0.0/bin/lldb t-class2
(lldb) target create "t-class2"
Current executable set to 't-class2' (x86_64).
(lldb) breakpoint set -n S::B::f
Breakpoint 1: where = t-class2`S::B::f(int) at t-class2.cc:25, address = 0x000000000040076a
(lldb) breakpoint set -n S::B::g
Breakpoint 2: where = t-class2`S::B::g(int) + 11 at t-class2.cc:26, address = 0x0000000000400789
(lldb) run
Process 5180 launched: '/home/dantipov/tmp/t-class2' (x86_64)
Process 5180 stopped
* thread #1, name = 't-class2', stop reason = breakpoint 1.1
     frame #0: 0x000000000040076a t-class2`S::B::f(this=0x00007fffffffdb50, x=1) at t-class2.cc:25
    22 struct B: AS {
    23 int id1;
    24 B () { id1 = 444; }
-> 25 virtual int f (int x) { return x + 4; }
    26 int g (int x) { return x + 14; }
    27 };
    28 }
(lldb) bt
* thread #1, name = 't-class2', stop reason = breakpoint 1.1
   * frame #0: 0x000000000040076a t-class2`S::B::f(this=0x00007fffffffdb50, x=1) at t-class2.cc:25
     frame #1: 0x0000000000400643 t-class2`main(argc=1, argv=0x00007fffffffdc58) at t-class2.cc:36
     frame #2: 0x00007ffff712000a libc.so.6`__libc_start_main(main=(t-class2`main at t-class2.cc:31), argc=1, argv=0x00007fffffffdc58, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffdc48) at libc-start.c:308
     frame #3: 0x000000000040054a t-class2`_start + 42
(lldb) p this
(S::B *) $0 = 0x00007fffffffdb50
(lldb) p *this
(S::B) $1 = {
   S::AS = (id0 = 111)
   id1 = 222
}
(lldb) c
Process 5180 resuming
Process 5180 stopped
* thread #1, name = 't-class2', stop reason = breakpoint 2.1
     frame #0: 0x0000000000400789 t-class2`S::B::g(this=0x00007fffffffdb40, x=1) at t-class2.cc:26
    23 int id1;
    24 B () { id1 = 444; }
    25 virtual int f (int x) { return x + 4; }
-> 26 int g (int x) { return x + 14; }
    27 };
    28 }
    29
(lldb) bt
* thread #1, name = 't-class2', stop reason = breakpoint 2.1
   * frame #0: 0x0000000000400789 t-class2`S::B::g(this=0x00007fffffffdb40, x=1) at t-class2.cc:26
     frame #1: 0x0000000000400669 t-class2`main(argc=1, argv=0x00007fffffffdc58) at t-class2.cc:38
     frame #2: 0x00007ffff712000a libc.so.6`__libc_start_main(main=(t-class2`main at t-class2.cc:31), argc=1, argv=0x00007fffffffdc58, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffdc48) at libc-start.c:308
     frame #3: 0x000000000040054a t-class2`_start + 42
(lldb) p this
(S::B *) $2 = 0x00007fffffffdb40
(lldb) p *this
(S::B) $3 = {
   S::AS = (id0 = 333)
   id1 = 444
}

Here 'this' is different between calls to obj2.f () and obj2.g () (0x00007fffffffdb50 vs.
0x00007fffffffdb40), and objects are shown as different as well - {111, 222} vs. {333, 444}.

Dmitry

>> I'm using lldb-900.0.64.
>            ^^^^^^^^^^^^^^
>            ???
> Latest official release is 5.0.1; also there are 6.0.0 (at -rc3, the next release)
> and 7.0.0 (a.k.a SVN trunk). What's the 'version' output of your LLDB prompt?

It is what I posted:

jechter$ lldb --version
lldb-900.0.64
  Swift-4.0

Maybe Apple uses a different versioning scheme for lldb distributed with their toolchains?

>> Unfortunately, I have not yet succeeded in coming up with a small, independent repro case which shows this problem.
>
> IIUC this is it:
>
> [...]
>
> Here 'this' is different between calls to obj2.f () and obj2.g () (0x00007fffffffdb50 vs.
> 0x00007fffffffdb40), and objects are shown as different as well - {111, 222} vs. {333, 444}.

Thanks. What you are showing there seems very peculiar.

But I don't think it's the same problem as I have (and also, using the same steps on my machine does not repro the problem you showed - I get the same value for "this" and its members between the calls to S::B::f and S::B::g).

My problem was not about showing a wrong object (my "this" pointer value was correct), but about showing a wrong type representation of the correct object data.

jonas

Jonas,

What are you using to inspect the this pointer? You can use either "frame variable" (the equivalent of gdb's "info locals"), which relies only on debug info, or the expression evaluator, e.g. "print". Do both methods show the same problem?

Also note that lldb by default will try to discern the full dynamic type of the variables it prints. You can disable this by doing:

(lldb) expr -d no-dynamic-values -- this

or equivalently:

(lldb) frame variable -d no-dynamic-values this

Is it the dynamic value resolution that's causing the incorrect printing?

Jim

> Jonas,
>
> What are you using to inspect the this pointer?

Normally, the Xcode debugger UI.

> You can use "frame variable" (the equivalent of gdb's "info locals") which just relies on debug info or the expression evaluator e.g. "print". Do both methods show the same problem?

(lldb) frame variable this
(Scripting::UnityEngine::Transform *) this = 0x000000010fe2eb20

That gives me the wrong namespace.

(lldb) print this
(Scripting::UnityEngine::Transform *) $4 = 0x000000010fe2eb20

That also gives me the wrong namespace.

But:

(lldb) print *this
(Transform) $5 = {
[...]

gives me the correct (global) namespace.

Also:

(lldb) frame variable -d no-dynamic-values this
(Transform *) this = 0x000000010fe2eb20

gives me the correct namespace.

> Also note that lldb by default will try to discern the full dynamic type of the variables it prints. You can disable this by doing:
>
> (lldb) expr -d no-dynamic-values -- this
>
> or equivalently:
>
> (lldb) frame variable -d no-dynamic-values this
>
> Is it the dynamic value resolution that's causing the incorrect printing?

Yes, both of those above give me the correct types!

Now, this is already very helpful - thank you!
This means I can get correct values using the lldb console. If there were some way to make the Xcode UI show the correct values, that would be even better.

jonas

Interesting.

First off, you can turn off fetching dynamic values globally (including in the Xcode Locals view) by putting:

settings set target.prefer-dynamic-value no-dynamic-values

in your ~/.lldbinit file. You can toggle this on and off in a session, though Xcode won't notice you've changed the value till you cause it to refresh the locals (step or whatever).

We do log the process of finding the dynamic type. You can see this by running the command:

log enable -f /tmp/lldb-object-log.txt lldb object

Probably easiest to put that in your .lldbinit.

That channel also logs when we read in modules, and so it might be a little chatty, but you should see:

<SOME_ADDRESS>: static-type = '<STATIC_TYPE>' has vtable symbol 'vtable for <DYNAMIC_CLASS>'

and then some more messages that trace our attempt to look up DYNAMIC CLASS. If you turn on those logs, what do you see for these classes?

Jim

> Interesting.
>
> First off, you can turn off fetching dynamic values globally (including in the Xcode Locals view) by putting:
>
> settings set target.prefer-dynamic-value no-dynamic-values
>
> in your ~/.lldbinit file. You can toggle this on and off in a session, though Xcode won't notice you've changed the value till you cause it to refresh the locals (step or whatever).

This will fix the output of "frame variable". But it does not seem to fix the variable display in the UI.

> We do log the process of finding the dynamic type. You can see this by running the command:
>
> log enable -f /tmp/lldb-object-log.txt lldb object
>
> Probably easiest to put that in your .lldbinit.
>
> That channel also logs when we read in modules, and so it might be a little chatty, but you should see:
>
> <SOME_ADDRESS>: static-type = '<STATIC_TYPE>' has vtable symbol 'vtable for <DYNAMIC_CLASS>'
>
> and then some more messages that trace our attempt to look up DYNAMIC CLASS. If you turn on those logs, what do you see for these classes?

0x000000010f62ecd0: static-type = 'Transform *' has vtable symbol 'vtable for Transform'

0x000000010f62ecd0: static-type = 'Transform *' has dynamic type: uid={0x100012d7a}, type-name='Transform'

jonas

>> Interesting.
>>
>> First off, you can turn off fetching dynamic values globally (including in the Xcode Locals view) by putting:
>>
>> settings set target.prefer-dynamic-value no-dynamic-values
>>
>> in your ~/.lldbinit file. You can toggle this on and off in a session, though Xcode won't notice you've changed the value till you cause it to refresh the locals (step or whatever).
>
> This will fix the output of "frame variable". But it does not seem to fix the variable display in the UI.

They must be setting the dynamic value directly when they fetch the values. That of course overrides the general setting. I'll go see why they do that, but that won't help you for now.

>> We do log the process of finding the dynamic type. You can see this by running the command:
>>
>> log enable -f /tmp/lldb-object-log.txt lldb object
>>
>> Probably easiest to put that in your .lldbinit.
>>
>> That channel also logs when we read in modules, and so it might be a little chatty, but you should see:
>>
>> <SOME_ADDRESS>: static-type = '<STATIC_TYPE>' has vtable symbol 'vtable for <DYNAMIC_CLASS>'
>>
>> and then some more messages that trace our attempt to look up DYNAMIC CLASS. If you turn on those logs, what do you see for these classes?
>
> 0x000000010f62ecd0: static-type = 'Transform *' has vtable symbol 'vtable for Transform'
>
> 0x000000010f62ecd0: static-type = 'Transform *' has dynamic type: uid={0x100012d7a}, type-name='Transform'

Grr... We go from the name in the vtable symbol to the class type using Module::FindTypes, passing the exact_match flag 'cause we know this is an exact match. It turns out that Module::FindTypes only obeys its exact_match flag if either the name you pass in starts with :: or you can dial up the exact type kind (struct, class, enum, etc...) you are looking for. We can't do the latter here because we don't know whether the dynamic type is a class or a struct, and we were just passing the name we got from the vtable.

I have a small fix for the dynamic type issue in r326412. Fixing the FindTypes behavior is more involved, and I don't know whether other places rely on this misbehavior. I filed:

https://bugs.llvm.org/show_bug.cgi?id=36556

to examine that issue further.

Thanks for reporting this.

Jim

FWIW, I just found out that I can trivially work around this problem by changing the order of object files passed to the linker.

It turns out that lldb will just pick up the first type named "Transform" in the binary (regardless of namespace). Now, if I make sure that the object file containing the "correct" Transform type comes first in the linker command line, it will also be first in the binary, and it will be picked up by lldb. Now, in theory, this might cause the same problem, just the "other way around", when debugging code using the namespaced version of the type, but in our case, that should not be a problem, because the latter is not a virtual type, so lldb should not use dynamic lookup in that case.

jonas

Yes, the code was assuming that "exact_match" would be obeyed so it explicitly specified returning only 1 match. When exact doesn't work, that means you get the first one... You will only have a problem with the one whose name matches both types, since the other way around the vtable name will have the namespace in it, and it won't match the bare one. So your workaround should be good altogether.

Jim