Bug in evaluation?

Hello everyone.

I just realised something that crashes the debuggee in some cases. We had a bug about it in our tracker: http://youtrack.jetbrains.com/issue/OC-7389

We have a system of value renderers. Each renderer (e.g. for NSCollections) evaluates expressions to get information about the collection's elements, and a number of Summary and Synthetic Providers do the same.

In the SB API this is implemented with an EvaluateExpression function. One way to evaluate an expression is to call the lldb::SBFrame::EvaluateExpression() member function. However, it actually performs the execution on the selected thread/frame, not on the frame we call EvaluateExpression on. This is far from obvious and, in my opinion, buggy. Using the API this way leads to crashes of the debuggee process, as in the ticket above. And it is not only expression evaluation that crashes: so does fetching local variables with dynamic types, if that runs code in the target as well.

So our workaround was to select the specified thread/frame before doing the evaluation. The interpreter's expr command does the same.
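A minimal sketch of that workaround against the Python SB API. The helper name `evaluate_with_selection` is hypothetical; it assumes `frame` is a valid `lldb.SBFrame` and restores the previous selection afterwards, so the command interpreter's view is left as it was. It only calls SB methods on the objects passed in, so the `lldb` module is not imported here.

```python
def evaluate_with_selection(frame, expression):
    """Hypothetical helper: select frame's thread and frame, evaluate, restore.

    `frame` is assumed to be an lldb.SBFrame.
    """
    thread = frame.GetThread()
    process = thread.GetProcess()

    # Remember what is currently selected, so it can be put back afterwards.
    previous_thread = process.GetSelectedThread()
    previous_frame_id = previous_thread.GetSelectedFrame().GetFrameID()
    previous_target_frame_id = thread.GetSelectedFrame().GetFrameID()

    # Make the frame we evaluate in the selected one, as "expr" would see it.
    process.SetSelectedThread(thread)
    thread.SetSelectedFrame(frame.GetFrameID())
    try:
        return frame.EvaluateExpression(expression)
    finally:
        # Restore the original selection, even if evaluation raised.
        thread.SetSelectedFrame(previous_target_frame_id)
        process.SetSelectedThread(previous_thread)
        previous_thread.SetSelectedFrame(previous_frame_id)
```

The try/finally is a design choice: a front-end that silently moves the user's selected thread would be just as confusing as the behaviour being worked around.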

That is not the intention of the design, and also not what I see:

(lldb) source list -f foo.c -l 1
   1 #include <stdio.h>
   2
   3 int
   4 foo (int input)
   5 {
   6 int local_var = input * 5;
   7 printf ("Local var: %d.\n", local_var);
   8 return local_var;
   9 }
   10
   11 int
(lldb)
   12 main (int argc, char **argv)
   13 {
   14 int local_var = argc;
   15 printf ("Foo returns: %d.\n", foo (local_var));
   16 return 1;
   17 }
(lldb) b s -p "return local_var"
Breakpoint 1: where = foo`foo + 41 at foo.c:8, address = 0x0000000100000ef9
(lldb) run
Process 98518 launched: '/private/tmp/foo' (x86_64)
Local var: 5.
Process 98518 stopped
* thread #1: tid = 0x3663ec, function: foo , stop reason = breakpoint 1.1
    frame #0: 0x0000000100000ef9 foo`foo at foo.c:8
   5 {
   6 int local_var = input * 5;
   7 printf ("Local var: %d.\n", local_var);
-> 8 return local_var;
   9 }
   10
   11 int

So I am in foo; there is a "local_var" in frame 0, where its value is 5, and one in frame 1, where its value is 1. So I do:

(lldb) script
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>> frame_0 = lldb.thread.GetFrameAtIndex(0)
>>> print frame_0.EvaluateExpression("local_var")
(int) $0 = 5
>>> frame_1 = lldb.thread.GetFrameAtIndex(1)
>>> print frame_1.EvaluateExpression("local_var")
(int) $1 = 1
>>> quit()

SBFrame::EvaluateExpression, at least in this example, works on the SBFrame that you call it on.

Note, here I am using lldb.thread, which is INDEED the command interpreter's currently selected thread. I just did that to shorten the example. If you wanted to get the process/thread/frame independent of entities selected in the Command Interpreter, then you should use the accessors on SBDebugger, SBTarget, SBProcess and SBThread. The Python docs explicitly state that you should NOT use lldb.process/lldb.thread or lldb.frame anywhere but in the interactive script interpreter. They aren't even guaranteed to be set in other contexts.
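As a sketch of the accessor-based approach, the hypothetical helper below locates a frame purely through SBTarget, SBProcess and SBThread calls and evaluates in it, never touching `lldb.process`, `lldb.thread` or `lldb.frame`. It assumes the caller already holds an `lldb.SBTarget` (for example from `SBDebugger.GetSelectedTarget()` or `SBDebugger.GetTargetAtIndex()`).

```python
def evaluate_in(target, thread_index_id, frame_index, expression):
    """Hypothetical helper: find a frame via explicit accessors and evaluate there.

    `target` is assumed to be an lldb.SBTarget with a live process.
    """
    process = target.GetProcess()
    if not process.IsValid():
        raise RuntimeError("target has no live process")
    # Index IDs are the "thread #N" numbers lldb prints in its stop output.
    thread = process.GetThreadByIndexID(thread_index_id)
    if not thread.IsValid():
        raise LookupError("no thread with index ID %d" % thread_index_id)
    frame = thread.GetFrameAtIndex(frame_index)
    if not frame.IsValid():
        raise LookupError("thread %d has no frame %d" % (thread_index_id, frame_index))
    # Evaluated in this frame's context, whatever the interpreter has selected.
    return frame.EvaluateExpression(expression)
```

In the foo.c session, `evaluate_in(target, 1, 1, "local_var")` would correspond to the `frame_1` evaluation, but without depending on which thread the interpreter happens to have selected.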

Anyway, if you have some example (not using lldb.process/lldb.thread/lldb.frame) where EvaluateExpression on an SBFrame object evaluates the expression in the context of some other frame, please file a bug and we will take a look.

Jim

Hi Jim,

You're right that it uses the context of the specified frame. But as far as I can see, the actual execution runs on another thread. Could that cause something like EXC_ARM_BREAKPOINT?
And what does the run_others parameter of Process::RunThreadPlan() do?

One of the big dangers with evaluating expressions that run code in the debuggee is that the code you run might try to acquire some locked resource that is held by another thread in the program. That will cause the expression evaluation to deadlock.

One solution is to run the expression evaluation with a timeout, and if it takes too long, cancel the expression, clean up its stack and return an error. However, that can also be dangerous. For instance, if the expression you run successfully acquires lock A, then tries to get lock B, which is held on another thread, it deadlocks. If you cancel the expression evaluation at that point, you will leave lock A stranded, and your program is going to grind to a halt as different threads all try to get that resource.
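The stranded-lock hazard can be reproduced in plain Python, with no debugger involved. This is only an analogy: `injected_expression` stands in for the code lldb runs in the debuggee, and returning early on timeout stands in for the debugger cancelling the expression and discarding its stack without running any of its cleanup.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
results = {}

def injected_expression(timeout):
    """Stand-in for expression code: takes lock A, then needs lock B."""
    lock_a.acquire()
    if not lock_b.acquire(timeout=timeout):
        # The "debugger" cancels here. Nothing ever releases lock A.
        results["expression"] = "cancelled"
        return
    lock_b.release()
    lock_a.release()
    results["expression"] = "ok"

def other_program_thread():
    # A real thread would block forever; the timeout just lets the demo end.
    results["other_got_a"] = lock_a.acquire(timeout=0.2)

lock_b.acquire()  # the main thread plays a program thread holding lock B

expr = threading.Thread(target=injected_expression, args=(0.1,))
expr.start()
expr.join()

other = threading.Thread(target=other_program_thread)
other.start()
other.join()

lock_b.release()
print(results)  # {'expression': 'cancelled', 'other_got_a': False}
```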

The obvious solution to this problem is to run all threads when you do expression evaluation. However, that's not what most users want; they would find it disconcerting for expression evaluation on one thread to cause some other thread to make progress.

lldb's solution to this problem is that when we run expressions, we first try running just the thread on which the expression is evaluated, but with a timeout (which is user-settable). If the expression evaluation times out on that one thread, and the "run_others" parameter to RunThreadPlan is passed in as true, we interrupt the evaluation and then restart it with all threads running. So it is very possible that expression evaluation could give another thread a chance to execute, and that thread could in turn hit a breakpoint, crash, or do whatever else a thread might do while executing...
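The fallback policy can be modelled as a toy simulation. This is not lldb's implementation of `Process::RunThreadPlan`; the function name and the tick-based timeouts are hypothetical stand-ins. The evaluating thread needs a lock that another thread frees only after running for `ticks_until_lock_free` ticks, so the expression can finish only once other threads are allowed to run.

```python
def simulate_expression(ticks_until_lock_free, one_thread_timeout,
                        total_timeout, run_others):
    """Toy model. Returns (completed, ticks_used, other_threads_ran)."""
    other_progress = 0
    others_ran = False
    for tick in range(total_timeout):
        if other_progress >= ticks_until_lock_free:
            # The lock is free, so the expression finishes on this tick.
            return True, tick, others_ran
        if tick >= one_thread_timeout:
            if not run_others:
                # No fallback: give up after the single-thread timeout.
                return False, tick, others_ran
            # Fallback: interrupt and resume with every thread running.
            others_ran = True
            other_progress += 1
    # Overall timeout reached; other threads may still have run.
    return False, total_timeout, others_ran
```

For example, `simulate_expression(3, 5, 20, True)` completes only after the fallback lets the other thread run, while the same call with `run_others=False` gives up at the one-thread timeout. The third element of the result shows the point about side effects: once the fallback kicks in, other threads have run whether or not the expression completes.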

Short of the ability to track all locks in the system (a capability which most OSes don't provide), this is the safest way around this problem. It will fail if the expression you are running tries to acquire a non-recursive lock already held by the thread that runs the code. We could work around even that by making up a debugger thread in the debuggee for running expressions, though that would fail if any of the code in the expression tried to access thread-specific data. I don't know whether it is possible to fake a thread's thread-specific data so it looks like it belongs to another thread. If that's possible, we could probably make that work. OTOH, this particular problem doesn't seem to occur that often.

Anyway, again without being able to play around with the locks in a program directly, I can't see a way to run general expressions that might acquire locked resources without allowing all threads to run, at least as a fallback.
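SB API clients can choose this policy per evaluation through `lldb.SBExpressionOptions`. The sketch assumes a current LLDB, where `SetTimeoutInMicroSeconds`, `SetOneThreadTimeoutInMicroSeconds` and `SetTryAllThreads` are available (the one-thread timeout setter may be newer than the LLDB discussed here). The helper name and the default values are illustrative only.

```python
def configure_expression_options(options, total_timeout_us=500000,
                                 one_thread_timeout_us=100000,
                                 allow_other_threads=True):
    """Hypothetical helper: set timeouts and the all-threads fallback.

    `options` is assumed to be an lldb.SBExpressionOptions instance.
    """
    # Overall budget for the whole evaluation, in microseconds.
    options.SetTimeoutInMicroSeconds(total_timeout_us)
    # How long to run only the evaluating thread before falling back.
    options.SetOneThreadTimeoutInMicroSeconds(one_thread_timeout_us)
    # Whether the fallback may resume all threads (the "run_others" behaviour).
    options.SetTryAllThreads(allow_other_threads)
    return options
```

A renderer that must never let other threads run could pass `allow_other_threads=False` and accept that some evaluations will time out, then call `frame.EvaluateExpression(expr, options)` with the configured options.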

Jim

Hello, Jim

The code I was debugging in the case I described is GCD-related, so the lock issue could be the cause. But the issue goes away once we select the thread/frame, which is weird.


What does this mean for a debugger front-end? What should we expect to see after an evaluation? If threads can end up stopped in different frames, do we need to refetch all the information about all threads/frames/variables to display correct data?

Thank you, Jim, for such a detailed explanation of LLDB internals. I'll investigate the issue further and write back if I have more questions.


Yes, if other threads get a chance to run, you will need to refresh state. There isn’t any notification that this has happened at present. I think it would be hard for us to know that other threads actually got a chance to run, all we can tell is that it could have happened. But still, it might be a good idea to return that fact from the expression evaluation somehow…
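Until lldb reports whether other threads ran, a front-end has to assume they might have. One cheap, hypothetical heuristic is to snapshot each thread's frame-0 PC before the evaluation and compare afterwards. It only flags threads that appeared, vanished or ended up at a different PC; a thread that ran and came back to the same PC is missed, and memory (hence variable values) can change regardless, so refreshing everything remains the safe default.

```python
def snapshot_threads(process):
    """Map each thread ID to its frame-0 PC (`process` is an lldb.SBProcess)."""
    snapshot = {}
    for i in range(process.GetNumThreads()):
        thread = process.GetThreadAtIndex(i)
        snapshot[thread.GetThreadID()] = thread.GetFrameAtIndex(0).GetPC()
    return snapshot

def threads_needing_refresh(before, after):
    """Thread IDs that appeared, disappeared, or stopped at a different PC."""
    changed = set(before) ^ set(after)
    changed.update(tid for tid in set(before) & set(after)
                   if before[tid] != after[tid])
    return changed
```

Usage: take `before = snapshot_threads(process)`, run `frame.EvaluateExpression(expr)`, then refresh at least the threads in `threads_needing_refresh(before, snapshot_threads(process))`.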

Jim