Redefining functions

Hi,

I’m trying to create an LLDB command that sets an internal breakpoint for a function, and then executes some code, but I’m having some difficulties…

I’ve seen the expression command, which does something close to what I want to do after the breakpoint, but I have some doubts. I want the code to be able to return from the function where it’s called, but the “target->EvaluateExpression” doesn’t let the code return from it (while I would like to execute code with something like “if (condition) return NULL; more code…”). Is there a way to compile arbitrary code (with return statements) and execute it?

Is there a way to create something like an anonymous function (with certain parameters), and have it compiled and linked, while looking up global variables? ClangUtilityFunction doesn’t look up any variables, and I can’t seem to find a way to look up global variables without a Frame object.

Is there a way to know a function (or method)'s address from its prototype?

My final purpose is to be able to redefine functions on-the-fly (with caveats for inlined functions, etc). The only way I saw that could work was creating a (similar) function and making the other function a trampoline (either using breakpoints, or writing a jmp expression at its address)… Did I miss another easier way?

Thanks for the help,

Filipe

Hi,

I'm trying to create an LLDB command that sets an internal breakpoint for a function, and then executes some code, but I'm having some difficulties...

I've seen the expression command, which does something close to what I want to do after the breakpoint, but I have some doubts. I want the code to be able to return from the function where it's called, but the "target->EvaluateExpression" doesn't let the code return from it (while I would like to execute code with something like "if (condition) return NULL; more code…"). Is there a way to compile arbitrary code (with return statements) and execute it?

Not currently.

Is there a way to create something like an anonymous function (with certain parameters), and have it compiled and linked, while looking up global variables?

Current expressions can do the lookups, but as you already know they don't live beyond the first invocation.

ClangUtilityFunction doesn't look up any variables, and I can't seem to find a way to look up global variables without a Frame object.

For globals you shouldn't need the frame. If the globals are in your symbol table and are external you might be able to use dlsym().

Is there a way to know a function (or method)'s address from its prototype?

A normal function that was compiled into your code or an expression function?

My final purpose is to be able to redefine functions on-the-fly (with caveats for inlined functions, etc). The only way I saw that could work was creating a (similar) function and making the other function a trampoline (either using breakpoints, or writing a jmp expression at its address)… Did I miss another easier way?

We do want the ability to just compile up something in an LLDB command but we don't have that yet. You currently can do this via python if you really want to by making a source file, invoking the compiler on it, and then making a dylib. You can then use the "process load" command to load the shared library:

(lldb) process load foo.so

So if you have your python code do the global variable lookups and create the source code, you could hack something together.
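The "make a source file, invoke the compiler, make a dylib" step could be sketched roughly as below. This is only an illustration: the helper names, the file names `patch.c` / `patch.dylib`, and the choice of `clang` with `-dynamiclib` (or `-shared -fPIC` off Darwin) are assumptions, not anything LLDB provides.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_compile_command(source_path, output_path, compiler="clang"):
    """Return the argv for compiling one C file into a shared library."""
    # -dynamiclib is the Darwin spelling; -shared -fPIC is used elsewhere.
    if sys.platform == "darwin":
        shared_flags = ["-dynamiclib"]
    else:
        shared_flags = ["-shared", "-fPIC"]
    return [compiler, "-g", *shared_flags, "-o", str(output_path), str(source_path)]


def write_and_compile(source_text, workdir=None, run=True):
    """Write source_text to patch.c and, if run is true, compile it.

    Returns (library_path, command). With run=False nothing is executed,
    which is handy for inspecting the command first.
    """
    workdir = Path(workdir or tempfile.mkdtemp(prefix="lldb_patch_"))
    suffix = ".dylib" if sys.platform == "darwin" else ".so"
    source_path = workdir / "patch.c"
    library_path = workdir / ("patch" + suffix)
    source_path.write_text(source_text)
    command = build_compile_command(source_path, library_path)
    if run:
        # Raises CalledProcessError if the compiler reports an error.
        subprocess.run(command, check=True)
    return library_path, command
```

The returned library path is what would then be handed to "process load".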

When/if you are ready to try and take over the function, you can look for any "Trampoline" symbols. For a simple a.out program on darwin we see:

(lldb) file ~/Documents/src/args/a.out
Current executable set to '~/Documents/src/args/a.out' (i386).
(lldb) image dump symtab a.out
Symtab, file = /Volumes/work/gclayton/Documents/src/args/a.out, num_symbols = 18:
               Debug symbol
               >Synthetic symbol
               >>Externally Visible
               >>>
Index UserID DSX Type File Address/Value Load Address Size Flags Name
------- ------ --- ------------ ------------------ ------------------ ------------------ ---------- ----------------------------------
....
[ 10] 16 Trampoline 0x0000000000001e76 0x0000000000000006 0x00010100 __stack_chk_fail
...
[ 12] 18 Trampoline 0x0000000000001e7c 0x0000000000000006 0x00010100 exit
[ 13] 19 Trampoline 0x0000000000001e82 0x0000000000000006 0x00010100 getcwd
[ 14] 20 Trampoline 0x0000000000001e88 0x0000000000000006 0x00010100 perror
[ 15] 21 Trampoline 0x0000000000001e8e 0x0000000000000006 0x00010100 printf
[ 16] 22 Trampoline 0x0000000000001e94 0x0000000000000006 0x00010100 puts

On MacOSX, you could then easily patch the trampoline code to call your own function for say "printf" by modifying the function address in the PLT entry.

For what it's worth, we implemented a feature like this in the Apple fork of gdb, called "Fix and Continue". Sun's dbx has the feature as well (also called F&C there). MS Visual Studio and HP's wdb also had something similar. All of these allow you to modify source code in the middle of a debug session, compile it into a loadable bundle, and have the debugger patch all references to the newly loaded code.

With C code, it's difficult to implement but possible. With C++ things get much more complicated. On our system, we had some assistance from the Objective-C runtime and compiler to make things work .. but it's very hard for users to understand what is legal and what is illegal (or ill-advised) to change in the middle of execution. It makes for great demos, and for certain types of changes it can work well. But it's so easy to mess up that in practice I don't think it's a good feature to support with these languages.

There's a feature in the Apple debugger GUI (Xcode's debugger interface) where the about-to-be-executed source line has an arrow pointing to it in the source code window. The user can drag this arrow back a few lines, or forward a few lines, to change which source line will be executed when execution resumes. It's a cool feature but you really need to understand how the compiler generates code and what assumptions about scopes and variable lifetimes it is using -- skip over a ctor and execute some code and there will be all kinds of problems. I think this is in the same class as F&C - much riskier than regular developers appreciate, and confusing when it fails.

My two cents,

J

Hi!

Hi,

I’m trying to create an LLDB command that sets an internal breakpoint for a function, and then executes some code, but I’m having some difficulties…

I’ve seen the expression command, which does something close to what I want to do after the breakpoint, but I have some doubts. I want the code to be able to return from the function where it’s called, but the “target->EvaluateExpression” doesn’t let the code return from it (while I would like to execute code with something like “if (condition) return NULL; more code…”). Is there a way to compile arbitrary code (with return statements) and execute it?

Not currently.

Is there a way to create something like an anonymous function (with certain parameters), and have it compiled and linked, while looking up global variables?

Current expressions can do the lookups, but as you already know they don’t live beyond the first invocation.

ClangUtilityFunction doesn’t look up any variables, and I can’t seem to find a way to look up global variables without a Frame object.

For globals you shouldn’t need the frame. If the globals are in your symbol table and are external you might be able to use dlsym().

Is there a way to know a function (or method)'s address from its prototype?

A normal function that was compiled into your code or an expression function?

For my first try (a command like “expr” but that would re-define functions) I wanted to find the location of some function/method, given the prototype (e.g. “ProcessGDBRemote::StartDebugserverProcess(char const*)”). I suppose we could mangle the name and try to find the symbol. I haven’t seen any way to do that in lldb, but I suppose it’s possible. Maybe I’m looking at it wrong.

My final purpose is to be able to redefine functions on-the-fly (with caveats for inlined functions, etc). The only way I saw that could work was creating a (similar) function and making the other function a trampoline (either using breakpoints, or writing a jmp expression at its address)… Did I miss another easier way?

We do want the ability to just compile up something in an LLDB command but we don’t have that yet. You currently can do this via python if you really want to by making a source file, invoking the compiler on it, and then making a dylib. You can then use the “process load” command to load the shared library:

(lldb) process load foo.so

So if you have your python code do the global variable lookups and create the source code, you could hack something together.

When/if you are ready to try and take over the function, you can look for any “Trampoline” symbols. For a simple a.out program on darwin we see:

(lldb) file ~/Documents/src/args/a.out
Current executable set to '~/Documents/src/args/a.out' (i386).
(lldb) image dump symtab a.out
Symtab, file = /Volumes/work/gclayton/Documents/src/args/a.out, num_symbols = 18:
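               Debug symbol
               >Synthetic symbol
               >>Externally Visible
               >>>
Index UserID DSX Type File Address/Value Load Address Size Flags Name
------- ------ --- ------------ ------------------ ------------------ ------------------ ---------- ----------------------------------
....
[ 10] 16 Trampoline 0x0000000000001e76 0x0000000000000006 0x00010100 __stack_chk_fail
...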
[ 12] 18 Trampoline 0x0000000000001e7c 0x0000000000000006 0x00010100 exit
[ 13] 19 Trampoline 0x0000000000001e82 0x0000000000000006 0x00010100 getcwd
[ 14] 20 Trampoline 0x0000000000001e88 0x0000000000000006 0x00010100 perror
[ 15] 21 Trampoline 0x0000000000001e8e 0x0000000000000006 0x00010100 printf
[ 16] 22 Trampoline 0x0000000000001e94 0x0000000000000006 0x00010100 puts

On MacOSX, you could then easily patch the trampoline code to call your own function for say “printf” by modifying the function address in the PLT entry.

That would be a good solution, at least to substitute functions that are accessed with the PLT. But are the trampolines reified (I don’t think so)? Or should I just write to the process’ PLT directly, after loading the function?

What about replacing other functions? Let’s say that I want to replace a random function (that I can’t replace by changing the PLT). If I have information about which functions call it, I can replace the definition of the function by a jump and, if necessary, get the new versions of the functions that call the replaced function (doing the same to them, for a maximum of X iterations, for example). Though I would suppose clang won’t give us that information (at least for now).

Thanks for the help,

Filipe

Hi,

I’ve been toying around with loading libraries and what I can do with lldb, but it seems some of the support isn’t there:

  • I can load a library from a command, but the only thing I get is a “token” (the return of dlopen());
  • I can’t (as far as I can tell) know what is the address for the GOT entry for a function (the one that will be changed by the dynamic linker on first invocation, they seem to be in the __DATA,__la_symbol_ptr section), but…
  • Substituting the address in the GOT wouldn’t work. I’ll have to turn the original function into a jump to the new one. Nothing is in place for that;
  • I found one email from Jason Molenda where he explained how they implemented F&C on gdb (http://www.cygwin.com/ml/gdb/2003-06/msg00531.html), and am trying to do something similar. But it seems that the current dyld implementation doesn’t have a flag to not run global constructors (or re-register ObjC classes), and NSLinkModule was deprecated, so these cases would not work.

I wanted to continue this work, but I have some doubts…

How could I get a handle (on my CommandObject) to the library loaded with dlopen? (It can have the same file name as an already loaded library, how can I tell which is which?)
If it is impossible, any ideas on how to add that feature?
After that, the easy way to replace the functions would be to get the symbols (at least for functions) that are defined in the recently loaded image and turn the current functions into jumps to the new functions.
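As a rough illustration of "turning a function into a jump", the bytes for an x86 near jump can be computed as below. This is only a sketch: it assumes an x86 or x86_64 target and a replacement within ±2 GB of the original (the reach of `jmp rel32`); anything farther away would need a different sequence, such as an indirect jump. The function name is made up for the example.

```python
import struct

JMP_REL32_SIZE = 5  # opcode 0xE9 plus a 4-byte signed displacement


def encode_jmp_rel32(from_addr, to_addr):
    """Return the 5 bytes of an x86 `jmp rel32` placed at from_addr."""
    # The displacement is relative to the address of the *next* instruction.
    displacement = to_addr - (from_addr + JMP_REL32_SIZE)
    if not -(1 << 31) <= displacement < (1 << 31):
        raise ValueError("target is out of range for a rel32 jump")
    return b"\xe9" + struct.pack("<i", displacement)
```

These five bytes would overwrite the first instructions of the old function, so the original prologue is lost unless it is saved somewhere first.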

Regards,

Filipe

Hi,

I've been toying around with loading libraries and what I can do with lldb, but it seems some of the support isn't there:

  - I can load a library from a command, but the only thing I get is a "token" (the return of dlopen());
  - I can't (as far as I can tell) know what is the address for the GOT entry for a function (the one that will be changed by the dynamic linker on first invocation, they seem to be in the __DATA,__la_symbol_ptr section), but…

On Mach-o you can see at least the stubs (locations that contain the lazy pointer indirections) as they are marked as "Trampoline" symbols:

(lldb) target modules dump symtab a.out
Symtab, file = /Volumes/work/gclayton/Documents/src/attach/a.out, num_symbols = 18:
               Debug symbol
               >Synthetic symbol
               >>Externally Visible
               >>>
Index UserID DSX Type File Address/Value Load Address Size Flags Name
------- ------ --- ------------ ------------------ ------------------ ------------------ ---------- ----------------------------------
[ 0] 0 D SourceFile 0x0000000000000000 Sibling -> [ 4] 0x00640000 /Volumes/work/gclayton/Documents/src/attach/test.c
[ 1] 2 D ObjectFile 0x000000004e440e1e 0x0000000000000000 0x00660001 /Volumes/work/gclayton/Documents/src/attach/test.o
[ 2] 4 D Code 0x0000000100000d80 0x0000000000000070 0x000f0000 sleep_loop
[ 3] 8 D Code 0x0000000100000df0 0x0000000000000066 0x000f0000 main
[ 4] 12 Data 0x0000000100001000 0x0000000000000000 0x000e0000 pvars
[ 5] 13 X Data 0x0000000100001068 0x0000000000000000 0x000f0000 NXArgc
[ 6] 14 X Data 0x0000000100001070 0x0000000000000000 0x000f0000 NXArgv
[ 7] 15 X Data 0x0000000100001080 0x0000000000000000 0x000f0000 __progname
[ 8] 16 X Absolute 0x0000000100000000 0x0000000000000000 0x00030010 _mh_execute_header
[ 9] 17 X Data 0x0000000100001078 0x0000000000000000 0x000f0000 environ
[ 10] 20 X Code 0x0000000100000d40 0x0000000000000000 0x000f0000 start
[ 11] 21 Trampoline 0x0000000100000e56 0x0000000000000006 0x00010100 exit
[ 12] 22 Trampoline 0x0000000100000e5c 0x0000000000000006 0x00010100 getchar
[ 13] 23 Trampoline 0x0000000100000e62 0x0000000000000006 0x00010100 getpid
[ 14] 24 Trampoline 0x0000000100000e68 0x0000000000000006 0x00010100 printf
[ 15] 25 Trampoline 0x0000000100000e6e 0x0000000000000006 0x00010100 puts
[ 16] 26 Trampoline 0x0000000100000e74 0x0000000000000006 0x00010100 sleep
[ 17] 27 X Extern 0x0000000000000000 0x0000000000000000 0x00010100 dyld_stub_binder

The symbols 11 - 16 above are the stub entries through which all calls to "exit", "getchar", etc. go.
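If the dump is captured as text, the Trampoline rows can be picked out with a small parser. The sketch below assumes the flattened column layout shown above (index, user ID, optional D/S/X flags, type, address, size, flags, name); the real column spacing may differ, and the function name is invented for the example.

```python
import re

# Matches rows such as:
# [ 14] 24 Trampoline 0x0000000100000e68 0x0000000000000006 0x00010100 printf
TRAMPOLINE_ROW = re.compile(
    r"^\[\s*(?P<index>\d+)\]\s+\d+\s+(?:[DSX]\s+)*Trampoline\s+"
    r"(?P<addr>0x[0-9a-fA-F]+)\s+(?P<size>0x[0-9a-fA-F]+)\s+"
    r"0x[0-9a-fA-F]+\s+(?P<name>\S+)\s*$"
)


def find_trampolines(symtab_dump):
    """Map symbol name -> (file address, size) for every Trampoline row."""
    stubs = {}
    for line in symtab_dump.splitlines():
        match = TRAMPOLINE_ROW.match(line.strip())
        if match:
            stubs[match["name"]] = (int(match["addr"], 16), int(match["size"], 16))
    return stubs
```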

  - Substituting the address in the GOT wouldn't work. I'll have to turn the original function into a jump to the new one. Nothing is in place for that;

You will need to manually write memory for now, but it should be do-able. You could add some new functions to the ABI plug-ins:

You could add an ABI function to the main ABI.h:

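// In lldb/Target/ABI.h, inside the ABI class declaration:

  // Proposed hook. The base class reports failure; ABI plug-ins override it.
  virtual bool
  UpdateGOT (const char *func_name, ModuleList *modules, lldb::addr_t new_func_addr)
  {
    return false;
  }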

Then modify the x86_64 stuff to do the right thing

lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.h
lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.cpp

If you don't end up overwriting the original function, the "modules" parameter could be nice, as you might be able to take over say "print" but only for "a.out" and not for other shared libraries. So if "modules" is NULL, apply the new function to all modules; otherwise, only try to apply it to the modules in the list. Just an idea...

  - I found one email from Jason Molenda where he explained how they implemented F&C on gdb (Jason Molenda - Re: Howdy from Apple; Fix and Continue implemented Yet Again), and am trying to do something similar. But it seems that the current dyld implementation doesn't have a flag to not run global constructors (or re-register ObjC classes), and NSLinkModule was deprecated, so these cases would not work.

I wanted to continue this work, but I have some doubts…

There are plenty of issues with all ways of doing things, yes...

How could I get a handle (on my CommandObject) to the library loaded with dlopen? (It can have the same file name as an already loaded library, how can I tell which is which?)
If it is impossible, any ideas on how to add that feature?

Why do you need the handle?

After that, the easy way to replace the functions would be to get the symbols (at least for functions) that are defined in the recently loaded image and turn the current functions into jumps to the new functions.

That is a good way if you don't want to call the original function. I have always wanted to "listen" to the malloc/free calls by making my own versions of malloc/free and do a little data gathering and yet still call through to the original functions.

Hope some of the above hints help.

Greg

Hi,

Hi,

I’ve been toying around with loading libraries and what I can do with lldb, but it seems some of the support isn’t there:

  • I can load a library from a command, but the only thing I get is a “token” (the return of dlopen());
  • I can’t (as far as I can tell) know what is the address for the GOT entry for a function (the one that will be changed by the dynamic linker on first invocation, they seem to be in the __DATA,__la_symbol_ptr section), but…

On Mach-o you can see at least the stubs (locations that contain the lazy pointer indirections) as they are marked as “Trampoline” symbols:

(lldb) target modules dump symtab a.out
Symtab, file = /Volumes/work/gclayton/Documents/src/attach/a.out, num_symbols = 18:
               Debug symbol
               >Synthetic symbol
               >>Externally Visible
               >>>
Index UserID DSX Type File Address/Value Load Address Size Flags Name
------- ------ --- ------------ ------------------ ------------------ ------------------ ---------- ----------------------------------
[ 0] 0 D SourceFile 0x0000000000000000 Sibling -> [ 4] 0x00640000 /Volumes/work/gclayton/Documents/src/attach/test.c
[ 1] 2 D ObjectFile 0x000000004e440e1e 0x0000000000000000 0x00660001 /Volumes/work/gclayton/Documents/src/attach/test.o
[ 2] 4 D Code 0x0000000100000d80 0x0000000000000070 0x000f0000 sleep_loop
[ 3] 8 D Code 0x0000000100000df0 0x0000000000000066 0x000f0000 main
[ 4] 12 Data 0x0000000100001000 0x0000000000000000 0x000e0000 pvars
[ 5] 13 X Data 0x0000000100001068 0x0000000000000000 0x000f0000 NXArgc
[ 6] 14 X Data 0x0000000100001070 0x0000000000000000 0x000f0000 NXArgv
[ 7] 15 X Data 0x0000000100001080 0x0000000000000000 0x000f0000 __progname
[ 8] 16 X Absolute 0x0000000100000000 0x0000000000000000 0x00030010 _mh_execute_header
[ 9] 17 X Data 0x0000000100001078 0x0000000000000000 0x000f0000 environ
[ 10] 20 X Code 0x0000000100000d40 0x0000000000000000 0x000f0000 start
[ 11] 21 Trampoline 0x0000000100000e56 0x0000000000000006 0x00010100 exit
[ 12] 22 Trampoline 0x0000000100000e5c 0x0000000000000006 0x00010100 getchar
[ 13] 23 Trampoline 0x0000000100000e62 0x0000000000000006 0x00010100 getpid
[ 14] 24 Trampoline 0x0000000100000e68 0x0000000000000006 0x00010100 printf
[ 15] 25 Trampoline 0x0000000100000e6e 0x0000000000000006 0x00010100 puts
[ 16] 26 Trampoline 0x0000000100000e74 0x0000000000000006 0x00010100 sleep
[ 17] 27 X Extern 0x0000000000000000 0x0000000000000000 0x00010100 dyld_stub_binder

The symbols 11 - 16 above are the stub entries through which all calls to “exit”, “getchar”, etc. go.

I saw those, but the only address they give me is the destination of the trampoline, not the trampoline itself. I’m going to double-check tomorrow (it’s a huge code-base :) ), but I don’t think I can know the offset into the GOT from there.
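One possible route from a stub to its pointer slot, if the Trampoline symbol's address turns out to be the stub itself: on x86_64 a 6-byte stub is normally a single `jmp *disp32(%rip)` instruction (bytes `ff 25` followed by a 32-bit displacement), so the slot address can be computed from the stub's bytes. The sketch below assumes that encoding and that the six bytes have already been read from the process (for example with "memory read"); the function name and the sample bytes used to exercise it are illustrative, not taken from a real binary.

```python
import struct

STUB_SIZE = 6  # matches the 0x6 size reported for each Trampoline symbol


def lazy_pointer_address(stub_addr, stub_bytes):
    """Return the address of the pointer slot an x86_64 stub jumps through."""
    if len(stub_bytes) < STUB_SIZE or stub_bytes[:2] != b"\xff\x25":
        raise ValueError("not a `jmp *disp32(%rip)` stub")
    (disp,) = struct.unpack_from("<i", stub_bytes, 2)
    # RIP-relative operands are measured from the end of the instruction.
    return stub_addr + STUB_SIZE + disp
```

Writing a new function address into the returned slot would redirect calls made through that stub, without touching the original function's code.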

  • Substituting the address in the GOT wouldn’t work. I’ll have to turn the original function into a jump to the new one. Nothing is in place for that;

You will need to manually write memory for now, but it should be do-able. You could add some new functions to the ABI plug-ins:

You could add an ABI function to the main ABI.h:

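// In lldb/Target/ABI.h, inside the ABI class declaration:

  // Proposed hook. The base class reports failure; ABI plug-ins override it.
  virtual bool
  UpdateGOT (const char *func_name, ModuleList *modules, lldb::addr_t new_func_addr)
  {
    return false;
  }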

Then modify the x86_64 stuff to do the right thing

lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.h
lldb/source/Plugins/ABI/SysV-x86_64/ABISysV_x86_64.cpp

If you don’t end up overwriting the original function, the “modules” parameter could be nice, as you might be able to take over say “print” but only for “a.out” and not for other shared libraries. So if “modules” is NULL, apply the new function to all modules; otherwise, only try to apply it to the modules in the list. Just an idea…

That’s a nice idea. I suppose we would also need to “communicate” with the linker, so we could do the same when new modules get loaded.

  • I found one email from Jason Molenda where he explained how they implemented F&C on gdb (http://www.cygwin.com/ml/gdb/2003-06/msg00531.html), and am trying to do something similar. But it seems that the current dyld implementation doesn’t have a flag to not run global constructors (or re-register ObjC classes), and NSLinkModule was deprecated, so these cases would not work.

I wanted to continue this work, but I have some doubts…

There are plenty of issues with all ways of doing things, yes…

How could I get a handle (on my CommandObject) to the library loaded with dlopen? (It can have the same file name as an already loaded library, how can I tell which is which?)
If it is impossible, any ideas on how to add that feature?

Why do you need the handle?

The handle is the only thing I can get from the process->LoadModule() method. My main concern is: If I reload a dylib (from a file with the same name), how can I know which module it is, from the ModuleList? Is it the one with the highest index? Will the “old” Module simply be replaced, and I can just search for filename?
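One workaround that might help here, purely as an assumption about how a reload shows up: snapshot the module list before and after the load and compare. The sketch below works on plain (path, uuid, load address) tuples so that two images with the same file name can still be told apart; how those tuples are collected from the ModuleList is left open, and the function name is hypothetical.

```python
def newly_loaded(before, after):
    """Return entries in `after` that were not present in `before`.

    Each entry is a (path, uuid, load_address) tuple; comparing the whole
    tuple distinguishes two images that share a file name.
    """
    seen = set(before)
    return [entry for entry in after if entry not in seen]
```

If the old Module is replaced in place rather than kept alongside the new one, the same comparison would still report the new entry, since its UUID or load address should differ.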

After that, the easy way to replace the functions would be to get the symbols (at least for functions) that are defined in the recently loaded image and turn the current functions into jumps to the new functions.

That is a good way if you don’t want to call the original function. I have always wanted to “listen” to the malloc/free calls by making my own versions of malloc/free and do a little data gathering and yet still call through to the original functions.

Hope some of the above hints help.

Greg

That was one of the use-cases (instrumenting functions). :)

Thanks for the reply,

Filipe