"Bound Methods" in LLVM Bytecode

Hello,

I have been thinking about efficient implementation of dynamically typed languages in my spare time. Specifically, I'm working on a toy implementation of a tiny piece of Python using LLVM as a native code generating JIT. I've run into a bit of an issue, involving how Python deals with method calls. I'm not sure how/if I can implement this in LLVM. In Python, the following code:

somefunc = a.method
somefunc()

Roughly translates into:

functionObject = lookup( "method" in object a )
functionObject->functionPointer()

The challenge is that if "method" is actually a method, calling it magically adds "a" as the first parameter. If it is NOT a method, then no messing with the arguments occurs. As far as I can tell, this forces an implementation to create BoundMethod objects that wrap the actual method calls. The question is, how can I implement this efficiently, ideally using LLVM?
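
For reference, the behaviour described above can be observed directly in CPython. This is only an illustrative sketch: the class `A` and the function `plain` are made-up names, not part of the toy implementation under discussion.

```python
class A:
    def method(self):
        # Defined on the class, so looking it up through an instance binds it.
        return self


def plain():
    # Stored directly on the instance below, so lookup returns it unchanged.
    return "no self"


a = A()
a.func = plain

somefunc = a.method          # bound method object wrapping (A.method, a)
result_method = somefunc()   # "a" is passed implicitly as the first argument

otherfunc = a.func           # plain function, nothing is bound
result_plain = otherfunc()   # called with no arguments at all
```

The bound method object carries both halves of the call: `somefunc.__func__` is the underlying function and `somefunc.__self__` is the object it was looked up on.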

My idea is to add a NULL pointer as the first parameter to all function calls. "Normal" functions would ignore it, but methods would look at the first parameter to find the "this" pointer. I could then generate a tiny stub for each bound method that would do the following:

1. Replace the first argument with the appropriate "this"
2. Jump to the real function

Is it possible to do something like this in LLVM? Will it work if I just create a char array and copy in the appropriate native code for the current platform? I would rather let LLVM do the hard work, but if that isn't possible, I'm looking for some acceptable hack.
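
As a rough model of the stub idea, here is a sketch in Python where a closure stands in for the tiny native stub. It says nothing about how LLVM would generate or free such a stub; `make_stub`, `method_impl` and `plain_impl` are hypothetical names used only for illustration.

```python
def make_stub(this, real_function):
    """Build a stub that overwrites the first argument with `this`."""
    def stub(_ignored_first, *args):
        # Step 1: replace the first argument with the bound object.
        # Step 2: transfer control to the real function.
        return real_function(this, *args)
    return stub


def method_impl(this, x):
    # A "method": it uses the first parameter as its object.
    return (this, x)


def plain_impl(_unused, x):
    # A "normal" function: it ignores the extra first parameter.
    return x * 2


obj = object()
bound = make_stub(obj, method_impl)

# Every call site passes a dummy first argument (None stands in for NULL).
method_result = bound(None, 3)
plain_result = plain_impl(None, 3)
```

Each bound method needs its own stub in this scheme, which is where the allocation and cleanup concern comes from.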

An additional ugly bit is that these objects will be created and destroyed frequently, so integration with LLVM's memory system is important. The last I checked, LLVM does not keep track of code in memory, so this would effectively create a memory leak.

Thanks for any help,

Evan Jones

I have been thinking about efficient implementation of dynamically typed languages in my spare time. Specifically, I'm working on a toy implementation of a tiny piece of Python using LLVM as a native code generating JIT. I've

Cool!

run into a bit of an issue, involving how Python deals with method calls. I'm not sure how/if I can implement this in LLVM. In Python, the following code:

Ok.

somefunc = a.method
somefunc()

Roughly translates into:

functionObject = lookup( "method" in object a )
functionObject->functionPointer()

The challenge is that if "method" is actually a method, calling it magically adds "a" as the first parameter. If it is NOT a method, then no messing with the arguments occurs. As far as I can tell, this forces an implementation to create BoundMethod objects that wrap the actual method calls. The question is, how can I implement this efficiently, ideally using LLVM?

Okay. One simple option would be to insert code like this:

if (isamethod(functionObject))
   functionObject->functionPointer(a)
else
   functionObject->functionPointer()
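
Spelled out as runnable Python, the check might look like the sketch below. The `FunctionObject` layout and its `is_method` flag are assumptions made for the example; the thread does not say how `isamethod` would actually be decided.

```python
class FunctionObject:
    def __init__(self, function_pointer, is_method):
        self.function_pointer = function_pointer
        self.is_method = is_method  # hypothetical flag set when the function is defined


def isamethod(function_object):
    return function_object.is_method


def call(function_object, a):
    # The caller decides at run time whether to pass the object along.
    if isamethod(function_object):
        return function_object.function_pointer(a)
    return function_object.function_pointer()


method_obj = FunctionObject(lambda this: ("method", this), is_method=True)
plain_obj = FunctionObject(lambda: "plain", is_method=False)

method_result = call(method_obj, "a")
plain_result = call(plain_obj, "a")
```

The cost is one extra branch per call, and the call site must still have "a" available.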

My idea is to add a NULL pointer as the first parameter to all function calls. "Normal" functions would ignore it, but methods would look at the first parameter to find the "this" pointer. I could then generate a tiny stub for each bound method that would do the following:

1. Replace the first argument with the appropriate "this"
2. Jump to the real function

Is it possible to do something like this in LLVM?

Sure, you can do this. Another simple option would be to just make every "function" take a first pointer argument which they ignore. This would allow the caller to always pass a this pointer without knowing anything about the callee.

Will it work if I just create a char array and copy in the appropriate native code for the current platform?

Hrm, sometimes, sometimes not. Code is not always relocatable like that; it sounds dangerous.

I would rather let LLVM do the hard work, but if that isn't possible, I'm looking for some acceptable hack.

LLVM can do it; it's just a matter of picking the right solution. To me, adding a dummy 'this' argument to functions, which they ignore, seems like the simplest and most logical way to do it.

An additional ugly bit is that these objects will be created and destroyed frequently, so integration with LLVM's memory system is important. The last I checked, LLVM does not keep track of code in memory, so this would effectively create a memory leak.

If possible, I would suggest avoiding creating and destroying lots of little stubs. Even if we teach llvm to recycle this memory (wouldn't be that hard), it will still be much less efficient than having a dummy argument for functions.

Besides, if the address of these functions is never taken, the standard LLVM optimizations will remove the dead arguments.

-Chris

The question is, how can I implement this efficiently, ideally using LLVM?

Okay. One simple option would be to insert code like this:

if (isamethod(functionObject))
  functionObject->functionPointer(a)
else
  functionObject->functionPointer()

Ah yes, the good old-fashioned simple approach. The only change is that by the time I get to the function call, I may no longer have a reference to the object (in the compiler), so I would have to stuff that into the bound method object itself.

Sure, you can do this. Another simple option would be to just make every "function" take a first pointer argument which they ignore. This would allow the caller to always pass a this pointer without knowing anything about the callee.

Ah, of course! This is probably the best way to do it, since it is so simple. The "FunctionObject" type would contain not only a function pointer, but also a "this" pointer. For normal functions, "this" would be NULL. Why didn't I think of that, since I was halfway to that solution already? That would change the call implementation to the following:

functionObject->functionPointer( functionObject->thisPointer )
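
A minimal sketch of that layout in Python, assuming every callee accepts a leading "this" parameter. The names `FunctionObject`, `call`, `greet` and `describe` are illustrative only, and `None` stands in for NULL.

```python
class FunctionObject:
    def __init__(self, function_pointer, this_pointer=None):
        self.function_pointer = function_pointer
        self.this_pointer = this_pointer  # None for normal functions

    def call(self, *args):
        # Same call sequence for methods and functions: no branch needed.
        return self.function_pointer(self.this_pointer, *args)


def greet(_this, name):
    # Normal function: the dummy first parameter is ignored.
    return "hello " + name


def describe(this, suffix):
    # Method: the first parameter is the bound object.
    return this["label"] + suffix


a = {"label": "object-a"}

plain = FunctionObject(greet)         # this_pointer stays None
bound = FunctionObject(describe, a)   # binding just stores "a"

plain_result = plain.call("world")
bound_result = bound.call("!")
```

Creating a bound method is then just allocating one small object that holds two pointers, with no code generated at bind time.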

Will it work if I just create a char array and copy in the appropriate native code for the current platform?

Hrm, sometimes, sometimes not. Code is not always relocatable like that; it sounds dangerous.

Ah, also a good point. A copying garbage collector, for example, would definitely make things more complicated.

Thanks for your help! I was definitely thinking the wrong way.

Evan Jones

[snip]

Will it work if I just
create a char array and copy in the appropriate native code for the
current platform? I would rather let LLVM do the hard work, but if that
isn't possible, I'm looking for some acceptable hack.

(1) The memory page/segment must be marked executable by the
OS. On POSIX systems, this is typically done by mmap()ing
anonymous memory and then mprotect()ing it. As I remember,
POSIX doesn't guarantee that mprotect will work on memory directly
allocated with malloc or calloc. I believe some systems allow it, but
it's my understanding that this practice is non-portable. The Win32 API
has a function similar in purpose to mprotect (VirtualProtect), but
I'm not a Win32 guy.

Note: before OSes started setting the x86 NX/XD bit, x86 code
was able to get away with the assumption that all readable pages are
executable. That doesn't make such code correct.

(2) As already mentioned by others, you need relocatable code for this
to work properly.

-Karl