Debugging lli using bugpoint

Hi,

I have a program that runs when statically compiled using llc and gcc but crashes with a segmentation fault when run with lli. I am trying to debug it with bugpoint, and the initial part of bugpoint’s output seems to suggest that I am somehow not linking against the library that provides dlsym/dlopen, although I am passing it to lli:

$ bugpoint -run-jit $HOME/gap/gap4r4/bin/i686-pc-linux-gnu-llvm-gcc/gap.bc --tool-args -load=/usr/lib/libm.so -load=/usr/lib/libdl.so --args -l $HOME/gap/gap4r4/ -m 1024M

Read input file : ‘/home/pprabhu/gap/gap4r4/bin/i686-pc-linux-gnu-llvm-gcc/gap.bc’
*** All input ok
Initializing execution environment: Found lli: /home/pprabhu/llvm/llvm-install/bin/lli
Running the code generator to test for a crash:
Generating reference output from raw program:
Error running tool:
/usr/bin/gcc -x c -fno-strict-aliasing bugpoint-test-program.bc.cbe.c -x none -o bugpoint-test-program.bc.cbe.c.gcc.exe -lm -O2 -Wl,-R.
bugpoint-test-program.bc.cbe.c:5149: warning: conflicting types for built-in function ‘malloc’
bugpoint-test-program.bc.cbe.c: In function ‘ExecProccall0args’:
bugpoint-test-program.bc.cbe.c:401008: warning: passing argument 1 of ‘longjmp’ from incompatible pointer type

/tmp/cc08IpX8.o: In function `SyLoadModule':
bugpoint-test-program.bc.cbe.c:(.text+0x25705): undefined reference to `dlopen'
bugpoint-test-program.bc.cbe.c:(.text+0x25719): undefined reference to `dlsym'
/tmp/cc08IpX8.o: In function `SyFindOrLinkGapRootFile':
bugpoint-test-program.bc.cbe.c:(.text+0x6b951): undefined reference to `dlopen'
bugpoint-test-program.bc.cbe.c:(.text+0x6b965): undefined reference to `dlsym'
/tmp/cc08IpX8.o: In function `FuncLOAD_DYN':
bugpoint-test-program.bc.cbe.c:(.text+0x12a92d): undefined reference to `dlopen'
bugpoint-test-program.bc.cbe.c:(.text+0x12a945): undefined reference to `dlsym'
/tmp/cc08IpX8.o: In function `LoadWorkspace':
bugpoint-test-program.bc.cbe.c:(.text+0x14487a): undefined reference to `dlopen'
bugpoint-test-program.bc.cbe.c:(.text+0x144892): undefined reference to `dlsym'
collect2: ld returned 1 exit status

*** Debugging code generator crash!

Checking to see if we can delete global inits:

  • Removing all global inits hides problem!

*** Attempting to reduce the number of global variables in the testcase
Checking for crash with only these global variables: Revision_ariths_c ZeroFuncs ZeroMutFuncs AInvFuncs AInvMutFuncs OneFuncs OneMutFuncs InvFuncs InvMutFuncs EqFuncs… <5919 total>:

*** Attempting to reduce the number of functions in the testcase
Checking for crash with only these functions: InstallZeroObject VerboseZeroObject ZeroObject FuncZERO InstallZeroMutObject VerboseZeroMutObject ZeroMutObject FuncZERO_MUT InstallAinvObject VerboseAInvObject… <2883 total>:
Checking for crash with only these blocks: entry entry bb bb2 entry bb bb2 entry bb1 bb2… <77209 total>:

Is there something that I am missing here (such as additional parameters to lli or to bugpoint)? The program seems to run fine when compiled with llc.

Thanks for your time.

- Prakash

Generating reference output from raw program: <cbe><gcc>
Error running tool:

[snip]

/tmp/cc08IpX8.o: In function `SyLoadModule':
bugpoint-test-program.bc.cbe.c:(.text+0x25705): undefined reference to
`dlopen'

[snip]

This is saying that compilation with CBE is failing. Try something
like -Xlinker -ldl?

-Eli

Hi Eli,

Thanks for the reply. I tried with -Xlinker="-ldl ". However it does not seem to make a difference. It seems that when bugpoint is run with --run-jit, the linker args are not passed to gcc (from tools/bugpoint/ExecutionDriver.cpp) :

if (InterpreterSel == RunLLC || InterpreterSel == RunCBE ||
    InterpreterSel == CBE_bug || InterpreterSel == LLC_Safe)
  RetVal = AI->ExecuteProgram(BitcodeFile, InputArgv, InputFile,
                              OutputFile, AdditionalLinkerArgs, SharedObjs,
                              Timeout, MemoryLimit);
else
  RetVal = AI->ExecuteProgram(BitcodeFile, InputArgv, InputFile,
                              OutputFile, std::vector<std::string>(),
                              SharedObjs, Timeout, MemoryLimit);

I tried the following after this:

(1) Firstly instead of running Gap (http://www.gap-system.org/Download/UNIXInst.html), I am now trying to run python with lli (http://www.python.org/download/releases/2.5.2/). I managed to compile python.bc and here again I face the same problem:

llc and gcc can get python.exe to run (which is great :)!) :

$ llc -f python.bc
$ gcc -o python.exe python.s -ldl -lutil -lm -lrt
$ ./python.exe
Python 2.5.2 (r252:60911, Oct 31 2008, 14:41:11)
[GCC 4.2.1 (Based on Apple Inc. build 5623) (LLVM build)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

however, when I try to run python.bc using lli it crashes with a segmentation fault:

$ lli -load=/usr/lib/libdl.so -load=/usr/lib/libutil.so -load=/usr/lib/libm.so -load=/usr/lib/librt.so python.bc

When I try it with gdb, it seems that the crash is somewhere inside the python code (since bt only shows ??). Before the crash, I could see the memory consumption (VM) reach somewhere near 80% of my 2GB RAM (seen via top, after a sudden increase from around 2-3% of VM). I tried to run this on a 64-bit machine which has 8GB RAM and still see the same memory issue.

(2) Finally, I wrote a pass (and loaded it through opt) to instrument the entry and exit of each function in the python code, and then ran the instrumented program with both the [llc ; gcc] combination and lli. In the lli version a single method (subtype_traverse) is called recursively (about 2 million times) until the program runs out of memory, while the statically compiled code (llc + gcc) calls this method only once (I am comparing the calls in the same context in both cases):

python with llc + gcc :

(tupletraverse (visit_decref (type_is_gc)))
(subtype_traverse (visit_decref (type_is_gc))
(type_traverse (visit_decref) (visit_decref) …

python with lli:

(tupletraverse (visit_decref (type_is_gc)))
(subtype_traverse (visit_decref (type_is_gc)) (subtype_traverse (visit_decref (type_is_gc))(subtype_traverse (visit_decref (type_is_gc)) … about 2 million times
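The traces above use one opening parenthesis per function entry and one closing parenthesis per exit. As a rough illustration only, the hooks that such an instrumentation pass inserts might look like the sketch below. The actual pass is not shown in this thread, so `trace_enter`, `trace_exit`, `trace_dump`, and the fixed-size buffer are all hypothetical names, and the three functions are empty stand-ins for the real Python functions.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical hooks: an instrumentation pass would insert a call to
 * trace_enter() at each function entry and trace_exit() before each return. */
static char trace_buf[256];   /* collected trace text */
static int  trace_depth = 0;  /* current call nesting depth */

static void trace_append(const char *s) {
    size_t used = strlen(trace_buf);
    /* Drop text that would overflow the fixed-size buffer. */
    if (used + strlen(s) < sizeof trace_buf)
        strcat(trace_buf, s);
}

static void trace_enter(const char *name) {
    /* A nested call is separated from the enclosing text by a space. */
    trace_append(trace_depth++ > 0 ? " (" : "(");
    trace_append(name);
}

static void trace_exit(void) {
    trace_append(")");
    trace_depth--;
}

/* Write the collected trace to stderr. */
static void trace_dump(void) {
    fprintf(stderr, "%s\n", trace_buf);
}

/* Instrumented stand-ins for the first call chain in the llc + gcc trace. */
static void type_is_gc(void)    { trace_enter("type_is_gc");    trace_exit(); }
static void visit_decref(void)  { trace_enter("visit_decref");  type_is_gc();   trace_exit(); }
static void tupletraverse(void) { trace_enter("tupletraverse"); visit_decref(); trace_exit(); }
```

Calling `tupletraverse()` and then `trace_dump()` yields the first line of the llc + gcc trace. Comparing such traces from the two execution modes is what exposes the point where the lli run diverges.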

Looking at the code (Objects/typeobject.c: http://google.com/codesearch?hl=en&q=show:VK_wUSuAZto:jHKC99mjNVM:4z02hQcYQRY&sa=N&ct=rd&cs_p=http://gentoo.osuosl.org/distfiles/Python-2.5.tar.bz2&cs_f=Python-2.5/Objects/typeobject.c)

it seems the last call (through a function pointer) in subtype_traverse results in this never-ending recursive call.

Has anyone tried compiling python to bitcode and running it with the LLVM JIT before?

Thanks for your time.

- Prakash

Hi Prakash,

Unfortunately it looks like you need to do quite a bit of investigation into this. However, I hope I can provide some useful tips.

  1. In general, lli and llc generate exactly the same code, except that lli defaults to static codegen while llc defaults to dynamic-no-pic codegen. So try passing -relocation-model=dynamic-no-pic to lli. If this works, that means there are issues with static codegen.
  2. It could be a JIT encoding bug. If you can identify a problematic function, it’s possible to examine the generated code in gdb and compare it with the llc-generated assembly.
  3. It could be a bug in the app that is exposed when running under the JIT. You can try enabling additional debugging output.

Hope this helps.

Evan

Hi Evan,

Thanks for the pointers. We found a simple test case that causes the problem (thanks to Tom in my group):

#include <stdio.h>
#include <stdlib.h>

void test();
void (*funcPtr)();

int main(int argc, char **argv) {
  funcPtr = test;
  test();
}

void test() {
  if (funcPtr == test) {
    printf("OK!\n");
  } else {
    fprintf(stderr, "Bad!\n");
    exit(1);
  }
}

$ llvm-gcc -emit-llvm -o FPtrEqTest.bc -c FPtrEqTest.c
$ llc -f FPtrEqTest.bc
$ gcc -o FPtrEqTest FPtrEqTest.s
$ ./FPtrEqTest
OK!

$ lli FPtrEqTest.bc
Bad!

The above test case is just a smaller version of the one in Python’s subtype_traverse, which also tests a function pointer and calls itself. It seems the problem arises because the comparison is made against the stub’s address when a comparison with the actual address of the compiled function is intended.

thanks,
Prakash

Sorry about the tardiness. I’ll take a look.

Thanks,

Evan

I’ve filed PR3043 for this.

Evan

Thanks, Evan. We’ve finally got a working version of python.bc which runs most non-multithreaded python scripts (those that do not use the ‘threadmodule’) with lli. It needed a few more workarounds (apart from the one to make sure that functions whose addresses are taken are compiled before their addresses are taken):

(a) When any shared library (.so) is loaded using dlopen() by the program currently being JIT’ed by lli, the program crashes if the initialization code in the library calls back into a function in the main process. The statically compiled code (llc + gcc) works, however, when using the -disable-internalize and -rdynamic flags (to llc and gcc respectively). The -rdynamic flag makes sure that the function being called is exported to the dynamic symbol table of the main process (in the case of python, a single function Py_InitModule4() is always called by any module loaded as a .so in response to ‘import’ statements in python). I do not know a way to achieve this using lli. I changed the python code to statically link all the modules that are required for some of the scripts that we are running.

(b) Since the LLVM JIT does not support inline asm, I had to redefine some _byteswap() macros in C rather than inline asm.
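For workaround (b), a plain C replacement for an inline-asm byte swap might look like the sketch below. The thread does not show the actual _byteswap() macros that were redefined, so the names `swap16_c` and `swap32_c` and the chosen widths are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the inline-asm byte-swap macros; the real
 * macro names and operand widths are not shown in the thread. */

/* Swap the two bytes of a 16-bit value using only shifts. */
static inline uint16_t swap16_c(uint16_t x) {
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Reverse the four bytes of a 32-bit value using shifts and masks. */
static inline uint32_t swap32_c(uint32_t x) {
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
```

For example, `swap32_c(0x11223344u)` evaluates to `0x44332211u`. Because this is ordinary C, it compiles to regular LLVM IR and gives the JIT nothing special to handle.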

With these changes, python.bc runs smoothly with lli for most of the python scripts that I have tested with :).

regards,
Prakash

Ok, the problem has to do with lazy compilation.

In main, when the address of “test” is taken, “test” hasn’t been compiled. So the store ended up storing the address of its stub instead. If you run the test with -disable-lazy-compilation, it will work.
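The effect can be modelled in ordinary C, without the JIT, by giving the stub and the compiled body two distinct addresses. This is only a sketch of the behaviour described above, not how the JIT is implemented; `test_stub`, `test_body`, `run_lazy`, and `run_eager` are hypothetical names.

```c
#include <stdbool.h>

typedef void (*fn_t)(void);

static fn_t func_ptr;          /* plays the role of funcPtr in the test case */
static bool last_compare_ok;   /* result of the comparison made inside the body */

static void test_body(void);

/* Stands in for the lazy-compilation stub: a distinct address that
 * simply forwards to the real function body. */
static void test_stub(void) { test_body(); }

/* Stands in for the compiled body of test(). By the time it runs, the
 * function has been compiled, so "test" means the real address here. */
static void test_body(void) { last_compare_ok = (func_ptr == test_body); }

/* Lazy mode: the address is taken before test() has been compiled,
 * so the stub's address is what gets stored. */
static bool run_lazy(void) {
    func_ptr = test_stub;
    test_stub();
    return last_compare_ok;
}

/* Eager mode (the analogue of -disable-lazy-compilation): the real
 * address is stored, so the comparison succeeds. */
static bool run_eager(void) {
    func_ptr = test_body;
    test_body();
    return last_compare_ok;
}
```

In this model `run_lazy()` returns false and `run_eager()` returns true, which corresponds to the "Bad!" and "OK!" outcomes of the test case.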

I think the current solution would be something like this:

  1. When code emitter sees a function address, emit it as a relocation and remember the function.
  2. After the function has been emitted, compile and emit all functions whose addresses are taken.
  3. Relocate all references to function addresses.

Step 2 is the only real enhancement required. It’s not terribly difficult. Unfortunately I don’t have time to deal with this right now. Would someone care to take this up?
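The three steps can be sketched as a small model in plain C. This is not LLVM code: `struct jit_func`, `struct reloc`, `note_address_taken`, and `finish_function` are hypothetical names, and "compiling" a function is reduced to filling in a pointer.

```c
#include <stddef.h>

typedef void (*fn_t)(void);

#define MAX_RELOCS 8

/* Hypothetical model of a function known to the JIT. */
struct jit_func {
    fn_t body;      /* real code, standing in for the result of compilation */
    fn_t compiled;  /* NULL until the function has been "compiled" */
};

/* A pending reference to a function address (step 1). */
struct reloc {
    fn_t *slot;               /* where the address must eventually be written */
    struct jit_func *target;  /* the function whose address was taken */
};

static struct reloc relocs[MAX_RELOCS];
static size_t num_relocs;

/* Step 1: instead of writing a stub address, remember the reference.
 * Returns 0 on success, -1 if the table is full. */
static int note_address_taken(fn_t *slot, struct jit_func *target) {
    if (num_relocs == MAX_RELOCS)
        return -1;
    relocs[num_relocs].slot = slot;
    relocs[num_relocs].target = target;
    num_relocs++;
    return 0;
}

/* Called once the current function has been emitted. */
static void finish_function(void) {
    /* Step 2: compile every function whose address was taken. */
    for (size_t i = 0; i < num_relocs; i++)
        if (relocs[i].target->compiled == NULL)
            relocs[i].target->compiled = relocs[i].target->body;
    /* Step 3: patch each recorded slot with the real address. */
    for (size_t i = 0; i < num_relocs; i++)
        *relocs[i].slot = relocs[i].target->compiled;
    num_relocs = 0;
}

/* Demo data: demo_slot plays the role of funcPtr. */
static void demo_body(void) {}
static struct jit_func demo_func = { demo_body, NULL };
static fn_t demo_slot;
```

After `note_address_taken(&demo_slot, &demo_func)` followed by `finish_function()`, `demo_slot` holds the address of the compiled body rather than a stub, so a later pointer comparison against the function would succeed.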

Thanks,

Evan