Problem loading bytecode from a string for JITing

Hello,

I am having a weird problem while writing a bytecode module to a string
and then reading and parsing it back for use in a JIT.

I wrote a pass that exports a function to a module and puts this module
inside a global variable.
I use WriteBitcodeToFile for this.
For debugging, after this write, I try to load the exported module with
parseBitcodeFile.
These two steps work.

Later, while the compiled program is running, I try to read and parse
this global variable to JIT the function.

1) I read the global variable with
  StringRef sr (gv, gv_length);

2) I manually test this bytecode by
(inspired by inline bool isRawBitcode(const unsigned char *BufPtr,
const unsigned char *BufEnd) at
http://llvm.org/docs/doxygen/html/ReaderWriter_8h_source.html#l00067)
  if (sr.str()[0] == 'B')
    std::cout << "B ok\n";
  if (sr.str()[1] == 'C')
    std::cout << "C ok\n";
  if (sr.str()[2] == (char) 0xc0)
    std::cout << "0xc0 ok\n";
  if (sr.str()[3] == (char) 0xde)
    std::cout << "0xde ok\n";

3) I try to parse the gv by
  MemoryBuffer* mbjit = MemoryBuffer::getMemBuffer (sr.str());
  LLVMContext& context = getGlobalContext();
  ErrorOr<Module*> ModuleOrErr = parseBitcodeFile (mbjit, context);
  if (error_code EC = ModuleOrErr.getError())
  {
    std::cout << ModuleOrErr.getError().message() << "\n";
    assert(false);
  }

This is the execution result:
B ok
C ok
0xc0 ok
0xde ok
Invalid bitcode signature

So it is not working :confused:
But why?

For debugging, between 2) and 3), I exported the module I had read, wrote
it to a file on my hard drive,
and tried llvm-dis on it; the disassembly of the module works.

What's wrong? Any idea how to solve this problem?

Thank you very much.

Regards,
Willy

3) I try to parse the gv by
  MemoryBuffer* mbjit = MemoryBuffer::getMemBuffer (sr.str());

Not sure if this is your issue, but should be fixed anyway:

The std::string created by "sr.str()" ends its lifetime in this
statement, and MemoryBuffer for efficiency reasons
avoids copying data it doesn't have to (like StringRef) so will be
referencing the freed memory.

To resolve this:
* Pass MemoryBuffer your StringRef directly
* Use getMemBufferCopy()
* Preserve the result of sr.str() in a local variable and pass that
to getMemBuffer() instead.

As a final note, check if your bitcode buffer "string" is
null-terminated or not. If not, be sure to be careful and
do things like informing MemoryBuffer that this is the case.
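
The lifetime point above can be illustrated without LLVM. This is only a minimal sketch: ByteView, viewOf, copyOf and loadBytes are hypothetical names, with ByteView standing in for a non-owning type such as StringRef and copyOf playing the role that getMemBufferCopy() plays in the advice above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning view (stand-in for StringRef or a non-copying
// MemoryBuffer): it stores a pointer and a length, never the bytes.
struct ByteView {
  const char *data;
  std::size_t size;
};

// Non-owning: the result is valid only while `s` is alive and unmodified.
ByteView viewOf(const std::string &s) {
  ByteView v = {s.data(), s.size()};
  return v;
}

// Owning: copies the bytes, so the result outlives the source string.
std::vector<char> copyOf(const ByteView &v) {
  return std::vector<char>(v.data, v.data + v.size);
}

// Builds bitcode-like bytes in a local string and returns an owning copy.
// Returning the ByteView itself would dangle once `tmp` is destroyed.
std::vector<char> loadBytes() {
  std::string tmp("BC\xc0\xde", 4);
  return copyOf(viewOf(tmp));
}
```

A view taken from the temporary returned by sr.str() is in the same position as a ByteView of `tmp` after loadBytes() returns: the pointer is still there, but the bytes it refers to are gone.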

Hope this helps,
~Will

I made the change, and still have the problem.

I investigated the LLVM source code further.

First, I changed the isRawBitcode function to print the content of its parameters, like this:
original: http://llvm.org/docs/doxygen/html/ReaderWriter_8h_source.html#l00081
   inline bool isRawBitcode(const unsigned char *BufPtr,
                            const unsigned char *BufEnd) {
     // These bytes sort of have a hidden message, but it's not in
     // little-endian this time, and it's a little redundant.
     errs() << "isRawBitcode output:\n";
     for (int i = 0; i < 4; i++)
       errs() << BufPtr[i] << "\n";
     if (BufPtr != BufEnd)
       errs() << "BP != BE ok\n";
     if (BufPtr[0] == 'B')
       errs() << "B ok\n";
     if (BufPtr[1] == 'C')
       errs() << "C ok\n";
     if (BufPtr[2] == 0xc0)
       errs() << "0xc0 ok\n";
     if (BufPtr[3] == 0xde)
       errs() << "0xde ok\n";

     return BufPtr != BufEnd &&
            BufPtr[0] == 'B' &&
            BufPtr[1] == 'C' &&
            BufPtr[2] == 0xc0 &&
            BufPtr[3] == 0xde;
   }

Second, I changed ParseBitcodeInto like this:
original: http://llvm.org/docs/doxygen/html/BitcodeReader_8cpp_source.html#l01971
...
  errs() << "parsebitcodeinto sniff the signature\n";
  uint32_t bvar = Stream.Read(8);
  errs() << "B :" << bvar << "\n";
  if (bvar != 'B') {
    errs() << "B :" << bvar << "\n";
    return Error(InvalidBitcodeSignature);
  }

  if (Stream.Read(8) != 'C') {
    errs() << "C\n";
    return Error(InvalidBitcodeSignature);
  }
  if ( Stream.Read(8) != 0xc0 ) {
    errs() << "0xc0\n";
    return Error(InvalidBitcodeSignature);
  }
  if ( Stream.Read(8) != 0xde ) {
    errs() << "0xde\n";
    return Error(InvalidBitcodeSignature);
  }
  // if (Stream.Read(8) != 'B' ||
  // Stream.Read(8) != 'C' ||
  // Stream.Read(4) != 0x0 ||
  // Stream.Read(4) != 0xC ||
  // Stream.Read(4) != 0xE ||
  // Stream.Read(4) != 0xD
  // ) {
...

The output of the code is:

isRawBitcode output:
B
C

BP != BE ok
B ok
C ok
0xc0 ok
0xde ok

parsebitcodeinto sniff the signature
B :37

Is it possible that the Stream object is not correctly initialized?

Hi Willy,

If the disassembly of the module works fine, then there is nothing wrong with the module.

Stream uses the MemoryBuffer that you pass to parseBitcodeFile. If what Will is saying is true, there is something wrong with your code in "3)", i.e.:

  MemoryBuffer* mbjit = MemoryBuffer::getMemBuffer (sr.str());
  LLVMContext& context = getGlobalContext();
  ErrorOr<Module*> ModuleOrErr = parseBitcodeFile (mbjit, context);
  if (error_code EC = ModuleOrErr.getError())
  {
    std::cout << ModuleOrErr.getError().message() << "\n";
    assert(false);
  }

Can you post how you modified it in your second reply? For debugging purposes, you can simply use MemoryBuffer::getMemBufferCopy() and not worry about the validity of the StringRef or null-termination. Also, you can run your program through valgrind and check for any invalid reads.

HTH
Vikas.


A segmentation fault indicates memory corruption, and it's hard to tell more without seeing the exact use of the APIs. If possible, please post a complete program and a gdb stack trace from the core file. If there are multiple threads using the global variables, please let us know.

FWIW, I have some tests to write llvm::Module to bitcode files and read them back into llvm::Module and they work just fine with 3.4 (never tried with tip).

thx

vikas.

This segfault occurs only under valgrind.
When run from the shell or under gdb, I get:

Invalid bitcode signature
simple_scev_dynamic_array: /home/willy/apollo/llvm/include/llvm/Support/ErrorOr.h:258: storage_type *llvm::ErrorOr<llvm::Module *>::getStorage() [T = llvm::Module *]: Assertion `!HasError && "Cannot get value when an error exists!"' failed.
Command terminated by signal 6

this is the code I use:

long jitter(void* info, skeleton_pair *skeletons, long skeleton_size, param_t params, long phi_state_size) {
  InitializeNativeTarget();
  InitializeNativeTargetAsmPrinter();

  llvm::StringRef sr (skeletons[idx].jit_bytecode,
                      skeletons[idx].jit_bytecode_length);

  if (sr.str()[0] == 'B')
    std::cout << "B ok\n";
  if (sr.str()[1] == 'C')
    std::cout << "C ok\n";
  if (sr.str()[2] == (char) 0xc0)
    std::cout << "0xc0 ok\n";
  if (sr.str()[3] == (char) 0xde)
    std::cout << "0xde ok\n";

  llvm::MemoryBuffer* mbjit = llvm::MemoryBuffer::getMemBufferCopy (sr);

  llvm::ErrorOr<llvm::Module*> ModuleOrErr = llvm::parseBitcodeFile (mbjit, context);
  if (llvm::error_code EC = ModuleOrErr.getError()) {
    std::cout << ModuleOrErr.getError().message() << "\n";
  }

  Module* Mjit = ModuleOrErr.get();

  std::string eeError;
  ExecutionEngine* nee = EngineBuilder(Mjit).setEngineKind(EngineKind::JIT).setUseMCJIT(true).setErrorStr(&eeError).create();
  if (!nee) {
    fprintf(stderr, "Could not create ExecutionEngine: %s\n", eeError.c_str());
    assert(false);
  }

  Function* f = ret_fct(Mjit); // Function* ret_fct (Module*); return the function we want to jit.
  uint64_t f_ptr = nee->getFunctionAddress(f->getName());

  long (*fjited)(param_t, phi_state_t, long, long, long, long)
    = (long (*)(param_t, phi_state_t, long, long, long, long)) (intptr_t)f_ptr;

  return fjited (params, phi_state, lower, upper, inst_outer, inst_inner);
}

Thanks,

The stack trace is:
(gdb) bt
#0 0x00000000004fa8c8 in llvm::BitstreamCursor::Read(unsigned int) ()
#1 0x00000000004fa1d2 in llvm::BitcodeReader::ParseBitcodeInto(llvm::Module*) ()
#2 0x0000000000503ae9 in llvm::getLazyBitcodeModule(llvm::MemoryBuffer*, llvm::LLVMContext&) ()
#3 0x0000000000503eb6 in llvm::parseBitcodeFile(llvm::MemoryBuffer*, llvm::LLVMContext&) ()
#4 0x00000000004ec195 in jitter (skeletons=<optimized out>, params=0x7fffffffdf40, phi_state=0x11adbc0, lower=0, upper=250, inst_outer=8, inst_inner=<optimized out>)
     at /home/willy/hello_stuff/with_apollo/simple_loop/runtime.cpp:263
#5 0x00000000004ec8fa in apollo_runtime_hook (info=<optimized out>, skeletons=0xc8b1f0, skeleton_size=<optimized out>, params=0x7fffffffdf40, phi_state_size=<optimized out>)
     at /home/willy/hello_stuff/with_apollo/simple_loop/runtime.cpp:438
#6 0x00000000004ee753 in ?? ()
#7 0x00000000004ecbf1 in main (argc=<optimized out>, argv=<optimized out>) at /home/willy/hello_stuff/with_apollo/simple_loop/simple_scev_dynamic_array.c:84

Hello Willy,

Here is the dump from one of my bitcode files:

0000000 42 43 c0 de 21 0c 00 00 25 05 00 00 0b 82 20 00

As expected, 0x42 (= B), 0x43 (= C), 0xc0 and 0xde are in the correct order. In your case, the first byte is read as 37 (= 0x25). I wonder why? When you check the bytes yourself, you get the expected results. When the same bytes are read from the Stream object, you get a different result (maybe garbage). I would suggest that you put a watchpoint on mbjit->getBufferStart() and single-step your program to make sure it is not freed or overwritten somewhere.
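
To compare an in-memory buffer against a dump like the one above, a small helper can print its leading bytes in the same hex form. This is a rough sketch in plain C++; hexPrefix and hasBitcodeMagic are hypothetical names, not LLVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats up to `count` leading bytes as space-separated hex. Bytes are read
// as unsigned char so that 0xc0 and 0xde are not sign-extended.
std::string hexPrefix(const char *data, std::size_t size, std::size_t count) {
  std::string out;
  for (std::size_t i = 0; i < size && i < count; ++i) {
    char cell[4];
    std::snprintf(cell, sizeof cell, "%02x",
                  static_cast<unsigned char>(data[i]));
    if (!out.empty())
      out += ' ';
    out += cell;
  }
  return out;
}

// True when the buffer starts with the raw bitcode magic 'B' 'C' 0xc0 0xde.
bool hasBitcodeMagic(const char *data, std::size_t size) {
  return size >= 4 && hexPrefix(data, size, 4) == "42 43 c0 de";
}
```

Calling hexPrefix on the memory returned by mbjit->getBufferStart() just before parsing would show whether the reader is really looking at "42 43 c0 de" or at something starting with 25.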

thx
Vikas.

Maybe I found the problem.
For debugging, I modified isEndPos of BitstreamReader as follows:
original: http://llvm.org/docs/doxygen/html/BitstreamReader_8h_source.html#l00234

  bool isEndPos(size_t pos) {
    if (BitStream == NULL) errs() << "BitStream is null\n";
    else errs() << "BitStream is not null\n";
    errs() << "isEndPos prob " << pos << " & " << BitStream << "\n";
    StreamableMemoryObject& smo = BitStream->getBitcodeBytes();
    errs() << "isEndPos prob smo\n";
    bool m = smo.isObjectEnd(static_cast<uint64_t>(pos));
    errs() << "isEndPos prob " << m << "\n";
    return m;
  }

In my output I never see "isEndPos prob smo\n".
The segfault occurs when I call getBitcodeBytes().
Also, I don't see the "BitStream is null" or "BitStream is not null" message...
Is it possible that this variable is not correctly initialized?

Thanks

The segfault is not happening because BitStream is NULL; it is happening because the BitcodeBytes member variable in the BitstreamReader class is NULL. As you can see, getBitcodeBytes dereferences BitcodeBytes:

  StreamableMemoryObject &getBitcodeBytes() { return *BitcodeBytes; }

I finally found my problem.
When writing the string containing the module to the global variable, I need a size of string size + 1 for the '\0' at the end of the string.
But when reading it back for parsing, I forgot to remove this last character.
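
A rough sketch of the fix on the reading side, in plain C++. bitcodeLength is a hypothetical helper, and it rests on an assumption: the bitcode payload itself has a size that is a multiple of 4 bytes, so a stored length of the form 4k+1 whose last byte is '\0' means the terminator was counted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the number of bitcode bytes in a global that
// was emitted as "bitcode + one trailing '\0'".
// Assumption: the payload size is a multiple of 4, so only a stored length
// of 4k+1 ending in '\0' is treated as including the terminator.
std::size_t bitcodeLength(const char *data, std::size_t storedLength) {
  if (storedLength % 4 == 1 && data[storedLength - 1] == '\0')
    return storedLength - 1;
  return storedLength;
}
```

The StringRef would then be built with bitcodeLength(gv, gv_length) instead of gv_length. Under the same assumption, a stored length such as 4985 becomes 4984, which is a multiple of 4.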

The problem is that the error returned by this test http://llvm.org/docs/doxygen/html/BitcodeReader_8cpp_source.html#l03216 is not handled.
I compiled LLVM with the --enable-expensive-checks --enable-debug-runtime --enable-debug-symbols --enable-keep-symbols options.
I still get a segfault (in a gdb session):
...
bytecode: BC��!

bytecode_length: 4985

Program received signal SIGSEGV, Segmentation fault.
llvm::BitstreamCursor::Read (this=0x1a733a0, NumBits=8) at /home/willy/apollo_checks/llvm/include/llvm/ADT/OwningPtr.h:67
67 assert(Ptr && "Cannot dereference null pointer");
(gdb) bt
#0 llvm::BitstreamCursor::Read (this=0x1a733a0, NumBits=8) at /home/willy/apollo_checks/llvm/include/llvm/ADT/OwningPtr.h:67
#1 0x0000000000593703 in llvm::BitcodeReader::ParseBitcodeInto (this=0x1a73370, M=0x1a73260) at /home/willy/apollo_checks/llvm/lib/Bitcode/Reader/BitcodeReader.cpp:1976
#2 0x000000000059e620 in llvm::getLazyBitcodeModule (Buffer=0x1a73230, Context=...) at /home/willy/apollo_checks/llvm/lib/Bitcode/Reader/BitcodeReader.cpp:3315
#3 0x000000000059eb26 in llvm::parseBitcodeFile (Buffer=0x8, Context=...) at /home/willy/apollo_checks/llvm/lib/Bitcode/Reader/BitcodeReader.cpp:3347
#4 0x0000000000580e1e in apollo_runtime_hook (id=<optimized out>, info=<optimized out>, skeletons=0x144b9d0, skeleton_size=<optimized out>, params=0x7fffffffdca8, phi_state_size=<optimized out>)
     at runtime.cpp:264
#5 0x0000000000581886 in bench_loop () at ./src//matrixmul/../include/bench.h:21
#6 kernel (out=<optimized out>, out=<optimized out>, a=..., b=...) at ./src//matrixmul/matrixmul.c:46
#7 main () at ./src//matrixmul/matrixmul.c:65

Do I need to add more configure options for the error handling to do its job?

Thanks,
Willy