Error 137: Out Of Memory Killer

I am getting an out-of-memory error while compiling a project with a pass. I am running the pass on only a few functions. The process is killed while compiling the function _ZTv0_n12_N2ft23action_listener_publishD1Ev. This is the error:

_ZTv0_n12_N2ft23action_listener_publishD1Ev
dt: Tue Jan 31 12:57:58 2023
Unix time: 1675195078
Process start time:20.9ms
Process end time:20.9ms
Total process time: 0.0ms
Killed
Makefile:128: recipe for target 'TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.o' failed
make: *** [TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.o] Error 137

Then I tried compiling only this function, _ZTv0_n12_N2ft23action_listener_publishD1Ev, and it worked fine. I have space available, so I am not certain that the process was killed by the OOM killer. The kernel panics the entire system when it runs out of memory. I am not sure whether the problem is in the pass. Any suggestions?

You might want to try to use memory profilers. For example, gperftools’ heap profiler.

Compile it without the pass. If it works then your pass is the problem or has exposed an existing problem. If it’s the former, great you have a small file to look at, if it’s the latter, then you have some digging into llvm to do.

Random ideas that you could try:

  • Cut the code you’re compiling right down to the smallest example that triggers the issue. You could use creduce for C or one of the IR reducers that are in tree (somewhere, never used them myself).
  • Cut the pass back to doing as little as possible, but still triggering the issue.
  • Rebuild the pass from scratch without looking at the version you have, then compare the two. Did the code you had actually represent your intent?
  • Follow the pass in a debugger and confirm the steps it follows are what you expect. You will likely find some effectively infinite loop allocating some temporary object.
  • Stick a bunch of print statements in it to see what gets executed when (the time honoured tradition).
  • On Linux at least, OOM kills are logged - see Finding which process was killed by Linux OOM killer - Stack Overflow and many others.
  • Find someone / a rubber duck (Rubber duck debugging - Wikipedia) to explain the logic of your pass to.
  • Go down the rabbit hole of what each call your pass makes actually does. By which I mean, does it allocate new things or does it tie existing things together? Maybe there are comments talking about which operations are expensive or not.
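
On the OOM-kill point: the "Error 137" in the make output is at least consistent with a kill signal. Shells and make report a process terminated by signal N as status 128 + N, and 137 - 128 = 9 is SIGKILL, which is the signal the Linux OOM killer sends. A minimal C++ sketch of that arithmetic (the function names are made up for illustration):

```cpp
#include <csignal>
#include <optional>

// Shells and make report a child terminated by signal N as status 128 + N.
// Returns the signal number when `status` is in that range, else nullopt.
std::optional<int> signalFromExitStatus(int status) {
    if (status > 128 && status < 256) {
        return status - 128;
    }
    return std::nullopt;
}

// True when the status corresponds to SIGKILL. The OOM killer sends SIGKILL,
// but so does a manual `kill -9`, so this is a hint and not proof.
bool wasSigkilled(int status) {
    std::optional<int> sig = signalFromExitStatus(status);
    return sig.has_value() && *sig == SIGKILL;
}
```

With these helpers, wasSigkilled(137) is true, while a status such as 70 is not in the signal range at all. The kernel log is still the place to confirm that the OOM killer was responsible.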

I sympathise with the problem you have but debugging from afar when you have an unknown pass compiling unknown code is never going to be easy.

@mshockwave Thanks a lot for your help and for pointing me to this tool. I got a profile graph from it, but I am not sure how to find the problem from the graph. I can see that libLLVM-14.so is using a lot of memory.


Can you please help me to find the bug?

Compile it without the pass. If it works, then your pass is the problem or has exposed an existing problem. If it’s the former, great; you have a small file to look at, if it’s the latter, then you have some digging into LLVM to do.

I have compiled the code without the pass and it worked fine. I believe the problem is triggered by the pass.

Cut the code you’re compiling right down to the smallest example that triggers the issue. You could use creduce for C or one of the IR reducers that are in tree (somewhere, never used them myself)
Rebuild the pass from scratch without looking at the version you have, then compare the two. Did the code you had actually represent your intent?

I cut the code down and found that the error occurs while retrieving the values here. Specifically, the process is killed at this line, after compiling the following function:

_ZTv0_n12_N2ft23action_listener_publishD1Ev
entering !isDeclaration
------outside arguments-------
------inside arg_values-------
------outside arg_values-------
------outside use_begin()-------
entering !isDeclaration
Killed
Makefile:134: recipe for target 'TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.o' failed
make: *** [TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.o] Error 137

You can check the original pass here.

I am trying to use gdb to debug this code. I think it can’t find a declaration after compiling this class. So I put a breakpoint there, but the process was still killed without hitting the breakpoint.

_ZTv0_n12_N2ft23action_listener_publishD1Ev
entering !isDeclaration
------outside arguments-------
------inside arg_values-------
------outside arg_values-------
------outside use_begin()-------
entering !isDeclaration
Killed
Makefile:134: recipe for target 'TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.o' failed
make: *** [TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.o] Error 137
(gdb) d

I am still trying to do that. I would be grateful if you can give me a bit more idea on how to do this.

You can see the prints above that point out the part where it was killed. Thanks

The OOM killer is killing the process because the pass uses 50-80% of the memory while compiling the publish functions.

I really want to know how I can find a good soul to be my rubber duck.

I am not allocating anything in the pass, but I am not sure if it is allocating something while compiling.
Kindly help if you can when you have time.
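
One cheap way to check whether memory grows inside the pass is to log the process's peak resident set size next to the existing debug prints. The sketch below assumes Linux, where getrusage reports ru_maxrss in kilobytes; logPeakRss and the checkpoint label are hypothetical names, not part of LLVM.

```cpp
#include <sys/resource.h>
#include <cstdio>

// Peak resident set size of the current process so far. On Linux the value
// is in kilobytes. Returns -1 if the query fails.
long peakRssKb() {
    struct rusage usage {};
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}

// Hypothetical helper: call it wherever the pass already prints a marker,
// for example right after "------outside use_begin()-------".
void logPeakRss(const char *checkpoint) {
    std::fprintf(stderr, "[mem] %s: peak RSS %ld kB\n", checkpoint, peakRssKb());
}
```

If the number jumps between two checkpoints for one function and not for the others, the code between those checkpoints is where to look, even if the pass never calls new or malloc directly: IRBuilder calls create instructions and constants, and those are allocations too.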

First instinct is something is recursing, or iterators are being invalidated each time the loop runs. Looks like it’s not the latter, since you were able to get through it once.

Try to work out which function being compiled is causing the stall and compare it to the others to see if there’s some structural difference.

For example, is it buried in a bunch of indirection and typedefs that take a lot of time to sift through, making some of the calls you’re doing particularly expensive for that function?
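
To illustrate the first instinct above: one common way an instrumentation pass ends up allocating without bound is by visiting the instructions it has just inserted. Whether that is what happens here is a guess. The sketch below uses a std::list of strings as a stand-in for a basic block (it is not LLVM code) and shows the usual remedy: collect the positions to instrument first, then mutate.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <vector>

using Block = std::list<std::string>;  // stand-in for a basic block

// True for entries that look like calls, including inserted logging calls.
bool isCall(const std::string &inst) { return inst.rfind("call", 0) == 0; }

// Inserts a logging call after every call that existed on entry and returns
// how many were inserted. A single loop that inserted while walking the list
// would reach each new "call @log", treat it as a call, and never finish.
std::size_t instrumentCalls(Block &block) {
    // Step 1: snapshot the positions before changing anything.
    std::vector<Block::iterator> worklist;
    for (auto it = block.begin(); it != block.end(); ++it) {
        if (isCall(*it)) {
            worklist.push_back(it);
        }
    }
    // Step 2: mutate. std::list insertion keeps the saved iterators valid,
    // and the newly inserted entries are never revisited.
    for (Block::iterator it : worklist) {
        block.insert(std::next(it), "call @log");
    }
    return worklist.size();
}
```

In a real pass the same shape would be a vector of Instruction pointers filled in one loop and processed in a second loop.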

Looks like you are debugging the make process not llvm itself.

What you need to do is find a single command to compile the source using your pass. I’m not an expert here but I think you’ll want to be running opt (for IR-level passes) or llc (for code generation passes).

I think for both you need IR input, so to get that you do clang -S -emit-llvm <source file> -o <output file>, which saves the IR to a file.

Once you have that command you can debug that. For example:

gdb -args llc /tmp/test.ll

Then gdb will be able to find your pass’ symbols.

Once that is working you may want to look into conditional breakpoints. As in “break here only if the function name turns out to be foo”. How that’s done in GDB I’m not sure but there are resources out there.

That or temporarily change your pass to only run on 1 specific function name (that’s probably easier to do).
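
A minimal sketch of that last idea, assuming the function name is passed through an environment variable so the pass does not need rebuilding for each experiment. PASS_ONLY_FUNCTION and shouldProcess are hypothetical names chosen for this example.

```cpp
#include <cstdlib>
#include <string_view>

// Returns true when the function called `name` should be processed.
// If the (hypothetical) environment variable PASS_ONLY_FUNCTION is unset or
// empty, every function is processed; otherwise only an exact match is.
bool shouldProcess(std::string_view name) {
    const char *only = std::getenv("PASS_ONLY_FUNCTION");
    if (only == nullptr || *only == '\0') {
        return true;
    }
    return name == only;
}
```

Inside the pass, the loop over functions could start with something like "if (!shouldProcess(F.getName().str())) continue;", and the build would then be run with PASS_ONLY_FUNCTION set to the mangled name, for example _ZTv0_n12_N2ft23action_listener_publishD1Ev.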

Thanks for the help.

I think it is happening because action_listener_publish is a pointer and I am trying to cast it to an integer.


Same here,

(gdb) r
Starting program: /usr/bin/llc-14 TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.ll
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
/usr/bin/llc-14: error: /usr/bin/llc-14: TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.ll:18571:16: error: invalid cast opcode for cast from '%"class.ft::TxtMqttFactoryClient"*' to 'i32'
  %5 = bitcast %"class.ft::TxtMqttFactoryClient"* %0 to i32
               ^
[Inferior 1 (process 11159) exited with code 01]
(gdb) 

So, I tried to cast using the opcode returned by getCastOpcode:

for (auto &v : arg_values) {
        argsV.push_back(builder.CreateGlobalStringPtr(v->getName(), ""));
        //const DataLayout &DL = M.getDataLayout();
        unsigned SourceBitWidth = DL.getTypeSizeInBits(v->getType());
        //unsigned SourceBitWidth = cast<IntegerType>(v->getType())->getBitWidth();;
        //errs()<<"opcode: "<<CastInst::getCastOpcode(v, false, v->getType(), false)<<"\n";
       
        IntegerType *IntTy = builder.getIntNTy(SourceBitWidth);
        //Value *IntResult = builder.CreateBitCast(v, IntTy);
       
        Instruction::CastOps opcode = CastInst::getCastOpcode(v, false, v->getType(), false);      //This is the opcode
        Value *IntResult = builder.CreateCast(opcode, v, Type::getInt32Ty(context));                 // Passing it into the CreateCast
        Value *Int64Result = builder.CreateSExtOrTrunc(IntResult, Type::getInt32Ty(context));     // Then getting the 32 bit version of the value
        argsV.push_back(Int64Result);
}

But I got this error:


Here is the whole error,

_ZTv0_n20_N2ft23action_listener_publish10on_successERKN4mqtt5tokenE
_ZN2ft23action_listener_publishD2Ev
_ZTv0_n12_N2ft23action_listener_publishD1Ev
fatal error: error in backend: Cannot select: 0x6ef9c40: i32 = bitcast 0x48f2b00
  0x48f2b00: f64,ch = CopyFromReg 0x3570028, Register:f64 %23
    0x48f21a8: f64 = Register %23
In function: _ZN2ft20TxtMqttFactoryClient10publishLDREdsl
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: clang++-14 --target=arm-linux-gnueabihf -flegacy-pass-manager -g -Xclang -load -Xclang ./pass/instrument.so -std=gnu++0x -std=c++0x -DDEBUG -DSPDLOG_ACTIVE_LEVEL=SPDLOG_LEVEL_TRACE -ITxtSmartFactoryLib/include -ITxtSmartFactoryLib/libs -I../Desktop/paho.mqtt.c/build/_install/include/ -I/home/ubuntu-18/Desktop/paho.mqtt.cpp/build/_install/include/ -Ideps/include -O0 -g3 -Wall -c -fmessage-length=0 -Wno-psabi -fPIC -MMD -MP -MFTxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.d -MTTxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.o -o TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.o TxtSmartFactoryLib/src/TxtMqttFactoryClient.cpp
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 'TxtSmartFactoryLib/src/TxtMqttFactoryClient.cpp'.
4. Running pass 'ARM Instruction Selection' on function '@_ZN2ft20TxtMqttFactoryClient10publishLDREdsl'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamEi+0x31)[0x7f7d209e36a1]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm3sys17RunSignalHandlersEv+0xee)[0x7f7d209e13ee]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm3sys15CleanupOnSignalEm+0x100)[0x7f7d209e2a60]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(+0xd79faa)[0x7f7d2090efaa]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(+0xd79f4b)[0x7f7d2090ef4b]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm3sys7Process4ExitEib+0x27)[0x7f7d209dd947]
clang++-14[0x4136e2]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm18report_fatal_errorERKNS_5TwineEb+0x121)[0x7f7d2091da71]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(+0x166fb71)[0x7f7d21204b71]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm16SelectionDAGISel16SelectCodeCommonEPNS_6SDNodeEPKhj+0x38d8)[0x7f7d21204018]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(+0x2dd8197)[0x7f7d2296d197]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm16SelectionDAGISel22DoInstructionSelectionEv+0x19f)[0x7f7d211fcc6f]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv+0x5c4)[0x7f7d211fc334]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE+0x1788)[0x7f7d211fb768]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE+0x8d9)[0x7f7d211f92d9]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(+0x2dd5451)[0x7f7d2296a451]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE+0x12e)[0x7f7d20d697de]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE+0x3a0)[0x7f7d20b1dc20]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE+0x33)[0x7f7d20b25213]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x946)[0x7f7d20b1e7c6]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZN5clang17EmitBackendOutputERNS_17DiagnosticsEngineERKNS_19HeaderSearchOptionsERKNS_14CodeGenOptionsERKNS_13TargetOptionsERKNS_11LangOptionsEN4llvm9StringRefEPNSE_6ModuleENS_13BackendActionESt10unique_ptrINSE_17raw_pwrite_streamESt14default_deleteISK_EE+0x3489)[0x7f7d27d14719]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(+0x1b89c01)[0x7f7d28038c01]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZN5clang8ParseASTERNS_4SemaEbb+0x244)[0x7f7d26eb4054]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZN5clang13CodeGenAction13ExecuteActionEv+0xb1)[0x7f7d28034f51]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZN5clang14FrontendAction7ExecuteEv+0x67)[0x7f7d289d6727]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZN5clang16CompilerInstance13ExecuteActionERNS_14FrontendActionE+0x336)[0x7f7d2892dd86]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZN5clang25ExecuteCompilerInvocationEPNS_16CompilerInstanceE+0x29b)[0x7f7d28a4fe8b]
clang++-14(_Z8cc1_mainN4llvm8ArrayRefIPKcEES2_Pv+0x98f)[0x41329f]
clang++-14[0x4114dc]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(+0x20fc392)[0x7f7d285ab392]
/usr/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm20CrashRecoveryContext9RunSafelyENS_12function_refIFvvEEE+0xdd)[0x7f7d2090ef2d]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZNK5clang6driver10CC1Command7ExecuteEN4llvm8ArrayRefINS2_8OptionalINS2_9StringRefEEEEEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPb+0x140)[0x7f7d285aae80]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZNK5clang6driver11Compilation14ExecuteCommandERKNS0_7CommandERPS3_+0x3f3)[0x7f7d28572693]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZNK5clang6driver11Compilation11ExecuteJobsERKNS0_7JobListERN4llvm15SmallVectorImplISt4pairIiPKNS0_7CommandEEEE+0x8a)[0x7f7d2857291a]
/usr/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZN5clang6driver6Driver18ExecuteCompilationERNS0_11CompilationERN4llvm15SmallVectorImplISt4pairIiPKNS0_7CommandEEEE+0x1a7)[0x7f7d2858c407]
clang++-14(main+0x2816)[0x410f46]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f7d1ee02c87]
clang++-14(_start+0x2a)[0x40e3da]
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Ubuntu clang version 14.0.6
Target: arm-unknown-linux-gnueabihf
Thread model: posix
InstalledDir: /usr/bin
clang: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/TxtMqttFactoryClient-c8d4ce.cpp
clang: note: diagnostic msg: /tmp/TxtMqttFactoryClient-c8d4ce.sh
clang: note: diagnostic msg:

********************
Makefile:111: recipe for target 'TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.o' failed
make: *** [TxtSmartFactoryLib/Posix_Debug/src/TxtMqttFactoryClient.o] Error 70

This was caused by casting a pointer to an integer.

So, I got the cast opcode and then cast the value as follows, and it worked fine. The process was not killed.

for (auto &v : arg_values) {
        argsV.push_back(builder.CreateGlobalStringPtr(v->getName(), ""));
        //const DataLayout &DL = M.getDataLayout();
        unsigned SourceBitWidth = DL.getTypeSizeInBits(v->getType());
        //unsigned SourceBitWidth = cast<IntegerType>(v->getType())->getBitWidth();;
        //errs()<<"opcode: "<<CastInst::getCastOpcode(v, false, v->getType(), false)<<"\n";
       
        IntegerType *IntTy = builder.getIntNTy(SourceBitWidth);
        //Value *IntResult = builder.CreateBitCast(v, IntTy);
       
        Instruction::CastOps opcode = CastInst::getCastOpcode(v, false, v->getType(), false);      
        Value *IntResult = builder.CreateCast(opcode, v, Type::getInt32Ty(context));                     
        Value *Int64Result = builder.CreateSExtOrTrunc(IntResult, Type::getInt32Ty(context));   
        argsV.push_back(Int64Result);
}

However, getCastOpcode does not support array types such as [2 x i32], which is why I got this error when running llc on the .ll file.


So, I kept the same code for the other values but changed it for pointers, as follows:

for (auto &v : arg_values) {
        argsV.push_back(builder.CreateGlobalStringPtr(v->getName(), ""));
        const DataLayout &DL = M.getDataLayout();
        unsigned SourceBitWidth = DL.getTypeSizeInBits(v->getType());
        //unsigned SourceBitWidth = cast<IntegerType>(v->getType())->getBitWidth();;
        IntegerType *IntTy = builder.getIntNTy(SourceBitWidth);
        //Value *IntResult = builder.CreateBitCast(v, IntTy);
        //Instruction::CastOps opcode = CastInst::getCastOpcode(v, false, IntTy, false);
        //Value *IntResult = builder.CreateCast(opcode, v, IntTy);
       
        Value *IntResult;
        if(v->getType()->isPointerTy()){
                IntResult = builder.CreatePtrToInt(v, IntTy);
        }
        else{
                IntResult = builder.CreateBitCast(v, IntTy);
        }
        Value *Int32Result = builder.CreateSExtOrTrunc(IntResult, Type::getInt32Ty(context));
        argsV.push_back(Int32Result);
        //argsV.push_back(v);
}

But now the process is killed. Any suggestions?

Hard to tell from here without seeing a minimal example.

Try to make a minimal example that just covers this particular situation. As in:

  • I have this minimal snippet of IR/C/whatever the input is.
  • I want to do this to it.
  • This is how I am trying to do that.
  • I got that idea from <whatever docs or examples you read>.
  • I expect it to work like this.
  • But in fact this happens.

Try to summarise the goal in one sentence. I have a <x> and I want a <y>.

Which might all be in the previous posts here but it’s hard to pick that out.

Thanks for your help, I was able to solve it. The problem was casting pointer and array types to int. I found the error when I ran llc on the .ll file, as follows:

Then I cast the pointer using CreatePtrToInt. For the array types, I tried dynamically casting the type to an array type and then creating a copy of that array. It didn’t work, but I hope I can make it work. To understand how to use the LLVM tools, I try to learn about the function or class from GitHub and then try implementing it, as I did here:

for (auto &v : arg_values) {
        //argsV.push_back(builder.CreateGlobalStringPtr(v->getName(), ""));
        const DataLayout &DL = M.getDataLayout();
        unsigned SourceBitWidth = DL.getTypeSizeInBits(v->getType());
        //unsigned SourceBitWidth = cast<IntegerType>(v->getType())->getBitWidth();;
        IntegerType *IntTy = builder.getIntNTy(SourceBitWidth);
        //Value *IntResult = builder.CreateBitCast(v, IntTy);
        
        Value *IntResult;
        
        if(v->getType()->isArrayTy()){
                //continue;
                auto *ArrayTy = dyn_cast<ArrayType>(v->getType());
                auto NumElements = ArrayTy->getNumElements();
                auto *NewArrayType = ArrayType::get(ArrayTy->getElementType(), NumElements);
                auto *NewIntArrayType = ArrayType::get(builder.getIntNTy(SourceBitWidth), NumElements);
                auto *NewArray = builder.CreateBitCast(v, NewArrayType);
                IntResult = builder.CreateBitCast(NewArray, NewIntArrayType);
        }
        else if(v->getType()->isPointerTy()){
                IntResult = builder.CreatePtrToInt(v, IntTy);
        }       
        else{
                IntResult = builder.CreateBitCast(v, IntTy);
        }
        Value *Int32Result = builder.CreateSExtOrTrunc(IntResult, Type::getInt32Ty(context));
        //llvm_unreachable("Invalid type for cast");
        argsV.push_back(Int32Result);
        ////Value *ty = Int32Result->getType(); //problem is here
        ////argsV.push_back(ty);
        //}
}
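
For anyone landing here with the same errors: both the llc message (invalid cast opcode for a cast from a pointer to i32) and the backend message (Cannot select: i32 = bitcast ... f64) fit the LLVM rule that bitcast only works between non-aggregate types of identical bit width, and that pointers may only be bitcast to pointers. The sketch below models that rule with plain C++ types. It is not LLVM code, it ignores vector types, and Kind, TypeInfo and Plan are names invented for the example.

```cpp
// Simplified stand-ins for LLVM types; names are hypothetical.
enum class Kind { Integer, Float, Pointer, Aggregate };

struct TypeInfo {
    Kind kind;
    unsigned bits;  // size in bits, as DataLayout::getTypeSizeInBits reports
};

// The bitcast rule in miniature: no aggregates (arrays, structs), pointers
// only to pointers, and source and destination widths must match.
bool bitcastIsValid(const TypeInfo &from, const TypeInfo &to) {
    if (from.kind == Kind::Aggregate || to.kind == Kind::Aggregate) {
        return false;
    }
    if ((from.kind == Kind::Pointer) != (to.kind == Kind::Pointer)) {
        return false;
    }
    return from.bits == to.bits;
}

enum class Plan { SExtOrTrunc, BitCastThenSExtOrTrunc, PtrToInt, Unsupported };

// How a value could be turned into an i32 argument, by source type.
Plan planCastToI32(const TypeInfo &src) {
    switch (src.kind) {
    case Kind::Integer:
        return Plan::SExtOrTrunc;             // already an integer
    case Kind::Float:
        return Plan::BitCastThenSExtOrTrunc;  // bitcast to iN of equal width first
    case Kind::Pointer:
        return Plan::PtrToInt;                // ptrtoint, never bitcast
    case Kind::Aggregate:
        return Plan::Unsupported;             // skip, or handle element by element
    }
    return Plan::Unsupported;
}
```

Under this model a double (64 bits) can be bitcast to i64 and then truncated, but not bitcast straight to i32, and an array such as [2 x i32] has no single cast at all, so the array branch would need to skip the value or extract its elements individually.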

Thanks a lot for your help.

:+1: Good luck with the rest of your project.

Thank you so much for your help.