Clang 'locks up' when compiling optimized

The file isn’t very large, at 181949 bytes, and is machine-generated code.

The strange thing is that if I run the compile by hand by cutting/pasting the line into a shell window, it compiles in seconds. I run the same code generation/compile/execution with g++ (5.1) and it never locks up.

Another thing, the code is compiled from a daemon that places the compiler under execution limits. If it runs for more than 30 seconds or uses more than 500MB of RAM, it should have the appropriate limit applied to it. I fork the daemon and before exec() for the compiler do the following:

struct rlimit rlim;
rlim.rlim_cur = rlim.rlim_max = m_compile_max_seconds; // 30

if (0 != setrlimit(RLIMIT_CPU, &rlim))
{
    xthrow(Sys_rlimit, errno, "system", "can't bound compilation time");
}

rlim.rlim_cur = rlim.rlim_max = m_compile_max_memory * (1 << 20); // 500 MB

if (0 != setrlimit(RLIMIT_AS, &rlim))
{
    xthrow(Sys_rlimit, errno, "system", "can't limit compiler memory usage");
}
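
For anyone who wants to reproduce the setup outside the daemon, here is a minimal, self-contained sketch of the same fork / setrlimit / exec sequence. The names set_limit and run_limited and the exit codes 126 and 127 are assumptions made for this illustration; they are not taken from the daemon's source.

```cpp
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: apply one limit as both the soft and hard value.
static bool set_limit(int resource, rlim_t value)
{
    struct rlimit rlim;
    rlim.rlim_cur = rlim.rlim_max = value;
    return 0 == setrlimit(resource, &rlim);
}

// Hypothetical helper: run argv[0] under CPU-time and address-space limits.
// Returns the raw wait status, or -1 if fork() or waitpid() failed.
int run_limited(char* const argv[], rlim_t max_seconds, rlim_t max_megabytes)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        // Child: apply the limits, then replace the process image.
        if (!set_limit(RLIMIT_CPU, max_seconds) ||
            !set_limit(RLIMIT_AS, max_megabytes << 20))
        {
            _exit(126); // limits could not be applied (code chosen for this sketch)
        }
        execvp(argv[0], argv);
        _exit(127); // exec failed
    }

    // Parent: wait for the child and hand back its status.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

A caller would pass the compiler's argument vector and then inspect the result with WIFEXITED / WIFSIGNALED to tell a normal exit from a limit-induced kill. Note that a child stuck the way this one was consumes almost no CPU time, so RLIMIT_CPU alone would not end it.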

This is the compile that locked up. If anyone believes that looking at the source would make a difference, let me know and I’ll send it along.

501 59880 46191 4004 0 31 10 2495004 9612 - SN 0 ?? 0:00.02 /opt/local/libexec/llvm-4.0/bin/clang++ -pipe -c -o /Users/barto/UnixEnvironment/CSI/internal/repo4/internal.0/code/381/1.opt -O3 -Winvalid-pch -march=core2 -fstack-protector-strong -D_BSD_SOURCE -DFOR_SPARQL -D_REENTRANT -D_PTHREADS -DTHREAD -D_GLIBCXX_USE_DEPRECATED=0 -DTURBO_GENCODE=1 -DDO_CASSANDRA=0 -DMEM_LIMIT_LEAK_CHECKING -DFULL_RESERVATIONS -DGCC5 -D_DARWIN_C_SOURCE -DDARWIN -DMAC_OSX=1 -std=gnu++14 -m64 -fPIC -I/Users/barto/UnixEnvironment/CSI/repo4/lib -I/Users/barto/UnixEnvironment/CSI/repo4/lib/cgrsrc /Users/barto/UnixEnvironment/CSI/internal/repo4/internal.0/code/381/1.cpp
501 59881 59880 4004 0 31 10 2554904 33884 - SN 0 ?? 21:37.14 /opt/local/libexec/llvm-4.0/bin/clang -cc1 -triple x86_64-apple-macosx10.10.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -emit-obj -disable-free -disable-llvm-verifier -discard-value-names -main-file-name 1.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu core2 -target-linker-version 274.2 -dwarf-column-info -debugger-tuning=lldb -coverage-notes-file /Users/barto/UnixEnvironment/CSI/internal/repo4/internal.0/code/381/1.gcno -resource-dir /opt/local/libexec/llvm-4.0/bin/…/lib/clang/4.0.0 -D _BSD_SOURCE -D FOR_SPARQL -D _REENTRANT -D _PTHREADS -D THREAD -D _GLIBCXX_USE_DEPRECATED=0 -D TURBO_GENCODE=1 -D DO_CASSANDRA=0 -D MEM_LIMIT_LEAK_CHECKING -D FULL_RESERVATIONS -D GCC5 -D _DARWIN_C_SOURCE -D DARWIN -D MAC_OSX=1 -I /Users/barto/UnixEnvironment/CSI/repo4/lib -I /Users/barto/UnixEnvironment/CSI/repo4/lib/cgrsrc -stdlib=libc++ -O3 -Winvalid-pch -std=gnu++14 -fdeprecated-macro -fdebug-compilation-dir /Users/barto/UnixEnvironment/CSI/repo4/bin -ferror-limit 19 -fmessage-length 0 -stack-protector 2 -fblocks -fobjc-runtime=macosx-10.10.0 -fencode-extended-block-signature -fcxx-exceptions -fexceptions -fmax-type-align=16 -fdiagnostics-show-option -vectorize-loops -vectorize-slp -o /Users/barto/UnixEnvironment/CSI/internal/repo4/internal.0/code/381/1.opt -x c++ /Users/barto/UnixEnvironment/CSI/internal/repo4/internal.0/code/381/1.cpp

David

David Barto
barto@cambridgesemantics.com

Sometimes, my best code does nothing. Most of the rest of it has bugs.

Does it actually lock up, or just take a very long time? LLVM does have problems with very large functions, which leads to long times for “instruction selection” (worse in debug builds of the compiler too).

The same applies to g++. A while back I had something of about 100k lines that took over 15 minutes to compile; tweaking the options brought that down to about 20s. Just because one compiler is “good” and the other “bad” doesn’t mean that the “bad” one is broken. It all depends on what the code looks like: one compiler may well run through the compilation quickly while the other takes very long. “The devil is in the detail”. In my g++ case, it was dead store elimination (DSE) that took a long time, and on a file that is several megabytes, the difference with DSE enabled was a few kilobytes. From what I can tell [without looking at the code], g++ does DSE in O(n^2) time, by something akin to for_each(instructions) { for_each(instructions) check_this_instruction(); }

Mats

This was left running overnight. It was completely locked and wasn’t making any progress.

Just scraping the compile line from the PS output and pasting it into a shell has the compiler running in about 5-8 seconds. So something about running this through my compile daemon did something weird.

It doesn’t happen on the same file every time. If I delete the code cache and re-run my system again, it will pick another file to lock up on, or possibly run to completion without locking up. It appears random.

David

the code is compiled from a daemon

Does it also lock up without the daemon, or only with it?

This is the stack trace from when the compiler locked up.
I attached with ‘lldb -p’.
I did a thread backtrace all, then a process resume.
I interrupted the program again and did a second thread backtrace all. Both were identical.

David

(lldb) thread backtrace all

* thread #1: tid = 0x13b475b, 0x00007fff90ec65da libsystem_kernel.dylib`syscall_thread_switch + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff90ec65da libsystem_kernel.dylib`syscall_thread_switch + 10
    frame #1: 0x00007fff9497682d libsystem_platform.dylib`_OSSpinLockLockSlow + 63
    frame #2: 0x00007fff8ca7271b libsystem_malloc.dylib`szone_malloc_should_clear + 116
    frame #3: 0x00007fff8ca72667 libsystem_malloc.dylib`malloc_zone_malloc + 71
    frame #4: 0x00007fff8ca71187 libsystem_malloc.dylib`malloc + 42
    frame #5: 0x00007fff991fa43e libc++.1.dylib`operator new(unsigned long) + 30
    frame #6: 0x00007fff991fcf05 libc++.1.dylib`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__init(char const*, unsigned long) + 59
    frame #7: 0x000000010e6fc7a9 libLLVM.dylib`llvm::sys::findProgramByName(llvm::StringRef, llvm::ArrayRef<llvm::StringRef>) + 670
    frame #8: 0x000000010e6fd22c libLLVM.dylib`printSymbolizedStackTrace(llvm::StringRef, void**, int, llvm::raw_ostream&) + 186
    frame #9: 0x000000010e6fda7b libLLVM.dylib`llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 93
    frame #10: 0x000000010e6fd116 libLLVM.dylib`llvm::sys::RunSignalHandlers() + 83
    frame #11: 0x000000010e6fde4d libLLVM.dylib`SignalHandler(int) + 183
    frame #12: 0x00007fff94977f1a libsystem_platform.dylib`_sigtramp + 26
    frame #13: 0x00007fff8ca757da libsystem_malloc.dylib`szone_free_definite_size + 4827
    frame #14: 0x000000010eb8a45b libLLVM.dylib`std::__1::__tree<std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> >, std::__1::__map_value_compare<llvm::Value*, std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> >, std::__1::less<llvm::Value*>, true>, std::__1::allocator<std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> > > >::destroy(std::__1::__tree_node<std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> >, void*>*) + 41
    frame #15: 0x000000010eb8a44f libLLVM.dylib`std::__1::__tree<std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> >, std::__1::__map_value_compare<llvm::Value*, std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> >, std::__1::less<llvm::Value*>, true>, std::__1::allocator<std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> > > >::destroy(std::__1::__tree_node<std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> >, void*>*) + 29
    frame #16: 0x000000010eb8a44f libLLVM.dylib`std::__1::__tree<std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> >, std::__1::__map_value_compare<llvm::Value*, std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> >, std::__1::less<llvm::Value*>, true>, std::__1::allocator<std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> > > >::destroy(std::__1::__tree_node<std::__1::__value_type<llvm::Value*, llvm::Optional<(anonymous namespace)::BitPart> >, void*>*) + 29
    frame #17: 0x000000010eb894e0 libLLVM.dylib`llvm::recognizeBSwapOrBitReverseIdiom(llvm::Instruction*, bool, bool, llvm::SmallVectorImpl<llvm::Instruction*>&) + 1224
    frame #18: 0x000000010ec3f969 libLLVM.dylib`llvm::InstCombiner::MatchBSwap(llvm::BinaryOperator&) + 391
    frame #19: 0x000000010ec3fe7c libLLVM.dylib`llvm::InstCombiner::visitOr(llvm::BinaryOperator&) + 636
    frame #20: 0x000000010ec2e3a3 libLLVM.dylib`llvm::InstCombiner::run() + 1261
    frame #21: 0x000000010ec2f05c libLLVM.dylib`combineInstructionsOverFunction(llvm::Function&, llvm::InstCombineWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::DominatorTree&, bool, llvm::LoopInfo*) + 2431
    frame #22: 0x000000010ec2f2d7 libLLVM.dylib`llvm::InstructionCombiningPass::runOnFunction(llvm::Function&) + 297
    frame #23: 0x000000010e78e1ba libLLVM.dylib`llvm::FPPassManager::runOnFunction(llvm::Function&) + 290
    frame #24: 0x000000010ee59722 libLLVM.dylib`(anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) + 810
    frame #25: 0x000000010e78e6be libLLVM.dylib`llvm::legacy::PassManagerImpl::run(llvm::Module&) + 606
    frame #26: 0x000000010d26c481 clang`clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 10253
    frame #27: 0x000000010d38e53d clang`clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 1035
    frame #28: 0x000000010d693f59 clang`clang::ParseAST(clang::Sema&, bool, bool) + 374
    frame #29: 0x000000010d4fa5bd clang`clang::FrontendAction::Execute() + 69
    frame #30: 0x000000010d4c89d0 clang`clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 722
    frame #31: 0x000000010d526144 clang`clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 1976
    frame #32: 0x000000010d205d96 clang`cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 1371
    frame #33: 0x000000010d20503e clang`main + 8255
    frame #34: 0x00007fff979fb5c9 libdyld.dylib`start + 1
    frame #35: 0x00007fff979fb5c9 libdyld.dylib`start + 1
(lldb)

This seems to point back to your RLIMIT_AS constraint, or overall system
memory availability. Why constrain the address space? A much more
realistic constraint would be RLIMIT_RSS.

-Brian

This is part of an in-memory system (no swap space configured), so from what I have read about RSS and AS on MacOS and Linux, RSS would match the AS size for this use case.

Why did it lock up, why not throw the exception and exit?

David

Dunno, it seems like the OS is driving now and it's not immediately clear
to me why that system call wouldn't yield either success or failure. But
clang is asking for a resource (more memory), and I've seen those stall
before. My experience with linux (may or may not be applicable) leads me
to believe that the system is perhaps resource-constrained and your task is
pending while it tries to free up those resources.

RLIMIT_AS and RLIMIT_RSS are distinct on Linux; I guess I am a little
surprised to see that they're not on MacOS.

In any case, the most likely culprit is your setrlimit. If I were you I
would take clang out of the loop entirely and write a test program that
does allocations just like the ones clang does (various sized mallocs, you
could try profiling to get a ballpark histogram). I would be surprised if
you don't see the same behavior.
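
A rough sketch of such a test program, with clang taken out of the loop, might look like the following. It is only an illustration under stated assumptions: the function name count_allocations_under_limit is made up, it uses fixed 1 MB chunks rather than a histogram of clang's real allocation sizes, and whether RLIMIT_AS is actually enforced depends on the OS.

```cpp
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdlib>

// Hypothetical probe: in a child process, cap the address space at limit_mb
// megabytes, then malloc() 1 MB chunks until malloc() fails or max_chunks is
// reached. Returns the number of successful allocations, or -1 if the probe
// itself could not run.
long count_allocations_under_limit(rlim_t limit_mb, long max_chunks)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        close(fds[0]);
        struct rlimit rlim;
        rlim.rlim_cur = rlim.rlim_max = limit_mb << 20;
        if (0 != setrlimit(RLIMIT_AS, &rlim))
            _exit(1);

        long count = 0;
        while (count < max_chunks)
        {
            char* p = static_cast<char*>(std::malloc(1 << 20));
            if (p == nullptr)
                break;
            // Touch the chunk so the allocation cannot be optimized away.
            *static_cast<volatile char*>(p) = 1;
            ++count; // chunks are deliberately leaked; the child exits next
        }

        ssize_t written = write(fds[1], &count, sizeof count);
        _exit(written == static_cast<ssize_t>(sizeof count) ? 0 : 1);
    }

    // Parent: read the count reported by the child, then reap it.
    close(fds[1]);
    long count = -1;
    ssize_t got = read(fds[0], &count, sizeof count);
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (got != static_cast<ssize_t>(sizeof count) ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return count;
}
```

Calling, say, count_allocations_under_limit(500, 4096) and seeing whether the call returns at all, and with what count, would show whether plain malloc() under the limit fails cleanly or stalls on that system.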

-Brian

Looks like the bug is that the crash handler is attempting to allocate memory, and the reason it was crashing was that it ran out of memory.

Sounds like a real clang issue to deal with as the compiler should not assume infinite resources.

David

Hmm, yes, I didn’t notice that in the backtrace but you’re right.

I don’t think “assume infinite resources” is the bug though. malloc()'s not signal-safe on linux, so it probably isn’t on MacOS either. We shouldn’t be calling malloc() from the signal handler.
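
To illustrate what “signal-safe” means in practice, here is a minimal sketch of a handler that restricts itself to write() and _exit(), both of which POSIX lists as async-signal-safe. This is not how LLVM's handler is written; the names minimal_crash_handler and run_handler_demo and the exit code 70 are invented for the example.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Handler restricted to async-signal-safe calls: write() and _exit().
// It never calls malloc(), so it cannot block on the allocator's lock
// even if the signal interrupted malloc() or free().
extern "C" void minimal_crash_handler(int)
{
    static const char msg[] = "fatal signal received\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)ignored;
    _exit(70); // arbitrary exit code chosen for this sketch
}

// Hypothetical demo: fork a child that installs the handler and raises the
// signal. Returns the child's exit code, or -1 on failure.
int run_handler_demo(int signo)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        struct sigaction sa;
        sa.sa_handler = minimal_crash_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (0 != sigaction(signo, &sa, nullptr))
            _exit(1);
        raise(signo);
        _exit(2); // only reached if the handler did not run
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The trade-off is visible right away: a handler like this cannot symbolize a stack trace, which is exactly the step (findProgramByName building a std::string) that allocates in the backtrace above.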

As a practical matter, maybe this is a feature that you could disable either at build-time or runtime?

I’m going to disable (for the MacOS Builds) the memory limit checks for the time being.

Should I register a bug about this or is there someone with ‘more authority’ who would be more appropriate?

David

AFAIK from this community you should feel free to open a bug. If I were
you that's what I'd do. Maintainers are free to debate your (and my) claim
that this is not how clang should behave. The challenge will be coming to
a resolution on how to address the problem, and perhaps finding leverage to
get someone to execute the fix if you don't know how yourself.

The rich set of behavior that snaps into action when the compiler faults is
beneficial to gathering bug reports from end users. Preserving that
functionality while making the signal handler robust/safe will require a
smarter design.

As an aside it would be cool to write an analyzer rule that checks for
exclusively signal-safe system calls that come from functions registered as
signal handlers.

-Brian