Clang 3.6 and trunk, high RSS usage compared to GCC (12.5GB vs. 0.5GB)

Hi,

I found that after moving to Clang pre-3.6 (git
65d8b4c4998b3a0c20934ea72ede72ef4838a004) and trunk (git
718825a8666acd9ceaab70fc7868332f20e2758f) our internal build machines started
going offline in Jenkins. Clang after the 3.5 release consumes extreme amounts
of memory in some cases.

I have uploaded one of the affected files [1].

$ g++ -std=c++11 -c -O1 -fPIC vpp_generated.ii -o vpp_generated.o

vmpeak: 582432 KB
rspeak: 504500 KB = ~ 0.5GB

$ clang++ -std=c++11 -c -O1 -fPIC vpp_generated.ii -o vpp_generated.o

vmpeak: 12992076 KB
rspeak: 12820184 KB = ~12.5GB

Disabling the optimizer (-O0) resolves the issue, and RSS usage drops to ~300MB.
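
For anyone who wants to reproduce such numbers without a dedicated tool, here is a minimal Python sketch that runs a command and reports the peak RSS of its child processes. It is not the tool that produced the vmpeak/rspeak figures above; the helper name peak_rss_kb is made up for illustration, and the sketch assumes Linux, where ru_maxrss is reported in kilobytes.

```python
import resource
import subprocess


def peak_rss_kb(cmd):
    """Run cmd to completion and return (exit status, peak RSS in KB).

    Hypothetical helper. RUSAGE_CHILDREN covers all children this process
    has waited for, so the value is a high-water mark over every command
    run so far, not only the last one.
    """
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # On Linux ru_maxrss is in kilobytes (assumption: other systems differ).
    return proc.returncode, usage.ru_maxrss
```

It could be called as peak_rss_kb(["clang++", "-std=c++11", "-c", "-O1", "-fPIC", "vpp_generated.ii", "-o", "vpp_generated.o"]). Because the value is a high-water mark, it is safest to do one measurement per Python process.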

I decided to write here directly instead of creating yet another bug report,
as those usually don't get any feedback/comments.

clang version 3.7.0 (git 718825a8666acd9ceaab70fc7868332f20e2758f)
Target: x86_64-unknown-linux-gnu
Thread model: posix

Compiled:

../configure --prefix=<..> --enable-optimized --with-binutils-include=<..>
--disable-terminfo --enable-bindings=none CC=gcc CXX=g++ 'CPP=gcc -E'
'CXXCPP=g++ -E'

Cheers,
david
- - -
[1] http://davidlt.web.cern.ch/davidlt/vault/vpp_generated.ii.xz

Are you missing --disable-assertions?

Is the problem also apparent on the 3.6 branch or with the 3.6 RC1 build?

Ben

I could reproduce this locally with clang 3.5.1 from the ArchLinux packages. I
ran it through my heaptrack [1] tool until ~630MB of peak memory were consumed,
which should already give enough insight. The full report can be found
at [2]. Here is an excerpt:

7137759 calls with 475.81MB peak consumption from:
    0x7f83d81ff4ef
      in /usr/bin/../lib/libLLVM-3.5.so
    llvm::LazyValueInfo::getConstant(llvm::Value*, llvm::BasicBlock*)
      in /usr/bin/../lib/libLLVM-3.5.so
    0x7f83d80ba72f
      in /usr/bin/../lib/libLLVM-3.5.so
    llvm::FPPassManager::runOnFunction(llvm::Function&)
      in /usr/bin/../lib/libLLVM-3.5.so
    0x7f83d78df9c3
      in /usr/bin/../lib/libLLVM-3.5.so
    llvm::legacy::PassManagerImpl::run(llvm::Module&)
      in /usr/bin/../lib/libLLVM-3.5.so
    clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::CodeGenOptions
const&, clang::TargetOptions const&, clang::LangOptions const&,
llvm::StringRef, llvm::Module*, clang::BackendAction, llvm::raw_ostream*)
      in /usr/bin/clang
    0x8735a1
      in /usr/bin/clang
    clang::ParseAST(clang::Sema&, bool, bool)
      in /usr/bin/clang
    clang::CodeGenAction::ExecuteAction()
      in /usr/bin/clang
    clang::FrontendAction::Execute()
      in /usr/bin/clang
    clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)
      in /usr/bin/clang
    clang::ExecuteCompilerInvocation(clang::CompilerInstance*)
      in /usr/bin/clang
    cc1_main(char const**, char const**, char const*, void*)
      in /usr/bin/clang
    main
      in /usr/bin/clang

HTH

[1]: Heaptrack - A Heap Memory Profiler for Linux - Milian Wolff
[2]: https://userpage.physik.fu-berlin.de/~milianw/heaptrack.clang.report.txt.gz

Hi,

It seems that we always disabled the optimizer on some files for Clang, and
I had dropped this particular patch from our code base.

I tried to bisect Clang and went as far back as the 3.3 LLVM/Clang release with
the same memory consumption.

I moved to a Fedora 21 based VM. GCC 4.9.2 used <500MB, while Clang from 3.3 up
to trunk went above 10GB until the VM hung.

We didn't use --disable-assertions. Clang still has a number of issues
with our code base (incl. crashing/asserting). Pre-3.6 Clang improves things
(no more aborting), but some issues remain. For those I filed bug reports.
I have yet to test the current 3.6 RC1 with our code base to see whether it
regressed or not.

david

I'd like to add that I was having similar issues with Clang 3.5, while
compiling test code inside the QtCreator code base, with the optimizer enabled.

I'm not sure if it's known or if I should file a bug report for that.

Relevant issue: Attempting to compile the tst_dumpers.cpp [1] target, with
optimizing flags enabled (-O2, iirc). => Clang runs OOM (with more than 5 GB
allocated)
Fix: Don't pass -On -- https://codereview.qt-project.org/#/c/103556/

Note: tst_Dumpers::dumper() is a 5 KLOC function, which *might* be an issue :)

[1] https://qt.gitorious.org/qt-creator/qt-creator/source/487b05dba8b6e8f548ec3cd451965fdb6df71e4d:tests/auto/debugger/tst_dumpers.cpp

I have seen a number of cases in pkgsrc, where the Value Propagation
pass results in a very high CPU and memory use. Attached is a hack that
allows disabling the pass via "-mllvm -disable-value-propagation". It
seems you are hitting the same issue.

Joerg

prop.diff (2.19 KB)

We also looked into it.

This seems to be coming from LLVM, not Clang code. We did a heap profile
with IgProf [1] in the middle of compilation.

In our case it seems that all allocations come from here:

lookup(Val)[BB] is constantly allocating std::map<AssertingVH<BasicBlock>, [...]>
(104 bytes), which slowly eats the memory.

The current impression is that the cache is exploding.
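
To illustrate how a lookup alone can grow such a cache, here is a small Python analogy; it is not LLVM's code. It assumes that lookup(Val)[BB] goes through std::map::operator[], which creates a default entry whenever the key is missing. The class and function names (LatticeCache, count_entries) are hypothetical.

```python
from collections import defaultdict


class LatticeCache:
    """Toy two-level cache: value -> {basic block -> lattice state}."""

    def __init__(self):
        self.entries = defaultdict(dict)

    def lookup_inserting(self, value, block):
        # Analogous to std::map::operator[]: a missing key is created
        # with a default state as a side effect of the query.
        return self.entries[value].setdefault(block, "undefined")

    def lookup_readonly(self, value, block):
        # Analogous to std::map::find: a miss leaves the cache untouched.
        per_value = self.entries.get(value)
        if per_value is None:
            return None
        return per_value.get(block)

    def size(self):
        return sum(len(per_value) for per_value in self.entries.values())


def count_entries(num_values, num_blocks, inserting):
    """Query every (value, block) pair once and return the cache size."""
    cache = LatticeCache()
    for value in range(num_values):
        for block in range(num_blocks):
            if inserting:
                cache.lookup_inserting(value, block)
            else:
                cache.lookup_readonly(value, block)
    return cache.size()
```

With 100 values and 50 blocks, the inserting variant ends up holding 5000 entries, while the read-only variant holds none. In a large function the number of (value, block) pairs grows with the product of both counts, which would fit the picture of a slowly exploding cache.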

[1] http://igprof.org/

Hi,

I reduced the -O1 passes to the following:

opt -lazy-value-info -correlated-propagation -reassociate -loops -lcssa
-loop-rotate -licm -loop-unswitch -scalar-evolution -loop-deletion
-function_tti -loop-unroll -memdep -memcpyopt -sccp -lazy-value-info
-correlated-propagation -memdep -dse -adce a.bc > /dev/null

It seems that removing or rearranging the passes makes the problem go away.
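
A reduction like this can be automated. The following Python sketch greedily drops one pass at a time as long as the problem still reproduces. It is an illustration only, not how the list above was produced: minimize_passes and opt_blows_up are hypothetical helpers, the 2 GB limit is arbitrary, and treating any non-zero exit of opt under an address-space limit as a memory blow-up is a simplifying assumption.

```python
import resource
import subprocess


def minimize_passes(passes, still_bad):
    """Greedily remove passes while still_bad(candidate) stays true.

    The result is minimal only in the sense that removing any single
    remaining pass makes the problem disappear.
    """
    current = list(passes)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_bad(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the shorter list
    return current


def opt_blows_up(passes, bitcode="a.bc", limit_bytes=2 * 1024**3):
    """Hypothetical predicate: run opt under a memory cap (Linux only)."""

    def cap_memory():
        # Runs in the child just before exec; limits its address space.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    proc = subprocess.run(
        ["opt", *passes, bitcode],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        preexec_fn=cap_memory,
    )
    # Assumption: any failure under the cap means memory use exploded.
    return proc.returncode != 0
```

Calling minimize_passes(pass_list, opt_blows_up) with the pass flags from the command above would try each removal in turn. Since rearranging the passes also matters here, a removal-only search like this one cannot explore reorderings.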

david