Memory and time consumption for larger files

Hello,
   recently I encountered a rather unpleasant issue that
I would like to share with you.
   A few colleagues and I are working on a project in which we want to
create a development environment for application-specific
processors (our web pages, currently not very up to date, are here: http://merlin.fit.vutbr.cz/Lissom/).
One part of this project is a compiler generator.
   To generate instruction selection patterns from our
architecture description language ISAC, one function describing the semantics of each
instruction is generated. The file that contains these functions is then
compiled, and the bodies of the functions are optimized, so I get something quite close
to instruction selection patterns.
   For some architectures, such as ARM, the number of generated functions
is huge (e.g., 50,000) and the resulting C file is huge too.

The problem is that compiling this file to LLVM IR with the frontend takes an
enormous amount of time and memory.

When we're testing Clang's performance, we tend to test the different phases of translation to determine where we have a performance problem. For example:

  1) Using the -E option tests preprocessing performance
  2) Using the -fsyntax-only option tests parsing/semantic analysis performance (its time includes (1))
  3) Using the -emit-llvm option tests IR generation performance (its time includes (1) and (2))

llvm-g++ takes the same command-line options, so one can build up a chart fairly quickly to see which phase of Clang's translation is slower than llvm-g++'s corresponding phase. That would tell us where we need to look to start optimizing.
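For illustration, the phase-by-phase comparison described above could be scripted; a minimal Python sketch (the source file name is a placeholder, and the bare `clang` command stands in for whichever compiler builds are being compared):

```python
import os
import shutil
import subprocess
import time

def time_command(cmd):
    """Run one compiler invocation and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start

# Hypothetical input file; each later phase's time includes the earlier phases.
SOURCE = "clang-large-source.c"
PHASES = {
    "preprocessing (-E)":         ["clang", "-E", SOURCE, "-o", "/dev/null"],
    "parse/sema (-fsyntax-only)": ["clang", "-fsyntax-only", SOURCE],
    "IR generation (-emit-llvm)": ["clang", "-S", "-emit-llvm", SOURCE,
                                   "-o", "/dev/null"],
}

if __name__ == "__main__" and shutil.which("clang") and os.path.exists(SOURCE):
    for phase, cmd in PHASES.items():
        print(f"{phase}: {time_command(cmd):.2f} s")
```

Subtracting the time of each phase from the next then isolates roughly how long each stage of translation takes on its own.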

  - Doug

Hello, right after I sent the previous message, I realized that I had the wrong paths set;
sorry for that.
The previous results were for an unoptimized build of clang;
here I correct that, and the results below are for an optimized build.
Now the times match those of llvm-g++; however, the huge memory consumption,
especially compared to standard gcc, has not changed.
I am not sure whether this is still a problem, but on a 32-bit machine
it will not be possible to compile the test code.

The clang build used:
$clang --version
clang version 1.5 (trunk 99809)
Target: x86_64-unknown-linux-gnu

New measurement results are marked with ***.

1) g++ -DLISSOM_SEM -O0 -c -o tst.o clang-large-source.c
(the time is only illustrative, because an object code file is generated)
time: 12m17.064s
top memory approx: 2.6 GB

2) llvm-g++ -DLISSOM_SEM -O0 -c --emit-llvm -o tst.bc clang-large-source.c
time: 6m28.518s
top memory approx: 8 GB

*** 3a) clang -DLISSOM_SEM -DCLANG -c -O0 -o tst.bc clang-large-source.c
*** time: 1m55s (avg. from 3 runs) (this is almost fine; I will report later what happens on a computer with more memory)
*** top memory approx 8.5 GB

for a 2.5 MB file:
g++ (with obj. code generation): 1m 6s
llvm-g++: 7 s

*** time clang -DLISSOM_SEM -DCLANG -O0 -c -emit-llvm 2_5MB_clang_test.c
*** clang: 8.5 s (avg from 3 runs)

for a 1 MB file:

g++ (with obj. code generation): 23 secs
llvm-g++: 2.5 s

*** time clang -DLISSOM_SEM -DCLANG -O0 -c -emit-llvm 1MB_clang_test.c
*** clang time: 2.7 secs (avg from 3 runs)

The memory consumption still looks like it could be an issue. However, we still need to know which part of the compiler is requiring so much memory: I'm guessing it's the LLVM back-end, since both llvm-g++ and clang++ are affected (and we've rarely seen Clang itself take more memory than GCC). Could you measure the memory usage difference between using -fsyntax-only and using -emit-llvm?
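One way to collect the peak-memory numbers for such a comparison (a sketch, not a prescribed method; on Linux, `getrusage` reports `ru_maxrss` in KiB, and the `clang` invocations in the comment are only examples):

```python
import resource
import subprocess

def peak_child_rss_kib(cmd):
    """Run cmd to completion and return the peak resident set size (in KiB
    on Linux) of any child this process has waited for. RUSAGE_CHILDREN is
    a running maximum over all children, so measure one command per script
    run to get an exact per-command figure."""
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

# Hypothetical usage for the suggested comparison:
#   peak_child_rss_kib(["clang", "-fsyntax-only", "clang-large-source.c"])
#   peak_child_rss_kib(["clang", "-S", "-emit-llvm", "clang-large-source.c",
#                       "-o", "/dev/null"])
```

The difference between the two figures would show how much memory IR generation and the in-memory module add on top of parsing and semantic analysis.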

  - Doug

Hello,

   so I did some memory consumption measurements and the results are as follows:

source file size: 12MB
the resulting .bc file with debug info is 250 MB

clang: max. 4.8 GB
clang -fsyntax-only: max. ~120 MB
llvm-gcc: max. 4.1 GB
gcc 4.4.1: max 1.4 GB
opt -O3: max. 3.6 GB

source file size: 24MB

clang: max. 7.5 GB, stopped after 1 hour
gcc: after 11 seconds: gcc: Internal error: Segmentation fault (program cc1), max. 1.4 GB

Details, and graphs showing the relation between time and
memory consumption, can be found at http://lissom.aps-brno.cz/tmp/clang.xls

(The values in my previous email were wrong: I was adding virtual memory (VIRT)
and resident memory (RES) together, not realizing that virtual memory already
means the whole address space used by a process.)

The test file contains many simple functions (each at most 30 lines long).
It seems that the high memory consumption may be caused by
the in-memory LLVM IR representation, because "opt" has similar requirements.
The LLVM IR objects have only virtual destructors, so the virtual function tables
should not be large(?); maybe it is that there are so many pointers between instructions,
values, etc. (but of course, they are very useful). Could this be caused by debug information?
(I have not inspected this much, but using -g0 for clang seemed to have little impact.)
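A file with this shape can be generated for experiments; a hypothetical sketch (the `sem_fn_` naming and the function bodies are made up, chosen only to mimic "many simple functions of at most 30 lines"):

```python
def make_test_source(n_functions):
    """Emit C source containing n_functions small arithmetic functions,
    mimicking the generated-semantics files described above."""
    chunks = []
    for i in range(n_functions):
        chunks.append(
            f"int sem_fn_{i}(int a, int b) {{\n"
            f"  int t = a * {i} + b;\n"
            f"  t ^= t >> 3;\n"
            f"  return t + {i};\n"
            f"}}\n"
        )
    return "".join(chunks)

if __name__ == "__main__":
    # ~50,000 functions, roughly the ARM-sized case from the first mail
    src = make_test_source(50000)
    with open("mem_test.c", "w") as f:
        f.write(src)
    print(f"wrote {len(src) / 1e6:.1f} MB of C")
```

Feeding such a file to each compiler makes the per-function memory overhead easy to compare without depending on the original generated sources.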

Also, is it really necessary for the frontend to hold the whole program in memory?
For the optimizer it is surely useful, though.

This memory consumption also limits the possibilities for parallelization:
if I wanted to run two optimizers on one computer at the same time
on two different programs (say, after splitting my test file in two),
most of the time would be spent swapping because of the memory
consumption.

There is also the question of whether this is really an issue for clang and LLVM,
or just my personal problem because I need to compile generated files;
in any case, gcc uses almost three times less memory.

Best regards
   Adam Husar

> source file size: 24MB
> 
> clang: max. 7.5 GB, stopped after 1 hour
> gcc: after 11 seconds: gcc: Internal error: Segmentation fault (program
> cc1), max. 1.4 GB

Just a note:
I reported this bug to the gcc bugzilla and it is already fixed.
There was a problem with allocating large data on the stack using alloca.

> so I did some memory consumption measurement and the results are as
> follows:
> 
> source file size: 12MB
> resulting .bc file with debug info has 250MB
> 
> clang: max. 4.8 GB
> clang -fsyntax-only: max. ~120 MB

Good to know.

> llvm-gcc: max. 4.1 GB
> gcc 4.4.1: max 1.4 GB
> opt -O3: max. 3.6 GB

Can you check the max memory consumption of "opt result.bc
-disable-output" (running opt without any optimization passes)?

-Eli

This is interesting. I infer that these memory usage numbers are for "clang -O0 -g"? If so, you might want to try a new build: last night several memory and compile-time improvements for debug info went in, and more are coming.

-Chris

I am curious what these two numbers tell us. Is clang's memory consumption higher than gcc's front end's, or
does clang's output force more memory consumption on the part of the LLVM back end? I think it is the latter rather
than the former, as we generally consume less memory in clang than in gcc's front end.

- Fariborz

Hello,
   I ran some tests with the latest revision, 100183,
and the memory consumption has gotten even higher compared to revision 99809.

For the same 12MB file:

clang (no args): 5GB
clang -O0 -g0: 5GB
clang -O0 -g: 6.3GB, stopped after 33 minutes
opt --disable-output: 3.7 GB
opt -O3 --disable-output: 3.9 GB

Details can be found here: http://lissom.aps-brno.cz/tmp/clang_rev100183.xls
and http://lissom.aps-brno.cz/tmp/clang_rev99809.xls

I will probably work around this by splitting the large file into
smaller ones and by buying more memory, but I think it would be at
least interesting to see what the main reason is for the memory consumption
being so much higher than in gcc.

Best regards
   Adam

Please try again post r100261, it should substantially improve the 1.3G of bloat going from -g0 -> -g.

-Chris


Hello,

   I bought some memory, so my computer now has 8 GB; the results for a fairly recent revision, 100620,
are as follows:

clang -g0 -O0: 5 GB
clang -g1 -O0: 7.6 GB (the earlier test for revision 100183 was stopped before it finished)
clang -g3 -O0: 7.6 GB

opt --disable-output: 3.8 GB (for file generated with clang -g3)

gcc -g0 -O0: 1.4 GB (object file for x86 was also generated)
gcc -g1 -O0: 1.4 GB
gcc -g3 -O0: 2.0 GB

The memory consumption for large files is much higher than in gcc.
The only serious problem for standard usage I can imagine right now is when the "make" tool runs
multiple instances of clang in parallel. Also, for large generated files, compilation
on a standard computer would be unbearably slow because of swapping, or impossible
on a 32-bit machine because of insufficient address space.

Again, details are here:
http://lissom.aps-brno.cz/tmp/clang_rev100620.xls,
http://lissom.aps-brno.cz/tmp/clang_rev100183.xls
and http://lissom.aps-brno.cz/tmp/clang_rev99809.xls.

Best regards,
   Adam