Analysis of polly-detect overhead in oggenc

Hi,
I think this should also go on the llvm-dev mailing list...

Hi Tobias,

Hi Star,

thanks for the update. I copied the polly mailing list, as I believe
this discussion is very valuable.

I tried to dig into oggenc.ll, one of the two cases we had for
scop-detect compile-time overhead, but I may need your help.

First, we cannot simply reduce code size by changing part of function
defines to declares. Experimental results shows that the relative
scop-detect time always goes down as we reduce the code size. That is
similar to what you have pointed out in your previous mail.

For example, if we divide the oggen.ll into oggen-top-half.ll (replace
all function defines into declares in the second half) and
oggen-bottom-half.ll (replace all function defines into declares in the
first half) , then their compile-time would be:

//Data explain: total time is 4.428s, the absolute time of Polly-detect
is 0.8780s and its relative compile time percentage is 20.1%.
oggenc.ll: Total(4.428s), Polly-Detect(0.8780s, 20.1%)
oggenc-top-half.ll: Total(2.680s), Polly-Detect(0.3828s, 14.2%)
oggenc-bottom-half.ll: Total(1.552s), Polly-Detect(0.2608s, 16.9%)

To investigate the relationship between the relative scop-detect time and
source code size, I also evaluated some big files based on oggenc.ll.
For example, we can generate a twice bigger file oggenc2.ll with the
following command:
cat oggenc.ll oggenc.ll > oggenc2.ll //write two copies
of oggen.ll into oggen2.ll
'<,'> s/define \(.*\)@/define \1@t2_ //replace all defines
in the second half to avoid redefinition
'<,'> s/declare \(.*\)@/declare \1@t2_ //replace all declares
in the second half to avoid redefinition
Similarly, we can generate oggenc4.ll (four copies of oggenc.ll) and
oggen8.ll (eight copies of oggenc.ll). Their compile-time would be:

oggenc: Total( 4.428s), Polly-Detect(0.8780s, 20.1%)
oggenc*2: Total(10.204s), Polly-Detect(2.8121s, 27.8%)
oggenc*4: Total(24.748s), Polly-Detect(9.9507s, 40.4%)
oggenc*8: Total(69.808s), Polly-Detect(39.4136, 56.6%)

Results show that the percentage of scop-detect compile time is
significantly increased as the code size is getting bigger. This can also
explain why our two cases oggenc.ll (5.9M) and tramp3d-v4.ll(19M) are both
big file.

Wow, this is an extremely interesting finding.

Second, I still have not found the reason why the relative scop-detect
compile time percentage is significantly increased as the compiled code size
is getting bigger.

My initial idea is that the global variable "FunctionSet
InvalidFunctions" (declared in include/polly/ScopDetection.h) may increase
compile-time because it always contains all functions. Since the variable
InvalidFunctions never releases its resource, the InvalidFunctions.count()
operation in ScopDetection.cpp would increases when more function pointers
are put into this Set container. Based on this idea, I attached a patch file
to fix this problem. Unfortunately, results show that the compile-time of
Polly-detect pass has no change with this patch file.

I doubt that the key should be some scop-detect operations, for which the
compile-time would increase as the code size or the number of functions
increase, but I have not found such operations. Do you have any suggestions?

Before we write a patch, we should do some profiling to understand where
the overhead comes from. I propose you generate oggenc*16 or even
oggen*32 to ensure we get to about 90% Polly-Detect overhead.

I would then run Polly under linux 'perf'. Using 'perf record polly-opt
...' and then 'perf report'. If we are lucky, this points us exactly to
the function we spend all the time in.

Cheers,
Tobias

Thanks for your very useful suggestion.
I have profiled the oggenc*16 and oggenc*32 and the results are listed as
follows:

oggenc*16: polly-detect compile-time percentage is 71.3%. The top five
functions reported by perf are:
     48.97% opt opt [.]
llvm::TypeFinder::run(llvm::Module const&, bool)
       7.43% opt opt [.]
llvm::TypeFinder::incorporateType(llvm::Type*)
       7.36% opt opt [.]
llvm::TypeFinder::incorporateValue(llvm::Value const*)
       4.04% opt libc-2.17.so [.] 0x0000000000138bea
       2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a

oggenc*32: polly-detect compile-time percentage is 82.9%. The top five
functions reported by perf are:
     57.44% opt opt [.]
llvm::TypeFinder::run(llvm::Module const&, bool)
     11.51% opt opt [.]
llvm::TypeFinder::incorporateType(llvm::Type*)
       7.54% opt opt [.]
llvm::TypeFinder::incorporateValue(llvm::Value const*)
       2.66% opt libc-2.17.so [.] 0x0000000000138c02
       2.26% opt opt [.]
llvm::SlotTracker::processModule()

It is surprise that all compile-time for TypeFinder is added into the
compile-time for Polly-detect, but I cannot find the any call instructions
to TypeFinder in Polly-detect.

Do you have further suggestion?

You can try to set a breakpoint with gdb on llvm::TypeFinder::run
and then see the backtrace to figure out where this is called from Polly.

Thanks,
Sebastian