Analysis of polly-detect overhead in oggenc

Hi Tobias,

Hi Star,

[...]

Before we write a patch, we should do some profiling to understand where
the overhead comes from. I propose you generate oggenc*16 or even
oggen*32 to ensure we get to about 90% Polly-Detect overhead.

I would then run Polly under linux 'perf'. Using 'perf record polly-opt
...' and then 'perf report'. If we are lucky, this points us exactly to
the function we spend all the time in.

Cheers,
Tobias

Thanks for your very useful suggestion.
I have profiled the oggenc*16 and oggenc*32 and the results are listed as follows:

oggenc*16: polly-detect compile-time percentage is 71.3%. The top five functions reported by perf are:
      48.97% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool)
        7.43% opt opt [.] llvm::TypeFinder::incorporateType(llvm::Type*)
        7.36% opt opt [.] llvm::TypeFinder::incorporateValue(llvm::Value const*)
        4.04% opt libc-2.17.so [.] 0x0000000000138bea
        2.06% opt [kernel.kallsyms] [k] 0xffffffff81043e6a

oggenc*32: polly-detect compile-time percentage is 82.9%. The top five functions reported by perf are:
      57.44% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool)
      11.51% opt opt [.] llvm::TypeFinder::incorporateType(llvm::Type*)
        7.54% opt opt [.] llvm::TypeFinder::incorporateValue(llvm::Value const*)
        2.66% opt libc-2.17.so [.] 0x0000000000138c02
        2.26% opt opt [.] llvm::SlotTracker::processModule()

It is surprise that all compile-time for TypeFinder is added into the compile-time for Polly-detect, but I cannot find the any call instructions to TypeFinder in Polly-detect.

Yes, this does not seem very conclusive. We probably need a call graph
to see where those are called.

Did you try running 'perf record' with the '-g' option? This should give
you callgraph information, that should be very helpful to track down the
callers in Polly. Also, if you prefer a graphical view of the
results, you may want to have a look at Gprof2Dot [1]. Finally, if this
all does not work, just running Polly in gdb and randomly breaking a
couple of times (manual sampling), may possibly hint you to the right place.

I also tried perf with -g, but it report nothing useful. the result of perf -g is:
- 48.70% opt opt [.] llvm::TypeFinder::run(llvm::Module const&, bool) `
   - llvm::TypeFinder::run(llvm::Module const&, bool)
      + 43.34% 0
      - 1.78% 0x480031
         + llvm::LoadInst::~LoadInst()
      - 1.41% 0x460031
         + llvm::LoadInst::~LoadInst()
      - 1.01% 0x18
           llvm::BranchInst::~BranchInst()
           0x8348007d97fa3d8d
      - 0.87% 0x233
         + llvm::GetElementPtrInst::~GetElementPtrInst()
      - 0.57% 0x39
         + llvm::SExtInst::~SExtInst()
      - 0.54% 0x460032
         + llvm::StoreInst::~StoreInst()

GDB is a useful tool! Thanks for Sebastian's advice!

By setting a break point on llvm::TypeFinder::run(llvm::Module const&, bool), I find most of calling cases are issued from the following two callsites:
0xb7c1c5d2 in polly::ScopDetection::isValidMemoryAccess(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const ()
0xb7c1d754 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const ()

The detailed backtrace of "isValidMemoryAccess" is:
#0 0x0907b780 in llvm::TypeFinder::run(llvm::Module const&, bool) ()
#1 0x08f76ebe in llvm::TypePrinting::incorporateTypes(llvm::Module const&) ()
#2 0x08f76fc9 in llvm::AssemblyWriter::init() ()
#3 0x08f77176 in llvm::AssemblyWriter::AssemblyWriter(llvm::formatted_raw_ostream&, llvm::SlotTracker&, llvm::Module const*, llvm::AssemblyAnnotationWriter*) ()
#4 0x08f79d1a in llvm::Value::print(llvm::raw_ostream&, llvm::AssemblyAnnotationWriter*) const ()
#5 0xb7c1d044 in polly::ScopDetection::isValidInstruction(llvm::Instruction&, polly::ScopDetection::DetectionContext&) const ()
   from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#6 0xb7c1ea75 in polly::ScopDetection::allBlocksValid(polly::ScopDetection::DetectionContext&) const ()
   from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#7 0xb7c1f4aa in polly::ScopDetection::isValidRegion(polly::ScopDetection::DetectionContext&) const ()
   from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#8 0xb7c1fd16 in polly::ScopDetection::findScops(llvm::Region&) ()
   from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#9 0xb7c1fd81 in polly::ScopDetection::findScops(llvm::Region&) ()
   from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#10 0xb7c206f7 in polly::ScopDetection::runOnFunction(llvm::Function&) ()
   from /home/star/llvm/llvm_build/tools/polly/Release+Asserts/lib/LLVMPolly.so
#11 0x09065fdd in llvm::FPPassManager::runOnFunction(llvm::Function&) ()
#12 0x09067e2b in llvm::FunctionPassManagerImpl::run(llvm::Function&) ()
#13 0x09067f6d in llvm::FunctionPassManager::run(llvm::Function&) ()
#14 0x081e6040 in main ()

>Also, can you upload the .ll file somewhere, such that I can access it?

(Please do not attach it to the email)

I have attached the source code of oggenc.c and oggen.ll in the bug r16624:
http://llvm.org/bugs/show_bug.cgi?id=16624

Best wishes,
Star Tan

Tobi,

it looks like this code is the problem:

    for (std::vector<Value *>::iterator PI = Pointers.begin(),
           PE = Pointers.end();
         ;) {
      Value *V = *PI;

      if (V->getName().size() == 0)
        OS << "\"" << *V << "\"";
      else
        OS << "\"" << V->getName() << "\"";

      ++PI;

      if (PI != PE)
        OS << ", ";
      else
        break;
    }

    INVALID_NOVERIFY(Alias, OS.str());

it prints values to OS even when debug is not turned on:
if you remember, I have already sent a patch to conditionally execute this code
in a DEBUG() stmt. You rejected that patch because it would not work for
other reporting tools -polly-show or some other flags.

I have found that the extremely expensive compile-time overhead comes from the string buffer operation for “INVALID” MACRO in the polly-detect pass.
Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file.

However, this patch file does not solve this problem. It only shows the reason why polly-detect pass leads to significant compile-time overhead.

Best wishes,
Star Tan.

hack_for_polly_detect.patch (3.82 KB)

Hi Sebastian,

Yes, you have pointed an important reason. If we comment this source code you have listed, then the compile-time overhead for oggenc*8.ll can be reduced from 40.5261 ( 51.2%) to 20.3100 ( 35.7%).

I just sent another mail to explain why polly-detect pass leads to significant compile-time overhead. Besides the reason you have pointed, another reason is resulted from those string buffer operations in “INVALID” MACRO. If we comment both the string buffer operations in “INVALID” MACRO and in the “isValidMemoryAccess” function, the compile-time overhead for oggenc*8.ll would be reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%).

I think we should revise these string buffer operations in Polly-detect pass.

Best wishes,
Star Tan

Hi Sebastian,

Yes, you have pointed an important reason. If we comment this source code
you have listed, then the compile-time overhead for oggenc*8.ll can be
reduced from 40.5261 ( 51.2%) to 20.3100 ( 35.7%).

I just sent another mail to explain why polly-detect pass leads to
significant compile-time overhead. Besides the reason you have pointed,
another reason is resulted from those string buffer operations in "INVALID"
MACRO. If we comment both the string buffer operations in "INVALID" MACRO
and in the "isValidMemoryAccess" function, the compile-time overhead for
oggenc*8.ll would be reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%).

Awesome, thanks for the analysis.

Can you run again perf on the resulting program: I would still like to
understand
where we spend the 5.88s in the rest of scop detection.

I think we should revise these string buffer operations in Polly-detect
pass.

Right: we should incur no overhead in a non debug build.

Thanks,
Sebastian

The top ten functions reported by perf are:
+ 12.68% opt [kernel.kallsyms] [k] 0xc111472c
+ 9.40% opt libc-2.17.so [.] 0x0007875f
+ 2.98% opt opt [.] __x86.get_pc_thunk.bx
+ 2.42% opt [vdso] [.] 0x00000425
+ 1.46% opt libgmp.so.10.0.5 [.] __gmpn_copyi_x86
+ 1.11% opt libc-2.17.so [.] free
+ 1.07% opt opt [.] bool llvm::DenseMapBase<llvm::DenseMap<void const*, llvm::Pass*, llvm::DenseMapInfo<void const*>
+ 1.02% opt opt [.] llvm::ComputeMaskedBits(llvm::Value*, llvm::APInt&, llvm::APInt&, llvm::DataLayout const*, unsigna
+ 1.00% opt opt [.] llvm::Use::getImpliedUser() const
+ 0.90% opt libgmp.so.10.0.5 [.] __gmpz_set
+ 0.76% opt opt [.] llvm::SmallPtrSetImpl::insert_imp(void const*)
+ 0.74% opt opt [.] llvm::InstCombiner::DoOneIteration(llvm::Function&, unsigned int)
+ 0.73% opt opt [.] llvm::PassRegistry::getPassInfo(void const*) const
+ 0.72% opt libc-2.17.so [.] malloc
+ 0.72% opt opt [.] llvm::TimeRecord::getCurrentTime(bool)
+ 0.71% opt opt [.] llvm::ValueHandleBase::AddToUseList()
+ 0.57% opt opt [.] llvm::SlotTracker::processModule()
+ 0.52% opt opt [.] llvm::PMTopLevelManager::findAnalysisPass(void const*)
+ 0.51% opt opt [.] llvm::APInt::~APInt()
+ 0.51% opt libgmp.so.10.0.5 [.] __gmpz_mul

Unfortunately, I cannot set breakpoints for the top 2 functions.
Even with "perf -g", I still cannot track where time is spent on in Polly-detect pass. The "perf -g" results are like this:
- 12.68% opt [kernel.kallsyms] [k] 0xc111472c `
   - 0xc161984a a
   - 0xb7783424 a
      - 99.93% 0x9ab2000 a
         + 85.99% 0 a
         + 3.14% 0xb46d018 a
         + 1.42% 0xb7745000 a
   - 0xc1612415 a
      + 97.76% 0xc1062977 a
      + 1.75% 0xc1495888

Do you have some suggestions?

Best wishes,
Star Tan

Very nice analysis. I just tried it myself and can verify that for
oggenc 16x, your patch reduces the scop-detection time from 90 seconds
(80 %) to 0.5 seconds (2.5 %).

I think there are two problems:

   1) The cost of printing a single LLVM type/value increases with
      the size of the overall Module. This seems to be because
      TypeFinder::run() is called each time, without caching in place.
      The cost of TypeFinder::run() increases with the size of the
      module, as it basically just performs a scan on the entire Module.

   2) We are formatting the failure messages during normal compilation,
      even though they are only used in debugging tools like -view-scops

In terms of solutions:

It would be interesting to understand why is 1) so slow, especially as
it seems to be either a fundamental problem in LLVM IR printing or the
way we use the IR printing infrastructure. On the other side, for Polly
we need to solve 2) anyway. Even if formatting would be faster, we
should still not do it, if not needed. As we need to solve 2) anyway, 1)
will only hit us when we do debugging/formatting. I assume in case of
debugging the files we are looking into are normally smaller, such that
the formatting overhead will not be that big.

Hence, I would focus on 2). We could probably just put the code under a
NDEBUG ifndef, but I would actually like to keep them available even in
NDEBUG mode, as we may want to use the error messages to hint users to
why their code can not be optimized. For this and also to get rid of
another annoyance, the INVALID macro, I think we need to restructure the
reporting of the last error, such that formatting of the error messages
can be done on-demand. Another problem that could be solved at this
point is to remove the macro use, which hides the fact that the
functions return as soon as INVALID is called, which is plainly ugly.

I am not sure how to structure this, but I could imagine some small
class hierarchy that has a class for each error type. Each class just
stores pointers to the data structures it needs to format its error
message, but only formats the error on-demand. We could then return this
class in case of failure and return a NoError class or a NULL pointer in
case of success.

This change may also help us to later add support to keep track of all
errors we encounter (not just the first one). This is something Andreas
and Johannes found helpful earlier.

Cheers,
Tobias

g

I have found that the extremely expensive compile-time overhead comes from
the string buffer operation for "INVALID" MACRO in the polly-detect pass.
Attached is a hack patch file that simply remove the string buffer
operation. This patch file can significantly reduce compile-time overhead
when compiling big source code. For example, for oggen*8.ll, the compile
time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch
file.

Very nice analysis. I just tried it myself and can verify that for
oggenc 16x, your patch reduces the scop-detection time from 90 seconds
(80 %) to 0.5 seconds (2.5 %).

I think there are two problems:

  1) The cost of printing a single LLVM type/value increases with
     the size of the overall Module. This seems to be because
     TypeFinder::run() is called each time, without caching in place.
     The cost of TypeFinder::run() increases with the size of the
     module, as it basically just performs a scan on the entire Module.

  2) We are formatting the failure messages during normal compilation,
     even though they are only used in debugging tools like -view-scops

In terms of solutions:

It would be interesting to understand why is 1) so slow, especially as
it seems to be either a fundamental problem in LLVM IR printing or the
way we use the IR printing infrastructure.

I once analyzed it for GVN when I hit a similar issue (type finder
being rerun again and again while printing statements), and I came up
with the answer that the printing infrastructure does not expect to be
printing things regularly, and the typefinder only gets called so much
during debug printing, because of the call paths it goes through.

The basic cause is that Value::print creates AssemblyWriter objects
every time it is called. This in turn calls the type finder, for every
single call to Value::print. The type finder is being called to build
up the list of named types in the module, so that when it prints
things, it can use the right name for the type instead of an anonymous
name.

if you do the following:

diff --git a/lib/VMCore/AsmWriter.cpp b/lib/VMCore/AsmWriter.cpp
index 7ef1131..8a2206b 100644
--- a/lib/VMCore/AsmWriter.cpp
+++ b/lib/VMCore/AsmWriter.cpp
@@ -2118,7 +2118,7 @@ void Value::print(raw_ostream &ROS, AssemblyAnnotationWrit
   if (const Instruction *I = dyn_cast<Instruction>(this)) {
     const Function *F = I->getParent() ? I->getParent()->getParent() : 0;
     SlotTracker SlotTable(F);
- AssemblyWriter W(OS, SlotTable, getModuleFromVal(I), AAW);
+ AssemblyWriter W(OS, SlotTable, NULL, AAW);
     W.printInstruction(*I);
   } else if (const BasicBlock *BB = dyn_cast<BasicBlock>(this)) {
     SlotTracker SlotTable(BB->getParent());

(This is an old patch, AsmWriter.cpp is not in lib/VMCore anymore, but
you can see what is happening), you will not be shown named types in
the operand printing, but it will not be slow anymore.

This is of course, a complete hack just to work around the issue if
you need to get other work done.
The real fix is either to stop recreating these AssemblyWriter
objects, or improve caching in the bowels of it so that it doesn't
need to rerun typefinder again and again if nothing has changed.

Star Tan wrote:

I have found that the extremely expensive compile-time overhead comes from the string buffer operation for "INVALID" MACRO in the polly-detect pass.
Attached is a hack patch file that simply remove the string buffer operation. This patch file can significantly reduce compile-time overhead when compiling big source code. For example, for oggen*8.ll, the compile time is reduced from 40.5261 ( 51.2%) to 5.8813s (15.9%) with this patch file.

On top of your patch, I have removed from ScopDetection.cpp all printing of LLVM
values, like this:

- INVALID(AffFunc, "Non affine access function: " << *AccessFunction);
+ INVALID(AffFunc, "Non affine access function: ");

there are a good dozen or so of these pretty printing. With these changes the
compile time spent in ScopDetection drops dramatically to almost 0: here is the
longest running one in the compilation of an Android stack:

   2.1900 ( 13.7%) 0.0100 ( 7.7%) 2.2000 ( 13.6%) 2.2009 ( 13.4%) Polly - Detect static control parts (SCoPs)

Before these changes, the top most expensive ScopDetection time used to be a few
hundred of seconds.

Sebastian

Hi Sebastian,

I am slightly confused. The patch of Star Tan did the following:

  #define INVALID(NAME, MESSAGE) \
    do { \
- std::string Buf; \
- raw_string_ostream fmt(Buf); \
- fmt << MESSAGE; \
- fmt.flush(); \
- LastFailure = Buf; \
      DEBUG(dbgs() << MESSAGE); \
      DEBUG(dbgs() << "\n"); \
      assert(!Context.Verifying &&#NAME);

In my understanding, this patch alone removes all formatting overhead from the default execution. The only use of MESSAGE left is within the debug macro, which will only be evaluated if -debug is given.

I am surprised why you see further performance changes, by removing/changing the content of MESSAGE. As it is not evaluated, I do not see why this would change performance. Do you have any ideas what is going on.

I also tested for further performance differences on the oggenc benchmark and could not reproduce your results.

Cheers,
Tobias

Hi all,

I have attached a patch file to reduce the polly-detect overhead.

My idea is to avoid calling TypeFinder in Non-DEBUG mode, so TypeFinder is only called in DEBUG mode with the DEBUG macro.
This patch file did this work with following modifications:

First, it keeps most of string information by replacing “<<” with “+” operation. For example, code like this:
INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName());
would be converted into:
LastFailure = "Non branch instruction terminates BB:! " + BB.getName().str();

Second, it simplifies some complex operations like:

INVALID(AffFunc,
“Non affine branch in BB '” << BB.getName() << "’ with LHS: "
<< *LHS << " and RHS: " << *RHS);

into:
LastFailure = "Non affine branch in BB: " + BB.getName().str();

In such cases, some information for “LHS” and “RHS” are missed. However, it only has little affects on the variable “LastFailure”, while keeping all information for DEBUG information. Since the variable “LastFailure” is only used in ScopGraphPrinter, w! hich should only show critical information in graph, I hope such modif ication is acceptable.

Results show that it has almost the same performance improvements as my previous hack patch file, i.e., reducing the compile-time of Polly-detect pass from 90s (>80%) to 0.5s (2.5%) when compiling oggenc 16X.

Best wishes,
Star Tan

Postscript to Tobias:
I have also implemented your initial proposal, which uses some class hierarchy to show different Failure information. Unfortunately, I found it would significantly complicate the source code. It not only introduces a lot of dynamic object “new” and “copy” operations, but also makes source code hard to understand. I argue that very detailed information for the variable “LastFailure” is not essential because it should only show critical in! formation in the graph. If users want to know detailed failure information, they should refer to DEBUG information. This new patch file keeps most of critical information for “LastFailure” except some detailed information about Instruction and SCEV.
Do you have any suggestion?


[0001-ScopDetect-Optimize-compile-time-cost-for-string-ope.patch|attachment](upload://olPHo0NJ3hEY9aV8gEb0VQwB2Dk.patch) (16.4 KB)

Hi all,

I have attached a patch file to reduce the polly-detect overhead.

Great.

My idea is to avoid calling TypeFinder in Non-DEBUG mode, so
TypeFinder is only called in DEBUG mode with the DEBUG macro. This
patch file did this work with following modifications:

First, it keeps most of string information by replacing "<<" with "+" operation. For example, code like this:
     INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName());
would be converted into:
    LastFailure = "Non branch instruction terminates BB: " + BB.getName().str();

Second, it simplifies some complex operations like:
        INVALID(AffFunc,
               "Non affine branch in BB '" << BB.getName() << "' with LHS: "
                                           << *LHS << " and RHS: " << *RHS);
into:
       LastFailure = "Non affine branch in BB: " + BB.getName().str();

In such cases, some information for "LHS" and "RHS" are missed.

Yes. And unfortunately, we may also loose the 'name' of unnamed basic
blocks. Basic blocks without a name get formatted as %111 with '111'
being their sequence number. Your above change will not be able to
derive this number.

However, it only has little affects on the variable "LastFailure",
while keeping all information for DEBUG information.

Why is that? It seems the DEBUG output is the very same that gets
written to "LastFailure".

Since the
variable "LastFailure" is only used in ScopGraphPrinter, which should
only show critical information in graph, I hope such modification is
acceptable.

Why should we only show critical information? In the GraphPrinter we do
not worry about compile time so much, such that we can easily show
helpful information. We just need to make sure that we do not slow down
the compile-time in the generic case.

Results show that it has almost the same performance improvements as
my previous hack patch file, i.e., reducing the compile-time of
Polly-detect pass from 90s (>80%) to 0.5s (2.5%) when compiling oggenc
16X.

Great.

Postscript to Tobias:
  I have also implemented your initial proposal, which uses some class
  hierarchy to show different Failure information. Unfortunately, I
  found it would significantly complicate the source code. It not only
  introduces a lot of dynamic object "new" and "copy" operations, but
  also makes source code hard to understand. I argue that very
  detailed information for the variable "LastFailure" is not essential
  because it should only show critical information in the graph. If
  users want to know detailed failure information, they should refer
  to DEBUG information. This new patch file keeps most of critical
  information for "LastFailure" except some detailed information about
  Instruction and SCEV.

Interesting. This was also something I was afraid of. Passing new/delete
stuff through the scop detection is probably not what we want.

Do you have any suggestion?

I do.

Your patch goes in the right direction and it helped to get a better
idea of what should be done. I think the first point that we learned is
that passing class pointers around is probably too distrubing. I also
believe having the formatting of the error messages in the normal scop
detection is not great, as it both adds unrelated stuff to the code, but
also makes it harder to conditionally disable the error reporting. On
the other side, I believe it is good to make the 'return false'
explicit.

Hence, I propose to transform the code in something like the following:

Instead of

  if (checkSomething())
    INVALID(AffFunc, "Test" << SCEV <<);

we should get something like:

  if (checkSomething()) {
    reportInvalidAffFunc(SCEV);
    return false;
  }

The reportInvalidAffFunc is then either a NO-OP (during normal
execution) or it reports the error (in case we need it).

I am not yet fully sure how the reportInvalid* functions should look
like. However, the part of your patch that is easy to commit without
further discussion is to move the 'return false' out of the INVALID
macro. Meaning, translating the above code to:

  if (checkSomething()) {
    INVALID(AffFunc, "Test" << SCEV <<);
    return false;
  }

I propose to submit such a patch first and then focus on the remaining
problems.

Cheers,
Tobias

>On 07/21/2013 09:49 AM, Star Tan wrote:
>> Hi all,
>>
>>
>> I have attached a patch file to reduce the polly-detect overhead.
>
>Great.
>
>> My idea is to avoid calling TypeFinder in Non-DEBUG mode, so
>> TypeFinder is only called in DEBUG mode with the DEBUG macro.  This
>> patch file did this work with following modifications:
>>
>>
>> First, it keeps most of string information by replacing  "<<" with "+" operation. For example, code like this:
>>      INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName());
>> would be converted into:
>>     LastFailure = "Non branch instruction terminates BB: " + BB.getName().str();
>>
>>
>> Second, it simplifies some complex operations like:
>>         INVALID(AffFunc,
>>                "Non affine branch in BB '" << BB.getName() << "' with LHS: "
>>                                            << *LHS << " and RHS: " << *RHS);
>> into:
>>        LastFailure = "Non affine branch in BB: " + BB.getName().str();
>
>
>> In such cases, some information for "LHS" and "RHS" are missed.
>
>Yes. And unfortunately, we may also loose the 'name' of unnamed basic
>blocks. Basic blocks without a name get formatted as %111 with '111'
>being their sequence number. Your above change will not be able to
>derive this number.
Yes, but it is much cheaper by using BB.getName().str(). 
Currently, I cannot find a better way to derive unnamed basic block without calling TypeFinder.
>
>> However, it only has little affects on the variable "LastFailure",
>> while keeping all information for DEBUG information.
>
>Why is that? It seems the DEBUG output is the very same that gets
>written to "LastFailure".
No, they are different. For example, the code like this:
        INVALID(AffFunc,
                "Non affine branch in BB '" << BB.getName() << "' with LHS: "
                                            << *LHS << " and RHS: " << *RHS);
would be converted to:
      LastFailure = "Non affine branch in BB: " + BB.getName().str();
       INVALID(AffFunc,
              LastFailure << "' with LHS: " << *LHS << " and RHS: " << *RHS);
You see, information about LHS and RHS would be kept in INVALID DEBUG information.
To keep the information about unnamed basic blocks, I think we can rewrite it as:
      FailString = "Non affine branch in BB: ";
      INVALID(AffFunc,
              FailString << BB.getName() << "' with LHS: "
                         << *LHS << " and RHS: " << *RHS);
      LastFailure = FailString + BB.getName().str();
>
>> Since the
>> variable "LastFailure" is only used in ScopGraphPrinter, which should
>> only show critical information in graph, I hope such modification is
>> acceptable.
>
>Why should we only show critical information? In the GraphPrinter we do
>not worry about compile time so much, such that we can easily show
>helpful information. We just need to make sure that we do not slow down
>the compile-time in the generic case.
To my knowledge, all of those expensive operations are only used for the variable "LastFailure", which is only used in GraphPrinter. Do you mean the Graph Printer does not execute in the generic case?  If that is true, I think we should control those operations for "LastFailure" as follows:
if (In GraphPrinter mode) {
  LastFailure = ...
}
In this way, we can completely remove those operations for "LastFailure" in the generic case except in GraphPrinter mode.
>
>> Results show that it has almost the same performance improvements as
>> my previous hack patch file, i.e., reducing the compile-time of
>> Polly-detect pass from 90s (>80%) to 0.5s (2.5%) when compiling oggenc
>> 16X.
>
>Great.
>
>> Postscript to  Tobias:
>>   I have also implemented your initial proposal, which uses some class
>>   hierarchy to show different Failure information. Unfortunately, I
>>   found it would significantly complicate the source code. It not only
>>   introduces a lot of dynamic object "new" and "copy" operations, but
>>   also makes source code hard to understand. I argue that very
>>   detailed information for the variable "LastFailure" is not essential
>>   because it should only show critical information in the graph. If
>>   users want to know detailed failure information, they should refer
>>   to DEBUG information. This new patch file keeps most of critical
>>   information for "LastFailure" except some detailed information about
>>   Instruction and SCEV.
>
>Interesting. This was also something I was afraid of. Passing new/delete
>stuff through the scop detection is probably not what we want.
>
>> Do you have any suggestion?
>
>I do.
>
>Your patch goes in the right direction and it helped to get a better
>idea of what should be done. I think the first point that we learned is
>that passing class pointers around is probably too distrubing. I also
>believe having the formatting of the error messages in the normal scop
>detection is not great, as it both adds unrelated stuff to the code, but
>also makes it harder to conditionally disable the error reporting. On
>the other side, I believe it is good to make the 'return false'
>explicit.
>
>Hence, I propose to transform the code in something like the following:
>
>Instead of
>
>  if (checkSomething())
>    INVALID(AffFunc, "Test" << SCEV <<);
>
>we should get something like:
>
>  if (checkSomething()) {
>    reportInvalidAffFunc(SCEV);
>    return false;
>  }
>
>The reportInvalidAffFunc is then either a NO-OP (during normal
>execution) or it reports the error (in case we need it).
What do you mean with "Normal Execution"?  Does the GraphPrinter executes in "normal case"?
 If GraphPrinter is not executes in "normal case", we should move all operations for "LastFailure" under the condition of  "GraphPrinter" mode.
>
>I am not yet fully sure how the reportInvalid* functions should look
>like. However, the part of your patch that is easy to commit without
>further discussion is to move the 'return false' out of the INVALID
>macro. Meaning, translating the above code to:
>
>  if (checkSomething()) {
>    INVALID(AffFunc, "Test" << SCEV <<);
>    return false;
>  }
>
>I propose to submit such a patch first and then focus on the remaining
>problems.
Yes, I agree with you.  I have attached a patch file to move "return false" out of INVALID macro.
>
>Cheers,
>Tobias


Thanks,
Star Tan

0001-ScopDetect-move-return-false-out-of-INVALID-macro.patch (9 KB)

Hi all,

I have attached a patch file to reduce the polly-detect overhead.

Great.

My idea is to avoid calling TypeFinder in Non-DEBUG mode, so
TypeFinder is only called in DEBUG mode with the DEBUG macro. This
patch file did this work with following modifications:

First, it keeps most of string information by replacing "<<" with "+" operation. For example, code like this:
      INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName());
would be converted into:
     LastFailure = "Non branch instruction terminates BB: " + BB.getName().str();

Second, it simplifies some complex operations like:
         INVALID(AffFunc,
                "Non affine branch in BB '" << BB.getName() << "' with LHS: "
                                            << *LHS << " and RHS: " << *RHS);
into:
        LastFailure = "Non affine branch in BB: " + BB.getName().str();

In such cases, some information for "LHS" and "RHS" are missed.

Yes. And unfortunately, we may also loose the 'name' of unnamed basic
blocks. Basic blocks without a name get formatted as %111 with '111'
being their sequence number. Your above change will not be able to
derive this number.

Yes, but it is much cheaper by using BB.getName().str().
Currently, I cannot find a better way to derive unnamed basic block without calling TypeFinder.

Yes, that's the reason why it was used in the first place.

However, it only has little affects on the variable "LastFailure",
while keeping all information for DEBUG information.

Why is that? It seems the DEBUG output is the very same that gets
written to "LastFailure".

No, they are different. For example, the code like this:
         INVALID(AffFunc,
                 "Non affine branch in BB '" << BB.getName() << "' with LHS: "
                                             << *LHS << " and RHS: " << *RHS);
would be converted to:
       LastFailure = "Non affine branch in BB: " + BB.getName().str();
        INVALID(AffFunc,
               LastFailure << "' with LHS: " << *LHS << " and RHS: " << *RHS);
You see, information about LHS and RHS would be kept in INVALID DEBUG information.
To keep the information about unnamed basic blocks, I think we can rewrite it as:
       FailString = "Non affine branch in BB: ";
       INVALID(AffFunc,
               FailString << BB.getName() << "' with LHS: "
                          << *LHS << " and RHS: " << *RHS);
       LastFailure = FailString + BB.getName().str();

Since the
variable "LastFailure" is only used in ScopGraphPrinter, which should
only show critical information in graph, I hope such modification is
acceptable.

Why should we only show critical information? In the GraphPrinter we do
not worry about compile time so much, such that we can easily show
helpful information. We just need to make sure that we do not slow down
the compile-time in the generic case.

To my knowledge, all of those expensive operations are only used for the variable "LastFailure", which is only used in GraphPrinter. Do you mean the Graph Printer does not execute in the generic case? If that is true, I think we should control those operations for "LastFailure" as follows:
if (In GraphPrinter mode) {
   LastFailure = ...
}
In this way, we can completely remove those operations for "LastFailure" in the generic case except in GraphPrinter mode.

Yes.

What do you mean with "Normal Execution"? Does the GraphPrinter executes in "normal case"?
If GraphPrinter is not executes in "normal case", we should move all operations for "LastFailure" under the condition of "GraphPrinter" mode.

It is not executed in normal mode. So yes, we should move all this under the condition of 'GraphPrintingMode'.

I am not yet fully sure how the reportInvalid* functions should look
like. However, the part of your patch that is easy to commit without
further discussion is to move the 'return false' out of the INVALID
macro. Meaning, translating the above code to:

  if (checkSomething()) {
    INVALID(AffFunc, "Test" << SCEV <<);
    return false;
  }

I propose to submit such a patch first and then focus on the remaining
problems.

Yes, I agree with you. I have attached a patch file to move "return false" out of INVALID macro.

Great. Committed in r186805.

I propose two more patches:

  1) Transform the INVALID macro into function calls, that format
            the text and that set LastFailure.

         2) Add checks at the beginning of those function calls and
            continue only if LogErrors is set

The second one is slightly more involved as we would like to turn this option on automatically either in -debug mode or if -polly-show or -polly-show-only is set.

What do you think? Does this sound right?

Cheers,
Tobias

>On 07/21/2013 07:33 PM, Star Tan wrote:
>>
>>> On 07/21/2013 09:49 AM, Star Tan wrote:
>>>> Hi all,
>>>>
>>>>
>>>> I have attached a patch file to reduce the polly-detect overhead.
>>>
>>> Great.
>>>
>>>> My idea is to avoid calling TypeFinder in Non-DEBUG mode, so
>>>> TypeFinder is only called in DEBUG mode with the DEBUG macro.  This
>>>> patch file did this work with following modifications:
>>>>
>>>>
>>>> First, it keeps most of string information by replacing  "<<" with "+" operation. For example, code like this:
>>>>       INVALID(CFG, "Non branch instruction terminates BB: " + BB.getName());
>>>> would be converted into:
>>>>      LastFailure = "Non branch instruction terminates BB: " + BB.getName().str();
>>>>
>>>>
>>>> Second, it simplifies some complex operations like:
>>>>          INVALID(AffFunc,
>>>>                 "Non affine branch in BB '" << BB.getName() << "' with LHS: "
>>>>                                             << *LHS << " and RHS: " << *RHS);
>>>> into:
>>>>         LastFailure = "Non affine branch in BB: " + BB.getName().str();
>>>
>>>
>>>> In such cases, some information for "LHS" and "RHS" are missed.
>>>
>>> Yes. And unfortunately, we may also loose the 'name' of unnamed basic
>>> blocks. Basic blocks without a name get formatted as %111 with '111'
>>> being their sequence number. Your above change will not be able to
>>> derive this number.
>> Yes, but it is much cheaper by using BB.getName().str().
>> Currently, I cannot find a better way to derive unnamed basic block without calling TypeFinder.
>
>Yes, that's the reason why it was used in the first place.
>
>>>> However, it only has little affects on the variable "LastFailure",
>>>> while keeping all information for DEBUG information.
>>>
>>> Why is that? It seems the DEBUG output is the very same that gets
>>> written to "LastFailure".
>> No, they are different. For example, the code like this:
>>          INVALID(AffFunc,
>>                  "Non affine branch in BB '" << BB.getName() << "' with LHS: "
>>                                              << *LHS << " and RHS: " << *RHS);
>> would be converted to:
>>        LastFailure = "Non affine branch in BB: " + BB.getName().str();
>>         INVALID(AffFunc,
>>                LastFailure << "' with LHS: " << *LHS << " and RHS: " << *RHS);
>> You see, information about LHS and RHS would be kept in INVALID DEBUG information.
>> To keep the information about unnamed basic blocks, I think we can rewrite it as:
>>        FailString = "Non affine branch in BB: ";
>>        INVALID(AffFunc,
>>                FailString << BB.getName() << "' with LHS: "
>>                           << *LHS << " and RHS: " << *RHS);
>>        LastFailure = FailString + BB.getName().str();
>>>
>>>> Since the
>>>> variable "LastFailure" is only used in ScopGraphPrinter, which should
>>>> only show critical information in graph, I hope such modification is
>>>> acceptable.
>>>
>>> Why should we only show critical information? In the GraphPrinter we do
>>> not worry about compile time so much, such that we can easily show
>>> helpful information. We just need to make sure that we do not slow down
>>> the compile-time in the generic case.
>> To my knowledge, all of those expensive operations are only used for the variable "LastFailure", which is only used in GraphPrinter. Do you mean the Graph Printer does not execute in the generic case?  If that is true, I think we should control those operations for "LastFailure" as follows:
>> if (In GraphPrinter mode) {
>>    LastFailure = ...
>> }
>> In this way, we can completely remove those operations for "LastFailure" in the generic case except in GraphPrinter mode.
>
>Yes.
>
>
>> What do you mean with "Normal Execution"?  Does the GraphPrinter executes in "normal case"?
>> If GraphPrinter is not executes in "normal case", we should move all operations for "LastFailure" under the condition of  "GraphPrinter" mode.
>
>It is not executed in normal mode. So yes, we should move all this under 
>the condition of 'GraphPrintingMode'.
>
>>> I am not yet fully sure how the reportInvalid* functions should look
>>> like. However, the part of your patch that is easy to commit without
>>> further discussion is to move the 'return false' out of the INVALID
>>> macro. Meaning, translating the above code to:
>>>
>>>   if (checkSomething()) {
>>>     INVALID(AffFunc, "Test" << SCEV <<);
>>>     return false;
>>>   }
>>>
>>> I propose to submit such a patch first and then focus on the remaining
>>> problems.
>> Yes, I agree with you.  I have attached a patch file to move "return false" out of INVALID macro.
>
>Great. Committed in r186805.
>
>I propose two more patches:
>
>	1) Transform the INVALID macro into function calls, that format
>            the text and that set LastFailure.
Translating the INVALID macro into function calls would complicate the operations for different counters.
For example, currently users can add a new counter by simply adding the following line:
BADSCOP_STAT(SimpleLoop, "Loop not in -loop-simplify form");
But if we translate INVALID macro into function calls, then users has to add a function for this counter:
void INVLIAD_SimpleLoop (...).
This is because we cannot use the following macro combination in function calls:
    if (!Context.Verifying)         !
              \
      ++Bad##NAME##ForScop; 
So, I do not think it is necessary to translate the INVALID macro into function calls.
Do you still think we should translate INVALID macro into a serial of functions like "invalid_CFG, invalid_IndVar, invalid_IndEdge, ...  ? In that case, I could provide a small patch file for this purpose -:)

>         2) Add checks at the beginning of those function calls and
>            continue only if LogErrors is set
Those invalid log strings are used for two separate cases:
1) The first case is for detailed debugging, which is controlled by the macro DEBUG(dbgs() << MESSAGE). In such a case, string operations will automatically skipped in normal execution mode with the following if-statement:
if (::llvm::DebugFlag && ::llvm::isCurrentDebugType(TYPE)) 
That means string operations controlled by DEBUG will not execute in normal case, so we should not worry about it.

I propose two more patches:

  1) Transform the INVALID macro into function calls, that format
            the text and that set LastFailure.

Translating the INVALID macro into function calls would complicate the operations for different counters.
For example, currently users can add a new counter by simply adding the following line:
BADSCOP_STAT(SimpleLoop, "Loop not in -loop-simplify form");
But if we translate INVALID macro into function calls, then users has to add a function for this counter:
void INVLIAD_SimpleLoop (...). \

        ^^ No uppercase in function names.

This is because we cannot use the following macro combination in function calls:
     if (!Context.Verifying) \
       ++Bad##NAME##ForScop;
So, I do not think it is necessary to translate the INVALID macro into function calls.
Do you still think we should translate INVALID macro into a serial of functions like "invalid_CFG, invalid_IndVar, invalid_IndEdge, ... ? In that case, I could provide a small patch file for this purpose -:slight_smile:

I think it would still be nice to get rid of this macro. We could probably have a default function that takes an enum to report different
errors in the reportInvalid(enum errorKind) style. And then several others that would allow more complex formatting (e.g. reportInvalidAlias(AliasSet)). Especially the code after 'isMustAlias()' would be nice to move out of the main scop detection.

However, this issue is not directly related to the speedup work, so
you are welcome to skip it for now.

(Btw. thanks for not blindly following my suggestions!)

         2) Add checks at the beginning of those function calls and
            continue only if LogErrors is set

Those invalid log strings are used for two separate cases:
1) The first case is for detailed debugging, which is controlled by the macro DEBUG(dbgs() << MESSAGE). In such a case, string operations will automatically skipped in normal execution mode with the following if-statement:
if (::llvm::DebugFlag && ::llvm::isCurrentDebugType(TYPE))
That means string operations controlled by DEBUG will not execute in normal case, so we should not worry about it.
2) The other case is for the variable "LastFailure", which is used only in GraphPrinter. Currently string operations for "LastFailure" always execute in normal cases. My idea is to put such string operations under the condition of "GraphPrinter" mode. For example, I would like to translate the "INVALID" macro into:
#define INVALID(NAME, MESSAGE) \
   do { \
     if (GraphViewMode) { \
       std::string Buf; \
       raw_string_ostream fmt(Buf); \
       fmt << MESSAGE; \
       fmt.flush(); \
       LastFailure = Buf; \
     } \
     DEBUG(dbgs() << MESSAGE); \
     DEBUG(dbgs() << "\n"); \
     assert(!Context.Verifying &&#NAME); \
     if (!Context.Verifying) \
       ++Bad##NAME##ForScop; \
   } while (0)

Looks good.

As you have suggested, we can construct the condition GraphViewMode with "-polly-show", "-polly-show-only", "polly-dot" and "polly-dot-only". However, I see all these options are defined as "static" variables in lib/RegisterPasses.cpp. Do you think I should translate these local variables into global variables or should I define another option like "-polly-dot-scop" in ScopDetection.cpp?

You can define a new option -polly-detect-collect-errors that enables the error tracking. Adding cl::location to this option allows you to
store the option value externally. You can use this to automatically
set this option, in case in lib/RegisterPasses.cpp in case -polly-show, -polly-show-only, ... have been set.

Cheers,
Tobias

Hi Tobias,

I have attached a patch file to optimize string operations in Polly-Detect pass.
In this patch file, I put most of long string operations in the condition variable of “PollyViewMode” or in the DEBUG mode.

Bests,
Star Tan

>On 07/22/2013 01:46 AM, Star Tan wrote:
>>> I propose two more patches:
>>>
>>> 	1) Transform the INVALID macro into function calls, that format
>>>             the text and that set LastFailure.
>> Translating the INVALID macro into function calls would complicate the operations for different counters.
>> For example, currently users can add a new counter by simply adding the following line:
>> BADSCOP_STAT(SimpleLoop, "Loop not in -loop-simplify form");
>> But if we translate INVALID macro into function calls, then users has to add a function for this counter:
>> void INVLIAD_SimpleLoop (...).                                                    \
>
>        ^^ No uppercase in function names.
>
>> This is because we cannot use the following macro combination in function calls:
>>      if (!Context.Verifying)                      \
>>        ++Bad##NAME##ForScop;
>> So, I do not think it is necessary to translate the INVALID macro into function calls.
>> Do you still think we should translate INVALID macro into a serial of functions like "invalid_CFG, invalid_IndVar, invalid_IndEdge, ...  ? In that case, I could provide a small patch file for this purpose -:)
>
>I think it would still be nice to get rid of this macro. We could 
>probably have a default function that takes an enum to report different
>errors in the reportInvalid(enum errorKind) style. And then several 
>others that would allow more complex formatting (e.g. 
>reportInvalidAlias(AliasSet)). Especially the code after 'isMustAlias()' 
>would be nice to move out of the main scop detection.
>
>However, this issue is not directly related to the speedup work, so
>you are welcome to skip it for now.
>
>(Btw. thanks for not blindly following my suggestions!)
>
>>>          2) Add checks at the beginning of those function calls and
>>>             continue only if LogErrors is set
>> Those invalid log strings are used for two separate cases:
>> 1) The first case is for detailed debugging, which is controlled by the macro DEBUG(dbgs() << MESSAGE). In such a case, string operations will automatically skipped in normal execution mode with the following if-statement:
>> if (::llvm::DebugFlag && ::llvm::isCurrentDebugType(TYPE))
>> That means string operations controlled by DEBUG will not execute in normal case, so we should not worry about it.
>> 2) The other case is for the variable "LastFailure", which is used only in GraphPrinter. Currently string operations for "LastFailure" always execute in normal cases.  My idea is to put such string operations under the condition of "GraphPrinter" mode. For example, I would like to translate the "INVALID" macro into:
>> #define INVALID(NAME, MESSAGE)                                                 \
>>    do {                                                                         \
>>      if (GraphViewMode) {                                                       \
>>        std::string Buf;                                                         \
>>        raw_string_ostream fmt(Buf);                                             \
>>        fmt << MESSAGE;                                                          \
>>        fmt.flush();                                                             \
>>        LastFailure = Buf;                                                       \
>>      }                                                                          \
>>      DEBUG(dbgs() << MESSAGE);                                                  \
>>      DEBUG(dbgs() << "\n");                                                     \
>>      assert(!Context.Verifying &&#NAME);                                        \
>>      if (!Context.Verifying)                                                    \
>>        ++Bad##NAME##ForScop;                                                    \
>>    } while (0)
>
>Looks good.
>
>> As you have suggested, we can construct the condition GraphViewMode with "-polly-show", "-polly-show-only", "polly-dot" and "polly-dot-only". However, I see all  these options  are defined as "static" variables in lib/RegisterPasses.cpp.  Do you think I should translate these local variables into global variables or should I define another option like "-polly-dot-scop" in ScopDetection.cpp?
>
>You can define a new option -polly-detect-collect-errors that enables 
>the error tracking. Adding cl::location to this option allows you to
>store the option value externally. You can use this to automatically
>set this option, in case in lib/RegisterPasses.cpp in case -polly-show, 
>-polly-show-only, ... have been set.
>
>Cheers,
>Tobias

0001-ScopDetect-Optimize-compile-time-cost-for-log-string.patch (9.26 KB)

Hi Tobias,

I have attached a patch file to optimize string operations in Polly-Detect pass.
In this patch file, I put most of long string operations in the condition variable of "PollyViewMode" or in the DEBUG mode.

OK.

From 448482106e8d815afa40e4ce8543ba3f6f0237f1 Mon Sep 17 00:00:00 2001
From: Star Tan <tanmx_star@yeah.net>
Date: Mon, 22 Jul 2013 23:48:45 -0700
Subject: [PATCH] ScopDetect: Optimize compile-time cost for log string
operations.

String operatioins resulted by raw_string_ostream in the INVALID macro can lead
to significant compile-time overhead when compiling large size source code.
This is because raw_string_ostream relies on TypeFinder class, whose
compile-time cost increases as the size of the module increases. This patch
targets to avoid calling TypeFinder in normal case, so TypeFinder class is only
called in DEBUG mode with DEBUG macro or in PollyView mode.

With this patch file, the relative compile-time cost of Polly-detect pass does
not increase even compiling very large size source code.
---
include/polly/Options.h | 3 ++
include/polly/ScopDetection.h | 6 +++
lib/Analysis/ScopDetection.cpp | 93 +++++++++++++++++++++++-------------------
lib/RegisterPasses.cpp | 22 ++++++----
4 files changed, 75 insertions(+), 49 deletions(-)

diff --git a/include/polly/Options.h b/include/polly/Options.h
index 62e0960..733edd0 100644
--- a/include/polly/Options.h
+++ b/include/polly/Options.h
@@ -17,4 +17,7 @@
#include "llvm/Support/CommandLine.h"

extern llvm::cl::OptionCategory PollyCategory;
+namespace polly {
+ extern bool PollyViewMode;
+}
#endif
diff --git a/include/polly/ScopDetection.h b/include/polly/ScopDetection.h
index 6ee48ee..5a5d7d1 100755
--- a/include/polly/ScopDetection.h
+++ b/include/polly/ScopDetection.h
@@ -145,6 +145,12 @@ class ScopDetection : public FunctionPass {
   /// @return True if the call instruction is valid, false otherwise.
   static bool isValidCallInst(CallInst &CI);

+ /// @brief Report invalid alias.
+ ///
+ /// @param AS The alias set.
+ /// @param Context The context of scop detection.
+ void reportInvalidAlias(AliasSet &AS, DetectionContext &Context) const;

Nice.

diff --git a/lib/Analysis/ScopDetection.cpp b/lib/Analysis/ScopDetection.cpp
index 9b2a9a8..4f33f6c 100644
--- a/lib/Analysis/ScopDetection.cpp
+++ b/lib/Analysis/ScopDetection.cpp
@@ -108,11 +108,13 @@ STATISTIC(ValidRegion, "Number of regions that a valid part of Scop");

#define INVALID(NAME, MESSAGE) \
   do { \
- std::string Buf; \
- raw_string_ostream fmt(Buf); \
- fmt << MESSAGE; \
- fmt.flush(); \
- LastFailure = Buf; \
+ if (PollyViewMode) { \

I believe this variable should describe what we do, rather than if a
a certain user of this feature is enabled.

Something like

if (TrackFailures) {

}

+void ScopDetection::reportInvalidAlias(AliasSet &AS,
+ DetectionContext &Context) const {

It is great that you extracted this function.

+ std::string Message;
+ raw_string_ostream OS(Message);
+
+ if (PollyViewMode || ::llvm::DebugFlag) {

This is a little unsatisfying. We now have two conditions that need to
be in sync. I think you can avoid this, if you rename the function to

std::string ScopDetection::formatInvalidAlias(AliasSet &AS)

and keep the INVALID_ macro at the place where the error happens.

diff --git a/lib/RegisterPasses.cpp b/lib/RegisterPasses.cpp
index 7fc0960..2e25e4d 100644
--- a/lib/RegisterPasses.cpp
+++ b/lib/RegisterPasses.cpp
@@ -125,28 +125,34 @@ static cl::opt<bool> DeadCodeElim("polly-run-dce",
                                   cl::Hidden, cl::init(false), cl::ZeroOrMore,
                                   cl::cat(PollyCategory));

-static cl::opt<bool>
+bool polly::PollyViewMode;
+
+static cl::opt<bool, true>
PollyViewer("polly-show",
             cl::desc("Highlight the code regions that will be optimized in a "
                      "(CFG BBs and LLVM-IR instructions)"),
- cl::init(false), cl::ZeroOrMore, cl::cat(PollyCategory));
+ cl::location(polly::PollyViewMode), cl::init(false), cl::ZeroOrMore,
+ cl::cat(PollyCategory));

-static cl::opt<bool>
+static cl::opt<bool, true>
PollyOnlyViewer("polly-show-only",
                 cl::desc("Highlight the code regions that will be optimized in "
                          "a (CFG only BBs)"),
- cl::init(false), cl::cat(PollyCategory));
+ cl::location(polly::PollyViewMode), cl::init(false),
+ cl::cat(PollyCategory));

-static cl::opt<bool>
+static cl::opt<bool, true>
PollyPrinter("polly-dot", cl::desc("Enable the Polly DOT printer in -O3"),
              cl::Hidden, cl::value_desc("Run the Polly DOT printer at -O3"),
- cl::init(false), cl::cat(PollyCategory));
+ cl::location(polly::PollyViewMode), cl::init(false),
+ cl::cat(PollyCategory));

-static cl::opt<bool> PollyOnlyPrinter(
+static cl::opt<bool, true> PollyOnlyPrinter(
     "polly-dot-only",
     cl::desc("Enable the Polly DOT printer in -O3 (no BB content)"), cl::Hidden,
     cl::value_desc("Run the Polly DOT printer at -O3 (no BB content"),
- cl::init(false), cl::cat(PollyCategory));
+ cl::location(polly::PollyViewMode), cl::init(false),
+ cl::cat(PollyCategory));

Sorry. Having all options storing their value in the very same location
does not look right.

When I was talking about cl::location() I rather ment that you introduce
a new option -polly-detect-track-failures that can also be set by an
cl::location.

Another alternative is that you add a parameter to Pass
*polly::createScopDetectionPass() { return new ScopDetection(); } which
can be used to enable the failure tracking.

Cheers,
Tobias

Is there any way to make the debug messages themselves more efficient?
Perhaps reimplementing it so that any table lookups are quick and no malloc/free is done in the process.

James

Hi Tobias,

I have attached a patch file to optimize string operations in
Polly-Detect pass.
In this patch file, I put most of long string operations in the condition
variable of "PollyViewMode" or in the DEBUG mode.

OK.

Hi James,

Is there any way to make the debug messages themselves more efficient?

Yes, there are two ways:

1) Use the getName() function instead of the ostream writer

This is significantly faster than what we have today, but does not format unnamed instructions correctly (It does not create the %123 instruction namings). Even though this is fast, it is probably still
slower than not doing any formatting at all.

2) Fix the AssemblyWriter as decribed by Daniel Berlin

There seems to be a larger issue in the printing infrastructure. It does
not seem to be written to print values many times.

Perhaps reimplementing it so that any table lookups are quick and no
malloc/free is done in the process.

Citing Daniel:

"The real fix is either to stop recreating these AssemblyWriter
objects, or improve caching in the bowels of it so that it doesn't
need to rerun typefinder again and again if nothing has changed."

So there is something that needs to be fixed when using the ostream formatter as we do. However, I do not think we should do this in this patch. Doing any kind of debug message formatting in the normal pass is unnecessary overhead that we should not have at all. The change Star proposes, moves this overhead out of the hot path, such that for debug messages we can now prioritize understandable formatting over performance.

Cheers,
Tobias