A case where llvm created different cfg for same code

Message: 4
Date: Tue, 12 Aug 2008 13:23:52 -0700
From: "Bill Wendling" <isanbard@gmail.com>
Subject: Re: [LLVMdev] A case where llvm created different cfg for
  same code
To: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Message-ID:
  <16e5fdf90808121323g1ae2a2e3lb6c5bd62521df621@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

> Hi,
>
> The following two segments of code are actually the same,
> but llvm created different cfg for them.
>
    

...
  

>
> The prime difference is that: cfg of form2 has additional basic block
> which has a back edge to a non-header-block
> I think the loop in that cfg is not canonical.
>
> I tried -loopsimplify and -indvars , but no improvement.
>
> Any comments for this? Thanks in advance.
>
    

The code is not the same. 7 is commented out in the first and 8 in the
second. Why would you expect the CFGs to match? Is there something
wrong (inefficient) in the resulting output of these two functions
when compiled?
  

7 for(i=0; i<j && i+j+1<N; i++) {

8 for(i=0; i<j && i<N-j-1; i++) {

Line 7 and Line 8 actually have the same expression,
Line 8 just move the "j+1" to the right hand of the inequation.

I just wonder why the two equal inequation result in different cfg, the
additional basicblock, backedge...
Surely the two cfg are all correct, but just the additional backedge
makes it hard for loop pipeline.

Can we improve it?

The prime difference is that: cfg of form2 has additional basic block
which has a back edge to a non-header-block
I think the loop in that cfg is not canonical.

I tried -loopsimplify and -indvars , but no improvement.

Any comments for this? Thanks in advance.

The code is not the same. 7 is commented out in the first and 8 in the
second. Why would you expect the CFGs to match? Is there something
wrong (inefficient) in the resulting output of these two functions
when compiled?

7 for(i=0; i<j && i+j+1<N; i++) {

8 for(i=0; i<j && i<N-j-1; i++) {

Line 7 and Line 8 actually have the same expression,

The expressions are logically similar, but they aren't the same to the compiler unless some pass that recognizes these types of polynomial equations and can prove that they are the same is run (I'm not sure if such a pass exists in LLVM). For instance, because you use "j - 1" in 8, you now have a situation where you have a negative number when j == 0, etc.

Line 8 just move the "j+1" to the right hand of the inequation.

I just wonder why the two equal inequation result in different cfg, the
additional basicblock, backedge...
Surely the two cfg are all correct, but just the additional backedge
makes it hard for loop pipeline.

Can we improve it?

For what it's worth, I got the exact same code for both forms when compiling at -O3.

-bw

Hi,

7 for(i=0; i<j && i+j+1<N; i++) {

8 for(i=0; i<j && i<N-j-1; i++) {

the arithmetic might overflow in one of these
but not in the other.

Best wishes,

Duncan.

These expressions overflow in different cases.
They are not equivalent.

Duncan Sands 写道:

Hi,

7 for(i=0; i<j && i+j+1<N; i++) {

8 for(i=0; i<j && i<N-j-1; i++) {
    
the arithmetic might overflow in one of these
but not in the other.
  

ok, finally I found the reason.

First, the llvm-gcc with -O0 will emit same cfg for the two forms of codes.
In the loop of Line 7/8, the "i<j" and "i+j+1<N or i < N-j-1" each occupies one basic block with a exit edge to the same successor basic block.
hence the loop has two exit edges.

Several passes will optimize the cfg and the ll code.

The final difference of the two cfg is due to pass licm and simplifycfg.
The licm will move the common expression out of the loop, like "j+1" and "N-j-1", thus the basic block for "i < N-j-1" will finally has only two instruction: the cmp and the branchInst. basic block for "i+j+1<N" has an addtional "adder" in it.

The pass simplifycfg has a process function :
static bool FoldBranchToCommonDest(BranchInst *BI)
Its comments :
"/// FoldBranchToCommonDest - If this basic block is ONLY a setcc and a branch,
/// and if a predecessor branches to us and one of our successors, fold the
/// setcc into the predecessor and use logical operations to pick the right
/// destination."
So, as the basicblock for "i < N -j-1" has just has cmp and branchinst, it will be folded, and hence get a better cfg.
The basicblock for "i+j+1<N", as having additional "adder", leaves.

So my question is: can we loose the constraint "ONLY a setcc and a branch" ? why it's necessary?