How to gather data dependences

I’m currently trying to develop new LLVM Pass that will generate
simple data dependencies graph. For now I’m trying to get familiar
with DependenceAnalysis.
My general idea is to traverse each function (runOnFunction)
top to bottom Instruction by Instruction, using DA.depends( I, I2, …)
on every Instructions combination in function to check if they are
dependent on any others.

Problem is that almost all (if not all) Instructions seems to be dependent
on others even if they write/read to/from different memory cells.
Also I’m getting Dependence (confused) object, not the FullDependence,
should i try to dynamically cast it to FullDependence, or is there a way
to determine which Dependence instance i got other way?

Please help, I’m not an C++ nor LLVM programmer, but PHP-developer,
however i need it done asap. If you can help just a little, give me a hint,
i’ll be very, very thankful for ANY support.

Best Regards
Valmico

Hi,

The DependenceAnalysis pass isn’t reliable yet; it has several errors that need to be corrected. These manifest by the analysis claiming there’s no dependence when one in fact exists.

Your proposed scheme of testing every pair of instructions is asymptotically expensive, requiring O(n^2) tests, where each test is already fairly expensive. I describe (or start to describe) a better approach here.

In the meantime, you should be getting better results than you report. You’ll want to use several other passes in conjunction with DA. Try something like

opt -basicaa -mem2reg -simplifycfg -loop-simplify -loop-rotate -simplifycfg -instcombine -indvars`` -da

Also, a confused dependences mean that the analysis wasn’t able to prove anything; a FullDependence means the analysis was able to prove some facts, though it wasn’t actually able to disprove the dependence. It’s not reasonable to cast a Dependence to a FullDependence.

Preston

Hi,

I assume that DA is tool capable to give me an list of Dependencies in code
or at least a easy way to test instructions without much knowledge about
ZIV SIV, and other tests that has to be done find out dependencies?
Personally i'm an PHP programmer and i feel lack of proper (at least for
me) documentation with good use examples about LLVM, and it's modules like
analysis, transforms etc.

I agree that testing every pair is asymptotically expensive, requiring
O(n^2), but it doesnt matter for me for now.

This is code i came up for now:

virtual bool runOnFunction(Function &F) {

    DependenceAnalysis &DA = this->getAnalysis<DependenceAnalysis>();

    std::vector<Instruction*> instructions;

    for( inst_iterator I = inst_begin(F); I != inst_end(F); I++ ) {
        if( isa<StoreInst>( *I ) || isa<LoadInst>( *I ) ) {
            errs() << "Instruction " << *I << '\n';

            for(std::vector<Instruction*>::iterator it =
instructions.begin(); it != instructions.end(); it++) {
                Dependence *dep = DA.depends(&*I,*it,false);
                if( dep == NULL ) {
                    //errs() << " no dependence." << '\n';
                } else if( dep->isConfused() == true ) {
                    if( dep->isInput() ) {
                        errs() << '\t' << **it << " input dependence." <<
'\n';
                    } else if( dep->isOutput() ) {
                        errs() << '\t' << **it << " output dependence." <<
'\n';
                    } else if( dep->isFlow() ) {
                        errs() << '\t' << **it << " flow dependence." <<
'\n';
                    } else if( dep->isAnti() ) {
                        errs() << '\t' << **it << " anti dependence." <<
'\n';
                    }
                } else if( dep->isConfused() == false ) {
                    errs() << "# full dependence." << '\n';
                    //if( FullDependence *FDep = cast<FullDependence>( dep
) ) {
                    // errs() << " full dep casted" << '\n';
                    //}
                }
            }

            instructions.push_back(&*I);
        }
    }
}

Most likely i'm moving totally in wrong direction, but thats what happens
when people do sth completly blindly. Please dont' bother performance
concerns, it's my least problem.

I assume that DA is tool capable to give me an list of Dependencies in code
or at least a easy way to test instructions without much knowledge about ZIV SIV,
and other tests that has to be done find out dependencies?

DA exists to support building a dependence graph.
It checks to see if two instructions may refer to the same location
in memory, working hard to understand array references.
Someday the code to build a dependence graph will get written,
but it doesn’t exist yet.

The weakness with your scheme, besides performance, is that
it may report a dependence between two instructions that are
never both executed. For example

void zap(…) {

if (…)

A[0] = …

else

A[0] = …

}

your approach will claim there’s an output dependence between the two stores to A,
but in reality, code that executes one will never execute the other, so there
can be no dependence.

This is code i came up for now:

virtual bool runOnFunction(Function &F) {

DependenceAnalysis &DA = this->getAnalysis();

std::vector<Instruction*> instructions;

for( inst_iterator I = inst_begin(F); I != inst_end(F); I++ ) {
if( isa( *I ) || isa( *I ) ) {
errs() << "Instruction " << *I << ‘\n’;

for(std::vector<Instruction*>::iterator it = instructions.begin(); it != instructions.end(); it++) {
Dependence *dep = DA.depends(&*I,*it,false);
if( dep == NULL ) {
//errs() << " no dependence." << ‘\n’;
} else if( dep->isConfused() == true ) {
if( dep->isInput() ) {
errs() << ‘\t’ << **it << " input dependence." << ‘\n’;
} else if( dep->isOutput() ) {
errs() << ‘\t’ << **it << " output dependence." << ‘\n’;
} else if( dep->isFlow() ) {
errs() << ‘\t’ << **it << " flow dependence." << ‘\n’;
} else if( dep->isAnti() ) {
errs() << ‘\t’ << **it << " anti dependence." << ‘\n’;
}
} else if( dep->isConfused() == false ) {
errs() << “# full dependence.” << ‘\n’;
//if( FullDependence *FDep = cast( dep ) ) {
// errs() << " full dep casted" << ‘\n’;
//}
}
}

instructions.push_back(&*I);
}
}
}

I’m sure an LLVM stud to do a cleaner job, but your code looks plausible.
I’ll note that anything you do for a confused dependence makes sense for a full dependence.
So you can test for input, output, flow, etc.
If you look at the code in lib/Analysis/DependenceAnalysis.cpp
you can see how I exercise DA (in dumpExampleDependence)
and how I print the info associated with a dependence (in Dependence::dump)

Preston

Hello, thanks to your advices now my pass is on good way (i hope), but i’ve faced one problem that i cannot solve by myself:
Running all these passes (-basicaa -mem2reg -simplifycfg -loop-simplify -loop-rotate -simplifycfg -instcombine -indvars -da) helped a lot, but now i’m unable to find dependencies that are outside of the loop. f.eg. code like this returns no dependencies (and no store/load instruction at all).

int nonLoopDep( int A, int B, int C ) {
A = B + C;
B = A / C;
return B;
}

I’ve figured out that removing -mem2reg and -instcombine helps a little but it reports more dependencies than it should and also it messes up looking for dependencies inside loops completly.

The dependence analyzer helps with array references in loops.
The example you show,

int nonLoopDep( int A, int B, int C ) {
A = B + C;
B = A / C;
return B;
}

has no loops and no array refs, so dependence analysis won’t help.
But that’s ok, because these cases (scalar references) are particularly
simple and well understood by the rest of the compiler.

Unfortunately, there’s more for you to learn and I don’t really know
all the answers (in LLVM terms). LLVM converts everything to SSA form
(static single assignment), making it easy to chase from the use of a variable
to its definitions.

If we put your example in a file called z.c
and compile it like this,

clang -O -S -emit-llvm z.c

we’ll get a file z.s with the IL after optimization:

define i32 @nonLoopDep(i32 %A, i32 %B, i32 %C) #0 {

entry:

%add = add nsw i32 %C, %B

%div = sdiv i32 %add, %C

ret i32 %div

}

Note that each name (%A, %B, %C, %add, and %div) has exactly
one definition but may have multiple uses (e.g., %C).
If you zip through the code and record the location of each definition,
then whenever you see a use, you can immediately see where it was defined.

Note that the procedure entrance serves as a definition point for the parameters %A, %B, and %C.
Note also that your original variable B had 2 distinct definitions, and the compiler has
recognized this and given one of them a new name, %div.

It’s a little more complicated when multiple definitions can reach a single use.
For example, if we compile this code

int dumb( int A, int B, int C ) {

if (A == B)

A = C + 5;

else

A = C / 5;

return A + B;

}

we get

define i32 @dumb(i32 %A, i32 %B, i32 %C) #0 {

entry:

%cmp = icmp eq i32 %A, %B

br i1 %cmp, label %if.then, label %if.else

if.then: ; preds = %entry

%add = add nsw i32 %C, 5

br label %if.end

if.else: ; preds = %entry

%div = sdiv i32 %C, 5

br label %if.end

if.end: ; preds = %if.else, %if.then

%A.addr.0 = phi i32 [ %add, %if.then ], [ %div, %if.else ]

%add1 = add nsw i32 %A.addr.0, %B

ret i32 %add1

}

Note that 2 different definitions of A reach the use of A after the if statement.
In the LLVM IR, this situation is handled with a phi statement.
So what was originally A has become %A, %add, %div, and %A.addr.0
where the last is defined by a phi serving to join two of the names.

It’s not actually very complicated to use; if you play with a few more examples
(be sure to try some loops) and trace out all the use-def connections,
I think it’ll become clear.

Preston