Some small changes to the memory dependence analysis API

Hey folks , I was thinking of making some small, optional-for-callers
only changes to the memdep API, and before I did it, thought i'd post
here and make sure nobody thought it was egregiously stupid.

There are two related issues i'm trying to solve, both of which occur
in uses like GVN, where we are trying to prove equivalence between a
load/store/call with some other load/store/call:

1. Memdep sometimes walks back too far, making it slow.
2. Memdep sometimes walks back too far, making it add dependencies
that are confusing to GVN.

As a motivating example, take this wonderful piece of code from the real world:

_ZN4llvm4castINS_8CallInstEPKNS_5ValueEEENS_10cast_rettyIT_T0_E8ret_typeERKS7_.exit.i.i.i.i.i.i.i.i.i.i.i:
; preds = %entry
  %1 = getelementptr inbounds %"class.llvm::Instruction"* %I, i64 0,
i32 0, i32 0
  %arrayidx.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i = getelementptr inbounds
%"class.llvm::Value"* %1, i64 -1, i32 4
  %2 = load %"class.llvm::Type"**
%arrayidx.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i, align 8
  %SubclassID.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i = getelementptr
inbounds %"class.llvm::Type"* %2, i64 0, i32 1
  %3 = bitcast i32* %SubclassID.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i to i8*
  %4 = load i8* %3, align 1, !tbaa !0
  %cmp.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i = icmp ne i8 %4, 2
  %tobool.i.i.i.i.i.i.i.i.i.i.i.i = icmp eq %"class.llvm::Type"* %2, null
  %or.cond.i.i.i.i.i.i.i.i.i.i.i.i = or i1
%cmp.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i.i,
%tobool.i.i.i.i.i.i.i.i.i.i.i.i
  br i1 %or.cond.i.i.i.i.i.i.i.i.i.i.i.i, label %return, label
%_ZN4llvm3isaINS_14DbgDeclareInstEPNS_11InstructionEEEbRKT0_.exit.i

_ZN4llvm3isaINS_14DbgDeclareInstEPNS_11InstructionEEEbRKT0_.exit.i: ;
preds = %_ZN4llvm4castINS_8CallInstEPKNS_5ValueEEENS_10cast_rettyIT_T0_E8ret_typeERKS7_.exit.i.i.i.i.i.i.i.i.i.i.i
  %5 = bitcast %"class.llvm::Type"* %2 to %"class.llvm::Function"*
  %call1.i.i.i.i.i.i.i.i.i.i.i.i = tail call i32
@_ZNK4llvm8Function14getIntrinsicIDEv(%"class.llvm::Function"* %5)
nounwind readonly
  %cmp.i.i.i.i.i.i.i = icmp ne i32 %call1.i.i.i.i.i.i.i.i.i.i.i.i, 142
  %tobool = icmp eq %"class.llvm::Instruction"* %I, null
  %or.cond = or i1 %cmp.i.i.i.i.i.i.i, %tobool
  br i1 %or.cond, label
%_ZN4llvm3isaINS_12DbgValueInstEPNS_11InstructionEEEbRKT0_.exit.i,
label %return

_ZN4llvm3isaINS_12DbgValueInstEPNS_11InstructionEEEbRKT0_.exit.i: ;
preds = %_ZN4llvm3isaINS_14DbgDeclareInstEPNS_11InstructionEEEbRKT0_.exit.i
  %6 = bitcast %"class.llvm::Type"* %2 to %"class.llvm::Function"*
  %call1.i.i.i.i.i.i.i.i.i.i.i.i13 = tail call i32
@_ZNK4llvm8Function14getIntrinsicIDEv(%"class.llvm::Function"* %6)
nounwind readonly
  %cmp.i.i.i.i.i.i.i14 = icmp eq i32 %call1.i.i.i.i.i.i.i.i.i.i.i.i13, 143
  %phitmp18 = icmp ne %"class.llvm::Instruction"* %I, null
  %phitmp18. = and i1 %cmp.i.i.i.i.i.i.i14, %phitmp18
  ret i1 %phitmp18.

We are trying to prove the equivalence of the two tail calls.
If you call getNonLocalCallDependency on the second call, you end up
with two dependencies[1].

One of these dependencies we care about, the first tail call, and the
other we don't, the load way back at %4. I could teach GVN that this
particular clobber dependency is irrelevant for this call (and i will
:P). But the reality is we don't even need to be looking back that
far for dependencies.

I'd like to extend the interface for getNonLocallCallDependency
optionally take a set of stopping point instructions (or BB's,
depending on how expensive testing on every instruction ends up
being), and to not look for dependencies back past these points (at
least for whatever path it's currently following. You can't
short-circuit the whole thing because there may be a path backwards to
the same instruction with a different set of dependencies). That way,
it stops at the things we are trying to prove equivalence with. You
ca see how if you start to insert a lot more blocks in between the
"thing we are checking against", and "the next dependency it thinks is
real", you can get into hunting for a lot of dependencies we don't
care about. Besides being faster, short circuiting like this
eliminates more things.

A similar issue occurs with getNonLocalPointerDependency (though there
the issue is slightly different, we have a set of loads we want to
know if this load is equivalent to, and if we walk past those loads
without it seeing a def/clobber, we know the answer is "no", and we
don't need to know the closest deps)

[1] As a side note, whether you get two dependencies or one depends on
whether you remove the bitcast at %6, and replace it with the bitcast
at %5, because getCallSiteDependencyFrom tries to recognize when it
finds two of the same readonly call. The above is actually the output
of a larger testcase that was already run through GVN, but if you run
GVN on it *again*, it will delete the call the second time because it
will now replace the bitcast before calling getNonLocalCallDependency.
Suffice to say, the reason it doesn't do so the first time on the
larger testcase is because of the issue i'm talking about above.
Trying to shove a large amount of value based equivalence logic into
memdep seemed like "not the best approach".