How to get information about data dependencies?

Hello,

I have a question similar to the one in the link below.

http://lists.llvm.org/pipermail/llvm-dev/2015-January/080589.html

In addition, if we can't get RAW (read-after-write) dependence information using the provided LLVM tools, how would I write a pass for it? Are there any helpful resources?

Thanks,
Jiho

LLVM has multiple dependence analyses, each with its own upsides and downsides:

* llvm::DependenceAnalysis
* llvm::LoopAccessAnalysis
* llvm::MemorySSA
* polly::DependenceInfo

Just running the analysis passes does not change anything; an analysis
has to be used by a transformation pass. Since there is no common
interface for dependence analysis, each transformation pass expects a
specific analysis and runs it itself if necessary. For instance,
LoopVectorize uses LoopAccessAnalysis.

None of these is well documented(*); I suggest looking at the sources
of the transformation passes that use them.
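
To get a feel for the API, here is a minimal sketch (not taken from any
in-tree pass; the pass name and structure are illustrative only) of a
new-pass-manager function pass that asks llvm::DependenceAnalysis for
RAW (flow) dependences, which is what the original question was about:

```cpp
#include "llvm/Analysis/DependenceAnalysis.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

struct PrintRAWDeps : PassInfoMixin<PrintRAWDeps> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
    DependenceInfo &DI = FAM.getResult<DependenceAnalysis>(F);
    // Query every (write, read) instruction pair; a returned Dependence
    // with isFlow() set is a RAW (flow) dependence.
    for (Instruction &Src : instructions(F)) {
      if (!Src.mayWriteToMemory())
        continue;
      for (Instruction &Dst : instructions(F)) {
        if (!Dst.mayReadFromMemory())
          continue;
        if (auto D = DI.depends(&Src, &Dst, /*PossiblyLoopIndependent=*/true))
          if (D->isFlow())
            errs() << "RAW: " << Src << " -> " << Dst << "\n";
      }
    }
    return PreservedAnalyses::all();
  }
};
```

For a quick look without writing any code, the analysis printer can be
run directly, e.g. `opt -analyze -da input.ll` with the legacy pass
manager, or `opt -passes='print<da>' -disable-output input.ll` with the
new one.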

Michael

(*) For MemorySSA, there is https://www.llvm.org/docs/MemorySSA.html

Michael Kruse via llvm-dev <llvm-dev@lists.llvm.org> writes:

LLVM has multiple dependence analyses, each with its own upsides and downsides:

* llvm::DependenceAnalysis
* llvm::LoopAccessAnalysis

Can someone explain the differences between these? As far as I can tell
they essentially do the same thing (though perhaps one is more
precise?). LAA seems to be used by vectorization (what else?) while DA
seems to be used by loop transformations (what else?).

I am not a loop opt guy so while I'm familiar with the basic ideas, the
details are somewhat lost on me. Is there a reason to have two passes
or should they be combined and maintained as one pass? If I have need
of dependence analysis it's not clear which I should use and the
comments are not much help.

                   -David

AFAIK they are independent developments. LoopAccessAnalysis was
extracted out of the LoopVectorizer in 2015, and first developed in
2013 (https://github.com/llvm/llvm-project/commit/d517976758c8674bdcd4c74457f7a83f20e432c5)

DependenceAnalysis was a from-scratch implementation from 2012
(https://github.com/llvm/llvm-project/commit/59b61b9e2c549956b1094417a72c3943c20c9234)

I do not know why LoopVectorize would not make use of
DependenceAnalysis in 2013.

Michael

Michael Kruse via llvm-dev <llvm-dev@lists.llvm.org> writes:

AFAIK they are independent developments. LoopAccessAnalysis was
extracted out of the LoopVectorizer in 2015, and first developed in
2013 (https://github.com/llvm/llvm-project/commit/d517976758c8674bdcd4c74457f7a83f20e432c5)

DependenceAnalysis was a from-scratch implementation from 2012
(https://github.com/llvm/llvm-project/commit/59b61b9e2c549956b1094417a72c3943c20c9234)

I do not know why LoopVectorize would not make use of
DependenceAnalysis in 2013.

Is anyone looking at unifying these? It's super confusing as things
stand.

                      -David

Hi,

It happens that I'm working on dependence analysis this summer, for outer-loop vectorization. I have studied both LAA and DA a bit.

Their most important difference is that DA is used for compile-time / static checks while LAA is mainly used for generating run-time checks.

Now, more to their core: DA is based on a strong theoretical background and has a phenomenally clean implementation
(basically, once you have understood the paper, which is written like a textbook chapter, since it essentially is one, you have
understood almost all of DA).
LAA, on the other hand, is a very hacky implementation, and it takes quite some time just to understand what it tries to do.
It's hard to read. Note though that I know too little to criticize; this is just how it seems to me.

It doesn't seem that strange to me that LAA had to be developed: if you look at its test suite, i.e. the problems it had to solve,
for a lot of them DA just gives up, while LAA generates run-time checks. Now, as to why it had to be implemented
the way it was, I would very much like to learn.
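
A hypothetical example of that situation (not from the test suite; the function and names are made up):

```cpp
// A and B are incoming pointers, so nothing proves at compile time that
// they don't overlap. DA reports a "confused" (unknown) dependence and
// gives up; LAA instead emits a runtime overlap check, roughly
// (A + n <= B || B + n <= A), and guards a vectorized loop with it.
void axpy(float *A, float *B, int n) {
  for (int i = 0; i < n; ++i)
    A[i] += 2.0f * B[i];
}
```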

Now, as for unifying them: if we mean something other than just putting them in the same file, I don't think it can happen.
IMHO they're much further apart than it initially seems.

Kind regards,
Stefanos

On Tue, Jul 7, 2020 at 6:16 PM, David Greene via llvm-dev <llvm-dev@lists.llvm.org> wrote:

Stefanos Baziotis via llvm-dev <llvm-dev@lists.llvm.org> writes:

Their most important difference is that DA is used for compile-time /
static checks while LAA is mainly used for generating run-time checks.

Ah, that's important information I didn't have. Thank you!

Now, as for unifying them: if we mean something other than just putting
them in the same file, I don't think it can happen.
IMHO they're much further apart than it initially seems.

But yet they are intimately related in that the kind of information you
want to know statically and dynamically is the same. I wonder what it
would take to extend DA to generate runtime checks if it can't prove
independence.

The thing I fear is one or the other being enhanced to resolve more
things statically without the other getting the same improvements. Then
some passes benefit and others don't, and it won't be clear why. The
same could happen with enhancements to dynamic checking (if it were added
to DA).

                       -David

Ah, that’s important information I didn’t have. Thank you!

No problem, glad to help!

To the rest of your thoughts, I certainly agree. One interesting question is why LAA
didn't use DA at all. Other than that, note that LAA is quite specialized, namely for
loop vectorization. Actually, it's even more specific: for innermost-loop vectorization.

That affects the design. It might have been easier to create this specialized tool than
to extend a general one (whether that was a good path to follow is another topic).

But yet they are intimately related in that the kind of information you
want to know statically and dynamically is the same. I wonder what it
would take to extend DA to generate runtime checks if it can’t prove
independence.

Indeed, but again, IMHO unifying them is neither easy nor sensible.
They do fundamentally the same thing, but their directions are very different.

So, I see two options:

a) As you said

I wonder what it would take to extend DA to generate runtime checks if it can’t prove independence.

Personally, I see potential, but I don't know what it would take either. Since this is something I'm
currently thinking about, I would be more than interested to discuss it extensively.

In any case, I would strongly prefer that we don't follow the LAA path, since I don't think it has potential
anyway. I think that we should try to find a way to extend DA that is also based on a strong theoretical foundation
and maintains the high quality of the code.

b) Extend LAA to do static checks

The question here though is: why do that? As I said, it doesn't seem to have potential, and I believe that people
working on vectorizers (either LLVM's current one or external ones like RV and VPlan) don't think so either.

Tell me what you think and I’m looking forward to more people jumping in.

- Stefanos

On Tue, Jul 7, 2020 at 11:11 PM, David Greene <david.greene@hpe.com> wrote:

Stefanos Baziotis via llvm-dev <llvm-dev@lists.llvm.org> writes:

In any case, I would strongly prefer that we don't follow the LAA
path, since I don't think it has potential anyway. I think that we
should try to find a way to extend DA that is also based on a strong
theoretical foundation and maintains the high quality of the code.

Just to be clear, you don't think it has potential because it's been
designed into an inner-loop corner and would take extensive rewriting to
handle OLV (outer-loop vectorization)?

Tell me what you think and I'm looking forward to more people jumping in.

If my supposition above is correct, it certainly seems like extending DA
is the way to go but I'd like to hear from the current vectorizer
maintainers because I don't have enough knowledge to make an informed
judgment.

There's the VPlan infrastructure which I have not heard much about for
several months. What is going on with that? Yes, that's a vector
codegen issue but it may be useful to have a more complete picture of
how all this works or will work.

                   -David

Their most important difference is that DA is used for compile-time / static checks while LAA is mainly used for generating run-time checks.

Note that when the development of LAA started
(https://github.com/llvm/llvm-project/commit/d517976758c8674bdcd4c74457f7a83f20e432c5)
it also did static checks only, even though DA already existed in the
code base (https://github.com/llvm/llvm-project/blob/d517976758c8674bdcd4c74457f7a83f20e432c5/llvm/lib/Analysis/DependenceAnalysis.cpp).

Now, more to their core: DA is based on a strong theoretical background and has a phenomenally clean implementation
(basically, once you have understood the paper, which is written like a textbook chapter, since it essentially is one, you have
understood almost all of DA).
LAA, on the other hand, is a very hacky implementation, and it takes quite some time just to understand what it tries to do.
It's hard to read. Note though that I know too little to criticize; this is just how it seems to me.

It doesn't seem that strange to me that LAA had to be developed: if you look at its test suite, i.e. the problems it had to solve,
for a lot of them DA just gives up, while LAA generates run-time checks. Now, as to why it had to be implemented
the way it was, I would very much like to learn.

Now, as for unifying them: if we mean something other than just putting them in the same file, I don't think it can happen.
IMHO they're much further apart than it initially seems.

Thanks for sharing your analysis.

Michael

Hi,

> Ah, that's important information I didn't have. Thank you!

No problem, glad to help!

To the rest of your thoughts, I certainly agree. One interesting question is why LAA
didn't use DA at all. Other than that, note that LAA is quite specialized, namely for
loop vectorization. Actually, it's even more specific: for innermost-loop vectorization.
That affects the design. It might have been easier to create this specialized tool than
to extend a general one (whether that was a good path to follow is another topic).

> But yet they are intimately related in that the kind of information you
> want to know statically and dynamically is the same. I wonder what it
> would take to extend DA to generate runtime checks if it can't prove
> independence.

Indeed, but again, IMHO unifying them is neither easy nor sensible.
They do fundamentally the same thing, but their directions are very different.

So, I see two options:

a) As you said

> I wonder what it would take to extend DA to generate runtime checks if it can't prove independence.

Personally, I see potential, but I don't know what it would take either. Since this is something I'm
currently thinking about, I would be more than interested to discuss it extensively.

In any case, I would strongly prefer that we don't follow the LAA path, since I don't think it has potential
anyway. I think that we should try to find a way to extend DA that is also based on a strong theoretical foundation
and maintains the high quality of the code.

I am not sure that is an entirely fair characterization of LAA. LAA has been used by the vectorizer (and other passes) in production for a few years now. None of the in-tree users of DA seem to be enabled by default, and therefore LAA probably has an order of magnitude more testing, bug fixes & tuning.

DA’s implementation might be cleaner, but as mentioned earlier, DA handles only a small subset of things LAA handles and hence I am not sure comparing the code-complexity is too helpful.
IMO a lot of LAA complexity comes from things DA does not handle, in particular runtime check generation. LAA also analyses & processes a whole loop and decides whether it is profitable to generate runtime checks, whereas DA only checks dependences between 2 memory accesses.
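
To make the "whole loop" contrast concrete, here is a rough sketch of a new-pass-manager loop pass consuming LAA's per-loop summary (the exact way the LoopAccessInfo is obtained has changed across LLVM releases, so treat this as an approximation; the pass name is made up):

```cpp
#include "llvm/Analysis/LoopAccessAnalysis.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"

using namespace llvm;

struct QueryLAA : PassInfoMixin<QueryLAA> {
  PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
                        LoopStandardAnalysisResults &AR, LPMUpdater &) {
    const LoopAccessInfo &LAI = AM.getResult<LoopAccessAnalysis>(L, AR);
    // One verdict for the whole loop, plus the runtime checks that would
    // have to guard a vectorized version of it.
    if (LAI.canVectorizeMemory())
      errs() << "memory-safe to vectorize with "
             << LAI.getNumRuntimePointerChecks() << " runtime check(s)\n";
    return PreservedAnalyses::all();
  }
};
```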

There is definitely potential for improving the structure & organization of LAA, as well as improving the documentation. Happy to collaborate on that.

I am not convinced it makes sense to add runtime check generation to DA directly, because I don't think the static dependence checks really need to be strongly coupled with runtime-check generation.

b) Extend LAA to do static checks

The question here though is: why do that? As I said, it doesn't seem to have potential, and I believe that people
working on vectorizers (either LLVM's current one or external ones like RV and VPlan) don't think so either.

To clarify, LAA does static checks and only generates runtime checks if it cannot prove statically that the dependence is safe for vectorization. Granted, the static checks mostly boil down to distance computations on SCEV expressions, but for the current use cases it seems to work well enough.
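
A concrete (purely illustrative) instance of such a distance computation:

```cpp
// The store to A[i + 2] and the load of A[i] are separated by a constant
// dependence distance of 2 elements, which falls out of the SCEVs of the
// two pointers. That statically proves that vectorizing with a factor of
// at most 2 is safe, with no runtime check required.
void shift(float *A, int n) {
  for (int i = 0; i < n - 2; ++i)
    A[i + 2] = A[i] + 1.0f;
}
```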

It might be feasible to use DA for the static checks in LAA. That might help for a few multi-dimensional cases, but in practice generating proper runtime-checks for multi-dimensional cases is probably more important, due to aliasing issues.

Cheers,
Florian

Just to be clear, you don't think it has potential because it's been
designed into an inner-loop corner and would take extensive rewriting to
handle OLV (outer-loop vectorization)?

First of all, if one wants to do dependence analysis for OLV (which is what I’m currently working on), they may get away
with extending LAA, or writing their own OLV checker.

That said, no, it doesn't seem that the current LAA can help. But the fact that we can't support OLV currently
is not the core problem for me. We may get there eventually, but if the implementation is still hacky, it
would still be bad, I think. The core problem is that there is no clear theoretical foundation supporting it.

The code seems to me like a series of "oh, we needed to handle this case, so we added an if there" patches,
which to me is not the right way to go, and is pretty much the opposite of DA.

(Again, I don't mean to criticize the LAA implementors - they probably had their reasons, and I guess
they know way more than I do.)

it certainly seems like extending DA is the way to go

Maybe, but maybe not. Maybe a good theory for run-time checks ends up
being different from DA's ideas (this is partly what I have to work out this summer).
In any case, extending

but I’d like to hear from the current vectorizer
maintainers because I don’t have enough knowledge to make an informed
judgment.

Me too!

There’s the VPlan infrastructure which I have not heard much about for
several months. What is going on with that? Yes, that’s a vector
codegen issue but it may be useful to have a more complete picture of
how all this works or will work.

Sorry, I don't know much about VPlan. I'm involved in RV (https://github.com/cdl-saarland/rv).

Note that when the development of LAA started it also did static checks only, even though DA already existed in the code base

Interesting, thanks.

Thanks for sharing your analysis.

No problem :)

I am not sure that is an entirely fair characterization of LAA. LAA has been used by the vectorizer (and other passes) in production for a few years now. None of the in-tree users of DA seem to be enabled by default, and therefore LAA probably has an order of magnitude more testing, bug fixes & tuning.

No argument there. The fact that it is used doesn't necessarily mean it's clean, nor that it has a strong theory supporting it.

DA’s implementation might be cleaner, but as mentioned earlier, DA handles only a small subset of things LAA handles and hence I am not sure comparing the code-complexity is too helpful.

DA does not handle a small subset of LAA's checks, unless I'm missing something; it handles way more when it comes to static checking.
I think that comparing code complexity is important. DA is about double the size of LAA, yet it's far more understandable, and I don't
think the reason is that it does something more trivial. Rather, it's based on a clear paper and implements it faithfully.

IMO a lot of LAA complexity comes from things DA does not handle, in particular runtime check generation.

I agree.

LAA also analyses & processes a whole loop and decides whether it is profitable to generate runtime checks, whereas DA only checks dependences between 2 memory accesses.

It processes innermost loops only, and I'm not sure that handling a whole loop rather than independent accesses is a good path. For LAA's usage it's necessary, but it creates a form of coupling (and complexity).

There is definitely potential for improving the structure & organization of LAA, as well as improving the documentation. Happy to collaborate on that.

Are we really sure of that? Personally, I was thinking of submitting a patch, but I'm not sure it is worth the effort. However, I'm glad to hear that you're happy to collaborate. :)
We can talk about that more if you want.

I am not convinced it makes sense to add runtime check generation to DA directly, because I don't think the static dependence checks really need to be strongly coupled with runtime-check generation.

I agree with the latter, and maybe with the former. In any case, I'd like to see the current DA stay as it is, and to move the discussion to "what is the future of run-time checks?", whether that
is extending LAA, extending DA, or something else entirely.

To clarify, LAA does static checks and only generates runtime checks if it cannot prove statically that the dependence is safe for vectorization. Granted, the static checks mostly boil down to distance computations on SCEV expressions, but for the current use cases it seems to work well enough.

Yes, sorry for not stressing that LAA does static checks too. As I said though, in my understanding they're quite weak, though still enough for its usage. And I agree that LAA's capabilities are probably enough for innermost-loop vectorization.
The important thing, I believe, is the future.

It might be feasible to use DA for the static checks in LAA. That might help for a few multi-dimensional cases, but in practice generating proper runtime-checks for multi-dimensional cases is probably more important, due to aliasing issues.

Well… I tried that, and unfortunately it doesn't seem to be very useful. The way arrays are defined in C/C++ is probably why DA is not that useful here: namely, one row of a 2-D array can alias another row. The theory behind DA
is quite powerful when we know that they don't alias; right now, it just gives up.

I don’t think that LAA can handle multi-dimensional cases either though, nor do I have a good idea about how to do it myself (in or out of LAA / DA).

Best,
Stefanos

On Wed, Jul 8, 2020 at 12:48 AM, Florian Hahn <florian_hahn@apple.com> wrote:

Stefanos Baziotis via llvm-dev <llvm-dev@lists.llvm.org> writes:

Well... I tried that, and unfortunately it doesn't seem to be very
useful. The way arrays are defined in C/C++ is probably why DA is not
that useful here: namely, one row of a 2-D array can alias another
row. The theory behind DA is quite powerful when we know that they
don't alias; right now, it just gives up.

Note that the situation is very different in flang, where good
dependence analysis becomes much more critical.

                      -David

From what I know about Fortran (which is next to nothing), it provides aliasing guarantees, e.g. that different rows
don't alias, so yes, this is important there.

A note regarding C/C++, something I had been assuming but only confirmed recently:

Any serious multi-dimensional performance code (e.g. benchmarks) is not written with indirect pointers. E.g., if you want to do operations
on a 3-D array of dimensions n x m x p, the function won't take a ***A. It will take a *A and then index it like A[(i*m + j)*p + k]
(this is the same as with a multi-dimensional static array, whose indexing is lowered into the same thing - a single pointer and GEPs).

Probably the reason is what we're discussing: you get one pointer, which you also mark restrict, and you're done.
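
A sketch of that pattern (hypothetical kernel; the names and dimensions are made up):

```cpp
// One restrict-qualified base pointer with the 3-D subscript linearized
// by hand; an access A[i][j][k] into a static float A[n][m][p] array is
// lowered to the same single-pointer arithmetic.
void scale(float *__restrict A, int n, int m, int p) {
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j)
      for (int k = 0; k < p; ++k)
        A[(i * m + j) * p + k] *= 2.0f;
}
```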

I'm (partly) in the middle of investigating DA's role in LLVM; hopefully something good will come out of it.

- Stefanos

On Wed, Jul 15, 2020 at 10:17 PM, David Greene <david.greene@hpe.com> wrote:

There was an RFC on the mailing list to support a GetElementPtr with
dynamic array sizes:
http://lists.llvm.org/pipermail/llvm-dev/2019-July/134063.html

We also had a discussion about whether `A[i][j]` can be lowered to a
GetElementPtr with the `inrange` keyword on the subscripts, including for
statically sized arrays.

Michael

Thanks for the info. Let me clarify something that might have caused confusion: when I said 2-D arrays above, I meant
double pointers, which are another beast of their own. Basically, my point is that e.g. a[0][1] and a[1][0] could easily
point to the same exact memory, because the row pointers themselves may overlap.
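
A contrived but concrete illustration of this (made-up names):

```cpp
// With a pointer-to-pointer "2-D array", nothing stops two row pointers
// from overlapping, so entries with different subscripts can be the very
// same memory location, and DA has to assume the worst.
float buf[4];
float *rows[2] = {buf, buf + 1};  // rows[1] overlaps rows[0]
float **a = rows;
// Now &a[0][1] == &a[1][0]: both are &buf[1].
```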

Now, returning the discussion to static arrays, which are more realistic anyway in high-performance code, I think I would
be very happy to see GEPs with dynamic sizes. In my current work, I depend on SCEV delinearization to recover the
indices (I guess many people have had to do that, for no good reason).
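
For reference, a rough sketch of that delinearization step. In current trees this is a free function in llvm/Analysis/Delinearization.h; in older releases the same functionality lives on ScalarEvolution, so the exact spelling below may differ for you:

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/Delinearization.h"
#include "llvm/Analysis/ScalarEvolution.h"

using namespace llvm;

// Try to split a linearized access function such as {(i*m + j)*p + k}
// back into the subscripts (i, j, k) and the array sizes (m, p). On
// failure both output vectors are left empty and the caller must bail.
static void recoverSubscripts(ScalarEvolution &SE, const SCEV *AccessFn,
                              const SCEV *ElementSize) {
  SmallVector<const SCEV *, 4> Subscripts, Sizes;
  delinearize(SE, AccessFn, Subscripts, Sizes, ElementSize);
}
```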

We also had a discussion about whether A[i][j] can be lowered to a
GetElementPtr with the inrange keyword on the subscripts, including for
statically sized arrays.

To lower it to one GEP, wouldn’t that require GEPs with dynamic sizes?

Also, to be sure we're on the same page, this is beneficial because, as in the RFC example, if we have
A[x1][y1] and A[x2][y2] with x1 != x2 and y1 != y2, then the two don't alias. Actually, if inrange works
as expected, even if only one of the two conditions holds, the two don't alias.

- Stefanos

On Tue, Jul 21, 2020 at 12:18 AM, Michael Kruse <llvmdev@meinersbur.de> wrote: