Pointer "data direction"

Hi,

consider the following C function declaration:

    void f(int *in, int *out);

Now further suppose that _in_ is an array that is only read from and
_out_ is an array that is only written to.
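
To make this concrete, a kernel of that shape might look like the sketch below. The name `scale`, the length parameter `n` and the `factor` argument are made up for illustration; the only point is that every access through `in` is a load and every access through `out` is a store. Note that `const` and `restrict` are promises made by the programmer, not something the compiler proves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: `in` is only read, `out` is only written.
   `const` documents the read-only side; `restrict` (C99) promises
   that the two arrays do not overlap. */
void scale(const int *restrict in, int *restrict out, size_t n, int factor)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * factor;   /* load from in, store to out */
}
```

What I am after is a way to recover the "in is read, out is written" information when the declaration carries no such qualifiers.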

Based on this, I was wondering whether there is an existing LLVM pass
(or maybe a part of a pass) that detects those "data directions" for
pointers. I'm not quite sure whether e.g. Alias Analysis can provide
this information (I suppose it *cannot*).

Best regards,
Sebastian

Hi Sebastian,

This kind of analysis is a pretty complex problem in the general case. Consider, for instance, that function “f” contains nested calls to other functions with “side effects”, meaning they could change the contents of “in” or “out” indirectly. For this reason, even current state-of-the-art commercial APIs that rely on strong data analysis (like OpenACC or HMPP) require functions to be free of side effects, because nobody has solved this problem well at compile time.
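
As a made-up illustration of the problem (the names `g_alias`, `set_alias` and `helper`, and the extra length parameter `n`, are all hypothetical): looking at the body of `f` alone, `in` appears to be read-only, yet the nested call stores into it through a global pointer.

```c
#include <assert.h>
#include <stddef.h>

static int *g_alias;   /* hypothetical global that may point into `in` */

void set_alias(int *p) { g_alias = p; }

/* Helper with a side effect: writes through the global pointer. */
static void helper(void)
{
    if (g_alias != NULL)
        *g_alias = -1;
}

/* Viewed in isolation, `in` is only read and `out` is only written... */
void f(int *in, int *out, size_t n)
{
    assert(in != NULL && out != NULL);
    helper();                      /* ...but this may store into in[0] */
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i];
}
```

If the caller has run `set_alias(in)` beforehand, `in[0]` is overwritten inside `f`, so deciding the direction of `in` requires reasoning about every callee and every pointer that might alias it.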

Depending on the purpose of your question, this may or may not help: compared to general analysis, the LLVM community is making much better progress in analysing data access patterns for vectorization and parallelization. There, particular code regions or loops are checked for suitable access patterns. For details on vectorization, you can look into the work by Nadav Rotem, Hal Finkel et al. and the work by Preston Briggs [1]; for details on polyhedral analysis, see the Polly project [2]. These two could be extended further with runtime-assisted data analysis, where knowing the actual values of pointers and index ranges also lets you draw conclusions about read/write modes with respect to particular code regions, as in [3].
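
To give a rough idea of the runtime-assisted variant, here is a minimal sketch. It is not how any of the projects above implement it; the hooks `note_load` and `note_store`, the `track` table and the hand-instrumented kernel are assumptions for illustration. Each buffer is registered with its base address and size, and every load or store that falls into a registered range sets the corresponding flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ACC_NONE = 0, ACC_READ = 1, ACC_WRITE = 2 };

typedef struct {
    uintptr_t base;   /* first byte of the tracked buffer */
    size_t    size;   /* length in bytes */
    unsigned  mode;   /* bitmask of ACC_READ / ACC_WRITE seen so far */
} region_t;

#define MAX_REGIONS 8
static region_t regions[MAX_REGIONS];
static size_t   region_count;

/* Register a buffer; returns its index, or -1 if the table is full. */
int track(const void *base, size_t size)
{
    if (region_count == MAX_REGIONS)
        return -1;
    regions[region_count].base = (uintptr_t)base;
    regions[region_count].size = size;
    regions[region_count].mode = ACC_NONE;
    return (int)region_count++;
}

/* Mark every tracked region that contains addr with the given mode. */
static void note(const void *addr, unsigned mode)
{
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < region_count; ++i)
        if (a >= regions[i].base && a - regions[i].base < regions[i].size)
            regions[i].mode |= mode;
}

void note_load(const void *addr)  { note(addr, ACC_READ); }
void note_store(const void *addr) { note(addr, ACC_WRITE); }

unsigned region_mode(int idx)
{
    assert(idx >= 0 && (size_t)idx < region_count);
    return regions[idx].mode;
}

/* Hand-instrumented stand-in for a kernel; a real tool would inject
   the hook calls automatically. */
void kernel_instrumented(const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        note_load(&in[i]);
        note_store(&out[i]);
        out[i] = in[i] + 1;
    }
}
```

After one run, `region_mode` reports `ACC_READ` for the input buffer and `ACC_WRITE` for the output buffer. The obvious limitation is that the result only covers the paths actually executed in that run.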

[1] https://sites.google.com/site/parallelizationforllvm/
[2] http://polly.llvm.org/
[3] http://kernelgen.org

Hope it helps,

- D.

2013/1/9 Sebastian Dreßler <dressler@zib.de>

Hi Dmitry,

The functions I'm going to analyze have no side effects (sorry for not
mentioning it). Basically, they are self-contained kernels.

Thank you. At a glance, [3] looks really helpful, so I'll take a closer
look at it. I also know [2] but haven't used it yet; now I'll "have to" ;-)

Maybe a few sentences regarding the purpose: the goal is to analyze the
sizes of the arguments passed to the function. So far, I'm able to
analyze data structures (C, C++) and mallocs at run time via injected
LLVM IR code [4]. Currently I'm extending this to analyze C++ classes
(most things work, including inheritance, templates, and also
std::vector and std::map). However, my problem is the one described,
i.e. how to assign the directions in order to correctly compute the data
volumes for in and out. One of the first ideas was to use a kind of
configuration file, since the kernel developer knows the data
directions. I think this could be done more elegantly, but based on your
answer I may go back to this idea.
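
A minimal sketch of what that configuration-based approach could look like, with the directions supplied by the kernel developer and the sizes filled in by the run-time analysis. All names here (`direction_t`, `arg_info_t`, `data_volume`) are hypothetical, and counting an in/out argument towards both volumes is an assumption.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DIR_IN, DIR_OUT, DIR_INOUT } direction_t;

typedef struct {
    const char *name;    /* argument name, as given in the configuration */
    direction_t dir;     /* direction declared by the kernel developer */
    size_t      bytes;   /* size measured at run time */
} arg_info_t;

typedef struct {
    size_t in_bytes;
    size_t out_bytes;
} volume_t;

/* Sum the measured sizes per direction; DIR_INOUT counts on both sides. */
volume_t data_volume(const arg_info_t *args, size_t count)
{
    volume_t v = {0, 0};
    assert(args != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (args[i].dir != DIR_OUT)
            v.in_bytes += args[i].bytes;
        if (args[i].dir != DIR_IN)
            v.out_bytes += args[i].bytes;
    }
    return v;
}
```

For example, with a 400-byte `in`, a 400-byte `out` and a 16-byte in/out argument, this yields 416 bytes in each direction.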

Thanks,
Sebastian

[4] http://opus4.kobv.de/opus4-zib/files/1556/kdv_dressler_steinke_zibreport.pdf

Are you analysing sizes in order to perform host<->accelerator memory synchronization?

2013/1/9 Sebastian Dreßler <dressler@zib.de>

No. This is done for performance prediction of kernels with rather
complex data structures.

Of course, this could be extended towards it. Shouldn't be that hard.