Hi Alexandra,
I don't know much about this; maybe this topic should be bridged over to polly-dev
(adding it to CC) to bring it more attention.
Thanks, Dimitry, for moving this over. I would also have replied on the LLVM list, but I was away this weekend. Now I am back.
Indeed, Polly uses ScopPass, which creates serious limitations in
compatibility with other passes. To my understanding, scops are used
because the isl loop analysis tool works on scops.
Scops are used because this is the problem domain we are working on. Scops are not just loops; they can also be nested conditions without any loops.
Describing them as loop passes is also not what we want, as the whole idea of our polyhedral description is to abstract away the specific structure of the loops.
I would be very interested to hear which use cases you believe are limited by the use of ScopPass. If you are, e.g., only interested in dependence information or alias information, it might be possible to
create a LoopPass that proxies the relevant information, such that it is available to other passes.
> In fact, just for handling OpenMP directives scops are not required,
> unless one needs to make sure an OpenMP directive is set for a loop
> with parallel iterations.
Right. Scops are not needed to handle OpenMP directives, and Polly is actually not meant to handle OpenMP directives. It is just one of several ways to introduce OpenMP parallelism: Polly does this if a loop it generates is parallel, clang would do it if the user added OpenMP directives, and Alexandra may introduce parallelism in similar cases.
For me, handling 'OpenMP directives' could mean two things:
1) clang understands OpenMP pragmas and lowers them to a set of OpenMP intrinsics/function calls and structures.
2) An LLVM optimization pass understands a set of high-level OpenMP intrinsics that it can optimize and transform into specific libgomp or mpc.sf.net library calls.
Neither is yet available in LLVM, but both would be highly useful. Especially 2) would be nice for Polly and probably also for Alexandra.
> Btw, it would be very interesting to know more about your
> project/purpose for this!
Oh yes. I am also highly interested.
> Hi,
> I want to execute the iterations of a loop in parallel, by inserting calls
> either to pthreads or to the gomp library at the LLVM IR level. As a first
> step, I inserted an omp pragma in a C file and compiled it with llvm-gcc to
> check the generated LLVM code.
This is a good step. When you do this, make sure you use
'schedule(runtime)'. Otherwise gcc will use not only function calls to set up libgomp, but also inlined instructions, which makes the code a lot harder to understand.
>> If I understand correctly, to parallelize the
>> loop in LLVM IR, I have to separate the loop into a new function, put all
>> required parameters in a structure, make the call to the gomp library, and
>> restore all values from the structure into the original variables.
>> Also, I have to compute the number of iterations allocated to each thread
>> and insert a new condition in the loop body, such that each thread executes
>> only its slice.
>> Is that correct?
Partially. It also depends on what kind of OpenMP parallel loop you want to generate. I suggest you generate a schedule(runtime) OpenMP parallel loop, as this is the easiest one to generate. Here you basically need to do this:
Host function:
(The function that contains the loop you want to parallelize)
Here you replace the loop with calls to:
  GOMP_parallel_loop_runtime_start(subfunction, subfunction_data,
                                   number_of_threads, lower_bound,
                                   upper_bound, stride)
  subfunction(subfunction_data)
  GOMP_parallel_end()
subfunction is the address of a new function, called subfunction. subfunction_data is the address of the structure that contains the data needed in the subfunction. The remaining arguments should be obvious.
subfunction is now basically:
  long lower_bound, upper_bound;
  while (GOMP_loop_runtime_next(&lower_bound, &upper_bound)) {
    for (long i = lower_bound; i < upper_bound; i += stride) {
      // Put your loop body here
    }
  }
GOMP_loop_runtime_next takes the addresses of the two bounds and fills them with the next chunk of iterations to execute.
> As far as I know, both llvm-gcc and Polly already offer support for OpenMP,
> by inserting calls to the gomp library.
Polly supports automatic parallelization of the loops it generates. gcc (and therefore llvm-gcc) supports both user-added OpenMP pragmas and automatically added OpenMP calls (provided by the -autopar pass).
> Can this code be reused?
Depends on what you plan to do. The code in Polly is currently specific to Polly: it only creates the calls to OpenMP that we need and basically builds an OpenMP loop from scratch.
What you most probably want is a pass that takes an existing LLVM-IR loop and translates it into an OpenMP parallel loop.
From Polly you could take the functions that create the declarations of the relevant libgomp functions, that set up the subfunctions, and that create the new loop structure. To get a pass that translates an LLVM-IR loop to an LLVM-IR loop, you would still need to implement the actual transformation.
> Is there a pass that I can call to do all these code transformations?
No. LLVM includes no passes for OpenMP transformations. Polly just does code generation for its own, specific use case. gcc has some passes that lower high-level OpenMP calls to more individual OpenMP calls and calculations.
> I had a look at the CodeGeneration from Polly. Is it possible to use it
> without creating the Scops, by transforming it into a LoopPass?
No. It is not a pass that translates LLVM-IR to LLVM-IR; it creates new LLVM-IR from a polyhedral description. So without scops, and therefore without a polyhedral description, it cannot be used.
> Could you indicate how this is handled in llvm-gcc?
The only pass that generates OpenMP calls in llvm-gcc is the gcc -autopar pass. It basically detects whether a loop is parallel and introduces calls to libgomp (I am not sure whether it creates higher-level intrinsics that a subsequent gcc pass lowers to the actual libgomp calls, or the equivalent instructions directly).
Alexandra, let me know what you are planning to do. Maybe we can work together on some OpenMP infrastructure in LLVM. I would love to have both a generic OpenMP builder that can be used by your transformations as well as by Polly, and a pass that lowers high-level OpenMP intrinsics to high-performance OpenMP code and low-level function calls.
Cheers
Tobi