Performance slowdown

Sent from my Verizon Wireless 4G LTE DROID
On Aug 19, 2015 1:36 PM, César via Openmp-dev <openmp-dev@lists.llvm.org> wrote:
>
> On Wed, Aug 19, 2015 at 3:07 PM, Jack Howarth <howarth.mailing.lists@gmail.com> wrote:
>>
>> On Tue, Aug 18, 2015 at 2:14 PM, César via Openmp-dev
>> <openmp-dev@lists.llvm.org> wrote:
>> > Hello,
>> >
>> > I don’t know if this is the correct list to talk about this - I did not find
>> > a better place…
>> >
>> > I am doing performance experiments with a few OpenMP implementations (IOMP,
>> > GOMP and our private impl.) and I am seeing a severe slowdown when I use
>> > IOMP (GOMP and others are performing well).
>> >
>> > The benchmarks I am using are these ones:
>> > http://kastors.gforge.inria.fr/#!index.md
>>
>> That web page claims the benchmarks use parts of the OpenMP 4.0 specification.
>>
>> "The KaStORS benchmark suite has been designed to evaluate the implementation of
>> the OpenMP dependent task paradigm, introduced as part of the OpenMP 4.0
>> specification."
>>
>> Currently openmp is only complete for the OpenMP 3.1 specification.
>>
>
> I am able to compile a few benchmarks that use task dependence annotations (from OMP 4.0), but for those that specify the range of the memory dependence I get a syntax error. So I should assume that this part is not implemented, right? Is there a list of the OMP 4.0 items that are currently supported?
>
> BTW, the Clang version from GitHub was able to parse these annotations. Was that support dropped from the newer version?
>

It is not there yet. You’ll need to use the code from the github clang_trunk (and llvm_trunk, etc.) repositories to get both recent Clang/LLVM and all of the OpenMP features.

-Hal

>
>>
>> >
>> > Really, the slowdown is huge. For one of the programs (plasma/dpotrf_taskdep
>> > -n 8192 -b 64 -i 1 -c) the serial version executes in ~28s and the parallel
>> > one executes in ~110s. I did some profiling and found that most of the time
>> > is being spent on synchronization barriers and dependence tracking (see
>> > attached image). Before digging deeper I would like to hear back from you if
>> > I am doing something wrong here:
>> >
>> > - I tested with the latest version of the repository:
>> > http://llvm.org/svn/llvm-project/openmp/trunk
>> > - I am using Ubuntu 14.10.
>> > - I have tested on more than one machine; the results above are from an Intel
>> > i7-3770
>> > - The runtime itself is compiled using: make compiler=gcc os_omp=linux
>> > arch=32e
>> > - The version of GCC that I am using is: 4.9.1
>> > - The version of Clang that I am using to compile the benchmarks: 3.5.0
>> >
>> >
>> > César.
>> >
>> > _______________________________________________
>> > Openmp-dev mailing list
>> > Openmp-dev@lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>> >
>
>


Yes, but this version is still not optimal. Clang trunk will produce slightly faster code.

Actually, the performance problem is separate from the supported version, linker errors, or the compiler used. So we apparently need to investigate the problem in the OpenMP runtime. We will work on this.

Thanks,

Andrey

Hello,

> Actually the performance problem is separate from
> supported version or linker errors or compiler used.

That was my first thought.

> So we apparently need to investigate the problem
> in the OpenMP runtime. We will work on this.

I will also spend some time investigating the problem in the runtime.

Thank you,
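
For readers following the thread, here is a minimal sketch of the kind of code under discussion: OpenMP 4.0 tasks whose `depend` clauses name a range of memory (an array section such as `cur[0:bs]`). It is not taken from KaStORS; the function names `scale_block`, `add_block` and `chained_blocks` are hypothetical and only illustrate the syntax. Every task created here forces the runtime to record and match dependences, which is the bookkeeping the profile above points at.

```c
#include <stddef.h>

/* Hypothetical helper: scale one block of n doubles in place. */
static void scale_block(double *blk, size_t n, double factor) {
    for (size_t i = 0; i < n; ++i)
        blk[i] *= factor;
}

/* Hypothetical helper: add block src element-wise into block dst. */
static void add_block(double *dst, const double *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] += src[i];
}

/* Treats a as nblocks consecutive blocks of bs doubles each.
 * Every block is doubled, then the (already final) previous block is
 * added into it. The depend clauses use array sections so the runtime
 * orders tasks that touch the same block and may overlap the rest. */
void chained_blocks(double *a, size_t nblocks, size_t bs) {
    #pragma omp parallel
    #pragma omp single
    {
        for (size_t k = 0; k < nblocks; ++k) {
            double *cur = a + k * bs;

            /* Writes block k. */
            #pragma omp task depend(inout: cur[0:bs])
            scale_block(cur, bs, 2.0);

            if (k > 0) {
                double *prev = cur - bs;

                /* Reads block k-1, updates block k. */
                #pragma omp task depend(in: prev[0:bs]) depend(inout: cur[0:bs])
                add_block(cur, prev, bs);
            }
        }
    } /* implicit barrier: all tasks have finished here */
}
```

Built without OpenMP support the pragmas are ignored and the function runs serially with the same result, which gives a simple baseline for comparing against the task-parallel build. With very small blocks the per-task dependence tracking can outweigh the work in each task, so block size is worth varying when measuring.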