Debug assert triggered in OpenMP + MPI

Hello everyone,

While writing an OpenMP + MPI code I triggered a debug assert in __kmp_task_start:

KMP_DEBUG_ASSERT(taskdata->td_flags.tasktype == TASK_EXPLICIT);

I attach a simplified reproducer that does nothing special, along with some additional info.

#include <mpi.h>

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int TIMESTEPS = 10;
    int BLOCKS = 100;

    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    int DATA;

    #pragma omp parallel
    #pragma omp single
    {
        for (int t = 0; t < TIMESTEPS; ++t) {
            for (int r = 0; r < nranks; ++r) {
                for (int b = 0; b < BLOCKS; ++b) {
                    #pragma omp task depend(in: DATA)
                    { }
                }
            }

            #pragma omp task depend(inout: DATA)
            { }
        }
        #pragma omp taskwait
    }

    MPI_Finalize();

    return 0;
}

llvm-project debug build, commit aafdeeade8d
MPICH Version: 3.3a2
MPICH Release date: Sun Nov 13 09:12:11 MST 2016

$ MPICH_CC=clang mpicc -fopenmp t1.c -o t1
$ for i in {1..100}; do mpiexec.hydra -n 4 ./t1; done


Hello again,

I've managed to remove MPI from the equation. It seems to be a race condition in the runtime.

int main(int argc, char **argv)
{
    int TIMESTEPS = 10;
    int BLOCKS = 100;

    int nranks = 4;

    int DATA;

    #pragma omp parallel
    #pragma omp single
    {
        for (int t = 0; t < TIMESTEPS; ++t) {
            for (int r = 0; r < nranks; ++r) {
                for (int b = 0; b < BLOCKS; ++b) {
                    #pragma omp task depend(in: DATA)
                    { }
                }
            }

            #pragma omp task depend(inout: DATA)
            { }
        }
        #pragma omp taskwait
    }

    return 0;
}

To build and run it, execute:

clang -fopenmp t1.c -o t1

for i in {1..5000}; do echo $i; OMP_NUM_THREADS=3 ./t1; done

Regards,

Raúl

Thanks for the reproducer! We might need to file a bug report for this one, but maybe someone will pick it up from here; let's wait a little while. :)

I looked into this a bit, because I ran into the same issue with a blocked Cholesky factorization code this week. Thanks for providing this reproducer!

I think the bookkeeping of the task queues is broken, so that under certain conditions the head/tail markers are not updated correctly.
In addition to the assertion failure, I regularly see the runtime stalling.
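
To make the suspected failure mode concrete, here is a minimal sketch of the per-thread task deque bookkeeping. The field names td_deque_head/td_deque_tail follow the runtime, but the structure and push logic are simplified stand-ins, not the actual kmp_tasking.cpp code:

/* Simplified, illustrative model of the per-thread task deque. */
#define SKETCH_DEQUE_SIZE 256

typedef struct {
    void *td_deque[SKETCH_DEQUE_SIZE]; /* ring buffer of task pointers */
    unsigned td_deque_head;            /* thieves dequeue here          */
    unsigned td_deque_tail;            /* the owner enqueues here       */
    /* a lock (not shown) is supposed to guard both counters */
} sketch_thread_data_t;

/* Owner-side push, executed while holding the deque lock. If the
 * compiler or CPU is free to move the tail update out of the locked
 * region, another thread can observe a tail value that does not match
 * the buffer contents and dequeue a stale or duplicate descriptor,
 * which could be one way to reach the tasktype assertion. */
static void sketch_push_task(sketch_thread_data_t *td, void *task) {
    td->td_deque[td->td_deque_tail] = task;
    td->td_deque_tail = (td->td_deque_tail + 1) % SKETCH_DEQUE_SIZE;
}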

I'm not convinced that TCW_4 & Co. have any effect in current builds. Therefore, I suspect that the compiler might move the accesses to the head/tail counters out of the locked region.
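
For illustration (with a hypothetical counter standing in for the real fields): if TCW_4 reduces to a plain assignment, as it appears to in current kmp.h, it is neither atomic nor a compiler barrier, so nothing anchors the store inside the critical section. A full fence, which is essentially what KMP_MB() provides, would pin it:

#include <stdatomic.h>

unsigned counter; /* hypothetical stand-in for td_deque_head/tail */

void write_plain(unsigned v) {
    counter = v; /* what TCW_4(counter, v) boils down to: the compiler
                    may sink or hoist this store across non-aliasing code */
}

void write_fenced(unsigned v) {
    counter = v;
    atomic_thread_fence(memory_order_seq_cst); /* a full fence, roughly
                    what KMP_MB() provides, keeps the store in place */
}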

For testing purposes, I added a KMP_MB() after each lock acquisition and before each lock release in the tasking code (kmp_tasking.diff). Or should this actually become part of the locking functions themselves (kmp_lock.diff)?
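
For reference, here is a self-contained sketch of the second variant; this is only my assumed shape of kmp_lock.diff (the actual patch is in the attachment), using C11 fences in place of KMP_MB() on a simple test-and-set lock:

#include <stdatomic.h>

static atomic_int lock_flag; /* 0 = free, 1 = held */

static void sketch_lock(void) {
    while (atomic_exchange_explicit(&lock_flag, 1, memory_order_relaxed))
        ; /* spin until we flip the flag from 0 to 1 */
    atomic_thread_fence(memory_order_acquire); /* ~ KMP_MB() after locking:
        no critical-section access may float above this point */
}

static void sketch_unlock(void) {
    atomic_thread_fence(memory_order_release); /* ~ KMP_MB() before unlocking:
        all critical-section writes are ordered before the flag store */
    atomic_store_explicit(&lock_flag, 0, memory_order_relaxed);
}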

I haven't done performance tests of these changes, but the latter solution fixed my stalls as well as the spurious assertion violations.

Best
Joachim

kmp_tasking.diff (2.75 KB)

kmp_lock.diff (862 Bytes)

I think I found the issue and posted a fix at:

https://reviews.llvm.org/D80480

- Joachim

Thanks! Can you also commit the reproducer as a test?