OpenMP code and thread priority on OSX

Hi,

We are testing OpenMP-aware C++ audio code with clang 3.7 on OSX. This code is later run from a realtime audio thread, using the CoreAudio API on OSX. It seems that the master and worker threads do not "inherit" the thread priority of the realtime audio thread as we would expect, so the application does not behave reliably (audio dropouts…).

Is there any specific configuration of the libomp library that we need to set up, such as an environment variable or similar?

Thanks.

Stéphane Letz

The OpenMP runtime does not manipulate thread priorities.
A first skim through Mach Scheduling and Thread Interfaces doesn't make it clear to me whether threads inherit their priority from their creator.

Since OpenMP threads are created using pthread_create, that page suggests you may need to use pthread_setschedparam inside each thread to bump its priority. (Or you may be able to hack the runtime to pass the priority as an attribute at thread creation time).
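As a rough illustration of the pthread_setschedparam route, here is a minimal sketch. The helper names clamp_priority and set_current_thread_priority are hypothetical, and the policy and priority you would actually want are assumptions that depend on your system; raising a thread into a realtime POSIX policy usually needs extra privileges, in which case the call returns an error code.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Hypothetical helper: clamp a requested priority into the range that the
// given scheduling policy accepts on this system.
int clamp_priority(int policy, int requested) {
    const int lo = sched_get_priority_min(policy);
    const int hi = sched_get_priority_max(policy);
    assert(lo <= hi);
    if (requested < lo) return lo;
    if (requested > hi) return hi;
    return requested;
}

// Hypothetical helper: apply a policy and priority to the calling thread.
// Returns 0 on success, or an error number such as EPERM.
int set_current_thread_priority(int policy, int requested) {
    struct sched_param param = {};
    param.sched_priority = clamp_priority(policy, requested);
    return pthread_setschedparam(pthread_self(), policy, &param);
}
```

Each OpenMP thread would have to call set_current_thread_priority itself, for example as the first statement inside a "#pragma omp parallel" region, since the call only affects the thread that makes it.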

BEWARE, though, you will almost certainly also want to change the default sleep behaviour of your OpenMP threads. By default they will spin for 200ms before sleeping in the kernel, which is unlikely to be helpful to the overall user experience if they’re at high-priority! (See KMP_BLOCKTIME).
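One way to change the blocktime from inside the program, sketched under the assumption that the runtime reads KMP_BLOCKTIME from the environment when it initializes, so this has to run before the first OpenMP construct. The value "0" (sleep immediately) is only an example, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <stdlib.h>

// Hypothetical helper: ask the OpenMP runtime to put idle workers to sleep
// immediately instead of spinning. Returns true if the variable was set.
bool request_zero_blocktime() {
    // Third argument 1 means "overwrite any existing value".
    return setenv("KMP_BLOCKTIME", "0", 1) == 0;
}

// Hypothetical helper: check what the environment currently requests.
bool blocktime_is_zero() {
    const char* value = getenv("KMP_BLOCKTIME");
    return value != nullptr && std::strcmp(value, "0") == 0;
}
```

Setting the variable in the shell or launch environment before starting the application should have the same effect. Whether zero is the right value for an audio workload is a separate question, since waking a sleeping thread also costs time.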

Overall OpenMP is not designed for concurrency, but for parallelism, and you may find this a hard thing to make work.

-- Jim

James Cownie <james.h.cownie@intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438

The OpenMP runtime does not manipulate thread priorities.
A first skim through Mach Scheduling and Thread Interfaces doesn't make it clear to me whether threads inherit their priority from their creator.

Since OpenMP threads are created using pthread_create, that page suggests you may need to use pthread_setschedparam inside each thread to bump its priority.

The thing is that on OSX audio threads are actually "THREAD_TIME_CONSTRAINT_POLICY" threads, and this policy cannot be set using the POSIX API; it needs the "thread_policy_set" kind of API (as you can see in the URL you gave).
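For reference, a minimal sketch of what such a call could look like for the calling thread. The function names and the idea of passing period, computation, and constraint in nanoseconds are assumptions made for illustration; a real audio application would derive those values from its buffer size and sample rate. The Mach part only compiles on Apple platforms, so it is guarded, and the function simply returns false elsewhere.

```cpp
#include <cassert>
#include <cstdint>

#ifdef __APPLE__
#include <mach/mach.h>
#include <mach/mach_time.h>
#include <mach/thread_policy.h>
#include <pthread.h>
#endif

// Convert nanoseconds to Mach absolute-time ticks. mach_timebase_info reports
// numer/denom such that nanoseconds = ticks * numer / denom.
std::uint64_t ns_to_ticks(std::uint64_t ns, std::uint32_t numer, std::uint32_t denom) {
    assert(numer != 0);
    return ns * denom / numer;
}

// Hypothetical helper: put the calling thread under the time-constraint
// policy. Returns true on success, and always false on non-Apple platforms.
bool make_current_thread_time_constrained(std::uint64_t period_ns,
                                          std::uint64_t computation_ns,
                                          std::uint64_t constraint_ns) {
#ifdef __APPLE__
    mach_timebase_info_data_t tb;
    if (mach_timebase_info(&tb) != KERN_SUCCESS) return false;

    thread_time_constraint_policy_data_t policy;
    policy.period = static_cast<std::uint32_t>(ns_to_ticks(period_ns, tb.numer, tb.denom));
    policy.computation = static_cast<std::uint32_t>(ns_to_ticks(computation_ns, tb.numer, tb.denom));
    policy.constraint = static_cast<std::uint32_t>(ns_to_ticks(constraint_ns, tb.numer, tb.denom));
    policy.preemptible = 1;

    const kern_return_t kr = thread_policy_set(
        pthread_mach_thread_np(pthread_self()),
        THREAD_TIME_CONSTRAINT_POLICY,
        reinterpret_cast<thread_policy_t>(&policy),
        THREAD_TIME_CONSTRAINT_POLICY_COUNT);
    return kr == KERN_SUCCESS;
#else
    (void)period_ns;
    (void)computation_ns;
    (void)constraint_ns;
    return false;
#endif
}
```

As with the POSIX call, this only affects the thread that calls it, so each OpenMP thread would need to run it once.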

(Or you may be able to hack the runtime to pass the priority as an attribute at thread creation time).

Can you possibly point me where exactly in the runtime code?

And would this kind of "inheritance policy" be better set up dynamically? I mean, is this a requirement that enough other use cases might share for it to become part of the official code?

BEWARE, though, you will almost certainly also want to change the default sleep behaviour of your OpenMP threads. By default they will spin for 200ms before sleeping in the kernel, which is unlikely to be helpful to the overall user experience if they’re at high-priority! (See KMP_BLOCKTIME).

Ok thanks.

Overall OpenMP is not designed for concurrency, but for parallelism, and you may find this a hard thing to make work.

-- Jim

What do you mean by "not designed for concurrency, but for parallelism" in this specific use context?

Thanks.

Stéphane Letz

Can you possibly point me where exactly in the runtime code?

% cd runtime/src
% grep pthread_create *.*
kmp_i18n.h: KMP_SYSFAIL( "pthread_create", status );
kmp_i18n.h: KMP_CHECK_SYSFAIL( "pthread_create", status );
z_Linux_util.c: status = pthread_create( & handle, & thread_attr, __kmp_launch_worker, (void *) th );
z_Linux_util.c: KMP_SYSFAIL( "pthread_create", status );
z_Linux_util.c: status = pthread_create( &handle, & thread_attr, __kmp_launch_monitor, (void *) th );
z_Linux_util.c: KMP_SYSFAIL( "pthread_create", status );

(Yup, z_Linux_util.c is counterintuitive...)
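To show what "passing the priority as an attribute at thread creation time" means in plain pthreads terms, here is a standalone sketch. It is not the runtime's code: worker_entry is a hypothetical stand-in for the runtime's launch routine, and spawn_with_priority is a hypothetical helper. Note that these attributes only cover POSIX policies, not the Mach time-constraint policy.

```cpp
#include <cassert>
#include <pthread.h>
#include <sched.h>

// Hypothetical worker entry point standing in for the runtime's own routine.
static void* worker_entry(void* arg) {
    assert(arg != nullptr);
    *static_cast<int*>(arg) = 1;  // record that the worker actually ran
    return nullptr;
}

// Hypothetical helper: create and join one thread with an explicit policy and
// priority. Returns 0 on success, or the first error number encountered.
int spawn_with_priority(int policy, int priority, int* ran) {
    pthread_attr_t attr;
    int status = pthread_attr_init(&attr);
    if (status != 0) return status;

    // Without PTHREAD_EXPLICIT_SCHED the new thread inherits the creator's
    // scheduling and the policy/priority stored in attr are ignored.
    status = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (status == 0) status = pthread_attr_setschedpolicy(&attr, policy);

    struct sched_param param = {};
    param.sched_priority = priority;
    if (status == 0) status = pthread_attr_setschedparam(&attr, &param);

    if (status == 0) {
        pthread_t handle;
        status = pthread_create(&handle, &attr, worker_entry, ran);
        if (status == 0) status = pthread_join(handle, nullptr);
    }
    pthread_attr_destroy(&attr);
    return status;
}
```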

What do you mean by "not designed for concurrency, but for parallelism" in this specific use context?

Concurrency is handling multiple asynchronous events, and can be a useful way to structure code even when one has only a single hardware thread: for instance, to execute callbacks when the user presses buttons in a GUI, or when a network packet arrives.

Parallelism is using many hardware threads to reduce the time to solution of a single problem. It is futile if you only have one hardware thread.

OpenMP is designed for parallelism. It works best when it controls all the hardware and there is nothing else going on. In your case there clearly are other things going on (otherwise changing the priority wouldn't be necessary). In such an environment OpenMP may not work well.
Among the reasons are:
1) Many OpenMP codes use static work distribution. That assumes both that the work is evenly distributed between threads, and that the threads execute at the same speed. If a thread is stolen away by the OS, the second assumption is false, and all the other threads will have to wait at the next join or barrier for the laggard to arrive.
2) Even if you use dynamic scheduling, the definition of OpenMP barriers (and join) is that all the threads must arrive at the barrier, not that all the work has to be complete. So if the laggard is still not executing OpenMP code even though all the work is complete, all the other threads still have to wait.
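
The two points can be put into a small timing model. This is only an illustrative sketch with made-up numbers and hypothetical function names; it models nothing about the runtime beyond "the barrier releases when the last thread arrives".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Point 1: with static distribution, thread i arrives at the barrier after
// its own work plus any time the OS kept it off the CPU. The region ends when
// the slowest thread arrives.
double static_region_time(const std::vector<double>& work,
                          const std::vector<double>& stall) {
    assert(work.size() == stall.size() && !work.empty());
    double finish = 0.0;
    for (std::size_t i = 0; i < work.size(); ++i) {
        finish = std::max(finish, work[i] + stall[i]);
    }
    return finish;
}

// Point 2: even when all the work is done at time work_done, the barrier
// cannot release before every thread has arrived at it.
double barrier_release_time(double work_done, const std::vector<double>& arrival) {
    double release = work_done;
    for (double t : arrival) release = std::max(release, t);
    return release;
}
```

For example, four threads with 10 ms of work each, one of which is descheduled for 5 ms, give static_region_time({10, 10, 10, 10}, {0, 0, 5, 0}) == 15: the whole team pays for the one stalled thread.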

-- Jim

James Cownie <james.h.cownie@intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers and Runtimes)
Tel: +44 117 9071438

Thanks I'll have a look and try to patch this code.

Well, our OpenMP code is basically a DAG of linked audio tasks (audio data flows between the tasks) that we express as a sequence of parallel sections (#pragma omp sections with #pragma omp section inside). Within a given parallel section the tasks are usually quite similar and should run at about the same speed. The parallel sections are then synchronized with barriers. Since we are in an RT context, we can assume that no non-RT thread will steal the CPU during the pure audio computation; or, if a thread does, it would be another audio RT thread (or some other RT thread…), and this is acceptable (since all audio RT threads are simply trying to meet their timing deadlines).

I guess our code still uses static scheduling; I will test dynamic scheduling as well.
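
To give an idea of the structure, here is a much simplified sketch, not our actual code. The tasks gain and mix, the Buffer type, and process_block are hypothetical, and the buffers are allocated inside the function only for brevity (a real audio callback would preallocate them). Compiled without OpenMP support the pragmas are ignored and the stages simply run in order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Buffer = std::vector<float>;

// Hypothetical audio task: scale the input by a constant gain.
static void gain(const Buffer& in, Buffer& out, float g) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] * g;
}

// Hypothetical audio task: sum two buffers.
static void mix(const Buffer& a, const Buffer& b, Buffer& out) {
    assert(a.size() == b.size() && a.size() == out.size());
    for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
}

// Two-stage DAG: the two gain tasks are independent; mix depends on both.
Buffer process_block(const Buffer& in) {
    Buffer a(in.size()), b(in.size()), out(in.size());
    #pragma omp parallel
    {
        // Stage 1: independent tasks, one per section.
        #pragma omp sections
        {
            #pragma omp section
            gain(in, a, 0.5f);
            #pragma omp section
            gain(in, b, 0.25f);
        }  // implicit barrier: both gain tasks are done before stage 2 starts

        // Stage 2: consumes the outputs of stage 1.
        #pragma omp sections
        {
            #pragma omp section
            mix(a, b, out);
        }
    }
    return out;
}
```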

Stéphane