omp_set_num_threads() with target region


It looks like omp_set_num_threads() does not affect the number of threads in a parallel region that belongs to a combined target construct.

Consider this code:

#include <omp.h>
#include <stdio.h>

int main()
{
    int lp = 0;

    omp_set_num_threads(1);

#pragma omp target teams distribute parallel for lastprivate(lp)
    for (int i = 1; i < 10; i++) {
        printf("tid: %d\n", omp_get_thread_num());
        if (i == 1) {
            lp = 0;
        }
    }

    printf("lp: %d\n", lp);

    return 0;
}
When compiled with clang -fopenmp test.c, this code prints lp equal to 0, and the parallel region is executed by 9 threads.

Shouldn’t omp_set_num_threads(1) limit the parallel region to a single thread? In that case, shouldn’t the expected result be “9”?

Could this be a runtime bug? If not, please help me understand this behavior.



Hi Simone,

A couple of notes:

  1. omp_set_num_threads() is not supposed to limit the number of threads inside target or teams constructs. It only affects parallel regions encountered by the same (implicit) task, whereas the target construct initializes its own separate execution environment in an implementation-specific manner. The current OpenMP specification has no API to control the number of teams or the number of threads in each team; the standard controls are clauses, such as the num_teams and thread_limit clauses on the teams construct. As an extension, the runtime library honors the OMP_NUM_THREADS environment variable to limit the number of threads in a teams construct executed on the host device. A new release of the OpenMP specification (Technical Report 8) is expected in about a month; it will introduce the environment variables OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT and the corresponding APIs omp_set_num_teams and omp_set_teams_thread_limit for controlling the number of teams and the number of threads per team. These are not yet implemented in the runtime library, and I’d guess they still won’t work for the “target teams” construct, only for “teams” without “target”.

  2. Even if you are able to limit the number of threads to 1 (and obviously limit the number of teams to 1 as well), you won’t get the expected result, because the lastprivate clause is not implemented (or its implementation has bugs) for the “target teams distribute parallel for” construct on the host device in clang. The compiler team may know more details here, as the runtime library seems to provide the needed support. We can discuss this issue separately if needed.
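To illustrate note 1, here is a minimal sketch (not from the original thread) contrasting the two mechanisms: omp_set_num_threads() limits a host parallel region started by the same task, while the combined target construct is shaped with the num_teams and thread_limit clauses instead. The helper names host_parallel_threads and target_team_shape are hypothetical, and the #ifdef _OPENMP stand-ins exist only so the file also compiles without -fopenmp. Given note 2, it records the team shape with max reductions rather than relying on lastprivate.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Hypothetical serial stand-ins so the sketch still builds without -fopenmp. */
static void omp_set_num_threads(int n) { (void)n; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_num_teams(void) { return 1; }
#endif

/* omp_set_num_threads() updates the nthreads ICV of the calling task, so it
   does limit a host parallel region encountered by that same task. */
static int host_parallel_threads(void)
{
    int nthreads = 0;
    omp_set_num_threads(1);
#pragma omp parallel
    {
#pragma omp single
        nthreads = omp_get_num_threads();
    }
    return nthreads;
}

/* For the combined target construct, request the shape with clauses:
   num_teams(1) caps the league at one team and thread_limit(1) caps each
   team at one thread. The max reductions keep the largest team count and
   per-team thread count seen by any iteration. */
static void target_team_shape(int *max_teams, int *max_threads)
{
    int teams = 0, threads = 0;
#pragma omp target teams distribute parallel for num_teams(1) thread_limit(1) \
    map(tofrom: teams, threads) reduction(max: teams, threads)
    for (int i = 1; i < 10; i++) {
        int t = omp_get_num_teams();
        int n = omp_get_num_threads();
        if (t > teams) teams = t;
        if (n > threads) threads = n;
    }
    *max_teams = teams;
    *max_threads = threads;
}
```

Because num_teams and thread_limit are upper bounds, a conforming implementation should report one team with one thread from target_team_shape, and one thread from host_parallel_threads; whether a particular clang version honors these clauses on the host device is worth checking directly.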