Local variables and yielding of untied tasks

I am experimenting with the behavior of taskyield under different OpenMP implementations and have written a test program that creates a set of untied tasks, each containing two taskyield regions. The program makes sure that only one thread executes the relevant branch in the tasks by trapping all threads except thread 0 in a loop. Before the first yield, a shared variable is incremented and read into a local variable. After the first yield, the local variable is printed and a second taskyield is executed. The trapped threads are released once all tasks have reached the second yield. The complete code is attached.
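For readers without the attachment, the core of such a test can be sketched as follows. This is a simplified reconstruction from the description above, not the attached file: NUM_TASKS, current_thread() and run_untied_yield_test() are hypothetical names, the loop that traps the other threads is omitted (so atomics protect the shared counters instead), and the check only tests whether the local task_id still lies in the range of values it could have been assigned.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define NUM_TASKS 8   /* hypothetical task count, chosen for illustration */

/* Thread number for the output prefix; 0 when built without OpenMP. */
static int current_thread(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Creates NUM_TASKS untied tasks with two taskyield regions each.
 * Returns the number of tasks whose local task_id was out of range
 * after the first yield (0 on a correct implementation). */
int run_untied_yield_test(void)
{
    int flag_one_cntr = 0;   /* shared: bumped before the first yield  */
    int flag_two_cntr = 0;   /* shared: bumped before the second yield */
    int lost = 0;            /* shared: tasks that lost their local    */

    #pragma omp parallel
    #pragma omp master
    {
        for (int i = 0; i < NUM_TASKS; ++i) {
            #pragma omp task untied shared(flag_one_cntr, flag_two_cntr, lost)
            {
                int task_id, one, two;   /* locals of the task */

                /* Increment the shared counter and keep the result locally. */
                #pragma omp atomic capture
                task_id = ++flag_one_cntr;

                #pragma omp taskyield

                #pragma omp atomic read
                one = flag_one_cntr;
                #pragma omp atomic read
                two = flag_two_cntr;
                printf("[%d] task_id %d : flag_one_cntr %d : flag_two_cntr %d\n",
                       current_thread(), task_id, one, two);

                /* Any value assigned above lies in 1..NUM_TASKS. */
                if (task_id < 1 || task_id > NUM_TASKS) {
                    #pragma omp atomic update
                    ++lost;
                }

                #pragma omp atomic update
                ++flag_two_cntr;

                #pragma omp taskyield
            }
        }
    }

    /* All tasks have completed at the barrier that ends the parallel region. */
    assert(flag_two_cntr == NUM_TASKS);
    return lost;
}
```

Built with OpenMP enabled (for example with -fopenmp), a return value other than 0 would point to a local variable that did not survive the yield. Without OpenMP the pragmas are ignored and the tasks simply run one after another.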

I am observing some strange behavior with Clang 6.0 (built from release tarball, libomp built from master): the local variable contains the value zero where it should contain a value similar to flag_one_cntr:

[0] task_id 0 : flag_one_cntr 1 : flag_two_cntr 0
[0] task_id 0 : flag_one_cntr 3 : flag_two_cntr 1
[0] task_id 0 : flag_one_cntr 4 : flag_two_cntr 1
[0] task_id 0 : flag_one_cntr 5 : flag_two_cntr 3

(notice that there appears to be no output for flag_one_cntr=2 and 7)

Interestingly, the behavior seems correct if either of the taskyield regions is commented out. Similarly, the behavior seems correct if the tasks are tied (in which case there are no tasks to yield to). Using tied tasks and disabling the task stealing constraint (KMP_TASK_STEALING_CONSTRAINT=0) seems to provide the expected behavior:

[0] task_id 257 : flag_one_cntr 257 : flag_two_cntr 0
[0] task_id 256 : flag_one_cntr 257 : flag_two_cntr 1
[0] task_id 255 : flag_one_cntr 257 : flag_two_cntr 2
[0] task_id 254 : flag_one_cntr 257 : flag_two_cntr 3
[0] task_id 253 : flag_one_cntr 257 : flag_two_cntr 4
[0] task_id 252 : flag_one_cntr 257 : flag_two_cntr 5

(the tasks last put on the stack arrive at the printf first)

Any idea what could be going wrong here and how to debug this further? I tried several other implementations (including Cray, GCC, and OmpSs), all of which seem to work correctly (albeit with different taskyield behavior). I originally found this issue with the Intel 18.0.1 compiler.

Any help would be much appreciated!

Best regards,
Joseph

openmp_yield_clang.c (1.15 KB)

Running your code on our machine gives random numbers for task_id (but every time the same within a run). This looks like accessing uninitialized memory.

So either the variable is not properly stored at the yield, or the variable is not initialized to the stored value on continuation.

- Joachim

Hi Joseph,

This looks like a bug in Clang's implementation of untied tasks: it does not capture the locations of private variables, so the value of "task_id" is lost at each taskyield (where the thread exits the task in order to re-enter its continuation later on).

Regarding the missing printed values of the shared flags, I don't see any problem here. Your code allows flag_one to be incremented several times before each print, so the number of prints should be stable, but the printed values can vary depending on how many increments are executed before each print statement (an implementation is allowed to postpone any task at a taskyield and execute another task, which increments the flag and can itself be postponed, and so on).

And what problems do you see with the Intel compiler? AFAIK it always captures the location of local variables in tasks.

Regards,
Andrey

Andrey,

Thanks for your quick reply! You're right that the printed value of flag_one_cntr should not be expected to increment continuously. I guess I was reading more into the output than was there.

I got confused with the Intel compiler, too. I was investigating another effect I was observing and tried to reproduce it using the Clang compiler. Eventually that turned out to be a non-issue though.

Should I file an issue for the missing capture of private variables?

Cheers,
Joseph

Should I file an issue for the missing capture of private variables?

Yes, filing a Bugzilla would be good in order to track the issue.

For untied tasks, lexically visible private variables should be captured, similar to what is done for shared variables in the current implementation. Although indirect accesses can be a bit slower, I think correctness should be more important.

Regards,
Andrey

Done: https://bugs.llvm.org/show_bug.cgi?id=37671

Just out of curiosity: is there a document describing the implementation of OpenMP tasks in Clang? I have so far been assuming that local variables are situated on the stack of the task (i.e., the stack of the thread for Clang) and upon a taskyield the next task is simply put into the next stack frame on top of the yielding task. I don't quite understand why the local variable has to be explicitly captured. I guess my understanding of the implementation is grossly oversimplified and I would be grateful for some insights (I have looked at the libomp code but have not attempted to dig into the compiler side of life).

Thanks,
Joseph

Just out of curiosity: is there a document describing the implementation
of OpenMP tasks in Clang?

I am not aware of any documentation on the implementation internals.

I have so far been assuming that local variables are situated on the stack
of the task (i.e., the stack of the thread for Clang) and upon a taskyield
the next task is simply put into the next stack frame on top of the yielding
task. I don't quite understand why the local variable has to be explicitly captured.

For tied tasks, keeping private variables on the stack is preferred, but for untied tasks this is problematic.
I did not quite understand what you mean by "the next stack frame on top of the yielding task". The current implementation just schedules the task continuation as a separate task and exits the yielding task, so the stack is lost after that, making previously initialized private variables inaccessible in the task continuation. If private variables are placed in heap memory and accessed indirectly, then different threads can work with the same locations when executing different parts of the same untied task.
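The difference can be illustrated with a small plain-C model. This is only a sketch of the idea, not Clang's actual code generation or the libomp API: task_frame_t, part_id, run_part_on_stack(), run_part_captured() and count_preserved() are hypothetical names. Each task body is split into parts at its yield points, every yield is modeled as a return from the function, and a trivial round-robin loop stands in for the scheduler.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_TASKS 4   /* hypothetical task count, chosen for illustration */

/* Hypothetical heap-allocated task descriptor; it outlives each entry
 * into the task body. */
typedef struct {
    int part_id;   /* which part of the task body runs next */
    int task_id;   /* captured copy of the private variable  */
} task_frame_t;

typedef int (*part_fn)(task_frame_t *, int *, int *);

/* Private variable kept on the stack: every re-entry gets a fresh slot.
 * The 0 stands in for the indeterminate value a real continuation sees.
 * Returns 1 while a continuation remains, 0 when the task is done. */
int run_part_on_stack(task_frame_t *f, int *shared_cntr, int *observed)
{
    int task_id = 0;
    switch (f->part_id) {
    case 0:
        task_id = ++*shared_cntr;
        f->part_id = 1;
        return 1;              /* first "taskyield": leave the task body */
    case 1:
        *observed = task_id;   /* the value assigned in part 0 is gone */
        f->part_id = 2;
        return 1;              /* second "taskyield" */
    default:
        return 0;
    }
}

/* Private variable captured in the heap frame and accessed indirectly. */
int run_part_captured(task_frame_t *f, int *shared_cntr, int *observed)
{
    switch (f->part_id) {
    case 0:
        f->task_id = ++*shared_cntr;
        f->part_id = 1;
        return 1;
    case 1:
        *observed = f->task_id;   /* survives the yield */
        f->part_id = 2;
        return 1;
    default:
        return 0;
    }
}

/* Round-robin "scheduler": runs one part of every task per pass and
 * returns how many tasks observed the id they were assigned. */
int count_preserved(part_fn run_part)
{
    task_frame_t *frames = calloc(NUM_TASKS, sizeof *frames);
    int observed[NUM_TASKS] = {0};
    int shared_cntr = 0, pending = NUM_TASKS, preserved = 0;

    assert(frames != NULL);
    while (pending > 0) {
        pending = 0;
        for (int i = 0; i < NUM_TASKS; ++i)
            pending += run_part(&frames[i], &shared_cntr, &observed[i]);
    }
    for (int i = 0; i < NUM_TASKS; ++i)
        preserved += (observed[i] == i + 1);   /* task i was assigned id i + 1 */

    free(frames);
    return preserved;
}
```

In this model count_preserved(run_part_captured) reports all NUM_TASKS ids as preserved, while count_preserved(run_part_on_stack) reports none. The captured variant pays for this with an extra pointer dereference on each access to the private variable.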

Regards,
Andrey