How does the runtime guard against data races when updating those global variables if there are multiple root threads (user threads), each starting a parallel region?
For example, consider __kmp_all_nth++ in __kmp_allocate_thread, which, according to the function's comments, is called within a forkjoin region. Is this forkjoin region per root thread? If so, updating the global variables there looks like a data race. If instead there is global protection around these updates, it looks like a performance bottleneck. I suspect other functions are in a similar situation.
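
To make the concern concrete, here is a minimal sketch of the pattern I am asking about. It is not code from the runtime: all_nth, forkjoin_lock, allocate_thread_model and run_roots are hypothetical stand-ins, and I am only assuming that a single global lock might play this role.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins, not the runtime's actual definitions.
static int all_nth = 0;           // models a global counter like __kmp_all_nth
static std::mutex forkjoin_lock;  // models one global lock (an assumption)

// Models one worker allocation performed on behalf of a root thread.
void allocate_thread_model() {
  // Every root thread must take the same lock before touching the counter.
  std::lock_guard<std::mutex> guard(forkjoin_lock);
  ++all_nth;
}

// Runs `roots` concurrent root threads; each allocates `workers` workers.
// Returns the final value of the shared counter.
int run_roots(int roots, int workers) {
  assert(roots >= 0 && workers >= 0);
  all_nth = 0;
  std::vector<std::thread> pool;
  for (int r = 0; r < roots; ++r) {
    pool.emplace_back([workers] {
      for (int w = 0; w < workers; ++w) allocate_thread_model();
    });
  }
  for (auto &t : pool) t.join();  // join before reading the counter
  return all_nth;
}
```

With the lock in place the final count is exact, but every root thread serializes on the same mutex, which is the bottleneck I am worried about. Without the lock, the increment would be a data race.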