Default value of OMP_PROC_BIND=false

Hi all,

I would like to start a discussion about the default thread affinity / binding behavior in the LLVM OpenMP runtime, specifically the default value of OMP_PROC_BIND=false.

While evaluating OpenMP performance with Flang on modern multi-core Linux systems that support affinity binding, we have observed that enabling thread affinity binding (e.g., setting 'OMP_PROC_BIND=close' and 'OMP_PLACES=cores') often results in a significant speedup. For example, the SPEComp2017 / ROMS2017_OMP benchmark shows a ~2X speedup.
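For anyone who wants to try the comparison on their own workload, here is a minimal sketch of a launcher that applies the settings mentioned above. It is written in Python only for convenience. The helper names `with_binding` and `run_bound` are hypothetical, and OMP_DISPLAY_AFFINITY=true is my addition (an OpenMP 5.0 variable) so that the runtime reports where each thread ends up; it is not part of the measurements described here.

```python
import os
import subprocess


def with_binding(env=None):
    """Return a copy of `env` (default: the current environment)
    with thread binding enabled. Hypothetical helper for illustration."""
    new_env = dict(os.environ if env is None else env)
    new_env["OMP_PROC_BIND"] = "close"        # keep threads close to the primary thread
    new_env["OMP_PLACES"] = "cores"           # one place per physical core
    new_env["OMP_DISPLAY_AFFINITY"] = "true"  # assumption: added to report thread placement
    return new_env


def run_bound(cmd):
    """Run `cmd` (a list such as ["./my_benchmark"]) with binding enabled."""
    return subprocess.run(cmd, env=with_binding(), check=True)
```

Running the same command once through `run_bound` and once with an unmodified environment gives a quick bound vs. unbound comparison.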

The Cray compiler uses a default binding policy that distributes threads across cores, automatically capturing the performance advantages mentioned above (see “OMP_PROC_BIND” in https://cpe.ext.hpe.com/docs/latest/cce/man7/intro_openmp.7.html).

Has there been prior discussion on whether a simple topology-aware placement default would be preferable (especially for HPC workloads) to the current default of OMP_PROC_BIND=false? Such a default could help users avoid performance degradation in common cases (such as when the number of requested threads is less than or equal to the number of places), while the runtime could still skip binding in the most problematic cases (where binding is more likely to degrade performance).
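To make the idea concrete, here is a rough sketch of the kind of decision I have in mind, in Python for brevity. It is not how the LLVM runtime (or any other runtime) behaves today. The function name `default_proc_bind` is hypothetical, and returning "spread" for the bound case is an assumption chosen for illustration; "close" would be an equally valid candidate.

```python
def default_proc_bind(num_threads, num_places, bound_policy="spread"):
    """Hypothetical default: bind only when every thread can get its own place.

    Returns an OMP_PROC_BIND-style value: `bound_policy` when binding
    looks safe, "false" (no binding) otherwise.
    """
    if num_threads <= 0 or num_places <= 0:
        # Unknown or unusable topology: keep today's behavior.
        return "false"
    if num_threads <= num_places:
        # Common case: no oversubscription of places, binding is likely to help.
        return bound_policy
    # More threads than places: binding is more likely to hurt, so skip it.
    return "false"
```

For example, 8 threads on a machine with 16 core places would be bound, while 32 threads on the same machine would be left unbound, as they are today.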