Please check out our recent ISCA publication, which introduces a new performance profiling technique for analyzing parallel programs, along with an open-source tool for collecting the profiles. The tool is built into LLVM’s LTO. Could someone please add the paper to http://llvm.org/pubs/ ?
Here is the citation:
Harmony: Collection and Analysis of Parallel Block Vectors, M. Kambadur, K. Tang, M. A. Kim, In International Symposium on Computer Architecture (ISCA), 2012.
Our project page with the downloadable tool:
The paper abstract:
Efficient execution of well-parallelized applications is central to performance in the multicore era. Program analysis tools support the hardware and software sides of this effort by exposing relevant features of multithreaded applications. This paper describes parallel block vectors, which uncover previously unseen characteristics of parallel programs. Parallel block vectors provide block execution profiles per concurrency phase (e.g., the block execution profile of all serial regions of a program). This information provides a direct and fine-grained mapping between an application’s runtime parallel phases and the static code that makes up those phases. This paper also demonstrates how to collect parallel block vectors with minimal application perturbation using Harmony. Harmony is an instrumentation pass for the LLVM compiler that introduces just 16-21% overhead on average across eight Parsec benchmarks. We apply parallel block vectors to uncover several novel insights about parallel applications with direct consequences for architectural design. First, that the serial and parallel phases of execution used in Amdahl’s Law are often composed of many of the same basic blocks. Second, that program features, such as instruction mix, vary based on the degree of parallelism, with serial phases in particular displaying different instruction mixes from the program as a whole. Third, that dynamic execution frequencies do not necessarily correlate with a block’s parallelism.
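For readers who want a concrete picture of what a parallel block vector holds, here is a rough Python sketch. It is only an illustration of the data structure as the abstract describes it, not Harmony's implementation: the function name `build_parallel_block_vectors`, the `(block_id, active_threads)` trace format, and the toy trace are all assumptions made up for this example. Each basic block gets one counter per degree of parallelism, and the counter is bumped every time the block runs while that many threads are active.

```python
from collections import defaultdict


def build_parallel_block_vectors(trace, max_threads):
    """Build per-block execution counts indexed by degree of parallelism.

    `trace` is a hypothetical iterable of (block_id, active_threads) pairs,
    one pair per basic-block execution. Slot i of a block's vector counts
    how often that block ran while i threads were active.
    """
    vectors = defaultdict(lambda: [0] * (max_threads + 1))
    for block_id, active_threads in trace:
        vectors[block_id][active_threads] += 1
    return dict(vectors)


# Toy trace (made up): "loop" runs both in a serial phase and a 4-thread phase.
trace = [("init", 1), ("loop", 4), ("loop", 4), ("loop", 1), ("reduce", 2)]
pbv = build_parallel_block_vectors(trace, max_threads=4)

# Block execution profile of all serial regions (exactly one active thread).
serial_profile = {block: vec[1] for block, vec in pbv.items() if vec[1] > 0}
```

In this toy trace the block `loop` shows up in both the serial profile and the 4-thread column, which is the kind of overlap between serial and parallel phases that the abstract's first insight refers to.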