Code coverage is the percentage of code that is executed by automated tests; it tells us which lines of a project have been executed and which have not. The LLVM Compiler Infrastructure has upwards of 100,000 tests (unit, regression, and test suite) that can be executed with a single make check-all command, and a very large codebase consisting of millions of lines of code and multiple subprojects such as Clang, MLIR, OpenMP, and Polly.
It is therefore natural that some parts of the compiler are better exercised by test cases than others. Code coverage cannot tell us whether a given test suite is adequate, but it can tell us which areas of the codebase the test suite reaches and which it misses. Code coverage metrics also provide line and branch execution counts, which in turn can be used for profile-guided optimizations. Code coverage can also point to code that no test ever executes, some of which may turn out to be dead (unreachable) code that can be removed. Finally, a codebase-wide audit may turn up surprising results and lead to research papers discussing them.
In this proposal, I describe how we can implement code coverage for the LLVM codebase, how to interface it with tools like LCOV, and how to integrate GitHub Actions so that the coverage percentage for each subproject is displayed in the GitHub repository.
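To make the per-subproject percentage concrete, here is a minimal sketch (not the proposal's actual implementation) that reads coverage data in the LCOV tracefile format and reports line coverage per subproject. It relies only on the standard `SF:`, `DA:` and `end_of_record` records. The function name `coverage_by_subproject`, the sample file paths, and the assumption that each source path is relative to the repository root (so that its first component names the subproject) are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def coverage_by_subproject(tracefile_text):
    """Return {subproject: percentage of instrumented lines that were executed}."""
    found = defaultdict(int)  # instrumented lines seen per subproject
    hit = defaultdict(int)    # instrumented lines with a nonzero execution count
    subproject = None
    for raw in tracefile_text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            # Assumption: paths are relative to the repository root, so the
            # first path component is the subproject name (clang, mlir, ...).
            subproject = line[3:].lstrip("/").split("/", 1)[0]
        elif line.startswith("DA:") and subproject is not None:
            # DA:<line number>,<execution count>[,<checksum>]
            fields = line[3:].split(",")
            found[subproject] += 1
            if int(fields[1]) > 0:
                hit[subproject] += 1
        elif line == "end_of_record":
            subproject = None
    return {name: 100.0 * hit[name] / found[name] for name in found}


# Hypothetical tracefile contents, made up for this example.
SAMPLE = """\
SF:clang/lib/Parse/Example.cpp
DA:1,4
DA:2,0
DA:3,1
DA:4,0
end_of_record
SF:mlir/lib/IR/Example.cpp
DA:10,2
DA:11,2
end_of_record
"""

if __name__ == "__main__":
    for name, pct in sorted(coverage_by_subproject(SAMPLE).items()):
        print(f"{name}: {pct:.1f}%")
```

In a real setup, the tracefile would come from the coverage tooling rather than a string literal, and a GitHub Actions job could publish the resulting numbers. Those steps are outside the scope of this sketch.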
The proposal has been submitted along with my resume. You can find it here: Ashutosh_Pandey_GSoC_LLVM.pdf - Google Drive