Q&A: How Google Implements Code Coverage at Massive Scale

Post Syndicated from Rina Diane Caballar original https://spectrum.ieee.org/tech-talk/computing/software/qa-how-google-implements-code-coverage-at-massive-scale

In software development, a common metric called code coverage measures the percentage of a system’s code that is executed by tests run before deployment. Coverage is typically measured automatically by a separate tool, though some coverage tools can also be invoked manually from the command line. The results show exactly which lines of code were executed when running a test suite, and can reveal which lines may need further testing.
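To make the idea of line coverage concrete, here is a minimal sketch in Python that records which lines of one function run under a set of test inputs. It is only an illustration built on the standard library's `sys.settrace` and `dis` modules, not the tooling described in the study. The helper names (`body_lines`, `executed_lines`, `line_coverage`) and the `classify` function are hypothetical, and the sketch ignores nested functions and comprehensions.

```python
import dis
import sys


def body_lines(func):
    """Return the line numbers in func's body that compile to bytecode."""
    code = func.__code__
    return {
        line
        for _, line in dis.findlinestarts(code)
        # Skip entries with no source line and the `def` line itself.
        if line is not None and line > code.co_firstlineno
    }


def executed_lines(func, *args):
    """Run func(*args) and return the body lines that were executed."""
    target = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Trace only frames that belong to the function under test.
        if frame.f_code is not target:
            return None
        if event == "line" and frame.f_lineno > target.co_firstlineno:
            seen.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return seen


def line_coverage(func, inputs):
    """Fraction of func's body lines executed across all argument tuples."""
    executable = body_lines(func)
    covered = set()
    for args in inputs:
        covered |= executed_lines(func, *args)
    return len(covered & executable) / len(executable)


# Hypothetical function under test.
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"


partial = line_coverage(classify, [(5,)])        # one input only
full = line_coverage(classify, [(5,), (-1,)])    # both branches exercised
```

With only the non-negative input, the `return "negative"` line never runs, so `partial` falls short of 1.0. Adding the negative input executes every body line. Production coverage tools apply the same principle across whole programs and report the results per file.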

Ideally, software development teams aim for 100 percent code coverage. But in reality, this rarely happens, because a given block of code can take many different paths and because the edge cases that should (or shouldn’t) be considered vary with system requirements.
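A small, hypothetical Python example shows why paths multiply faster than lines. The function `shipping_cost` and its test inputs below are invented for illustration. Two independent `if` statements give four possible paths, yet two tests are enough to execute every line.

```python
from itertools import product


# Hypothetical function with two independent decisions, so four paths.
def shipping_cost(weight_kg, express, international):
    cost = 5.0
    if express:
        cost += 10.0
    if international:
        cost += 20.0
    return cost + 0.5 * weight_kg


# Every combination of the two conditions is a distinct path.
all_paths = set(product([False, True], repeat=2))

# Two tests: the first one alone already executes every line.
tests = [(1.0, True, True), (1.0, False, False)]
paths_taken = {(express, international) for _, express, international in tests}

path_coverage = len(paths_taken) / len(all_paths)
untested_paths = sorted(all_paths - paths_taken)
```

Here every line is executed, but only half of the paths are. The combinations in `untested_paths` (express only and international only) are never exercised, so a bug specific to either one would go unnoticed despite full line coverage.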

Measuring code coverage has become common practice for software development and testing teams, but the question of whether this practice actually improves code quality is still up for debate.

Some argue that developers might focus on quantity rather than quality, writing tests just to hit a coverage percentage instead of tests that are robust enough to identify high-risk or critical areas. Others question the cost-effectiveness of measuring coverage: reviewing the results takes valuable developer time and doesn’t necessarily improve test quality.

For a large organization such as Google, whose code base of one billion lines receives tens of thousands of commits per day and spans seven programming languages, measuring code coverage can be especially difficult.

A recent study led by Google AI researchers Marko Ivanković and Goran Petrović provides a behind-the-scenes look at the tech giant’s code coverage infrastructure, which consists of four core layers. The bottom layer is a combination of existing code coverage libraries for each programming language, while the middle layers automate and integrate code coverage into the company’s development and build workflows. The top layer deals with visualizing code coverage information using code editors and other custom tools.