Code Coverage Density and Test Overlap
The idea behind this principle is that test cases break not only because of bugs in the code, but more frequently because of changes in the specification. Code behavior that was previously assumed to be correct may become incorrect or insufficient under new requirements. If the code is modified to reflect a new requirement, the test cases that assert the old behavior may break and need to be updated. (If you are following a TDD approach, updating the test cases should actually happen before the tested code is changed.) In a test suite with a high level of test overlap, one small change in the specification (and implementation) may require a large number of test cases to be updated. In such a situation, the test cases were evidently overlapping in some code detail that was subject to change. Such occurrences are highly undesirable because they significantly increase the test suite’s maintenance cost.
We previously identified four different code paths for the tested method shown in Listing 2. If you take a closer look at the code logic, you will notice that only the two least significant bits of the input are evaluated. The remaining bits of the input have no influence on the code path that is taken. So, effectively, testing the method with an input value of 4 will use the same code path that is taken for an input value of 0. In the same fashion, the input values 1, 5, 9, 13, ... will all cause the same code path to be taken. For the purposes of path coverage, the method from Listing 2 has four equivalence classes, which are summarized in Table 2.
Table 2: Equivalence classes for path coverage of Listing 2
To achieve complete path coverage with minimal test overlap, it is sufficient to pick one input value from each equivalence class. There is nothing to be gained in terms of coverage if multiple test cases use input values from the same equivalence class. Equivalence classes vary according to the coverage criteria. For example, in terms of statement coverage, the input values 0 and 2 are both in the equivalence class for covering the return null statement of the sample method, but they would be in different equivalence classes when looking at path coverage.
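Since Listing 2 is not reproduced here, the following sketch is a hypothetical method with the same structure the text describes: only the two least significant bits of the input influence the control flow, there are four path-coverage equivalence classes with representatives 0, 1, 2, and 3, and the inputs 0 and 2 reach the same return null statement along different paths. All names are invented for illustration.

```java
// Hypothetical stand-in for Listing 2 (assumption: not the original code).
public class PathCoverageDemo {

    static String sample(int input) {
        boolean bit0 = (input & 1) != 0; // least significant bit
        boolean bit1 = (input & 2) != 0; // second least significant bit
        if (bit0 && bit1) {
            return "both bits set";      // path taken for 3, 7, 11, ...
        }
        if (bit0) {
            return "only bit 0 set";     // path taken for 1, 5, 9, 13, ...
        }
        if (bit1) {
            // extra branch taken for 2, 6, 10, ... before falling through
            System.out.println("bit 1 set, bit 0 clear");
        }
        // Reached by 0, 4, 8, ... and by 2, 6, 10, ... — the same statement
        // (one equivalence class for statement coverage), but via two
        // distinct paths (two equivalence classes for path coverage).
        return null;
    }

    public static void main(String[] args) {
        // One representative input per path-coverage equivalence class
        // suffices for complete path coverage.
        int[] representatives = {0, 1, 2, 3};
        for (int r : representatives) {
            System.out.println(r + " -> " + sample(r));
        }
        // 4 exercises exactly the same path as 0, adding no coverage.
        System.out.println(sample(4) == sample(0));
    }
}
```

Adding a fifth test with the input 4 would increase test overlap without increasing coverage, because 4 and 0 fall into the same class.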
Identifying equivalence classes for test inputs is a useful tool for minimizing test overlap, but again, trouble is looming ahead when we move towards full regression coverage. If a test suite achieves full regression coverage for a particular method, this implies that the test suite forms a complete specification of that method. Any change in the method's behavior—no matter how minor—would result in a test failure. For the sample method from Listing 2, we already determined that test cases with all 256 possible input values would be necessary. How many equivalence classes for test inputs would there be in terms of full regression coverage? Unfortunately, the answer to this question is “256.”
For full regression coverage, no input value is equivalent to any other input value. Full regression coverage means that the behavior of a method is completely “locked in.” For example, even though the input values 0 and 4 are in the same equivalence class for the purpose of path coverage, they are in equivalence classes of their own for full regression coverage. If they were in the same equivalence class, this would imply that just picking one of the values (for instance, 4) would still satisfy the criterion of full regression coverage. However, in that case, it would be possible to implement the tested method in such a way that it works properly for the input of 4 but not for the input of 0 (for example, by adding a check that deliberately returns a wrong result if the input was 0). Therefore, 0 and 4 cannot be in the same equivalence class for full regression coverage.
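The sabotage argument can be made concrete with a small sketch. The two methods below are hypothetical (they do not appear in the article): one implements a behavior correctly, and a sabotaged variant deliberately returns a wrong result for the input 0 while behaving identically for 4, 8, 12, and so on.

```java
public class RegressionGapDemo {

    // Hypothetical reference behavior: are the two least
    // significant bits of the input both zero?
    static boolean lowBitsClear(int input) {
        return (input & 3) == 0;
    }

    // Sabotaged variant: identical for 4, 8, 12, ... but
    // deliberately wrong for the input 0.
    static boolean lowBitsClearSabotaged(int input) {
        if (input == 0) {
            return false; // wrong on purpose
        }
        return (input & 3) == 0;
    }

    public static void main(String[] args) {
        // A suite that only tests the representative 4 passes
        // against both implementations ...
        System.out.println(lowBitsClear(4));          // true
        System.out.println(lowBitsClearSabotaged(4)); // true
        // ... so it cannot constitute full regression coverage:
        // the sabotage at input 0 goes unnoticed unless 0 itself is tested.
        System.out.println(lowBitsClear(0));          // true
        System.out.println(lowBitsClearSabotaged(0)); // false
    }
}
```

Because a test suite using only the input 4 fails to distinguish the two implementations, 0 and 4 cannot share an equivalence class under full regression coverage.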
Code Coverage Density
Again, when aiming for full regression coverage, the trick of picking only one input from each equivalence class can no longer be used for minimizing test overlap. Full regression coverage will always cause additional overlap. What other principles can be used to mitigate the negative effects of test overlap?
A frequent problem is common code that is directly or indirectly executed by a large number of test cases. Specification changes affecting that common code are likely to cause a large number of failures. The goal is to avoid such concentrations and rewrite the tests in a way that limits the amount of commonly executed code. Coverage density is a helpful metric that can be used to create test suites that execute the tested code in a more evenly distributed fashion. Coverage density extends the dichotomy of “covered” versus “not covered” to a numeric metric that also counts how often a branch or path is executed. For example, instead of simply getting a yes or no answer as to whether a particular line was covered, coverage density would also tell you that the line was executed exactly 500 times. Coverage density can be applied to any coverage criterion, but it is most commonly offered in conjunction with statement or branch coverage.
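A minimal sketch of the idea: instead of a real coverage tool, the hypothetical probe below records a hit count per labeled code location. A location that every test executes (here, the method entry) accumulates a conspicuously high count, flagging it as code shared by the whole suite.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CoverageDensityDemo {

    // Hypothetical density probe: counts executions per location
    // instead of recording a plain covered/not-covered flag.
    static final Map<String, Integer> hits = new LinkedHashMap<>();

    static void probe(String location) {
        hits.merge(location, 1, Integer::sum);
    }

    static int classify(int input) {
        probe("classify:entry");   // executed by every single test
        if ((input & 1) != 0) {
            probe("classify:odd"); // executed only for odd inputs
            return 1;
        }
        probe("classify:even");    // executed only for even inputs
        return 0;
    }

    public static void main(String[] args) {
        // Simulated test suite: 500 test inputs.
        for (int i = 0; i < 500; i++) {
            classify(i);
        }
        // Density report: "classify:entry" shows 500 executions,
        // while each branch shows only 250.
        hits.forEach((loc, n) ->
                System.out.println(loc + " executed " + n + " times"));
    }
}
```

A real coverage tool gathers these counts by instrumenting the code automatically; the manual probe merely illustrates what the density numbers mean.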
Visualizing path coverage densities is just as problematic as visualizing simple “yes/no” path coverage. A common way of visualizing coverage density is to add colored markers of varying brightness in the source code editor. For example, a light shade of green might indicate that a piece of code is covered by only a few test cases, while an extremely dark shade of green would warn that a large concentration of test cases all execute that same piece of code. Such warning indicators should ideally prompt a refactoring that moves the common code out of the shared code path.