Working with legacy code and 3 steps to update it
When you're dealing with legacy code, you need a sustainable way to manage change. Working with legacy code can be a barrier to Agile and DevOps, but you can conquer the challenge by leveraging appropriate technologies.
What is legacy code?
Many people use the term “legacy code” to simply mean old code. But “old” and “legacy” mean different things to different people. Here, I define legacy code as any existing code that the team has limited knowledge about.
Knowledge about the code could be incomplete for several reasons, such as:
- The team acquired a project from another part of the organization.
- The original author left the team and took their knowledge of the code with them.
- The functionality delivered by the code is no longer a business priority and has remained unchanged, resulting in forgotten details about the code.
In any case, let's be clear: legacy code is the rule, not the exception. Much of the software infrastructure in the world today runs on legacy code. The question is, then, how do we mitigate the risks associated with legacy code when we need to make a change? In this blog post, I'll give you some solutions for working effectively with legacy code.
Legacy code is a barrier to Agile and DevOps
The problem with legacy code isn’t its age. It’s that you don’t understand how changing it can affect existing functionality. The knowledge gaps associated with legacy code can become a barrier if you are transitioning to a new development methodology, such as Agile or DevOps.
Agile and DevOps have become the dominant methodologies for creating software because they help teams quickly iterate and release applications as soon as minimal marketable features are ready. Short and frequent development cycles are the hallmark of iterative development methodologies, but these approaches don’t leave room for mitigating potentially problematic outcomes when you're dealing with legacy code. Trying to rapidly iterate on code you don’t understand is likely to introduce new issues.
The reality is that these techniques are much easier to apply when starting new projects. For projects that have been around for a while, teams usually work with systems that involve legacy code. Developers may not know how the existing code base works but must still fix defects or extend functionality without introducing new problems. And even superficial or seemingly small changes can have a significant impact on the application.
Why technical debt matters (or doesn’t)
The software development game is about constantly balancing software quality, time-to-market, and cost of development. In most cases, we make trade-offs to achieve business goals based on what’s happening in the market. Over time, we rack up technical debt.
What is technical debt?
Technical debt is the cost of mitigating the risk associated with implementing an imperfect solution in order to achieve your time-to-market or cost-of-development goal. (For example, forgoing upgrades to a library because doing so would have delayed the release represents technical debt in the form of time it will take to update the library later.)
In many cases, inherited legacy codebases are heavy with technical debt in the form of poor testability, low coverage, overly complex code, etc. Technical debt can weigh down the application of newer software development practices because teams constantly face the question of whether to address the debt.
Should you be concerned about technical debt?
To put it in perspective, every application has technical debt, and many organizations can invest significant resources paying it down without realizing any substantive benefits. At the end of the day, the decision to invest resources into paying off technical debt depends on which parts of your application you plan on changing. But you won’t know unless you start taking some additional steps (I'll get into that momentarily).
Boosting coverage on legacy code
When organizations inherit a legacy codebase, they often adopt a coverage policy that helps them create a baseline for new development. The legacy code is already in the field and supposedly working, so the focus is on ensuring the quality of new code. To comply with the coverage policy, many organizations drive up the coverage of the legacy code by any means necessary, because low coverage drags down the overall metrics and makes it difficult to accurately measure coverage for new development. If you know that the legacy code is well covered, the overall project metrics can indicate whether new development is moving in the right direction.
The underlying rationale for this strategy is sound, but the problem is that organizations blindly generate tests in order to comply with their coverage policy. As a result, the project is loaded with unmaintainable tests that provide a false sense of software quality. If you do not plan on touching the code or are not concerned with test maintainability or quality, then you can use one of the several test generation tools on the market that can help you achieve this goal.
Create meaningful, maintainable Java tests
To be clear, I'm not advocating for blindly generating tests. Instead, use a tool that helps you rapidly create meaningful tests to cover your Java legacy code. Parasoft Jtest provides a point-and-click interface that gives developers an automatic test creation process based on the existing code. The resulting regression suite is meaningful, maintainable, and extensible.
The 3 steps for updating your legacy code
Rather than trying to work on the macro level, create a baseline and narrow down the scope of your quality activities to the areas of code affected by your planned changes. After taking measurements to assess the scope and state of the code, you should create tests that capture current behavior so that the team can understand how the changes may affect existing functionality.
You can then leverage a range of technologies that help you collect analytics as you refactor legacy code and ensure that your investment in code changes improves the safety, security, and reliability of legacy systems.
1. Define your scope
Understanding how changes affect system behavior requires at least one data point. Begin by choosing a baseline build and start tracking metrics moving forward. Set your scope and look at three characteristics of the legacy code:
- How many static analysis violations do you have and how severe are they? You need to understand how many potential defects are built into the code.
- What is your current test coverage? Low coverage represents potential risk associated with change.
- How much cleanup will be necessary? Additional metrics, such as complexity, comments, etc., can provide perspective about the state of the software quality.
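The three questions above can be recorded as one data point per build and compared against the baseline. Here is a minimal sketch of that idea, assuming Java 16 or later for records. The names `BaselineTracker`, `QualitySnapshot`, and `regressions` are hypothetical and are not part of any Parasoft API, and the numbers are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from a Parasoft product.
final class BaselineTracker {

    /** One data point per build, covering the three questions above. */
    record QualitySnapshot(String buildId,
                           int severeViolations,
                           int totalViolations,
                           double lineCoveragePercent,
                           double averageComplexity) {}

    /** Describes every way the current build is worse than the baseline. */
    static List<String> regressions(QualitySnapshot baseline, QualitySnapshot current) {
        List<String> findings = new ArrayList<>();
        if (current.severeViolations() > baseline.severeViolations()) {
            findings.add("severe violations: " + baseline.severeViolations()
                    + " -> " + current.severeViolations());
        }
        if (current.lineCoveragePercent() < baseline.lineCoveragePercent()) {
            findings.add("line coverage: " + baseline.lineCoveragePercent()
                    + "% -> " + current.lineCoveragePercent() + "%");
        }
        if (current.averageComplexity() > baseline.averageComplexity()) {
            findings.add("average complexity: " + baseline.averageComplexity()
                    + " -> " + current.averageComplexity());
        }
        return findings;
    }

    public static void main(String[] args) {
        // Made-up numbers, for illustration only.
        QualitySnapshot baseline = new QualitySnapshot("build-100", 12, 340, 41.5, 9.2);
        QualitySnapshot current = new QualitySnapshot("build-101", 14, 338, 40.0, 9.2);
        regressions(baseline, current).forEach(System.out::println);
    }
}
```

Comparing against a fixed baseline, rather than against an absolute target, is what makes this workable for legacy code: the goal is not to be perfect but to avoid getting worse.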
Parasoft provides a powerful analytics platform for capturing, correlating, and reporting code analysis violations, test results, coverage analysis, and other software quality data. The platform goes beyond static reporting — it also applies additional analysis to help you identify parts of the application affected by change.
Leveraging the concept of resource groups, you can identify a specific set of files or directories and scope coverage, static analysis violations, and metrics data to those specific resources. This information helps you create a baseline for areas of the codebase before making changes within those parts of the code.
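To illustrate the general idea of scoping data to a set of files, here is a small sketch that sums per-file violation counts only for files matching a group's glob patterns. It is an assumption about how such scoping could work, not a description of how Parasoft implements resource groups. `ResourceGroupScope`, `ResourceGroup`, and `violationsInScope` are hypothetical names, and the patterns assume `/` as the path separator.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a "resource group" modeled as a named list of glob patterns.
final class ResourceGroupScope {

    record ResourceGroup(String name, List<String> globs) {
        /** True if the file matches at least one of the group's patterns. */
        boolean contains(Path file) {
            for (String glob : globs) {
                PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
                if (matcher.matches(file)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Sums per-file violation counts for files inside the group only. */
    static int violationsInScope(ResourceGroup group, Map<Path, Integer> violationsByFile) {
        int total = 0;
        for (Map.Entry<Path, Integer> entry : violationsByFile.entrySet()) {
            if (group.contains(entry.getKey())) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up paths and counts, for illustration only.
        ResourceGroup billing = new ResourceGroup("billing", List.of("src/main/java/billing/**"));
        Map<Path, Integer> violations = Map.of(
                Path.of("src/main/java/billing/Invoice.java"), 7,
                Path.of("src/main/java/billing/tax/Vat.java"), 3,
                Path.of("src/main/java/ui/Main.java"), 20);
        System.out.println(billing.name() + ": " + violationsInScope(billing, violations));
    }
}
```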
2. Capture behavior
Armed with an initial data point, the next step is to start capturing the current behavior of the system by creating tests. Building up a high-quality regression suite not only captures existing behavior, it also drives coverage, which serves as a safety net for making sure that changes don’t break functionality.
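Tests that capture current behavior are often called characterization tests: the expected values are recorded from what the code does today, not from a specification. The sketch below shows the idea in plain Java so that it runs without dependencies. In a real project each check would typically be a JUnit test method. `legacyShippingCents` is a hypothetical stand-in for legacy code, and all names here are assumptions for illustration.

```java
// Plain-Java sketch of characterization checks (hypothetical example code).
final class ShippingCharacterization {

    // Stand-in for legacy code whose rules nobody on the team remembers.
    static int legacyShippingCents(int weightGrams, boolean express) {
        int cost = 499;
        if (weightGrams > 2000) {
            // Integer division truncates partial 500 g steps; that is current behavior.
            cost += (weightGrams - 2000) / 500 * 150;
        }
        if (express) {
            cost = cost * 2 - 100;
        }
        return cost;
    }

    static void assertPinned(String label, int recorded, int actual) {
        if (recorded != actual) {
            throw new AssertionError(label + ": recorded " + recorded + " but got " + actual);
        }
    }

    /** Recorded values came from running the current code, not from a spec. */
    static void characterize() {
        assertPinned("light parcel", 499, legacyShippingCents(1000, false));
        assertPinned("just under a 500 g step", 499, legacyShippingCents(2499, false));
        assertPinned("exactly one 500 g step", 649, legacyShippingCents(2500, false));
        assertPinned("heavy parcel", 799, legacyShippingCents(3200, false));
        assertPinned("heavy express parcel", 1498, legacyShippingCents(3200, true));
    }

    public static void main(String[] args) {
        characterize();
        System.out.println("current behavior matches the recorded baseline");
    }
}
```

Note that the checks deliberately pin behavior that may look odd, such as the truncated weight step. Whether that behavior is a bug is a separate decision. The point of the suite is to tell you when a change alters it.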
Parasoft Jtest is ideally suited to this task because it enables you to create a baseline of JUnit tests in bulk, including assertions, based on the existing code. Jtest also includes the ability to create tests that directly access private methods for when the legacy code was not originally written with testability in mind.
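One common way to reach a private method from a test in plain Java is reflection. The sketch below shows that technique using only the standard `java.lang.reflect` API. It is not a description of how Jtest does it internally, and `LegacyInvoice`, `roundToNickel`, and `PrivateMethodAccess` are hypothetical names.

```java
import java.lang.reflect.Method;

// Hypothetical legacy class: the rounding rule is private and has no public seam.
class LegacyInvoice {
    private int roundToNickel(int cents) {
        return (cents + 2) / 5 * 5;
    }
}

final class PrivateMethodAccess {

    /** Invokes the private LegacyInvoice.roundToNickel(int) through reflection. */
    static int callRoundToNickel(LegacyInvoice target, int cents) {
        try {
            Method method = LegacyInvoice.class.getDeclaredMethod("roundToNickel", int.class);
            method.setAccessible(true); // lifts the private restriction for this Method object
            return (Integer) method.invoke(target, cents);
        } catch (ReflectiveOperationException e) {
            // Raised if the method was renamed, removed, or threw an exception itself.
            throw new IllegalStateException("could not call roundToNickel", e);
        }
    }

    public static void main(String[] args) {
        LegacyInvoice invoice = new LegacyInvoice();
        System.out.println(callRoundToNickel(invoice, 12));
        System.out.println(callRoundToNickel(invoice, 13));
    }
}
```

The trade-off is that the method name is a string, so a rename breaks the test at run time rather than at compile time. That makes this approach best suited to code you cannot yet restructure for testability.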
It’s better to expand coverage with meaningful tests. During coverage gap analysis, Jtest identifies existing tests that can be cloned and mutated to reach untested parts of the code. A lot of work went into creating those existing tests, and the clone and mutate functionality in Jtest increases the return on your test creation investment.
You should strive for the highest level of coverage possible, but in most cases achieving 100% coverage on the entire codebase is not practical. We’ll discuss an additional technique you can apply as a safety net to ensure coverage on modified code a little later.
When you have good coverage from a functional perspective, you can start making changes and modifying the tests as you go.
3. Improve the isolated legacy code
With the behavior of the system captured, you can start fixing violations, addressing PRs, or applying the changes you want to focus on with minimal risk of breaking existing functionality. Parasoft can help you manage the existing technical debt and put data, such as static analysis violations, into appropriate workflows where they can be easily reprioritized, suppressed, or resolved to improve the overall quality of the application. Changes from build to build should also be monitored as part of the ongoing process, to ensure that the software quality doesn’t take a turn for the worse.
The best time to address the technical debt in the legacy code is as you make changes. The reported data should be included in the overall statistical information about the project. The technical debt may not have an immediate impact on the application, but you should apply best practices for containing and managing it systematically. Refactoring legacy code any time you need to make changes helps you incrementally reduce the debt.
Ensuring coverage on modified code
This process helps ensure that your changes don’t negatively impact existing functionality, but you also need to ensure that the team follows good practices moving forward. Continuing to maintain a high level of coverage and to write or update tests as the code evolves requires buy-in on a cultural level. This is why we made technology that can automatically notify you when modified code (i.e., new or changed code) fails to comply with the coverage policy.
By analyzing the changes between specified baseline builds, you can monitor change across the whole codebase so that nothing slips through the cracks. Achieving 100% coverage across the entire codebase is not practical, but by monitoring the coverage of the modified code, the team can focus on the parts of the code that are actively being worked on and have confidence that all changes are tested.
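The arithmetic behind modified-code coverage is simple: count only the lines that changed since the baseline build, then check how many of those the test suite executed. Here is a minimal sketch under the assumption that you already have both sets of line numbers per file, for example from a diff and a coverage report. `ModifiedCodeCoverage`, `coveragePercent`, and `meetsPolicy` are hypothetical names, not a Parasoft API.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a coverage check limited to new or changed lines.
final class ModifiedCodeCoverage {

    /**
     * Percentage of changed lines that the test suite executed.
     * changedLines: file name to line numbers added or modified since the baseline build.
     * coveredLines: file name to line numbers executed by the tests.
     */
    static double coveragePercent(Map<String, Set<Integer>> changedLines,
                                  Map<String, Set<Integer>> coveredLines) {
        int changed = 0;
        int covered = 0;
        for (Map.Entry<String, Set<Integer>> entry : changedLines.entrySet()) {
            Set<Integer> hit = coveredLines.getOrDefault(entry.getKey(), Set.of());
            for (int line : entry.getValue()) {
                changed++;
                if (hit.contains(line)) {
                    covered++;
                }
            }
        }
        // No changed lines means there is nothing to violate the policy.
        return changed == 0 ? 100.0 : 100.0 * covered / changed;
    }

    static boolean meetsPolicy(Map<String, Set<Integer>> changedLines,
                               Map<String, Set<Integer>> coveredLines,
                               double minimumPercent) {
        return coveragePercent(changedLines, coveredLines) >= minimumPercent;
    }

    public static void main(String[] args) {
        // Made-up data: five changed lines, three of them covered.
        Map<String, Set<Integer>> changed = Map.of(
                "Invoice.java", Set.of(10, 11, 12, 13),
                "Tax.java", Set.of(5));
        Map<String, Set<Integer>> covered = Map.of(
                "Invoice.java", Set.of(1, 2, 10, 11, 12));
        System.out.println(coveragePercent(changed, covered));
        System.out.println(meetsPolicy(changed, covered, 80.0));
    }
}
```

Because untouched legacy lines never enter the calculation, a strict threshold on this number is achievable even when overall coverage is low.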
To conclude, the world’s software runs on code that has been passed on from team to team. Dealing with legacy code is an everyday reality. The gaps in knowledge about the code represent potential risks as developers make changes to maintain or extend the functionality, and the processes and technologies described here should give you the confidence to take on just about any codebase thrust upon your teams.
As Parasoft's VP of Development, Igor is responsible for technical strategy, architecture, and development of Parasoft products. Igor brings over 20 years of experience in leading engineering teams, with a specialization in establishing and promoting the best agile practices in software development environments.