All too often, I see organizations releasing software in a manner about as safe as a game of Russian roulette: gambling with their customers’ safety, private data, and security, not to mention reliability. They’re also gambling with their company’s reputation and bottom line. The cost of a software failure shows up in different ways. For a public company, it may be a drop in the stock price; for a small company, it can mean going out of business. IEEE posted an excellent list of public software failures a couple of years ago, and you can be sure software is still failing.
The reason I like this somewhat scary analogy is that all too often I hear people say things like “That software has been out for a long time and hasn’t had problems” or “We’ve always done it this way and it works,” but past luck is a bad basis for planning. A company focused on software engineering looks for ways to build and release better software that fails less. That means proactively planning for success by doing the right thing, even if doing the wrong thing has worked out so far.
Researchers at Harvard have found that roughly half of IT software projects fail. Others have published plenty of different figures, and that estimate isn’t the highest, so let’s go with it for a minute. That’s like playing Russian roulette with three bullets in a six-shot revolver: a 50-50 chance of failure. I don’t like those odds, and I certainly wouldn’t gamble the future of my company on them.
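To put rough numbers on the analogy, here is a minimal Python sketch of how those odds compound. It assumes each release is an independent gamble with the same failure chance (three loaded chambers out of six, with the cylinder re-spun before every pull). Real projects don’t fail independently like this, so treat it as an illustration of compounding risk, not a model of your release process. The function names are illustrative only.

```python
from fractions import Fraction

CHAMBERS = 6  # a standard six-shot revolver


def failure_odds(bullets: int, chambers: int = CHAMBERS) -> Fraction:
    """Chance that a single pull (one release) lands on a loaded chamber."""
    return Fraction(bullets, chambers)


def survival_odds(bullets: int, releases: int, chambers: int = CHAMBERS) -> Fraction:
    """Chance of getting through `releases` consecutive pulls unscathed.

    Assumes the cylinder is re-spun before every pull, so each pull is an
    independent trial with the same failure chance.
    """
    return (1 - failure_odds(bullets, chambers)) ** releases


if __name__ == "__main__":
    p = failure_odds(3)
    print(f"Failure chance per release: {p}")
    for n in range(1, 7):
        print(f"Chance of surviving {n} release(s): {survival_odds(3, n)}")
```

With a 50% failure chance per release, the odds of getting away with it halve every time: 1/2, 1/4, 1/8, and so on, down to 1/64 (about 1.6%) after six releases.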
Let’s look at some of the nasty gambles people take every day when they release their software. Call them the bullets in the revolver:
Old Known Bugs
We all know that we release software with bugs, because flawless software would take forever to make. But that’s no excuse for never fixing the bugs we know about. Much has been said about technical debt in very abstract terms, but known, unfixed bugs are a real, practical measure of the debt in your software. If you know about a bug and you’re not fixing it, you’d better have a pretty good reason to think it doesn’t matter. Plan some time in each release not just to add new features but to make things better in general. Take time to polish your software.
New Bugs in Old Code
Old code is tricky. I’ve seen companies with a policy of “clean it up if you’re fixing it anyway” and others where the rule is “only touch what you must, and only when there is a field-reported bug.” Both are interesting policies, but what matters most is understanding the risk involved when you find a new bug in old code. I was working with a hardware vendor that was struggling with how to handle the output of a new tool on some legacy code. In their case, the finding was an ambiguous scope issue, which still leaves me wondering how their compiler could allow such madness. They were caught in a conflict: on the one hand they had this new tool, and on the other they were not supposed to touch old code unless there was a bug report from the field.
It’s important to understand what you plan to do with your legacy code, as well as the risk it poses to your organization. If the code is critical, its age might not matter as much as you think. If the code is being deprecated, perhaps you’re wasting time testing things you don’t intend to fix.
Security as Part of Testing Instead of Development
It’s depressingly common for organizations to overlook security. Some think they can test security into their application (they can’t), while others think security issues won’t apply to their code (they will). To get out of this cycle of constant security failures, organizations must harden their code with solid AppSec best practices, as codified in a static analysis tool that does more than just flow analysis. If you don’t know where to start, it honestly wouldn’t hurt to take the MISRA rules and follow them for any code you write from today on.
The Ever-Failing, Ever-Passing Test Suite
An extremely common and dangerous practice I see is having a large test suite and relying on a simple metric: the number of tests that passed. For example, your suite usually has an 80% pass rate, so you assume today’s 80% is fine. The problem is that there is no way to know whether the 80% that passed today is the same 80% that passed yesterday. A real new failure could easily be hiding in that 80% (and there is), offset by something else that got fixed, so the number stays the same. Keep your test suite clean, or it isn’t telling you much. I’d seriously question the value of a test failure you feel comfortable ignoring. Why not just skip that test? It’s a more honest and useful approach.
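One way to see past the headline number is to compare which tests passed, not how many. Below is a minimal Python sketch, assuming each run’s results can be exported as a mapping from test name to pass/fail; how you get that out of your own test runner is up to you, and the function and test names here are hypothetical.

```python
def pass_rate(run: dict[str, bool]) -> float:
    """The headline metric: fraction of tests that passed."""
    return sum(run.values()) / len(run)


def diff_runs(yesterday: dict[str, bool], today: dict[str, bool]) -> dict[str, list[str]]:
    """Compare two runs test by test (True means the test passed)."""
    common = yesterday.keys() & today.keys()
    return {
        # Passed yesterday, fails today: the failures a flat pass rate hides.
        "newly_failing": sorted(t for t in common if yesterday[t] and not today[t]),
        # Failed yesterday, passes today: these can mask the line above.
        "newly_passing": sorted(t for t in common if not yesterday[t] and today[t]),
        # Tests that appeared or disappeared also change the denominator.
        "added": sorted(today.keys() - yesterday.keys()),
        "removed": sorted(yesterday.keys() - today.keys()),
    }


if __name__ == "__main__":
    yesterday = {"a": True, "b": True, "c": True, "d": True, "e": False}
    today = {"a": True, "b": True, "c": True, "d": False, "e": True}
    print(pass_rate(yesterday), pass_rate(today))
    print(diff_runs(yesterday, today))
```

In the example, both runs report an 80% pass rate, yet test `d` has started failing while test `e` started passing. Treating any entry in `newly_failing` as a blocker, rather than watching the percentage, is what makes the suite informative again.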
Releasing on the Calendar
Probably the most common release criterion is still the calendar. People picked a date, and now they’re going to release because that date has arrived. Granted, external issues influence your release schedule, but the arrival of a date doesn’t make it OK to push crummy software onto your unsuspecting, soon-to-be-former customers. Release when the software is ready: safe, stable, and good. If the calendar is a fixed constraint, make sure your process will get you there on time.
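As an illustration of release criteria that don’t depend on the date, here is a minimal Python sketch of a release gate built from the gambles described above. The three inputs and the zero-tolerance thresholds are hypothetical assumptions, not a prescribed policy; in practice you would feed them from your own test runner, bug tracker, and static analysis tool, and set thresholds that match your risk.

```python
from dataclasses import dataclass


@dataclass
class ReleaseStatus:
    # Hypothetical inputs; populate them from your own tooling.
    newly_failing_tests: int             # tests that passed last release but fail now
    open_critical_bugs: int              # known bugs you have decided must be fixed first
    unresolved_security_violations: int  # static analysis findings still open


def release_blockers(status: ReleaseStatus) -> list[str]:
    """Return the reasons not to ship; an empty list means the build meets the bar.

    The release date is deliberately not an input.
    """
    blockers = []
    if status.newly_failing_tests > 0:
        blockers.append(f"{status.newly_failing_tests} newly failing test(s)")
    if status.open_critical_bugs > 0:
        blockers.append(f"{status.open_critical_bugs} open critical bug(s)")
    if status.unresolved_security_violations > 0:
        blockers.append(f"{status.unresolved_security_violations} unresolved security violation(s)")
    return blockers


if __name__ == "__main__":
    status = ReleaseStatus(newly_failing_tests=1, open_critical_bugs=0,
                           unresolved_security_violations=2)
    reasons = release_blockers(status)
    print("Ready to release" if not reasons else "Blocked: " + "; ".join(reasons))
```

Returning the list of reasons, rather than a bare yes/no, keeps the conversation about what has to be fixed before the date, not about whether the date can move.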
How many times can you release like this before you pay the price? In our Russian roulette analogy, at most six, and maybe as few as one. Let’s do our best to make sure we deliver the best software with the best chance of success.