False Positives in Static Code Analysis

Feb 21, 2013

Years ago, the biggest challenge in static code analysis was trying to find more and more interesting things to check. In Parasoft's original CodeWizard product back in the early 90s, we had 30-some rules based on items from Scott Meyers' book, Effective C++. It was what I like to think of as "Scared Straight" for programmers. Since then, static analysis researchers have constantly worked to push the envelope of what can be detected, including adding newer techniques like data flow analysis to expand what static analysis can do.

It seems to me that the biggest challenge today is no longer adding new weaknesses to detect (although I hope at some point we can again return to that). Now, one of the most common hurdles people run into with static code analysis is trying to make sense of the results they get. Although people do say "I wish static analysis would catch ____" (name your favorite unfindable bug), it’s far more common to hear "Wow, I have way too many results - static analysis is noisy" and "static analysis false positives are overwhelming." Some companies have gone so far as to put triage into their workflow, having recognized that the results they produce aren’t exactly what developers need.

[Figure: Universe of static analysis results]

What is a False Positive in Static Analysis?

In the context of static analysis, "false positive" is used in two ways. Strictly speaking, it means that the tool was incorrect in reporting that a static analysis rule was violated. More loosely, developers fall into the trap of labeling any error message they don’t like as a "false positive," but this isn’t really correct. In many of those cases, they simply don’t agree with the rule, they don’t understand how it applies in this situation, or they don’t think it’s important in general or in this particular case.

Pattern-Based Analysis

Pattern-based static analysis doesn’t have false positives. If the tool reports that a static analysis rule was violated when it actually was not, this indicates a bug in the rule (because the rule should not be ambiguous). If the rule doesn’t have a clear pattern to look for, it’s a bad rule.

I'm not saying that every reported rule violation indicates the presence of a defect. A violation simply means that the pattern was found, indicating a weakness in the code, a susceptibility to having a defect. 
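To make "the pattern was found" concrete, here is a minimal sketch of a pattern-based rule. It is an illustration only, not how CodeWizard or any other product implements its rules; the rule (flag every bare `except:` clause), the function name `find_bare_excepts`, and the sample source are all assumptions chosen for the example.

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return the line numbers of every bare `except:` clause in source."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        # A handler with no exception type is exactly the pattern we look for.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


# Hypothetical code under analysis. It is only parsed, never executed.
SAMPLE = '''\
try:
    risky()
except:
    pass

try:
    risky()
except ValueError:
    pass
'''

violations = find_bare_excepts(SAMPLE)
print(violations)
```

The pattern is either present in the syntax tree or it is not, so the rule leaves no room for ambiguity. Whether the flagged handler actually hides a defect is a separate question that the reviewer has to answer.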

When I look at a violation, I ask myself whether or not this rule applies to my code. If it applies, I fix the code. If it doesn’t, I suppress the violation. It’s best to suppress static analysis violations in the code directly so that the suppression is visible to team members and you won’t end up having to review it a second time. Otherwise, you will constantly be reviewing the same violation over and over again; it's like trying to spell check but never adding your "special" words to its dictionary. The beauty of in-code suppression is that it’s independent of the static analysis engine. Anyone can look at the code and see that the code has been reviewed and that this pattern is deemed acceptable in this code.
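As a sketch of what an in-code suppression can look like, the example below uses the `# noqa: E722` comment that flake8 recognizes for its "bare except" check. The choice of flake8 is an assumption made for illustration; other tools use their own comment syntax. The function `load_optional_config` and its behavior are hypothetical.

```python
import json


def load_optional_config(path: str) -> dict:
    """Best-effort load of an optional config file; fall back to defaults."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    # Reviewed: any failure to read this optional file means "use defaults",
    # so the bare except is deliberate in this one place.
    except:  # noqa: E722
        return {}


settings = load_optional_config("/no/such/directory/settings.json")
print(settings)
```

The rationale sits next to the suppression, so the next reader sees both the decision and the reason without opening the analysis tool.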

Flow-Based Analysis

With flow-based analysis, false positives are relevant and need to be addressed. Flow analysis cannot avoid false positives for the same reason unit testing cannot generate perfect unit test cases. The analysis has to make determinations about the expected behavior of the code. Sometimes there are too many options to know what is realistic; sometimes you simply don’t have enough information about what is happening in other parts of the system.

The important thing here is that the true false positive is something that is just completely wrong. For example, assume that the static analysis tool you’re using says you’re reading a null pointer. If you look at the code and see that it’s actually impossible, then you have a false positive.

On the other hand, if you simply aren’t worried about nulls in this piece of code because they’re handled elsewhere, then the message (while not important to you) is not a false positive. The messages range from "true and important" through "true and unimportant" and "true and improbable" to "untrue". There is a lot of variation here, and each should be handled differently.

There is a common trap here as well. As in the null example above, you may believe that a null value cannot make it to this point, but the tool found a way to make it happen. If it’s important to your application, be certain to check that path and, if necessary, protect against it.
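The difference between an "untrue" message and a "true but unimportant" one can be sketched with Python's `None` standing in for a null pointer. Both functions are hypothetical, and the warnings described in the comments are what a flow-based checker might plausibly report, not the output of any specific tool.

```python
from typing import Optional


def shout_if_enabled(enabled: bool) -> str:
    name: Optional[str] = None
    if enabled:
        name = "ada"
    if enabled:
        # A checker that does not correlate the two `enabled` tests may warn
        # that `name` could be None here. No real execution can reach this
        # line with None, so that report would be a true false positive.
        return name.upper()
    return ""


def shout(name: Optional[str]) -> str:
    # A warning here is true: None reaches this line whenever a caller
    # passes it. If every caller validates first, the message is
    # "true and unimportant", not a false positive.
    return name.upper()


print(shout_if_enabled(True))
print(shout("grace"))
```

Calling `shout(None)` raises `AttributeError`, which is the trap described above: the path the tool found is real even if you believe no caller takes it.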

It’s critical to understand that there is both power and weakness in flow analysis. The power of flow analysis is that it goes through the code and tries to find hot spots and find problems around the hot spots. The weakness is that it is going some number of steps around the code it’s testing, like a star pattern.

The problem is that if you start thinking you’ve cleaned all the code because your flow analysis is clean, you are fooling yourself. Really, you’ve found some errors and you should be grateful for that.
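One way to picture the "some number of steps" limit is a toy model: a search that starts at a hot spot and follows calls only to a fixed depth. This is a simplification for illustration, not a description of how any real flow analyzer works; the call graph, the function names in it, and `reachable_within` are all made up.

```python
from collections import deque

# Hypothetical call graph: each function maps to the functions it calls.
CALL_GRAPH = {
    "handler": ["parse", "validate"],
    "parse": ["tokenize"],
    "validate": [],
    "tokenize": ["read_buffer"],
    "read_buffer": [],
}


def reachable_within(graph: dict, start: str, max_steps: int) -> set:
    """Return the functions reachable from start in at most max_steps calls."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_steps:
            continue  # Budget spent: do not look past this function.
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                frontier.append((callee, depth + 1))
    return seen


examined = reachable_within(CALL_GRAPH, "handler", 2)
print(sorted(examined))
```

With a budget of two steps, `read_buffer` is never examined, so a defect there would go unreported even though the run comes back clean.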

Runtime Error Detection

One great, but commonly overlooked, complement to flow analysis is runtime error detection. Runtime error detection helps you find much more complicated problems than flow analysis can detect, and you have the confidence that the condition actually occurred. Runtime error detection doesn’t have false positives in the way that static analysis does.

Your runtime rule set should closely match your static analysis rule set. The rules can find the same kinds of problems, but the runtime analysis has a massive number of paths available to it. This is because at runtime, stubs, setup, initialization, etc. are not a problem. The only limit is that it checks only the paths your test suite happens to execute.
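The trade-off can be sketched in a few lines. Here Python's own `ZeroDivisionError` stands in for a runtime error detector, which is an assumption for the sake of the example; `average` and `run_suite` are hypothetical names.

```python
def average(values: list) -> float:
    # Latent defect: dividing by len(values) fails for an empty list.
    return sum(values) / len(values)


def run_suite(inputs: list) -> list:
    """Run average() on each input and record the errors that really occur."""
    failures = []
    for values in inputs:
        try:
            average(values)
        except ZeroDivisionError as exc:
            failures.append((values, repr(exc)))
    return failures


# A suite that never passes an empty list reports nothing.
narrow = run_suite([[1.0, 2.0], [3.0]])
# A suite that does exercise that path reports a real, observed failure.
broad = run_suite([[1.0, 2.0], []])
print(len(narrow), len(broad))
```

Every failure recorded this way actually happened, so there is nothing to argue about. The narrow suite shows the limit: a path that is never executed is never checked.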

Is it Worth the Time?

My approach to false positives is this: if it takes 3 days to fix a bug, it’s better to spend 20 minutes looking at a false positive, as long as I can tag it and never have to look at the same issue again. It’s a matter of viewing it in the right context. For example, say you have a problem with threads. Problems with threads are notoriously difficult to discover. If you want to find an issue related to threads, it might take you weeks to track it down. I’d prefer to write the code in such a way that problems cannot occur in the first place. In other words, I try to shift my process from detection to prevention.
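As one hedged illustration of prevention with threads, the sketch below avoids shared mutable state altogether: each worker computes its own result and only the calling thread combines them. The functions `count_words` and `total_words` are hypothetical, and this is just one of several ways to design the problem away.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text: str) -> int:
    # Pure function: reads its argument and touches no shared state.
    return len(text.split())


def total_words(documents: list) -> int:
    """Count words across documents using worker threads."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Workers return values; only the calling thread adds them up,
        # so there is no shared counter for two threads to race on.
        return sum(pool.map(count_words, documents))


total = total_words(["static analysis", "runtime error detection", ""])
print(total)
```

Because no two threads ever write to the same variable, there is no data race to detect later.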

Static analysis, when deployed properly, doesn’t have to be a noisy, unpleasant experience.
