Distributed Denial of Service (DDoS) attacks are very inefficient but very effective. Auditors take pains to ensure their findings are accurate, both so they are not accused of being unfair to their subjects and so they maintain their reputation for impartiality. Spammers expect a small number of hits for the millions of messages they send out.
“Based on their interpretation and evaluation of that activity, the team determined that it did not warrant immediate follow up,” a spokesperson for a company that suffered a major breach explained, after it came out that the intrusion behind the breach had been detected by the company’s security operations team well before the breach was finally confirmed.
What all these things have in common is that they illustrate the major weakness in the very idea of the “false positive.” I’ve written about this before in a blog post entitled “I’m certain too much certainty is certain failure.” I’ve pointed out that the story of the boy who cried wolf has something to teach us about this as well. So this may appear redundant. It may appear inefficient. But if I am arguing for the virtues of redundancy and inefficiency, then why not repeat myself? For those who insist that each effort be unique, I assure you that I am taking a different angle here than in the last two discussions.
As monitoring and detection tools get better and more comprehensive, more and more observations will get labeled “false positives.” So it is important to refine what we mean by the term. The attraction of identifying “false positives” can be broken down into some simple attitudes and motivations:
- We don’t have enough resources to run down “noise”
- The vendor has the lowest false positive rate in the industry
- The likelihood of a false positive being a true positive is so low that the risk, calculated as impact and likelihood combined, is too low to bother with (did I mention we have limited resources?)
- How could that possibly be an attack?
- Don’t bother me with that nonsense; my team just identified a host reaching out to a C2 server in Malwaristan and cleaned it up before we had any data loss. The instrument is state of the art and sharp as ever.
We can dismiss the last item above as hubris, the kind of arrogance that we should all know to avoid. The others are a bit harder to dismantle.
Limited resources are the theme that ties all of the rest together. The first four can therefore be summed up as follows:
“The Security Department, like all areas of the Enterprise, does not have unlimited resources. We therefore need to prioritize and use our resources most efficiently. So we have a state-of-the-art tool that has the lowest false positive rate in the industry, and we go after the highest risks; that is, the ones with the greatest likelihood of being a true attack.”
How could a rational manager of the security efforts of their organization say anything else? If we consider the same statement from the attacker’s point of view, it looks different:
“The Security Department, like all areas of the Enterprise, will not spend unlimited resources resisting my attacks. And they will make mistakes. In the event they do not deploy their state-of-the-art tool correctly, I will go at it head on. In the event they forget about old attacks when they update their tool, I will keep trying the old attacks. In the event that they have not thought of variation x of my attack, I will send variations x and x+1. And of course I will try to exploit every known vulnerability that I can in the event that I find one they have not protected themselves against. I don’t care about the likelihood of success, because no matter how many times I fail, once I’ve succeeded my success rate is 100%.”
The mismatch between these two attitudes shows how efficiency by itself is a vulnerability. The way to correct for that is to allow for a certain amount of inefficiency in your processes and procedures.
This is where the idea of a “false positive” needs to be re-examined. Rather than labeling anything that appears harmless a “false positive,” reserve the label for events that have been explained as beneficial, or at least well understood.
A simplified example: suppose you detect a system administrator who is supposed to be on vacation logging on to a database server in the middle of the night (this assumes a level of detection where you know when your system administrators are on vacation and are watching for their logons).
No queries are run against the database, and the event does not recur over the next three days. The temptation is to dismiss it as a “false positive.” Some orphan process that generated the log entry and then died. Maybe the SA is just checking something out while they’re on vacation. Not worth looking into, and not worth bothering them on vacation. It would take a lot of resources to run down, it would require attention from server engineers who already feel that security keeps them from doing their “real work,” and really, what’s the likelihood that it was an attack?
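For illustration only, here is a minimal sketch of what that kind of correlation might look like, assuming a hypothetical vacation calendar and logon feed (the field names, data sources, and labels are my own inventions, not any particular product’s):

```python
# Minimal sketch, not a product recipe: correlate logon events with a
# hypothetical vacation calendar. Field names, data sources, and labels
# are illustrative assumptions.
from dataclasses import dataclass
from datetime import date, datetime

@dataclass
class LogonEvent:
    user: str
    host: str
    timestamp: datetime

# Hypothetical vacation calendar: user -> (first day, last day), inclusive.
VACATIONS = {
    "dba_alice": (date(2024, 7, 1), date(2024, 7, 14)),
}

def on_vacation(user: str, when: datetime) -> bool:
    span = VACATIONS.get(user)
    return span is not None and span[0] <= when.date() <= span[1]

def triage(event: LogonEvent) -> str:
    """Label the event. Note there is no "false positive" outcome here:
    anything we cannot explain stays "unexplained" until someone explains it."""
    if on_vacation(event.user, event.timestamp):
        return "unexplained"  # follow up: confirm with the admin or their manager
    return "explained"        # routine logon by an on-duty administrator

if __name__ == "__main__":
    event = LogonEvent("dba_alice", "db-prod-03", datetime(2024, 7, 5, 2, 13))
    print(triage(event))  # -> unexplained
```

The mechanics are trivial; the point is the labels. The output is “explained” or “unexplained,” never “false positive.”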
It probably isn’t an attack. But it’s something you can’t explain. And that’s the first principle in re-examining how you look at “false positives”: at any given moment, a network consists only of what is attached to it and communicating with it, a closed system at least for that moment. So everything that happens on that network should be explainable.
But it is unrealistic to assume that it will be easy, or even possible, to explain everything that goes on in the network. In spite of this, the desire to “explain away” anomalies is probably what led the security team described at the beginning of this article to decide that what they saw did not “warrant immediate follow up,” and thereby miss a major attack.
The second principle of “false positives”: do not let a generalized description of impact dictate how you measure likelihood. In other words, do not consider likelihood until you have gotten specific about what, exactly, that likelihood refers to. If every analysis is aimed at answering the question “is this a major attack and/or breach,” then the answer will be “no” a lot, and you will miss things. You need to define impact at a smaller scale and measure a more granular likelihood.
Too many analyses dismiss impact as “well, the worst that can happen is a complete system failure or losing all our data, so there’s no point thinking about that; what is the likelihood of that happening?” This is very efficient and risk-based, of course, but it misses the point that your adversary is not efficient.
So spend some time on “what if” analyses. What would the impact be if this behavior were part of an attack, a link in a kill chain, a small part of a breach? When you define impact like that, the likelihood of those impacts goes way up, and a lot more “false positives” become worth attention.
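To make that concrete, here is a small, hedged sketch in which one observation is scored against several hypothesized smaller impacts instead of a single catastrophic one (the scenario names, impact scores, likelihoods, and threshold are all invented for illustration):

```python
# Hedged sketch: score one observation against several "what if" scenarios
# instead of a single worst-case impact. All names and numbers are invented.

# scenario -> (impact on a 1-10 scale, estimated likelihood of that scenario)
SCENARIOS = {
    "full breach with major data loss": (10, 0.001),
    "credential theft / lateral movement": (6, 0.02),
    "reconnaissance for a later attack": (4, 0.05),
}

REVIEW_THRESHOLD = 0.1  # arbitrary cutoff for "worth an analyst's time"

def what_if_scores(scenarios):
    """Risk per scenario as impact * likelihood, highest first."""
    scored = [(name, impact * likelihood)
              for name, (impact, likelihood) in scenarios.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for name, risk in what_if_scores(SCENARIOS):
        verdict = "review" if risk >= REVIEW_THRESHOLD else "log and move on"
        print(f"{name:38s} risk={risk:.3f} -> {verdict}")
```

Judged only against the catastrophic scenario, the observation gets dismissed; judged against the smaller “link in a kill chain” scenarios, it clears the bar for a second look.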
The final principle of “false positives”: efficiency cannot be the only goal of your detection process. Defined as maximum output for a given amount of input, efficiency is a great way to measure an engine or a manufacturing process. It is appropriate for auditors to think in those terms, aiming never to list something that turns out to be an inaccurate finding.
But the idea of a “defect” found in process-improvement methodologies like Six Sigma has no place in measuring the overall effectiveness of a security program or a generalized control structure.
With the exception of risk assessments and control audits (which are crucial activities), identifying findings, like those in the output of an audit, is not the right way to look for things in ongoing security operations.
Making the most out of limited resources is always a challenge. Once you’ve subtracted what laws and regulations explicitly require, there may not be many hours left in the day to do much else. But before you focus on efficiency, think about effectiveness. Before you take pride in how few “false positives” you are detecting or how quickly you are dismissing them, consider that exploits are often inefficient and opportunistic.