Imagine for a moment a villain (or villains) trapped powerless for years. After being reanimated by an unsuspecting populace, the newly released evil goes forth to wreak havoc until it is conquered once again through tremendous effort and sacrifice.
Sound familiar? It should. What's described above is a recurring fictional trope (often referred to as "Sealed Evil in a Can") that dates all the way back to ancient Greece, when Pandora opened her eponymous box to release the previously dormant evils of the world.
This trope has been used time and time again since then, from books (Voldemort in Harry Potter) to film (General Zod in Superman II) to television (Khan Singh in Star Trek). It also happens to be a pretty good analogy for a security issue that arises in many virtualized data centers, with potentially staggering consequences when it does.
Dormant Images, Dormant Problems
To see the problem in action, consider a virtualized environment in the active throes of a security event. For example, imagine a situation where malware is actively propagating unchecked throughout the network, or where attackers have deployed rootkits to a subset of hosts (with Command & Control (C&C) channels leading outside the perimeter). What happens if a running virtual host in that context is serialized and stored, only to be launched later: for example, because a periodically used slice is no longer actively needed, or because a snapshot is captured at that point in time?
If you guessed "sealed evil in a can," you're right on the money. In this situation, compromised hosts are set aside from the population of running hosts, where they become a metaphorical "land mine" for the future. It's only a matter of time before a snapshot taken in this way resurfaces to cause problems once again.
In the best case, a host like this comes back online and starts spewing long-removed malware over the network; since you've already remediated the environment once, you have mechanisms in place to protect against it (or at the very least know how to combat it). In the worst case, though, a host like this comes back with live remote-control malware installed, reestablishing an adversary's command-and-control pathway and allowing that foothold to be used once again as a beachhead for launching attacks against other internal hosts.
As you can see, the outcome for already-compromised images is not pretty. But it bears pointing out that even non-compromised images can represent a security challenge when they're stale or out of date. After all, think about how many times timeliness is a factor in routine security hygiene activities: keeping OS and application software at a reasonable patch level, keeping antivirus (AV) signatures current, making sure security-relevant system parameters are synchronized, rolling off end-of-life versions of OS and application software, etc. For example, many shops are right now struggling with the April 8 Windows XP end of life, but how about six months from now, when a previously dormant XP image suddenly "pops to life" in the middle of an IaaS environment or VDC?
The technical security challenges associated with this are probably obvious: even if patching and signature updates begin immediately, there's still a window of time during which the system is in a less-than-optimal configuration. But don't discount other headaches that can arise as well: for example, an image of this type could become the subject of (or a sample in) someone's compliance audit, with failing results because of out-of-date versions or missing patches. The more time that elapses between when the image was captured and when it comes back online, the more pronounced these and other problems become.
The point is, if you're using virtualization (and who isn't?), unless you've specifically evaluated scenarios of this type and prepared a mechanism to deal with them, chances are good you're sitting on a few of these problematic images yourself: either of the already-compromised variety (worst case) or the "stale and aging" variety (better, but still requiring diligence).
Getting a Handle
All this raises the question of what you can do about it. There are a few different approaches to consider, and in many cases, some combination of them is most effective.
First and foremost, controlling sprawl is a useful measure to begin with. The fewer overall images there are in the first place (and the better the organization you have of them), the harder it is for individual images to “fall through the cracks.”
In other words, by reducing the number of unknown, underused, and uncontrolled images you have, you help shine light on those that you do know about. So start by evaluating the state of your VM inventories. How current are they? Do they provide important information like patch level and application software version? Are they correlated with the last-accessed times coming from the hypervisor? Of course, it goes without saying that if you don't have an inventory at all, now might be a good time to consider fixing that.
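As a concrete illustration of that kind of inventory review, here is a minimal sketch in Python. The record format, field names, and the 90-day threshold are all illustrative assumptions; in practice the records would come from your CMDB or the hypervisor's own API, and the staleness window would be set by policy.

```python
from datetime import datetime, timedelta

# Hypothetical inventory records; in a real shop these would be pulled
# from a CMDB or the hypervisor's API (field names here are illustrative).
inventory = [
    {"name": "web-01", "last_accessed": "2014-04-01", "patch_level": "2014-03"},
    {"name": "xp-legacy", "last_accessed": "2013-09-15", "patch_level": "2013-06"},
]

STALE_AFTER = timedelta(days=90)  # assumed policy threshold; tune to taste

def stale_images(records, now=None):
    """Return the names of images not accessed within the staleness window."""
    now = now or datetime.utcnow()
    flagged = []
    for vm in records:
        last = datetime.strptime(vm["last_accessed"], "%Y-%m-%d")
        if now - last > STALE_AFTER:
            flagged.append(vm["name"])
    return flagged
```

Even a report this simple surfaces the images most likely to have "fallen through the cracks," which is the first step toward deciding whether to refresh or retire them.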
Second, consider technical measures you already have access to that might be of help. Hypervisors often have an "expiration" feature that can be used to specify when an image may no longer be accessed. You could use this feature to require that images be periodically updated in order to continue functioning, or you could write scripts that automatically "refresh" the expiration every time an image is spun up. For example, an image not spun up for a set number of months or years would expire unless someone takes explicit action.
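The "refresh on spin-up" policy described above can be modeled in a few lines. This is a sketch only: the class, the 180-day window, and the method names are assumptions for illustration, and a real deployment would hook the expiration or metadata mechanism your particular hypervisor exposes.

```python
from datetime import datetime, timedelta

EXPIRY_WINDOW = timedelta(days=180)  # assumed dormancy limit (~6 months)

class ImageExpiry:
    """Illustrative model of a per-image expiration that is refreshed on
    every launch, so only images that actually get used stay launchable."""

    def __init__(self, created):
        self.expires = created + EXPIRY_WINDOW

    def refresh(self, launch_time):
        # Each launch pushes the expiry out by another full window.
        self.expires = launch_time + EXPIRY_WINDOW

    def may_launch(self, launch_time):
        # Past the expiry, the image needs explicit human action before use.
        return launch_time <= self.expires
```

The design choice worth noting is that the default is denial: an image nobody has touched in a window simply stops being launchable, rather than lingering indefinitely.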
Lastly, as the use of virtualization continues to expand, be cognizant that more and better tools will continue to emerge to help practitioners deal with this situation. Over the last few years, technical products have emerged that are designed to work directly with the hypervisor to keep images up to snuff: for example, by applying patches, updating malware-scanning signatures, or simply warning about a potential issue in an offline image. There are also technologies that can "quarantine" newly launched VMs until they demonstrate (or remediate) their security health when coming online. When evaluating these technologies, though, be aware that VM images can migrate both between hypervisors and between environments, so a product deployed in one environment (e.g., your VDC) doesn't necessarily cover others (e.g., the IaaS your developer happens to be using under the radar).
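The quarantine-then-admit workflow mentioned above can be sketched as a simple gate. Everything here is hypothetical scaffolding: the check functions, field names, and network labels stand in for whatever your actual NAC or virtualization security product provides.

```python
def admit_vm(vm, checks):
    """Keep a newly launched VM on a restricted segment until every
    health check passes; release it to production only if all succeed.

    `vm` is any dict-like record; `checks` is a list of predicates
    (patch level current, AV signatures fresh, etc.) -- all illustrative.
    """
    vm["network"] = "quarantine"  # restricted segment: no lateral reach
    failures = [c.__name__ for c in checks if not c(vm)]
    if failures:
        # Stay quarantined; report what needs remediation before admission.
        return {"admitted": False, "remediate": failures}
    vm["network"] = "production"  # all checks passed; release from quarantine
    return {"admitted": True, "remediate": []}

# Sample checks (assumed fields and thresholds, for illustration only):
def av_current(vm):
    return vm.get("av_age_days", 999) <= 7

def fully_patched(vm):
    return vm.get("missing_patches", 1) == 0
```

The point of the pattern is that a long-dormant image coming back online gets no network reach until its staleness is measured and fixed, which directly defuses the "sealed evil in a can" scenario.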
Ed Moyle is Director of Emerging Business and Technology for ISACA. Prior to joining ISACA, Ed was a founding partner of the analyst firm Security Curve. In his more than 15 years in information security, Ed has held numerous practitioner and analyst positions including senior manager with CTG’s global security practice, vice president and information security officer for Merrill Lynch Investment Managers, and senior security analyst with Trintech. Ed is co-author of Cryptographic Libraries for Developers and a frequent contributor to the Information Security industry as author, public speaker, and analyst.