Friday, November 01, 2013

Why it's not about containers "winning" or "losing"

About a month ago, I cranked out a long blog post about containers and how they came about. In a nutshell, containers virtualize an OS; the applications running in each container believe that they have full, unshared access to their very own copy of that OS. This is analogous to what virtual machines do, except that they virtualize at a lower level: the hardware. In the case of containers, it’s the OS itself that does the virtualization and maintains the illusion.
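To make that a bit more concrete, here's a minimal sketch (Python on Linux, run as root) of the kind of kernel primitive that containers are built on. Calling unshare() puts the process in its own UTS (hostname) namespace, so the hostname it then sets is visible only to itself; the demo hostname is purely illustrative.

```python
# Minimal sketch of OS-level virtualization (Linux-only, run as root):
# after unshare(CLONE_NEWUTS), this process gets a private hostname,
# invisible to every other process on the machine.
import ctypes
import ctypes.util
import socket

CLONE_NEWUTS = 0x04000000  # flag for a new UTS (hostname) namespace

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
if libc.unshare(CLONE_NEWUTS) != 0:
    raise OSError(ctypes.get_errno(), "unshare failed (needs root on Linux)")

# Inside the new namespace, this hostname change is private to us.
name = b"container-demo"  # illustrative name, not anything standard
libc.sethostname(name, len(name))
print("hostname inside the namespace:", socket.gethostname())
```

Real container implementations combine several such namespaces (PID, mount, network, and so on) with resource controls to maintain the full illusion.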

I dusted off some old research notes and was writing about containers again because they have become something of a hot topic, especially in the context of platform-as-a-service (PaaS). This hotness has, alas, led to predictable stories and claims that containers will kill the currently dominant approach to workload separation, namely hardware-based virtualization such as Linux's KVM or VMware's ESX.

That won't happen, though containers are certainly an important technology. Let me take you through, in the form of a Q&A, why I believe both of those statements to be true. (The original post and the research notes it links to give a lot more background, which I won't repeat here.)

What advantages do containers have over virtual machines in particular?

Probably the biggest advantage for most purposes is density. With hardware-based virtualization (henceforth just "virtualization"), each guest instance runs a full operating system copy. With OS virtualization (aka "containers"), there's just one operating system instance for the whole physical server; only a modest subset of the host OS is duplicated for the individual containers. 

Furthermore, containers have a single kernel with direct visibility into all the workloads running on the entire server. This eliminates some overhead and indirection associated with the fact that the contents of the guests in a virtualized server are a level of abstraction away from the controlling hypervisor (which is itself an operating system). Suffice it to say that, all things being equal, containers can provide significantly higher guest densities on a given piece of hardware. 
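Some back-of-envelope arithmetic makes the density point concrete. The per-guest overhead numbers below are purely illustrative assumptions, not benchmarks, but they show the shape of the win:

```python
# Illustrative density comparison. Assumptions (made up for the example):
# each VM guest carries a full OS copy (~512 MB of fixed overhead), while
# each container shares the host kernel and duplicates only a small
# per-instance slice (~16 MB).
HOST_RAM_MB = 32 * 1024  # a 32 GB host
APP_MB = 256             # memory the application itself needs

vm_guests = HOST_RAM_MB // (APP_MB + 512)
containers = HOST_RAM_MB // (APP_MB + 16)

print(f"VM guests per host:  {vm_guests}")   # 42
print(f"Containers per host: {containers}")  # 120
```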

Starting up new instances and resizing instances is much faster as well. Again, with containers, you're not starting up or reconfiguring an entire operating system copy; you're effectively just fiddling with resource groups within an OS.
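As a sketch of what that "fiddling" looks like in practice (Linux with the cgroup v1 memory controller mounted, run as root; the group name "demo" is an arbitrary example), resizing a container's memory ceiling is literally a file write:

```python
# Sketch of why container "resizing" is fast: changing a group's memory
# ceiling is just a write into the cgroup filesystem -- no guest OS to
# reconfigure. Assumes cgroup v1 with the memory controller mounted at
# the conventional path; "demo" is an illustrative group name.
import os

CG = "/sys/fs/cgroup/memory/demo"
os.makedirs(CG, exist_ok=True)

# Cap the group at 256 MB, then "resize" it to 512 MB on the fly.
with open(os.path.join(CG, "memory.limit_in_bytes"), "w") as f:
    f.write(str(256 * 1024 * 1024))
with open(os.path.join(CG, "memory.limit_in_bytes"), "w") as f:
    f.write(str(512 * 1024 * 1024))

# Move the current process into the group; the new limit applies
# immediately, with no reboot or instance restart.
with open(os.path.join(CG, "tasks"), "w") as f:
    f.write(str(os.getpid()))
```

Compare that with resizing a virtual machine, where you're typically negotiating with a full guest OS via ballooning, hotplug, or a reboot.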

So, if containers are so gosh darn great, why aren't they everywhere?

Because they didn't do as good a job of addressing enterprise problems c. 2001 as the virtualization alternative did. (They did, and continue to do, a very good job of solving hosting providers' problems, which is one of the reasons containers are so widely used in that environment.)

It is indeed true that in the post-dot-com hangover everyone, including enterprises, was desperate to "do more with less," as the cliche goes. So any product that would allow multiple workloads to run on a single physical server was very welcome. (Windows workloads such as Exchange were notorious for demanding an entire server to themselves even when they used just a small fraction of its capacity. Fifteen percent utilization was considered good pre-virtualization.)

But, with enterprises in particular, these weren't cookie-cutter workloads. Multiple OSs, lots of different versions, lots of different libraries and other customizations. Not every workload was a unique little flower. But lots of them were. And indeed, some early virtualization deployments were as much about supporting multiple operating system versions on a single machine as they were about consolidation as such.

As a result, virtualization was a better match for enterprise workloads than containers were. And the density thing wasn't that big a deal. Getting from 15 percent to, say, 50 percent utilization looked like a big win to most enterprises, and getting to the next level of optimization wasn't nearly the priority it was for service providers. (And software enhancements, along with the hardware assists the chip makers introduced, incrementally improved virtualization performance in any case.)

Why didn't enterprises just adopt both containers and virtualization?

This gets into speculation, but organizations have limited bandwidth to adopt new technologies. And virtualization was a big consumer of that bandwidth throughout much of the 2000s (or the naughts, or whatever we're calling that decade). Indeed, it remains a big consumer of enterprise IT bandwidth today. And, for the reasons stated above, containers just didn't offer that much of an incremental win for enterprises.

Virtualization also added a lot of services of interest to enterprises that made use of the hypervisor (live migration, disaster recovery, storage snapshotting, and so forth) that are arguably a less natural fit for container-based approaches.

So what changed? Why are we talking about containers again?

The cloud changed. By which I mean two things in particular.

The first is that, as I wrote in my previous piece, cloud-style workloads tend to be scale-out, stateless, and loosely coupled. They also tend to run in more homogeneous environments (alongside existing applications under hybrid cloud management) and use languages (Java, Python, Ruby, etc.) that are largely abstracted from the underlying operating system. You typically don't have, or want to have, a highly disparate set of underlying OS images, because that makes management harder. You also tend to have a large number of smaller, shorter-lived application instances. These are all a good match for containers.

The second is that PaaS amplifies all this in that it explicitly abstracts away the underlying infrastructure and enables the rapid creation and deployment of applications with auto-scaling. This is a great match for containers, both because of the high densities and because of the rapid resource reallocation they enable (and, indeed, require).

In short, it's not so much that the containers concept has changed as that the nature of workloads is changing to be better aligned with what containers do well. (There's also been significant open source innovation in the containers space, as there has been with KVM and oVirt in virtualization.)

So why doesn't virtualization go away?

The flip answer is "because nothing goes away." 

The less flip answer is "horses for courses." In principle, containers could be adapted to do (almost) anything, subject to running on a common kernel. But I suspect that, if you have a requirement for Infrastructure-as-a-Service with a fair bit of heterogeneity in the guests, virtualization will continue to be a more natural fit. The exact manner in which IaaS and PaaS evolve and converge over time will certainly change the details of how we do multi-tenancy. History strongly suggests, though, that we'll continue to manage workloads in multiple ways.

Everything old is new again

Oftentimes it seems that there are few genuinely new concepts in the IT biz. Rather, the crank turns again and this time the opportunities and the pitfalls are a bit better understood. Or there's a better match between the technology and current market needs. Or the necessary user and development ecosystem come together better. I'd argue all of these are true with containers.
