Wednesday, April 22, 2015

Podcast: Configuration management with Red Hat's James Shubin

James Shubin is a configuration management architect at Red Hat. In this podcast, James talks about the state of configuration management and where it's going. He discusses some of the different approaches among major projects in the (broadly speaking) configuration management space including Puppet, Chef, and Ansible and offers his perspective on how they should be viewed relative to each other.

Links:

Listen to MP3 (0:15:35)
Listen to OGG (0:15:35)

[Transcript]

Gordon Haff:  Hi, everyone. This is Gordon Haff with Red Hat. Welcome to another edition of the "Cloudy Chat Podcast." Today, I'm joined by James Shubin, who writes "The Technical Blog of James, ttboj.wordpress.com." He goes by @purpleidea on Twitter and other places. He's a config management architect at Red Hat.
As you might guess, we're going to talk about configuration management today. Welcome, James.
James Shubin:  Hi, Gordon. Thanks for having me.
Gordon:  Great to have you here. Let me start off with ‑‑ to set the stage here, from your perspective, what is this config management thing?
James:  Config management. Let me run you through the quick, five‑second basics I use to get everyone on the same page so at least we can use the same words. The three separations that I like to make are: there is something that is provisioning, there's something that is config management, and then there is something that is orchestration.
Sometimes, we blur the lines. Just to set those straight from day one, provisioning is everything that happens to get your machine up and running. Basically, a kickstart or a vagrant up or something like that. Everything that happens after that is config management. That configures the machine; it might install packages. Lots of things can go on.
After that, sometimes people like to manage things. We typically call that, "Orchestration." When there's some external force that goes and pokes things to do something, that's orchestration. The reason I like to make this clear is because there's lots of great tools out there, and some of them blur the lines of config management and orchestration.
A tool like Puppet is a pure config management tool. Same thing with Chef. There are things like Ansible which are crossing the line between config management and orchestration. It gets everyone a little confused, but those are the terms that we use.
Gordon:  I'm going to dig into a little more detail about the differences between some of the tools out there, and how they overlap and don't overlap. Before we get to that level of detail, from your perspective, what's changing about config management today? What's the interesting stuff that's happening?
James:  A lot of interesting stuff is happening. A lot of config management didn't use to happen in the past. People had a smaller number of machines, and there weren't as many services. Now we have microservices, and we have more and more machines. As things go on, automation isn't a question of if you do it; you'll have to do it.
If you're not doing config management, if you're not doing automation, you simply won't be able to run your infrastructure. It's becoming more and more essential. Once upon a time, we would have little bash scripts that glued everything together. Some people still do. CFEngine was an early player that was somewhat popular, although I wasn't a huge CFEngine user, personally.
Newer technologies like Puppet and Chef are quite popular these days. In the future, I think that's going to change, too. The scales are getting bigger. There's a promise that ARM servers could happen and increase host counts by 10 or 100 or maybe more. A lot's changing there. The fact that containers are getting quite popular is definitely going to change the scene a little bit, too.
I don't have the magic glass ball that can tell me exactly what's going to happen. I'm definitely following it closely.
Gordon:  Let's talk about containers specifically. That brings a significant change in the way we operate systems. While you can look at containers as virtualization lite, as I've talked about in some of my previous podcasts, that's not the best way to use containers.
James:  Fair enough. The biggest message around containers that I would like to put out there...I think many of my colleagues and peers in the community agree, especially in the config management community.
While containers are a great technology, and there's definitely a lot of cool stuff happening there, this unfortunately does not get rid of the need to still do config management. The role might change a little bit, and how it glues together might be a little bit different, but it doesn't go away.
If anything, config management needs to be adapted, made more flexible, and given new paradigms so that these things work that much better. As for the current generation of container stuff: containers have existed for ages, but quite recently they're getting quite hot and popular.
A lot of the current generation of config management tools were written and designed before containers were very mainstream. That might have to change. They might have to make some big changes, and some of them are trying. We'll see how well they succeed. Maybe someone will come up with something entirely new that solves this problem much better.
Gordon:  Another change that's not directly related to containers, although they're part of the same general sea change, is this shift towards VMs or containers or services, that aren't long‑lived, that have a very short lifetime. The pets/cattle thing. One of these things goes bad, you just shoot it and start up a new one. How does that change config management?
James:  You've still got to configure how those things are built, and what goes in them and what settings get set for them. There are tools like Kubernetes that are particularly good at managing these cattle. That'll glue them together with containers.
I have a product, more of a project called "Oh‑My‑Vagrant," which I use to actually test and glue a lot of these technologies together. It's an environment on top of Vagrant, which is a great development tool. It will also glue in Puppet and even Ansible and Docker and Kubernetes and things like that.
If you want to spin up a test environment to test this out for yourself and to make it easy to develop your app, you can do that. I've got some new screencasts coming that I've been publishing on my blog. There's some stuff there now, and some more stuff is coming quite soon.
Gordon:  Let's dive a little deeper into some of the individual products and projects that are out there. What are they good for? Who are they interesting to? How are they evolving? Let's start with Puppet.
James:  Puppet is one of the most popular tools. It has a tricky learning curve for some people. I think it's a useful tool. I've done quite a lot of very complex and advanced Puppet things. I'm grateful that it existed, because it's where I did a lot of my learning about config management from.
The language, the Puppet DSL (domain‑specific language), is mostly declarative, which is uncomfortable for some people who are used to more imperative programming tools. Somehow, the Puppet folks have made some really, really brilliant features. One of my favorites is exported resources.
It's a feature in Puppet that lets you declare things that seem to be part of one host, but those definitions get exported and used on a different host. When you're doing multi‑machine things, which I think are interesting, Puppet has some nice patterns that make it feel more natural.
It's not perfect. There're definitely some problems and missing pieces there, but it definitely has been very inspiring.
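For readers who haven't run into exported resources, here's a rough conceptual sketch of the idea in plain Python rather than the Puppet DSL (Puppet itself does this with its export/collect syntax backed by PuppetDB; the shared store and function names below are invented purely for illustration): one node publishes a declaration describing itself, and a different node collects those declarations and configures itself from them.

    # Conceptual sketch of Puppet-style "exported resources" (not real Puppet syntax).
    from collections import defaultdict

    # A stand-in for the shared store (PuppetDB plays this role in real Puppet).
    exported = defaultdict(list)

    def export(node, resource_type, title, **params):
        """A node declares a resource meant to be realized on *other* nodes."""
        exported[resource_type].append({"from": node, "title": title, **params})

    def collect(resource_type):
        """Another node gathers every exported resource of that type."""
        return exported[resource_type]

    # web1 and web2 each export the monitoring check that describes themselves.
    export("web1", "nagios_check", "check_http_web1", host="web1.example.com")
    export("web2", "nagios_check", "check_http_web2", host="web2.example.com")

    # The monitoring host collects them all and configures itself accordingly.
    for check in collect("nagios_check"):
        print("configure", check["title"], "targeting", check["host"])

The classic use case is the one sketched here: each node exports its own monitoring or backup definition, and the monitoring host collects them all without anyone maintaining a central list by hand.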
Gordon:  Now, the developer, as opposed to the operator or the ops community, Chef is probably the most popular tool today. Right?
James:  I'd like to see less of developer versus ops and really more ops devs or dev ops, as we sometimes hear on the Internet. Getting those two to converge might be an impossible fight, but in a lot of places where I've seen people converge the two and have a much truer DevOps environment, they see a lot of gains.
You can move a lot quicker. You can ship code that you own and are responsible for. You can make changes very quickly. There are a lot of organizations that are literally making changes every minute or every 10 minutes and moving that quickly.
When you can use DevOps techniques to move that quickly, you can innovate a lot quicker than some of the slower‑moving competition out there.
Gordon:  What is it, specifically, about Chef that makes it interesting for developers and the dev ops folks?
James:  I'm not a heavy Chef user. I've used Chef a bit, but not as extensively as Puppet. That's just a personal choice. I know that the Chef community is awesome. Nathan Harvey, one of their community guys, is a really nice guy. Every time you meet him at conferences, it's great. I wouldn't be surprised if one of the main answers is their community.
They have a lot of other great community people. Another thing about Chef is that it's not declarative. Not fully declarative, anyway. There's a lot more imperative Ruby being used. It might be a lot more comfortable for people who aren't familiar with the declarative paradigm. Personally, I think we should get used to the declarative paradigm.
For a lot of people that don't want to dive down that rabbit hole, Chef can be a great step into config management.
Gordon:  You've talked about this declarative versus imperative. Maybe for our listeners, spend a couple of minutes talking about what the differences are?
James:  This has changed a little bit over the lifetime of Puppet, because there are pure declarative languages out there, and Puppet's maybe not the purest. Imperative is what we're typically used to in a programming language. You have for loops and the typical structures you see in Python and Ruby.
The types are the things whose state we declare in config management. In Puppet, the constructs that set up those types can also be declarative. In pure Puppet, there didn't use to be for loops and things like that. There were classes and things called defines. You would set those up, and you would have a relatively logical, well‑organized code base.
All of the more complex redefining of variables and little things that can cause programming errors in imperative space don't really exist in this language. It actually makes your code safer. I realize I'm probably explaining this extremely poorly. That's probably because I've got my head stuck in the sand, so deep into code that I forget how to define things.
Hopefully that gives you a good example, and can convince people to look into the specifics and see. If you're writing code that runs on thousands of machines, you could make a mistake.
Something like an off‑by‑one error, that could perhaps be prevented quite easily with a declarative language, could blow away a whole bunch of machines or a whole bunch of data. There is a use for safer declarative languages.
Gordon:  One of the ways I think about it, and it's probably a bit simplistic, is that with imperative you're telling something what to do, and with declarative you're telling it where you want it to end up.
James:  The concept of states and defining what state you want to converge towards exists in all the languages. How you define those state elements can be a bit more dynamic and be programmed a bit more classically in Chef than in Puppet.
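To make that contrast concrete, here's a minimal sketch in Python rather than in either tool's actual DSL (the resource format and the converge function are invented for illustration): the imperative version spells out each step and leaves reruns and half-configured hosts as the script author's problem, while the declarative version only states the desired end state and leaves the "how" to an engine.

    # Imperative: spell out the steps; order and edge cases are on you.
    def run(cmd):
        print("exec:", cmd)   # stand-in for actually shelling out

    def imperative_setup():
        run("useradd deploy")            # would error if the user already exists
        run("mkdir /srv/app")            # would error if the directory exists
        run("chown deploy /srv/app")

    # Declarative: state what should be true; an engine works out what to do.
    desired = [
        {"type": "user",      "name": "deploy",   "ensure": "present"},
        {"type": "directory", "path": "/srv/app", "owner": "deploy"},
    ]

    def converge(resources):
        for r in resources:
            # A real engine (Puppet, Chef) compares the system's current state
            # to the declaration and changes only what differs, in dependency order.
            print("ensure:", r)

    imperative_setup()
    converge(desired)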
Gordon:  One other project we're starting to hear quite a bit more about these days is Ansible.
James:  Ansible is an interesting project, because people lump it into what's nominally the config management space. I don't see it in that space. The easy way to decide whether it's in the same space is to ask: is there some cluster, architecture, or infrastructure where you would have more than one of these tools?
If you would use Ansible alongside Puppet or Chef, would it still make sense? It might not be necessary. One key property of Ansible is that it's idempotent. What that means is that if you write some code, or have some sort of line that says "I'd like to do something," and you run that over and over again, if it's idempotent it should converge towards that state.
Some operations, like appending to a file, would not converge. They would diverge, and you would never end up at one single state. Puppet, Chef, and Ansible are all idempotent, which means you can run the code as many times as you like, and you should converge towards that one state.
The difference is that Ansible, like I said, isn't purely config management. It's more of an orchestrator. You actually run it on your laptop or on, arbitrarily, one server. It goes out over SSH and applies things, in an idempotent way, to a number of servers. In contrast, Puppet and Chef run on the machine itself and configure it directly.
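Here's a minimal Python sketch of that idempotence point (the file path and setting are just examples): the append version keeps changing the file on every run and diverges, while the ensure version reaches the desired state once and then leaves it alone.

    from pathlib import Path

    CONF = Path("example.conf")              # example path, for illustration only
    LINE = "max_connections = 100"

    def append_line():
        """Not idempotent: every run adds another copy, so repeated runs diverge."""
        with CONF.open("a") as f:
            f.write(LINE + "\n")

    def ensure_line():
        """Idempotent: the first run puts the file in the desired state; further
        runs see that the state already holds and change nothing."""
        text = CONF.read_text() if CONF.exists() else ""
        if LINE not in text.splitlines():
            with CONF.open("a") as f:
                f.write(LINE + "\n")

    for _ in range(3):
        ensure_line()        # run it as often as you like; the file converges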
Gordon:  Any other projects that have particularly interested you in this general space?
James:  Which ones have particularly interested me? There's a lot of little, random things that I look at and poke at. I don't think a lot of them are worth mentioning. I think we'll see some exciting changes over the next three months. I would wait and see what comes up. Nothing I wanted to call out today specifically.
Gordon:  Anything we haven't covered that you think our listeners might be interested in?
James:  There's a lot of stuff going on. I've been doing a lot of config management and containers work lately. The big thing that I'm hacking on at the moment, this week, is Oh‑My‑Vagrant. I definitely encourage you to have a look at that. It's a simple way to make a Vagrant environment without writing a thousand lines of Ruby.
There's some great stuff happening in the systemd‑nspawn world, which I think is quite interesting and I think is a serious technology win. That's quite interesting, if you're interested more in the container space. For config management, I don't know what's going to happen in the future. I have some ideas.

Hopefully, there'll be interesting things that we can talk about in a few months, or a year down the road.

Links for 04-22-2015

Tuesday, April 14, 2015

Links for 04-14-2015

Improve your data and cloud project odds

A few factoids about the success, or lack thereof, of cloud-related projects have been making the rounds of late. This is an edited version of my response to a recent query from a journalist about this topic.

A lot of large IT projects do fail. For example, a 2012 McKinsey study found that "on average, large IT projects run 45 percent over budget and 7 percent over time, while delivering 56 percent less value than predicted." And 17 percent went so badly that the very existence of the company was threatened. What's a typical number? That's hard to say because it depends on so many factors--perhaps most notably the size and length of the project. Big ERP projects are the poster children for IT project failure, with failure rates of at least 25 percent commonly cited.

Here's my take on two specific numbers that I’ve seen cited recently. I’m inclined to interpret them rather differently from each other although they do have aspects in common.

The first number comes from Capgemini Consulting: "Only 27% of the executives we surveyed described big data initiatives as ‘successful.’" Although the details are much different--not least because of the dominance of open source technologies within current big data storage (including from Red Hat) and analysis solutions--it feels as if we're in somewhat the same place as we were amidst all the data warehousing hype in the mid- to late-90s. There's this feeling that with all the data out there, we must be able to do *something* with it even if we don't know the right questions to ask or the right models to apply.

So I chalk up a lot of the failures happening in the big data project space to projects not having a clear goal and a clear path to that goal. If you look at Gartner studies, for example, the #1 and #2 big data challenges are "Determining how to get value from big data" and "defining our strategy." Many organizations are undertaking big data projects mostly because it's something they think they ought to be doing even if they don't know how or why. Of course they're not going to succeed!

What should they do? Well, don't do that. Look at the success stories that do exist and ask if you could do something similar. But be careful. Many of those stories are something between Photoshopped reality and myth.

Technologies are shifting too--for example, the emergence of in-memory processing with Apache Spark. Software-defined storage (both file and object) is also maturing rapidly and coming into its own. So it is challenging to pick the right technologies and apply them to a specific problem. We've seen far too much "The answer is Hadoop. What's the problem?" But the bigger problem is not having sensible and actionable objectives.

On the cloud side, Tom Bittman's informal poll and some of his associated Gartner research about private clouds highlight quite a few reasons (perhaps most of all organizational ones) for problems that cloud projects encounter. But it's worth noting that his quote was "95% of the 140 respondents (who had private clouds in place) said something was wrong with their private cloud.” It's the rare IT project (or indeed any type of project of any consequence) that doesn't have at least some problems. So I'm less inclined to focus on the 95% and more on the steps that can be taken to reduce the number of problems: set objectives, get the right organization in place, engage with the right partners, take an iterative approach, and don't try to do everything at once.

At Red Hat, we've been doing a lot of work with clients in both the Infrastructure-as-a-Service (Red Hat Enterprise Linux OpenStack Platform) and Platform-as-a-Service (OpenShift by Red Hat) spaces. Part of our involvement is delivering supportable enterprise product subscriptions of course. But it's also working with customers to implement a proof of concept, often with a small consulting team. It doesn't need to be a mega-project (and it's usually better if it isn't) but some initial guidance by consultants who have done this before can go a long way toward getting projects headed in the right direction and having fewer problems as a result. We’ve also partnered with other vendors including Dell (Dell Red Hat Cloud Solution, Powered by Red Hat Enterprise Linux OpenStack Platform) and Cisco to create integrated infrastructure solutions.

Pretty much no project of any size is going to go without a hitch. But understanding your objectives going in and working with the right partners can make a big difference.

Monday, April 06, 2015

How PaaS makes "developer-defined infrastructure" fences possible

Over at VentureBeat, Jerry Chen of Greylock Partners writes:

We are entering the age of developer-defined infrastructure (DDI). Historically, developers had limited say in many application technologies. During the 1990s, we effectively lived in a bilateral world of Microsoft .NET vs Java, and we pretty much defaulted to using Oracle as a database. In the past several years, we have seen a renaissance in developer technologies and application infrastructure from a proliferation of languages and frameworks (Go, Scala, Python, Swift) as well as data infrastructure (Hadoop, Mongo, Kafka, etc.). With the power of open source, developers can now choose the language, runtime, and database that make sense. However, developers are not only making application infrastructure decisions. They are also making underlying cloud infrastructure decisions. They are determining not only where will their applications run (private or public clouds) but how storage, networking, compute, and security should be managed. This is the age of DDI, and the IT landscape will never look the same again. 

In part, this reflects developers as The New Kingmakers, as my former colleague, RedMonk’s Stephen O’Grady, has eloquently written about. Like any meme, the ascendency of developers as IT decision makers can be overstated. Developers flock to Apple’s app store for the same reason that Willie Sutton robbed banks. It’s where the money is. Not because it’s a wonderful developer-focused experience. Nor are we living in a NoOps world, a term that caused a bit of a furore a couple years back.

That said, many of the most interesting happenings in enterprise software today have a distinct developer angle whether or not they’re exclusively built around developer concerns. Containers and their associated packaging, orchestration systems, and containerized operating systems (like Red Hat Enterprise Linux Atomic Host/Project Atomic) certainly. An expanding landscape of programming languages. (To quote Stephen O’Grady again: "an environment thoroughly driven by developers; rather than seeing a heavy concentration around one or two languages as has been an aspiration in the past, we’re seeing a heavy distribution amongst a larger number of top tier languages followed by a long tail of more specialized usage.”) And even much of the action in data is at least as much about the applications and the analytics as about the infrastructure.

However, to Jerry Chen’s basic point, it’s also about separating the concerns of admins and developers so that each can work more effectively. As I’ve written about previously, this is one of the reasons why a Platform-as-a-Service (PaaS) such as OpenShift by Red Hat is such a useful abstraction. It's a nicely-placed layer from an organizational perspective because it sits right at the historical division between operations roles (including those who procure platforms) and application development roles—thereby allowing both to operate relatively autonomously. And in so doing, it helps to enable DevOps by providing the means for operations to set up the platform and environment for developers while the PaaS provides self-service for the developers and takes care of many ongoing ops tasks such as scaling applications.[1]

[1] A PaaS like OpenShift also enables DevOps in other ways such as providing tools for continuous integration and a rich set of languages and frameworks but I wanted to focus here on the abstraction.

Links for 04-06-2015

Podcast: Microservices, Rocket, and Docker with Red Hat's Mark Lamourine

This podcast takes a look under the covers at today's different containerization approaches and the implications for developers, operators, and architects. We also talk microservices, the different ways they can be aggregated, and the challenges of figuring out the best service boundaries.

You can also point your browser at Containers on redhat.com to learn more about:

  • Container portability with deployment across physical hardware, hypervisors, private clouds, and public clouds
  • An integrated application delivery platform that spans from app container to deployment target—all built on open standards
  • Trusted access to digitally signed container images that are safe to use and have been verified to work on certified container hosts

Listen to MP3 (0:16:32)
Listen to OGG (0:16:32)

[Transcript]

Gordon Haff:  Welcome to another edition of the "Cloudy Chat" podcast. I'm Gordon Haff in the Cloud Product Strategy Group at Red Hat. I'm here with my colleague, Mark Lamourine. Today, we're going to be talking about microservices, Rocket, and Docker, and how these various things relate to each other.
Before we get into technologies, one bit of housekeeping. We're going to be talking about the technical aspects of these things. Unless I say something explicit about commercialized product, none of this should be taken as a product roadmap, or product plans. With that said, Mark, let's take it away.
To start off with, maybe provide a little context. Where does Rocket fit in the container landscape?
Mark Lamourine:  Prior to Rocket, if you were trying to build containers, you really had two avenues. I'm leaving older technologies, like Solaris and such, aside. We're talking just about Linux containers. It was LXC, which was a build‑your‑own, handcrafted, rather complex mechanism that has existed for years but really didn't gain any traction outside of a couple of important areas, which we're not going to get into today.
Then, about two and a half years ago, Docker emerged and made it much easier. By making a lot of assumptions for you about what a container should be, they made it a lot simpler to create containers, move images around, and distribute them. They did a lot of tooling around it.
Since the advent of Docker, containers have really caught on, at least in mindshare as a new way to distribute software and distribute applications. Rocket came into that as a slight alternative. The community that was working with Docker and with LXC, they noticed a couple of things that were a little restrictive about what Docker does.
There are some assumptions in Docker that make it slightly more difficult than it might be to create containers that communicate with each other, or to create containers that have multiple processes or a number of different little things, that people have noticed only after they started trying to build complex applications. Rocket was designed specifically to fall into that middle space.
Gordon:  It's probably worth mentioning something here, which you touched on, which is that Docker is often equated with containers, the base level technology. Whereas, Docker really encompasses a lot more than that.
Mark:  You're right. You made two points there. One is that people think of containers, and they think Docker like that's the only way. There are a lot of ways to do containers, or there are now three ways to do containers.
But also Docker encompasses a lot more than just the containers. They've built an ecosystem. They've built a life cycle, or they're starting to build a life cycle. Rocket, again, falls somewhere in the middle of the control spectrum. With LXC, you have very little in the way of infrastructure, very little in the way of control and boundaries, and it's very free, whereas Docker provides a lot of infrastructure and a lot of help in producing, managing, and running your containers, but it also imposes a certain level of restriction.
Gordon:  Both of these technologies are independent of a containerized operating system, such as RHEL Atomic Host.
Mark:  In fact, they're independent of the operating system in general. You can run both Docker and Rocket on ordinary hosts. You can embed both of them onto a container host, a stripped‑down host, like either Core OS, Project Atomic, or RHEL Atomic.
They're really orthogonal pieces. In fact, they can run side‑by‑side. You can run Rocket right next to Docker, and you can run Rocket containers right next to Docker containers.
Gordon:  I think there's been a tendency in the marketplace, deliberately by some folks to mash things together that don't need to stay together.
Mark:  In some senses, that makes sense because, in the end, the goal is to produce an environment where people can create containerized applications, and stop thinking about the underlying operating system. In that sense, the idea that somehow the two are equated, it makes a certain amount of sense.
But when you're actually developing for one or another, it doesn't really matter. In fact, you don't generally develop containers on a container host. You develop them on a real host, then you migrate them over, test them, and hopefully deploy them out onto a container host.
Gordon:  Mark, what is Rocket at a high level from the technical perspective?
Mark:  Rocket's a container system. What that means is that Rocket is a means of starting processes that have a different view of the host operating system, than an ordinary process does. It's different from other isolation mechanisms, like virtual machines, in that the containerized app is running on the host.
From the host, it looks like an ordinary app, but the containerized app has a different view. Rocket is one mechanism of creating one of these, and what Rocket does is it allows you to create a single image with a single process, a set of process binaries, and run it on the host in its little special environment.
Gordon:  Now, if you think about how Rocket is different from other approaches to implementing containers on a containerized operating system or in some other way, what are some of the most salient characteristics?
Mark:  The big difference from the others is that when you're crafting a new image or a new container, if you use LXC, you have to handcraft all of the boundaries, all of the contents. If you're using Docker, you use the Dockerfile and a base image, which helps you define what goes inside.
With Rocket, you have to handcraft the image in the same way that you do with LXC or a similar way, but Rocket also offers you the advantage of the image spec, which LXC doesn't. It's a simpler image format. It's a simpler set of contents, but you do still have to handcraft what goes inside.
What that means is that you get a very lightweight container image. The container image doesn't have all of the package adding mechanisms. It has just what you need to do your application.
Gordon:  What are the pros and cons of those approaches?
Mark:  For Docker, the big advantage is that the application developers can work very quickly. They don't have to understand a lot about the specifics of how their application works. They can treat it like an ordinary RPM. The disadvantage is that they tend to get a lot of bulk into their images that doesn't really need to be there at runtime.
The biggest example is whether it's an RPM or Debian‑based system, all that packaging infrastructure is included in the image and is used in composing the image at build time, but it's essentially unused afterwards.
People have to do some fairly interesting tricks to get around that. In Rocket, you just put the files you want in and go on.
Gordon:  We've really been talking about the impact on the developer. What are the differences at runtime?
Mark:  Again, we get back to comparisons with Docker. With Docker, you get one image and it has one process. The boundaries of the container are defined by the contents of the image. With Rocket, the image is a start and the runtime is specified. But when you create an actual container with Rocket, you can supply multiple images and it can run multiple processes.
It establishes one set of boundaries around all of those processes. What that means is that the processes can communicate with each other, but not with the outside world, except through the holes you punch. When you're working with Docker containers and you want multiple processes to communicate with each other, you can't easily have the two processes communicate without explicitly punching lots of little holes.
What it means is, again, that it's easier to compose a Rocket container from multiple images that cooperate, and still have the containment boundary.
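As a rough mental model of that difference (plain Python data structures, not either tool's actual API; the types and fields here are invented for illustration), a Docker-style unit pairs one image with one process behind its own boundary, while a Rocket-style unit groups several images behind one shared boundary and exposes only the holes you choose to punch outward.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Container:
        """Docker-style unit: one image, one process, its own boundary."""
        image: str
        process: str
        published_ports: List[int] = field(default_factory=list)  # holes punched outward

    @dataclass
    class Pod:
        """Rocket/Kubernetes-style unit: several images behind one shared boundary.
        Members talk to each other freely; only the listed ports face outward."""
        members: List[Container]
        published_ports: List[int] = field(default_factory=list)

    # The app and its cache share one boundary and talk privately; only 443 is exposed.
    web_pod = Pod(
        members=[
            Container(image="myapp", process="app-server"),
            Container(image="memcached", process="memcached"),
        ],
        published_ports=[443],
    )
    print(web_pod)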
Gordon:  Are there any implications for application architects or microservice‑type application architectures?
Mark:  There are a lot of implications for that. There are a number of places where what Rocket does is it recognizes that the processes running in a service have more or less affinity to each other. Some of them only connect via a network. Other ones need to share files.
They may need to share pipes. They need to be on the same host. Rocket gives you the flexibility to control the distance between the processes and to control more finely which ones are treated as close boundaries and which ones are treated as wide open ones.
Gordon:  Presumably, there are also implications at the orchestration layer, like Kubernetes, for example.
Mark:  One thing about Rocket is it's taken what we've learned about orchestration so far and is starting to apply it to the containers themselves, the container design. I hadn't looked at it until very recently, but it turns out that the Rocket developers have actually adopted some Kubernetes technology.
They had started out calling this thing that has multiple processes in it a container. The documentation recently has started to refer to that as a pod, a term pulled from Kubernetes that refers to multiple processes that run on the same host and share resources more tightly than ones that merely communicate over a network.
Gordon:  This all very much corresponds to some of the ideas in microservices, though with different terminology: do you expose every microservice, or do you have nested microservices because some things don't really need to be explicitly exposed to the outside world?
Mark:  I think there's still a lot of exploration that needs to happen across the container community about what good application architectures will look like. There are purists. Actually, it's interesting. There aren't that many purists in this area. Everyone's pretty much exploring. There are a few. But I think we'll find out from practical experience, over time, what works for what situations.
Gordon:  One of the things people adopting microservices have found to be very difficult is figuring out what these boundaries are to start with. In fact, I've even seen for some situations the recommendation that maybe you start out doing a more monolithic application, and then break that out into microservices, once you understand how the parts relate.
Mark:  This is one of the areas where you might call me a purist, because I take exactly the opposite tack. I think part of the reason we find difficulty in finding those boundaries when we're trying to decompose an application is that we've deliberately ignored them. They were unimportant in a host‑based environment.
If you put the files down in their own root and two different parts of an application share them, on a host it doesn't matter. There is no boundary to cross. As we start putting applications into containers, if you treat them as what I found out last week someone calls "Virt‑Light," as if it's a virtual machine that you're stuffing things into, you actually continue to obscure the communications that you've been ignoring before.
I think it's still early, and I think people are going to probably take the monolithic path at first as a perceived easier first step. But I think it's really important to start uncovering the ways in which these covert communications can hide inside the applications, and to try to find ways to identify them a priori, to figure out whether they need to be there in the first place, and then to figure out how to flag them, split them out, and establish proper communication boundaries.
In some senses, this mirrors the object‑oriented wave that took place in the '80s and '90s where one of the points of object‑oriented programming was to enforce the boundaries, the modularity, and the coherence. Some of the people took it. Some people decided that wasn't so important after all.
Gordon:  But global variables are really useful.
Mark:  At times, yes.
Gordon:  [laughs] As you were talking, I was thinking exactly the same thing, because none of these things are perfect today. But I think it's absolutely true that when you first saw object‑oriented programming out there, there were a lot of people that couldn't be bothered with having to use methods to get data out of classes and that sort of thing. They took a lot of shortcuts.
Mark:  It was perceived, as a lot of these new waves are, as overhead at the time, as extra work, as unnecessary work: "I know what's there. I'll go get it." The changes have been informed by the recognition that software has a life cycle and that, by accepting this little bit of overhead, it actually frees the developer of whatever the tool is to change the implementation without causing problems for the user on the other end; for the consumer, in the case of libraries or objects.
Because containers are this object where we want to treat them as a unit, and then we want to be able to push things in and get things out, these boundaries are going to become very important.
If we expect in the end to build this hardware‑store mentality, someone can go in and say, "I need a database container, but I need one that's tuned this way," rather than having one container for the database with a monstrous set of tuning parameters.
What I'd like to see is that we discover, "Gee, there are some people who can just use the default tuning parameters, who never touch them, and that's fine." Then you might find others where they have specific patterns, because I suspect that not every tuning parameter has equal value.
You're going to find specific situations where you want to tune specific things. The response to that, rather than having one highly configurable thing, is to have a small set of slightly tuned ones, and you can go and say, "Oh, I want the generic database thing. Or, I want the storage‑tuned database thing. Or, I want the throughput‑tuned database thing," and then only pass the parameters that are necessary to get your job done.
It puts some of the work back on the developer to try and think through the common use cases and figure out what the right tuning choices are, but it's going to make it much easier for the consumer in the long run, if we can shift that work.
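One way to picture that "small set of slightly tuned ones" idea, as a hypothetical sketch in Python rather than any real product's interface (the profile names and settings below are invented): ship a handful of named profiles with sensible defaults and let the consumer pick one, overriding only the parameter or two that actually matters.

    # A handful of pre-tuned variants instead of one image with a monstrous
    # set of tuning parameters. Profile names and settings are illustrative.
    PROFILES = {
        "generic":    {"shared_buffers": "256MB", "max_connections": 100},
        "storage":    {"shared_buffers": "2GB",   "max_connections": 100},
        "throughput": {"shared_buffers": "1GB",   "max_connections": 500},
    }

    def database_settings(profile="generic", **overrides):
        """Start from a named profile and pass only the parameters you truly need."""
        settings = dict(PROFILES[profile])
        settings.update(overrides)
        return settings

    print(database_settings())                                    # take the defaults
    print(database_settings("throughput", max_connections=1000))  # tweak one knob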
Gordon:  Well, great. It's, as always, been a great discussion today, Mark. Thank you.

Mark:  Thank you.