Connections: May 2016

Wednesday, May 25, 2016

Issue #4 of my newsletter is live

This issue has links to an article I recently had published on public cloud security as well as to discussions around using Ansible with docker-compose and why it's important to orchestrate containers using tools such as Kubernetes.

Links for 05-25-2016

Six Ways Ansible Makes Docker-Compose Better
Lost containers tell no tales. Time to worry • The Register
The End of Food - The New Yorker
Along the Cambrian Way
How organisations get public cloud security wrong | ITProPortal.com - RT @YourCloudHub: Some very interesting points by @ghaff on how organisations get public cloud security wrong
Dear Silicon Valley: Stop saying stupid stuff | InfoWorld
Forty Percent of the Buildings in Manhattan Could Not Be Built Today - The New York Times
Untitled (http://fortune.com/2016/05/19/susan-sarandon-hedy-lamarr/) - RT @gigabarb: Hedy Lamarr May At Last Get Her Long-Overdue Memorial via @FortuneMagazine
MIT Sloan CIO Symposium: Customer experience is critical in the digital economy | The Enterprisers Project - Running a bit on recapping #MITCIO yesterday but #redhat Carla Rudder covered opening panel f Enterprisers Project
23 Brilliant Life Lessons From Anthony Bourdain - Airows
OSCON 2016: Extensible Kubernetes and OpenShift for Graceful Evolution - The New Stack - RT @alexwilliams: OSCON 2016: Extensible Kubernetes and OpenShift for Graceful Evolution

Thursday, May 19, 2016

Data, security, and IoT at MIT Sloan CIO Symposium 2016

As always, the MIT Sloan CIO Symposium covered a lot of ground. Going back through my notes, I think it’s worth highlighting a couple sessions in particular—in addition to the IoT birds of a feather that I led at lunchtime. They all end up relating to each other through data, data security, and trust.

Big Data 2.0: Next-Gen Privacy, Security, and Analytics moderated by Sandy Pentland of the MIT Media Lab

There were two major themes in this panel.

The first was that it’s not about the size of the data but the insights you get from it. This is perhaps an obvious point but it’s fair to say that there’s probably been too much focus on how data gets stored and processed. These are important technical questions to be sure. But they’re technical details and not ends in themselves.

I might be more forgiving had I not lived through the prior data warehousing enthusiasm of the mid- to late-1990s. As I wrote five years ago: "There are many reasons that traditional data warehousing and business intelligence has been, in the main, a disappointment. However, I'd argue that one big reason is that most companies never figured out what sort of answers would lead to actionable, valuable business results. After all, while there is a kernel of truth to the oft-repeated data warehousing fable about diapers and beer sales, that data never led to any shelves being rearranged."

However, the other theme is newer, or at least amplified: ensuring the security of data and the privacy of those whose data is being stored. One idea that Sandy Pentland discussed is sharing answers (especially aggregated answers) rather than raw data. See enigma.mit.edu as an example of a system that's designed to make it possible for parties to use and maintain data without having full access to that data. Pentland also noted that because systems such as this make it possible to securely ask questions across jurisdictional boundaries, they could help address some of the often conflicting laws about the treatment of personally identifiable information.
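To make the "share answers, not raw data" idea concrete, here is a minimal Python sketch. It is not how Enigma works; it only illustrates the general principle of a data holder that answers aggregate questions and refuses to answer when a group is small enough to expose an individual. The class name, the `MIN_GROUP_SIZE` threshold, and the sample records are all hypothetical.

```python
from statistics import mean

# Hypothetical threshold: refuse to answer about groups smaller than this.
MIN_GROUP_SIZE = 3


class AggregateOnlyStore:
    """Holds raw records privately and exposes only aggregated answers."""

    def __init__(self, records):
        self._records = list(records)  # raw data never leaves this object

    def average(self, field, where=lambda record: True):
        """Return the average of `field` over the records matching `where`."""
        matches = [r[field] for r in self._records if where(r)]
        if len(matches) < MIN_GROUP_SIZE:
            raise ValueError("group too small to answer without exposing individuals")
        return mean(matches)


# Made-up sample data, for illustration only.
store = AggregateOnlyStore([
    {"region": "east", "spend": 10},
    {"region": "east", "spend": 20},
    {"region": "east", "spend": 30},
    {"region": "west", "spend": 99},
])

east_avg = store.average("spend", lambda r: r["region"] == "east")
```

Asking for the "west" average in this sketch raises an error, because a group of one would simply reveal that individual's value.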

Getting Value from IoT

At my luncheon BoF table, we had folks with a diverse set of IoT experiences including Ester Pescio and Andrea Ridi of Rulex Analytics, Nirmal Parikh of Digital Wavefront, and Ron Pepin, a consultant and former Otis Elevator CIO. The conversation kept coming back to value from data. What data can you gather? What can you learn from it? And, critically, can you do anything with that data to create business value?

Per my earlier comment about data warehouses, gathering the data is relatively straightforward. It may not be easy, especially when you’re dealing with sensors that aren’t on your own property and therefore need dedicated networks of some sort. But the problems are mostly understood. It’s “just” a case of engineering cost-effective solutions.

But what data and what questions? Ron Pepin shared his experiences from Otis. Maintenance is a big deal for elevators. It’s also the main revenue stream; the elevators themselves are often a loss leader. Yet proactive elevator maintenance mostly consists of preventative maintenance on a fixed schedule.


It seems like a problem tailor-made for IoT. Surely, one can measure some things and predict impending failures. But it’s not obvious what combination of events (if any) is a reliable signal for needed maintenance. There’s a potential for more intelligent and efficient maintenance, but this isn’t a case where you can cost-effectively just instrument everything (someone else owns the building), and the right measurements aren’t obvious. Is it number of hours, number of elevator door reversals, temperature, load, particular patterns of use, something else, or none of the above?

The Blockchain

Given the level of hype around blockchain, perhaps the most interesting thing about this panel by Christian Catalini of MIT Sloan was the lack of such hype.

Interest, yes. Catalini described how blockchain is an interesting intersection of computer science, economics & market design, and law. He also argued that it can not only make things today more efficient (which could potentially redefine the boundary of firms by reducing transaction costs) but also create new types of platforms.

That said, there was considerable skepticism about how broadly applicable the technology is. Anders Brownworth of Circle (which has a peer-to-peer payment application making use of blockchain) said that the benefits of blockchain are broadly in the area of time-based transactions, with interoperability, and with many able to audit those transactions. However, with respect to private blockchains outside of finance, “we trust all the people around the table anyway” and, therefore, the auditability that’s inherent to blockchain doesn’t buy you much.

In the same vein, Simon Peffers of Intel agreed that it’s "hard to let thousands of users have the same view of data with a traditional database. But some blockchain use cases would fit with traditional database." He added that "There is a space for smaller consortiums of organizations that know who the parties are with other requirements that can be implemented in a private blockchain. Maybe you know who everyone is but don't fully trust them."

To sum up the panel: You’re usually going to be giving up some features relative to a more traditional database if you use blockchain. If you’re not making use of blockchain features such as providing visibility to potentially untrusted users, it may not be a good fit.
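For readers who want a feel for the auditability the panelists were referring to, here is a minimal Python sketch of a hash-chained log. It is only the tamper-evidence piece, with no consensus, networking, or signatures, and it is not how Circle or any real blockchain is implemented. The function names and the sample transactions are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """Hash an entry's payload together with the hash of the entry before it."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain, payload):
    """Add a new entry that commits to everything recorded before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def verify(chain):
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


# Made-up transactions, for illustration only.
ledger = []
append(ledger, {"from": "alice", "to": "bob", "amount": 5})
append(ledger, {"from": "bob", "to": "carol", "amount": 2})

intact = verify(ledger)                 # chain as written
ledger[0]["payload"]["amount"] = 500    # someone quietly rewrites history
still_valid = verify(ledger)            # anyone holding a copy can detect it
```

If everyone around the table already trusts whoever maintains the records, an ordinary database table does the same job with less overhead, which is essentially the panel's point.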

Photos (from top to bottom):

Sandy Pentland, MIT Media Lab

Anders Brownworth, Principal Engineer, Circle

Tuesday, May 10, 2016

Links for 05-10-2016

Why 2016 made a mockery of Nate Silver
The creators of Siri just showed off their next AI assistant, Viv, and it's incredible | The Verge
Queso Fundido | The Pioneer Woman
Untitled (https://www.getrevue.co/profile/ghaff) - I’m giving a newsletter format a try. Go here if you’d like to subscribe:
Maps Mania: The Interactive Watershed Map
Building Security Through Culture (Craft Conf - Budapest 2016) // Speaker Deck
Data, like code, is better open | Opensource.com - My piece on open data (especially governmental) and how to use it over at @opensourceway
Amtrak NEC Vision
Does distributed development affect software quality? An empirical case study of Windows Vista - Microsoft Research
Agile is Dead, Long Live Continuous Delivery - Gradle
Untitled (http://www.bloomberg.com/view/articles/2016-05-03/tips-for-uber-drivers-not-from-me?utm_content=buffer11b26&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer) - RT @vpostrel: Save the @Uber experience: Don't give your driver any extra cash. My latest @BV
Twitter - RT @Craw: Alexa is going to be such an important consumer platform #payattention
mah ki dal or kaali dal recipe, how to make maa ki dal or manh di dal
Death by GPS | Ars Technica
New York’s Elevators Define The City | FiveThirtyEight - RT @FiveThirtyEight: New York's elevators define the city:
Home | Red Hat Summit 2016 - RT @EmilyStancil: Who else is going to @RedHatSummit? Less than 8 weeks away!
Twitter - There are still apparently bonafide booth babes at Interop.

My newsletter experiment

There’s a certain range of materials–curated links to comment upon, updates, and short fragments–that to me have never felt particularly comfortable as blog posts or on Twitter. Tumblr never quite did it for me and I’ve little interest in shoving content into yet another walled garden anyway. I’ve been thinking about trying a newsletter for a while and, when Stephen O'Grady joined the newsletter brigade, I figured it was time to give it a run. We’ll see how it goes.

Here’s a link to the first issue: https://www.getrevue.co/profile/ghaff/archive/19505

It includes some DevOps related links and short commentary, links to a couple of new papers I’ve written on security and deploying to public clouds, and upcoming events including Red Hat Summit in San Francisco at the end of June. (Regcode INcrowd16 saves $500 on a full conference pass!)

You can also subscribe directly to this newsletter here.

The need for precise and accurate data


Death by GPS (Ars Technica):

What happened to the Chretiens is so common in some places that it has a name. The park rangers at Death Valley National Park in California call it “death by GPS.” It describes what happens when your GPS fails you, not by being wrong, exactly, but often by being too right. It does such a good job of computing the most direct route from Point A to Point B that it takes you down roads which barely exist, or were used at one time and abandoned, or are not suitable for your car, or which require all kinds of local knowledge that would make you aware that making that turn is bad news.

It's a longish piece that's worth a read. However, it seems that a lot of these GPS horror stories--many from the US West--are as much about visitor expectations of what constitutes a "road" as anything else. It's both about the quality of the underlying data and its interpretation, things that apply to many automated systems.

According to Hacker News commentator Doctor_Fegg:

This is clearly traceable to TIGER, the US Census data that most map providers use as the bedrock of their map data in the rural US, yet was never meant for automotive navigation.

TIGER classes pretty much any rural "road" uniformly - class A41, if you're interested. That might be a paved two-lane road, it might be a forest track. Just as often, it's a drainage ditch or a non-existent path or other such nonsense. It's wholly unreliable.

But lest you think data problems are in any way unique to electronic GPS systems, read this lengthy investigation into a 1990s Death Valley tragedy.

For what it’s worth, I did some cursory examination into what Google Maps would do if I tried to entice it into taking me on a “shortcut” through the Panamint Mountains in western Death Valley. My conclusion was that it seemed robust about not taking the bait; it kept me on relatively major roads. However, if I gave it a final destination that required taking sketchy roads to get there (e.g. driving to Skidoo), it would go ahead and map the route.

After writing this, it occurs to me that for situations such as this, we need data that is both accurate (represents the current physical reality) and precise (describes that physical reality with sufficient precision to be able to make appropriate decisions).
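A small Python sketch may help show why precision matters here. It assumes a made-up road record in which a single coarse class code (the A41 mentioned in the comment above) is the only thing known about some segments, while others carry a hypothetical `surface` attribute. The record layout, the attribute values, and the segment names are all invented for illustration and are not the actual TIGER schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoadSegment:
    name: str
    class_code: str                 # coarse class, e.g. "A41"
    surface: Optional[str] = None   # hypothetical finer detail: "paved", "track", ...


def passable_by_car(segment: RoadSegment) -> Optional[bool]:
    """Return True or False when the data is precise enough, None when it is not."""
    if segment.surface is None:
        # The class code alone cannot distinguish a paved road from a forest track.
        return None
    return segment.surface == "paved"


# Made-up segments: same class code, very different realities.
county_road = RoadSegment("County Road 12", "A41", surface="paved")
forest_track = RoadSegment("Old Mine Spur", "A41", surface="track")
unknown = RoadSegment("Unnamed", "A41")

answers = [passable_by_car(s) for s in (county_road, forest_track, unknown)]
```

All three records could be perfectly accurate in the sense that something exists at that location, yet only the first two are precise enough to support a routing decision. A cautious router would treat the `None` case as "don't send a passenger car this way."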

Monday, May 09, 2016

Interop 2016: The New Distributed Application Infrastructure

The New Open Distributed Application Architecture from Gordon Haff

The platform for developing and running modern workloads has changed. This new platform brings together the open source innovation being driven in containers and container packaging, in distributed resource management and orchestration, and in DevOps toolchains and processes to deploy infrastructure and management optimized for the new class of distributed application that is becoming the norm.

In this session, Red Hat's Gordon Haff discusses the key trends coming together to change IT infrastructure and the applications that will run on it. These include:

Container-based platforms designed for modern application development and deployment
The ability to design microservices-based applications using modular and reusable parts
The orchestration of distributed components
Data integration with mobile and Internet-of-Things services
Iterative development, testing, and deployment using Platform-as-a-Service and integrated continuous delivery systems