Tuesday, November 15, 2016

Data and DevOps with Splunk's Andi Mann


Andi Mann is Chief Technology Advocate at Splunk. In this podcast, he discusses some of the ways in which data plays an important role in DevOps. I’ve known Andi for ages, since we were both IT industry analysts, and we had a chance to sit down at CloudExpo/DevOps Summit/IoT Summit in Santa Clara, where Andi was chairing a DevOps track and I was one of the speakers. (We also did a data and DevOps panel together on the main stage, but that video doesn’t seem to be up yet. I’ll post once it is.)

Among the topics we tackle are choosing appropriate metrics that align with the business rather than just technical measures, creating feedback loops, using data to promote accountability, and DevSecOps.

Show notes:

Listen to MP3 (22:13)

Listen to OGG (22:13)

[Transcript]

Gordon Haff:  I'm sitting down here with an old analyst mate of mine, Andi Mann. Also, formerly of CA, also the author of some books, and now he is the chief technology advocate with Splunk. What we're going to talk about today is data in DevOps. Welcome, Andi.

Andi Mann:  A lot of the customers I talk to, who are doing DevOps in various versions using Splunk...It boils down to three key areas that they really want to know about, the metrics that matter for them.

The first is really about how fast are they? What's their cycle time? How quick does it take for an idea to get in front of a customer? How long does it take someone in business to come up with something and then basically make money from it, or, in government, they service their citizens with it? That cycle time is really important, the velocity of delivery.

The second key area that people look at is around the quality of what they're delivering. Are they doing good? Are they delivering good applications? Are they creating downtime? Are they having availability issues? Is one release better than another?

The third area is really around what sort of impact do they have? Measuring real business goals, MBOs, things like revenue and customer sign‑up rates, and cart fulfillment, and cart abandonment. These sorts of things. Those are the metrics that my customers, the people I talk to are interested in for DevOps, closing those feedback loops in those three areas.
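The three areas Andi describes (velocity, quality, and business impact) can be put side by side with very little code. The following is a minimal sketch, not anything from Splunk or from the podcast: the `Release` record, its field names, and the sample numbers are all hypothetical, chosen only to show one figure per area computed from the same set of releases.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Release:
    # All fields are hypothetical; a real pipeline would pull these
    # from its own ticketing, deployment, and business systems.
    idea_logged: datetime    # when the idea or story was first recorded
    deployed: datetime       # when it reached customers
    caused_incident: bool    # did this release trigger downtime or a rollback?
    signups_after: int       # illustrative business measure for the release


def summarize(releases):
    """Return one number per area: velocity, quality, and business impact."""
    cycle_days = [
        (r.deployed - r.idea_logged).total_seconds() / 86400 for r in releases
    ]
    failures = sum(1 for r in releases if r.caused_incident)
    return {
        "median_cycle_days": median(cycle_days),
        "failure_rate": failures / len(releases),
        "signups_per_release": sum(r.signups_after for r in releases) / len(releases),
    }


releases = [
    Release(datetime(2016, 10, 1), datetime(2016, 10, 11), False, 120),
    Release(datetime(2016, 10, 5), datetime(2016, 10, 9), True, 40),
    Release(datetime(2016, 10, 10), datetime(2016, 10, 16), False, 200),
]
print(summarize(releases))
```

Keeping the three numbers in one summary is the point of the sketch: a fast cycle time with a high failure rate, or with no movement in the business measure, shows up immediately.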

Gordon:  One of the things I find interesting, what you just said, Andi, is that you read these DevOps surveys, DevOps reports, and often the metrics, or at least what they're calling metrics, are framed in much more technical terms. How many releases do we have per year, or per week, per hour?

What's the failure rate? How quickly can we do builds? How quickly can we integrate? Which, I think to your point, are probably worth measuring, but they're really...The ultimate goal of DevOps is not to release software faster.

Andi:  Exactly. It's interesting because you do look at these metrics in isolation, and they matter. All this matters. 10 deploys a day, we all know that from 2009 in Velocity. That matters, but 10 deploys a day is no good if they're all bad deploys. You need to measure quality in that.

But even if it's a good quality deploy and you do it quickly, if it's not moving the needle on what your business wants you to be doing, then again, it doesn't matter. I think it's actually really important to connect these together so you really are getting metrics, correlating metrics, that matter across the whole range to really understand whether you're doing good or not.

Gordon:  One of my favorite Dilbert cartoons, I don't remember the exact wording but to the effect of...Pointy Hair goes, "We're now going to measure you on the number of lines of code you write," and Wally says, "I'm going off to write myself a new car today."

Andi:  [laughs] Yeah, exactly. That's one of the things that I actually do measure. We measure it internally. A bunch of our customers do actually measure code volume. There's a couple of interesting reasons for that. Especially in a DevOps and Agile mode, actually delivering too much code can be a signifier that you're doing things badly.

You're writing too much code, you're doing too much in one release rather than doing small, iterative releases. It can also signify that one person has too much of a workload. When you think about DevOps and the concepts around empathy and wanting to make sure that life doesn't suck for everyone, when one person is doing all the work, that sucks for them.

There are actually good things that come out of measuring code volume [laughs] but saying that more code equals better code, equals a bonus? That's a really bad thing. [laughs]

Gordon:   I think a lot of people tend to lump data metrics into this one big bucket. As we've had discussions before, there are these business metrics which have to be somehow connected to things.

It's not clear that overall company revenue is necessarily a good DevOps metric. Some of the other things you mentioned certainly are. In many cases, it does make sense to collect a lot of underlying data for data analytics and things like that. Then, you also have alerts.

Andi:  Yeah, the business stuff is really interesting. I know one of our customers releases the software-as-a-service. They're a SaaS company, cloud native and all that. Their developers actually do care about who uses specific features.

They'll implement a feature. They do canary releases. They'll implement a feature on 10 out of 1,000 servers, or whatever. A certain volume or percentage of their customers will get access to it. Then they'll measure, using Splunk, the way that those features are being used or not. They also measure the satisfaction of those customers.

They've got these nice smileys, and tick marks, and stuff that say, "Yes, I enjoyed using this feature." They can correlate that together, and it actually means the next day after doing a commit, after doing a release, they actually know whether the business use case is being satisfied, which is very cool.
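A canary release like the one described here needs two pieces: a stable way to decide who sees the feature, and a way to tally the feedback from that group. The sketch below is an illustration under stated assumptions, not the customer's actual setup. The function names, the hashing scheme, and the feature name are hypothetical, and the feedback pairs stand in for the smileys and tick marks.

```python
import hashlib


def in_canary(customer_id: str, feature: str, percent: float) -> bool:
    """Deterministically place roughly `percent` of customers in the canary group."""
    digest = hashlib.sha256(f"{feature}:{customer_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000   # stable bucket in 0..9999
    return bucket < percent * 100          # e.g. 1.0 percent -> buckets 0..99


def satisfaction_rate(feedback):
    """feedback: iterable of (customer_id, liked) pairs from in-app feedback."""
    votes = [liked for _, liked in feedback]
    return sum(votes) / len(votes) if votes else None


customers = [f"cust-{i}" for i in range(1000)]
canary_group = [c for c in customers if in_canary(c, "new-checkout", 1.0)]
print(len(canary_group), "customers see the feature")
print(satisfaction_rate([("cust-1", True), ("cust-2", True), ("cust-3", False)]))
```

Hashing the customer ID together with the feature name means the same customer always gets the same answer for a given feature, so usage and satisfaction can be correlated per group the day after a release.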

I know a television company in the UK that we work with. They actually send reports on a weekly basis, I think it is, to their marketing department, based on whether users are using the website, what they're doing on the website, whether they're clicking through on competitions.

That's actually really important, but obviously mostly what people are doing in using data and the feedback...Closing the feedback loops is what I'm talking about here at DevOps Summit.

They're closing the feedback loop around those technical measurements.

Am I creating more bugs? Am I creating availability issues? Am I creating problems with uptime? Am I closing out the feature set that is in the story or in the epic that I was promising to do? Partially, it's also around this accountability to each other. Am I doing what I'd promised I'd do?

Gordon:  Talk a little more about accountability.

Andi:  Yeah, that's one of my soapboxes at the moment. I see a lot of the empowerment that DevOps gives developers to make decisions. I think that's great, especially in companies where you've got systems thinking and they understand their role in the organization and what it means to deliver good outputs for their customers.

You give them a lot of responsibility. Their manager is the leader. You give your developers, and your operations team, and those DevOps professionals a lot of responsibility and a lot of empowerment to do the right thing.

Also, I think that there's a need for them to be accountable for doing the right thing as well. Especially as DevOps grows in larger organizations and there are more and more people involved. Also, with the concept that DevOps is about helping and making sure that each other is having a good experience at their life and their work.

As a developer, you're not making sure that operators are getting called out late at night, and all this sort of stuff. If DevOps is about helping to work with each other, to collaborate, to communicate better, to make sure each other's lives get better as Dev and Ops professionals, then I think you need to be accountable in two ways.

You need to be accountable to your business, which often means being accountable to your manager for doing the work that you're meant to do, and doing the work you promised you would, within the bounds of the responsibility you've been given.

It's also being accountable to each other, from doing good work, and doing the right work in ways that helps your whole team move forward, and makes everyone else's life positive. I think we talk a lot about empowerment and enablement. We don't really talk much about the flip side of that, which I think is that accountability.

Gordon:  I think the culture talk around DevOps, and we did have lots of discussions around culture and some of the ways that it can be overextended and over‑applied. Yeah, it can turn into this "don't fear failure," empathy, transparency, etc. Unicorns farting rainbows. This very touchy feely, everyone's happy and sings "Kumbaya," but you are, at the end of the day, being paid to produce business outcomes.

There does need to be some accountability there. If you crash the SQL server three weekends in a row, and call in Ops, somebody's going to have to talk with you, as they should.

Andi:  Exactly. Especially when you talk about the DevOps toolchain and the life cycle of software. It's a very complex and opaque theme to try to see what's going on at every stage, especially if you're a manager who's not necessarily fully fluent in specific tools. They can't dig into the specific tools to have a look at that.

I think reporting up to your management and reporting to each other and saying, "I introduced these bugs and I'm sorry for it. I won't do it again." By the same token, "I introduced these newest features, and they were really successful. We should all celebrate that as a team."

I think that accountability is actually really important. You'll see this in manufacturing as well where we get a lot of our examples from. You'll see that if one person makes the same mistake several times, then they'll get into a training program, or they'll get different mentoring.

Maybe they'll move into a different part of the line where they're better suited, and their skills are better suited. You don't know how to make your team better if you're not being accountable to each other, and to your management.

That's, I think, something we've got to step up to as DevOps professionals for want of a better term, is how do we be accountable to each other, and to the company that pays us as you said to do the job?

Gordon:  You just talked about manufacturing. You just mentioned quality, and I think that's a pretty good segue because we often think about DevOps primarily, well, through the lens of developer for one thing, but that's another topic for another day.

We also tend to view DevOps, first and foremost, through the lens of this velocity, business agility, and so forth, but there is a very important quality component there as well. What are some of the ways that data can help to surface that quality component?

Andi:  Absolutely. Some of the things we're looking at ‑‑ and our customers are doing a lot of this at the moment ‑‑ is looking at areas like code coverage and tests, number of defects, defect rates per release. Looking at aggregating and correlating the quality metrics out of multiple test and scanning tools.

Doing static analysis and looking at the defect rates, doing dynamic analysis, and then also looking at the defect rates, as well as application performance and health scores. Looking at the performance in terms of resource utilization, response time, availability, execution failures, and so forth.

Comparing current release in production with next release just about to come forward, and being able to run that over time, so you can see whether you're making quality improvements over time.

If you're able to actually give your application a health score, and then you can measure that not just in production, but also in staging, or pre‑prod, whatever you want to call it, then you can start to make sure that you're getting better with every release. Your quality is going up with every release.
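One way to picture the health score idea is a single function applied to both the production release and the candidate in staging. This is only a sketch: the inputs, the weights, and the 500 ms latency budget are assumptions made for illustration, not a formula from Splunk or from the podcast.

```python
def health_score(error_rate, p95_latency_ms, availability, latency_budget_ms=500.0):
    """Combine a few signals into a 0-100 score. Weights are illustrative only."""
    error_part = max(0.0, 1.0 - error_rate * 10)                    # 10% errors -> 0
    latency_part = max(0.0, 1.0 - p95_latency_ms / (2 * latency_budget_ms))
    availability_part = min(1.0, max(0.0, (availability - 0.9) / 0.1))  # 90% -> 0
    total = 0.4 * error_part + 0.3 * latency_part + 0.3 * availability_part
    return round(100 * total, 1)


def is_improvement(current, candidate, tolerance=1.0):
    """Treat the candidate as acceptable if it is not worse beyond the tolerance."""
    return candidate >= current - tolerance


production_score = health_score(error_rate=0.02, p95_latency_ms=400, availability=0.999)
staging_score = health_score(error_rate=0.01, p95_latency_ms=300, availability=0.9995)
print(production_score, staging_score, is_improvement(production_score, staging_score))
```

Recording the score for every release, in every environment, is what lets a team see a trend over time rather than a one-off comparison.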

You can do that with actual data, real measurements, coming out of these testing tools, as well as coming out of actually running that in a stood-up environment. There are lots of feedback loops you can close there.

Once you start to find problems as you find them especially in production, but also in pre‑prod and staging, feeding those things back into the test cycle so that you never find the same mistake twice, because the first time you find it, the next time you'd test for it.

Gordon:  This idea of doing things incrementally in stages, before they hit production, is really important from a security perspective as well. I was just having a conversation with one of my colleagues, or actually several of my colleagues, about this kind of tension between the traditional security guy who is sort of, "Stop. Stop. Don't push it out there," and this idea of whether you like the term or not, DevSecOps, where security gets baked in, and added incrementally.

What we were saying, and what was really coming out as we were having this discussion was that while the reason there's this tension or maybe disconnect is from the security guys' point of view, to a degree, the serious security flaws are pushed out into production.

Well, that is something that simply needs to be stopped to a degree that you can tolerate failures and errors in security that don't hit the actual production environment, because you found them through automated testing, or whatever. Then that makes more sense as this incremental, and sometimes breaking things sort of process.

Andi:  Yeah. Absolutely. This is actually something I've done a little bit of work, and most of the work is being done by someone that you probably know well, Ed Haletky of the TVP, @Texiwill on Twitter.

He's done a bunch of work, and I've put my two cents worth, and it's probably worth maybe one. Looking at security, and security testing, pen testing, code quality testing, so finding things like potential SQL injection, these sorts of things.

Also using some of those tools like Fortify, which will do quality of code scanning for security purposes. You can start to shift left in that respect, but also continuing to get inputs from security testing even post release. There's no reason why security testing can't keep going even after you've released.

You can get to a certain coverage rate. This is where data helps. You get to a 90 percent, or a 92 percent, or a 95 percent coverage rate, or confidence level if you will. You go, "OK, I'm ready to release. I know that the remaining five percent is potentially low impact, or low risk. I'll put it out there anyway, but continuing to test."
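The release decision described here can be written down as a small gate. The sketch below assumes a hypothetical shape for the data (a count of security checks passed and a list of open findings with a `risk` label); neither the function nor the fields come from a real tool, and the 95 percent default simply echoes one of the figures mentioned above.

```python
def ready_to_release(checks_passed, checks_total, open_findings, threshold=0.95):
    """Return (ok, coverage) for a simple data-driven release gate.

    open_findings is a list of dicts with a hypothetical "risk" key.
    """
    if checks_total == 0:
        return False, 0.0
    coverage = checks_passed / checks_total
    # Anything high or critical blocks the release regardless of coverage.
    blocking = [f for f in open_findings if f["risk"] in ("high", "critical")]
    return coverage >= threshold and not blocking, coverage


ok, coverage = ready_to_release(
    checks_passed=95,
    checks_total=100,
    open_findings=[{"id": "F-1", "risk": "low"}],
)
print(ok, coverage)
```

Running the same gate again after release, with fresh findings from ongoing testing, matches the point that security testing does not have to stop at the release.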

There's some really interesting work out there that Ed's published about cloud, cloud promotion, and cloud delivery that actually really focuses on using these metrics from security testing, both pre and post release, which I think is actually really important.

Gordon:  We're going to be hearing a lot more about this whole security angle everywhere. This is partly an IoT show. We've heard a lot about security. I'm not sure we've heard a lot of solutions, but we've heard a lot about security.

Obviously, it is a big part of the DevOps discussion. It's a big, scary world out there, and it's pretty universally recognized that having an auditor sign off once a year, and then you don't think about security for that application for another six months or whatever, really doesn't work today.

Andi:  Yeah. It's not my joke. I saw someone post it the other day. "What did you get owned by? Your toaster or your fridge?" It's so true, especially in IoT, but in a DevOps perspective, or DevOps context, being able to do that continuous security testing, I think, is really important, and bring security a shift left.

We talk about a shift left in all sorts of other areas, and we're doing it with QA which I think is awesome. We need to start doing it more with security, I believe. At Splunk, we do have a whole security practice around incident event monitoring, or user behavior analytics. Being able to start to apply some of that in the test, and pre‑prod and staging environment I think is really important.

Being able to do some automated audit reporting around what is happening, penetrations, security violations, or passwords, or PII exposure, potential hard coded passwords, stuff like that, there's a bunch of stuff that developers could be, and should be responsible for that actually make security pro's life easier. Not harder. I think there's a lot of work yet to be done on that.
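As a small illustration of a check developers could own themselves, here is a sketch of a scan for potential hard-coded passwords. The pattern and the keyword list are assumptions made for this example. It is far cruder than a real scanner such as the tools mentioned earlier, and it will both miss cases and raise false positives.

```python
import re

# Hypothetical keyword list; matches things like  db_password = "value"
SECRET_PATTERN = re.compile(
    r"""(?i)(password|passwd|secret|api_key)\s*[:=]\s*["']([^"']{4,})["']"""
)


def find_hardcoded_secrets(source: str):
    """Return (line_number, keyword) for each line that looks like a literal secret."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        match = SECRET_PATTERN.search(line)
        if match:
            findings.append((line_no, match.group(1)))
    return findings


sample = '''\
db_host = "localhost"
db_password = "hunter2!"
token = os.environ["API_KEY"]
'''
print(find_hardcoded_secrets(sample))
```

Run as part of the build, a check like this gives security a report to read rather than a surprise to find later.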

Gordon:  Absolutely. I'd go back to DevSecOps. I think there's this school of thought that, well, if you read the Phoenix Project properly, you wouldn't be having to have this discussion. You'd know security was baked in. Meanwhile, in the real world, security has tended to be this separate profession.

We were both at DevOpsDays London. I still remember a security professional, I guess in his 40s, standing up in an open space and going, "I'm one of those security guys who's been getting in your way. You know, this is the first time I've ever been to an IT conference that wasn't purely a security conference."

I love that story. Certainly not to pick on that guy. That's quite brave of him, getting up like that. I think that's such a perfect illustration of how security has operated in its own world as this gatekeeper to releasing applications.

Andi:  Yeah. People joke about IT being a department of no. Security has that moniker, fair or not. Obviously, security teams are just looking out to protect the business. That's their job. Having them in the tent, I think, is a better option, and we started to bring other teams into the tent of DevOps.

I actually gave a presentation. You can find it online at the Splunk user conference, that was titled something along the lines of "Biz PMO Dev Sec QA Biz Ops," or something crazy like that, about broadening the tent of DevOps.

Security's got to come into this tent. Bringing a security pro into your team, into your scrum, that's got to be a good start, doesn't it?

Gordon:  Right, even if they're not in the meeting in the stand‑up every week, or every day, at least having them be as part of the team. Just like there used to be a business analyst who's a part of the team. Our product and technologies operations, their DevOps story.

I call it the "Banana Pickle Story," because they would get asked for a banana, and as Katrinka describes it, six months later, they deliver this pickle. Really, their DevOps story...Again, the business level, because it's what matters to me.

They used a lot of technology like OpenShift, Platform Service, and Ansible for automation, things like that. But again, they were really focused on the business story of how do we get the stakeholders iterating with us. "Oops, that banana's looking a little green. Let's dial that back to yellow and get on with the other things."

Andi:  Yeah, and this is the agile model for development, is getting someone from the business...You're creating an MVP and getting someone from the business to evaluate it, and continue to iterate with their advice.

You know that you're creating the right thing as you're creating it, rather than finding out in six months' time that you've created a pickle instead of a banana. [laughs] I love that analogy.

We should be doing that more and more with security. If security is saying no to you all the time, then maybe you're not inviting them to the party as much as you should, so that they can say yes iteratively, rather than one big no at the end.

Gordon:  Right. Just to cap off this podcast. In order to prove to security, much less external auditors and to prove to these other stakeholders, you need data.

Andi:  Absolutely. Exactly right. This is fundamental to what I believe. We cannot continue making decisions based on "I feel that this is the right thing to do. I think we're going to have good results here." We're living in a society that's driven by data and facts. Especially as developers or IT professionals, we need to have these feedback loops based on real data.

Not just people coming back and saying "I don't feel like you did the right thing. I don't think that this was good. I think our release worked and helped our customers." We need to come back and stop having these back‑and‑forths over opinions.

There's some very crude statements about, "Everyone's got opinions," right? I like to say, "In God we trust. All others bring data." That's how we get these real feedback loops in a systems mode, getting feedback from production systems, from customer interaction, from the security violations and the passes that we do make.

From the coverage, to know if we are doing the right thing in terms of speed, in terms of quality, in terms of impacting our business, that's where data has a huge role to play. It’s those feedback loops that DevOps depends on.

Wednesday, November 09, 2016

Podcast: Open source ecosystems with Red Hat's Diane Mueller

Traditionally, in open source, there was a lot of emphasis on singular projects. Today, it's much more about how multiple communities interact and build on each other. In this podcast recorded at the OpenShift Commons Gathering and KubeCon in Seattle, Red Hat's Diane Mueller discusses what she's learned as Director for Community Development at OpenShift and what's coming next.
Show notes:

Listen to MP3 (17:39)
Listen to OGG (17:39)

Tuesday, November 08, 2016

Presentation: Optimizing the Ops in DevOps

As DevOps practices have been put into wide use, it's become evident that developers and operations aren't merging to become one discipline. Nor is operations simply going away. Rather, DevOps is leading software development and operations - together with other practices such as security - to collaborate and coexist with less overhead and conflict than in the past.

In my session at @DevOpsSummit at 19th Cloud Expo, I discussed what modern operational practices look like in a world in which applications are more loosely coupled, are developed using DevOps approaches, and are deployed on software-defined, and often containerized, infrastructures - and where operations itself is increasingly another "as a service" capability from the perspective of developers.

How does the operations tool chest change? How does the required skill set differ? How are the interactions between operations and other IT and business organizations different from in the past? How can operations provide the confidence to the entire organization that this new pipeline is still delivering non-functional requirements such as regulatory compliance and a secure and certified operating environment? How does operations safely consume vendor and upstream dependencies while meeting developer desires for the latest and greatest?

Operations is more important than ever for a business to derive value from its IT organization. But the roles and the goals of operations are significantly different than they were historically.

Tuesday, November 01, 2016

The state of Platform-as-a-Service 2016

If you're ignoring PaaS because early offerings didn’t meet your needs or because you’re more focused on operations than developers, you should look again. It enables ops to enable developers efficiently and to manage an underlying container infrastructure.

Circulating in drafts beginning in 2009, some variant of the NIST Cloud Computing definition used to be de rigueur in just about every cloud computing presentation. Among other terms, this document defined Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) and, even as technology has morphed and advanced, this is the taxonomy that we still largely accept and adhere to today.

That said, PaaS was never as crisply defined as IaaS and SaaS because “platform” was never as crisply defined as infrastructure or (end-user) software. For example, some platforms were specific to a SaaS, such as Salesforce.

Others, specifically the online platforms that were most associated with the PaaS term early on, were typically tied to particular languages and frameworks. These PaaSs were very “opinionated.” For example, the original Google App Engine supported an environment that was just (and just almost) Python and Heroku was all about Ruby. Heroku's twelve-factor app manifesto was an additional type of opinion; write your apps this way or they won’t really be suitable for the platform. These platforms may not have been just for hobbyists, but they were certainly much more suited to developer prototyping and experimentation than production deployments. 

At the same time, platform was also used more broadly to cover the integration of a range of middleware, languages, frameworks, other tools, and architecture decisions (such as persistent storage) that a developer might use to create both web-centric and more traditional enterprise applications. Furthermore, such PaaSs as OpenShift remained not only “polyglot” but also allowed for an increasing range of deployment types both on-premise and in multi-tenant and dedicated online environments. (As well as on developer laptops using the upstream open source OpenShift Origin project.)

However, the various approaches to PaaS did have a common thread. They were bundles of technology that were largely framed as appealing to developers.

The developer angle was never the whole story though. Back in 2013, my Red Hat colleague Gunnar Hellekson talked with me about some of the operational benefits of a PaaS in government.

One of the greatest benefits of a PaaS is its ability to create a bright line between what's "operations" and what's "development". In other words, what's "yours" and what's "theirs".

Things get complicated and expensive when that line blurs: developers demand tweaks to kernel settings, particular hardware, etc. which fly in the face of any standardization or automation effort. Operations, on the other hand, creates inflexible rules for development platforms that prevent developers from doing their jobs. PaaS decouples these two, and permits each group to do what they're good at.

If you've outsourced your operations or development, this problem gets worse because any idiosyncrasies on the ops or the development side create friction when sourcing work to alternate vendors.

By using a PaaS, you make it perfectly clear who's responsible for what: above the PaaS line, developers can do whatever they like in the context of the PaaS platform, and it will automatically comply with operations standards. Below the line, operations can implement whatever they like, choose whatever vendors they like, as long as they're delivering a functional PaaS environment.

We spend a lot of time talking about why PaaS is great for developers. I think it's even better for procurements, architecture, and budget.

Today, with the rise of DevOps on one hand and containers on the other, it’s increasingly clear that a PaaS can be the sum of parts that are of direct interest mostly to developers and parts that are of direct interest mostly to operations. 

DevOps both leads to change and reflects change in a couple of areas. 

First is the number of tools that organizations are bringing into their DevOps (or DevSecOps if you prefer) software delivery workflow. Most obvious is the continuous integration/continuous delivery pipeline, most notably with Jenkins. But there are also any number of testing, source code control, collaboration, and monitoring tools that need to be integrated into the workflow. At the same time, developers still want their self-service provisioning with an overall user experience that’s tailored to how they work. A PaaS is an obvious integration and aggregation point for this tooling.

DevOps is also changing the way that developers and operations work with each other. Early DevOps discussions often focused on breaking down the wall between Dev and Ops. But this isn’t quite right. DevOps does indeed embody cultural elements such as collaboration and cooperation across teams—including Dev and Ops. But there’s also a recognition that the best form of communication is sometimes eliminating the need to communicate at all. To the degree that Ops can build a self-service platform for developers and get out of the way, that can be more effective than improving how dev and ops can work together. I don’t want to communicate more effectively with a bank teller; I want to use an ATM (or skip cash entirely).

Containers have also influenced how some organizations are thinking about PaaS. Many PaaS solutions (including OpenShift) have been based on containers from the beginning. But each platform did their own implementation of containers; in OpenShift it was Gears, in Heroku it was Dynos, in CloudFoundry it was Warden (now Garden) containers.  

As the industry moved to a container standard (Docker-format with standardization through the Open Container Initiative (OCI)), OpenShift moved with it. Red Hat has helped drive that movement along with many others though not all PaaS platforms have participated in the shift to standards. 

With container formats, runtimes, and orchestration increasingly standardized through the OCI and Cloud Native Computing Foundation (where Kubernetes is hosted), there’s increasing interest from many ops teams in deploying a tested and integrated bundle of these technologies outside of any specific development environment initiatives within their companies.

That’s because the huge amount of technological innovation happening around containers and DevOps can be something of a double-edged sword. On the one hand it creates enormous possibilities for new types of applications running on a very dynamic and flexible platform. At the same time, channeling and packaging the rapid change happening across a plethora of open source projects isn’t easy—and can end up being a distraction from the ultimate business goals.

As a result, at Red Hat, we talk to customers who view OpenShift primarily through the lens of a container management platform rather than the more traditional developer-centric PaaS view. There’s still a developer angle of course—a platform isn’t much use unless you’re going to run applications on it. But sometimes there are already developer tooling and workflows in place and the pressing need is to deploy a container platform using Docker-format containers and Kubernetes orchestration without having to assemble these from upstream community bits and support them in-house.

An integrated platform leads to real savings. For example, based on a set of interviews, IDC found that:

IT organizations that want to decouple application dependencies from the underlying infrastructure are adopting container technology as a way to migrate and deploy applications across multiple cloud environments and datacenter footprints. OpenShift provides a consistent application development and deployment platform, regardless of the underlying infrastructure, and provides operations teams with a scalable, secure, and enterprise-grade application platform and unified container and cloud management capabilities.

Among its quantitative findings was 35 percent less IT staff time required per application deployed. [1]

In short, PaaS remains a central part of the cloud computing discussion even if the name is sometimes discarded for something more specific or descriptive such as container platform. What’s perhaps changed the most is the recognition that PaaS isn’t just a tool for developers. It’s also a way for ops to enable developers most efficiently and to manage the underlying container infrastructure.

[1] I’ve got some other good data points and outside perspectives that I’ll share in a future post.