OneOps, TwoOps … Exploring cloud ops

OneOps, TwoOps, RedOps, BlueOps,
DevOps, NoOps, OldOps, NewOps.
This one relies on cooperation.
This one banks on automation.
Say! What a lot of ops there are.

– With sincere apologies to Dr. Seuss

Those of you who follow me on Twitter (@jamesurquhart) may have come across frequent recent discussions about new operations methods enabled by cloud computing. The most common terms in these discussions include DevOps and its controversial sibling, NoOps. While I think the practices behind these terms are critical to understand as the nature of IT operations shifts to meet new demands, the terms themselves are less than helpful.

So, what I thought I’d do today is walk you through key new operations concepts being adopted by the most cloud-savvy organizations I know, but without allowing terminology to distract the discussion. If I am successful, you’ll be able to look past the label and see the incredible value these new models bring to businesses and institutions of all sizes.

If I am unsuccessful … well, maybe we’ll keep having these conversations for another year, kind of like Groundhog Day.

How is cloud changing IT operations?

What makes cloud computing a fundamental change to the way IT works, rather than just another way of expressing what has come before, lies not in its causes but in its effects:

In the past, I’ve gone into some depth about how cloud computing takes IT from server-centric to application-centric operations.
I’ve also pointed out that the very nature of “who owns what” in the cloud reorganizes operations activities along application, services and infrastructure lines.
Finally, I’ve discussed how the highly interdependent, multi-owner nature of the cloud brings forward the science of complex adaptive systems.

In conjunction with these three, one other concept is critical to understand. What matters most to any business is the application of IT to business problems, and the ongoing support of those applications as long as they remain applicable to the business. The rest of IT exists in support of that.

What people not working closely with cloud computing fail to realize is that the application-centric nature of cloud operations shifts the focus of operations away from infrastructure (where it has been since the mainframe) to, well, applications. (While I usually hate electric utility analogies for cloud computing, this is indeed similar to the shift from private power generation to public utilities.)

If you are running applications in an environment you may or may not control, you focus on how to keep code running, data available, configuration viable and policy enforced. And, since the only things you control are the code, data, configuration and policy, you have to start focusing on how to build performance and survivability into the application itself.

This was the first lesson learned by the Web 2.0 companies that embraced Amazon’s EC2 and similar services early on. To make an application run well at high scale in someone else’s data center, you have to make the application responsible for its own operational integrity. So, the practice of integrating development tools and people with operations tools and people was born (and became the first form of DevOps—embedding operations skills into development teams).
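To make "responsible for its own operational integrity" a little more concrete, here is a minimal sketch of one common pattern: retrying a flaky remote call with exponential backoff, then degrading to a fallback instead of failing. Every name here (call_with_retries, FlakyService, the default attempt count and delay) is hypothetical and chosen for illustration; it is one way such logic might look, not a description of what any particular company built.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try `operation` up to `attempts` times, then use `fallback`."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            # Wait longer after each failure (base_delay, 2x, 4x, ...),
            # but do not wait after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # Every attempt failed: degrade gracefully instead of crashing.
    return fallback()


class FlakyService:
    """Hypothetical dependency that fails a fixed number of times."""

    def __init__(self, failures_before_success: int) -> None:
        self.remaining_failures = failures_before_success

    def fetch(self) -> str:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated outage")
        return "fresh data"


if __name__ == "__main__":
    service = FlakyService(failures_before_success=2)
    # Pass a no-op sleep so the demo does not actually pause.
    print(call_with_retries(service.fetch, lambda: "cached data", sleep=lambda s: None))
```

The point of the sketch is where the logic lives: the retry and fallback policy ships with the application code rather than with the infrastructure underneath it.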

When skills just aren’t scalable enough

That sounds like a heck of a solution, right? Build applications that utilize the services they run on, and add some custom automation developed by people who understand server, network and storage performance, and how to keep IT running.

Except … there’s one little problem.

In any organization with more than a few applications to deploy and operate, the problem of scaling operations resources (people and tools) to meet that demand becomes a question of not only cost, but coordination across teams. At a small scale, that’s not a big deal. However, as application teams grow in number, the problem of coordinating operations activities becomes increasingly difficult.

In an ironic twist, some early adopters of this model report utilization and contention issues when operations staff are embedded in development teams. The operations staff are faced with a dilemma: either selfishly protect the needs of their own projects, or work with other operations staff on other projects to find common ground—potentially impacting their own projects’ approaches or schedules.

The solution for some of the most bleeding-edge of these companies is interesting. Rather than force bureaucracy into the mix, they took a different tack: turn operations into a service—with an API. A platform service, or PaaS, to be exact.

In the PaaS model, developers utilize a service that embeds most of the operations automation for a class of applications right into the platform. You work with code and data, and configuration and most policy are handled for you (though you might provide metadata to influence both). The development team steps back from defining the specific operations logic for their applications, and instead trusts it to the platform service.
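As a rough illustration of that division of labor, the sketch below models a toy platform in which the developer supplies only code plus optional metadata, and the platform fills in the remaining configuration and policy from its own defaults. ToyPlatform, its deploy method and the setting names (instances, restart_on_failure, region) are all invented for this example and do not correspond to any real PaaS API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Platform-owned defaults (hypothetical). Developers never edit these
# directly; they can only override the keys the platform chooses to expose.
PLATFORM_DEFAULTS: Dict[str, object] = {
    "instances": 2,
    "restart_on_failure": True,
    "region": "default",
}


@dataclass
class Deployment:
    name: str
    handler: Callable[[str], str]
    settings: Dict[str, object]


class ToyPlatform:
    """Hypothetical stand-in for a platform service, not a real PaaS API."""

    def __init__(self) -> None:
        self.deployments: Dict[str, Deployment] = {}

    def deploy(
        self,
        name: str,
        handler: Callable[[str], str],
        metadata: Optional[Dict[str, object]] = None,
    ) -> Deployment:
        metadata = metadata or {}
        # Metadata may influence configuration and policy, but only
        # through the settings the platform already understands.
        unknown = set(metadata) - set(PLATFORM_DEFAULTS)
        if unknown:
            raise ValueError(f"unsupported metadata keys: {sorted(unknown)}")
        settings = {**PLATFORM_DEFAULTS, **metadata}
        deployment = Deployment(name, handler, settings)
        self.deployments[name] = deployment
        return deployment


if __name__ == "__main__":
    platform = ToyPlatform()
    app = platform.deploy("billing", lambda req: "ok: " + req, {"instances": 5})
    print(app.settings)
```

Note what the developer does not write here: nothing about restart behavior or placement appears in the application code unless the team chooses to override a default through metadata.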

Because the developer does little day-to-day operations in the traditional sense, this approach is sometimes called NoOps. I personally despise that term.

It should also be noted that these platforms are essentially coding frameworks provided as a service, which can limit the class of applications to which they apply. So, it is unlikely that a single platform solution will meet the needs of an entire business.

Nonetheless, I think this is the (long-term) future of IT operations: relying on platform services to manage most of the day-to-day performance and survivability challenges a custom application faces. For those companies that are big enough, there may be a team that uses more of a merged DevOps team approach to deliver a platform service of their own. But the vast majority of companies will slowly move away from running infrastructure toward building and constantly tweaking applications.

The road will be far from easy

I say that knowing full well that many of you are reading this thinking “there is no way my organization is moving to a model like that anytime soon.” And I completely agree. Legacy applications weren’t built for this model, and most organizations aren’t set up to handle these tasks, either. The “traditional” siloed operations model will survive for a while at most companies.

But for how long is, in my opinion, uncertain. Take a look at Netflix, a poster child for pushing cloud operations boundaries. They believe very much in the platform services model.

The truth is, if you haven’t already started automating operations for your applications built for the cloud, you are not taking full advantage of the model. Start, at least, with that. However, consider that, as platform services (both public and private) mature, it may make more sense to build your next generation of applications on one.

Just don’t fool yourself. Regardless of which model you adopt, your company will always be doing some sort of operations. Don’t let the terminology fool you when it comes to that.

Image courtesy of Colin Smith.

James Urquhart is vice president of products at enStratus and a regular GigaOM contributor.
