Do you really need Kubernetes?

Kubernetes offers a uniform interface for running workloads as containers across everything from cloud service providers and on-premises environments to F-16 fighter jets. The technology is open-source and is accompanied by a large, open and mature ecosystem for monitoring, Continuous Integration (CI), Continuous Deployment (CD), and pretty much anything else you can think of. This allows you to avoid the vendor lock-in of proprietary cloud services, and (in theory) easily switch cloud providers if one disappears or you are dissatisfied with the direction a provider is heading.

This sounds great! Is Kubernetes the silver bullet we’ve been searching for all these years?

– You (perhaps)

Hold your horses, cloud-native cowboy. Kubernetes has its strengths and applications, and while committing to it has been the right move for some, I believe there are several long-term risks and second-order effects that organizations should be aware of and explicitly consider before going all-in on Kubernetes.

… at the cost of reduced utilization of the cloud.

One of the main arguments for adopting Kubernetes is its portability, especially when compared to proprietary and less portable cloud services like the Elastic Container Service (ECS) from Amazon Web Services (AWS). Depending on proprietary technology comes with risks, but those risks should be weighed carefully against the benefits. Using managed cloud services offloads significant responsibility, complexity, and risk onto a trusted provider, and allows you as a customer to focus on what differentiates your business.

Standardizing on Kubernetes often leads to using the least common denominator across cloud providers, while you miss out on the opportunities vendor-specific services like AWS Lambda can offer, and the tighter integration these services have with the rest of a provider’s ecosystem.

Despite its portability, Kubernetes also introduces a form of lock-in – not to a specific vendor, but to a paradigm that may have implications for your architecture and organizational structure. It can lead to tunnel vision where every solution is made to fit into Kubernetes instead of using the right tool for the job.

If everything in an organization revolves around Kubernetes and the development teams have little direct exposure to the cloud, they are deprived of discovering potentially better and more efficient ways of solving their problems.

If the portability of Kubernetes matters to you because of a potential future migration to another cloud provider, you should also consider how likely that migration is and whether the cost of adopting Kubernetes outweighs the expected migration costs. If it does, using Kubernetes as a hedge does not make much sense, at least from an economic point of view. Enterprise architect Gregor Hohpe has coined a law that may be relevant here:

[O]ne of the key culprits [behind complexity] is an organization’s inability to make meaningful decisions: everything has to be multi-platform, multi-cloud, portable, integrated with legacy systems, and customized for all possible options just in case. This major pitfall leads us to our final insight:

Gregor’s Law: Excessive complexity is nature’s punishment for organizations that are unable to make decisions.
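The hedge argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration: the function names and all numbers are hypothetical, and it assumes the expected migration cost can be approximated as the probability of migrating multiplied by the cost of doing so.

```python
def expected_migration_cost(probability: float, migration_cost: float) -> float:
    """Expected cost of a future migration: probability times cost if it happens."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * migration_cost


def hedge_pays_off(
    adoption_cost: float,
    probability: float,
    cost_without_k8s: float,
    cost_with_k8s: float = 0.0,
) -> bool:
    """Return True when adopting Kubernetes costs less than the expected saving.

    The saving is the difference between migrating without and with
    Kubernetes, weighted by how likely the migration is (an assumption
    of this sketch, not a rule from the article).
    """
    saving_if_migrating = cost_without_k8s - cost_with_k8s
    return adoption_cost < expected_migration_cost(probability, saving_if_migrating)


# Hypothetical numbers: a 10% chance of migrating, where Kubernetes would
# cut the migration bill from 5M to 1M but costs 2M to adopt and operate.
print(hedge_pays_off(
    adoption_cost=2_000_000,
    probability=0.1,
    cost_without_k8s=5_000_000,
    cost_with_k8s=1_000_000,
))
```

With these made-up figures the expected saving is far below the adoption cost, so the hedge does not pay for itself. The point is less the exact numbers than being forced to write down a probability and a cost at all.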

… at the cost of increased responsibility, complexity, and risk.

Adopting Kubernetes across an organization costs a lot of time, both during initial setup and in ongoing operations. That time is not only a direct cost in salaries but also an opportunity cost, since it diverts people who could have been used elsewhere in the organization.

While getting started with Kubernetes can be fairly straightforward, doing it right and managing it (and the required knowledge!) effectively over time is challenging. Organizations invested in Kubernetes often require dedicated experts or platform teams to manage central Kubernetes clusters and associated tools from the ecosystem to relieve development teams from an excessive operational burden.

You need to continuously ensure that the platform is up-to-date, secure and robust, and that any changes do not negatively impact development teams. The blast radius is large, and you often end up with an internal team on the critical path of every development team. Over time this can lead to a fuzzy responsibility model where you gradually revert to the very silos (“Dev” and “Ops”) you have been trying to move away from.

Complexity lurks beneath the surface …

Even with managed services like Amazon Elastic Kubernetes Service (EKS) and Google Kubernetes Engine (GKE), the flexibility of Kubernetes and its ecosystem comes at the expense of taking on substantial operational responsibility.

You should consider whether you actually have business needs that justify the level of flexibility Kubernetes offers. Could you instead use a managed cloud service like Amazon ECS, which is less flexible but simpler to use, covers most use-cases and shifts more responsibility and complexity onto the cloud provider? A so-called “80% solution”.

Should your organization bet on Kubernetes? Like many things in our industry, the answer is “it depends”. Where are you today and where do you want to be? Which constraints and regulations are you under?

Kubernetes and its ecosystem have a lot to offer, and for those who have done a proper evaluation, it has likely been a good decision. If you have a real, long-term business need for operating most of your organization’s workloads across cloud providers and/or on-premises, the uniform interface offered by Kubernetes has a lot of appealing properties. Unfortunately, I believe this group is a minority of users, and I suspect that many have jumped on the bandwagon too hastily without considering total cost of ownership and opportunity costs. The future may bring greater standardization across cloud providers while allowing you to use the cloud for all it is worth and without taking on an unnecessarily large operational burden, but that is not the state of Kubernetes today.

To do more than just ruffle feathers I want to briefly mention an alternative and lightweight approach that I have seen work well for various customers during the last couple of years:

Autonomous development teams with a cloud-first mindset, full responsibility for the software development lifecycle (SDLC) of their solutions and the support they need to take on this responsibility.

  • They use the right tools for their specific problems, and leverage managed cloud services like AWS Lambda and AWS Fargate to move faster, more securely and with less complexity in their baggage.
  • They have the required knowledge and access to troubleshoot and handle incidents in their production environments without having to involve other teams.
  • They can be supported by a developer platform that helps them work efficiently with the cloud while they can innovate on top, and diverge and extend if necessary. Such a platform can in its simplest form consist of cloud experts who share their knowledge, experience and practices with development teams (a “Thinnest Viable Platform” in Team Topologies lingo), and over time include self-service tools, services and building blocks that cover the most important needs well enough.

To end on a summarizing note: before making strategic decisions in an IT organization, be willing to give up some control, and focus less on technology and where you are today, and more on the direction you want to move, total and opportunity costs, and how your choices may impact autonomy and responsibility in, and structure of, your organization.

Blythe Coby