The Answer Isn’t Shift Left or Shift Right — It’s Shift Up

Microservices architectures are rapidly becoming the norm for cloud applications. A recent Statista report found that 85% of large organizations currently use microservices.

At a basic level, a microservices architecture arranges an application into a set of loosely coupled code bases, each of which can be independently located, addressed, discovered, and reused across multiple services.

These characteristics all contribute to the significant benefits of microservices:

Avoid long-term commitment to a design or technology
Accelerate innovation, velocity, and scalability
Simplify development with services that are easy to test, maintain, and deploy

But while microservices resolve many of the problems that monolithic architectures created, they also come with their own set of unique challenges:

Increased design complexity. SRE teams typically try three approaches to mitigate this: adopting a lightweight protocol (such as gRPC) for communication between microservices, designing microservices to tolerate transient errors, and focusing on team upskilling.
Increased operational complexity. The new definition of success involves investing in good observability platforms, implementing SRE and DevOps best practices, and taking distributed tracing seriously.
Increased resource consumption. This is typically dismissed as a non-issue given the significant overall gains in innovation, velocity, and agility.
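
To make "tolerate transient errors" concrete, here is a minimal Python sketch of one common approach: retrying a failed call with capped exponential backoff and jitter. Everything named here (`TransientError`, `call_with_retries`, `make_flaky`, and the default delays) is a hypothetical illustration, not part of any particular framework, and a real service would also need to decide which failures are actually safe to retry.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, brief unavailability)."""


def call_with_retries(operation, max_attempts=4, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with capped, jittered backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the failure to the caller
            # The backoff ceiling doubles on each attempt, up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: pick a random delay so many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))


def make_flaky(failures):
    """Build a stand-in for a remote call that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated timeout")
        return "ok"

    return operation, state
```

Calling `call_with_retries(operation)` with an operation built by `make_flaky(2)` fails twice, backs off twice, and then returns the result of the third call. The `sleep` parameter is injectable only so the behavior can be exercised without real waiting.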

To be fair, design complexity is expected and reasonable given the nature of microservices. The usual responses to operational complexity, on the other hand, are merely a series of band-aids that ignore the root problem: a lack of thoughtful design processes. And resource consumption is too easily dismissed: organizations should not blindly accept those gains when they clearly come with hefty cloud costs.

Managing a monolith was a lot, but managing thousands of microservices is no small feat either, especially with rapid, large-scale adoption. As a result, applications have been left in turmoil as SRE teams scramble to patch up operations with alerts, rapidly expand their teams, or write in-house solutions (which often end up needing constant supervision and care).

If your organization is experiencing more frequent or longer-lasting production issues along with increased cloud spend, then your microservices architecture is no longer outperforming the monolith it replaced.

The Debate Between Shift Left and Shift Right

So how do you recoup the significant benefits of microservices I mentioned earlier? DevOps and SRE teams today typically look to their CI/CD pipeline and use one of the following approaches:

Shift left: Move tasks to an earlier stage than before
Shift right: Move tasks to a later stage than before

How do you determine which approach is right for your organization? Shifting either left or right seems reasonable, but it’s yet another band-aid that doesn’t address the underlying issue: toil.

Toil is a term coined by Google in their book on Site Reliability Engineering; it describes the type of work that inhibits engineering velocity.

“Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.”

Given the dynamic nature of microservices, organizations need to set higher goals to eliminate toil and deliver better performance and availability. Shifting left or right will only perpetuate the toil your team faces on a daily basis.

Why Organizations Should Shift Up

There’s another direction you can take instead of shifting within your organization’s CI/CD pipeline: Shift Up. In other words, take advantage of autonomous systems.

“Microservices are designed to be managed with autonomous systems and not automated systems.”

Jigar Desai, Senior Engineering Leader - AI & BigData, Facebook

The tools and platforms used today to combat operational complexity have limits. Observability and AIOps platforms are designed to manage services, not microservices. They cannot handle the complex dependencies and massive volumes of data that microservices generate, which leaves SRE teams stuck with toil.

But the advent of microservices also brings new innovation to solve new problems. An autonomous system is the next, and critical, step in the microservices journey. To be clear, autonomous systems do not replace observability platforms; they work together. Autonomous systems are a pragmatic solution to alert fatigue, and they are capable of managing the complex dependencies and immense amounts of data that are inherently part of microservices. Most importantly, autonomous systems handle the toil, which means engineers can take full advantage of one of microservices’ core perks: innovation velocity.

Microservices are the new norm when it comes to architectures, and soon autonomous systems will be the new norm for managing them.

If you’re still debating between Shift Left and Shift Right, you’re settling for managing numerous observability dashboards, innumerable alerts, and complex rules and thresholds that require constant adjustment. Band-aid solutions aren’t enough. Alert fatigue will only be replaced by team burnout.

Autonomous systems tackle the root problem and invite full-steam-ahead innovation. Teams are no longer stuck in the monotony of production issues, strenuous RCAs and post-mortems, and the world of thresholds. Not only do autonomous systems unlock the full potential of microservices, they also free your team to focus on the fun stuff.
