
Kubernetes Cluster Scaling Challenges


April 1, 2024



In this article, we explore why we adopted Kubernetes, look at the concept of autoscaling within Kubernetes, and focus specifically on cluster autoscaling. Along the way, we examine the tools you can use to configure cluster autoscaling. We close with a high-level overview of event-driven autoscaling in Kubernetes. You can watch the original video here.

Embracing the Era of Containerized Application Deployment

In the ever-evolving landscape of application deployment, we have come a long way. Once upon a time, our applications lived on physical servers. Then, as technology advanced, we moved to virtual machines. Today, we have entered a new era in which containers have taken center stage for application deployment. Why did we make this shift? It all comes down to the challenges we encountered with traditional and virtualized deployments. Containers emerged as a powerful way to overcome those hurdles and made application deployment more efficient and flexible.

Today, containers are the standard, widely accepted approach for deploying applications at any scale. When you run containers on a single physical or virtual machine, you can manage them yourself, on your own terms.

But imagine a scenario where you have hundreds or even thousands of workloads, and you're deploying containers across numerous physical or virtual machines. In such cases, it becomes impractical to manage everything manually. This is where the need for a specialized tool arises. Enter Kubernetes.

Exploring the Potential: Cluster Scaling in Kubernetes

In the vast landscape of container orchestration, Kubernetes stands out as a leading and extensively embraced tool. Its popularity stems from a myriad of benefits it brings to the table. While I won't delve into the specifics of these advantages in this discussion, it's worth noting their significance. However, for the purpose of our session, we will narrow our focus to the crucial topic of cluster scaling within Kubernetes.

Moreover, it's worth mentioning that the recent advancements in technology have made it possible to set up Kubernetes in various environments. Whether it's on-premises, in the cloud, or even on edge devices, Kubernetes can be deployed virtually anywhere.

So, let's set our sights on the captivating world of cluster scaling within Kubernetes and unravel the possibilities it holds. Join me as we explore this essential aspect of container orchestration and discover the potential it offers for optimizing your application deployments.

Focusing on Public Cloud Deployment: Exploring Kubernetes on AWS

Kubernetes can be deployed across a diverse range of environments, including physical servers, virtual servers, private cloud, public cloud, hybrid cloud, and even edge devices. However, for the purpose of this topic, our focus will be specifically on the public cloud, and more specifically, on AWS. Allow me to explain why.

It is worth noting that a significant majority of container workloads, approximately 80% of them, are deployed on the cloud, and out of those, an impressive 82% of Kubernetes deployments specifically run on AWS. These statistics highlight the widespread adoption of AWS for containerized workloads and Kubernetes deployments. As a result, we have chosen AWS as our primary focus for this topic.

Within the AWS ecosystem, there are multiple options for deploying Kubernetes. In a self-managed Kubernetes environment, you are responsible for configuring both the control plane and the worker node components. With Amazon EKS, AWS manages the control plane for you. The worker nodes can be self-managed, run as EKS managed node groups, or be replaced by Fargate.

Part 1: Autoscaling in Kubernetes

At its core, autoscaling means dynamically adjusting resource capacity to match demand. In Kubernetes, this capability helps optimize resource utilization and cost by scaling workloads and the cluster up or down as demand fluctuates. By using autoscaling, you let your cluster adapt to varying workloads without paying for idle capacity.

At the pod level, Kubernetes provides two key options: Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA). HPA enables automatic scaling of the number of pods based on resource utilization metrics, ensuring that your application can dynamically adapt to changing demands. On the other hand, VPA focuses on adjusting the resource allocation within individual pods, optimizing their resource requests and limits to enhance efficiency.

At the node level, Kubernetes offers the Cluster Autoscaler. It adds nodes when pods cannot be scheduled for lack of resources and removes nodes that stay underutilized, scaling the cluster up or down to meet workload requirements.

Exploring HPA, VPA, and Best Practices in Kubernetes Autoscaling

Let's take a closer look at the two pod-level options, Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA), each with its own specific capabilities.

HPA comes into play when you need to dynamically scale the number of pods based on resource utilization metrics. This means that as demand increases, HPA will automatically add more pods to your workload to ensure smooth operation. Conversely, during periods of reduced demand, HPA can scale down the number of pods accordingly, optimizing resource usage. On the other hand, VPA focuses on enhancing resource allocation within individual pods. By adjusting the resource requests and limits of a pod, VPA can dynamically allocate additional memory and CPU resources, scaling it up to meet higher demands or scaling it down to conserve resources during periods of lower utilization.
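The HPA documentation gives the core replica calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The Python sketch below implements that formula for a single utilization metric. It adds the default 10% tolerance and min/max clamping. It deliberately ignores stabilization windows, scaling policies, and not-yet-ready pods, and the function name and default bounds are illustrative.

```python
import math


def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float,
                         min_replicas: int = 1,
                         max_replicas: int = 10,
                         tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for one utilization metric (percent of requests)."""
    ratio = current_utilization / target_utilization
    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Never go outside the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


print(hpa_desired_replicas(4, 90, 60))  # 6: load is 1.5x the target
print(hpa_desired_replicas(4, 30, 60))  # 2: load is half the target
print(hpa_desired_replicas(4, 63, 60))  # 4: within the 10% tolerance
```

The ratio multiplies the current replica count. As a result, HPA scales in proportion to how far utilization is from the target, rather than one pod at a time.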

To effectively leverage HPA and VPA, it is important to adhere to best practices. These best practices encompass various considerations, such as defining appropriate resource requests and limits, setting target utilization thresholds, and regularly monitoring and tuning the autoscaling behavior.
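A common way to choose request values is to base them on observed usage percentiles rather than peaks, which is roughly how VPA's recommender approaches the problem. The sketch below is a simplification, not VPA's algorithm. It takes a nearest-rank percentile of usage samples and adds a safety margin. The 90th percentile, the 15% margin, and the sample data are illustrative assumptions.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of the samples, with p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend_request(samples: list[float], p: float = 90.0,
                      safety_margin: float = 0.15) -> float:
    """Recommend a request: the p-th percentile of observed usage plus a safety margin."""
    return percentile(samples, p) * (1 + safety_margin)


# Hypothetical per-pod CPU usage samples in millicores, including one short spike.
cpu_samples = [120, 150, 180, 200, 210, 230, 250, 260, 300, 900]
print(round(recommend_request(cpu_samples)))  # 345
```

The single 900m spike does not inflate the recommendation. Sizing to the peak would have reserved nearly three times as much CPU for every replica.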

By following these best practices, you can harness the full potential of HPA and VPA, ensuring efficient autoscaling and resource utilization within your Kubernetes environment.

Part 2: Cluster Autoscaling

Optimizing Resource Configuration: Best Practices for HPA, VPA, and Cluster Scaling in Kubernetes

Ensuring the correct resource configuration for your workloads is essential. Avoid configuring Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) on the same CPU or memory metrics for the same workload. Combine them only when HPA scales on custom or external metrics.

In Part 1 we highlighted the potential risks and challenges of configuring HPA and VPA together using the default Kubernetes metrics. It is important to be aware of these risk conditions, as improper scaling behavior may occur if not handled properly. Additionally, it is crucial to consider the scalability of your cluster when utilizing HPA and VPA. Failure to do so can lead to a situation where HPA and VPA trigger scale-up actions, but the pods end up in a pending state due to insufficient capacity in the worker node groups.
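The risk with default metrics comes from how HPA measures CPU. Utilization is usage divided by the pod's request, and the request is exactly what VPA changes. The numbers in this sketch are hypothetical. It shows that the same load can look like a reason to scale out before VPA acts and a reason to scale in afterwards. The two controllers end up reacting to each other rather than to demand.

```python
def cpu_utilization_percent(usage_millicores: float, request_millicores: float) -> float:
    """HPA's CPU utilization: actual usage as a percentage of the pod's CPU request."""
    return 100.0 * usage_millicores / request_millicores


usage = 450    # hypothetical per-pod CPU usage; the load itself never changes
target = 70    # hypothetical HPA target utilization, in percent

before = cpu_utilization_percent(usage, request_millicores=500)
after = cpu_utilization_percent(usage, request_millicores=900)  # after VPA raises the request

print(f"before VPA: {before:.0f}% vs target {target}% -> HPA scales out")
print(f"after VPA:  {after:.0f}% vs target {target}% -> HPA scales in")
```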

We will delve into these scenarios and discuss how to effectively address these challenges. Join us as we explore practical strategies to optimize resource configuration, mitigate risk conditions, and ensure smooth scaling within your Kubernetes environment.

Scaling your applications is contingent upon the availability of ample capacity within your cluster. Without adequate resources, scaling becomes infeasible.
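Scheduling decisions are made against resource requests, not live usage. The sketch below uses a deliberately naive first-fit placement; the real kube-scheduler filters and scores nodes. Node names and sizes are hypothetical. It shows how replicas added by HPA end up Pending once the free requestable capacity on existing nodes is used up.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_m: int   # allocatable CPU not yet requested by pods (millicores)
    free_mem_mi: int  # allocatable memory not yet requested by pods (MiB)


def place_pods(pods: list[tuple[int, int]], nodes: list[Node]) -> list[int]:
    """First-fit placement by requests; returns the indexes of pods left Pending."""
    pending = []
    for i, (cpu_m, mem_mi) in enumerate(pods):
        for node in nodes:
            if cpu_m <= node.free_cpu_m and mem_mi <= node.free_mem_mi:
                node.free_cpu_m -= cpu_m
                node.free_mem_mi -= mem_mi
                break
        else:  # no node had room for this pod
            pending.append(i)
    return pending


# HPA asks for 5 replicas of a 500m / 512Mi pod, but only two nodes have room.
nodes = [Node("node-a", 1000, 2048), Node("node-b", 1000, 2048)]
print(place_pods([(500, 512)] * 5, nodes))  # [4]
```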

Under the hood, Kubernetes on AWS ensures that containers are scheduled and run on worker nodes with a functional container runtime engine, optimizing the deployment and execution of containers within the cluster. This seamless orchestration enables efficient utilization of resources and streamlined container management.

Scaling Considerations: Fargate and EC2-Based Kubernetes Clusters

When scaling up your Kubernetes cluster, you need sufficient node capacity, except when you use Fargate. With Fargate, AWS manages the worker nodes by running each pod in its own dedicated microVM, so there are no nodes for you to provision or scale. With regular EC2-based Kubernetes clusters, however, you must make sure the cluster itself scales to meet workload demand.

There are two options for scaling your Kubernetes cluster. The first is manual: change the desired capacity of the Auto Scaling group in the EC2 console to add nodes. This is not required, however. The second option is to let a tool adjust the cluster's capacity automatically.

Part 3: Cluster Autoscaler

In a dynamic infrastructure, automated scaling is crucial. We need tools that adjust the cluster's capacity automatically, both up and down. Today, we will talk about two such tools: cluster autoscaler and Karpenter.

Cluster autoscaler is a highly popular autoscaling tool. It supports all major cloud providers, including AWS, where it works with EKS, and it can be deployed almost anywhere. It performs well even with large node groups of around a thousand nodes. Let's explore what happens when you deploy cluster autoscaler.

Autoscaling with Cluster Autoscaler: How It Works

The cluster autoscaler is responsible for initiating scale-up and scale-down actions based on certain conditions. If there are pods in a pending state due to insufficient resources in the cluster, the cluster autoscaler will trigger a scaling action to accommodate the pods. Additionally, it continuously monitors the utilization of nodes in the cluster. If it identifies a node that has been underutilized for a prolonged period, it will trigger a scale-down action.
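For scale-down, Cluster Autoscaler's FAQ describes a node as unneeded when the sum of its pods' CPU and memory requests stays below a utilization threshold for a sustained period. The sketch below is a minimal version of that check. The constants mirror the documented defaults of the `--scale-down-utilization-threshold` and `--scale-down-unneeded-time` flags. The real autoscaler also verifies that the node's pods can be moved elsewhere (PodDisruptionBudgets, local storage, and similar), which is omitted here.

```python
from datetime import timedelta

# Illustrative values matching Cluster Autoscaler's documented defaults.
UTILIZATION_THRESHOLD = 0.5
UNNEEDED_TIME = timedelta(minutes=10)


def node_utilization(req_cpu: float, alloc_cpu: float,
                     req_mem: float, alloc_mem: float) -> float:
    """Requested (not used) share of allocatable capacity; the larger ratio wins."""
    return max(req_cpu / alloc_cpu, req_mem / alloc_mem)


def is_scale_down_candidate(utilization: float, below_for: timedelta) -> bool:
    """A node qualifies once it has stayed below the threshold long enough."""
    return utilization < UTILIZATION_THRESHOLD and below_for >= UNNEEDED_TIME


u = node_utilization(req_cpu=600, alloc_cpu=2000, req_mem=1024, alloc_mem=8192)
print(u, is_scale_down_candidate(u, timedelta(minutes=12)))  # 0.3 True
```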

Now, let's look at how the cluster autoscaler operates. It runs as a Deployment within your Kubernetes cluster, like any other workload, and continuously watches the cluster's state. When it detects a pod that is pending because of a resource shortage, it raises the desired capacity of the Auto Scaling group, which then adds instances to the cluster. Once a newly added node becomes ready, the pending pod is scheduled onto it. Conversely, the cluster autoscaler keeps a close eye on the nodes. If a node remains idle or underutilized for an extended period, the autoscaler triggers a scale-down action and the node is removed from the cluster.
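To decide how many instances to request, cluster autoscaler simulates scheduling the pending pods onto nodes built from the node group's template. The sketch below approximates that estimate with first-fit-decreasing bin packing for a single, hypothetical node shape. It illustrates the idea and is not the autoscaler's code. A pod larger than the node shape raises an error, because adding more nodes of that shape can never help it.

```python
def nodes_needed(pending: list[tuple[int, int]], node_cpu_m: int, node_mem_mi: int) -> int:
    """Estimate how many new nodes of one shape the pending pods need (first-fit decreasing)."""
    new_nodes: list[list[int]] = []  # remaining [cpu_m, mem_mi] on each simulated node
    for cpu_m, mem_mi in sorted(pending, reverse=True):
        if cpu_m > node_cpu_m or mem_mi > node_mem_mi:
            raise ValueError(f"pod ({cpu_m}m, {mem_mi}Mi) can never fit this node shape")
        for free in new_nodes:
            if cpu_m <= free[0] and mem_mi <= free[1]:
                free[0] -= cpu_m
                free[1] -= mem_mi
                break
        else:  # no simulated node has room, so add another one
            new_nodes.append([node_cpu_m - cpu_m, node_mem_mi - mem_mi])
    return len(new_nodes)


pending_pods = [(1000, 1024)] * 3 + [(500, 512)] * 2
print(nodes_needed(pending_pods, node_cpu_m=1900, node_mem_mi=3400))  # 3
```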

This is how the cluster autoscaler ensures scaling, both in terms of scaling up and scaling down, when it is employed.

Challenges Faced with Cluster Autoscaler Implementation

Selecting Appropriate Instance Types

One challenge faced by SREs when using cluster autoscaler is choosing the right instance types. AWS provides a range of instances optimized for various factors such as compute, memory, and IOPS. This becomes challenging when dealing with a mixed workload. For instance, if a cluster is built using t3.medium instances and a new machine learning workload requiring higher resources is introduced, the deployment of that workload can result in a pending state. To overcome this, an additional worker node group with a higher-capacity instance type may be needed. Cluster autoscaler should be configured to manage this specific node group, and proper labeling should be applied to deploy workloads to the designated node type.
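The labeling step works because a pod's nodeSelector only matches nodes that carry every listed label. The sketch below shows that matching rule. The `workload-type` label key and the node group names are hypothetical.

```python
def matching_node_groups(node_selector: dict[str, str],
                         node_groups: dict[str, dict[str, str]]) -> list[str]:
    """Node groups whose labels contain every key/value pair in the pod's nodeSelector."""
    return [name for name, labels in node_groups.items()
            if all(labels.get(key) == value for key, value in node_selector.items())]


# Hypothetical node groups and labels.
node_groups = {
    "general-t3-medium": {"workload-type": "general"},
    "ml-m5-4xlarge": {"workload-type": "ml"},
}

print(matching_node_groups({"workload-type": "ml"}, node_groups))  # ['ml-m5-4xlarge']
print(matching_node_groups({}, node_groups))  # ['general-t3-medium', 'ml-m5-4xlarge']
```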

Provisioning Delay

Another major challenge with cluster autoscaler is the delay in provisioning. Scaling is triggered by the autoscaling group, and it typically takes a few minutes for new nodes to become available, join the Kubernetes cluster, and reach the ready state. In high-traffic scenarios, this delay can be significant and impact application performance.

Availability Zone Awareness

Cluster autoscaler relies on the Auto Scaling group to provision instances, and the group, not the autoscaler, decides which availability zone each instance lands in. Some pods must run in a particular zone, for example because their volume lives there. For such a pending pod, autoscaling can launch an instance in a different zone where the pod still cannot be scheduled. This mismatch leaves the pod pending and disrupts workload distribution.
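The toy model below shows the mismatch. The Auto Scaling group is modeled as simply rotating launches across its zones, a hypothetical simplification of its balancing behavior. The pod is pinned to one zone, for example by a zonal EBS volume. Zone names are examples. A commonly recommended mitigation is one node group per availability zone, so the autoscaler can grow the group in the zone the pod needs.

```python
from itertools import cycle


def launch_zones(zones: list[str], count: int) -> list[str]:
    """Toy stand-in for an Auto Scaling group spreading new instances across zones."""
    rotation = cycle(zones)
    return [next(rotation) for _ in range(count)]


def can_schedule(pod_zone: str, node_zone: str) -> bool:
    # A pod bound to a zonal volume can only run on a node in that same zone.
    return pod_zone == node_zone


new_node_zone = launch_zones(["us-east-1b", "us-east-1a"], count=1)[0]
print(new_node_zone, can_schedule("us-east-1a", new_node_zone))  # us-east-1b False
```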

Lack of Application Intelligence

Cluster autoscaler lacks intelligence about the specific applications running in the cluster. It doesn't have information about the characteristics or requirements of deployed applications. This limitation can make it challenging to optimize scaling decisions based on application-specific factors.

These are the primary challenges that may be encountered when implementing cluster autoscaler. Understanding and addressing these challenges will help ensure a smoother and more efficient scaling experience.

Part 4: Karpenter

Let's now shift our focus to Karpenter, a relatively new autoscaler introduced by AWS in December. Unlike cluster autoscaler, Karpenter offers dynamic instance selection, allowing it to choose the appropriate instance types for your workloads. Moreover, Karpenter can directly add and remove cluster nodes, eliminating the need to manage specific node groups.

With cluster autoscaler, you typically have to configure either a self-managed or a managed worker node group. Karpenter works differently. It doesn't require any worker node group or Auto Scaling group; instead, it manages the instances directly. This approach provides flexibility and simplifies management. Karpenter also provisions quickly because it calls the EC2 API directly to launch instances. Creating a new instance and adding it to the cluster typically takes around 60 to 90 seconds.

Karpenter offers an efficient and streamlined approach to managing instances, providing dynamic selection and rapid provisioning for enhanced scalability and performance.

How Karpenter Works

Karpenter runs as a controller inside your cluster. When pods are pending, the Kubernetes scheduler first tries to place them on the existing worker nodes. If the pods remain unscheduled because of insufficient capacity, Karpenter steps in and provisions an instance suited to the workload.

For instance, let's consider a scenario where multiple deployments are deployed to the cluster, and some pods remain unscheduled. In such cases, Karpenter evaluates the total CPU and memory requirements of the workload. It then matches these requirements with the available EC2 instances, considering both availability and cost. Karpenter provisions the instance that satisfies the workload's needs, ensuring efficient utilization of resources.
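The matching step can be sketched as follows: sum the pending pods' requests and choose the cheapest instance type that covers them. This is a simplified, single-node illustration of the idea above, not Karpenter's actual algorithm. A real provisioner also has to account for system overhead, capacity availability, and packing across multiple nodes. The instance sizes match AWS's published specs, but the prices are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: float
    mem_gib: float
    hourly_price: float  # illustrative numbers, not a quote of current AWS pricing


CATALOG = [
    InstanceType("m5.large", 2, 8, 0.096),
    InstanceType("c5.xlarge", 4, 8, 0.17),
    InstanceType("r5.large", 2, 16, 0.126),
    InstanceType("m5.xlarge", 4, 16, 0.192),
]


def cheapest_fit(pods: list[tuple[float, float]],
                 catalog: list[InstanceType]) -> Optional[InstanceType]:
    """Cheapest single instance type whose vCPU and memory cover the pods' total requests."""
    cpu = sum(c for c, _ in pods)
    mem = sum(m for _, m in pods)
    fits = [t for t in catalog if t.vcpu >= cpu and t.mem_gib >= mem]
    return min(fits, key=lambda t: t.hourly_price, default=None)


print(cheapest_fit([(0.5, 1.0), (1.0, 2.0), (1.5, 3.0)], CATALOG).name)  # c5.xlarge
print(cheapest_fit([(0.5, 6.0), (1.0, 6.0)], CATALOG).name)  # r5.large
```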

However, it's important to note that Karpenter lacks the intelligence to distinguish between memory-intensive and CPU-intensive workloads. It provisions any available instance without specific workload-awareness. Nonetheless, Karpenter excels in its speed and efficiency, outperforming cluster autoscaler. It automatically detects the required capacity and swiftly provisions the right instances at the appropriate time, ensuring efficient scaling.

Advantages of Karpenter

Karpenter offers several advantages compared to cluster autoscaler, particularly in handling mixed and heterogeneous workloads. Regardless of workload diversity, Karpenter can accurately detect the CPU and memory requirements for specific workloads and provision the corresponding EC2 instances. This capability ensures optimal resource allocation for larger workloads within the cluster.

In the case of mixed containers, Karpenter shines by accommodating different architectures. For instance, if your current containers are based on an x86 platform architecture, but you introduce a new machine learning workload that utilizes an ARM-based container, Karpenter can provision an ARM-based instance to support it. Similarly, if you deploy a GPU-based workload, Karpenter can provision an instance equipped with GPU capabilities. This flexibility allows Karpenter to seamlessly adapt to the diverse needs of your workloads.
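Pods typically express these needs through the well-known `kubernetes.io/arch` node label (for example `arm64`). For GPUs, they use an extended resource request such as `nvidia.com/gpu`, which the NVIDIA device plugin exposes. The sketch below shows only the filtering step a provisioner would perform before comparing prices. The three-entry catalog is a hand-picked example.

```python
# Hand-picked example catalog: architecture and GPU count per instance type.
EXAMPLE_TYPES = [
    {"name": "m5.large", "arch": "amd64", "gpus": 0},
    {"name": "m6g.large", "arch": "arm64", "gpus": 0},    # AWS Graviton (ARM)
    {"name": "g4dn.xlarge", "arch": "amd64", "gpus": 1},  # one NVIDIA T4 GPU
]


def candidate_types(arch: str, gpus: int = 0) -> list[str]:
    """Instance types matching the pod's CPU architecture with at least `gpus` GPUs."""
    return [t["name"] for t in EXAMPLE_TYPES if t["arch"] == arch and t["gpus"] >= gpus]


print(candidate_types("arm64"))          # ['m6g.large']
print(candidate_types("amd64", gpus=1))  # ['g4dn.xlarge']
```

A CPU-only amd64 pod still matches `g4dn.xlarge` in this filter. A price comparison would normally rule it out.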

Furthermore, Karpenter stands out in terms of speed and efficiency. By directly connecting to the EC2 API and provisioning instances without relying on autoscaling groups, it bypasses unnecessary overhead, resulting in fast instance provisioning. This direct interaction with the EC2 API enables Karpenter to efficiently manage resources, providing rapid scaling capabilities. Additionally, Karpenter supports advanced scheduling features, such as pod affinity and volume topology awareness. These features enhance workload placement and optimize resource utilization within the cluster, contributing to improved performance and efficiency.

These advantages make Karpenter a compelling choice over cluster autoscaler when dealing with mixed workloads, diverse architectures, fast provisioning, simplified management, and advanced scheduling capabilities.

Challenges of Karpenter

While Karpenter offers several advantages, there are also challenges that need to be considered when implementing it. These challenges include:

Lack of Application Awareness

Similar to cluster autoscaler, Karpenter lacks awareness of the specific applications being deployed. It does not possess information about the characteristics or requirements of the applications. This limitation makes it challenging to optimize instance provisioning decisions based on application-specific factors.

Workload Provisioning for Specific Requirements

Karpenter provisions instances based on availability and cost, without considering whether the workload is memory-intensive or CPU-intensive. To address this, SREs need to create dedicated provisioners for each workload type and tag them in the deployment configuration. The extra work of creating multiple provisioners and managing the associated tags can be a burden for SREs.

These challenges highlight the importance of considering workload characteristics, creating dedicated provisioners, and properly configuring tags to ensure optimal instance provisioning with Karpenter. While it offers flexibility and rapid provisioning, addressing these challenges effectively is crucial to fully leverage its capabilities.

Part 5: Event-Driven Autoscaling in Kubernetes

Event-driven autoscaling brings the event-based model of serverless architectures such as AWS Lambda to Kubernetes, enabling efficient pod scaling. KEDA, a widely used tool, simplifies application autoscaling and integrates cleanly with Kubernetes clusters. Developed jointly by Red Hat and Microsoft, KEDA runs as a metrics server within the cluster and connects to a wide range of event sources through its scalers. It uses input from these sources to trigger scaling in response to events. KEDA can also feed these metrics to the Horizontal Pod Autoscaler (HPA), which scales a workload from one to N replicas based on demand.

Notably, KEDA can also scale a workload down to zero. It activates the workload (zero to one) when events arrive and deactivates it when they stop, leaving HPA to handle scaling from one to N. This frees resources whenever there is nothing to process. By incorporating KEDA into a Kubernetes environment, organizations can use event-driven autoscaling so that applications scale with events and workload needs. KEDA also lets them handle complex scaling scenarios with less manual configuration.
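The zero-to-N behavior can be sketched as two phases. An activation check decides between zero and at least one replica. An HPA-style calculation then targets a fixed amount of work per replica. The parameter names loosely echo KEDA's ScaledObject settings (`minReplicaCount`, `maxReplicaCount`, activation thresholds). However, the function is a hypothetical illustration, not KEDA's code, and it ignores the cooldown period KEDA waits before scaling to zero.

```python
import math


def event_driven_replicas(queue_length: int, target_per_replica: int,
                          min_replicas: int = 0, max_replicas: int = 20,
                          activation_threshold: int = 0) -> int:
    """Replica count for a queue consumer that may scale all the way to zero."""
    # Inactive: no work above the activation threshold, so fall back to the minimum (zero).
    if queue_length <= activation_threshold:
        return min_replicas
    # Active: roughly one replica per `target_per_replica` queued messages, capped at the max.
    desired = math.ceil(queue_length / target_per_replica)
    return max(1, min_replicas, min(max_replicas, desired))


for backlog in (0, 7, 1000):
    print(backlog, event_driven_replicas(backlog, target_per_replica=5))
# 0 0
# 7 2
# 1000 20
```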

Challenges Faced by SREs and Alternatives for Autoscaling

Now, let's address the challenges that SREs commonly face. The first is right-sizing container resources. SREs need to determine accurate CPU and memory requests for containers, and workloads must be evaluated continuously to keep the figures in deployment configurations appropriate. The second is right-sizing instances. SREs must decide which instance type to use, such as memory-optimized or compute-optimized EC2 instances. This involves assessing actual requirements, grouping workloads into buckets, provisioning the appropriate node group, and tagging deployments accordingly. Managing node groups manually becomes burdensome when provisioning is frequent, and there is currently no automated solution available.
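Grouping workloads into buckets can start from each workload's memory-to-CPU ratio. AWS's compute-optimized (C), general-purpose (M), and memory-optimized (R) families offer roughly 2, 4, and 8 GiB of memory per vCPU respectively. The sketch uses those ratios as cut-offs. The thresholds and workload names are assumptions for illustration.

```python
from collections import defaultdict


def instance_bucket(cpu_cores: float, mem_gib: float) -> str:
    """Bucket a workload by GiB of memory requested per vCPU."""
    ratio = mem_gib / cpu_cores
    if ratio <= 2:
        return "compute"   # fits C-family shapes (~2 GiB per vCPU)
    if ratio <= 4:
        return "general"   # fits M-family shapes (~4 GiB per vCPU)
    return "memory"        # fits R-family shapes (~8 GiB per vCPU)


def group_workloads(workloads: dict[str, tuple[float, float]]) -> dict[str, list[str]]:
    """Group workload names by the bucket their per-replica requests fall into."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for name, (cpu, mem) in workloads.items():
        buckets[instance_bucket(cpu, mem)].append(name)
    return dict(buckets)


# Hypothetical workloads: (CPU cores, memory GiB) per replica.
print(group_workloads({"api": (1, 1.5), "web": (0.5, 2), "cache": (1, 12)}))
# {'compute': ['api'], 'general': ['web'], 'memory': ['cache']}
```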

Another significant challenge SREs encounter is maintaining cost control and minimizing waste. Proper scaling configuration is crucial for cost efficiency within a Kubernetes cluster. In the next section, Heidi will explain how an autonomous system can be implemented in Kubernetes to address these challenges.
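One way to make waste measurable is to split a node's idle capacity into two parts: capacity no pod requested, and capacity that was requested but not used. The sketch below does that arithmetic for CPU on a single node. The node size, usage figures, and hourly cost are hypothetical. It also treats the node's cost as driven by CPU alone, which is a simplification.

```python
def cpu_waste(allocatable: float, requested: float, used: float) -> dict[str, float]:
    """Fractions of a node's allocatable CPU that sit idle, by cause."""
    return {
        "unrequested": (allocatable - requested) / allocatable,  # no pod asked for it
        "requested_but_idle": (requested - used) / allocatable,  # over-sized requests
    }


def idle_cost(hourly_cost: float, allocatable: float, used: float) -> float:
    """Share of the node's hourly cost paying for unused CPU (cost assumed CPU-driven)."""
    return hourly_cost * (1 - used / allocatable)


print(cpu_waste(allocatable=16, requested=10, used=4))
# {'unrequested': 0.375, 'requested_but_idle': 0.375}
print(idle_cost(hourly_cost=8.0, allocatable=16, used=4))  # 6.0
```

The first fraction points at cluster-level scaling, meaning too many or too large nodes. The second points at container right-sizing.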

Now, let's address a common question: is another cluster autoscaler necessary when AWS already provides Fargate for EKS and ECS? Fargate is serverless: AWS manages the nodes, so you don't have to. Each pod deployed on Fargate runs in its own dedicated Firecracker microVM in an AWS-managed account. It connects to the Kubernetes cluster through an elastic network interface. AWS handles scaling, which removes that management burden. However, Fargate has limitations. It does not support certain workloads or some service mesh configurations, such as Istio, and most open-source and management tools do not work with Fargate workloads.

Moving on to KEDA: it excels at fine-grained autoscaling and simplifies HPA (Horizontal Pod Autoscaler) configuration. KEDA can act as a pod-level autoscaler, scaling a workload from zero to one and configuring HPA to scale from one to N. It acts as a metrics server within Kubernetes clusters and accepts triggers from external sources such as SQL databases, Redis, and Kafka to initiate scaling actions. However, configuring the trigger values for each workload can be a challenge.

It's important to note that KEDA is designed primarily for event-driven autoscaling, including function-style workloads running within a service. Alternative tools such as OpenFaaS and Knative serve similar purposes. Karpenter, on the other hand, is a distinct tool designed specifically for cluster autoscaling. Other cluster autoscaling tools exist, but the most widely used are cluster autoscaler and Karpenter.
