The End of Cold Starts: Autonomous Concurrency for AWS Lambda

Published on
Last updated on

March 4, 2024



This article gives an overview of how Sedai addresses the challenge of cold starts through autonomous concurrency. A video version is also available.

Publishing Lambda

First, let me provide you with a brief introduction to how Lambdas function. When you create a Lambda, there are several methods you can use: SLS, APIs or other interfaces, or writing your code directly in the AWS console.

Regardless of the method you choose, once you publish your code, AWS takes your code, along with the runtime information and any necessary dependencies, and packages them together like a container. This package is then stored in an S3 storage location. This is essentially what occurs when you publish a Lambda.

Part 1. Lambda Invocation

Exploring Cold Starts and Function Execution

Let's imagine there is an invocation for your Lambda called Test Skill Lambda. When this call is made, it is initially intercepted by the Lambda service, which is the engine within AWS responsible for handling all Lambda service requests. The Lambda service ensures that everything is in order and properly set up before executing the Lambda function. The first task of the Lambda service is to retrieve the function image from storage, which could be located in S3 or any other storage location. It acquires the stored container that contains the necessary components for the Lambda.

Next, the Lambda service examines the Lambda's configuration to determine the required language runtime. For example, your Lambda code may be written in Python, Java, or .NET. Accordingly, the service sets up the appropriate runtime environment and loads the Lambda code into this environment. Once all these preparations are completed, the function is ready to be executed.

It is crucial to note that all the steps mentioned above, leading up to the function invocation, contribute to what is commonly known as a cold start. This includes fetching the necessary library code from storage, setting up the runtime environment, preparing and initializing it, and performing other essential tasks prior to executing the actual function. These steps occur during the first invocation of a particular instance of the Lambda.
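One common way to observe this from inside a function is to rely on the fact that module-level code runs once per runtime, during initialization. The following is a minimal sketch in Python; the names `_is_cold` and `_INIT_TIME` and the shape of the return value are illustrative assumptions, not part of any AWS or Sedai API.

```python
import time

# Module-level code runs once per runtime, during initialization
# (that is, as part of the cold start). Names here are illustrative.
_INIT_TIME = time.time()
_is_cold = True


def handler(event, context):
    """Report whether this invocation is the first one served by this runtime."""
    global _is_cold
    cold, _is_cold = _is_cold, False
    return {
        "cold_start": cold,
        "runtime_age_s": round(time.time() - _INIT_TIME, 3),
    }
```

The first call in a fresh runtime reports `cold_start` as true; every later call served by the same runtime reports it as false.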

Exploring Subsequent Invocations and the Absence of Cold Starts

Now, let's consider the second invocation that takes place shortly after the completion of the first invocation. In this scenario, the second invocation does not encounter a cold start because the Lambda container or runtime is already warmed up. It remains prepared and ready to execute the Lambda function. Consequently, the Lambda service simply invokes the Lambda function without any additional setup or initialization. It efficiently carries out its tasks and promptly returns the desired results. As a result, subsequent invocations do not suffer from cold starts and can proceed smoothly.

However, this favorable warm state does not last indefinitely. Keeping a runtime alive ties up AWS resources even when your Lambda is not being used, so AWS waits only a limited time for the next invocation. For example, if the gap between invocations reaches around seven minutes, AWS considers that it has waited long enough. It then shuts down the runtime associated with your Lambda and removes it from memory.

When a subsequent request arrives after this runtime shutdown, it will once again experience a cold start. This means that the entire lifecycle of the runtime and execution of a Lambda involves these warm-up periods followed by potential cold starts.
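The warm and cold cycle described above can be sketched as a small simulation. This is a simplified model under stated assumptions: a single runtime, invocations treated as instantaneous, and an idle timeout of seven minutes, which is an observed value rather than a documented one. The function name `classify_invocations` is hypothetical.

```python
from typing import List


def classify_invocations(timestamps_s: List[float],
                         idle_timeout_s: float = 7 * 60) -> List[str]:
    """Label each invocation 'cold' or 'warm' for a single runtime that is
    reclaimed after idle_timeout_s seconds without traffic (assumed value)."""
    labels = []
    last_seen = None
    for t in sorted(timestamps_s):
        if last_seen is None or t - last_seen > idle_timeout_s:
            labels.append("cold")   # no runtime yet, or it was shut down
        else:
            labels.append("warm")   # runtime still in memory
        last_seen = t
    return labels


# Invocations at 0 s, 30 s and 5 min, then one after an 8-minute gap.
print(classify_invocations([0, 30, 300, 300 + 8 * 60]))
# prints ['cold', 'warm', 'warm', 'cold']
```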

It's important to understand this dynamic to effectively manage the behavior and performance of your Lambdas.


Quantifying Cold Start Impact: Insights into Delays and Efficiency Loss

Now, let's delve deeper into the impact of cold starts, backed by quantitative analysis. Our findings reveal that in a typical implementation with reasonable invocation seasonality and varying numbers of invocations, cold starts contribute to approximately 1.2% of all invocations. Although this percentage may seem small, these cold start instances can consume up to 13% of the total execution time.

Consider the implications: you are allocating 13% of your valuable time solely to prepare the Lambda for execution, time that could have been better utilized in serving customer requests or engaging in more productive endeavors. When it comes to the duration of cold starts, we observe a typical range of three to 14 seconds. For instance, a small Node.js Lambda may experience a cold start of around three seconds, while .NET-based Lambdas can reach up to 14 seconds. This waiting period represents the time required for the container to warm up and become ready for executing the Lambda function. Imagine a situation where a customer has to wait for 14 seconds to complete a checkout process. This underscores the significance of minimizing cold start delays.

These statistics clearly demonstrate the substantial impact of cold starts on your existing infrastructure, emphasizing the importance of addressing this issue to enhance overall performance and ensure an optimal customer experience.
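A short calculation shows what the two percentages above imply together. It assumes the 13% figure is the share of total execution time spent in invocations that hit a cold start; the function name is illustrative.

```python
def cold_to_warm_duration_ratio(cold_share_of_invocations: float,
                                cold_share_of_time: float) -> float:
    """How many times longer the average cold invocation is than the average
    warm one, given the cold share of invocations and of execution time."""
    avg_cold = cold_share_of_time / cold_share_of_invocations
    avg_warm = (1 - cold_share_of_time) / (1 - cold_share_of_invocations)
    return avg_cold / avg_warm


# 1.2% of invocations consuming 13% of execution time.
print(round(cold_to_warm_duration_ratio(0.012, 0.13), 1))
# prints 12.3
```

Under that assumption, an invocation that hits a cold start takes on average roughly twelve times as long as a warm one.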

Mitigating Cold Starts with Cost-Effective Measures

The AWS solution to address cold starts is known as provisioned concurrency. This approach significantly alleviates the problem. With provisioned concurrency set to 2, you pay AWS an additional fee to keep two runtimes available so that they don't shut down. The cold start process is completed before any requests are received, so when a customer arrives, they won't encounter a cold start.

The provisioned runtimes are constantly prepared and ready for action. When the first customer's request arrives, it is served immediately without any cold start delay. The same goes for the second customer, who is directed to the same container with the warmed-up runtime once the first request has finished. The third customer can likewise be accommodated in the same container, avoiding a cold start. However, when the fourth customer arrives while that container is still busy with an earlier request, the second container is used. If a fifth customer then arrives while both containers are busy, there are no available containers, so AWS has to spin up a new runtime environment for the invocation, and the customer experiences a cold start before the function is executed.

Provisioned concurrency does help reduce cold starts, but it does not provide a 100% guarantee. In this scenario, increasing the provisioned concurrency to three would have eliminated the cold start for the fifth customer as well. However, it's important to note that provisioned concurrency involves a trade-off. It comes at an additional cost, as you need to pay extra to keep those runtimes operational and ready.
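The five-customer scenario can be reproduced with a small simulation. This is a sketch under simplifying assumptions: pre-warmed runtimes never shut down, each runtime serves one request at a time, and the arrival times and durations are made up for illustration. The function name `count_cold_starts` is hypothetical.

```python
from typing import List, Tuple


def count_cold_starts(requests: List[Tuple[float, float]],
                      warm_runtimes: int) -> int:
    """Count requests that find every runtime busy.

    requests: (arrival_time, duration) pairs.
    warm_runtimes: number of runtimes kept warm in advance.
    """
    free_at = [0.0] * warm_runtimes      # when each runtime becomes idle
    cold = 0
    for arrival, duration in sorted(requests):
        idle = [i for i, t in enumerate(free_at) if t <= arrival]
        if idle:
            free_at[idle[0]] = arrival + duration    # reuse a warm runtime
        else:
            cold += 1                                # a new runtime must start
            free_at.append(arrival + duration)
    return cold


# Requests 1-3 run one after another, request 4 overlaps request 3,
# and request 5 arrives while both warm runtimes are busy.
scenario = [(0, 1), (2, 1), (4, 5), (5, 5), (6, 1)]
print(count_cold_starts(scenario, warm_runtimes=2))  # prints 1
print(count_cold_starts(scenario, warm_runtimes=3))  # prints 0
```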

Part 2. Autonomous Concurrency

What is Autonomous Concurrency

Autonomous concurrency is an innovative feature we have introduced in Sedai to efficiently manage concurrency and minimize cold starts. This feature seamlessly integrates with your existing Lambdas in production, requiring no code or configuration changes. From Sedai's perspective, enabling autonomous concurrency is as simple as toggling a button in the user interface. You have the flexibility to enable it for individual Lambdas or apply it account-wide, based on your preference as a customer. By activating autonomous concurrency, you can eliminate over 90% of actual cold starts.

Now, let's explore how this feature works. It's important to note that it only requires three days of data to become effective. You won't have to wait for weeks or months for the system to grasp the seasonal patterns and other essential factors. Within just three days of enabling the toggle, autonomous concurrency becomes highly effective in reducing cold starts and improving performance.

This streamlined approach ensures that you can quickly benefit from the advantages of autonomous concurrency without unnecessary delays or extensive data gathering periods.

Sedai's Autonomous Concurrency: Intelligent Runtime Management for Cold Start Elimination

So, let's delve into how Sedai's autonomous concurrency works. Behind the scenes, similar to provisioned concurrency, Sedai utilizes activation calls to maintain warm runtimes. By ensuring that the runtime is ready before the first invocation arrives, Sedai effectively eliminates cold starts. Much like provisioned concurrency, the first, second, and third requests are directed to the first runtime without any cold start delays. Similarly, the fourth request is served by the second runtime, also avoiding a cold start. However, in this specific case, the fifth request encounters a cold start because it is directed to the third runtime.

Now, you might be curious about how we determine the number of runtimes needed. This is where the magic of autonomous concurrency comes into play. Sedai intelligently keeps a sufficient number of containers or runtimes warmed up and ready, surpassing the anticipated traffic. By staying ahead of the traffic and maintaining an appropriate number of warmed runtimes, Sedai effectively reduces cold starts.
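To make the idea of staying ahead of the traffic concrete, here is one simple way such a target could be estimated from invocation history. This is not Sedai's algorithm, which the article does not describe in detail; it is a hedged sketch that takes the observed peak concurrency per hour of day and adds headroom. The function names and the 20% headroom are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def peak_concurrency(intervals: List[Tuple[float, float]]) -> int:
    """Maximum number of overlapping (start, end) intervals."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # Process ends before starts at the same instant, so a request that
    # begins exactly when another finishes can reuse its runtime.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def warm_runtime_plan(invocations: List[Tuple[float, float]],
                      headroom: float = 0.2) -> Dict[int, int]:
    """Per hour of day: ceil(observed peak concurrency * (1 + headroom)).

    invocations: (start_time_s, duration_s); each is attributed to the
    hour in which it starts (a simplification).
    """
    by_hour = defaultdict(list)
    for start, duration in invocations:
        hour = int(start // 3600) % 24
        by_hour[hour].append((start, start + duration))
    return {hour: math.ceil(peak_concurrency(iv) * (1 + headroom))
            for hour, iv in sorted(by_hour.items())}


history = [(0, 10), (1, 10), (2, 10), (9 * 3600, 5)]
print(warm_runtime_plan(history))
# prints {0: 4, 9: 2}
```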

Autonomous Concurrency Offers Flexibility and Safety without Code Changes

Now, let's compare the approach of autonomous concurrency with other solutions aimed at eliminating cold starts. One popular method is using warmup plugins. However, implementing warmup plugins requires a decision to be made well in advance during the deployment process. Developers must be aware of the need to enable the Lambda for the warmup plugin and make code changes to ensure compatibility. These changes need to be carefully executed to guarantee the safety of the execution. For instance, if a warmup call is made that potentially involves deleting entries from a database, it can be disastrous if not handled safely.
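The kind of code change a warmup approach requires can be sketched as follows. The `warmup` event key, the in-memory `TABLE`, and the helper are all hypothetical; real warmup plugins define their own payload, and the destructive step stands in for something like a database delete.

```python
WARMUP_KEY = "warmup"          # hypothetical marker; real plugins define their own payload
TABLE = ["entry-1", "entry-2"]  # stand-in for a database table


def delete_expired_entries() -> int:
    """Stand-in for a destructive side effect such as a database delete."""
    removed = len(TABLE)
    TABLE.clear()
    return removed


def handler(event, context):
    # Return before any side effect when this is only a warmup ping.
    if event.get(WARMUP_KEY):
        return {"warmed": True}
    return {"deleted": delete_expired_entries()}
```

Without the early return, every warmup ping would run the delete, which is the failure mode described above.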

In contrast, autonomous concurrency in Sedai offers a more flexible and streamlined approach. The decision to enable autonomous concurrency can be made during production, without the need for code changes or upfront planning. It is inherently safe due to the way it is built, eliminating the risks associated with manual code modifications. With warmup plugins, determining the expected concurrency level is crucial, and it needs to be set up before the code goes live. However, predicting the invocation pattern accurately before deployment can be challenging.

Autonomous concurrency addresses these challenges by automatically adjusting to traffic and seasonality. It observes the live invocations of the Lambda function and dynamically determines the number of additional containers needed and the frequency of warmups. With warmup plugins, by contrast, the warmup frequency has to be configured manually.

As for the seven-minute period mentioned earlier, AWS does not officially document this duration. Runtimes are typically observed to shut down after around seven minutes of inactivity, but the exact timeframe is determined by AWS.

In summary, autonomous concurrency in Sedai offers a more convenient and adaptive solution compared to warmup plugins, ensuring efficient management of cold starts without requiring code changes or manual configuration.

The Difference from Warmup Plugins

Warmup Plugins vs. A8s (Autonomous) Concurrency

Achieving Dynamic Adaptability through Autonomous Concurrency

When it comes to adaptability to runtime shutdown events, warmup plugins fall short, as they do not adjust to changes that occur within a few minutes. In contrast, autonomous concurrency in Sedai is highly dynamic and responds to every runtime shutdown event received from the running container. It is tuned to actual observed values rather than relying on a hypothetical seven-minute duration.

Let's now compare this with provisioned concurrency. One of the challenges with provisioned concurrency is the need for a stricter CI/CD process and development workflow to ensure that Lambdas are appropriately versioned to take advantage of it. Provisioned concurrency cannot be applied to the unpublished $LATEST version of a Lambda; it requires a published version or an alias. In contrast, autonomous concurrency can be utilized with both versioned and unversioned Lambdas, allowing any Lambda you write to benefit from it.
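The versioning requirement can be illustrated with a small helper that prepares the parameters for a provisioned concurrency request. The helper name and the example function and alias names are hypothetical; the parameter names follow the boto3 `put_provisioned_concurrency_config` call, and the actual call is left as a comment because it needs AWS credentials.

```python
def provisioned_concurrency_request(function_name: str,
                                    qualifier: str,
                                    executions: int) -> dict:
    """Build request parameters, rejecting the unversioned $LATEST qualifier."""
    if qualifier == "$LATEST":
        raise ValueError(
            "provisioned concurrency needs a published version or alias, "
            "not $LATEST")
    if executions < 1:
        raise ValueError("executions must be at least 1")
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,
        "ProvisionedConcurrentExecutions": executions,
    }


# Hypothetical function and alias names.
request = provisioned_concurrency_request("checkout-fn", "prod", 2)

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("lambda").put_provisioned_concurrency_config(**request)
```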

Provisioned concurrency requires upfront analysis to determine the concurrency settings, whereas autonomous concurrency dynamically adjusts based on real-time traffic and seasonality patterns. While you can achieve autoscaling with provisioned concurrency using AWS's autoscaling feature, it involves understanding concurrency limits and setting thresholds and limits manually. In contrast, autonomous concurrency handles these aspects automatically, requiring only a simple toggle in the UI.

With provisioned concurrency, there is also the risk of cost overruns if concurrency is not properly planned and the Lambda's utilization pattern is not fully understood. This has been a significant concern for many customers considering provisioned concurrency. However, autonomous concurrency eliminates the chance of cost overruns as it is continuously monitored and designed to prevent them. Behind the scenes, autonomous concurrency leverages provisioned concurrency when appropriate, providing the best of both worlds.

In summary, autonomous concurrency in Sedai offers greater adaptability to runtime shutdown events compared to warmup plugins and avoids the complexities associated with provisioned concurrency. It dynamically adjusts to real-time conditions, works with all Lambdas, and mitigates the risk of cost overruns, ensuring a hassle-free and efficient solution for managing cold starts. 

Below is a table illustrating the distinction between provisioned concurrency and autonomous (a8s) concurrency.

Cost-Effective Cold Start Reduction: Autonomous Concurrency vs. Provisioned Concurrency

Let's consider a scenario with a hundred million invocations and typical seasonality variations. Assuming an average invocation duration of 500 milliseconds and a configured memory of one gigabyte, the monthly cost for such a setup would be approximately $853, taking into account the occurrence of a significant number of cold starts given the high invocation volume.
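The $853 figure can be reproduced from on-demand pricing. The prices below are assumptions (commonly published x86 rates of $0.0000166667 per GB-second and $0.20 per million requests, with no free tier applied); check the current AWS pricing page for your region before relying on them.

```python
PRICE_PER_GB_SECOND = 0.0000166667   # assumed on-demand x86 price, USD
PRICE_PER_MILLION_REQUESTS = 0.20    # assumed request price, USD


def monthly_lambda_cost(invocations: int,
                        avg_duration_s: float,
                        memory_gb: float) -> float:
    """On-demand cost in USD: compute (GB-seconds) plus request charges."""
    compute = invocations * avg_duration_s * memory_gb * PRICE_PER_GB_SECOND
    requests = invocations / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute + requests


# 100 million invocations, 500 ms average duration, 1 GB of memory.
print(round(monthly_lambda_cost(100_000_000, 0.5, 1.0)))
# prints 853
```

About $833 of the total is compute time and $20 is request charges.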

Now, if you were to set provisioned concurrency to two, you would certainly reduce the number of cold starts, but there would still be a considerable amount remaining. With a hundred million invocations, you would witness multiple container spin-ups beyond the two set by provisioned concurrency, and all of them would experience cold starts. The cost for this approach would be in the range of a couple of thousand dollars.

On the other hand, with autonomous concurrency in Sedai, although there might be a slight increase in invocations, the likelihood of encountering cold starts is significantly reduced. This improvement in cold start reduction comes at a price point comparable to not having provisioned concurrency at all. Therefore, from a cost perspective, autonomous concurrency offers distinct advantages: it minimizes cold starts while keeping costs close to the baseline without provisioned concurrency.

This graphical representation illustrates the cost comparison between different approaches. As shown, the cost decreases significantly with autonomous concurrency while providing equal or even better benefits compared to provisioned concurrency. This demonstrates the advantage of autonomous concurrency in terms of both performance and cost-effectiveness, making it a highly favorable choice.

Sedai's Adaptive Approach to Autonomous Concurrency

To enable autonomous concurrency, Sedai incorporates a special extension layer into each Lambda function. This extension takes advantage of the features provided by Lambda extensions and plays a crucial role in achieving the desired functionality. By closely observing the behavior of your Lambdas and actively listening to their lifecycle events, the extension intercepts activation calls and enables Sedai to predict the expected seasonality of invocations within our backend system. This prediction helps us determine the optimal number of containers or runtimes needed to handle the anticipated concurrency levels effectively.

To ensure accuracy and responsiveness, Sedai continuously validates and fine-tunes these parameters. We employ reinforcement learning techniques, allowing us to adapt to various scenarios, such as deploying new Lambdas or encountering changes in invocation patterns and durations. This adaptive approach ensures that your Lambdas and their runtime environments are always optimized for performance and efficiency.

Seamless Implementation: Enabling Autonomous Concurrency with a Click of a Button

Let me paint a picture of what using autonomous concurrency would look like for you as a customer. When you decide to leverage this feature, we take a few steps to ensure smooth implementation.

Firstly, we deploy an activation Lambda, also known as the AV Lambda, in your environment. This Lambda plays a crucial role in the process. Additionally, we store a significant amount of the concurrency information in DynamoDB, which is located closer to you. By doing so, we minimize latency, ensuring that the information is readily available when needed. This is particularly important for scenarios like anticipating cold starts or container shutdowns. So, this is the initial setup for onboarding you to autonomous concurrency. To make it work seamlessly, we decorate your customer Lambda with Sedai's Lambda extension. That's all it takes! And here's the best part: all of these steps happen behind the scenes with just a simple click of a button on Sedai's user interface. It's like magic, enabling autonomous concurrency.



Let me provide a brief overview of how the extension works and summarize the benefits of autonomous concurrency within Sedai.

The Sedai extension takes care of handling activation calls, collecting usage statistics, and monitoring shutdown events. It plays a crucial role in the process. When the activation Lambda is invoked, it retrieves data from Sedai's core, which includes information about seasonality and predicted concurrency numbers. This Lambda is responsible for issuing activation calls, managing concurrency settings, and listening for shutdown events, among other tasks.
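For readers unfamiliar with Lambda extensions, the sketch below shows the general shape of an external extension that listens for INVOKE and SHUTDOWN lifecycle events through the AWS Lambda Extensions API. It is not Sedai's extension; the extension name `example-extension`, the `stats` dictionary, and the helper functions are illustrative assumptions. The `run` loop only works inside a Lambda execution environment, where `AWS_LAMBDA_RUNTIME_API` is set.

```python
import json
import os
import urllib.request

API_VERSION = "2020-01-01"


def build_register_request(runtime_api: str, name: str) -> urllib.request.Request:
    """Registration request subscribing to INVOKE and SHUTDOWN events."""
    body = json.dumps({"events": ["INVOKE", "SHUTDOWN"]}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{runtime_api}/{API_VERSION}/extension/register",
        data=body,
        # The name must match the file name of the extension executable.
        headers={"Lambda-Extension-Name": name,
                 "Content-Type": "application/json"},
        method="POST",
    )


def build_next_event_request(runtime_api: str, extension_id: str) -> urllib.request.Request:
    """Long-poll request that blocks until the next lifecycle event."""
    return urllib.request.Request(
        url=f"http://{runtime_api}/{API_VERSION}/extension/event/next",
        headers={"Lambda-Extension-Identifier": extension_id},
        method="GET",
    )


def handle_event(event: dict, stats: dict) -> bool:
    """Update usage stats; return False when the event loop should stop."""
    if event.get("eventType") == "INVOKE":
        stats["invocations"] = stats.get("invocations", 0) + 1
        return True
    if event.get("eventType") == "SHUTDOWN":
        stats["shutdown_reason"] = event.get("shutdownReason")
        return False
    return True


def run() -> dict:
    """Event loop; only meaningful inside a Lambda execution environment."""
    runtime_api = os.environ["AWS_LAMBDA_RUNTIME_API"]
    register = build_register_request(runtime_api, "example-extension")
    with urllib.request.urlopen(register) as resp:
        extension_id = resp.headers["Lambda-Extension-Identifier"]
    stats: dict = {}
    while True:
        request = build_next_event_request(runtime_api, extension_id)
        with urllib.request.urlopen(request) as resp:
            event = json.load(resp)
        if not handle_event(event, stats):
            return stats
```

Keeping the request builders and the event handler separate from the loop makes the logic easy to exercise outside of Lambda.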

Autonomous concurrency in Sedai is a revolutionary feature that streamlines the management of concurrency settings. By deploying an activation Lambda and leveraging DynamoDB for storing concurrency information, Sedai minimizes latency and improves performance. The Sedai extension handles activation calls, collects usage stats, and monitors shutdown events, making the process seamless for customers. This no-touch, no-code solution eliminates most cold starts and mitigates the risk of cost overruns. With autonomous concurrency, Sedai provides an end-to-end solution to tackle cold starts effectively. This feature is now available to all Sedai customers, enabling them to optimize their applications effortlessly.

To experience the benefits of autonomous concurrency, give it a try.


Q: Will the new Sedai extension for autonomous concurrency interfere with existing extensions?

A: The team has taken precautions to prevent interference with other extensions. Safety checks ensure that Lambda calls intercepted by the Sedai extension still pass through normally.

Q: Does AWS support multiple extensions?

A: AWS supports multiple extensions. The platform is designed to accommodate and allow multiple extensions to coexist without interfering with each other.

Q: Can different extensions, such as tracing or logging, work independently of each other?

A: Since the focus is specifically on concurrency, other extensions like tracing or logging can work independently without any issues. Each extension can function smoothly without interfering with one another.
