Cutting Serverless Latency by 50% with Autonomous Optimization

Published on
Last updated on

April 1, 2024

Max 3 min

In this article, we will explore the world of optimization, with a special focus on serverless functions, particularly those leveraging Lambda. We will dive into how Sedai, a cutting-edge solution, helps in optimizing serverless functions. Sedai offers invaluable insights and tools that enable businesses to identify and address niche areas where optimization can make a significant impact.

We will discuss the latency reduction fabric achieved by implementing Sedai's solutions. By leveraging Sedai's expertise, fabric managed to cut latency by up to 50%. We will explore the techniques and strategies fabric employed and the reasons behind their success. You can watch the original video here.

Part 1: Optimization Challenge

Optimizing Lambda Functions: Fine-Tuning Memory, CPU, Duration, and Cost

When optimizing Lambda functions, Sedai's overarching goal is to ensure the optimal performance, availability, and efficiency of every resource we handle. Managing the life cycle of these resources is crucial to maintaining their uptime and maximizing their performance potential.

Now, let's shift our focus to Lambda and explore the various aspects that can be fine-tuned and optimized to achieve desired control. Within Lambda, we will uncover the essential knobs that grant you the power to fine-tune and optimize your functions. These knobs act as controls, enabling you to manipulate different parameters and configurations to enhance the performance and efficiency of your Lambdas.

Memory: Impact on CPU

The memory setting is one of the configuration options AWS offers out of the box. You can increase the amount of memory, and that leads to a number of effects. For completeness, the other thing you could tweak is Provisioned Concurrency. You can autonomously increase concurrency by managing that. There is the concept of Reserved Concurrency in Lambda to manage throttles and thresholds. You can manage Timeouts for Lambda as a way to make sure the operation runs the way you expect it to. Outside of what AWS offers, you also have the concept of warm-ups, which some of you might be doing through warmer plugins or other third-party solutions.

As the graph above shows, for any AWS Lambda you can set the memory from a minimum of 128 MB all the way up to 10,240 MB (10 GB), in increments of 1 MB. What happens when you increase memory? You get more RAM, with all the advantages of additional memory. But what AWS has done very smartly behind the scenes is to abstract CPU and processing horsepower behind the memory setting. Every time you increase memory, you also get an additional fractional CPU added to your Lambda. If you increase memory from 128 MB to 256 MB, there is a corresponding increase in the processing horsepower that the Lambda gets. The two are tightly coupled.
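As a minimal sketch of this knob, the helper below validates a memory value against the 128 MB to 10,240 MB range described above and builds the arguments for a memory change. The helper name `build_memory_update` and the example function name are hypothetical. Only the `FunctionName` and `MemorySize` parameters come from the AWS Lambda API.

```python
MIN_MEMORY_MB = 128      # smallest memory setting Lambda accepts
MAX_MEMORY_MB = 10_240   # largest memory setting Lambda accepts (10 GB)


def build_memory_update(function_name: str, memory_mb: int) -> dict:
    """Return keyword arguments for a Lambda memory change (hypothetical helper)."""
    if isinstance(memory_mb, bool) or not isinstance(memory_mb, int):
        raise TypeError("memory_mb must be a whole number of megabytes")
    if not MIN_MEMORY_MB <= memory_mb <= MAX_MEMORY_MB:
        raise ValueError(
            f"memory_mb must be between {MIN_MEMORY_MB} and {MAX_MEMORY_MB}"
        )
    return {"FunctionName": function_name, "MemorySize": memory_mb}


kwargs = build_memory_update("checkout-handler", 256)  # hypothetical function name
print(kwargs)
# With boto3 installed and AWS credentials configured, this could be applied as:
#   import boto3
#   boto3.client("lambda").update_function_configuration(**kwargs)
```

Validating locally first means an out-of-range value is caught before any call reaches AWS.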

The relationship between memory and CPU is linear: increase one and the other increases. It's a straight line, which makes it very easy to understand. There are preset points in this graph where the number of cores available to your Lambda jumps from two to three, three to four, and so on. This is well-documented, known behavior that AWS lets you take advantage of. As a developer writing Lambdas, this is something you have to pay a little more attention to. Make sure the Lambda you run is able to take advantage of all the cores: choose a language and libraries that can use all the CPU cores available at each memory set point as you increase the memory configuration. Memory and CPU are directly related: increase one and you increase the other.
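To make the linear relationship concrete, here is a rough sketch. It relies on the AWS documentation's statement that 1,769 MB corresponds to the equivalent of one vCPU and assumes simple proportional scaling across the whole range. The function name `estimated_vcpus` is hypothetical, and the real allocation is managed by AWS.

```python
MB_PER_VCPU = 1_769  # AWS documents 1,769 MB as the equivalent of one vCPU


def estimated_vcpus(memory_mb: int) -> float:
    """Approximate CPU share for a memory setting, assuming proportional scaling."""
    if not 128 <= memory_mb <= 10_240:
        raise ValueError("memory_mb must be between 128 and 10240")
    return memory_mb / MB_PER_VCPU


for mb in (128, 256, 1_769, 3_538, 10_240):
    print(f"{mb:>6} MB -> ~{estimated_vcpus(mb):.2f} vCPU")
```

Anything above one vCPU only helps a handler that can run work in parallel, which is why the choice of language and libraries matters.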

Memory: Impact on execution duration

Let's look at the execution duration. What happens when you increase memory? How does your Lambda perform? 

When it comes to computational units like functions or web services, they consist of two distinct components: the fixed part and the CPU-dependent part. The fixed part primarily encompasses the time required for network round trips, such as interacting with third-party APIs, accessing storage, or making database calls. Regardless of the amount of computational power, be it CPU or RAM, allocated to the system, this fixed part remains constant. It sets the baseline cost for running the service, Lambda, or function.

On the other hand, the CPU-dependent part varies with the addition of more CPU power and overall horsepower. This component comes into play when algorithms repeatedly process data recursively, as seen in machine learning applications. It is also relevant for tasks like data crunching or in-memory data aggregation, where increased memory and CPU capacity directly correlate with improved performance. Moreover, CPU and horsepower play a crucial role in resource-intensive operations such as image or video processing.

Plotting the relationship between memory and execution duration reveals a non-linear codependency. It's important to note that adding more CPU power does not necessarily lead to a proportional increase in performance. There comes a point of diminishing returns, which is captured by the curve representing this relationship. As memory is increased, it indirectly affects CPU utilization, contributing to the observed distribution. The key takeaway is the existence of a non-linear connection between memory and duration, with memory also indirectly impacting CPU performance.
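The two-part split can be sketched with a toy model. It assumes the CPU-dependent part shrinks in inverse proportion to memory while the fixed part stays constant. That is a simplification for illustration, and the numbers are made up, not measurements.

```python
def modeled_duration_ms(memory_mb, fixed_ms, cpu_ms_at_baseline, baseline_mb=128):
    """Toy model: the fixed part is constant, the CPU part scales with 1/memory."""
    if memory_mb <= 0 or baseline_mb <= 0:
        raise ValueError("memory values must be positive")
    return fixed_ms + cpu_ms_at_baseline * (baseline_mb / memory_mb)


previous = None
for mb in (128, 256, 512, 1024, 2048):
    # Illustrative inputs: 200 ms of network waits, 1600 ms of CPU work at 128 MB.
    duration = modeled_duration_ms(mb, fixed_ms=200.0, cpu_ms_at_baseline=1600.0)
    saved = "" if previous is None else f" (saved {previous - duration:.0f} ms)"
    print(f"{mb:>5} MB: {duration:.0f} ms{saved}")
    previous = duration
```

Each doubling of memory saves half as much time as the previous one, and the duration never drops below the fixed part. That is the diminishing-returns curve described above.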

Memory: Impact on cost

Now let's consider cost. How does memory affect it? The documented pricing structure for Lambdas has two components. The first is duration, the amount of time the Lambda function runs or uses AWS resources. The second is the memory setting that you configure. Both factors affect the overall cost, which introduces a higher level of complexity. The relationship between memory and cost is no longer linear, nor is it smooth. Instead, it resembles a complex U-shaped relationship.

What we observe from the accompanying graph is that as memory increases, the cost initially decreases before eventually rising. This indicates a complex relationship between memory and cost. Why does this happen? As previously mentioned, part of the cost is dependent on memory, while the other part is influenced by the duration. As we have established, the relationship between memory and duration is not linear but rather complex. This compound effect contributes to the non-linear relationship between memory and cost.
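The compound effect is easier to see with numbers. The sketch below computes the compute charge as memory (in GB) times duration (in seconds) times a per-GB-second rate. The rate is an assumed x86 price that should be checked against current AWS pricing, the per-request charge is ignored, and the measurements are hypothetical values chosen only to illustrate the shape.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 rate; check current AWS pricing

# Hypothetical measurements: memory in MB -> average billed duration in ms.
measurements = {128: 4000, 256: 1800, 512: 800, 1024: 450, 2048: 400, 4096: 390}


def cost_per_invocation(memory_mb: int, duration_ms: float) -> float:
    """Compute charge: memory in GB times duration in seconds times the rate."""
    return (memory_mb / 1024) * (duration_ms / 1000) * PRICE_PER_GB_SECOND


costs = {mb: cost_per_invocation(mb, ms) for mb, ms in measurements.items()}
cheapest_mb = min(costs, key=costs.get)

for mb, cost in costs.items():
    marker = "  <- lowest cost" if mb == cheapest_mb else ""
    print(f"{mb:>5} MB: ${cost * 1_000_000:.2f} per million invocations{marker}")
```

In this made-up data set, cost falls while extra memory still buys large duration savings, then climbs once duration flattens out and the memory term dominates.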

Optimization Problem: Optimize for performance and/or cost

Now, let's delve into defining our optimization problem. The core objective here is to exert control. What are the available controls at your disposal? You aim to optimize either for performance, cost, or possibly both. Different customers may prioritize either performance or cost, and we will explore these distinctions shortly. This essentially captures the essence of the problem. If you have experience working with algorithms or control systems, you are aware that this is a complex optimization problem encompassing multiple objectives. The goal is to define an objective function that incorporates both performance and cost, and then manipulate the independent variables to achieve the optimal balance of performance, cost, or any other specified goals.
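One common way to express such an objective function is a weighted sum of normalized duration and cost. The sketch below is a generic illustration of that idea, not Sedai's actual objective. The function name, the weight parameter, and the candidate numbers are all hypothetical.

```python
def best_setting(candidates, weight_duration=0.5):
    """Pick the candidate that minimizes a weighted sum of duration and cost.

    candidates: list of (memory_mb, duration_ms, cost) tuples with positive values.
    weight_duration: 1.0 means performance only, 0.0 means cost only.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    if not 0.0 <= weight_duration <= 1.0:
        raise ValueError("weight_duration must be between 0 and 1")
    # Normalize both terms so that neither unit dominates the sum.
    max_duration = max(c[1] for c in candidates)
    max_cost = max(c[2] for c in candidates)

    def objective(candidate):
        _, duration, cost = candidate
        return (weight_duration * duration / max_duration
                + (1.0 - weight_duration) * cost / max_cost)

    return min(candidates, key=objective)


# Hypothetical candidates: (memory in MB, duration in ms, relative cost).
candidates = [
    (128, 4000, 0.50),
    (512, 800, 0.40),
    (1024, 450, 0.45),
    (2048, 400, 0.80),
    (4096, 390, 1.56),
]
for weight in (0.0, 0.5, 1.0):
    print(weight, best_setting(candidates, weight_duration=weight))
```

Moving the weight from 0 to 1 shifts the chosen memory setting from the cheapest option toward the fastest one.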

Cost vs. Performance: Optimization Targets

Let's now proceed to visualize what this entails.

Now, within this specific graph, I have removed memory as a control variable and introduced both optimization targets: duration on one axis and cost on the other axis. By examining this graph, we can observe the non-linear relationship between duration and cost. Through our extensive observation of Lambdas in various customer environments, we have come to realize that determining the appropriate settings to achieve a desired level of performance or cost can be challenging by default.

The shaded area depicted in the graph represents where the majority of customers and Lambdas typically fall. On the horizontal X axis, which represents duration, the tendency is towards the right rather than the left, indicating suboptimal performance. Similarly, on the Y axis, the graph does not reach the bottom of the curve, implying that the cost is not at its optimal level. As a default configuration, customers often prioritize either achieving the best duration or attaining the lowest cost.

Thus, this highlights the prevailing state of default settings in most environments. What is your desired outcome? Specifically, what would constitute an ideal default setting that achieves the best balance between both objectives? This optimal scenario would be represented by the bottom left corner of the graph.

Cost vs. Performance: Best of both worlds

Therefore, the objective is to achieve the lowest possible duration and cost simultaneously. This choice is straightforward. Why pay more for a certain level of performance if it is achievable at a lower cost? Conversely, why accept slower performance if better performance is available at the same cost? Thus, the default optimization target is to pursue this optimal scenario.

On the left side of the image, you can observe one of the control panels within Sedai. When customers are onboarded onto Sedai and we manage their Lambdas, our default approach is to strive for the best of both worlds. We aim to maintain the same cost while identifying the optimal duration or performance achievable within that cost range. The shaded area displayed in the image represents the outcome of this default optimization strategy.

Cost vs. Performance: Real-time use cases

Now, let's shift our focus to real-time use cases. Many of our customers, as we will further explore in the second part of this discussion when we talk about fabric, prioritize real-time performance and user experience. Examples include scenarios like managing a shopping cart, processing payments during e-commerce transactions, or handling login interactions. In these real-time use cases, it is crucial to ensure seamless performance without any disruptions. Customers aim for the best possible performance and user experience during these critical interactions. They are willing to invest additional funds to achieve optimal performance for specific functions within their Lambdas.

Within Sedai, we provide a setting on the left side where customers can choose to increase performance and decrease duration, even if it incurs slightly higher costs. For instance, customers may indicate that they are willing to spend 25% more on a particular Lambda as long as the performance is improved. The shaded part of the chart represents this scenario. As you can observe, the duration trend moves towards the left, indicating faster execution, while customers are willing to accept some additional cost. However, they don't want to reach the upper region of the chart, as the difference in performance between that point and the lower region is not significant, despite the steep increase in cost. The goal is to stay within the bottom part of the graph.

It is important to note that the example presented here pertains to a sample Lambda. However, the complex relationship between cost and duration applies to any Lambda you create. The specific shape and nature of the graph depend on the characteristics of your Lambda, its implementation, and the underlying function. Sedai analyzes and understands the performance, memory, and cost characteristics of each Lambda through introspection. We then plot the graph for every Lambda you work on. Based on your requirements and goals, we identify the sweet spot within this graph and optimize the Lambda to achieve that optimal configuration.

Cost vs. Performance: Batch and Backend

Now, let's shift our focus to the other side of the spectrum. In certain backend processes, such as number crunching or image scaling, there is no immediate user interaction or real-time requirement. It is acceptable for these tasks to take a bit longer to complete, as they happen behind the scenes or during low-demand periods. Examples include batch processing tasks like end-of-day data processing or monthly reporting, where the output is generated and made available to users at a later time.

In this context, the emphasis is not on performance or duration sensitivity, but rather on cost sensitivity, as these backend processes may involve significant computational resources. The objective here is to achieve the best possible cost efficiency, even if it means a slightly longer execution time.

Therefore, the optimization approach we adopt in this scenario is to minimize costs while still setting a limit on the duration. The goal is to decrease the cost without allowing the duration to increase by more than 25%. This ensures that the batch processes are completed within reasonable time constraints, without compromising cost efficiency. This setting provides the optimization target for this specific case.
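The optimization targets described so far can be written as constrained choices over the same set of candidate settings. The sketch below is illustrative only: the `Point` type, the `pick` function, and every number are hypothetical, and it does not represent how Sedai selects settings internally.

```python
from typing import NamedTuple


class Point(NamedTuple):
    memory_mb: int
    duration_ms: float
    cost: float  # any consistent unit


def pick(points, baseline, target, cost_slack=0.0, duration_slack=0.25):
    """Choose a setting for one of two targets.

    "latency": fastest point whose cost is within cost_slack of the baseline.
    "cost": cheapest point whose duration is within duration_slack of the baseline.
    """
    if target == "latency":
        limit = baseline.cost * (1 + cost_slack)
        allowed = [p for p in points if p.cost <= limit]
        field = "duration_ms"
    elif target == "cost":
        limit = baseline.duration_ms * (1 + duration_slack)
        allowed = [p for p in points if p.duration_ms <= limit]
        field = "cost"
    else:
        raise ValueError("target must be 'latency' or 'cost'")
    # The baseline always satisfies its own constraint, so it is the fallback.
    return min(allowed + [baseline], key=lambda p: getattr(p, field))


# Hypothetical measurements for one Lambda, currently running at 256 MB.
points = [
    Point(128, 4000, 0.50), Point(256, 1800, 0.45), Point(512, 800, 0.40),
    Point(1024, 450, 0.45), Point(1280, 420, 0.525), Point(2048, 400, 0.80),
    Point(4096, 390, 1.56),
]
baseline = points[1]

print("default:  ", pick(points, baseline, "latency"))
print("real-time:", pick(points, baseline, "latency", cost_slack=0.25))
print("batch:    ", pick(points, baseline, "cost", duration_slack=0.25))
```

With this sample data, the default target keeps cost flat and moves to a faster setting, the real-time target spends up to 25% more for extra speed, and the batch target takes the cheapest setting that stays within the 25% duration limit.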

Now, let's proceed further. This comprehensive illustration below presents all the potential optimization targets available for a Lambda.

Cost vs. Performance

As mentioned before, the graph's shape is distinct for each Lambda, and the real challenge lies in determining the ideal position. While you, as a customer, provide us with your objective, the task at hand is to pinpoint that sweet spot on the graph. This is where Sedai comes into play, offering a solution to this intriguing problem.

Intricacies and Challenges in Optimizing the Relationship between Cost and Duration

Now, let's dive into the intricacies that make this problem even more intriguing. Behind the scenes, various complexities come into play. As mentioned earlier, each Lambda is not only unique in its implementation but also in its traffic patterns. Additionally, the duration of each Lambda exhibits a mean duration and a spread, which differs for every Lambda.
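The mean and spread of a Lambda's duration can be summarized from raw samples. The sketch below uses the standard library only. The function name and the sample durations are hypothetical, and the 95th percentile uses the simple nearest-rank method.

```python
import math
import statistics


def duration_profile(durations_ms):
    """Summarize invocation durations: mean, standard deviation, and p95."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(durations_ms)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile
    return {
        "mean_ms": statistics.fmean(ordered),
        "stdev_ms": statistics.stdev(ordered),
        "p95_ms": ordered[rank - 1],
    }


# Hypothetical samples: mostly fast invocations plus one slow outlier.
samples = [100, 110, 120, 130, 140, 150, 160, 170, 180, 900]
print(duration_profile(samples))
```

Two Lambdas with the same mean can have very different spreads, which is one reason each function needs its own model.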

Traffic patterns also vary significantly, as do the dependencies a Lambda may have. Some Lambdas may make single backend calls, while others may make none or multiple calls. These factors pose fascinating modeling challenges in understanding the relationship between cost and duration.

Moreover, external changes further complicate the optimization process. Customers can make configuration changes through the AWS console, which lie outside our control. Similarly, the introduction of new releases alters the Lambda's profile. Execution times may change, and previously identified characteristics from the graph need to be reassessed and understood. Understanding this relationship is an ongoing endeavor. As environments change and new releases occur, the modeling exercise must be continuously repeated. Safety is paramount, and optimization should never cause any issues. It must be reliable and repeatable. If anything undoes the optimization, it may be necessary to redo the process to reach the desired sweet spot.

Finally, there are other complexities introduced by external factors. For instance, the concurrency of a Lambda can be modified, provisioned concurrency can be adjusted, and warmup calls can be utilized. Later, we will explore more innovative methods, such as autonomous concurrency, for achieving warmups. These additional factors contribute to the complexity of autonomous optimization.

The Optimization Process Behind Sedai: Modeling, Reinforcement Learning, and Continuous Monitoring

So, how do we accomplish this? What does it entail behind the scenes? We have implemented a life cycle-based system that employs a modeling approach. This approach defines the primary objectives of Sedai and the autonomous system, which are to ensure the availability and optimization of functions. Both goals are crucial to Sedai's operation. It ensures that the function is always functioning without any downtime. Even if there are external factors causing a downtime, Sedai takes the responsibility to fix it promptly.

The optimization goals for each resource can be set either by the customers themselves or by us based on default configurations. We meticulously establish the unique relationship between memory, duration, and cost for every Lambda and accurately model the Lambda accordingly. With an understanding of the graph's shape that I mentioned earlier, we devise a strategy to drive the function towards its goal. Once we know the goal and the Lambda's characteristics, we employ a recurrent reinforcement learning technique to steer the Lambda in the desired direction.

Continual monitoring of this process is crucial. We don't just make a change and sit back. We ensure that the optimization reaches the goal and remains stable. We also remain vigilant for any external variables that may impact the optimization, as mentioned previously. If any external changes are detected, we revisit and adjust the strategy accordingly.
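A very simple form of this kind of post-change check might look like the sketch below. It is not Sedai's implementation: the function name, the use of the median, and the 5% tolerance are all assumptions made for illustration.

```python
import statistics


def evaluate_change(before_ms, after_ms, max_regression=0.05):
    """Return "keep" if the median duration did not regress beyond the tolerance."""
    if not before_ms or not after_ms:
        raise ValueError("both sample sets must be non-empty")
    before = statistics.median(before_ms)
    after = statistics.median(after_ms)
    return "keep" if after <= before * (1 + max_regression) else "rollback"


# Hypothetical duration samples (ms) from before and after a configuration change.
print(evaluate_change([100, 110, 120], [80, 85, 90]))     # faster after the change
print(evaluate_change([100, 110, 120], [150, 160, 170]))  # slower after the change
```

Running a check like this repeatedly, not just once, is what catches regressions introduced later by new releases or manual configuration changes.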

This is essentially how the optimization process works behind the scenes in Sedai.

Part 2: How fabric reduced Latency by 50%

Overcoming Challenges in E-commerce Infrastructure with Serverless Architecture and Sedai's Optimization

fabric is an e-commerce platform that caters to various e-commerce vendors, providing them with the infrastructure and tools to set up their online stores. As an e-commerce platform, fabric understands the criticality of minimizing latency, as it directly impacts user experience and customer satisfaction. In the online shopping world, users are quick to abandon a website or checkout process if it is too slow or cumbersome. The e-commerce industry is also subject to seasonal fluctuations and event-driven promotions, which further necessitates a responsive and high-performing system.

fabric has built its platform primarily using a serverless architecture, leveraging Lambdas to easily scale and manage its infrastructure. However, to optimize performance, cost, and other internal objectives, certain workloads have been transitioned to EKS and ECS. Among the various services offered by fabric, the payment service is particularly latency-sensitive, as ensuring seamless and secure payment transactions is of utmost importance to both fabric and its customers. Moreover, fabric operates in a multi-tenant environment, with each customer onboarded through separate AWS accounts, which introduces complexity in managing multiple customers simultaneously.

fabric encountered several internal challenges when managing serverless functions. The dynamic nature of the environment posed difficulties as they continually added new features and aimed to optimize performance to meet service level objectives. It was crucial for them to continuously adjust each function's specifications, such as memory allocation and provisioned concurrency, to ensure optimal sizing. The readiness of functions to respond to customer requests without delays caused by cold starts was also a priority.

Streamlining Issue Identification and Error Tracing in a Distributed Environment

In a distributed environment, swiftly identifying the root cause of any issues and effectively tracing errors became essential. They needed to investigate and address the underlying reasons behind problems while keeping costs as low as possible. These challenges are common concerns for customers utilizing Lambda functions. They seek efficient optimization, low costs, and seamless integration of new releases. Additionally, fabric's SRE team faced the task of juggling multiple priorities while scaling their operations effectively to accommodate a growing customer base.

These were some of the internal challenges that fabric had to overcome.

1. Dynamic environment: Difficult to optimize serverless function & overall performance

2. Critical needs to continuously:

- Right size function

- Manage provisioned concurrency

- Choose how/when to warm up serverless functions and how to keep warm

- Trace errors in heavily distributed environment

- Keep costs low

3. Each stage potentially needed to be repeated with each new release

4. The SRE team's bandwidth stretched covering multiple different priorities, such as setting up and supporting new customer environments and optimizing existing customers' accounts.

The core part of fabric's platform encompasses essential features such as cart management, ordering, shipping, and identity management, which directly impact the customer's buying experience. On the other hand, tasks like configuring offers, managing inventory, implementing a loyalty system, and handling image management are vital behind-the-scenes operations that internal operators and the business itself need to address.

This is essentially how their system is organized, and they also rely on various external services. They have dependencies on external services for payments, shipping, search, notifications, and more. There is a chain of dependencies that includes several external tools, libraries, and frameworks that they rely on.

To establish this infrastructure and the underlying systems, they have employed over 10,000 services, with a significant portion of them being built using serverless architecture on Lambdas. These services are implemented in a variety of languages such as Node.js, Python, and Java. Additionally, there is a blend of Kubernetes and serverless containers used to fully define the backend environment.

Achievements for Fabric: Significant Duration and Latency Improvements

Here's what we were able to achieve for fabric. The example Lambda provided has an obfuscated name, but we managed to improve its duration by 88%. Over time, this led to a reduction in latency of up to 48%.

As mentioned earlier, fabric was willing to invest a bit more to achieve this latency reduction, which amounted to just $12. In the grand scheme of things, they were able to nearly halve the duration and latency by spending a few additional dollars. This highlights the power of optimization when finding the right balance in the optimization problem. It's worth noting that this approach applies to all combinations of optimization targets. For instance, if the focus were on cost optimization, you could expect to see a different behavior. In that case, the latency may remain similar, with the duration possibly increasing by 5% or 10%. However, the cost could be reduced by as much as 40% to 50%. It demonstrates the contrasting outcome when targeting a different optimization objective.

Efficient Scaling with Fabric and Sedai

Scaling up is a relatively straightforward process of adding more resources and memory, carrying minimal risk. However, scaling down posed a more complex challenge for fabric. Previously, they had to navigate multiple internal approvals, conduct cost and risk analyses, and consider the impact on performance and availability when scaling down. Sedai eliminates these concerns by implementing an efficient safety check system that performs dry runs for every scaling operation, whether it involves scaling up or scaling down. Sedai analyzes the checks and determines the optimal approach for executing the operation. If the safety check fails, the operation is not carried out, ensuring proactive risk management.
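The gating idea, where an operation runs only if its safety checks pass, can be sketched as follows. The specific checks (memory headroom and a timeout margin), the 1.2 headroom factor, and all names are hypothetical examples, not Sedai's actual checks.

```python
def safe_to_scale_down(proposed_mb, peak_used_mb, projected_ms, timeout_ms,
                       headroom=1.2):
    """Return (ok, reasons) for a proposed memory reduction (hypothetical checks)."""
    reasons = []
    if proposed_mb < peak_used_mb * headroom:
        reasons.append("insufficient memory headroom")
    if projected_ms >= timeout_ms:
        reasons.append("projected duration exceeds timeout")
    return (not reasons, reasons)


def run_operation(apply, **check_inputs):
    """Dry-run the checks first and call apply() only if every check passes."""
    ok, reasons = safe_to_scale_down(**check_inputs)
    if ok:
        apply()
    return ok, reasons


applied = []
print(run_operation(lambda: applied.append("512 MB"), proposed_mb=512,
                    peak_used_mb=300, projected_ms=900, timeout_ms=3000))
print(run_operation(lambda: applied.append("256 MB"), proposed_mb=256,
                    peak_used_mb=300, projected_ms=900, timeout_ms=3000))
print(applied)
```

Only the first operation is applied here. The second is rejected because the proposed memory leaves too little headroom above the observed peak.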

Fabric benefits from Sedai's machine learning-driven and continuously reinforced system. Even with new releases and changes, fabric no longer needs to worry about when and how to scale up or scale down. For example, if fabric scales up a Lambda and later implements more optimized code, Sedai automatically handles the scaling down process based on performance, ensuring seamless operation.

The provided image illustrates the trade-off between latency and cost optimization. Finding the optimal point on the curve, whether for cost or latency, is the objective. While fabric prioritizes lower costs, customer preferences lean towards speed and performance. Achieving a balance between cost and speed becomes crucial, and fabric can easily configure this balance by determining the desired level of improvement and the corresponding cost they are willing to invest, ultimately achieving the desired outcome.


Fabric experiences traffic patterns influenced by seasonality among its customer base. Sedai's monitoring and optimization capabilities allow fabric to adapt effectively to these fluctuations. Sedai enables scaling down resource requirements during periods of reduced traffic, improving cost efficiency. Conversely, during high-traffic periods like peak shopping hours or special events, Sedai enables scaling up to meet demands and maintain optimal performance. Continuous monitoring by Sedai ensures fabric's system remains consistently optimized. Moreover, the use of serverless technology eliminates the need for resource pooling, facilitating efficient allocation and reallocation of resources without manual intervention.


Q: Will autonomous actions cause downtime or increase cold starts?

A: No. When we designed the platform, our top priority was ensuring the safety of every action taken. We implemented thorough safety checks for every operation, leveraging historical data to identify actions that led to positive outcomes. Through reinforcement learning, we continuously improve our performance in these tasks. Moreover, working with serverless architectures and Lambdas provides an advantage. For instance, when increasing memory, existing running Lambdas are not affected. They continue to execute until completion, while the new memory setting takes effect seamlessly.

This abstraction layer is handled by the AWS platform itself, allowing us to build on top of it and maintain stability. With our comprehensive safety checks and established processes, Sedai eliminates any potential downtime or increased cold starts resulting from optimization actions.

Q: Does duration always go down when you increase memory in Lambda?

A: Not always. Let's consider a Lambda that handles CPU-intensive tasks. In such cases, increasing the memory allocated to the Lambda usually leads to a decrease in its duration. However, there are instances where we observe no impact on duration despite increasing the memory. In those scenarios, if the safety conditions are met, we can explore reducing the memory allocation.

It can seem counterintuitive. If a Lambda function is not memory or CPU intensive and primarily performs tasks like forwarding requests to downstream APIs or endpoints, it may not require a significant amount of memory or CPU. In such cases, it becomes possible to optimize costs while maintaining the same level of performance. We have encountered situations where duration remains unaffected by changes in memory or Lambda settings.

Q: Does this approach work for all Lambdas? Or are there any restrictions?

A: Initially, our solution only supported making memory changes to the latest, unpublished version of a Lambda. In the Lambda ecosystem, once a version is published and deployed, its memory setting cannot be modified directly. Changing it requires republishing under a new version. So initially, we were unable to make these changes for existing provisioned Lambdas, which run on published versions, and our optimization efforts focused on the latest versions.

However, we later introduced a new feature to address this limitation. With this feature, we create a new version of the Lambda based on its existing memory profile and settings. This process is carried out efficiently and incorporates rigorous safety checks. By publishing this new version, we ensure that all Lambdas, regardless of their provisioning status, can benefit from our solution. Therefore, there are no limitations anymore, and any Lambdas you write can leverage and take full advantage of our optimization solution.
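On the AWS side, the mechanics of publishing a new version with a different memory setting can be sketched as below. This is an illustration of the Lambda API calls involved, not Sedai's implementation. The function `publish_with_memory` is hypothetical, and `client` is assumed to behave like `boto3.client("lambda")`.

```python
def publish_with_memory(client, function_name, memory_mb, alias=None):
    """Change memory on $LATEST, publish a version, and optionally repoint an alias.

    `client` is expected to behave like boto3.client("lambda").
    """
    client.update_function_configuration(
        FunctionName=function_name, MemorySize=memory_mb
    )
    # The configuration update is asynchronous, so wait before publishing.
    client.get_waiter("function_updated").wait(FunctionName=function_name)
    version = client.publish_version(FunctionName=function_name)["Version"]
    if alias is not None:
        # Traffic that targets the alias now reaches the new version.
        client.update_alias(
            FunctionName=function_name, Name=alias, FunctionVersion=version
        )
    return version


# Example call, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   publish_with_memory(boto3.client("lambda"), "checkout-handler", 512, alias="live")
```

Because published versions are immutable, the previous version stays available, so pointing the alias back at it is a straightforward rollback path.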
