Achieving Autonomous Management with Datadog

Published on
Last updated on

June 17, 2024

Max 3 min

Introduction

In this blog post, we explore how Datadog can be used to achieve autonomous cloud environments in just 15 minutes. We discuss the partnership between Sedai and Datadog, the integration they have developed, and how customers can leverage Datadog to automate manual processes. The goal is to guide customers toward a fully autonomous system. By following our guidance, you can transform your investment in Datadog into a self-driving engine. Stay tuned as we reveal the steps to unlock the full potential of Datadog and empower your cloud environments.

In addition to discussing the partnership and integration, we will also highlight the range of options available through Datadog, including partner integrations and professional services. This ensures that customers have the flexibility to configure and purchase the solutions that best meet their specific needs. Get ready to discover how Datadog can revolutionize your cloud environment and propel you towards automation and autonomy. Watch the full video here.

Observability with Datadog

Many customers have begun writing scripts that automate actions in response to alerts. Take the example of an app that is using 70% of its available resources. We can write a script that works with the VPA or HPA in Kubernetes, or with the memory or concurrency settings in Lambda, to scale up. We also need to define a threshold for the scale-up so we don't waste resources. Suppose our analysis concludes that a 20% scale-up is sufficient; we then set it to 25% to leave headroom, so nothing slows down when the traffic comes in. We now have an automated workflow that scales based on one alert and the two thresholds we have established. Keep in mind that as your environment grows with the business, the number of alerts increases to hundreds, if not thousands, each with its own threshold. The number of scripts you write increases, and so does the number of scale-up thresholds you maintain.
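The alert-driven workflow described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the 70% utilization alert and the 25% scale-up step from the example; the names `ScalingRule` and `next_allocation` are hypothetical, and the step that would apply the result to the VPA, HPA, or a Lambda memory setting is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """The two hand-tuned thresholds from the example (assumed values)."""
    alert_utilization: float = 0.70  # alert fires at 70% of available resources
    scale_up_fraction: float = 0.25  # grow the allocation by 25% when it fires


def next_allocation(used: float, allocated: float,
                    rule: ScalingRule = ScalingRule()) -> float:
    """Return the allocation to request after one alert evaluation."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    utilization = used / allocated
    if utilization >= rule.alert_utilization:
        # Alert condition met: scale up by the fixed fraction.
        return allocated * (1 + rule.scale_up_fraction)
    # Below the alert threshold: leave the allocation unchanged.
    return allocated


# A Lambda function using 768 MB of a 1024 MB allocation (75% utilization).
print(next_allocation(768, 1024))  # 1280.0
```

Every alert that should trigger scaling needs its own rule like this one, which is where the maintenance burden comes from as the environment grows.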

So what if my environment changes over time? What if my traffic patterns have seasonal components? Am I wasting any of my resources? Where can I cut costs without impacting performance? Will I face availability issues if an unforeseen burst in traffic requires scaling beyond the 25% threshold? These are all issues that Sedai resolves automatically.

Sedai Architecture

Integrating Sedai with your existing Datadog instance is straightforward and can be done in just 10 to 15 minutes. Once onboarded, you no longer need to configure alerts and thresholds manually. Sedai integrates with Datadog, leveraging its golden metrics and learning the behavior, seasonality, and infrastructure configurations specific to your applications. Once Sedai has ingested the necessary metrics and trained its models, it autonomously takes action in your environments on your behalf. The best part? Sedai can feed these actions back into Datadog as events, allowing you to visualize the real-time changes made by Sedai directly on your Datadog dashboards.

With Sedai, you can experience a hands-off approach to managing your environments, benefiting from its autonomous actions and gaining insights through Datadog's visualization capabilities.

Turn Production Autonomous using Sedai

Now, let's dive into a few examples to illustrate the power of Sedai. When it comes to performance, Sedai can swiftly detect sudden traffic bursts and autonomously scale your systems to accommodate the increased demand. This ensures that your application remains performant, with no latency experienced by your customers. On the other hand, when it comes to cost optimization, how do you know when it's appropriate to scale resources back down without compromising performance? Here's where Sedai's models come into play. They have the capability to determine when and by how much resources should be scaled back, striking a balance between optimal performance and cost savings.

By comparison, with manual or even automated remediation workflows in Datadog, you still need to configure alerts and thresholds, and then periodically review and adjust them, for example in quarterly business reviews (QBRs), to keep them aligned with the evolving environment. With Sedai, there's no need for such manual intervention. Sedai incorporates continuous learning, autonomously managing seasonality and behavioral changes in real time. It understands the precise resource requirements, when to scale them up, and when to scale them back down, all without any human involvement.

Moreover, Sedai goes beyond performance optimization. It can also analyze releases into production on your behalf, providing alerts if an application is likely to encounter errors, timeouts, or performance degradation in the future. This empowers your developers to proactively assess application performance by leveraging our APIs before pushing changes into production. Alternatively, if an application is already deployed, Sedai will send timely alerts, notifying you about potential issues such as timeouts expected within an hour unless certain resource changes are made. Additionally, Sedai can assist you in autonomously establishing service level objectives (SLOs) for all your applications. By examining their historical behavior, our models can allocate appropriate SLOs for each application. Furthermore, Sedai strives to keep you within your error budgets by taking necessary actions in production whenever feasible.
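To make the error-budget idea concrete, here is a small worked example using the standard definition: an SLO target of 99% allows 1% of requests to fail, and the budget is the share of those allowed failures not yet used. This is an illustrative sketch only; it does not represent how Sedai's models derive SLOs, and the function name `error_budget_remaining` is hypothetical.

```python
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Return the unspent fraction of the error budget.

    1.0 means no budget has been used, 0.0 means it is exhausted,
    and a negative value means the SLO has been breached.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Failures the SLO tolerates over this window.
    allowed_failures = (1 - slo_target) * total_requests
    return 1 - failed_requests / allowed_failures


# 99% SLO, 10,000 requests, 50 failures: about half the budget is left.
remaining = error_budget_remaining(0.99, 10_000, 50)
print(f"{remaining:.0%} of the error budget remains")
```

An autonomous system that tracks this value can act before the budget reaches zero, rather than after the SLO is already breached.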

With Sedai, you can unleash the full potential of automation, ensuring optimal performance, cost efficiency, and proactive monitoring for your applications in a seamless and autonomous manner.

Leveraging Datadog and Sedai

When integrating Sedai with Datadog, there are two areas where Datadog is involved in setup: you configure the metric pull from Datadog and the event push back into Datadog. To achieve this, you simply connect Datadog to your AWS account using your Datadog API key. Sedai then pulls the relevant metrics associated with that specific cloud account. Through our models and tagging, we can determine which metrics are applicable to the cloud account. Additionally, you can integrate a notification provider by adding the corresponding API, which enables notifications to be sent back into Datadog.
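For a sense of what a metric pull looks like at the API level, the sketch below builds a request against Datadog's v1 timeseries query endpoint. It is an assumption-laden illustration, not Sedai's implementation: the helper name `build_metric_query`, the placeholder keys, and the example query are hypothetical, the request is built but never sent, and the host differs for non-US Datadog sites. Note that this endpoint expects an application key in addition to the API key.

```python
import urllib.parse
import urllib.request

# US1 site; other Datadog sites use a different host (assumption for this sketch).
DATADOG_QUERY_URL = "https://api.datadoghq.com/api/v1/query"


def build_metric_query(api_key: str, app_key: str, query: str,
                       start: int, end: int) -> urllib.request.Request:
    """Build (but do not send) a timeseries query for the given time window.

    `start` and `end` are POSIX timestamps in seconds.
    """
    params = urllib.parse.urlencode({"from": start, "to": end, "query": query})
    return urllib.request.Request(
        f"{DATADOG_QUERY_URL}?{params}",
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
        method="GET",
    )


# Hypothetical query: average Lambda duration for one function over one hour.
request = build_metric_query(
    api_key="<api-key>",
    app_key="<application-key>",
    query="avg:aws.lambda.duration{functionname:checkout}",
    start=1718600000,
    end=1718603600,
)
print(request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would return the timeseries as JSON, provided the keys are valid.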

Now, let's focus on remediations and optimizations performed on serverless or Lambda functions. You can use Datadog to send the event information back into the platform. To set this up in Datadog, install the Sedai tile from the integrations section. Installing this tile gives you access to an out-of-the-box dashboard that houses all the events and relevant information. Using the calendar view, you can navigate through past months and observe various events, such as memory scale-ups and Lambda jobs, along with their respective timestamps. With Datadog as the visualization tool, you can observe the memory increase over time and the corresponding decrease in Lambda durations, which lets you track and analyze these changes effectively. We are also continuously working on enhancing the features to provide even deeper visibility through Datadog.
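As a rough picture of what pushing an action back into Datadog as an event involves, the sketch below builds a request for Datadog's v1 events endpoint. The event title, text, and tags are invented for illustration and are not the format the Sedai integration actually emits; the helper name `build_scaling_event` is hypothetical, and the request is built but not sent.

```python
import json
import urllib.request

# US1 site; other Datadog sites use a different host (assumption for this sketch).
DATADOG_EVENTS_URL = "https://api.datadoghq.com/api/v1/events"


def build_scaling_event(api_key: str, function_name: str,
                        old_memory_mb: int, new_memory_mb: int) -> urllib.request.Request:
    """Build (but do not send) an event describing a Lambda memory change."""
    body = {
        "title": f"Memory scaled for {function_name}",
        "text": f"Memory changed from {old_memory_mb} MB to {new_memory_mb} MB.",
        # Hypothetical tags so the event can be filtered on a dashboard.
        "tags": [f"functionname:{function_name}", "action:scale-up-memory"],
        "alert_type": "info",
    }
    return urllib.request.Request(
        DATADOG_EVENTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"DD-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_scaling_event("<api-key>", "checkout", 1024, 1280)
print(json.loads(request.data)["title"])  # Memory scaled for checkout
```

Events posted this way appear in the Datadog event stream and can be overlaid on dashboard graphs, which is what makes the memory and duration changes easy to correlate.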

In summary, Datadog acts as the "glass" through which you can visualize and monitor Sedai's events and actions. By integrating Sedai and Datadog, you can take advantage of Datadog's powerful capabilities for tracking and visualizing the impact of Sedai's optimizations and remediations.

Summary

Embracing autonomous capabilities brings significant benefits to SRE, Ops, and Dev teams, particularly in terms of performance optimization, availability, and cost savings. Datadog serves as one of the industry's leading APM and observability platforms, enabling you to visualize various aspects beyond infrastructure and APM, such as network, logs, and security. However, the true synergy between Sedai and Datadog lies in ingesting these metrics and transforming your system into an autonomous environment. Sedai takes over many day-to-day operational tasks that impact your teams, allowing developers and ops personnel to focus on more critical issues. Sedai effectively manages availability, performance, and cost, while you leverage Datadog's comprehensive visualization capabilities.

Setting up Sedai is incredibly simple. No manual configurations, thresholds, or alerts are required. All you need to do is connect your Datadog APIs and cloud provider, and Sedai's models will automatically go to work, determining how and when to take actions on your behalf.
