Using AI/ML for Fraud Detection & Scaling with Autonomous Operations

Published on
Last updated on

June 17, 2024


In this article, we summarize the talk given at autocon by Manu Thapar, CTO of Mastercard (you can watch the video here), on how Mastercard applied AI and machine learning to fraud detection.

Credit cards are familiar to almost all of us. Take the Apple Mastercard credit card as an example. Whenever a transaction takes place using a credit card, the authorization for that transaction comes from the issuing bank. As part of the authorization process, a sophisticated AI and machine learning system is used for fraud detection. It assesses the probability of fraud for each transaction: using historical data and machine learning algorithms, the system calculates this probability, and if it deems the transaction suspicious, the transaction is denied.

Over the years, Mastercard has put significant effort into building a robust system capable of handling a very large volume of transactions. It currently processes over 150 billion transactions annually and serves numerous clients globally, including leading US banks and other sectors such as credit risk management. The system not only prevents a substantial amount of fraud but also brings considerable value to end customers.
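The score-then-decide step described above can be sketched in a few lines of Python. This is an illustration only: the feature names, weights, bias, and the 0.5 cut-off are all assumptions made for the example, and the talk does not describe the actual model, which is learned from historical data.

```python
import math
from dataclasses import dataclass
from typing import Mapping

# Hypothetical weights; a production model would be trained on historical data.
WEIGHTS = {"amount_zscore": 1.2, "new_merchant": 0.8, "foreign_country": 1.5}
BIAS = -4.0
DENY_THRESHOLD = 0.5  # assumed cut-off, not a figure from the talk


@dataclass
class Decision:
    probability: float  # estimated probability that the transaction is fraudulent
    approved: bool


def fraud_probability(features: Mapping[str, float]) -> float:
    """Logistic score: sigmoid of a weighted sum of the features."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def authorize(features: Mapping[str, float]) -> Decision:
    """Deny the transaction when the fraud probability reaches the threshold."""
    p = fraud_probability(features)
    return Decision(probability=p, approved=p < DENY_THRESHOLD)


routine = authorize({"amount_zscore": 0.1})
suspicious = authorize(
    {"amount_zscore": 3.0, "new_merchant": 1.0, "foreign_country": 1.0}
)
```

Here `routine` is approved because its score stays well under the threshold, while `suspicious` is denied. In practice the threshold trades off fraud caught against legitimate transactions wrongly declined.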

Empowering Merchants and Preventing Fraud

Let's explore Mastercard's comprehensive system, which encompasses various components. The primary value it offers, as depicted on the subsequent slide, is to merchants. The foremost objective is to prevent merchant attrition by effectively combating fraud. On the acquirer side, the system receives signals from gateways and assesses them for factors such as merchant risk, credit risk, and credit card fraud probability. Each transaction is evaluated within milliseconds during the authorization process to determine its likelihood of being fraudulent.

As a result, a significant percentage of transactions are consistently denied, yielding substantial savings for merchants. This preventive measure is crucial because in the unfortunate event of fraud, it is the end merchant who bears the financial burden. Moreover, it significantly improves the overall customer experience by minimizing instances of fraudulent activity.

This shows how transaction data can be used to automatically ascertain the likelihood of fraud and score each transaction in real time, which in turn determines which transactions should be denied. With this automated process, Mastercard can decline transactions proactively, without relying on manual rule creation or human intervention for the denial process.

Disaster Recovery Across AWS

Mastercard has deployed this system at scale across multiple regions worldwide, including Europe, Sydney, and North America. Currently, it efficiently handles production traffic with an end-to-end latency of less than 150 milliseconds. This latency includes scoring the transaction, receiving it from the acquiring side, and transmitting the results over the internet for the acquiring bank's decision-making process. The system follows an active-active deployment across regions, ensuring both regions remain active and capable of covering any potential disasters. Within each region, multiple data centers are utilized to ensure seamless recovery in case of any issues within a data center.

The architecture of the software and system is designed to be resilient, and they have experienced instances where a data center in Frankfurt went down without causing any latency problems or disruptions. This resilience is a result of a well-architected design that automatically recovers from errors and ensures smooth operations.
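A minimal sketch of the routing idea behind an active-active deployment follows. The region names, latency figures, and the `pick_region` helper are hypothetical; only the 150 millisecond end-to-end budget comes from the talk. The sketch assumes traffic stays on its preferred region while that region is healthy and otherwise moves to the fastest healthy alternative.

```python
from dataclasses import dataclass
from typing import List

LATENCY_BUDGET_MS = 150.0  # end-to-end figure quoted in the talk


@dataclass
class Region:
    name: str          # hypothetical region identifier
    healthy: bool      # result of an assumed health check
    latency_ms: float  # assumed recent end-to-end latency


def pick_region(regions: List[Region], preferred: str) -> Region:
    """Prefer the caller's region; fail over to the fastest healthy one."""
    candidates = [
        r for r in regions if r.healthy and r.latency_ms < LATENCY_BUDGET_MS
    ]
    if not candidates:
        raise RuntimeError("no healthy region within the latency budget")
    for region in candidates:
        if region.name == preferred:
            return region
    return min(candidates, key=lambda r: r.latency_ms)


regions = [
    Region("eu-frankfurt", healthy=False, latency_ms=20.0),  # simulated outage
    Region("eu-ireland", healthy=True, latency_ms=40.0),
    Region("us-east", healthy=True, latency_ms=95.0),
]
chosen = pick_region(regions, preferred="eu-frankfurt")
```

With the Frankfurt entry marked unhealthy, `chosen` is the `eu-ireland` region, since it is the fastest remaining candidate under the budget.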

This block diagram illustrates the various components that have been deployed on AWS. While it may appear complex at first glance, it describes the main components in a simplified manner. The system consists of numerous microservices that can be horizontally scaled. These microservices are containerized and managed through Kubernetes, with some of them running in a serverless fashion. Critical components are automatically managed through the combined capabilities of containers and Kubernetes.
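To make horizontal scaling concrete, the sketch below reproduces the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas equal the ceiling of current replicas times the ratio of the observed metric to the target metric, with no change when the ratio is within a tolerance of 1. Whether Mastercard uses this autoscaler is an assumption; the talk only says the microservices run on Kubernetes and scale horizontally. The replica bounds and sample numbers are illustrative.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,   # HPA's documented default tolerance
    min_replicas: int = 1,    # illustrative bounds
    max_replicas: int = 100,
) -> int:
    """Replica count following the documented HPA scaling formula."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target scale out to 6 pods.
scaled_out = desired_replicas(4, current_metric=90.0, target_metric=60.0)
```

The same function scales in when load drops: ten pods at 20% against a 60% target shrink to four.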

A Highly Performant System

Mastercard has developed a highly performant system that comprises multiple microservices and leverages a combination of AWS services. This combination allows them to generate results within a remarkably short time frame. Additionally, the system is designed to scale horizontally, enabling them to handle billions of transactions with latencies in the range of tens of milliseconds. Through extensive testing, they have demonstrated that achieving such performance is feasible on a cloud infrastructure. The key lies in designing the system correctly and utilizing the appropriate architecture and tools to ensure that performance aligns with the service level agreements they have with their end customers.
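Checking measured latency against a service level agreement is usually done on a high percentile rather than the average. The sketch below uses the nearest-rank percentile method; the sample latencies, the choice of percentile, and the 150 ms limit used as an SLA are assumptions for illustration, not figures for Mastercard's actual agreements.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_sla(samples: Sequence[float], pct: int = 99, limit_ms: float = 150.0) -> bool:
    """True when the chosen percentile is under the latency limit."""
    return percentile(samples, pct) < limit_ms


# Hypothetical end-to-end latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 18, 22, 30, 35, 41, 48, 60, 170]
median_ms = percentile(latencies_ms, 50)
```

With these samples the median is 30 ms, yet the 99th percentile is the 170 ms outlier, so `meets_sla(latencies_ms)` is false even though most requests are fast.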

High Availability and Resiliency

To address concerns about availability and resiliency, Mastercard has implemented a well-architected design that incorporates robust capabilities. These capabilities have enabled them to achieve exceptional uptime and maintain seamless operations in production. The system incorporates fundamental features that allow them to scale efficiently while ensuring high availability. One of the key aspects of the system is its comprehensive instrumentation. They have implemented extensive monitoring and alerting mechanisms, which provide them with valuable insights and notifications. Many of these alerts are handled automatically without requiring human intervention, further enhancing the reliability and stability of the system.
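One simple way to handle alerts without human intervention is a runbook that maps each known alert type to an automated action and escalates everything else. The sketch below assumes this pattern; the alert names, service names, and remediation functions are hypothetical stand-ins for calls to a real orchestration API, and the talk does not describe how Mastercard's alert handling is built.

```python
from typing import Callable, Dict, List

# Log of automated actions, standing in for calls to a real orchestration API.
actions_taken: List[str] = []


def restart_service(target: str) -> str:
    actions_taken.append(f"restart:{target}")
    return "remediated"


def add_capacity(target: str) -> str:
    actions_taken.append(f"scale_out:{target}")
    return "remediated"


# Hypothetical runbook: alert type -> automated remediation.
RUNBOOK: Dict[str, Callable[[str], str]] = {
    "crash_loop": restart_service,
    "high_latency": add_capacity,
}


def handle_alert(alert_type: str, target: str) -> str:
    """Run the automated remediation if one exists, otherwise escalate."""
    action = RUNBOOK.get(alert_type)
    if action is None:
        return "escalated"  # unknown alerts still go to a human
    return action(target)


known = handle_alert("high_latency", "scoring-service")
unknown = handle_alert("disk_corruption", "feature-store")
```

Here `known` is `"remediated"` and `unknown` is `"escalated"`. Growing the runbook over time is what shrinks the share of alerts that need a person.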

Data Lifecycle Automation and Customer Analytics

Another common concern when transitioning to the cloud is security and compliance. This is an area where Mastercard has placed great emphasis. The system is not only designed to meet their own stringent security standards, but it has also undergone rigorous external audits. As a result, they have obtained important certifications such as PCI certification and certifications from various external agencies. These certifications validate the system's adherence to industry best practices and a high level of security and compliance.

In addition to its core functionality, the system also offers robust analytics capabilities and reporting. Through a highly customizable interface, users can access detailed information about the system's operations and understand the reasons behind each transaction denial. They provide comprehensive reason codes to ensure transparency and clarity for their customers and, ultimately, the cardholders. It is crucial to avoid denying legitimate transactions, and they strive to provide precise details regarding the denial process.

To facilitate this, they have implemented a customer case management system that aids in evaluating and resolving any issues or concerns that may arise. This system enhances the overall user experience and helps maintain a smooth and effective transaction denial process.
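The reason codes mentioned above can be pictured as a list of short identifiers attached to each decision. The sketch below is an assumption-heavy illustration: the codes `R01` to `R03`, their descriptions, the transaction fields, and the thresholds are all invented for the example, and a real system would derive reasons from the model rather than from fixed rules.

```python
from typing import Dict, List

# Hypothetical reason codes; the talk does not list the actual ones.
REASON_TEXT: Dict[str, str] = {
    "R01": "amount far above the cardholder's average",
    "R02": "unusual number of transactions in the last hour",
    "R03": "transaction country differs from the cardholder's country",
}


def reasons_for(txn: Dict[str, float]) -> List[str]:
    """Collect the codes of every check that the transaction trips."""
    codes: List[str] = []
    if txn.get("amount", 0.0) > 10 * txn.get("avg_amount", float("inf")):
        codes.append("R01")
    if txn.get("txns_last_hour", 0) > 5:
        codes.append("R02")
    if txn.get("country_mismatch", 0):
        codes.append("R03")
    return codes


def decide(txn: Dict[str, float]) -> Dict[str, object]:
    """Return the decision together with human-readable reasons."""
    codes = reasons_for(txn)
    return {
        "approved": not codes,
        "reason_codes": codes,
        "explanations": [REASON_TEXT[c] for c in codes],
    }


denied = decide({"amount": 5000.0, "avg_amount": 80.0, "country_mismatch": 1})
```

The `denied` result carries `["R01", "R03"]`, which is the kind of detail a case management tool can show to an analyst reviewing a disputed denial.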

Ultimately, the combination of these components has delivered considerable value to customers. They have seen substantial improvements in fraud rates, with significant reductions in basis points. As a result, Mastercard is currently in a phase of rapid customer expansion, while also venturing into new domains such as credit risk and healthcare. This expansion and diversification allow them to extend the benefits of the system to a wider range of industries, further enhancing its overall impact.
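For readers unfamiliar with the unit, one basis point is one hundredth of a percent, so a fraud rate in basis points is the fraud amount divided by the total volume, multiplied by 10,000. The figures below are hypothetical and only illustrate the arithmetic; the talk does not give actual before and after rates.

```python
def fraud_rate_bps(fraud_amount: float, total_volume: float) -> float:
    """Fraud losses expressed in basis points (1 bp = 0.01%) of total volume."""
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    return fraud_amount * 10_000 / total_volume


# Hypothetical figures: 12,000 of fraud on 10,000,000 of volume, then 7,000.
before_bps = fraud_rate_bps(12_000, 10_000_000)
after_bps = fraud_rate_bps(7_000, 10_000_000)
reduction_bps = before_bps - after_bps
```

In this made-up example the rate falls from 12 to 7 basis points, a reduction of 5 basis points.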

Now, let's turn to the final topic: running systems at scale. This is a crucial aspect of designing any system that operates on a large scale, and it calls for thoughtful guiding principles to ensure operational excellence. Key considerations include not only top-notch monitoring capabilities but also fundamental technologies such as container systems, cluster management systems, and serverless computing. These technologies enable them to bring systems back up automatically, without human intervention, when issues arise within a system of such magnitude.

In addition to operational excellence, they prioritize other essential factors such as security, reliability, performance, efficiency, and cost optimization. As they progress, they continuously introduce additional capabilities, including self-healing systems, to further enhance the overall operational efficiency. As mentioned by Suresh, their ultimate goal is to achieve autonomous operations, minimizing the need for human intervention and ensuring seamless functionality throughout the system.


Let’s reflect on the significance of autonomy in their system. Autonomy is crucial because it reduces the need for human intervention and enables the system to handle errors and issues automatically. Operating a system of such scale across multiple regions, including Europe, North America, Asia, and Australia, makes it practically infeasible to rely on significant human involvement. Achieving autonomy is essential not only for meeting service level agreements but also for cost optimization. As they scale their operations, the importance of autonomy becomes increasingly evident: any advancement that reduces human interaction yields large benefits, given the magnitude of the system.

Looking ahead, their benchmark is a system that is ideally 100% autonomous in the coming years. With technological advancements and growing trust in areas like machine learning, they are rapidly approaching this goal. In the foreseeable future, they aim to achieve full autonomy for their systems, with human intervention reserved only for exceptional cases. Over time, their aim is to minimize these exceptions as much as possible.
