Francis Scialabba

Amex’s Fraud Detection AI Was Ready to Go Live. Then Covid Hit.

For Amex's fraud risk team, the year leading up to the new model's rollout was full of upended expectations—and wishing they had a stress ball handy

March 10, 2021

This is the second piece in our Demystifying Algorithms series.

As one of the humans behind American Express’s fraud risk and detection model, Krishnendu Das was tasked with solving an algorithmic mystery: Why were Amex cards getting declined at a popular US music festival?

It was October 2019. Sitting in the company’s skyscraper office in Gurgaon, India, Krish and his team were stuck.

So they turned to their go-to brainstorming method: tossing a yellow stress ball around in a circle. Whoever caught it weighed in with their thoughts on the puzzle: Was it because a brick-and-mortar store had a pop-up location at the festival? A problem with the point-of-sale device? After about an hour, the team had their “aha moment”: They knew where the algorithm’s wires had crossed, and how to untangle them.

Wading through these sorts of issues—correcting and molding the model—is the norm for Krish’s team, and for any group entrusted with keeping a large machine learning (ML) model running. Though they can automate functions and learn from their mistakes, large models still need some degree of human hand-holding to correctly perform their tasks at scale. A fraud alert notification is ultimately the product of billions of data points, millions of lines of code, and thousands of hours of human monitoring.

But when Covid hit, the Amex team’s focus shifted from addressing individual problems as they came up—like the festival—to rethinking the entire foundation of a model they’d spent months fine-tuning. Like everyone and everything else one year ago, Amex’s fraud model had to adapt to the new normal.

Forest from the trees

Have you ever read a Choose Your Own Adventure novel? You make a choice that then leads you to another set of choices, and so on, until you reach a path-dependent conclusion. Lots of credit card fraud models work like that.

The models are typically a series of decision trees that answer questions like “Has this person visited this store within the last 6 months?” or “Is this a comparatively high spending amount?”

As a model advances from one decision tree to another, it becomes more confident in its prediction of how the story will end. Except it isn’t aiming for a fairytale ending or an exciting cliffhanger—it’s focused on the most accurate outcome.

This entire process ends with one output: a probability of fraud. Above a certain threshold, your card will be declined, and the bank will send a text asking, “Is this really you?” Below that threshold, you’re good to go.
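To make that flow concrete, here is a minimal Python sketch of how a stack of small decision trees could add up to a single fraud probability that is then compared with a threshold. It is an illustration only, not Amex's model: the feature names, the per-tree scores, the base score, and the 0.5 threshold are all made-up assumptions, and a real gradient-boosted model learns its trees from data rather than having them written by hand.

```python
import math

# Every feature name and number below is a hypothetical value chosen
# for illustration. Each "tree" asks one question and returns a small
# score adjustment (in log-odds), which is how boosted trees contribute
# to a combined prediction.

def tree_store_history(txn):
    # "Has this person visited this store within the last 6 months?"
    return -1.2 if txn["visited_store_last_6_months"] else 0.8

def tree_amount(txn):
    # "Is this a comparatively high spending amount?"
    ratio = txn["amount"] / txn["typical_amount"]
    return 1.5 if ratio > 5 else -0.5

def tree_channel(txn):
    # Assumed extra question: was the purchase made online?
    return 0.6 if txn["online"] else -0.3

TREES = [tree_store_history, tree_amount, tree_channel]
BASE_SCORE = -2.0  # assumed starting log-odds: most purchases are legitimate
THRESHOLD = 0.5    # assumed cutoff above which the card is declined

def fraud_probability(txn):
    # Sum the trees' adjustments, then squash the total into a 0-1 probability.
    raw = BASE_SCORE + sum(tree(txn) for tree in TREES)
    return 1.0 / (1.0 + math.exp(-raw))

def decide(txn):
    return "decline" if fraud_probability(txn) >= THRESHOLD else "approve"

familiar = {"visited_store_last_6_months": True, "amount": 40.0,
            "typical_amount": 50.0, "online": False}
unusual = {"visited_store_last_6_months": False, "amount": 900.0,
           "typical_amount": 50.0, "online": True}

print(decide(familiar))  # approve
print(decide(unusual))   # decline
```

In this toy version, each additional tree nudges the running score up or down, which mirrors the idea of the model growing more confident as it moves from one tree to the next.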

In 2010, Amex’s research lab began exploring ML as a method to detect and prevent card fraud. Every other financial institution we reached out to declined to share when they began researching ML, but at the time, ML for financial services was “incredibly hyped up,” says Daniel Van Dyke, VP of Content at Insider Intelligence.

“When we first started, it took us about a few years of...repeated test-and-learn, test-and-fail,” says Rajat Jain, Amex’s global head of identity and authentication strategy and VP.

The turning point for the sector at large was when hardware capabilities caught up and ML “matured to the point it could effectively augment existing rules-based systems” and their risk thresholds, says Van Dyke, adding that since then, the tech has advanced so far that “it can prevent fraud in near-real time in addition to just detecting it.”

In 2014, Amex finally landed on a winning technique for itself, gradient boosting machines—i.e., the forest of decision trees we just explained—and implemented its first-ever ML model for fraud detection. It led to a 30% improvement over Amex’s legacy systems, which were simpler, logistic regression models.

Seven years later, Amex’s newest model, called Gen X, executes a sequence of more than 1,000 decision trees. It was developed on billions of observations and sweeps more than $1.2 trillion in annual spending. The fraud decision science team has tripled from 10 decision scientists in 2014 to about 30, including Krish and his team, who monitor the model 24/7 and update it at least once a year.

For all that growth, the Gen X build and rollout—a process that should’ve taken about three months—stretched on for nearly a year due to Covid-19. In January 2021, it finally made it to all 112 million Amex cards in circulation. But for Krish and the hundreds of others on Amex’s fraud team working to push the new model out, it was one long squeeze of the yellow stress ball.

Defense mode

In February 2020, the modeling team felt ready to plant the Gen X model’s decision trees. They’d been planning for the model’s rollout since January and were on track to wrap things up by March.

But “March is when all of that kind of stopped,” says Krish, as lockdowns began in India and the US.

As usual, the team had been introducing scenarios into the Gen X model to see how it performed. But since each generation of a gradient-boosting ML model is typically developed on data from earlier that same year, many of the model’s assumptions no longer made sense.

Spending patterns changed overnight, which triggered new fraud alerts. Supermarkets were inundated with customers they’d never seen before, who moved from store to store searching for necessities to stockpile. Former in-person shoppers were buying everything from pajamas to prescriptions online. And everywhere, people were spending large chunks of money all at once.

Suddenly, the modeling team had to make a decision: Try to fit this square pattern into a round hole, or overhaul the model?

The latter won out. While the modeling team started on that, the fraud risk management teams temporarily pivoted to defensive mode, monitoring disruptions and implementing manual overrides for certain types of transactions that the pre-Covid model would likely deem suspicious.

At this point, “People were on at all hours,” says Tina Eide, Amex’s SVP of global fraud risk management. The fraud risk management and tech teams in India and the US were literally working around the clock, on short-term patch-ups and bigger-picture model fixes. As the Gurgaon team was signing off, the Phoenix team was brewing their coffee.

Jain and the modeling team were still evaluating the Gen X model to see how much more work stretched ahead of them. Things were changing quickly, and they needed to add features that allowed cardholders’ responses to fraud alerts to be analyzed faster.

But as the world moved online, the biggest obstacle was still ahead of them.

Cloudy days

As e-commerce became all commerce, some essential digital infrastructure began to creak and groan. “[The] entire world was looking for online resources,” says Krish—and many times, those resources weren’t readily available.

Crucially, that seemed to include Amazon Web Services, Amex’s cloud provider. Once lockdown began, it didn’t seem to have enough capacity to support the sudden worldwide influx, according to members of Amex’s global fraud risk management team. AWS declined to share usage figures for the period with us.

It was a high-stakes time, and the modeling team felt a bit helpless: They needed to test a lot of new scenarios to shore up the Gen X model for a post-Covid world. But because of dwindling cloud capacity, testing one new idea took up to a month, says Jain. The team worked on that part of the process from around March to September. It usually takes a matter of weeks.

So working weekends was the best solution, at least temporarily. The modeling team had tried less labor-intensive workarounds, but they just weren’t cutting it. Weekends were key because AWS traffic seemed much higher on weekdays—US daylight hours were the equivalent of rush hour. It also helped that Krish’s team in India generally worked while most US businesses were offline.

“It’s not a good work-life balance, I don’t want to give that outlook...but it’s also fun when it’s a boardroom-like situation, when you’re doing something critical and the entire team is with you—so there is a good camaraderie,” says Krish.

On Saturday or Sunday, team members would log on, try to secure the AWS servers they needed, and build an iterative model, plugging in simulations such as data from different time periods and then analyzing how the model treated them. If things didn’t turn out as anticipated, they’d look at where things went wrong and either plug in more data, try it again from the last checkpoint, or both.
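A rough Python sketch of that kind of check, run on data from two time periods, might look like the following. Everything in it is a hypothetical stand-in: the one-line rule takes the place of a trained model, the period labels and transactions are invented, and the share of legitimate purchases wrongly flagged is just one of many measures a team could track.

```python
# Stand-in for a trained model: flag any purchase far above the
# cardholder's typical spend. The 3x multiplier is an assumption.
def flags_as_fraud(txn):
    return txn["amount"] > 3 * txn["typical_amount"]

def false_decline_rate(transactions):
    # Share of legitimate purchases that the rule would wrongly flag.
    legit = [t for t in transactions if not t["is_fraud"]]
    if not legit:
        return 0.0
    wrongly_flagged = sum(1 for t in legit if flags_as_fraud(t))
    return wrongly_flagged / len(legit)

def txn(amount, is_fraud=False, typical_amount=50.0):
    return {"amount": amount, "typical_amount": typical_amount,
            "is_fraud": is_fraud}

# Invented samples: the later period has many large but legitimate
# purchases, loosely echoing the stockpiling described earlier.
PERIODS = {
    "before_lockdown": [txn(40.0), txn(55.0), txn(60.0), txn(200.0),
                        txn(900.0, is_fraud=True)],
    "after_lockdown": [txn(400.0), txn(350.0), txn(45.0), txn(500.0),
                       txn(900.0, is_fraud=True)],
}

rates = {name: false_decline_rate(data) for name, data in PERIODS.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of legitimate purchases flagged")
```

If the rate jumps between periods, as it does with this invented data, that is a sign the assumptions built into the model no longer match how people are spending.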

The fraud modeling team—all working remotely, of course—was in touch constantly. Besides using Slack and every other communication channel available to them, they often virtually met multiple times a day, sometimes even sitting on WebEx video calls for hours while they all worked in tandem.

“It’s just on, so when you want to talk, you can just unmute yourself and discuss,” says Krish. “Because if we were in the office, that’s literally how it would be.”

Still, it wasn’t quite the same as tossing around their yellow stress ball.

Time went on, and Claire Carnero, director in fraud risk management, reached out to Krish in November to see how the 2020 holiday season—a potentially unprecedented time for fraud and purchasing—might look according to the modeling team. Her Phoenix-based team, as the ones responsible for the potential fraud alerts that customers receive, knew there was a chance they'd be putting out fires left and right.

Completely unnoticed

Around 5am on November 14, 2020, Jain woke up to the email he’d been looking forward to for almost a year. His wife chafed at him checking his phone in bed, but he couldn’t help it: The model was finally out there.

“We did it!!!!😀😀 Happy Diwali!!!!,” Krish had written to Jain the previous evening—hours before India’s biggest holiday—and just before the model went live for 10% of Amex cardholders worldwide.

In the Phoenix suburbs, the fraud risk strategy team’s work was just beginning amid the rollout. If the Gen X model’s decisions were less accurate than previous models’ in any way, Carnero would hear that directly from the phone servicing teams, who’d hear it straight from the customers.

For two weeks, Carnero recalls feeling like she was holding her breath, on high alert with both “nervous anticipation and hypervigilance.” The best possible outcome was for the model to go completely unnoticed by customers.

As far as they could tell, those hopes were realized. Each week, she reached out to a director on the fraud risk strategy team, bracing herself for a sudden crisis. “It seems like everything’s okay; are you seeing everything’s okay?” she’d say. There weren’t any big blazes to speak of.

Meanwhile, Krish and the modeling team had already started thinking about the next generation of the model, and all the updates it would need to keep pace with the humans it was built to predict. As sophisticated as the system is—and most deployed ML models are—it still needs its human training wheels.
