A primer on machine learning operations

MLOps is the “scaffolding” that helps AI systems scale, and it's growing fast.

article cover — Francis Scialabba, Dianna “Mick” McDougall

May 23, 2022

MLOps is like air-traffic control for machine learning: It provides the support and safety zones—and connects all the different teams working to make sure the plane, well, doesn’t crash.

The field is all about cementing best practices for machine learning—thus the name, short for machine learning operations—and it has grown quickly over the past five years. Amazon debuted an MLOps product in 2017, Google debuted one last year, and MLOps startups have increasingly raised large-scale funding rounds from headline investors.

In October, San Francisco-based Domino Data Lab raised $100 million from investors, including Great Hill Partners and Nvidia. In November, Comet, a New York-based startup with clients including Uber and Etsy, raised $63 million. In February, Wallaroo Labs raised $25 million in a funding round led by M12, Microsoft’s venture fund. And Databricks and DataRobot, two leading MLOps providers, have together raised $4.5 billion to date, according to Crunchbase data.

“It is the scaffolding that keeps it all together,” Diego Oppenheimer, EVP of MLOps at DataRobot, told Emerging Tech Brew. “So you can get machine learning into production without MLOps, [but] you can’t do it at scale—at all.”

‘Three-dimensional chess’

Machine learning models are everywhere: They decide what’s on your social media timeline, power online product recommendations, and can even determine how far you might get in a loan-approval process. And they do this not only for you, but for billions of people around the world.

MLOps aims to make these models operate as efficiently, securely, and accurately as possible at this massive scale.

“MLOps only really works if you look at the DevOps, the DataOps, and the ModelOps pieces together,” Clemens Mewald, head of machine learning and data science efforts at Databricks, told us. “That’s what makes it difficult, and that’s why I call it ‘three-dimensional chess,’ because you have to really align these three things, and make them work.”

As for the chess pieces? First, DevOps is essentially tools and best practices that help companies create, deploy, and monitor software quickly and at scale. Next, DataOps helps standardize the tools and pipelines you need to process large amounts of data and deploy them. And finally, ModelOps helps manage the life cycle and deployment of the model that uses all of that data (for instance, using internal data to rank products on a retailer’s website).

Put it in practice

About 96% of CIOs and tech executives either had AI “in their deployment pipelines or had initiated projects,” according to a recent Gartner survey, but the firm’s 2021 research suggests that “only half of AI projects make it from pilot into production.”

There’s a significant gap there, and some MLOps professionals believe the field can help.

Patrick Butler, a senior research associate at Virginia Tech, and Oppenheimer both told us the field is increasingly the bridge between best-practices research and reality, helping codify the processes that make ML work sustainably and securely. But they believe there’s plenty of room to grow.

“There are already [MLOps] frameworks out there that are…toddlers in the software-engineering arena—and they are going to grow up a bit and get stronger,” Butler said.

For Oppenheimer, part of the growth of MLOps boils down to executives who have seen successes in “pockets” of their companies and now want to take the next step: deploying machine learning at scale.

“That’s a big part of what I’ve observed over the last six months,” Oppenheimer added. “We’ve stopped talking about proof of concepts, and now we’re talking about production systems.”

One example of how MLOps can function in practice: A financial services organization might use MLOps on some of its trading models to guard against both concept drift (i.e., a change in the relationship the model was built to predict) and data drift (e.g., a skew in the incoming data), as well as to alert the team about inaccuracies and automate guardrails in case of a problem.
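To make the data-drift half of that example concrete, here is a minimal sketch of one way such a check might look. It is not drawn from any vendor's product. It assumes a single numeric feature, uses the population stability index (PSI) as the drift score, and treats 0.2 as the alert threshold, which is a common rule of thumb rather than a standard. The function names are hypothetical, and catching concept drift would additionally require comparing predictions against real outcomes.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float], live: Sequence[float], bins: int = 10
) -> float:
    """Compare two samples of one feature; larger values suggest more drift."""
    lo, hi = min(reference), max(reference)
    # Fall back to a width of 1.0 if every reference value is identical.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range land in the edge buckets.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps log() defined when a bucket is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bucket_shares(reference)
    live_shares = bucket_shares(live)
    return sum(
        (l - r) * math.log(l / r) for r, l in zip(ref_shares, live_shares)
    )


def check_drift(
    reference: Sequence[float], live: Sequence[float], threshold: float = 0.2
) -> bool:
    """Return True and print an alert when PSI exceeds the assumed threshold."""
    psi = population_stability_index(reference, live)
    drifted = psi > threshold
    if drifted:
        print(f"ALERT: data drift suspected (PSI={psi:.3f})")
    return drifted
```

In a setup like the one described, `reference` would hold the feature values the model was trained on and `live` a recent window of production inputs. A real pipeline would likely run this per feature on a schedule and route the alert to the team instead of printing it.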

“[It’s about] making sure that the data and the methodologies are coded in such a way that no one person leaving or entering the team creates technical debt,” Butler said.

Keeping up with the cutting edge of machine learning isn’t enough, he added—and that’s why best practices and MLOps are necessary.

“We’re entering the realm of machine learning where you’re talking about billions and billions of parameters in deep learning, and understanding those models is difficult,” Butler said. “So MLOps helps at least, if not make them explainable models, then at least make them reproducible, repeatable.”
