Researchers Are Exploring Alternatives to This Fiercely Debated AI Technique

Meet the alternatives to large language models

April 7, 2021


“With great computing power comes great responsibility.” That’s how the saying goes, right?

Large language models (LLMs) eat up a whole lot of computing power compared to other AI models—and they’ve got an outsized amount of responsibility, too.

These powerful machine learning algorithms analyze large-scale patterns in text, and they underpin services billions of people use every day, like predictive texting and Google Search. In recent months, they’ve also sparked a fierce debate in the tech world and beyond.

Because these models are trained on large swaths of the internet, they can learn and amplify harmful behaviors, like generating racist and sexist language—and since they’re so big and opaque, it’s virtually impossible to know how a particular answer is generated. Their use also has tremendous computational and environmental costs.

Alternative approaches to large language models are slowly gaining traction in research communities, but for the most part, they haven’t caught on industry-wide yet. They center on smaller systems that can do some of the same tasks as larger models, but in a more computationally efficient and explainable way.

Three types of alternatives

Moderate, retrieval-based language models. Unlike large language models, they keep much of the information they need in external storage, so that information isn’t a constant computational burden. Think of it like regularly checking out a reference book from the library instead of trying to memorize all of its contents.

This can add efficiency and explainability: “If you know one answer you gave is based on this new book you found on the shelf, you can associate an answer with the thing that created it,” Gadi Singer, VP and director of cognitive computing research at Intel Labs, told us.
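For the technically curious, here is a minimal sketch of the library-book idea in Python. It is a toy built on assumptions, not how any production retrieval-based model works: the three-entry `DOCUMENTS` store, the word-overlap scoring, and the `retrieve` and `answer` function names are all made up for illustration. The point is only that the answer comes back with the ID of the document it was pulled from.

```python
import re
from collections import Counter

# Hypothetical external store. In a real system this would sit in a database
# or search index outside the model, not in a Python dict.
DOCUMENTS = {
    "hearing-aids": "A hearing aid amplifies sound for the wearer.",
    "libraries": "A library lends reference books to its members.",
    "tinyml": "TinyML runs small models on low-power devices.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents=DOCUMENTS):
    """Return (doc_id, text) for the document sharing the most words with the query."""
    query_words = Counter(tokenize(query))

    def overlap(item):
        doc_words = Counter(tokenize(item[1]))
        # Counter intersection keeps only the words both sides share.
        return sum((query_words & doc_words).values())

    return max(documents.items(), key=overlap)


def answer(query):
    """Answer with the retrieved text and record which document it came from."""
    doc_id, text = retrieve(query)
    return {"answer": text, "source": doc_id}


print(answer("Which models run on low-power devices?"))
```

Because every answer carries a `source` field, anyone reading the output can trace it back to the "book on the shelf" that produced it.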

AI-augmented versions of rule-based models. (Think: “If this, then that”-style operations.) Rule-based models are relatively static, so machine learning can “identify clauses the rules may be missing and invent new rules moving forward,” says Dr. Aya Soffer, VP of AI technology at IBM Research.

Since the model’s decisions can be tied to the rules it’s operating by, even if those rules are written by AI, this method can help with explainability.
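A rough sketch of that traceability, again in Python and again hypothetical: the `Rule` class, the two routing rules, and the "human" and "machine-suggested" labels are assumptions made for this example, not IBM's system. Here the second rule stands in for one that a machine learning model might propose after noticing a phrase the hand-written rule misses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    origin: str  # hypothetical label: "human" or "machine-suggested"
    condition: Callable[[str], bool]
    action: str


# Illustrative rules only. The second one plays the part of a rule a model
# suggested to cover wording the first rule does not catch.
RULES = [
    Rule("refund-keyword", "human",
         lambda msg: "refund" in msg.lower(), "route_to_billing"),
    Rule("money-back-phrase", "machine-suggested",
         lambda msg: "money back" in msg.lower(), "route_to_billing"),
]


def decide(message, rules=RULES):
    """Apply the first matching rule and report which rule made the decision."""
    for rule in rules:
        if rule.condition(message):
            return {"action": rule.action, "rule": rule.name, "origin": rule.origin}
    # No rule matched, so fall back to a person.
    return {"action": "route_to_human", "rule": None, "origin": None}


print(decide("I want my money back"))
```

Each decision names the rule that fired and where that rule came from, which is the explainability benefit in miniature.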

The tinyML movement. It focuses on small, simple models. The perks: They’re cheaper, can run on more devices, and use far less power. A tinyML model can even be small enough to run on a hearing aid, where it could, for example, slow down sounds and boost volume when someone tells you their name for the first time.

“You can get a model that’s...maybe one one-thousandth the size of a general-purpose LLM, that can focus on one task and do it just as well,” says Leon Derczynski, a machine learning and language scientist. “It’s like David and Goliath; it’s impossible not to smile about.”

There’s always a “but”

There are potential problems with alternative approaches, too. For one, the data that retrieval-based models keep around for explainability purposes still takes up space. And rules written by a machine learning model are just as susceptible to bias.

And at the end of the day, “All of these language models would have these issues because of how they’re created,” says Dr. Xiang Ren, assistant professor at USC: They’re trained on data with some level of inherent bias.

“I think those models should be thought of as being in the same class, with the same risks,” says Dr. Jacob Andreas, assistant professor at MIT. “It’s easier to mitigate them—easier to understand, when a model starts behaving in a bad way, why it happened—but that’s not something fundamentally different.”
