At Stanford’s “foundation models” workshop, large language model debate resurfaces

The university created a new center to study so-called foundation models, but some AI experts criticized the framing

August 30, 2021

This month, the Stanford Institute for Human-Centered Artificial Intelligence (HAI) announced a brand-new research arm, the Center for Research on Foundation Models (CRFM). Most foundation models are what are typically known as large language models, powerful AI techniques that have been the subject of both intense scrutiny and lofty praise.

Last week, during the CRFM’s first-ever workshop, the debate bubbled up again.

After a dispute over a research paper about large language models, Google fired its AI ethics co-leader Timnit Gebru in December 2020. The decision prompted massive backlash and directed a spotlight on the AI technique. These massive algorithms are trained on large swaths of the internet, which leads them both to produce highly accurate approximations of natural language and to replicate some human biases.

Large language models already fuel a wide range of popular tools and services, including Gmail’s Smart Compose. An increasing number of startups are powered by OpenAI’s GPT-3, and Google recently announced it will double down on the use of the tools to underpin services like Search.

As commercialization of these algorithms moves full steam ahead, some leaders, researchers, and academics in the AI community continue to warn that the tech warrants a much more cautious approach. That was on full display at the CRFM workshop, where HAI’s chosen phrasing and framing of the tech were met with criticism.

Some saw HAI’s decision to call the tools “foundation models” as an attempt to give large language models a clean slate of sorts—or as a reframing that awards them too much credit.

“They renamed something that already had a name; they’re called large language models,” Meredith Whittaker, faculty director of the AI Now Institute, told Emerging Tech Brew. “This move constitutes an attempt at erasure; very literally—if someone Googles ‘foundation model,’ they’re not going to find the history that includes Timnit’s being fired for critiquing large language models. This rebrand does something very material in an SEO’d universe. It works to disconnect LLMs from previous criticisms.”

In a recent report, Stanford researchers wrote they chose “foundation models” as a term “to connote the significance of architectural stability, safety, and security: poorly-constructed foundations are a recipe for disaster and well-executed foundations are a reliable bedrock for future applications.” It's also worth noting that while language models make up the majority of this category, there are other types included, too—like image-text models.

Percy Liang, director of CRFM as well as a HAI faculty member, told us the choice underscored two points: “First is these models’ incomplete nature—it’s just the foundation, it’s not the entire house—[and] these models still need to be adapted for downstream applications. Second, foundations are critical infrastructure that support a lot of things. We emphasize ‘foundations’ doesn’t mean that they are good ‘foundations’; in fact, they are shaky foundations.”

The CRFM workshop itself focused in part on how companies, government entities, and academic institutions can come together productively to develop universal models, which communities would benefit the most from their use, and how to handle matters of responsibility and accountability when the models cause harm.

But critics argued the framing of the workshop and its topics assumed large language models’ benefits and depicted their widespread use as inevitable, when there are plenty of situations in which the models should not be deployed.

“Is ‘Don’t’ a possible answer here? If academia/gov’t says ‘don’t’ [use these models], will industry listen?” Emily M. Bender, director of the University of Washington’s Computational Linguistics Laboratory, wrote on Twitter.

Large language models have the potential to perpetuate harmful patterns on a wide scale. For example, when researchers asked OpenAI’s GPT-3 model to complete a sentence containing the word Muslims, it turned to violent language in more than 60% of cases.

They also require significant computing resources and funding—GPT-3, for instance, reportedly cost $12 million to train. This can limit the tech’s ownership, for the most part, to the world’s largest tech companies. And even if access to the necessary resources to build these models were expanded, that could “concurrently increase...the power of the few companies that gatekeep these resources,” Whittaker wrote on Twitter.

“We’ve seen where this kind of industry control of research leads,” Stella Biderman, an officer of AI Village and lead researcher at EleutherAI, an online collective working to open source AI research, told us. “It leads to doctors endorsing cigarettes in the 50s and energy companies squashing climate change info in the 80s. The world is already a dystopian sci-fi.”

Correction: A previous version of this article implied all foundation models are large language models. LLMs make up the majority of the category.
