Meet InstructGPT, OpenAI’s answer to complaints about toxic language and misinformation in GPT-3

The new InstructGPT models are better at following instructions, but they still exhibit bias.

Alexa Steinbrück / Better Images of AI / Explainable AI / CC-BY 4.0


Keep up with the innovative tech transforming business

Tech Brew keeps business leaders up-to-date on the latest innovations, automation advances, policy shifts, and more, so they can make informed decisions about tech.

On Wednesday, OpenAI went public with a big project: It overhauled GPT-3, its signature large language model, and introduced a new default tool—a set of language models called “InstructGPT.”

Quick recap: GPT-3, like other large language models, was created in part to generate human-like text in a convincing way. Researchers, technologists, and companies—from startups to Microsoft—have used it to generate summaries, change text style and tone, and more. But since the model’s 2020 debut, it has also been criticized for producing racist and sexist outputs, as well as other biased behaviors.

  • When researchers asked GPT-3 to complete a sentence containing the word “Muslims,” it turned to violent language in over 60% of cases—choosing words like “bomb,” “murder,” and “terrorism.”

What’s new

Compared to GPT-3, the new InstructGPT models are A+ students, according to the AI research and deployment company. They’re better at following instructions in English, less inclined to produce misinformation, and at least slightly less likely to produce “toxic” results.

How it’s trained: OpenAI had 40 people rate GPT-3’s responses to a series of prompts, like, “Write a creative ad for the following product to run on Facebook,” MIT Technology Review reported. They downvoted nonsensical, violent, or clearly biased responses—and the well-rated ones were used to train InstructGPT via a reinforcement learning algorithm.

  • The results: The majority of OpenAI’s data labelers preferred the new models’ responses to GPT-3’s, even though InstructGPT is 100 times smaller (1.3 billion parameters vs. GPT-3’s 175 billion). It’s one example of a better-trained language model performing better, in some cases, than a larger one.
  • GPT-3 will still be available, though OpenAI will recommend users choose InstructGPT instead.
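The human-feedback loop described above can be sketched in miniature. This is a hypothetical toy illustration, not OpenAI's code: real RLHF trains a neural reward model on labeler comparisons and then fine-tunes the language model with a reinforcement learning algorithm (PPO), whereas here a simple score table stands in for the reward model and picking the highest-scoring response stands in for the policy update.

```python
# Toy sketch of reinforcement learning from human feedback (RLHF).
# All names and data below are hypothetical illustrations.

def pairwise_reward_update(scores, preferred, rejected, lr=0.1):
    """Nudge a toy reward table so the labeler-preferred response
    outscores the rejected one (stand-in for reward-model training)."""
    gap = scores.get(preferred, 0.0) - scores.get(rejected, 0.0)
    if gap <= 0:  # reward table currently disagrees with the labeler
        scores[preferred] = scores.get(preferred, 0.0) + lr
        scores[rejected] = scores.get(rejected, 0.0) - lr
    return scores

# Step 1: labelers compare model outputs for a prompt,
# downvoting nonsensical, violent, or clearly biased responses.
labeler_preferences = [
    ("helpful ad copy", "nonsensical rambling"),
    ("helpful ad copy", "violent response"),
]

# Step 2: fit the toy reward table to those comparisons.
reward = {}
for better, worse in labeler_preferences:
    reward = pairwise_reward_update(reward, better, worse)

# Step 3: the "policy" favors whatever the reward model scores highest
# (in real RLHF this is a gradient update to the language model).
candidates = ["nonsensical rambling", "violent response", "helpful ad copy"]
best = max(candidates, key=lambda r: reward.get(r, 0.0))
print(best)  # the labeler-preferred response wins
```

The key design point the sketch preserves: labelers never write "correct" answers, they only rank the model's own outputs, and that ranking signal is what steers the model.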

But, but, but: InstructGPT is far from infallible. At its core, like any natural language processing tool, it’s a large-scale pattern imitator. It can still make simple mistakes, fail to follow certain instructions, and accept a prompt’s false premises as true if the prompt contains misinformation. And it can still promote dangerous stereotypes: according to OpenAI, “InstructGPT shows small improvements in toxicity over GPT-3, but not bias.”
