Meet InstructGPT, OpenAI’s answer to complaints about toxic language and misinformation in GPT-3

The new InstructGPT models are better at following instructions, but they still exhibit bias.

Alexa Steinbrück / Better Images of AI / Explainable AI / CC-BY 4.0


Keep up with the innovative tech transforming business

Tech Brew keeps business leaders up-to-date on the latest innovations, automation advances, policy shifts, and more, so they can make informed decisions about tech.

On Wednesday, OpenAI went public with a big project: It overhauled GPT-3, its signature large language model, and introduced a new default tool—a set of language models called “InstructGPT.”

Quick recap: GPT-3, like other large language models, was created in part to generate human-like text in a convincing way. Researchers, technologists, and companies—from startups to Microsoft—have used it to generate summaries, change text style and tone, and more. But since the model’s 2020 debut, it has also been criticized for producing racist and sexist outputs, as well as other biased behaviors.

  • When researchers asked GPT-3 to complete a sentence containing the word “Muslims,” it turned to violent language in over 60% of cases—choosing words like “bomb,” “murder,” and “terrorism.”

What’s new

Compared to GPT-3, the new InstructGPT models are A+ students, according to the AI research and deployment company. They’re better at following instructions in English, less inclined to produce misinformation, and at least slightly less likely to produce “toxic” results.

How it’s trained: OpenAI had 40 people rate GPT-3’s responses to a series of prompts, like, “Write a creative ad for the following product to run on Facebook,” MIT Technology Review reported. They downvoted nonsensical, violent, or clearly biased responses—and the well-rated ones were used to train InstructGPT via a reinforcement learning algorithm.

  • The results: The majority of OpenAI’s data labelers preferred the new models’ responses to GPT-3’s, even though InstructGPT is 100 times smaller (1.3 billion parameters vs. GPT-3’s 175 billion). It’s one example of a better-trained language model performing better, in some cases, than a larger one.
  • GPT-3 will still be available, though OpenAI will recommend users choose InstructGPT instead.
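The human-feedback loop described above can be sketched in miniature. This is a hypothetical toy illustration, not OpenAI's code: real RLHF trains a neural reward model on labeler comparisons and then fine-tunes the language model with a reinforcement learning algorithm (PPO), whereas here a simple score table stands in for the reward model and picking the highest-scoring response stands in for the policy update.

```python
# Toy sketch of reinforcement learning from human feedback (RLHF).
# All names and data below are hypothetical illustrations.

def pairwise_reward_update(scores, preferred, rejected, lr=0.1):
    """Nudge a toy reward table so the labeler-preferred response
    outscores the rejected one (stand-in for reward-model training)."""
    gap = scores.get(preferred, 0.0) - scores.get(rejected, 0.0)
    if gap <= 0:  # reward table currently disagrees with the labeler
        scores[preferred] = scores.get(preferred, 0.0) + lr
        scores[rejected] = scores.get(rejected, 0.0) - lr
    return scores

# Step 1: labelers compare model outputs for a prompt,
# downvoting nonsensical, violent, or clearly biased responses.
labeler_preferences = [
    ("helpful ad copy", "nonsensical rambling"),
    ("helpful ad copy", "violent response"),
]

# Step 2: fit the toy reward table to those comparisons.
reward = {}
for better, worse in labeler_preferences:
    reward = pairwise_reward_update(reward, better, worse)

# Step 3: the "policy" favors whatever the reward model scores highest
# (in real RLHF this is a gradient update to the language model).
candidates = ["nonsensical rambling", "violent response", "helpful ad copy"]
best = max(candidates, key=lambda r: reward.get(r, 0.0))
print(best)  # the labeler-preferred response wins
```

The key design point the sketch preserves: labelers never write "correct" answers, they only rank the model's own outputs, and that ranking signal is what steers the model.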

But, but, but: InstructGPT is far from infallible. At its core, like any natural language processing tool, it’s a large-scale pattern imitator. It can still make simple mistakes, fail to follow certain instructions, and accept a prompt’s false premises as true if the prompt contains misinformation. And it can still promote dangerous stereotypes: according to OpenAI, “InstructGPT shows small improvements in toxicity over GPT-3, but not bias.”
