Glossary Term

Large language models

Learn how large language models (LLMs) like ChatGPT and Claude work, their origins in NLP research, and the scaling breakthroughs fueling today’s generative AI boom.

By Tech Brew Staff

less than 3 min read

Definition:

These are the massive predictive language models underlying systems like ChatGPT and Anthropic’s Claude. Large language models—along with image-generating diffusion models—are at the heart of the generative AI boom that has transfixed the world since 2022.

The technology behind modern LLMs—what took natural language processing (NLP) from Siri’s autocomplete to ChatGPT—is the transformer architecture, first outlined in a seminal 2017 paper from Google researchers. Transformers ingest long sequences of words—split into bite-sized tokens—and decide which parts to pay the most attention to. This attention mechanism helps the model capture context and the relationships between distant parts of a passage more effectively, and more quickly, than recurrent neural networks, the type of neural network previously used in NLP.
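At the heart of that attention mechanism is a simple calculation: each token is projected into query, key, and value vectors, and the dot products between queries and keys determine how much weight each token places on every other token. The sketch below is a minimal, illustrative NumPy version of scaled dot-product self-attention for a single attention head; the array sizes and random weights are stand-ins, not values from any real model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token embeddings X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens into queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # how strongly each token relates to every other
    weights = softmax(scores, axis=-1)          # attention weights for each token sum to 1
    return weights @ V                          # each output is a weighted mix of value vectors

# Toy example: 4 tokens, embedding dimension 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (4, 8)
```

Stacking many such attention heads and layers, along with feed-forward blocks, is what turns this basic operation into a full transformer.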

Researchers found that when transformers are trained on ever-increasing amounts of raw text data, they perform predictably better—a relationship captured in the scaling laws that researchers mapped out in 2020, and one that drives the ongoing demand for ever more Nvidia GPUs. Transformer models can also be easily fine-tuned on narrower, specialized datasets on top of the giant base layer, which is why they’re also referred to as foundation models, though that term is not limited to transformers. Another technique that made LLMs possible was the widespread use of unsupervised and self-supervised learning—training on unlabeled raw data—which allowed transformer models to learn from huge troves of text without labor-intensive labeling. Modern-day LLMs can consist of billions to more than a trillion weights, biases, and other parameters spread across their neural network layers.
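Self-supervised learning works because the training labels come from the text itself: the model simply learns to predict each next token from the tokens before it. The PyTorch sketch below shows that objective with a toy stand-in model; the random token IDs, tiny vocabulary, and single linear head are placeholders for the tokenized web-scale corpora and stacked transformer blocks a real LLM would use.

```python
import torch
import torch.nn.functional as F

# Toy "raw" data: random token IDs standing in for tokenized, unlabeled text.
vocab_size, d_model = 100, 32
tokens = torch.randint(0, vocab_size, (1, 16))   # (batch, sequence length)

# Stand-in model: embedding + linear head. A real LLM would put a deep stack
# of transformer layers between these two pieces.
embed = torch.nn.Embedding(vocab_size, d_model)
head = torch.nn.Linear(d_model, vocab_size)

# Self-supervised objective: predict token t+1 from the tokens up to t.
logits = head(embed(tokens[:, :-1]))             # the model's guess at each next token
targets = tokens[:, 1:]                          # the "labels" are just the text shifted by one
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())                               # minimizing this loss over huge corpora is pretraining
```

Fine-tuning a foundation model follows the same recipe, continued on a smaller, specialized dataset.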

The evolution of LLMs

While Google researchers pioneered transformers with that landmark “Attention Is All You Need” paper in 2017, it was OpenAI that garnered the most early attention for its work with the architecture. OpenAI grabbed headlines in particular when it proclaimed in 2019 that its new Generative Pre-trained Transformer 2 (GPT-2) was too advanced to release publicly, for fear of its falling into the wrong hands (OpenAI later opted for a staggered release). GPT-3 scaled up GPT-2’s size and text-generating capabilities even further, and OpenAI released it only through an API, a departure from its “open” roots. Google had its own transformer model during this period, BERT, but it was geared more toward understanding text than generating it. It wasn’t until the release of ChatGPT in late 2022 that LLMs and transformers vaulted into mainstream public consciousness.