Anthropic’s Claude 3 models look to edge out rivals in AI race

The releases are part of a new phase of incremental improvements.

Another week, another family of AI models that claim to have a slight edge over leading systems.

This time, Anthropic has sought to one-up competitors with a new family of large language models (LLMs) under the Claude 3 banner—Opus, Sonnet, and Haiku, ordered from largest to smallest. (Opus and Sonnet debuted immediately, Anthropic said, while “Haiku will be available soon.”)

The fast-growing startup claims Opus can outperform OpenAI’s GPT-4 and Google’s Gemini across a slew of industry benchmarks. It’s also the first Claude upgrade to offer multimodal capability, meaning, in this case, that it can digest both text and photos.

Meanwhile, another AI up-and-comer, Inflection, revealed the latest version of its own digital assistant, which it claims “approaches GPT-4” in terms of performance, with only 40% of the computing power required to train it.

The new releases come amid a stretch of incremental improvement in the race to own the LLM industry. Google, Mistral, and now Anthropic have each rolled out models in recent weeks that claim to perform at around the same level as, or marginally edge out, some of the others.

Context overload: One of Anthropic’s selling points for Claude 3 is its larger context window, or the amount of information a model can take in and recall later. The Claude 3 models can effectively hold a conversation of 200,000 tokens—or units of data—off the bat, and up to 1 million for “select customers who need enhanced processing power,” according to a company blog post. For context, 1 million tokens is about three times the length of the novel Moby Dick, according to OpenAI’s Tokenizer.
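To make those figures concrete, here is a minimal sketch of how such estimates are typically derived, using the common rule of thumb of roughly four characters per token for English text. This is only a back-of-envelope heuristic, not Anthropic’s or OpenAI’s actual tokenizer, and the assumed character count for the novel is approximate:

```python
# Back-of-envelope token estimate using the rough ~4 characters/token
# heuristic for English text. Real counts depend on the model's own
# tokenizer and will differ from this approximation.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the token count of a string from its character length."""
    return round(len(text) / chars_per_token)

# Moby Dick runs to very roughly 1.2 million characters (an assumed,
# approximate figure), so the heuristic puts it near 300,000 tokens --
# and three novels of that size land in the neighborhood of the
# 1 million-token window Anthropic describes.
novel = "x" * 1_200_000  # stand-in for the full text of the novel
print(estimate_tokens(novel))
```

Exact numbers require running the text through the specific model’s tokenizer, which is why the article’s comparison cites OpenAI’s Tokenizer tool rather than a character count.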

Forrester Research analyst Zeid Khater said an elongated context window could be useful for companies looking to do things like create synthetic data for a market research project, or train a virtual assistant that needs to retain a large memory.

“Companies…who have a need for a huge context window do tend to float toward Claude as one of many options, but it’s usually between them and GPT-4,” Khater told Tech Brew.

But Jimmy Lin, a University of Waterloo computer science professor and co-director of the Waterloo AI Institute, said the ever-expanding context windows are also about “a little bit of showmanship.”

“That’s not actually how you use these LLMs,” Lin told Tech Brew. “You don’t just take all of War and Peace, copy and paste, and stuff it into the prompt, and ask questions about it.”

Benchmark doubts: Some experts are also beginning to question whether the specific benchmarks used to gauge the progress of these models, which cover areas like reasoning, knowledge, and math skills, are really the best way to measure new generations of all-purpose assistants.

“These days, I totally discount it because it’s so easily gamable…You can easily cheat your way into a leaderboard without actually making the model better,” Lin said, adding that “It ultimately boils down to real-world use cases…hallucinations, summarization, and accuracy really matter. Everything else is essentially bullshit.”
