We tried out Anthropic’s Claude 2. Here’s how it stacks up against ChatGPT

The startup claims the latest release has better safety guardrails.

July 13, 2023


A new AI chatbot just dropped.

This week, Anthropic introduced the latest version of its chatbot, Claude. Founded by former members of OpenAI, Anthropic has raised more than $1 billion—with plans to rake in more—to develop a large language model (LLM) to rival ChatGPT.

The startup announced in a blog post that the beta version of Claude 2 gives lengthier responses and offers "improved performance" compared with the first iteration of the chatbot, which was released in March. It claims Claude 2 scored higher on the multiple-choice section of the bar exam (76.5%, compared with Claude 1.3's 73%) and above the 90th percentile on the reading and writing portions of the GRE.

Anthropic also now allows users to enter hundreds of pages of input, making it more efficient to summarize complex documents. Anthropic claims the latest release has better safety guardrails: according to the company's blog post, Claude 2 is supposedly twice as good at giving harmless responses as the previous version.

Claude’s “beta chat experience” is available via a public-facing website and an API in the US and the UK.

Its debut comes as tech giants and startups alike have raced to release language-generating AI models in the wake of OpenAI’s viral unveiling of ChatGPT last November. Since then, companies of all kinds have been attempting to build their own applications, and budding companies like Anthropic are getting in on the action with their own massive models trained on huge troves of text.

Like other products of its ilk, Claude’s advertised capabilities include conversation, document summary, and tasks like coding and math. Anthropic has previously partnered with Slack, Zoom, and Google Cloud to integrate versions of Claude into their products.

With that in mind, we put Claude 2 to the test to see how well it stacks up against OpenAI’s ChatGPT.

Writing an essay

Both chatbots answered a prompt about AI and writing in the fairly generic manner characteristic of LLM output, but Claude 2 earned a few extra points for readability and writing style.

Summarization

When asked to summarize the full text of the Declaration of Independence in plain English in under 100 words, Claude came much closer to the word limit (108 words versus ChatGPT's 205) and added historical context clearly drawn from elsewhere in its training data.

Simple math and reasoning explanation

Claude and ChatGPT seemed comparably adept at solving a simple math problem and walking through the reasoning behind it, though Claude explained its steps in slightly more detail.

Overall, Claude 2 seems fine for addressing common-sense questions or simple problems, and it can adequately summarize a document. But when it comes to more qualitative queries, the chatbot’s writing suffers from the same dreadful tics that many broadly trained models (and high school seniors) do: repetitive word choices, generalizations, and rambling answers.

Keep up with the innovative tech transforming business

Tech Brew keeps business leaders up-to-date on the latest innovations, automation advances, policy shifts, and more, so they can make informed decisions about tech.