Are AI agents the next big thing in the generative race?

The nascent concept comes with a whole slew of new safety risks.

Last year, researchers from Google and Stanford University populated a virtual village with about two dozen ChatGPT-powered characters. The Sims-like personas could remember their roles, go on dates, and challenge each other to competitions; they even coordinated to throw a Valentine’s Day party.

Believe it or not, this sleepy little AI town held some clues about a new stage in the generative AI hype cycle. Tech giants and startups alike are increasingly thinking about how large language models (LLMs) can move beyond chatbots and into autonomous “agents” that perform tasks of their own accord. More than just answering questions and proffering information, this new crop of systems taps LLMs to actually complete multi-step actions, from developing software to booking flights.

While the tech is still relatively nascent, it’s progressed in the past year. Google DeepMind recently unveiled an AI agent called SIMA that was trained on 3D video games, including something called Goat Simulator 3, to handle around 600 skills, from navigating the games to opening menus, demonstrating what the research arm called “the potential to develop a new wave of generalist, language-driven AI agents.”

Another company called Cognition AI grabbed tech workers’ attention with its claim that it had created “the world’s first fully autonomous AI software engineer,” Devin. Beyond just generating snippets of code, the startup says Devin can plan and deploy programs throughout the entire development process.

Big Tech abuzz

The Information reported last week that Microsoft, OpenAI, and Google DeepMind are all readying AI agents designed to automate more difficult multi-step tasks, for both enterprise undertakings, like recording sales transactions and compiling presentations, and consumer functions, like vacation research and booking travel accommodations.

Mike Gozzo, chief product officer at customer service AI company Ada, said he’s seen a shift from a nearly singular industry focus on retrieval-augmented generation (RAG)—LLMs that can navigate and interact with big databases of information—to autonomous agents in recent months. He said Ada has been working with agents since around the time the tech first began to pique developer interest last year, when Auto-GPT debuted.
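The shift Gozzo describes can be sketched in a few lines: a RAG system retrieves and answers once, while an agent loops through plan, act, and observe steps until a goal is met. This is a rough illustration only; the function names and stubbed “tools” below are hypothetical stand-ins for real LLM and API calls, not any vendor’s actual interface.

```python
def rag_answer(question, documents):
    """RAG pattern: retrieve relevant text, answer once, stop."""
    retrieved = [d for d in documents if question.split()[0].lower() in d.lower()]
    return f"Answer based on {len(retrieved)} retrieved document(s)."

def agent_run(goal, tools, max_steps=5):
    """Agent pattern: loop plan -> act -> observe until the goal is met."""
    history = []
    for _ in range(max_steps):
        # Stubbed "planning": a real agent would ask an LLM to pick the action.
        action = "search" if not history else "book"
        observation = tools[action](goal)      # act by calling a tool
        history.append((action, observation))  # observe and remember
        if action == "book":                   # goal reached, stop looping
            break
    return history

# Hypothetical tools an agent might be given for a travel-booking task.
tools = {
    "search": lambda g: f"found 3 flight options for: {g}",
    "book":   lambda g: f"booked cheapest flight for: {g}",
}
steps = agent_run("fly NYC to SFO", tools)
```

The point of the sketch is the control flow, not the stubs: the RAG function returns after one retrieval, while the agent keeps acting on its own intermediate results, which is also where the consistency and guardrail risks discussed below come from.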

“I’ve got this meeting with [OpenAI CEO] Sam [Altman] coming up later today. I was in to see [Microsoft CEO] Satya [Nadella] a few weeks ago. And like, that’s all anyone is talking about is how do we get to autonomy in these systems?” Gozzo said. “And the reason why it’s even relevant at all is that, far beyond customer service, I think there’s going to be a trend in all software to move through this autonomous agent-style workflow.”

A whole new set of risks?

There are, of course, safety concerns when it comes to signing over tasks to automated systems, especially for companies still grappling with the idea of giving up control to hallucination-prone chatbots.


Reece Hayden, senior analyst at ABI Research, said that in addition to the risks already inherent to generative AI—e.g., hallucinations, bias—agents could also drift in response consistency as they evolve based on their own interactions, compound the risk of fabrications as they network with one another, and introduce latency because of the complex processes agents require.

“The risk with AI agents [is that] you provide them with the autonomy to iterate…change their responses, react to feedback, and all of those different elements. That brings in more challenges around response inconsistency,” Hayden said. “Guardrails for AI agents, who have a much wider understanding, who have a much wider reach across an organization, who can pull from different data sets, who can perform iterative tasks, are much more difficult to implement.”

A paper from Google DeepMind published last week outlines a host of potential ethical dilemmas around AI agents. The paper discusses the risks of anthropomorphism as AI becomes more humanlike and personalized, which could “make people vulnerable to inappropriate influence by the technology.” It also touches on the potential for a wider spread of misinformation.

“AI assistants could radically alter the nature of work, education, and creative pursuits, as well as how we communicate, coordinate, and negotiate with one another, ultimately influencing who we want to be and to become,” the authors wrote in the paper.

Still, even as Big Tech companies and startups pour resources into developing these systems, widespread deployment is likely a ways off, according to Hayden. Most businesses remain “stuck in the proof-of-concept phase” when it comes to deploying any kind of customer-facing generative AI, he said.

“In the longer term—three to five years even from now—that’s when I would expect to see a wide usage of automated AI agents to be running end-to-end operations,” Hayden said.

Correction 04/30/24: This piece has been updated to correct a quote from Mike Gozzo and clarify a description of the Auto-GPT rollout.
