With rapidly improving AI, voice cloning has only gotten more convincing. That’s a problem

Voice calls imitating public figures or even relatives can have disastrous consequences.

February 7, 2024

When some New Hampshire voters answered the phone in January, they heard a very familiar voice.

“What a bunch of malarkey,” President Joe Biden said, urging residents to “save your vote for the November election” and skip the state’s Jan. 23 presidential primary.

If you’re thinking that doesn’t sound like something Biden would actually say, you’d be right. New Hampshire officials now believe that the robocalls residents received were AI-generated, mimicking the president’s voice in an “unlawful attempt to…suppress New Hampshire voters,” the state AG’s office said in a statement.

It’s the latest high-profile example of the fast-improving AI technology that can churn out convincing audio clones for potentially nefarious purposes. And it’s not only being used to clone the voices of celebrities and public figures: Everyday people could find themselves as the victims, or targets, of a robocall clone campaign.

It’s become enough of a concern that the Federal Communications Commission announced Thursday that it voted to make the use of artificial voices in robocalls illegal.

And last spring, CNN detailed an Arizona mom’s terrifying experience with a call that cloned her daughter’s voice to fake a kidnapping and carry out an extortion plot (which ultimately failed when the teen called her mom and assured her everything was OK).

Tech Brew recently spoke with call platform TNS about how much audio is needed to make a high-quality voice clone and how to fight back against audio-based scams. Common audio sources include voicemail prompt messages and responses to phishing robocalls, according to Greg Bohl, chief data officer of TNS.

“It takes as little as three seconds of your voice to go ahead and duplicate it,” he said. “The ideal situation is a little bit more time…When you start saying, ‘Here’s my address,’ or ‘I didn’t make that appointment,’ when you start getting sentences going, they have you.”

Anatomy of a voice clone

TNS offered to clone my voice to demonstrate how easy it is to do. I said OK—as long as my mom doesn’t think I’m being kidnapped.

We began with a small voice sample—an audio clip simulating how you might interact with a purported robocaller: “Hello? Yes, this is her. No, I’m not interested. Goodbye!”

Using this sample, the sort that could easily be collected from countless consumers, TNS generated this message in my cloned voice.

Bohl told us that the quality of the duplicate usually depends on how much input audio is used, so I sent him a link to this panel event I moderated for Tech Brew to see how it would change the clone.

Here’s what TNS came up with.

Bohl emphasized that voice-cloning capabilities are rapidly improving, including the ability to type text strings in real time for a cloned voice to repeat (enabling live spoofing during calls) and to simulate noisy environments.

“They can bring in the sound of a subway station, an airport,” he said. “You’re at the airport; you’re the executive traveling. You’ve been in a car accident, and you’ve got that street noise going on behind you. It sounds exactly like what’s taking place.”

Another concerning twist? “The scammers have moved into the ability to add accents, and to add in age ranges,” he said.

This could prove particularly useful for those trying to trick someone into believing they’re speaking with someone else, even if they aren’t able to sample a voice from the purported speaker. For example, let’s say my younger sister is living abroad and she’s picked up a French accent. Bohl explained that a bad actor could take my voice, adjust the age range, add an accent, and hope to convince a potential victim that they’re speaking with her instead.

Stay safe out there

Bohl’s No. 1 tip for thwarting audio scams? Don’t use a personalized voicemail message, which can give bad actors easy access to what your voice sounds like.

Instead, “use the automated tool that’s offered on cell phones where you can basically have it do a recording for you,” he said.

In a hyper-online world, however, it’s probably unrealistic to assume that clips of your voice are hard to find. If you’ve ever posted a TikTok video featuring your voice, chatted on Instagram stories, or spoken on a podcast, your voice is out there.

That’s why Bohl encourages people to establish a safe word with family members. Using a verbal password that only you and your loved ones know can help determine whether a suspicious-sounding call for help is actually coming from you, he said.

In the meantime, FCC Chair Jessica Rosenworcel warned that consumers should be vigilant, even if they think they recognize the voice on the other end of the line.

“AI-generated voice cloning and images are already sowing confusion by tricking consumers into thinking scams and frauds are legitimate. No matter what celebrity or politician you favor, or what your relationship is with your kin when they call for help, it is possible we could all be a target of these faked calls,” she said in a Jan. 31 statement.

Update 02/08/24: This piece has been updated to reflect the FCC’s vote to make the use of AI-created voices in robocalls illegal.
