We’ve said it several times in our blogs — it’s tough knowing what’s real and what’s fake out there. And that’s absolutely the case with AI audio deepfakes online.
Bad actors of all stripes have found out just how easy, inexpensive, and downright uncanny AI audio deepfakes can be. With only a few minutes of original audio, seconds even, they can cook up phony audio that sounds like the genuine article — and wreak all kinds of havoc with it.
A few high-profile cases in point, each politically motivated in an election year where the world will see more than 60 national elections:
- In January, thousands of U.S. voters in New Hampshire received an AI robocall that impersonated President Joe Biden, urging them not to vote in the primary.
- In the UK, more than 100 deepfake social media ads impersonated Prime Minister Rishi Sunak on the Meta platform last December.i
- Similarly, the 2023 parliamentary elections in Slovakia spawned deepfake audio clips that featured false proposals for rigging votes and raising the price of beer.ii
Yet deepfakes have targeted more than election candidates. Other public figures have found themselves attacked as well. One example comes from Baltimore County in Maryland, where a high school principal has allegedly fallen victim to a deepfake attack.
It involves an offensive audio clip that resembles the principal’s voice which was posted on social media, news of which spread rapidly online. The school’s union has since stated that the clip was an AI deepfake, and an investigation is ongoing.iii In the wake of the attack, at least one expert in the field of AI deepfakes said that the clip is likely a deepfake, citing “distinct signs of digital splicing; this may be the result of several individual clips being synthesized separately and then combined.”iv
And right there is the issue. It takes expert analysis to clinically detect if an audio clip is an AI deepfake.
What makes audio deepfakes so hard to spot?
Audio deepfakes give off far fewer clues, as compared to the relatively easier-to-spot video deepfakes out there. Currently, video deepfakes typically give off several clues, like poorly rendered hands and fingers, off-kilter lighting and reflections, a deadness to the eyes, and poor lip-syncing. Clearly, audio deepfakes don’t suffer any of those issues. That indeed makes them tough to spot.
The implications of AI audio deepfakes online present themselves rather quickly. In a time where general awareness of AI audio deepfakes lags behind the availability and low cost of deepfake tools, people are more prone to believe an audio clip is real. Until “at home” AI detection tools become available to everyday people, skepticism is called for.
Just as “seeing isn’t always believing” on the internet, we can “hearing isn’t always believing” on the internet as well.
How to spot audio deepfakes.
The people behind these attacks have an aim in mind. Whether it’s to spread disinformation, ruin a person’s reputation, or conduct some manner of scam, audio deepfakes look to do harm. In fact, that intent to harm is one of the signs of an audio deepfake, among several others.
Listen to what’s actually being said. In many cases, bad actors create AI audio deepfakes designed to build strife, deepen divisions, or push outrageous lies. It’s an age-old tactic. By playing on people’s emotions, they ensure that people will spread the message in the heat of the moment. Is a political candidate asking you not to vote? Is a well-known public figure “caught” uttering malicious speech? Is Taylor Swift offering you free cookware? While not an outright sign of an AI audio deepfake alone, it’s certainly a sign that you should verify the source before drawing any quick conclusions. And certainly before sharing the clip.
Think of the person speaking. If you’ve heard them speak before, does this sound like them? Specifically, does their pattern of speech ring true or does it pause in places it typically doesn’t … or speak more quickly and slowly than usual? AI audio deepfakes might not always capture these nuances.
Listen to their language. What kind of words are they saying? Are they using vocabulary and turns of phrase they usually don’t? An AI can duplicate a person’s voice, yet it can’t duplicate their style. A bad actor still must write the “script” for the deepfake, and the phrasing they use might not sound like the target.
Keep an ear out for edits. Some deepfakes stitch audio together. AI audio tools tend to work better with shorter clips, rather than feeding them one long script. Once again, this can introduce pauses that sound off in some way and ultimately affect the way the target of the deepfake sounds.
Is the person breathing? Another marker of a possible fake is when the speaker doesn’t appear to breathe. AI tools don’t always account for this natural part of speech. It’s subtle, yet when you know to listen for it, you’ll notice it when a person doesn’t pause for breath.
Living in a world of AI audio deepfakes.
It’s upon us. Without alarmism, we should all take note that not everything we see, and now hear, on the internet is true. The advent of easy, inexpensive AI tools has made that a simple fact.
The challenge that presents us is this — it’s largely up to us as individuals to sniff out a fake. Yet again, it comes down to our personal sense of internet street smarts. That includes a basic understanding of AI deepfake technology, what it’s capable of, and how fraudsters and bad actors put it to use. Plus, a healthy dose of level-headed skepticism. Both now in this election year and moving forward.
[iii] https://www.baltimoresun.com/2024/01/17/pikesville-principal-alleged-recording/
[iv] https://www.scientificamerican.com/article/ai-audio-deepfakes-are-quickly-outpacing-detection/