A visitor to The Dalí Museum in St Petersburg, Florida, presses a doorbell beside a dark, life-sized screen. A shadowy figure in a dapper suit, sporting a pencil moustache, slowly leaves his easel and walks toward her into the light.
It is, of course, Salvador Dalí, who looks at the visitor and speaks about his art and his museum. When the visitor is about to leave, he appears again. He asks if she would like a picture, then whips out his mobile phone and takes a selfie with her.
The fascinating thing about this encounter is that it isn't actually Dalí at all. How is it possible that the great Spanish surrealist can interact with members of the public long after his death, even using a phone that didn't exist when he was alive? Welcome to the world of deepfakes, an unsettling technology with a high potential to deceive, and also some unexpectedly positive uses.
Deepfakes are a powerful new technique for creating realistic but fake video or audio content. To bring Dalí to life, the museum used deep learning to seamlessly replace the face of a living actor, dressed and behaving like Dalí, with a digitally generated image of the artist's face and expressions.
This involves a "training process" in which advanced machine-learning algorithms sift through footage of Dalí and the actor and learn to generate new, realistic facial images of both men. The system also learns to take an existing image of either man and generate an image of the other that closely matches the facial expression and head position of the first.
This makes it possible to generate Dalí faces that match the actor's movements, which are then automatically inserted into the new video, creating the illusion of Dalí himself.
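The museum's exact architecture isn't described here. A common design in open-source face-swap tools pairs one shared encoder with a separate decoder per person: both people's faces are compressed into the same kind of "expression and pose" code, and the swap consists of encoding a frame of the actor and decoding it with Dalí's decoder. The sketch below is a deliberately tiny, linear, pure-Python version of that idea, assuming made-up 4-number vectors in place of images; names such as `sample_face`, `decode_as` and `RENDER` are hypothetical. It illustrates the training loop and the swap step, not realistic output: in a toy this small the swapped vector need not be a faithful "Dalí" version of the actor's pose, and real systems use deep convolutional networks trained on large amounts of footage.

```python
import random

random.seed(0)

DIM, LATENT = 4, 2  # toy "image" length and "expression/pose" code length (assumed)


def rand_matrix(rows, cols, scale):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Hypothetical stand-in for footage: each person's "face" is a person-specific
# linear rendering of a random 2-number expression/pose vector.
RENDER = {"actor": rand_matrix(DIM, LATENT, 1.0), "dali": rand_matrix(DIM, LATENT, 1.0)}


def sample_face(person):
    pose = [random.uniform(-1.0, 1.0) for _ in range(LATENT)]
    return matvec(RENDER[person], pose)


encoder = rand_matrix(LATENT, DIM, 0.5)                        # shared by both people
decoders = {p: rand_matrix(DIM, LATENT, 0.5) for p in RENDER}  # one per person


def decode_as(face, person):
    """Encode any face with the shared encoder, then decode it as `person`."""
    return matvec(decoders[person], matvec(encoder, face))


def sgd_step(face, person, lr=0.05):
    """One gradient step on 0.5 * ||decode_as(face, person) - face||^2."""
    dec = decoders[person]
    code = matvec(encoder, face)
    err = [o - x for o, x in zip(matvec(dec, code), face)]
    # Gradient with respect to the code, taken before the decoder is updated.
    grad_code = [sum(dec[i][j] * err[i] for i in range(DIM)) for j in range(LATENT)]
    for i in range(DIM):
        for j in range(LATENT):
            dec[i][j] -= lr * err[i] * code[j]
    for j in range(LATENT):
        for k in range(DIM):
            encoder[j][k] -= lr * grad_code[j] * face[k]


def mean_loss(person, n=200):
    faces = [sample_face(person) for _ in range(n)]
    return sum(0.5 * sum((o - x) ** 2 for o, x in zip(decode_as(f, person), f))
               for f in faces) / n


before = {p: mean_loss(p) for p in RENDER}
for step in range(6000):
    person = "actor" if step % 2 == 0 else "dali"  # alternate between the two people
    sgd_step(sample_face(person), person)
after = {p: mean_loss(p) for p in RENDER}

# The swap: encode a frame of the actor, then decode it with Dalí's decoder.
actor_frame = sample_face("actor")
fake_dali_frame = decode_as(actor_frame, "dali")
print("reconstruction loss before:", before)
print("reconstruction loss after: ", after)
```

Training the single encoder on both people's frames is the design choice that makes the swap possible at all: with a separate encoder per person, Dalí's decoder would only ever have seen codes produced from Dalí footage.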
To date, most deepfake producers have exploited the technology's dark side. Examples range from satire, such as an April Fool's Day clip showing Mark Zuckerberg announcing that he is deleting Facebook; to reputation-damaging footage of Hollywood stars supposedly appearing in porn films; to fraud, such as mimicking a chief executive's voice to request the transfer of a large sum of money.
The risks from deepfakes are undeniable. Yet the Dalí example shows that this technology cannot be judged in black-and-white terms. In our research, we group deepfakes into five categories: voice swapping, text-to-speech, video face swapping, full-body puppetry and lip-synching. In each category, we see clear business opportunities. Some have yet to materialise; others are already being realised.
1. Ventriloquism 2.0
Voice swapping can change a person's voice or make it imitate someone else's. The voice can be made to sound younger or older, male or female, or given a different dialect or accent. Possible uses include an audiobook narrator speaking in the voices of different characters, or a famous person narrating a book without having to go to the trouble of reading out the entire story.
It also opens up fascinating possibilities for virtual assistants like Siri. Rather than recording voice actors of different accents and genders, developers could use voice swapping to produce all of these from a single recorded voice. Does anyone feel a blockbuster app coming on?
2. Giving voices back
It has been possible for many years to make a computer speak by typing text into an application. Deepfake technology can now do this in a particular person's voice, even when that person has never recorded the words in question. This is becoming a life-changing technology for people who have lost the ability to speak intelligibly, such as those who have had a stroke or who have a progressive disease like amyotrophic lateral sclerosis.
Other possible uses of this text-to-speech technology include correcting misspoken words in a voiceover, rather than having to get the speaker to record it again.
3. ‘Are you talking to me?’
As we saw with the Dalí example, video face swapping can replace the face of one person in a video with the face of someone else. This has great potential in the movies.
For instance, a professional deepfake artist has demonstrated how techniques similar to those used by The Dalí Museum could have de-aged Robert De Niro in The Irishman, instead of the expensive and time-consuming CGI that helped drive the movie's total production cost to US$175 million (£135 million). The artist's demonstration suggests that deepfake technology can achieve comparable quality. Another possible use of this technology is more lifelike stunt doubles.
4. Game on
Video full-body puppetry can transpose movement from one person's body to another's. Possible uses include more immersive video games in which players can insert themselves into the action, with their own gait and movement characteristics; and movies in which actors who can't dance appear to do so, using footage of professional dancers.
5. Subtitles RIP
Audio and video lip-synching can change mouth movements and spoken words in a video. It will soon be possible to make cost-effective, high-quality translations of movies, TV shows and other videos. A trained algorithm would imitate the original actor’s voice but in a different language, with the lip movement in sync with the new words.
So while deepfakes clearly can be, and are being, used to cause harm, the same deep-learning technology is also opening up many innovative business applications. Many creative and productive possibilities are becoming apparent, and there are no doubt many more that nobody has spotted yet.