News headlines might not be the only things that are fake in the future.
Powerful machine-learning techniques (see “The Dark Secret at the Heart of AI”) are making it increasingly easy to manipulate or generate realistic video and audio, and to impersonate anyone you want with amazing accuracy.
A smartphone app called FaceApp, released recently by a company based in Russia, can automatically modify someone’s face to add a smile, add or subtract years, or swap genders. The app can also apply “beautifying” effects that include smoothing out wrinkles and, more controversially, lightening the skin.
And last week a company called Lyrebird, which was spun out of the University of Montreal, demonstrated technology that it says can be used to impersonate another person’s voice. The company posted demonstration clips of Barack Obama, Donald Trump, and Hillary Clinton all endorsing the technology.
These are just two examples of how the most powerful AI algorithms can be used for generating content rather than simply analyzing data.
Powerful graphics hardware and software, as well as new video-capture technologies, are also driving this trend. Last year researchers at Stanford University demonstrated a face-swapping program called Face2Face. This system can manipulate video footage so that a person’s facial expressions match those of someone being tracked using a depth-sensing camera. The result is often eerily realistic.
The ability to manipulate voices and faces so realistically could raise a number of issues, as the creators of Lyrebird acknowledge.
“Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries,” reads an ethics statement posted to the company’s website. “Our technology questions the validity of such evidence as it allows to easily manipulate audio recordings. This could potentially have dangerous consequences.”
Both FaceApp and Lyrebird use deep generative convolutional networks to enable these tricks. This means the companies are applying a technique that has emerged in recent years as a way of getting algorithms to go beyond just learning to classify things and to generate plausible data of their own.
Like many tasks in artificial intelligence today, this involves using very large, or deep, neural networks. Such networks are normally fed training data and tweaked so that they respond in the desired way to new input. For example, they can be trained to recognize faces or objects in images with amazing accuracy.
But the same networks can then be made to generate their own data based on what they were able to internalize about the data set they were trained on.
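To make the train-then-generate idea concrete, here is a deliberately tiny sketch in Python. It is not a neural network and not what FaceApp or Lyrebird use; it swaps in a simple character-pair counting model as a stand-in. The training words and the function names `train` and `generate` are made up for illustration. The point is only the two-step pattern: first internalize statistics from training data, then sample new data from them.

```python
import random
from collections import defaultdict

def train(words):
    """Count which character follows which across the training words."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in words:
        chars = ["^"] + list(word) + ["$"]  # "^" marks the start, "$" the end
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, rng, max_len=12):
    """Sample a new word one character at a time from the learned counts."""
    out, prev = [], "^"
    while len(out) < max_len:
        options = counts[prev]
        # Pick the next character in proportion to how often it followed prev.
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == "$":
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

# Hypothetical training set, chosen only for illustration.
training_words = ["obama", "osaka", "omaha", "bahama", "alabama", "samba"]
model = train(training_words)

rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [generate(model, rng) for _ in range(5)]
print(samples)
```

The generated strings are not copies of the training words, but every pair of adjacent letters in them was seen during training. Deep generative networks do something loosely analogous with vastly richer statistics, which is why their output can pass for the real thing.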
It is possible to train such a network to generate images from scratch that look almost like the real thing. In the future, using the same techniques, it may become a lot easier to manipulate video, too. “At some point it’s likely that generating whole videos with neural nets will become possible,” says Alexandre de Brébisson, a cofounder of Lyrebird. “It’s more challenging because there is a lot of variability in the high dimensional space representing videos, and current models for it are still not perfect.”
Given the technologies that are now emerging, it may become increasingly important to be able to detect fake video and audio.
Justus Thies, a doctoral student at Friedrich Alexander University in Germany and one of the researchers behind Face2Face, the real-time face-swapping app, says he has started a project aimed at detecting manipulation of video. “Intermediate results look promising,” he says.