Systems designed to detect deepfakes (videos that manipulate real-life imagery through artificial intelligence) can be fooled, computer scientists showed for the first time at the WACV 2021 conference, held online from January 5 to 9, 2021.
The researchers showed that detectors can be defeated by inserting inputs called adversarial examples into every video frame. Adversarial examples are slightly manipulated inputs that cause artificial intelligence systems, such as machine learning models, to make mistakes. In addition, the team showed that the attack still works after the videos are compressed.
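The idea behind an adversarial example can be illustrated with a toy model. The sketch below is purely illustrative (a linear stand-in rather than a real deepfake detector, with made-up weights): it applies the classic sign-of-gradient perturbation so that a small shift in the input flips the classifier's decision.

```python
# Toy illustration of an adversarial example. A real deepfake detector is a
# deep neural network; the principle is the same: nudge each input value a
# small amount in the direction that flips the prediction.

def detector_score(x, w, b):
    """Linear stand-in for a classifier: score > 0 means 'fake'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def adversarial_example(x, w, b, eps=0.3):
    """Shift x by eps against the gradient of the 'fake' score.
    For a linear model, the gradient with respect to x is just w."""
    return [xi - eps * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]

w, b = [2.0, -1.0, 1.5], 0.0        # hypothetical detector parameters
x_fake = [0.5, 0.5, 0.5]            # correctly flagged as fake (score 1.25)
assert detector_score(x_fake, w, b) > 0

x_adv = adversarial_example(x_fake, w, b)
assert detector_score(x_adv, w, b) < 0   # small perturbation flips the label
```

Against a neural network the same step is computed from the network's actual gradient, but the mechanics of "small input change, large decision change" are identical.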
“Our work shows that attacks on deepfake detectors could be a real-world threat,” said Shehzeen Hussain, a computer engineering Ph.D. student at UC San Diego and first co-author of the WACV paper. “More alarmingly, we demonstrate that it is possible to craft robust adversarial deepfakes even when an adversary is not aware of the inner workings of the machine learning model used by the detector.”
In deepfakes, a subject’s face is modified to create convincingly realistic footage of events that never actually happened. As a result, typical deepfake detectors focus on the face in a video: they first track it, then pass the cropped face data to a neural network that determines whether it is real or fake. For example, eye blinking is not reproduced well in deepfakes, so detectors rely on eye movements as one way to make that determination. State-of-the-art deepfake detectors rely on machine learning models to identify fake videos.
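The two-stage pipeline described above (track and crop the face, then classify the crop) can be sketched as follows. All functions here are hypothetical placeholders, not the actual detector code; a real system would use a face tracker and a CNN classifier such as XceptionNet.

```python
# Sketch of a typical deepfake-detection pipeline with placeholder stages.

def detect_face(frame):
    """Placeholder face tracker: return a face bounding box (x1, y1, x2, y2)."""
    return (10, 10, 74, 74)                 # dummy coordinates

def crop(frame, box):
    """Cut the face region out of the frame (frame is a 2-D list of pixels)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]

def classify_face(face_crop):
    """Placeholder for the neural network: returns P(fake)."""
    return 0.93                             # dummy probability

def is_deepfake(video_frames, threshold=0.5):
    """Flag the video if the average per-frame fake probability is high."""
    probs = [classify_face(crop(f, detect_face(f))) for f in video_frames]
    return sum(probs) / len(probs) > threshold

frames = [[[0] * 100 for _ in range(100)] for _ in range(3)]  # 3 blank frames
assert is_deepfake(frames) is True
```

Because the classifier only ever sees the cropped face, an attacker who perturbs just that region can control the final decision, which is exactly what the researchers exploit.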
XceptionNet, a deepfake detector, labels an adversarial video created by the researchers as real. Credit: University of California San Diego
The widespread availability of fake videos on social media platforms has caused significant concern around the world, particularly because they undermine the credibility of digital media, the researchers stressed. “If the attackers have some knowledge of the detection system, they can design inputs that target and exploit the detector’s blind spots,” said Paarth Neekhara, the paper’s other first co-author and a UC San Diego computer science student.
The researchers created an adversarial example for every face in a video frame. While standard operations such as compressing and resizing videos usually remove adversarial examples from an image, these examples were built to withstand those processes. The attack algorithm estimates, over a set of input transformations, how the model classifies images as real or fake. From there, it uses this estimate to transform images so that the adversarial image remains effective even after compression and decompression.
The modified version of the face is then inserted into every video frame. The process is repeated for all frames in the video to create a deepfake video. The attack can also be applied to detectors that operate on entire video frames rather than just face crops.
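The per-frame procedure above can be sketched in miniature. Everything in this snippet is a toy stand-in, not the authors' implementation: each face is reduced to a single number, and "compression" and "resizing" are simple value distortions. The point it illustrates is optimizing the perturbation until the detector says "real" under every transformation, so the attack survives post-processing.

```python
# Hedged sketch of the compression-robust per-frame attack loop.

def transforms():
    """Toy stand-ins for compression/resizing applied to a face value."""
    return [lambda v: v, lambda v: int(v * 10) / 10, lambda v: v * 0.95]

def fake_score(v):
    """Toy detector on a single face value: score > 0 means 'fake'."""
    return 2.0 * v

def robust_perturb(v, eps=0.25, max_steps=50):
    """Lower the face value until the score is negative for ALL transforms,
    so the perturbation survives compression-like post-processing."""
    for _ in range(max_steps):
        if all(fake_score(t(v)) < 0 for t in transforms()):
            break
        v -= eps
    return v

frame_faces = [0.6, 0.8, 0.7]             # one toy face value per frame
adv_faces = [robust_perturb(v) for v in frame_faces]
# every transformed version of every perturbed face now scores as 'real'
assert all(fake_score(t(v)) < 0 for v in adv_faces for t in transforms())
```

In the real attack, the "face value" is a full pixel crop, the transforms include actual compression and resizing, and the update direction comes from the model's gradients, but the robustness criterion is the same: fool the detector on every transformed copy, not just the original.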
The researchers declined to release their code so it would not be used by hostile parties.
High success rate
The researchers tested their attacks in two scenarios: one where the attackers have complete access to the detector model, including the face-extraction pipeline and the architecture and parameters of the classification model; and one where the attackers can only query the machine learning model to obtain the probabilities of a frame being classified as real or fake.
In the first scenario, the attack’s success rate was above 99 percent for uncompressed videos; for compressed videos, it was 84.96 percent. In the second scenario, the success rate was 86.43 percent for uncompressed and 78.33 percent for compressed videos. This is the first work to demonstrate successful attacks on state-of-the-art deepfake detectors.
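The second scenario is striking because the attacker never sees the model's internals, only its output probabilities. One common way such query-only attacks work (a standard black-box technique, offered here as an illustration rather than a description of the authors' exact method) is to estimate the gradient from queries alone, for example with finite differences:

```python
# Hedged sketch of a black-box attack: the attacker can only query the
# detector for a fake-probability, so the gradient must be estimated from
# queries. 'query_detector' is a hypothetical toy stand-in on a 1-D input.

def query_detector(x):
    """Black box: returns P(fake). Here a sigmoid of 4x, for illustration."""
    return 1.0 / (1.0 + 2.718281828 ** (-4.0 * x))

def estimate_gradient(x, delta=1e-3):
    """Two-point finite-difference gradient estimate using only queries."""
    return (query_detector(x + delta) - query_detector(x - delta)) / (2 * delta)

def black_box_attack(x, eps=0.1, steps=15):
    """Repeatedly step against the estimated gradient to reduce P(fake)."""
    for _ in range(steps):
        g = estimate_gradient(x)
        x -= eps * (1 if g > 0 else -1)
    return x

x_fake = 0.5
assert query_detector(x_fake) > 0.5       # initially flagged as fake
x_adv = black_box_attack(x_fake)
assert query_detector(x_adv) < 0.5        # fooled using queries alone
```

The trade-off is query cost: every gradient estimate requires extra calls to the detector, which is why black-box success rates in the paper are somewhat lower than the white-box ones.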
“To use these deepfake detectors in practice, we argue that it is essential to evaluate them against an adaptive adversary who is aware of these defenses and is intentionally trying to foil those defenses,” the researchers write. “We show that the current state-of-the-art methods for deepfake detection can be easily bypassed if the adversary has complete or even partial knowledge of the detector.”
To improve the detectors, the researchers recommend an approach similar to what is known as adversarial training: during training, an adaptive adversary keeps generating new deepfakes that can bypass the current state of the detector, and the detector keeps improving in order to detect the new deepfakes.
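The adversarial-training loop can be sketched in one dimension. Everything here is illustrative (a "detector" that flags scores above a threshold and an adversary that crafts fakes just under it), not the researchers' training code; it only shows the alternating structure of the recommended defense.

```python
# Minimal sketch of an adversarial-training loop: the adversary adapts to
# the detector, then the detector retrains on the adversary's output.

def adversary(threshold, margin=0.2):
    """Craft a fake whose score slips just below the current threshold."""
    return threshold - margin

def retrain(threshold, evasive_fake, lr=0.5):
    """Move the threshold toward the evasive fake so it is caught next time."""
    return threshold - lr * (threshold - evasive_fake)

threshold, real_score = 1.0, 0.0
for _ in range(5):
    fake = adversary(threshold)           # new deepfake that beats the detector
    threshold = retrain(threshold, fake)  # detector improves in response

assert threshold < 1.0           # the detector tightened over the rounds
assert threshold > real_score    # while still not flagging real videos
```

In practice both sides are neural networks and the loop runs over image batches rather than thresholds, but the key property is the same: the defense is evaluated against an adversary that keeps adapting, not against a fixed set of fakes.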
Adversarial Deepfakes: Evaluating Vulnerability of Deepfake Detectors to Adversarial Examples
* Shehzeen Hussain, Malhar Jere, Farinaz Koushanfar, Department of Electrical and Computer Engineering, UC San Diego
* Paarth Neekhara, Julian McAuley, Department of Computer Science and Engineering, UC San Diego