Protecting Deep Learning Systems from Adversarial Attacks
Every day, neural networks are being integrated into a wide range of high-impact systems, from self-driving cars to biomedical screening and diagnosis. This raises an extremely important question: just how secure are these systems from attack?
The unfortunate answer is that they are quite vulnerable. Researchers at Georgia Tech and Intel have recently demonstrated how attackers can trick computer vision systems (i.e., neural networks) so they see things that don’t actually exist. This has serious implications for self-driving cars and other safety-critical systems, where human life is at stake.
To preempt these attacks, Georgia Tech’s Polo Club of Data Science and Intel are working to defend deep learning systems from adversarial attacks through DARPA’s Guaranteeing Artificial Intelligence Robustness against Deception (GARD) program.
Our research is an attempt to detect adversarial attacks in real time, before an attacker can cause significant damage. Currently, these deep learning systems do not distinguish objects in the way that humans do. For example, when humans see a bicycle, we see its handlebar, frame, wheels, saddle, and pedals (Figure 2, top). Through our visual perception and cognition, we synthesize these detection results with our knowledge to determine that we are actually seeing a bicycle.
However, when an image of a stop sign or a bicycle is modified to fool a model into misclassifying it as a bird, we humans still see the object’s robust features (e.g., the bicycle’s handlebar). The deep learning system, on the other hand, fails to perceive these robust features and is often tricked into misclassifying the image.
The question is: how do we incorporate this intuitive detection capability, which comes naturally to human beings, into deep learning models to protect them from harm?
Defending Deep Learning Systems using UnMask
We propose the simple, yet effective idea that robust feature alignment offers a powerful, explainable and practical method of detecting and defending against adversarial perturbations in deep learning models. A significant advantage of our proposed concept is that while an attacker may be able to manipulate the class label by subtly changing the object, it is much more challenging to simultaneously manipulate all the individual features that jointly compose the image. We demonstrate that by adapting an object detector, we can effectively extract higher-level robust features contained in images to detect and defend against adversarial perturbations.
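To make the idea of robust feature alignment concrete, here is a minimal sketch, not the UnMask implementation. It assumes an object detector has already returned a set of part names for the image, and it compares them with the parts expected for the predicted class. The class-to-feature table, the use of Jaccard overlap as the alignment score, the 0.5 threshold, and the function names are all assumptions made for illustration.

```python
# Hypothetical table of the robust features expected for each class.
# The classes, part names, and threshold below are illustrative assumptions.
EXPECTED_FEATURES = {
    "bicycle": {"handlebar", "frame", "wheel", "saddle", "pedal"},
    "bird": {"beak", "wing", "tail", "leg", "head"},
    "car": {"wheel", "door", "headlight", "windshield", "mirror"},
}


def jaccard(a, b):
    """Overlap between two feature sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def check_alignment(predicted_class, extracted_features, threshold=0.5):
    """Return (is_aligned, score) for a predicted label and detected parts.

    A low score means the detected parts do not match what the predicted
    class should contain, so the image is flagged as possibly adversarial.
    """
    expected = EXPECTED_FEATURES[predicted_class]
    score = jaccard(expected, set(extracted_features))
    return score >= threshold, score


def closest_class(extracted_features):
    """Pick the class whose expected features best match the detected parts."""
    extracted = set(extracted_features)
    return max(
        EXPECTED_FEATURES,
        key=lambda name: jaccard(EXPECTED_FEATURES[name], extracted),
    )


# Parts a detector might report for a perturbed bicycle image (assumed input).
detected = {"handlebar", "frame", "wheel", "saddle"}

# The attacked classifier says "bird", but none of the bird parts are present.
aligned, score = check_alignment("bird", detected)
if not aligned:
    corrected = closest_class(detected)
```

In this toy case the label "bird" shares no parts with the detected features, so the image is flagged, and the best-matching class, "bicycle", is proposed instead. This mirrors the intuition above: changing the label is easier for an attacker than changing every part the detector sees.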
Through extensive evaluation, we demonstrate that our proposed defense system, UnMask, can effectively detect adversarial images and defend against attacks by rectifying misclassifications. As seen in Figure 3 below, UnMask (UM) performs 31.18% better than one of the leading defense techniques, adversarial training (AT), and 74.44% better than no defense (None).