Protecting Deep Learning Systems from Adversarial Attacks

Scott Freitas
Nov 25, 2020

Every day, neural networks are being integrated into a wide range of high-impact systems, from self-driving cars to biomedical screening and diagnosis. This raises an extremely important question: just how secure are these systems from attack?

The unfortunate answer is that they are quite vulnerable. Researchers at Georgia Tech and Intel have recently demonstrated how attackers can trick computer vision systems (i.e., neural networks) into seeing things that don’t actually exist. This has serious implications for self-driving cars and other safety-critical systems, where human life is at stake.

Figure 1. Using the ShapeShifter attack, researchers at Georgia Tech and Intel have shown how vulnerable self-driving cars’ computer vision systems are to attack.

To preempt these attacks, Georgia Tech’s Polo Club of Data Science and Intel are working to defend deep learning systems from adversarial attacks through DARPA’s Guaranteeing Artificial Intelligence Robustness against Deception (GARD) program.

Our research aims to detect adversarial attacks in real time, before an attacker can cause significant damage. Currently, these deep learning systems do not distinguish objects the way humans do. For example, when humans see a bicycle, we see its handlebar, frame, wheels, saddle, and pedals (Figure 2, top). Through our visual perception and cognition, we synthesize these detection results with our knowledge to determine that we are actually seeing a bicycle.

Figure 2. UnMask combats adversarial attacks (in red) by extracting robust features from an image (“Bicycle” at top), and comparing them to expected features of the classification (“Bird” at bottom) from the unprotected model. Low feature overlap signals an attack.

However, when a stop sign or a bicycle is modified to fool a model into misclassifying it as a bird, humans still see the bicycle’s robust features (e.g., the handlebar). The deep learning system, on the other hand, fails to perceive these robust features and is often tricked into misclassifying the image.

The question is: how do we incorporate this detection capability, which comes naturally to human beings, into deep learning models to protect them from harm?

Defending Deep Learning Systems using UnMask

We propose the simple, yet effective idea that robust feature alignment offers a powerful, explainable and practical method of detecting and defending against adversarial perturbations in deep learning models. A significant advantage of our proposed concept is that while an attacker may be able to manipulate the class label by subtly changing the object, it is much more challenging to simultaneously manipulate all the individual features that jointly compose the image. We demonstrate that by adapting an object detector, we can effectively extract higher-level robust features contained in images to detect and defend against adversarial perturbations.
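To make the idea concrete, here is a minimal sketch of robust feature alignment, under the assumption that an object detector has already extracted a set of part-level features from the image. The feature sets, the `EXPECTED_FEATURES` table, the overlap threshold, and the function names are all hypothetical illustrations; the actual UnMask architecture and feature classes are described in the paper.

```python
# Hypothetical sketch of robust feature alignment: compare the
# features extracted from an image against the features expected
# for the model's predicted class. Low overlap signals an attack.

# Illustrative expected-feature table (not the paper's actual classes).
EXPECTED_FEATURES = {
    "bicycle": {"handlebar", "frame", "wheel", "saddle", "pedal"},
    "bird": {"beak", "wing", "tail", "leg", "eye"},
}

def jaccard(a, b):
    """Overlap between two feature sets (intersection over union)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def unmask_check(extracted_features, predicted_class, threshold=0.5):
    """Flag an attack when the extracted features poorly match the
    predicted class, and rectify by picking the best-aligned class.
    The threshold of 0.5 is an arbitrary placeholder."""
    score = jaccard(extracted_features, EXPECTED_FEATURES[predicted_class])
    is_attack = score < threshold
    best_class = max(
        EXPECTED_FEATURES,
        key=lambda c: jaccard(extracted_features, EXPECTED_FEATURES[c]),
    )
    return is_attack, best_class
```

For instance, if the detector finds a handlebar, frame, and wheel in an image the unprotected model labels “bird,” the overlap with the expected bird features is zero, so the input is flagged and rectified to “bicycle.” The key point the sketch captures is that an attacker who flips the class label must also fake every one of these part-level features to evade detection.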

Through extensive evaluation, we demonstrate that our proposed defense system, UnMask, can effectively detect adversarial images and defend against attacks by rectifying misclassifications. As seen in Figure 3 below, UnMask (UM) performs 31.18% better than one of the leading defense techniques, adversarial training (AT), and 74.44% better than no defense (None).

Figure 3. Across multiple experiments, UnMask (UM) can protect deep learning systems 31.18% better than adversarial training (AT) and 74.44% better than no defense (None).

Want to read more?

While we aren’t able to cover everything in this blog post, the interested reader can learn more about UnMask through our IEEE Big Data ’20 paper on arXiv or check out the code on GitHub.



Scott Freitas

PhD student @ Georgia Tech. I work at the intersection of applied and theoretical machine learning.