Adversarial attacks: why are neural networks so easy to fool?

In recent years, as deep learning systems have become more common, scientists have demonstrated how adversarial examples can affect anything from a simple image classifier to cancer diagnosis systems, and can even create life-threatening situations. Despite their danger, however, adversarial examples are poorly understood. And scientists are asking: can this problem be solved?

What is an adversarial attack? It is a way to fool a neural network into producing an incorrect result. Such attacks are mainly used in research, to test how robust models are to non-standard data. But a real-life example might be changing a few pixels in an image of a panda so that the neural network is sure the image shows a gibbon. All the scientists do is add “noise” to the image.

Adversarial attacks: how do you fool a neural network?

New work at MIT points to a possible way to overcome this problem. Solving it would let us build much more reliable deep learning models that are much harder to manipulate maliciously. But first, let’s review the basics of adversarial examples.

As you know, the power of deep learning comes from its superior ability to recognize patterns in data. Feed a neural network tens of thousands of labeled photos of animals and it learns which patterns are associated with a panda and which with a monkey. It can then use these patterns to recognize new images of animals that it has not seen before.

But deep learning models are also very fragile. Because an image recognition system relies only on pixel patterns, and not on a more conceptual understanding of what it sees, it is easy to trick it into seeing something completely different, simply by disturbing those patterns in a particular way. The classic example: add a little noise to an image of a panda and the system classifies it as a gibbon with almost 100 percent certainty. That noise is the adversarial attack.
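The panda example can be imitated on a toy model. The sketch below is only an illustration under stated assumptions: it uses a linear (logistic) classifier with made-up weights instead of a deep network, a random vector instead of a real photo, and a sign-of-the-gradient perturbation (the idea behind the fast gradient sign method, which the article itself does not name). The names `gibbon_probability` and `sign_gradient_attack` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbon_probability(w, b, x):
    """Probability the linear model assigns to class 1 ("gibbon")."""
    return float(sigmoid(np.dot(w, x) + b))

def sign_gradient_attack(w, b, x, y_true, epsilon):
    """Move every pixel by at most epsilon in the direction that raises the loss."""
    p = gibbon_probability(w, b, x)
    grad_x = (p - y_true) * w           # gradient of the cross-entropy loss w.r.t. the input
    x_adv = x + epsilon * np.sign(grad_x)
    return np.clip(x_adv, 0.0, 1.0)     # keep pixels inside the valid [0, 1] range

rng = np.random.default_rng(0)
n_pixels = 10_000                           # e.g. a 100x100 grayscale image
w = rng.normal(0.0, 0.05, size=n_pixels)    # assumption: stand-in for trained weights
x = rng.uniform(0.1, 0.9, size=n_pixels)    # assumption: stand-in for the "panda" image
b = -4.0 - np.dot(w, x)                     # chosen so the clean image has logit -4 ("panda")

epsilon = 0.02                              # each pixel changes by at most 2% of its range
x_adv = sign_gradient_attack(w, b, x, y_true=0.0, epsilon=epsilon)

print("P(gibbon) before:", gibbon_probability(w, b, x))
print("P(gibbon) after: ", gibbon_probability(w, b, x_adv))
print("largest pixel change:", np.max(np.abs(x_adv - x)))
```

No single pixel moves by more than `epsilon`, yet the thousands of tiny changes all push the score the same way, and that is enough to flip a confident prediction in this toy setting.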


For several years scientists have observed this phenomenon, especially in computer vision systems, without really knowing how to get rid of such vulnerabilities. In fact, work presented last week at ICLR, a major artificial intelligence research conference, raised the question of whether adversarial attacks are inevitable. It may seem that no matter how many images of pandas you feed an image classifier, there will always be some perturbation with which you can break the system.

But the new work from MIT shows that we have been thinking about adversarial attacks the wrong way. Instead of coming up with ways to collect more, higher-quality data to feed the system, we need to fundamentally rethink how we train it.

The work demonstrates this by revealing an interesting property of adversarial examples that helps explain why they are so effective. The trick is this: the seemingly random noise or stickers that confuse a neural network actually exploit very precise, barely perceptible patterns that the vision system has learned to strongly associate with specific objects. In other words, the machine is not malfunctioning when it sees a gibbon where we see a panda. It is in fact seeing a regular arrangement of pixels, imperceptible to humans, that appeared far more often in pictures of gibbons than in pictures of pandas during training.

The scientists demonstrated this with an experiment: they created a dataset of dog images, all of which had been modified so that a standard image classifier mistook them for cats. They then labeled these images as “cats” and used them to train a new neural network from scratch. After training, they showed the network real pictures of cats, and it correctly identified them all as cats.
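A rough analogue of this experiment fits in a few lines. It is a sketch under heavy assumptions, not the researchers' setup: synthetic feature vectors replace photos, a simple mean-difference linear classifier replaces the neural network, and both classes (-1 for “dog”, +1 for “cat”) are modified and relabeled so that the new model has two classes to learn. Column 0 plays the role of what a human would look at; the other 400 columns are faint hints. All names and parameter values are hypothetical.

```python
import numpy as np

def make_data(n, d, eta, p, rng):
    """Labels y in {-1, +1}. Column 0 is a strong, human-visible cue that agrees
    with y with probability p; columns 1..d each carry only a faint hint of y."""
    y = rng.choice([-1.0, 1.0], size=n)
    visible = np.where(rng.random(n) < p, y, -y)[:, None]
    faint = eta * y[:, None] + rng.normal(size=(n, d))
    return np.hstack([visible, faint]), y

def fit(X, y):
    """Mean-difference linear classifier: weight = average of label * feature."""
    return (X * y[:, None]).mean(axis=0)

def accuracy(w, X, y):
    return float(np.mean(np.sign(X @ w) == y))

rng = np.random.default_rng(0)
d, eta, p, epsilon = 400, 0.1, 0.9, 0.2
X_train, y_train = make_data(5000, d, eta, p, rng)
X_test, y_test = make_data(2000, d, eta, p, rng)

# 1. Train an ordinary classifier on the clean data.
w_standard = fit(X_train, y_train)

# 2. Nudge every training example toward the opposite class and relabel it.
target = -y_train
X_modified = X_train + epsilon * target[:, None] * np.sign(w_standard)[None, :]

# 3. Train a fresh classifier using only the modified, relabeled data.
w_new = fit(X_modified, target)

# 4. Check the three claims on this toy problem.
fooled = accuracy(w_standard, X_modified, target)
visible_agrees = float(np.mean(np.sign(X_modified[:, 0]) == target))
clean_acc = accuracy(w_new, X_test, y_test)

print("old model assigns the new label:", fooled)
print("visible cue agrees with new label:", visible_agrees)
print("new model accuracy on clean test data:", clean_acc)
```

In this toy setting the visible cue contradicts the new labels for most modified examples, so they would look mislabeled to a person, yet the classifier trained on them still does well on clean data because the faint hints carry the signal.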

The researchers suggested that every dataset contains two types of correlations: patterns that really do correlate with the meaning of the data, like whiskers in pictures of cats or fur coloring in pictures of pandas, and patterns that exist in the training data but do not carry over to other contexts. These latter “misleading” correlations, as we will call them, are what adversarial attacks exploit. A recognition system trained to pick up on “misleading” patterns finds them and thinks it sees a monkey.

This tells us that if we want to eliminate the risk of adversarial attacks, we need to change the way we train our models. Currently, we let the neural network choose whichever correlations it wants to use to identify objects in an image. As a result, we have no control over the correlations it finds, or over whether they are real or misleading. If, instead, we trained our models to remember only the real patterns, those tied to the meaning of the pixels, it would in theory be possible to produce deep learning systems that cannot be confused in this way.

When the scientists tested this idea, using only the real correlations to train their model, they did in fact reduce its vulnerability: it could be manipulated in only 50% of cases, while a model trained on both real and misleading correlations could be manipulated in 95% of cases.
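The same comparison can be sketched on synthetic data. This is a toy construction with assumed parameters, not the researchers' models or data, so its numbers will not match the 50% and 95% figures: column 0 stands in for a “real” correlation, 400 faint columns stand in for “misleading” ones, and the classifier is a simple mean-difference linear model. All function and variable names are hypothetical.

```python
import numpy as np

def make_data(n, d, eta, p, rng):
    """Labels y in {-1, +1}. Column 0 is the "real" cue (agrees with y with
    probability p); columns 1..d are faint "misleading" cues of strength eta."""
    y = rng.choice([-1.0, 1.0], size=n)
    real = np.where(rng.random(n) < p, y, -y)[:, None]
    misleading = eta * y[:, None] + rng.normal(size=(n, d))
    return np.hstack([real, misleading]), y

def fit(X, y):
    """Mean-difference linear classifier: weight = average of label * feature."""
    return (X * y[:, None]).mean(axis=0)

def accuracy(w, X, y):
    return float(np.mean(np.sign(X @ w) == y))

def attack(w, X, y, epsilon):
    """Worst-case bounded perturbation for a linear model: shift every feature
    the model uses by epsilon against the true label."""
    return X - epsilon * y[:, None] * np.sign(w)[None, :]

rng = np.random.default_rng(0)
d, eta, p, epsilon = 400, 0.1, 0.9, 0.2
X_train, y_train = make_data(5000, d, eta, p, rng)
X_test, y_test = make_data(2000, d, eta, p, rng)

# Model A: free to use every correlation it can find.
w_both = fit(X_train, y_train)

# Model B: trained on the "real" cue only; all other weights stay at zero.
w_real = np.zeros(d + 1)
w_real[0] = fit(X_train[:, :1], y_train)[0]

results = {}
for name, w in [("real + misleading", w_both), ("real only", w_real)]:
    clean = accuracy(w, X_test, y_test)
    attacked = accuracy(w, attack(w, X_test, y_test, epsilon), y_test)
    results[name] = (clean, attacked)
    print(f"{name}: clean accuracy {clean:.3f}, accuracy under attack {attacked:.3f}")
```

In this sketch the model restricted to the real cue is unaffected by the perturbation, while the unrestricted model collapses under it. The restricted model also gives up some clean accuracy here, which is a property of this particular toy setup rather than a claim made in the article.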

In short, adversarial attacks can be mitigated. But more research is needed to eliminate them completely.