An Intuitive Introduction to Classification and Adversarial Input
Nick Stoltzfus
Oct 27, 2023
1 How to Trick a Human
Humans learn to distinguish things about their environment through repeated analysis of information from their eyes, nose, ears, touch, and taste. From a young age, we learn what different inputs from these senses may indicate about future events. The painful sting of a bee becomes associated with the sound of its buzzing. The relief of drinking water on a hot day becomes associated with the feeling of thirst.
Visually, the orientation and arrangement of our surroundings give us additional information about our position in space. This particular facet of our visual learning makes us susceptible to the following optical illusion:
https://www.verywellmind.com/cool-optical-illusions-2795841 - Kendra Cherry, MSEd 2023
In the above image, the yellow line at the top appears slightly larger than the yellow line at the bottom. This is because our brains expect objects that extend toward the horizon to be farther away.
Therefore, if the upper yellow line is farther away and still spans more of the track than the lower one, our brains conclude that it must be the larger of the two.
This effect can be better seen in the image below:
https://www.verywellmind.com/cool-optical-illusions-2795841 - Kendra Cherry, MSEd 2023
From the above example we see that, at least at a quick glance, humans are susceptible to misjudging facts about an image due to previous visual conditioning. An infant who has not yet learned a sense of distance or scale might not fall prey to this kind of optical illusion.
2 How to Trick a Machine
Adversarial input is the term given to input that causes a machine learning model to mistakenly produce incorrect output due to an adversary. This adversary has, in one of a number of different ways, produced a sample that is specially crafted to confuse the machine learning model.
Take the famous image below as an example:
“Explaining and Harnessing Adversarial Examples” - Goodfellow et al., 2015
The image is one of about 50,000 test images in the ImageNet dataset, and each image belongs to one of 1,000 possible classes. As shown above, on the left we have an image of a panda that has been correctly classified as a panda. The classification was assigned a confidence of 57%, which is relatively high given that there are 1,000 possible classes to choose from.
In the center, we see a jumble of colors that represents a specially crafted adversarial perturbation. With regard to images, a perturbation is a change in pixel values that results in a change in the classification of the image; this change may be large or small. In this case, the adversary has found that only a very small change in pixel values is required (note that the pixel array shown is multiplied by 0.007 before being added) to create a misclassification in which the model places extremely high confidence (>99%) in its output!
As a result, the machine learning model that believed the image on the left to be a panda with high confidence has been confounded by the human-imperceptible perturbation added by the adversary. The image on the right looks nothing like a gibbon to the human eye, yet the machine learning model believes it to be a gibbon. How can this be?
2.1 Simple Classification
Let’s first look at a very simple example of what classification means. Classification is simply the result of separating data into two or more groups based on some aspect of the data. Below is a simple two-dimensional example of classification.
“Learning From Data” - Abu-Mostafa et al., 2012
From the above image, it is clear that the data is not separated correctly by the line on the left, and is correctly separated into X’s and O’s by the line on the right. When a mere straight line can be drawn to separate the data, classification seems like a simple task. However, much more complex sets of data exist.
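To make this concrete, here is a minimal sketch of a linear classifier in code. The two clusters of points below are made up for illustration (they are not the data from the figure), and the model is scikit-learn’s Perceptron, which learns exactly the kind of separating straight line shown on the right.

```python
# A minimal sketch of 2D linear classification using scikit-learn.
# The points below are synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
# Two clusters of 2D points: class 0 ("O") and class 1 ("X").
O_points = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))
X_points = rng.normal(loc=[+2.0, +2.0], scale=0.5, size=(20, 2))

data = np.vstack([O_points, X_points])
labels = np.array([0] * 20 + [1] * 20)

# A perceptron learns a single straight line (w . x + b = 0)
# that separates the two classes.
clf = Perceptron().fit(data, labels)
print("training accuracy:", clf.score(data, labels))   # 1.0 if separable
print("line parameters w, b:", clf.coef_, clf.intercept_)
```

If the two clusters can be split by a straight line, the perceptron will find one and score perfectly on this toy data.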
2.2 Complex Classification
Although the above example seems simple to classify, a more complex data set may require a more complex solution. Take the image below for example:
“Learning From Data” - Abu-Mostafa et al., 2012
On the left, a straight line is drawn across the data in an attempt to classify the points. However, no matter where a straight line is drawn, the X’s and O’s will never be perfectly separated.
On the right, we have exactly the same set of data: the same X’s and the same O’s. Now a perfect separation between the X points and the O points is achieved, but the boundary used is a 4th-order polynomial, which is much more complex than a straight line.
This goes to show that a classifier may be found that achieves 100% accuracy during training. However, if 1,000 samples from a test set were added, the simpler classifier on the left would likely have higher accuracy, and it would be much less computationally expensive. From this we learn that complex data sets may require very complex classifiers to fit accurately. But conversely, the more complex we make the classifier, the higher we push the computational cost without necessarily improving our results during real-world testing.
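The sketch below illustrates this trade-off. It trains a straight-line boundary (degree 1) and a 4th-order polynomial boundary (degree 4) on a small, noisy synthetic data set, then evaluates both on 1,000 held-out samples. The data, the noise level, and the use of logistic regression are my own illustrative choices; the interesting output is the gap between training accuracy and test accuracy for each boundary.

```python
# A sketch of the trade-off described above: a straight-line boundary versus
# a 4th-order polynomial boundary, trained on a small noisy sample and then
# evaluated on 1,000 held-out points. The data is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)

def make_data(n):
    X = rng.uniform(-3, 3, size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # simple "true" rule
    flip = rng.random(n) < 0.15               # 15% label noise
    y[flip] = 1 - y[flip]
    return X, y

train_X, train_y = make_data(30)     # small, noisy training set
test_X, test_y = make_data(1000)     # the "1,000 added samples"

for degree in (1, 4):
    model = make_pipeline(PolynomialFeatures(degree),
                          LogisticRegression(max_iter=5000))
    model.fit(train_X, train_y)
    print(f"degree {degree}: "
          f"train acc {model.score(train_X, train_y):.2f}, "
          f"test acc {model.score(test_X, test_y):.2f}")
```

Exact numbers depend on the random seed, but the extra flexibility of the degree-4 boundary tends to buy training accuracy without buying test accuracy.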
3 Adversarial Sample Generation
The primary goal of an adversarial sample is to cause the misclassification of an input through a small, human-imperceptible perturbation of a benign (i.e. not-yet-adversarial) input. There are a number of different ways to create adversarial samples for a given machine learning model.
“Learning From Data” - Abu-Mostafa et al., 2012 (arrows added)
As shown above, each of the points with an arrow has undergone a change in its X and Y coordinates, and a larger arrow corresponds to a larger change. Because adversaries are looking for data points that require only a small perturbation, points that lie close to the boundary line are ideal targets for adversarial attack.
It is easy to see that the more complex model on the right has more points lying close to its boundary line. From this we can infer that, in general, as a model increases in complexity to fit a given data set, a smaller change is needed to push a data point from one side of the boundary to the other.
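For a straight-line boundary this intuition can be written down exactly: the smallest perturbation that moves a point x across the boundary w·x + b = 0 points along w and has length |w·x + b| / ||w||, i.e. the point’s distance to the line. The sketch below uses a made-up boundary and two made-up points to show that the point closer to the line needs a much smaller push.

```python
# For a linear boundary w.x + b = 0, the smallest perturbation that moves a
# point x to the other side is along w, with length |w.x + b| / ||w||.
# The boundary and points below are made up purely to illustrate the idea.
import numpy as np

w = np.array([1.0, 1.0])   # hypothetical boundary: x + y = 0
b = 0.0

def minimal_push(x, w, b, margin=1e-6):
    """Smallest vector to add to x so it lands just past the boundary."""
    score = w @ x + b
    direction = -np.sign(score) * w / np.linalg.norm(w)
    distance = abs(score) / np.linalg.norm(w)
    return direction * (distance + margin)

near_point = np.array([0.2, 0.1])    # close to the line
far_point = np.array([2.0, 2.5])     # far from the line

for x in (near_point, far_point):
    delta = minimal_push(x, w, b)
    print(f"point {x}: push size {np.linalg.norm(delta):.3f}, "
          f"old side {np.sign(w @ x + b):+.0f}, "
          f"new side {np.sign(w @ (x + delta) + b):+.0f}")
```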
Take the first figure with the panda for example: that model is extremely complex, with a number of input dimensions equal to the number of pixels times the number of color channels. Instead of the simple two-dimensional models we have here, it operates in many thousands of dimensions. As such, we expect the decision boundary to lie quite close to a number of data points, and indeed only a sub-pixel (0.007) change in color values is needed to radically change the classification of the image.
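The panda example above was produced with the fast gradient sign method (FGSM) from Goodfellow et al.’s paper, which nudges every pixel by plus or minus epsilon in the direction that increases the model’s loss. Below is a rough PyTorch sketch of that idea; `model`, `image`, and `label` are placeholders for a trained classifier, a batched input tensor with pixel values in [0, 1], and the true class index, and details such as normalization will vary with the actual model.

```python
# A rough sketch of the fast gradient sign method (FGSM) behind the panda
# example. `model`, `image`, and `label` are placeholders for a trained
# classifier, an input tensor of shape [1, 3, H, W] in [0, 1], and a tensor
# holding the true class index.
import torch
import torch.nn.functional as F

def fgsm(model, image, label, epsilon=0.007):
    """Return an adversarial copy of `image` shifted by epsilon * sign(gradient)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)   # loss on the true class
    loss.backward()
    # Nudge every pixel by +/- epsilon in the direction that increases the loss.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()   # keep pixel values valid

# Hypothetical usage, assuming `model`, `image`, and `label` already exist:
#   adversarial_image = fgsm(model, image, label)
#   print(model(adversarial_image).argmax(dim=1))
```

With epsilon = 0.007, the per-pixel change is small enough to be invisible to the human eye, yet it can flip the model’s prediction as in the figure.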
To many, the ideas I present above may seem antiquated in the field of machine learning, but I hope they entertain enthusiasts who are not invested in the most recent literature.
4 Next Time
In my next post, I’ll perform a short empirical study on traffic signs as a potentially viable target for adversarial attack. I’ll show the dangers such attacks pose to autonomous vehicles, as well as their shortcomings and some positive aspects, such as real-world viability.