Comparing Human and Artificial Image Recognition: some considerations – Nonteek

When it comes to Artificial Intelligence, the debate can get heated quite quickly, usually between a faction crouched in a defensive position, maintaining that machines will not reach human capabilities any time soon, and a faction arguing that the era of AI is almost here, if it has not already arrived.

This post is not meant to be an introduction to those arguments (I might write a more in-depth post later on), but to offer some considerations about how misleading a crude comparison between human and machine results can be when the whole context is not taken into account.

Deep Neural Networks (DNNs) are nowadays considered state of the art in many areas of Artificial Intelligence, especially computer vision, so we might as well take them as a significant benchmark for this debate. So, how do they relate to human vision? Are they on par with our own capabilities? It turns out that the answer is not exactly straightforward.

An interesting paper by Christian Szegedy and colleagues[1] showed that DNNs have counter-intuitive properties: they seem to be very good at generalization, even better than humans, yet they can be easily fooled with adversarial negative examples. The authors hypothesized that a possible explanation is that such adversarial examples have an extremely low probability of being observed in a test set, yet (like the rational numbers) they are dense enough to be found near virtually every test case.

Adversarial examples from MNIST digit images. The odd columns are the original images, while the even ones are the same images slightly distorted with a perturbation crafted to fool the network. The distorted images are still very easy for humans to recognize but are **never recognized by the Neural Network** (0% accuracy).

In this image, the even columns are the images processed with random distortion. Interestingly, the recognition accuracy here was about 51%, even though the distorted images are practically **unrecognizable by humans**.
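To get a feeling for why a tiny but carefully aimed distortion can matter more than random noise, here is a minimal sketch in plain Python. It is not the procedure used in [1], which works on deep networks. It assumes a toy linear classifier with made-up random weights (`w`, `b`) and a made-up mid-gray input `x`, and simply nudges every pixel by `eps` against the sign of its weight. All names and numbers are illustrative assumptions.

```python
import random

def score(w, b, x):
    """Linear score: positive means class 1, otherwise class 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else 0

def crafted_shift(w, eps):
    """Shift each pixel by eps in the direction that lowers the score."""
    return [-eps if wi > 0 else eps for wi in w]

def random_shift(n, eps, rng):
    """Shift each pixel by +eps or -eps at random (same size, no aim)."""
    return [rng.choice((-eps, eps)) for _ in range(n)]

def apply_shift(x, shift):
    return [xi + si for xi, si in zip(x, shift)]

rng = random.Random(0)
n = 784        # as many pixels as a 28x28 MNIST digit
eps = 0.05     # each pixel moves by 5% of the intensity range
w = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical weights, not a trained model
x = [0.5] * n                             # hypothetical mid-gray image
b = 10.0 - sum(wi * 0.5 for wi in w)      # bias chosen so that score(x) is about +10

# One aimed shift versus 200 random shifts of the same per-pixel size.
x_adv = apply_shift(x, crafted_shift(w, eps))
random_flips = sum(
    predict(w, b, apply_shift(x, random_shift(n, eps, rng))) != predict(w, b, x)
    for _ in range(200)
)

print("original prediction:", predict(w, b, x))
print("crafted prediction: ", predict(w, b, x_adv))
print("random shifts that changed the prediction:", random_flips, "out of 200")
```

In this toy setup the aimed shift should change the prediction, while random shifts of exactly the same size per pixel should not: the random pushes mostly cancel each other out, whereas the aimed ones all add up over the 784 pixels.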

Many years have passed since the first pioneering works on adversarial classification[2,3], and nowadays many adversarial examples are generated with Evolutionary Algorithms (EA) that evolve a population of images. With this kind of algorithm, it is possible to fool state-of-the-art neural networks into “recognizing” as natural objects, with almost 100% certainty, images that were evolved to be totally unrecognizable to humans[4].

Using evolutionary algorithms to produce images that match DNN classes can yield a tremendous variety of images. Looking at these, the authors note that:

“For many of the produced images, one can begin to identify why the DNN believes the image is of that class once given the class label. This is because evolution need only to produce features that are unique to, or discriminative for, a class, rather than produce an image that contains all of the typical features of a class.”
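The idea of evolving an image until a classifier is confident about it can be sketched in a few lines of plain Python. This is only a simplified illustration, not the setup used in [4]: the “network” here is a toy linear softmax classifier with random, untrained weights, and the evolutionary loop is a basic keep-the-best-half-and-mutate scheme. Every name and parameter (`evolve`, `mutate`, `rate`, `scale`, population size and so on) is an assumption made for the example.

```python
import math
import random

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(weights, image, target):
    """Probability that the toy classifier assigns to the target class."""
    scores = [sum(w * p for w, p in zip(row, image)) for row in weights]
    return softmax(scores)[target]

def mutate(image, rng, rate=0.1, scale=0.3):
    """Randomly perturb about `rate` of the pixels, clipped to [0, 1]."""
    child = []
    for p in image:
        if rng.random() < rate:
            p = min(1.0, max(0.0, p + rng.gauss(0, scale)))
        child.append(p)
    return child

def evolve(weights, target, rng, n_pixels, pop_size=20, generations=200):
    """Keep the fitter half each generation and refill with mutated copies."""
    population = [[rng.random() for _ in range(n_pixels)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda img: confidence(weights, img, target), reverse=True)
        parents = population[: pop_size // 2]
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda img: confidence(weights, img, target))

rng = random.Random(1)
n_pixels, n_classes = 64, 3
# Hypothetical random weights standing in for a trained network.
weights = [[rng.gauss(0, 1) for _ in range(n_pixels)] for _ in range(n_classes)]

best = evolve(weights, target=2, rng=rng, n_pixels=n_pixels)
best_conf = confidence(weights, best, 2)
print("confidence for the target class:", round(best_conf, 4))
```

Note that nothing in the loop asks the image to look like anything: fitness is only the classifier's confidence, so evolution is free to keep whatever pixel pattern happens to be discriminative for the target class.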

These examples demonstrate how AI recognition can be intentionally fooled, both by making it fail to recognize images that are obvious to us (false negatives) and by making it recognize, with strong confidence, something that to us is obviously not there (false positives). There is plenty of literature on this topic[5–7], which can also be quite important from a cybersecurity perspective[8].

However, we should underline that human recognition has its own shortcomings too: there are plenty of optical illusions that demonstrate it, not least the famous dress that some see as white and gold and others as blue and black, which sparked a lot of debate.

The famous black and blue dress: some people see it as blue and black, while others see it as white and gold. The lack of context, together with the bad quality of the image, forces us into guesswork, and what we “see” depends on our own interpretation of the ambient luminosity.

A visual explanation of how context can trick us into seeing what’s not there: the two images above are the same shot, where the one on the right had the model and background slightly darkened, without touching the dress.

There are cases where artificial recognition can consistently outperform humans[9,10], such as fine-grained intra-class recognition (e.g. dog breeds, snakes, etc.). It also appears that humans can be even more susceptible than AI to insufficient training data, that is, when the person has not had enough exposure to that kind of class.

Human perception is a tricky beast. It seems extremely good to us because it can be pretty robust and adaptive but, as we have just seen, it depends a lot on prior knowledge: we also need training (sometimes lifelong training) in order to perform it with some degree of success. Sure enough, there are also some innate categories that we are very skilled at recognizing from birth (e.g. human faces of our own race), but guess what? We can be fooled there too, just by changing the illumination[11,12].

Even human faces can be hard to recognize for us, with just a change of illumination.

Also, we rely on aspects of reality that are not objective at all, like colors. Everyone knows that colors depend on the wavelengths of light reflected by objects, but we often forget that what really makes colors what they are to us is our brain's interpretation. In short, colors do not exist in nature: they are just a small portion of the light spectrum that our brain encodes as specific sensations. We don’t see infrared, ultraviolet, or gamma rays as colors, although they are definitely there, and we also see colors that do not really “exist” in the spectrum, like brown.

Our perception is strongly tied not only to our neurophysiology but also to our cultural context. There is a by now famous Namibian tribe, the Himba, whose language has dozens of terms for green but no word at all for blue. Apparently its members are not able to distinguish blue from green, while they are still much better than us at spotting very slight differences between shades of green[13,14]. Furthermore, very recent studies showed that humans can be fooled by some kinds of adversarial images just as machines are[9,15,16].

The different shortcomings of human and artificial image recognition suggest that the underlying processes are very different. Asking whether human recognition is better or worse than machine recognition is, at the very least, an ill-posed question, since we consistently neglect to take into account the knowledge and training that we ourselves need in order to perform any recognition at all.

References

[1]
C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, R. Fergus, Intriguing properties of neural networks, in: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, 2014.
[2]
N. Dalvi, P. Domingos, Mausam, S. Sanghai, D. Verma, Adversarial classification, in: Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining – KDD ’04, ACM Press, 2004. doi:10.1145/1014052.1014066.
[3]
D. Lowd, C. Meek, Adversarial learning, in: Proceeding of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining – KDD ’05, ACM Press, 2005. doi:10.1145/1081870.1081950.
[4]
A. Nguyen, J. Yosinski, J. Clune, Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, ArXiv E-Prints. (2014) arXiv:1412.1897.
[5]
B. Biggio, F. Roli, Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning, (2017).
[7]
A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM. (2017) 84–90. doi:10.1145/3065386.
[11]
C. Hong Liu, C.A. Collin, A.M. Burton, A. Chaudhuri, Lighting direction affects recognition of untextured faces in photographic positive and negative, Vision Research. (1999) 4003–4009. doi:10.1016/s0042-6989(99)00109-1.
[12]
A. Missinato, Face Recognition With Photographic Negatives: Role of Spatial Frequencies and Face Specificity, University of Aberdeen, 1999.
[15]
G.F. Elsayed, S. Shankar, B. Cheung, N. Papernot, A. Kurakin, I. Goodfellow, J. Sohl-Dickstein, Adversarial Examples that Fool both Computer Vision and Time-Limited Humans, (2018).
[16]
E. Watanabe, A. Kitaoka, K. Sakamoto, M. Yasugi, K. Tanaka, Illusory Motion Reproduced by Deep Neural Networks Trained for Prediction, Front. Psychol. (2018). doi:10.3389/fpsyg.2018.00345.
