Human vision is remarkably robust: when we see a stranger, look away for a few seconds, and then look back, we can still recognize that this is the person we saw moments ago, even if he or she has moved from the original position. Such a seemingly intuitive capability, however, turns out to be rather daunting for machines.
A neural network needs to be trained on innumerable representations of that stranger, in different environments and under a series of transformations, before it can pull off something similar. Improving training data and modelling deep neural networks on primate visual systems are therefore two areas in which neuroscientists and engineers believe machine vision could be improved. Recently, some researchers have been exploring a different computational strategy to train machines to see the way we do.
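The brute-force workaround described above amounts to data augmentation: generating many transformed copies of a single example so a network can learn to recognize it in any position. A minimal sketch with NumPy (the array sizes and shift offsets here are illustrative assumptions, not the setup of any particular study):

```python
import numpy as np

def augment(image, shifts):
    """Return translated copies of `image`, one per (dy, dx) offset.

    np.roll wraps pixels around the border, which is acceptable for a
    toy illustration of position augmentation.
    """
    return [np.roll(image, (dy, dx), axis=(0, 1)) for dy, dx in shifts]

# A toy 5x5 "character": a single bright pixel at the center.
char = np.zeros((5, 5))
char[2, 2] = 1.0

# Instead of a single example, the network is fed many shifted variants.
dataset = augment(char, [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)])
print(len(dataset))  # 5 training views generated from one character
```

Humans, by contrast, need only the single original view, which is the gap the research below tries to quantify.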
Machines need to master invariance
Yena Han, a PhD candidate in electrical engineering and computer science, and Tomaso Poggio, director of the Center for Brains, Minds and Machines (CBMM) and the Eugene McDermott Professor of Brain and Cognitive Sciences at the Massachusetts Institute of Technology (MIT), believe the problem lies in the fact that machines cannot learn from limited examples and do not comprehend or perform invariance. This prompted the duo to establish definite measurements of the basic invariances of human vision.
To do so, Han, Poggio and their research team at MIT measured invariance in one-shot learning by showing human participants Korean characters with which they were entirely unfamiliar. These characters were first presented under a specific viewing condition before being shifted, scaled, or otherwise transformed from their original positions. As expected, participants had no difficulty recognizing these foreign characters even after a single exposure.
In the second part of the study, the researchers gave deep neural networks a corresponding task. They found that networks designed to mimic the invariant object recognition of humans can build in such scale invariance. Moreover, a network whose receptive fields grow larger with distance from the center of the visual field reproduces the limited position invariance seen in human vision. The findings were published in Nature Scientific Reports in January.
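One way to picture the property being measured: a representation is position-invariant if it produces the same answer for an object and for a shifted copy of that object. A toy sketch follows, where the "model" is simply global average pooling, a stand-in assumption chosen because it is trivially translation-invariant, and not the networks examined in the study:

```python
import numpy as np

def global_pool(image):
    """A trivially translation-invariant 'feature': mean intensity.

    Real networks approximate invariance with far richer features;
    this stand-in only illustrates what the property means.
    """
    return image.mean()

char = np.zeros((8, 8))
char[2:4, 2:4] = 1.0                          # a small glyph
shifted = np.roll(char, (3, 3), axis=(0, 1))  # same glyph, new position

# Raw pixels change under the shift, but the pooled feature does not.
print(np.array_equal(char, shifted))              # False
print(global_pool(char) == global_pool(shifted))  # True
```

Testing whether a trained network's outputs stay stable under such shifts and rescalings, after a single exposure to each character, is the machine-side analogue of the human experiment above.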
Machine vision and medicine
Han remarked that the study offers new insights into invariance, an area mostly neglected by the artificial intelligence (AI) community, as well as “a new understanding of the brain representation of objects under different viewpoints… and what is a good architectural design for deep neural networks”.
In medicine, machine vision is predominantly used to process images and help physicians make quicker, more accurate diagnoses. A more flexible and capable neural network could be trained with less data while producing fewer false positives. One example is the Triton Sponge, a mobile monitoring system that counts the surgical sponges used during an operation and estimates the amount of blood loss. It combines AI and machine vision to detect whether the same sponge has been shown twice, preventing over-counting and reducing the likelihood of a sponge being left inside a patient’s body.
At the same time, researchers are working on more portable diagnostic imaging devices. The University of Illinois planned to perform optical spectroscopy using AI and a smartphone camera, while a Rice University-led team was developing wearables and point-of-care microscopes that use machine vision and sensors to monitor conditions that currently depend on biopsies or blood tests.
The machine vision market is currently estimated at around $8.44 billion and is expected to grow at a rate of 6.86 percent over the next five years. Computer vision holds the promise of making AI more than just an effective disease detective. The pressing need now is appropriate regulation and further research to sharpen its capabilities.