“The paradigm shift of the ImageNet thinking is that while a lot of people are paying attention to models, let’s pay attention to data. Data will redefine how we think about models.” Fei-Fei Li, Stanford computer science professor

 

It is about a decade since Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used a deep convolutional neural network (CNN) to classify 1.2 million high-resolution ImageNet images into 1,000 different classes for the Large Scale Visual Recognition Challenge (LSVRC-2010). Their network had 650,000 neurons and 60 million parameters, with five convolutional layers. Since this AI inflection point, much progress has been made in applying CNNs to medical computer vision. Even with the challenge of limited dataset size, medical imaging applications now range from radiology and cardiology to pathology, dermatology, and ophthalmology, with physician-level diagnostic performance.

Multiple instance learning enables learning from datasets of very large images with few labels (histopathology slides), 3-D convolutions enable learning from 3-D volumes (MRI and CT), and spatio-temporal models and image registration enable learning from time-series images (ultrasound and fluoroscopy). Medical video in various forms, such as surgical footage and recordings of human activity, is also being explored with CNNs. When coupled with sensors and video streams, computer vision can enable safety applications in clinical and home settings for ambient intelligence.
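To make the multiple instance learning idea concrete, here is a minimal Python sketch. It assumes a slide has been cut into tiles and that some model has already produced a score for each tile. The function names, the example scores, and the 0.5 threshold are illustrative assumptions, and max pooling is only one of several ways to aggregate tile scores into a slide-level prediction.

```python
from typing import List


def bag_score(tile_scores: List[float]) -> float:
    """Aggregate per-tile scores into one slide-level score with max pooling.

    Under the standard multiple instance assumption, a slide (the "bag") is
    positive if at least one tile (an "instance") is positive, so the
    highest tile score stands in for the whole slide.
    """
    if not tile_scores:
        raise ValueError("a bag needs at least one tile score")
    return max(tile_scores)


def predict_slide(tile_scores: List[float], threshold: float = 0.5) -> bool:
    """Return True when the slide-level score reaches the (assumed) threshold."""
    return bag_score(tile_scores) >= threshold


# Hypothetical tile scores: only the slide label is known during training,
# never which tile is responsible for it.
slide_a = [0.02, 0.10, 0.91, 0.05]  # one suspicious tile
slide_b = [0.03, 0.08, 0.12, 0.04]  # no suspicious tiles
print(predict_slide(slide_a), predict_slide(slide_b))
```

The point of the sketch is that supervision is needed only at the slide level, which is why the approach suits very large images with few labels.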

One key development is the use of transfer learning to mitigate the problem of small medical datasets. Data augmentation and generative adversarial networks (GANs) have also been applied to the same problem. Another approach is self-supervised learning, in which implicit labels are extracted from the data points themselves and used to train algorithms.
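As a rough illustration of how implicit labels can be extracted from unlabeled data, the Python sketch below builds training pairs for rotation prediction. This is one common pretext task, chosen here as an assumption; the text above does not name a specific one. The function names and the tiny 2x2 "scan" are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def rotate90(image: Image) -> Image:
    """Rotate a 2-D image (a list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotation_pretext_samples(image: Image) -> List[Tuple[Image, int]]:
    """Build four (rotated image, label) pairs from one unlabeled image.

    The label is the number of clockwise quarter turns applied (0 to 3).
    It comes from the transformation itself, so no human annotation is needed.
    """
    samples = []
    current = [list(row) for row in image]
    for quarter_turns in range(4):
        samples.append((current, quarter_turns))
        current = rotate90(current)
    return samples


# A tiny stand-in for an unlabeled scan.
scan = [[1, 2],
        [3, 4]]
for rotated, label in rotation_pretext_samples(scan):
    print(label, rotated)
```

A network trained to predict the rotation label has to learn something about image structure, and those learned features can then be fine-tuned on a small labeled medical dataset.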

All of these techniques are pushing towards fully unsupervised learning. Federated learning, in which a centralized algorithm is trained on distributed data that never leaves its protected enclosures, may enable future clinical data science projects. Computer vision with medical images is also now part of multimodal learning that incorporates language, time-series, and genomic data.
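The aggregation step behind federated learning can be sketched in a few lines of Python. The example assumes each site trains locally and shares only a weight vector and its sample count, and it uses sample-weighted averaging (the idea behind federated averaging), which is one possible aggregation rule. The function name, the hospital weights, and the sample counts are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine locally trained weight vectors, weighting each by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    averaged = [0.0] * len(client_weights[0])
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Hypothetical model weights from three hospitals. The images themselves
# stay on site; only these numbers are sent to the central server.
hospital_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
hospital_sizes = [100, 100, 200]
print(federated_average(hospital_weights, hospital_sizes))
```

In a real system this step would repeat over many rounds, with the averaged weights sent back to each site for further local training.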

In spite of all these advances, however, real-world deployment of these technologies remains a challenge. In addition, the ethical and legal questions around medical data access, along with issues of bias and trust, need to be addressed. Finally, the long-term promise of medical computer vision with deep learning is excellent, but realizing it will require clinicians and AI experts to work in synergy.
