 Part one of your essential glossary of AI’s key terms and phrases:

A

Activation function

A non-linear mathematical function that generally takes the weighted sum of a node’s inputs and transforms it into a single scalar output value. Historically, the sigmoid function was used as the activation function of the hidden layers of a neural network, but it has largely been replaced by ReLU (the rectified linear unit). Classification models often use sigmoid (for binary classification) or “softmax” (for multinomial classification) functions for their output nodes.
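The three functions named above can each be written in a few lines. This is a minimal illustrative sketch using only Python’s standard library; the inputs, weights and bias are arbitrary numbers chosen for the example, and the sigmoid here does not guard against overflow for extremely large negative inputs.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Pass positive values through unchanged; clip negatives to zero."""
    return max(0.0, z)

def softmax(scores: list[float]) -> list[float]:
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# A single node: weighted sum of its inputs plus a bias, then an activation.
inputs = [0.5, -1.2, 3.0]
weights = [0.8, 0.1, -0.4]
bias = 0.2
z = sum(w * x for w, x in zip(weights, inputs)) + bias
print(sigmoid(z), relu(z))
print(softmax([2.0, 1.0, 0.1]))
```

Because the weighted sum here is negative, ReLU outputs zero while the sigmoid still returns a small positive value; softmax preserves the ordering of the scores while making them sum to 1.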

Assistive intelligence

An AI system that assists a human, often by automating part of a workflow. Example: a system that automatically delineates anatomical structures on CT scans for radiotherapy planning, so that the oncologist only has to fine-tune the target area.

Augmented intelligence

Similar to assistive intelligence, but sometimes refers to AI systems that improve rather than simply streamline human performance. Example: a system that makes guideline-based recommendations for complex oncology treatment planning.

Autonomous intelligence

An AI system that performs tasks independently of humans. Example: self-driving cars. (It may be some time before we have ready examples of autonomous intelligence in the clinical domain, largely due to ethical issues.)

B

Backpropagation

The process by which a neural network works out how to apply small, constructive updates to its internal parameters (its weights) during the training phase. Backpropagation is generally used in conjunction with a loss function and gradient descent. In technical terms, each time the network makes a series of predictions during the training phase, the loss of those predictions is evaluated using the loss function, and the derivative of the loss is calculated with respect to each of the network’s weights by applying the chain rule backwards through the computational graph, from the output layer towards the input. Each weight is then updated in proportion to its individual contribution to the loss.
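The smallest possible case is a single sigmoid neuron with a squared-error loss. The sketch below is illustrative only (the weights, inputs and learning rate are arbitrary): it computes each weight’s gradient with the chain rule, checks the result against a numerical estimate, and then applies one gradient-descent update.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """One neuron: weighted sum of inputs plus bias, passed through a sigmoid."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, target):
    """Squared error between the prediction and the target."""
    return (forward(w, b, x) - target) ** 2

def backprop(w, b, x, target):
    """Gradients of the loss w.r.t. each weight and the bias, via the chain rule."""
    y = forward(w, b, x)
    dloss_dy = 2.0 * (y - target)      # derivative of the squared error
    dy_dz = y * (1.0 - y)              # derivative of the sigmoid
    delta = dloss_dy * dy_dz           # error signal arriving at the neuron
    grad_w = [delta * xi for xi in x]  # dz/dw_i = x_i
    grad_b = delta                     # dz/db = 1
    return grad_w, grad_b

def numeric_grad(w, b, x, target, i, eps=1e-6):
    """Central finite-difference estimate of dloss/dw_i, used as a sanity check."""
    w_plus, w_minus = list(w), list(w)
    w_plus[i] += eps
    w_minus[i] -= eps
    return (loss(w_plus, b, x, target) - loss(w_minus, b, x, target)) / (2 * eps)

w, b = [0.3, -0.5], 0.1
x, target = [1.0, 2.0], 1.0
grad_w, grad_b = backprop(w, b, x, target)
numeric = [numeric_grad(w, b, x, target, i) for i in range(len(w))]
print("analytic:", grad_w, "numeric:", numeric)

# Each weight is then nudged against its own gradient (gradient descent).
learning_rate = 0.5
new_w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
new_b = b - learning_rate * grad_b
print("loss before:", loss(w, b, x, target), "after:", loss(new_w, new_b, x, target))
```

In a deep network the same chain-rule step is repeated layer by layer, reusing each layer’s error signal to compute the gradients of the layer before it.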

Bias

Has two meanings in the world of AI:

1) The connections between fully connected layers of a neural network are generally governed by the equation wx + c, where w is the weight matrix, x is the input vector and c is the bias: a vector of learnable offsets, one per neuron in the layer, that is added to the weighted inputs.

2) Bias can also refer to the extent to which a model underfits the training data. A model with too little bias (and too much variance) overfits the data, whereas a model with too much bias (and too little variance) makes predictions that are too broad to be useful (e.g. a histopathological analyser that always gives the probability of a tissue sample being malignant as 0.5).
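For meaning (1), a fully connected layer can be sketched in plain Python as below. This is an illustrative sketch with arbitrary numbers: W plays the role of the weight matrix w (one row per output neuron), x is the input vector and c is the bias vector.

```python
def dense_layer(W, x, c):
    """Compute Wx + c for one fully connected layer.

    W: weight matrix, one row per output neuron
    x: input vector
    c: bias vector, one learnable offset per output neuron
    """
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + c_i
            for row, c_i in zip(W, c)]

W = [[0.2, -0.1, 0.4],
     [0.5, 0.3, -0.2]]   # 2 output neurons, 3 inputs
x = [1.0, 2.0, 3.0]
c = [0.1, -0.3]
print(dense_layer(W, x, c))
print(dense_layer(W, [0.0, 0.0, 0.0], c))  # → [0.1, -0.3]
```

The second call shows what the bias contributes on its own: with an all-zero input, each neuron’s output is simply its bias value.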

Bounding box

A square or rectangle drawn around a target object within an image.

C

Computer vision

The study of developing computer systems that can perform useful functions based on image data (e.g. detecting the presence of a target entity, classifying images, delineating structures). Probably the most advanced area of modern artificial intelligence, with major applications in radiology, histopathology and dermatology.

Convolutional neural network (CNN)

A type of neural network that uses the mathematical technique of “convolution” to detect localised features within input data. CNNs are most commonly used in computer vision problems. By passing data through large numbers of sequential convolutional layers, a network can detect increasingly abstract features within image data (from simple edges in early layers to whole objects in later ones).
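The core operation of a convolutional layer is sliding a small grid of weights (a kernel) across an image and summing the element-wise products at each position. The sketch below uses a hand-made vertical-edge kernel for illustration; in a real CNN the kernel values are learned during training, and there are many kernels per layer.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    As in most deep learning libraries, this is technically cross-correlation:
    the kernel is not flipped before sliding.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the patch under it and sum.
            total = sum(kernel[a][b] * image[i + a][j + b]
                        for a in range(kh) for b in range(kw))
            row.append(total)
        feature_map.append(row)
    return feature_map

# A 5x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 0, 9, 9] for _ in range(5)]
# A hand-made vertical-edge detector.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
for row in conv2d(image, kernel):
    print(row)  # each row prints [0, 27, 27]
```

The feature map is zero over the flat dark region and large where the kernel straddles the edge, which is the sense in which a convolution “detects” a localised feature.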

D

Deep learning library

A library of pre-compiled functions that simplify the creation of deep learning models. Popular libraries include TensorFlow, Theano, PyTorch, Caffe and Keras.

E

Expert system

Any system that aims to reproduce the cognitive processes of a domain expert. The term generally refers to systems based on classical programming rather than machine learning. Example: a program that checks antimicrobial prescriptions against local guidelines and suggests amendments when the two don’t match.
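The defining feature of such a system is that its knowledge is written down as explicit rules rather than learned from data. The sketch below illustrates the prescription-checking example; the guideline table, indication names and drug names are hypothetical placeholders, not real clinical guidance.

```python
# Hypothetical local guideline: indication -> permitted antimicrobials.
# All names are placeholders for illustration only.
GUIDELINE = {
    "indication_a": {"drug_x", "drug_y"},
    "indication_b": {"drug_z"},
}

def check_prescription(indication: str, drug: str) -> str:
    """Apply hand-written rules; no learning from data is involved."""
    permitted = GUIDELINE.get(indication)
    if permitted is None:
        return f"No local guideline for {indication}; refer for review."
    if drug in permitted:
        return "Prescription matches local guideline."
    suggestions = ", ".join(sorted(permitted))
    return f"{drug} is not recommended for {indication}; consider: {suggestions}."

print(check_prescription("indication_a", "drug_z"))
# → drug_z is not recommended for indication_a; consider: drug_x, drug_y.
```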

G

General AI

A human-level artificial intelligence. Think HAL from 2001: A Space Odyssey. The dream of many AI researchers but probably (or perhaps hopefully) still several decades away from being realised.

Generative Adversarial Networks (GANs)

A concept cooked up by Ian Goodfellow (at the time of writing, director of machine learning at Apple, and co-author of the standard textbook on deep learning). Two neural networks are trained together: one is the “generator” and the other the “discriminator”. The discriminator is tasked with distinguishing real data from fake data, whereas the generator is tasked with producing fakes good enough to fool the discriminator. As one improves, so must the other, such that data (often images and videos) produced by the generator network can become indistinguishable from real data. GANs are behind many of the “deep fake” videos currently doing the rounds on social media. They can also help to augment small datasets and can increase the robustness of existing deep learning models.
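The tug-of-war between the two networks is encoded in their loss functions. The sketch below shows only those losses, computed from the discriminator’s output probabilities (its estimate that a sample is real); the networks themselves are omitted, and the probabilities are made-up numbers. The generator loss shown is the widely used “non-saturating” form.

```python
import math

def bce(p: float, label: float) -> float:
    """Binary cross-entropy for a single predicted probability p."""
    eps = 1e-12  # avoid log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def discriminator_loss(d_real: list[float], d_fake: list[float]) -> float:
    """Low when the discriminator scores real data near 1 and fakes near 0."""
    losses = [bce(p, 1.0) for p in d_real] + [bce(p, 0.0) for p in d_fake]
    return sum(losses) / len(losses)

def generator_loss(d_fake: list[float]) -> float:
    """Low when the discriminator is fooled into scoring fakes near 1."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

# Discriminator outputs for a batch of real and generated samples.
d_real = [0.9, 0.8, 0.95]
d_fake = [0.1, 0.3, 0.2]
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

Here the discriminator is winning, so its loss is low and the generator’s is high; training alternates between lowering each loss in turn, which is what forces both networks to improve.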

Gradient descent

After a machine learning model has made a series of predictions, the error of those predictions is calculated using the loss function. Next, the gradient (slope) of the loss function at the model’s current position on the curve is evaluated. Loss functions are often bowl-shaped (imagine a graph with a U-shaped line), so their gradient varies. The closer to the bottom of the loss function curve (which is the point we want our model to reach: the point where the loss is lowest), the smaller the gradient. Conversely, the further from the bottom of the curve, the steeper the gradient.

Once the gradient has been calculated, the model’s weights are updated (using backpropagation to work out the gradient with respect to each individual weight). Each weight is moved in the opposite direction to its gradient, downhill towards the minimum, by an amount proportional to both the gradient and the chosen learning rate. This process is known as gradient descent.
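A one-parameter example makes the procedure concrete. This sketch assumes a simple U-shaped loss, (w − 3)², chosen purely for illustration because its minimum is known to be at w = 3; the starting point, learning rate and number of steps are arbitrary.

```python
def loss(w: float) -> float:
    """A simple U-shaped loss whose lowest point is at w = 3."""
    return (w - 3.0) ** 2

def gradient(w: float) -> float:
    """Derivative of the loss: steep far from the minimum, flat near it."""
    return 2.0 * (w - 3.0)

def gradient_descent(w: float, learning_rate: float, steps: int) -> float:
    for _ in range(steps):
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * gradient(w)
    return w

w_final = gradient_descent(w=-5.0, learning_rate=0.1, steps=100)
print(w_final, loss(w_final))
```

Because each step is proportional to the gradient, the updates are large while w is far from 3 and shrink automatically as it approaches the bottom of the curve.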

H

Hierarchical AI

An AI system composed of layered modules or subroutines, somewhat akin to human cognition. For example, an autonomous vehicle may use a CNN to make sense of a video feed, then pass the CNN’s output, along with the output of other sensory modules, to a reinforcement learning network. This network may determine a high-level course of action (e.g. “retreat”), then delegate this command to a series of recurrent neural networks (RNNs) that generate the sequence of motor controls necessary to execute this action.

This is broadly analogous to a human seeing, say, a bar of chocolate (by virtue of photoreceptor signals being processed by the visual cortex), deciding to eat it (based on executive reasoning occurring within the frontal cortex, which weighs the short-term vs long-term rewards), then reaching out for the bar (after delegating that command to the motor cortex, which generates an appropriate sequence of action potentials along efferent motor neurons).
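The layered flow of information in the autonomous-vehicle example can be sketched as below. Every function here is a hypothetical, rule-based stand-in for a trained network (the CNN, the reinforcement learning policy and the RNN motor controllers); the point is only that each layer consumes the output of the one beneath it.

```python
from dataclasses import dataclass

@dataclass
class Percept:
    obstacle_ahead: bool
    distance_m: float

def vision_module(frame: dict) -> Percept:
    """Hypothetical stand-in for a CNN interpreting a camera frame."""
    return Percept(frame["obstacle"], frame["distance_m"])

def policy_module(percept: Percept) -> str:
    """Hypothetical stand-in for a reinforcement learning policy choosing a high-level action."""
    if percept.obstacle_ahead and percept.distance_m < 10:
        return "retreat"
    return "advance"

def motor_module(action: str) -> list[str]:
    """Hypothetical stand-in for RNNs expanding an action into low-level controls."""
    sequences = {
        "retreat": ["brake", "shift_reverse", "accelerate_gently"],
        "advance": ["release_brake", "accelerate_gently"],
    }
    return sequences[action]

def hierarchical_step(frame: dict) -> list[str]:
    # Perception feeds decision-making, which delegates to motor control.
    return motor_module(policy_module(vision_module(frame)))

print(hierarchical_step({"obstacle": True, "distance_m": 4.0}))
# → ['brake', 'shift_reverse', 'accelerate_gently']
```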

Hyperparameter

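A value that is set before training rather than learned from the data, and that governs the behaviour of a machine learning algorithm (usually during the training phase). The learning rate is probably the most commonly discussed hyperparameter; others include the batch size and the number of training epochs.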

I

Intersection over union (IOU)

A means of evaluating the accuracy of a bounding box supplied by an AI algorithm when compared with the ground truth bounding box (usually supplied by a human). The area of overlap between the two bounding boxes (their intersection) is divided by the total area covered by the two boxes (their union), resulting in a scalar metric between 0 (no overlap) and 1 (a perfect match) that can be used to evaluate a box’s accuracy.
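The calculation is short enough to write out in full. This sketch assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) in pixel coordinates; that box format is a common convention but an assumption here, as is the example data.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping region, if there is one.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection  # don't count the overlap twice
    return intersection / union if union > 0 else 0.0

predicted = (10, 10, 50, 50)
ground_truth = (20, 20, 60, 60)
print(iou(predicted, ground_truth))  # about 0.391 (900 / 2300)
```

Subtracting the intersection from the summed areas matters: without it, the overlapping region would be counted twice and even a perfect prediction would score only 0.5.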

L

Learning rate

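A hyperparameter of a neural network that governs the size of the change made to its internal mathematical structure (its weights) at each training update. If it is too small, training is slow; if it is too large, updates can overshoot the minimum of the loss function and training may fail to converge.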
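The effect of the learning rate is easy to see on a toy problem. This sketch runs gradient descent on the loss w², whose minimum is at w = 0, with three arbitrarily chosen learning rates.

```python
def descend(learning_rate: float, steps: int = 20, w: float = 1.0) -> float:
    """Run gradient descent on the loss w**2 (minimum at w = 0); return the final w."""
    for _ in range(steps):
        w -= learning_rate * 2 * w  # the gradient of w**2 is 2w
    return w

for lr in (0.01, 0.4, 1.1):
    print(f"learning rate {lr}: final w = {descend(lr):.4g}")
```

With 0.01 the weight has barely moved after 20 steps; with 0.4 it is essentially at the minimum; with 1.1 every step overshoots to the other side of the curve by more than it started, so w grows instead of shrinking.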

Long short-term memory (LSTM) units

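A type of unit within a recurrent neural network that maintains an internal “cell state” (a vector of numbers) used to store persistent contextual information about previous items in a sequence of input data (beyond just the immediately preceding item; see recurrent neural network description). Learned “gates” control what information is written to, kept in, or erased from this state at each step.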
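The standard LSTM update can be sketched at its smallest scale: a single unit with a scalar input and scalar state. Real LSTMs use vectors and weight matrices, and their parameters are learned; the parameter values below are arbitrary and untrained, chosen only to make the code run.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell (scalar input and state).

    p maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)       # how much of the old cell state to keep
    i = gate("input", sigmoid)        # how much new information to write
    g = gate("candidate", math.tanh)  # the new candidate information itself
    o = gate("output", sigmoid)       # how much of the cell state to expose
    c = f * c_prev + i * g            # updated long-term cell state
    h = o * math.tanh(c)              # new hidden state (the unit's output)
    return h, c

params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.8, -0.2, 0.0),
    "candidate": (1.0, 0.3, 0.0),
    "output": (0.6, 0.4, 0.0),
}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.3]:
    h, c = lstm_step(x, h, c, params)
    print(f"x={x:+.1f}  h={h:.3f}  c={c:.3f}")
```

The cell state c is the “memory”: when the forget gate is close to 1 and the input gate close to 0, c passes through a step almost unchanged, which is how information can persist across many items in a sequence.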

Loss function (aka: cost function)

A mathematical function (or equation) that produces a single (scalar) value representing the error of a machine learning algorithm’s prediction. Much of the art of data science is working out how to frame a problem in numerical terms. Choosing an appropriate loss function is a key part of this.
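Two of the most common choices are mean squared error (typical for regression) and binary cross-entropy (typical for binary classification). This is an illustrative sketch using the standard library; the predictions and targets are made-up numbers.

```python
import math

def mean_squared_error(predictions: list[float], targets: list[float]) -> float:
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def binary_cross_entropy(probs: list[float], labels: list[int]) -> float:
    """Average log loss for predicted probabilities against 0/1 labels."""
    eps = 1e-12  # keeps log() away from zero
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)

print(mean_squared_error([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))
print(binary_cross_entropy([0.9, 0.2, 0.7], [1, 0, 1]))
```

Both reduce a whole batch of predictions to a single number, but they penalise errors differently: cross-entropy punishes confident wrong answers very heavily, which is one reason the choice of loss function shapes what a model learns.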

Look out for part two: M-Z