**Part two of your essential glossary of AI’s key terms and phrases:**

**M**

**Machine Learning**

The study of algorithms that can infer patterns and rules within data without needing explicit instructions. Note that this definition technically includes deep learning, but the two terms tend to be used mutually exclusively: machine learning generally refers to older algorithms such as support vector machines, random forests and k-nearest neighbours.

Deep learning is very much in fashion at the moment and there is a tendency to apply it (perhaps somewhat injudiciously) to all analytical problems. However, while deep learning algorithms are remarkably potent computational tools, they are heavily reliant on large volumes of data. Traditional machine learning still has a significant place in data science, particularly with regard to data-poor areas (including many less common medical conditions).

**Matrix**

A rectangular array of **scalar**s (in lay terms: a grid of numbers) that behave as a unit.

**Model**

Used interchangeably with the term **algorithm** in the context of **machine learning algorithm**s, or with the term *network* in the context of **neural network**s.

**N**

**Natural Language Processing (NLP)**

The study of developing computer systems that can perform useful functions based on natural spoken or written language (think Amazon Alexa, Google Assistant, etc.). Medical NLP applications are mostly early stage, but the last 18 months have seen some major advances in the field and it’s definitely a space to watch.

**Neural Network**

A machine learning algorithm inspired by the neuronal architecture of the biological brain. Neural networks form the cornerstone of **deep learning**.

**O**

**Overfitting**

When a **machine learning model** learns features that are too specific to its training set and will not generalise well to real-world examples. For example, a weather prediction **algorithm** trained on three years of data may predict rain on the 27th March next year with 100% certainty, based on the fact it rained on 27th March each of the last three years. Overfitting is a major issue in **machine learning** and is often the consequence of small datasets, or datasets that do not contain an adequate spread of data.
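As a deliberately extreme toy sketch (the dates and labels below are invented), a "model" that simply memorises its training set illustrates the problem: it is perfect on the data it has seen and clueless about everything else:

```python
# Toy illustration of overfitting: a "model" that memorises its training
# examples outright. It scores 100% on the training set but falls back on
# an arbitrary guess for any unseen input.
train_data = {("2017-03-27", "rain"), ("2018-03-27", "rain"), ("2019-03-27", "rain")}
memory = {date: label for date, label in train_data}

def predict(date):
    # Returns the memorised label, or an arbitrary guess for unseen dates.
    return memory.get(date, "no rain")

train_accuracy = sum(predict(d) == y for d, y in train_data) / len(train_data)
# train_accuracy is 1.0, yet predict("2020-03-27") is pure guesswork
```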

**P**

**Python**

Currently the most popular programming language for **machine learning** applications.

**R**

**Rectified linear activation unit (ReLU)**

A non-linear **activation function** whose output equals its input for positive numbers, but which outputs 0 for all negative input numbers. So ReLU(2) = 2, whereas ReLU(-1) = 0.

Intuitively, this may seem too simple to allow for powerful data manipulation. However, it has proven very effective and has replaced the sigmoid function as the **activation function** of choice for fully-connected hidden layers of most **neural networks**.
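Indeed, the function is simple enough to sketch in a single line of Python:

```python
def relu(x):
    """Rectified linear unit: passes positive inputs through, zeroes out negatives."""
    return max(0.0, x)

relu(2)   # → 2
relu(-1)  # → 0.0
```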

**R**

A statistical programming language, probably a close second to **Python** in popularity for **machine learning** applications. Note that NHS Digital has chosen R over **Python** as its official programming language.

**Recurrent neural network (RNN)**

A type of **neural network** that ‘remembers’ information from the previous item within a sequence of data in order to inform the interpretation of the current item. RNNs are most commonly used in **natural language processing**, where previous words in a phrase inform the interpretation of the current word. For example, knowledge of the previous item is essential in interpreting the word *complaint* in the phrases *presenting complaint* vs *written complaint*.

**Reinforcement learning (RL)**

A relatively nascent field of AI focussed on the concept of training machines to develop behaviour strategies based on distant rewards. RL was pivotal to the celebrated success of AlphaGo and has some very interesting (though currently theoretical) applications in clinical medicine, where sequential decision making based on learned behaviour models is central to the activity of many clinicians.

**S**

**Scalar**

A single, real number (e.g. 725 or 3.142).

**Supervised learning**

The process of training a **machine learning model** by tasking it with mapping from input data to pre-assigned labels. For example: feeding a **CNN** a large collection of chest X-rays and training it to detect pneumonia, where each X-ray in the training set has been labelled with a 0 or 1 (denoting the absence or presence of pneumonia) by an expert radiologist. Most **deep learning** applications in production today are based on the supervised learning framework.
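A minimal toy sketch of the idea, using invented 1-D data (the function name is illustrative): the "training" step searches for the threshold that best maps each input to its pre-assigned label.

```python
# Hypothetical labelled examples: (feature value, label).
examples = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]

def fit_threshold(data):
    """Learn a decision threshold by trying every candidate split."""
    best_t, best_acc = None, -1.0
    for t, _ in data:
        # Accuracy if we predict 1 whenever the feature is >= t.
        acc = sum((x >= t) == bool(y) for x, y in data) / len(data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

threshold = fit_threshold(examples)  # → 0.6, separating the two classes
```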

**T**

**Tensor**

For the purposes of **machine learning**, a tensor refers to a **scalar**, **vector** or **matrix** whose values will be transformed as part of an algorithm – hence, Google’s **deep learning library** is called TensorFlow.

**Test set**

The dataset used for the final evaluation of a completed **machine learning model**.

**Tokenisation**

The process of converting words into numerical “tokens” so that they may be used by mathematical algorithms (e.g. **deep learning** models). In the simplest example, a word’s token is simply its position within the dictionary (list) of words used for a given **NLP** task.
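The simplest case above can be sketched in a few lines of Python (the function names are my own):

```python
def build_vocab(words):
    """Map each unique word to its position in a sorted dictionary (list) of words."""
    return {word: i for i, word in enumerate(sorted(set(words)))}

def tokenise(words, vocab):
    """Replace each word with its numerical token."""
    return [vocab[w] for w in words]

sentence = "presenting complaint of chest pain".split()
vocab = build_vocab(sentence)     # {'chest': 0, 'complaint': 1, ...}
tokens = tokenise(sentence, vocab)
```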

**Train set**

The dataset used during the primary training of a **machine learning model** (often 80% of the data, where 10% is reserved for the **validation set** and 10% for the **test set**).
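A minimal sketch of that 80/10/10 split in Python (the function name and seed are illustrative):

```python
import random

def split_dataset(data, train=0.8, val=0.1, seed=42):
    """Shuffle the data, then carve it into train/validation/test sets."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(len(data) * train)
    n_val = int(len(data) * val)
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(100))
# 80 / 10 / 10 examples respectively
```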

**Transfer learning**

The process by which a **deep learning** **model** can re-use knowledge acquired in one domain to improve performance in another domain. For example, a **CNN** trained to identify real-world objects from photographs (of which there is an abundance online) could be fine-tuned to identify cerebral haemorrhages from CT brains (examples of which may be harder to acquire). The feature abstraction functions learned in the early convolutional layers of the network (e.g. the ability to detect edges and rudimentary geometric shapes) will be common to both tasks, so by re-training only the later layers of the network, one can both expedite training and achieve high performance with less training data.

**U**

**Unsupervised learning**

The process of training a **machine learning model** where the labels for the data are not provided. Often, unsupervised learning tasks centre around the concept of “clustering”, such as grouping patients from a certain disease population into a prespecified number of phenotypical groups.
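A toy sketch of clustering without labels: a few iterations of 1-D k-means on invented measurements (the initialisation here is deliberately naive).

```python
def kmeans_1d(data, k=2, iters=10):
    """Group unlabelled 1-D values around k cluster centres."""
    centres = sorted(data)[:k]  # naive initialisation
    for _ in range(iters):
        # Assign each point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centres[i]))
            clusters[nearest].append(x)
        # Move each centre to the mean of its cluster.
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)

measurements = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centres = kmeans_1d(measurements)  # two well-separated cluster centres
```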

**V**

**Validation set**

Usually, the dataset used to tune a **machine learning model**’s hyperparameters after the initial training phase on the **training set** (e.g. the dichotomisation threshold, when the model produces results on a continuous scale but the clinician requires them in binary form, such as 1 for “malignant” or 0 for “benign”).

**Vanilla neural network**

A **neural network** that consists of an input layer, a small number of fully connected **hidden layer**s and an output layer. The term “vanilla” refers to the fact that this is the simplest form of **neural network** architecture, which does not contain **convolutional** features, **recurrent** features, **LSTM** **unit**s, etc.

**Vector**

A list of **scalar** numbers that behave as a unit. The length of the list is referred to as the “dimension” of the vector, such that a 3-dimensional vector is a list of 3 numbers. For example, **v** = (2, 7, 5) is a 3-dimensional vector.
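For illustration, a vector can be modelled as a plain Python list that “behaves as a unit” via element-wise operations (the function names are my own):

```python
def add(u, v):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]

def dot(u, v):
    """Dot product: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

v = [2, 7, 5]      # a 3-dimensional vector
add([1, 2, 3], [4, 5, 6])  # → [5, 7, 9]
dot([1, 2, 3], [4, 5, 6])  # → 32
```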

**W**

**Weights**

The essential differentiable component of a **neural network**. Weights are **scalar** values that are adjusted during a network’s training phase to alter its internal mathematical structure, which in turn adjusts the function performed by the network. A “weight matrix” contains all the weights of a given layer within a network.

The black magic of **deep learning** lies in the fact that a **neural network**’s weight matrices start out as collections of randomly initialised **scalar** values (assuming it is not making use of **transfer learning**) but, by incrementally tweaking these values, the network can transform itself into a cutting-edge data processing tool.
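A toy sketch of the idea (all names are illustrative): a fully connected layer is just a weight matrix multiplying an input vector, with the weights initialised randomly and left for training to adjust.

```python
import random

def init_weights(n_out, n_in, seed=0):
    """Randomly initialise a weight matrix for an n_in-input, n_out-output layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def layer(weights, x):
    """Forward pass of one layer: weight matrix × input vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weights]

W = init_weights(2, 3)           # a 2×3 weight matrix
y = layer(W, [1.0, 0.5, -1.0])   # a 2-dimensional output vector
```

During training, each entry of `W` would be nudged (e.g. by gradient descent) to reduce the network’s error.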

**Word vectorization**

An interesting idea whereby a computer learns to represent words as multi-dimensional **vectors**, which can be used to perform meaningful functions.
