Machine learning (ML) is powerful. It finds patterns and makes predictions from a sea of data, helping researchers and medical professionals do their jobs more efficiently. However, ML is also prone to mistakes and false positives. Most of the time, the algorithms are complex, making it hard for a human to sift through them and pinpoint possible errors.

A classic example is the “tank problem” from the 1960s. Researchers developed an algorithm to identify tanks in photographs. The algorithm performed flawlessly on test images but failed miserably when real photographs from the field were used. The researchers realized the confusion arose because most of the actual photographs contained other elements such as sunsets, clouds and trees, which were absent from the test images, and these prevented the algorithm from making the right judgment.

Surprisingly, half a century later, ML is still not entirely immune to these pitfalls. Patrick Riley, Principal Engineer and senior researcher on the Google Accelerated Science team, recently wrote in Nature, highlighting some of the issues we need to avoid when training ML models.

Do not split data at random

According to Riley, most researchers split their data randomly into training and testing sets, not realizing that real-life data are rarely so unsystematic. Natural data tend to show trends over time because of the way they are gathered. As such, the way the data are split should correspond to the question the researchers are trying to answer.
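To make the idea concrete, here is a minimal Python sketch (my own illustration, not from Riley's article) contrasting a random split with a chronological split on synthetic, time-ordered data; the variable names and data are hypothetical.

```python
# Minimal sketch (illustrative only): splitting a time-ordered dataset
# chronologically instead of at random, so the test set resembles the
# "future" data the model will actually face.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
timestamps = np.sort(rng.uniform(0, 365, n))   # day each sample was collected
X = rng.normal(size=(n, 5))                    # hypothetical features
y = rng.normal(size=n)                         # hypothetical targets

# Random split: train and test are drawn from the same time period,
# which can make performance look better than it will be in practice.
idx = rng.permutation(n)
train_rand, test_rand = idx[:800], idx[800:]

# Chronological split: train on the first 80% of the timeline,
# test on the most recent 20%.
cutoff = np.quantile(timestamps, 0.8)
train_time = np.where(timestamps <= cutoff)[0]
test_time = np.where(timestamps > cutoff)[0]
```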

Take, for example, training an algorithm to find the right drug candidate or molecule. If researchers want the ML model to predict the effect of changing a few atoms on a known molecule, then each molecule in the test set should differ from those in the training set by only a few atoms. If researchers want the model to make good predictions on vastly different molecules, then the test set should also be vastly different from the training set.
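For the second case, a toy sketch of the idea (an assumption about how one might implement it, not Riley's method) is to admit a molecule into the test set only if it is sufficiently "far" from everything already in the training set; a real pipeline would use chemical fingerprints and a proper similarity metric rather than the crude atom-count distance used here.

```python
# Toy illustration: keep the test set structurally distant from the
# training set when the goal is prediction on chemically different compounds.
import numpy as np

rng = np.random.default_rng(1)
# Stand-in representation: each "molecule" is a vector of atom counts.
molecules = rng.integers(0, 10, size=(200, 8))

def distance(a, b):
    """Crude structural distance: total difference in atom counts."""
    return np.abs(a - b).sum()

train, test = [], []
for i, mol in enumerate(molecules):
    # A molecule goes to the test set only if it is far from everything
    # already assigned to training; otherwise it joins the training set.
    if train and all(distance(mol, molecules[j]) > 20 for j in train):
        test.append(i)
    else:
        train.append(i)
```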

Beware of unintentional variations 

Riley said his team at Google worked with a startup to optimize experiments that produce high-energy plasma. As in any experiment, they controlled some factors while allowing others to change as they trained their ML model. Yet the team discovered that the algorithm's predictive power was much the same whether it used all of the experimental settings or only the time at which each experiment was run; time turned out to be the unexpected element affecting everything. Because ML is sensitive to such unintentional variations, Riley suggested using one ML model to detect and highlight these factors, while others rule out confounders.
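One simple way to act on that suggestion (a hedged sketch under my own assumptions, with synthetic data and scikit-learn assumed available) is to train a model on the suspect factor alone, here time, and compare it with a model trained on all the experimental settings; if the two predict almost equally well, the factor is a likely confounder.

```python
# Sketch of a confounder check: does time alone predict the outcome
# nearly as well as the full set of controlled settings?
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 500
time = np.sort(rng.uniform(0, 100, n))
settings = rng.normal(size=(n, 6))            # controlled machine settings
drift = 0.05 * time                           # slow drift in the apparatus
outcome = settings[:, 0] + drift + rng.normal(0, 0.5, n)

full_X = np.column_stack([settings, time])
time_only_X = time.reshape(-1, 1)

full_score = cross_val_score(RandomForestRegressor(), full_X, outcome, cv=5).mean()
time_score = cross_val_score(RandomForestRegressor(), time_only_X, outcome, cv=5).mean()
print(f"full model R^2: {full_score:.2f}, time-only model R^2: {time_score:.2f}")
```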

It’s hard to stay objective 

A loss function is how a decision-making or optimization model weighs the cost of the various errors it makes. At times, as Riley noted, the decision comes down to “whether it is better to make two errors of 1% each or a single error of 2%”. Researchers therefore have to be careful about what the loss function is based on and, again, reflect on the question they are trying to answer.
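A small worked example (my own, not Riley's) shows how the choice matters: an absolute-error loss treats two 1% errors and one 2% error as equally costly, while a squared-error loss penalizes the single larger error more.

```python
# Worked toy example: two common loss functions rank the same pair of
# outcomes differently.
errors_two_small = [0.01, 0.01, 0.0]   # two errors of 1% each
errors_one_large = [0.02, 0.0, 0.0]    # a single error of 2%

def total_absolute_loss(errs):
    return sum(abs(e) for e in errs)

def total_squared_loss(errs):
    return sum(e ** 2 for e in errs)

# Absolute loss: 0.02 vs 0.02 (equally bad).
# Squared loss: 0.0002 vs 0.0004 (the single large error costs more).
print(total_absolute_loss(errors_two_small), total_absolute_loss(errors_one_large))
print(total_squared_loss(errors_two_small), total_squared_loss(errors_one_large))
```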

For example, Riley’s team developed an ML model to screen for diabetic retinopathy, a diabetes-related complication and one of the leading causes of preventable blindness in the world. The team discovered that ophthalmologists would often disagree with one another on the diagnosis. Hence, the ML model should not be built around a single prediction or a simple majority vote; otherwise, it will be inaccurate.

Likewise, the model should not be based on the diagnosis of a single disease because, at the end of the day, a patient who does not have diabetic retinopathy may still need to see a doctor. These are patient data, and it is likely that some patients have other medical conditions that fall outside the research focus. The model should instead be built around the question “does this patient need to visit a doctor?”.
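As a rough illustration of that framing (my assumption about how such a label might be encoded, not Riley's actual pipeline), the training label can be derived from all the graders' findings across conditions rather than from a majority vote on one disease.

```python
# Illustrative sketch: derive the label "needs a doctor visit" from several
# graders' findings, instead of a majority vote on a single disease.
from statistics import mean

# Hypothetical grades from three ophthalmologists for one patient's image:
# each reports retinopathy severity (0-4) and whether they saw anything else.
grades = [
    {"retinopathy_severity": 1, "other_finding": False},
    {"retinopathy_severity": 0, "other_finding": True},   # e.g. suspected glaucoma
    {"retinopathy_severity": 1, "other_finding": False},
]

# A majority vote on retinopathy alone would call this patient low risk...
majority_severity = round(mean(g["retinopathy_severity"] for g in grades))

# ...but the label the clinic cares about is whether any grader saw
# something that warrants a referral.
needs_doctor_visit = any(
    g["retinopathy_severity"] >= 2 or g["other_finding"] for g in grades
)
print(majority_severity, needs_doctor_visit)  # 1 True
```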

Some suggestions 

Overall, ML is not magic, and those who use it should have a thorough understanding of what it is. There is also a need for different disciplines to come up with clear standards on how ML should be performed and reported. These standards will logically vary from one domain to another because of differences in controls and measurements. Last but not least, anyone interested in ML should also broaden their interests into other areas, to keep potential bias to a minimum.

Author Bio

Hazel Tang

A science writer with a data background and an interest in current affairs, culture, and the arts; a no-med from an (almost) all-med family. Follow on Twitter.