“Accuracy and clarity of statement are mutually exclusive.”
Niels Bohr, Danish physicist and Nobel laureate

I recently returned from a few meetings on machine learning in medicine and healthcare, and many abstracts and presentations focused on the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC), also known as the Area Under the Receiver Operating Characteristic curve (AUROC), as a performance measure for classification problems. The higher the AUC, as we know (or are led to believe), the better the model is at distinguishing between classes (such as disease vs no disease). A few thoughts:

AUC of ROC is not always reflective of accuracy. The dichotomous nature of these classification models (disease vs no disease) is often overly simplistic, as many diseases have precursor forms or incomplete phenotypic expressions. In addition, a disease with relatively low prevalence in the population will result in an imbalanced dataset (the so-called “imbalanced classification problem”), and this can give the false impression that performance is high when in fact it is not. Because the ROC curve's false positive rate is computed over the large "no disease" group, a model can post a high AUC even when most of the patients it flags as diseased are not. Corrective measures include the precision-recall curve, which is more informative than the ROC curve for an imbalanced dataset, and the F-score, which is the harmonic mean of precision and recall.
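To make the imbalance problem concrete, here is a minimal Python sketch using made-up scores rather than any real clinical data. It shows a high ROC AUC coexisting with poor precision when the positive class is rare. The helper names (roc_auc, average_precision, f1_at_threshold) and the toy dataset are hypothetical illustrations; in practice one would more likely use an established library such as scikit-learn (roc_auc_score, average_precision_score, f1_score).

```python
# Minimal sketch: ROC AUC vs precision-recall metrics on a hypothetical, imbalanced toy dataset.
# All data below is made up for illustration; it is not from any real clinical study.

def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve: mean precision at the rank of each positive.
    Assumes no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at this cut-off
    return total / n_pos


def f1_at_threshold(labels, scores, threshold):
    """F-score (harmonic mean of precision and recall) when scores >= threshold are called 'disease'."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical data: 990 "no disease" and 10 "disease" cases (1% prevalence).
neg_scores = [float(i) for i in range(990)]        # negatives spread over 0..989
pos_scores = [890.5 + 10 * j for j in range(10)]   # positives near the top, interleaved with negatives
scores = neg_scores + pos_scores
labels = [0] * len(neg_scores) + [1] * len(pos_scores)

# A "model" that always predicts "no disease" is 99% accurate on this data.
always_negative_accuracy = labels.count(0) / len(labels)

print(f"Accuracy of always predicting 'no disease': {always_negative_accuracy:.2f}")
print(f"ROC AUC:           {roc_auc(labels, scores):.3f}")
print(f"Average precision: {average_precision(labels, scores):.3f}")
print(f"F-score at threshold 890: {f1_at_threshold(labels, scores, 890):.3f}")
```

On this toy data the ROC AUC is roughly 0.95, which looks strong, yet average precision is only about 0.09. Flagging every score of 890 or above catches all 10 cases at the cost of 100 false alarms, giving a precision of about 9% and an F-score of about 0.17. The ROC curve barely registers those false alarms because they are divided by the 990 negatives (a false positive rate of about 10%).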

Improved accuracy of disease diagnosis does not equal improved outcomes. Even with more accurate diagnosis from medical images or other medical data, many subsequent steps must be executed before outcomes improve. These include information transfer, appropriate intervention, serial follow-up, and final proof of disease. Moreover, the fact that well-trained and certified clinicians have labelled images as disease (vs no disease) does not guarantee that those labels are ultimately ground truth, especially given the subtle nuances of pathological images; labels such as “probably abnormal”, “borderline dilated”, or “uncertain” end up in the dataset.

Improved accuracy of disease prediction also does not change human behaviour. Informing patients that they have a 27.8% chance of developing diabetes in the next five years is not necessarily more effective than simply stating that they are at higher risk of diabetes based on their data. Ultimately, increased accuracy of disease prediction will need to be coupled with effective strategies to change human behaviour and modify social determinants of health to result in better outcomes.

Anthony Chang, MD, MBA, MPH, MS
Founder, AIMed
Chief Intelligence and Innovation Officer
Medical Director, The Sharon Disney Lund
Medical Intelligence and Innovation Institute (mi3)
Children’s Hospital of Orange County