By LA Celi, DJ Stone

Marvin Minsky, the great cognitive scientist at MIT, defined artificial intelligence (AI) as “the science of making machines do things that would require intelligence if done by man.”

Medicine, an area currently undergoing an accelerating and sometimes painful period of digitization, presents a particular problem for creating these intelligent machines because the issues and tasks involved are often neither clearly defined nor black and white.

In harsher terms, it is particularly difficult to create ‘artificial’ intelligence when there are still disagreements about concept definitions, what processes are important, and at times, even what outcomes are desirable.

These latter realities involving the dynamic complexities and ambiguities of medical practice have resulted, for example, in the creation of electronic health records (EHRs) with which many clinicians are deeply unhappy. Advances in AI are predicated on the availability of good quality data, and if the current state of EHRs is an indication of such data quality, then this field is doomed from the start.

Medicine is a surprisingly subjective endeavor, whereas valid and useful AI requires not only reliable, unbiased, and extensive data, but also objective (and similarly unbiased) definitions and goals.

It makes sense that the early successes in AI applications in healthcare are in the field of image recognition: 2016 was a landmark year with publication of two important papers in this area. In both papers, computer systems were reported to be as good as, if not better than, doctors in diagnosing a disease (e.g. eye complications from diabetes and skin cancer) based on review of an image.

These were relatively easy targets: there were few disagreements about the definitions of these specific conditions. Large datasets that were clearly labeled were available.

The impact of these successes cannot be disputed. Specialists who interpret these images are lacking in most parts of the world. Even in countries such as the US, the benefits of and the opportunities that arise from freeing specialists from mundane tasks that can be performed better by computers are tremendous. As an analogy, it’s hard to imagine primary care without midlevel practitioners, anesthesia without nurse practitioners and technicians, nursing without nurse’s aides.

But image recognition in medicine is low-hanging fruit, and improved diagnosis may or may not translate to better patient outcomes. A good example is thyroid cancer, whose detection in the US has tripled over the last 2 decades because of better imaging technology and the rise in the use of fine needle biopsies. Korea has seen a 15-fold increase in the diagnosis of thyroid cancer, but despite improved detection and treatment, neither the US nor Korea has seen a drop in the death rate from thyroid cancer.

Where we need assistance is in the day-to-day complex decision-making that requires data synthesis and integration, tasks we now approach with our clinical intuition. The latter, often referred to as clinical judgement, is riddled with cognitive biases and typically based on large information gaps, but nonetheless generally accepted as representing the ‘art’ of medicine.

But is it possible, with the data now available from electronic health records, for computers to predict disease trajectory or individualized response to treatment based precisely on what we know about the patient?

Can we move a little closer to science and away from this illusion of art? Can our decisions be informed by leveraging population data obtained from other individuals who previously presented with similar issues, received different treatments, and had identifiable outcomes from those treatments? But most relevant to this discussion, is the information documented in electronic health records sufficiently objective, accurate, complete, and free of bias to build tools upon?

Patient reports of symptoms are obviously subjective, but so are clinicians’ attempts to elucidate further historical details and gather more objective data.

The next step of incorporating additional ‘hard’ data elements into the scheme would superficially appear to be objective, but is also subject to a great deal of subjectivity. For example, the ‘hard’ numbers that represent physiological values such as blood pressure may be obtained by methods that are imperfectly reliable and reproducible. And whether it is blood pressure taken by a cuff in a medical clinic or with an intra-arterial catheter in the ICU, we may not agree on exact acceptable limits for the values obtained.

Laboratory data may provide a more reproducible reflection of the patient’s state, but the reference ranges currently employed are still generated from healthy people.  It is not clear that this concept of ‘normal’ really applies to specific groups of sick people such as those in an intensive care unit, or those with a particular set of chronic medical conditions.  In spite of that, clinicians spend millions of hours in reviewing laboratory values in comparison to these ranges, and probably billions of dollars attempting to bring abnormal values into those ranges that may not, in fact, represent what is normal or even desirable for the individual in question.

Bloggers have pointed out that certain medical specialties such as diagnostic radiology and surgical pathology are seriously threatened by these advances.  But when one looks more carefully at the data, the issue is more subjective and less fully resolved than these blogs may indicate.

Radiologists frequently do not agree on the interpretation of what may be the most common imaging modality utilized in clinical medicine, the chest X-ray.  The chest X-ray can provide an enormous amount of information regarding the state of the critical respiratory and cardiovascular systems, as well as other issues, but presents a problem in interpretative reliability that even human medical intelligence has not yet solved. Important interventions are taken (or not) on the basis of these interpretations that clearly remain subjective to a surprising extent.
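One common way to put a number on this kind of disagreement is Cohen's kappa, which compares how often two readers agree with how often they would agree by chance alone. The sketch below is only an illustration: the function name and the ten chest X-ray reads are hypothetical, invented for the example rather than taken from any study.

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers labelling the same set of cases."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must label the same non-empty set of cases")
    n = len(reader_a)

    # Observed agreement: fraction of cases where both readers gave the same label.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n

    # Chance agreement: for each label, the product of the two readers' marginal rates.
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both readers used a single identical label throughout; kappa is
        # formally undefined here, and returning 1.0 is just a convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical reads of ten chest X-rays by two radiologists (made-up data).
reader_a = ["normal", "normal", "abnormal", "abnormal", "normal",
            "abnormal", "normal", "normal", "abnormal", "normal"]
reader_b = ["normal", "abnormal", "abnormal", "normal", "normal",
            "abnormal", "normal", "abnormal", "abnormal", "normal"]

print(f"kappa = {cohens_kappa(reader_a, reader_b):.2f}")
```

In this made-up example the readers agree on 7 of 10 films, yet kappa comes out at roughly 0.4 because half of that agreement would be expected by chance. Labels produced under that level of agreement are what an image-recognition system would be trained and judged against.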

Are we clear about definitions and what we consider as therapeutic targets? In 2016, the clinical criteria for sepsis, a form of infection with organ dysfunction, were changed, a radical departure from a definition that had been in use for more than 20 years. Last year, the target blood pressure for the treatment of hypertension was lowered significantly.

What about therapeutic goals – are we, patients, providers and payers, in agreement about what constitutes a good outcome? Risk prediction models typically use mortality as the outcome as it is clear cut and reliably captured. But to most of us, including doctors and nurses, quality of life is more important than quantity.

How many will choose a year of life incapacitated in a skilled nursing facility, interrupted by hospital readmissions? How many will select additional months or years of cancer survival if the quality of life is impaired by the side effects of chemotherapy?

At present, outcomes that are relevant to patients are not consistently documented in health records. And what about social determinants of health and disease? Why are we building outcome prediction models based on clinical data alone?

Lastly, the concept of a ground truth is a myth in medicine. If one opens the 1978 edition of Harrison’s Principles of Internal Medicine, considered the source of ground truth in internal medicine, and reads about the treatment of heart attack, this is what one will find: rest in bed for 6 weeks, avoid cardiac catheterization and beta-blocker medications, and administer lidocaine to everyone to prevent irregular heart rhythm. All of these would now be considered medical malpractice.

This is not to say the studies on which these recommendations were based were flawed. A treatment for a specific disease that was effective 30 years ago may no longer be effective, or worse, may be harmful now, because the demography of the patients has changed or new treatments have been incorporated or abandoned in the interim. Ground truth, in medicine, is a moving target.

Resolving the subjectivity of medicine with the objectivity required for digitization and the secondary creation of AI first involves resolution of a number of questions: What do we want to do?  What do we need to do?  What can we do?

The creation of useful AI for healthcare involves an optimization process in which the subjectivity of medicine is as aligned as possible with the objectivity required for digitization, robust data collection, and the employment of applications that have positive effects on processes and outcomes, as well as acceptable impacts on costs.

Authors

Dr. David Stone is a graduate of Yale University (1974) and the NYU School of Medicine (1978). He trained in internal medicine and anesthesiology at the University of Virginia, and in critical care medicine at the Massachusetts General Hospital. In 1986, he joined the University of Virginia faculty, where he became Professor of Anesthesiology and Neurosurgery and was Vice Chair for Education in the Anesthesiology department. In 2001, he joined VISICU.com, where he worked with founding physicians from Johns Hopkins to develop and implement a tele-ICU system that included a specialty-based EHR, data analytic tools, and a variety of real-time clinical decision support features for critical care medicine. He left VISICU, then a part of Philips, in 2012, when their eICU product was being used in about 11% of US ICU beds. He remains Visiting Professor of Anesthesiology and Neurological Surgery at UVa and is a member of the MIT Critical Data Group. He is currently working on several projects related to clinical data analytics, outcomes research, and artificial intelligence.

Leo Anthony Celi MD has practiced medicine on three continents, giving him broad perspectives in healthcare delivery. As clinical research director and principal research scientist at the MIT Laboratory for Computational Physiology (LCP), and as an attending physician at the Beth Israel Deaconess Medical Center (BIDMC), he brings together clinicians and data scientists to support research using data routinely collected in the process of care. His group built and maintains the public-access Medical Information Mart for Intensive Care (MIMIC) database, which holds clinical data from over 60,000 stays in BIDMC intensive care units (ICUs). It is an unparalleled research resource; over 5000 investigators from more than 70 countries have free access to the clinical data under a data use agreement. In 2016, LCP partnered with Philips eICU Research Institute to host the eICU database with more than 2 million ICU patients admitted across the United States.

Leo also founded and co-directs Sana, a cross-disciplinary organization based at the Institute for Medical Engineering and Science at MIT, whose objective is to leverage information technology to improve health outcomes in low- and middle-income countries. He is one of the course directors for HST.936 – global health informatics to improve quality of care, and HST.953 – collaborative data science in medicine, both at MIT. He is an editor of the textbook for each course, both released under an open access license. The textbook “Secondary Analysis of Electronic Health Records” came out in October 2016 and was downloaded more than 100,000 times in the first year of publication. The massive open online course HST.936x “Global Health Informatics to Improve Quality of Care” was launched under edX in February 2017. Finally, Leo has spoken in 25 countries about the value of data in improving health outcomes.