The life-saving reason why medical annotation company Centaur Labs wants us all to start analyzing medical images

In 1987, Jack Treynor, a finance professor at the University of Southern California, conducted an experiment with his class in an effort to demonstrate market efficiency. Treynor asked each of his 56 students to estimate the number of jelly beans inside a jar. The jar contained 850 beans; the median answer Treynor’s students gave was 870.

Only one student managed to give an estimate closer to the true value than the group median. The experiment became a classic example of the wisdom of crowds – the observation that the aggregate answer of a group tends to be more accurate than the answers of nearly all of its individual members.

Erik Duhaime, co-founder and CEO of medical data labeling company Centaur Labs, knew the experiment. But when he studied it again during his PhD at the MIT Center for Collective Intelligence, he saw another potential use. At the Center, researchers study how humans and computers can be connected so that, together, they act more intelligently than either could alone.

“I was partly inspired by my wife, who was going through medical school and residency at the time,” says Duhaime. “My PhD research focused on how to aggregate the opinions of multiple experts – in particular, on making the wisdom of crowds work for tasks like analyzing medical images, where some people have the professional knowledge and skills to contribute and others do not.”

Duhaime likens his solution to putting together a pub quiz team. “Some people might be better at pop culture, others might be better at sports or history, and so on,” he laughs. “It’s the same for analyzing medical images. Some experts might be better at identifying melanoma from dermatological images, while others are better at identifying basal cell carcinoma. The gist is to find out what an individual knows and how well they know it, and then put them together in a team so that each person’s expertise complements the others’.”

To test whether his solution worked, Duhaime ran experiments asking experts, semi-experts, and even novices to annotate medical images. He found that by intelligently aggregating their opinions, he could get extraordinarily accurate results – far better than those from individual experts alone.

This success in harnessing collective intelligence prompted Duhaime to launch Centaur Labs in Boston in 2017 to combine human and artificial intelligence for more accurate medical image analysis. The company has since raised $4M in funding from investors such as Y Combinator and Accel Partners.

“When Russian chess grandmaster Garry Kasparov lost to IBM supercomputer Deep Blue in 1997, he became a proponent of ‘Centaur chess’ – the idea of teaming up human beings and computers to play as one,” says Duhaime. “But interestingly, it doesn’t have to be the best human chess player who is paired with the best computer. It needs to be someone who works well with a computer and complements the computer’s skills. I see the same thing in aggregating multiple opinions to train AI to analyze medical images.”

To arrive at the desired wisdom of crowds, Duhaime and his team created DiagnosUs, a game-like mobile application that lets users learn, improve their skills, and compete with one another in annotating real medical images to earn cash prizes. The application is designed to disregard users’ backgrounds and education levels and judge them solely on their performance.

Every annotation made by a user is verified by multiple experts, and over time the system learns whether that user is trustworthy. If a user is not trustworthy, his or her opinion is disregarded – producing a system in which a minority of skilled individuals can overrule an unskilled majority.
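The mechanics of that kind of trust-weighted vote can be sketched in a few lines of Python. This is a minimal illustration under assumed numbers, not Centaur Labs’ actual algorithm: the labels, user names, and trust weights below are invented.

```python
from collections import defaultdict

def weighted_label(votes, trust):
    """Pick the label whose supporters carry the most total trust.

    votes: list of (user, label) pairs for one medical image
    trust: dict of user -> weight learned from that user's track record
    """
    scores = defaultdict(float)
    for user, label in votes:
        scores[label] += trust.get(user, 0.0)  # unknown users count for nothing
    return max(scores, key=scores.get)

# Three unproven users vote "benign"; one proven expert votes "melanoma".
votes = [("novice1", "benign"), ("novice2", "benign"),
         ("novice3", "benign"), ("expert1", "melanoma")]
trust = {"novice1": 0.4, "novice2": 0.4, "novice3": 0.4, "expert1": 1.5}
print(weighted_label(votes, trust))  # melanoma: 1.5 outweighs 3 x 0.4
```

A plain majority vote would return “benign” here; the trust weights are what allow the skilled minority to win.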

“Our app manages diverse opinions by aligning on a benchmark dataset of cases that we know are annotated well,” explains Duhaime. “We measure users’ performance on these cases against the benchmarks to highlight the ones who are exceptionally good. We then overweight their opinions on cases we don’t already have answers to. For example, if you ask a group of people what the capital of New York State is, the majority might say New York City. But we’ll correctly identify that the capital is Albany when we overweight the opinions of people who we already know correctly said that the capital of California is Sacramento and the capital of Washington is Olympia.”
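Where do the trust weights come from? A simple version of the benchmark idea is to score each user only on the cases with known answers and use that accuracy as their weight on everything else. The sketch below mirrors Duhaime’s state-capital example; the accuracy-as-weight rule is an illustrative assumption, not a description of Centaur Labs’ production scoring.

```python
def trust_from_benchmarks(answers, gold):
    """Score a user's accuracy on the benchmark cases they attempted.

    answers: dict of case -> the user's answer
    gold: dict of case -> known-correct answer (the benchmark set)
    """
    scored = [case for case in answers if case in gold]
    if not scored:
        return 0.0  # no benchmark overlap means no evidence of skill
    return sum(answers[c] == gold[c] for c in scored) / len(scored)

gold = {"capital of CA": "Sacramento", "capital of WA": "Olympia"}
casual = {"capital of CA": "Los Angeles", "capital of WA": "Seattle",
          "capital of NY": "New York City"}
informed = {"capital of CA": "Sacramento", "capital of WA": "Olympia",
            "capital of NY": "Albany"}
print(trust_from_benchmarks(casual, gold))    # 0.0 -> NY opinion disregarded
print(trust_from_benchmarks(informed, gold))  # 1.0 -> NY opinion overweighted
```

Feeding these scores into the weighted vote above lets Albany win even when casual answerers outnumber informed ones.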

It’s a similar approach to the one Google uses to gather training data for its self-driving cars. When we log into our accounts, reCAPTCHA asks us to point out trees, buses, and crosswalks to prove that we are human. It knows the answers for some of the images and uses those to verify that you are human, but it also relies on the public to identify and annotate everyday objects in the other images. Training an AI model requires thousands, if not millions, of accurately annotated images, examples, and data points. In medicine, producing them is costly, and accuracy is another big concern.

“We all know that AI is impacting today’s medicine and healthcare,” says Duhaime. “We believe this is just the beginning – we are only scratching the surface. But to create a real, large-scale impact, we need scalable human intelligence: the human intelligence to accurately annotate millions of medical images. That is what we are providing at Centaur Labs.

“We want to catalyze the AI revolution in healthcare by providing skilled, scalable data annotation services. At the same time, we hold to the belief that AI will never replace human doctors. The ideal scenario is AI and humans working together to solve medical problems.”