Training accurate AI applications for children will inherently require access to large amounts of diverse data. The question is whether the same centralized architecture used in ImageNet, and responsible for much of the progress in consumer AI, also works for AI in medicine.

The traditional approach to developing AI applications, such as the one used in ImageNet, is to aggregate large amounts of data in a central repository. In healthcare, such efforts are represented by the National Institutes of Health’s Imaging Data Commons (https://portal.imaging.datacommons.cancer.gov/), which has managed to aggregate 33 terabytes (TB) of image data over several years. While that’s admirable, the fundamental approach of aggregation in a central repository is not the right strategy for five key reasons.

  1. Central architectures are not application friendly

Aggregating data in a centralized architecture, repository, or data commons requires that the data be organized. Any time a database is created, a schema comes along with it: a designated way to organize the data. Unfortunately, a schema perfectly designed for one application might, by its very design, be difficult if not nearly impossible to use for another application.

  2. Central architectures are not network preserving

Additionally, there’s the challenge of where to house, or host, the data in a literal sense. If, for example, you were to aggregate just the pediatric cardiac echo data from every children’s hospital in the world, it would amount to an estimated 6,000,000 TB annually. If this data were to be centralized, where would we locate the servers? Who would pay for them? And who would be responsible for paying the network cost of moving the data to and from that central location?
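To give a rough sense of that network cost, the short sketch below converts the 6,000,000 TB annual estimate into the sustained transfer rate a central location would have to absorb. It assumes decimal terabytes (10^12 bytes), a 365-day year, and perfectly even traffic around the clock; the function name is purely illustrative.

```python
# Back-of-the-envelope estimate of the sustained inbound rate needed
# to centralize a given yearly data volume.
# Assumptions (not from the article): 1 TB = 10**12 bytes, a 365-day year,
# and traffic spread evenly across every second of the year.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60
BYTES_PER_TB = 10**12


def sustained_rate(tb_per_year):
    """Return (gigabytes per second, terabits per second) for a yearly volume in TB."""
    bytes_per_second = tb_per_year * BYTES_PER_TB / SECONDS_PER_YEAR
    gigabytes_per_second = bytes_per_second / 10**9
    terabits_per_second = bytes_per_second * 8 / 10**12
    return gigabytes_per_second, terabits_per_second


gb_s, tbit_s = sustained_rate(6_000_000)
print(f"{gb_s:.0f} GB/s, {tbit_s:.2f} Tbit/s")  # prints "190 GB/s, 1.52 Tbit/s"
```

Under these assumptions, the central site would need to ingest roughly 190 gigabytes every second, all year, before a single byte is sent back out to a hospital. Real traffic would be burstier, so peak requirements would likely be higher.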

  3. Central architectures are not real-time

To understand why real-time access to accurate AI-generated learning is important, simply imagine if an autonomous car had to access a central server to decide what to do at every turn or stop sign, or to avoid a pedestrian. The time it would take to ask and retrieve each decision from a central repository (in other words, the network latency) means a central architecture would be unrealistic. The same challenge would hold true for any real-time pediatric application.

  4. Central architectures are not privacy-preserving

Let’s assume we were to agree to aggregate all the pediatric echo data from around the world in Amsterdam. How would we preserve privacy for a patient in London whose data is sent to Amsterdam? How would we control who was able to access that data? And how might we set limits on which data is shared? Someone in Amsterdam with no connection to the patient may have access not only to a patient’s pediatric echo data but to their personal identifying information as well. These questions bring to the forefront an important consideration of data sharing and privacy known as “purpose limitation”. In a world where we increasingly expect specified data to be used only with our permission, by certain people, and for clearly defined purposes, accumulating data in a central repository offers the opposite. Keeping data with no stated purpose other than holding it for the future does nothing to preserve privacy.

  5. Central architectures are not data location preserving

The public central cloud providers (Amazon Web Services, Google Cloud Platform, Azure) operate out of ten of the 200 countries in the world. At the same time, many countries are adopting a Las Vegas-like mentality: data created in the country stays in the country.

While centralized architectures have powered the development of many consumer AI applications to date, there are multiple reasons why this is not the right approach for the training and deployment of AI applications for use in pediatric healthcare. Central architectures are simply not application friendly, not network preserving, not real-time, not privacy-preserving, and not data location preserving. What we really need, especially in the context of medicine and pediatric healthcare, is a decentralized AI architecture.

The use of AI in pediatric healthcare, as well as other domains, will be discussed at the in-person AIMed Global Summit, scheduled for June 4-7, 2023 in San Diego. The remainder of the week is filled with AI in medicine events such as the Stanford AIMI Symposium on June 8. Book your place now!
