How the Penn AI project makes machine learning accessible to everyone  


Dr. Jason Moore, Professor of Biostatistics, Epidemiology and Informatics at the University of Pennsylvania, is in full flow:

“If I were to give everybody who is reading this a dataset and tell them they can do whatever analysis they want and we’ll share our results in a week, every person involved would take the dataset through a different path, depending on their education, training, and more importantly, their experiences with data analysis. We all gravitate to certain methods that we know would work or are easy for us to use. So I am interested in whether we can get a computer to do the same. Can we get a computer to select methods, run analyses and learn to improve from experiences?”

That was the motivation behind the Penn AI project undertaken by Dr. Moore and his University of Pennsylvania colleagues in 2016. The goal of Penn AI is to make machine learning accessible to everybody, regardless of background or expertise, by automating the technology and making it free and open-source. Penn AI can work through analyses with different variables and methods on its own, without human input. Usually, machine learning requires someone to select a specific method and manually adjust each of its parameters before the analysis can run, which demands advanced data science knowledge to arrive at meaningful results. Even for experts, the process can be challenging and complicated. Automating machine learning removes much of that burden.
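
To make the contrast concrete, here is a minimal sketch of what "selecting a method and adjusting its parameters" looks like when a program does it instead of a person. It is not Penn AI's code. The two toy methods, the `SEARCH_SPACE` table, the `auto_select` function and the eight-point dataset are all assumptions made up for illustration, written in plain Python so the example runs without any libraries.

```python
def majority_classifier(train, _params):
    """Baseline method: always predict the most common training label."""
    labels = [label for _, label in train]
    top = max(sorted(set(labels)), key=labels.count)
    return lambda x: top

def knn_classifier(train, params):
    """k-nearest neighbours on a single numeric feature."""
    k = params["k"]
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        votes = [label for _, label in nearest]
        return max(sorted(set(votes)), key=votes.count)
    return predict

# Hypothetical search space: each method comes with the parameter
# settings a human analyst would otherwise have to try by hand.
SEARCH_SPACE = {
    "majority": (majority_classifier, [{}]),
    "knn": (knn_classifier, [{"k": k} for k in (1, 3, 5)]),
}

def leave_one_out_accuracy(build, params, data):
    """Score one method/parameter pair by holding out each row in turn."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += build(train, params)(x) == y
    return hits / len(data)

def auto_select(data):
    """Try every method and parameter setting, keep the best scorer."""
    trials = [
        (leave_one_out_accuracy(build, params, data), name, params)
        for name, (build, grid) in SEARCH_SPACE.items()
        for params in grid
    ]
    return max(trials, key=lambda trial: trial[0])

# Made-up toy data: one feature, two classes.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
best_score, best_method, best_params = auto_select(data)
print(best_method, best_params, best_score)
```

The user supplies only the data. The loop over methods and parameter settings is the part an automated tool takes off their hands, although a real system would search a far larger space with far more capable methods.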

“The problem with machine learning tools is that they tend to be built by people who have years of high-level training,” says Dr. Moore. “We want to make a free and simple system that is still robust enough to transform the way we approach the technology, which I think we’ve accomplished.”

Dr. Moore and his team hope Penn AI will lower the barrier to entry. With the automated tool, users can either bring their own datasets or use one of the several hundred made available for download. Penn AI is believed to be the first tool of its kind because it is designed to learn as it goes: its analysis suggestions improve over time based on the ‘experience’ it gains through use.

To make Penn AI work as intended, Dr. Moore and his team built some components from scratch and adapted others from existing tools. Penn AI’s selection of machine learning methods comes from the scikit-learn library, a comprehensive collection of classification, regression, and pre-processing methods. Its ability to take a dataset and a machine learning method, set the parameters and launch the analysis on a computer was adapted from the Future Gadget Lab, an open-source project on GitHub. To give Penn AI a memory and the ability to learn from new experience, the team represented each past machine learning experiment (specifically, details of the data and the chosen machine learning method and parameters) as a JSON file and stored it in MongoDB, a document database.
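
As a rough illustration of what one such experiment record might look like, the sketch below builds a JSON document per experiment and files it in a small in-memory store. The field names (`dataset`, `method`, `parameters`, `score`), the `InMemoryStore` class and the score values are hypothetical placeholders, not Penn AI's actual schema or results. The method and parameter names are real scikit-learn ones, but the list merely stands in for the document database.

```python
import json

def make_experiment_record(dataset_name, method, parameters, score):
    """Describe one experiment as a plain dict that serialises to JSON.

    The field names are illustrative assumptions, not Penn AI's schema.
    """
    return {
        "dataset": dataset_name,
        "method": method,
        "parameters": parameters,
        "score": score,
    }

class InMemoryStore:
    """Stand-in for a document store: keeps JSON strings in a list."""

    def __init__(self):
        self._documents = []

    def insert(self, record):
        # Serialise on the way in, as a document database would.
        self._documents.append(json.dumps(record, sort_keys=True))

    def find(self, **criteria):
        # Return every stored record whose fields match the criteria.
        records = (json.loads(doc) for doc in self._documents)
        return [
            record for record in records
            if all(record.get(key) == value for key, value in criteria.items())
        ]

store = InMemoryStore()
# Placeholder scores, invented purely for the example.
store.insert(make_experiment_record(
    "example_dataset", "RandomForestClassifier",
    {"n_estimators": 100, "max_depth": 5}, 0.87))
store.insert(make_experiment_record(
    "example_dataset", "LogisticRegression", {"C": 1.0}, 0.81))

matches = store.find(method="LogisticRegression")
print(json.dumps(matches[0], indent=2))
```

Because every record has the same shape, the accumulated experiments can later be queried by dataset, method or parameter value, which is what turns a log of past runs into something a recommender can learn from.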

Every machine learning analysis performed by users is now saved and converted into knowledge that Penn AI’s engine can use. Dr. Moore and his team also linked the results of these past experiments to the chosen machine learning methods, their parameters and meta-features of the data such as sample size, distribution and correlation structure, so that Penn AI understands the context in which a particular machine learning algorithm is likely to work. When users introduce a new dataset into the system, Penn AI recommends which machine learning methods they should use.
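
One simple way to picture this kind of recommendation is a nearest-neighbour lookup: describe each dataset by a few meta-features, find the past experiments run on the most similar data, and suggest the method that scored best there. The sketch below does exactly that, and nothing more. It is an assumption-laden illustration rather than Penn AI's actual recommender; the three meta-features, the `recommend` function and the `history` entries with their scores are all invented for the example.

```python
from math import log10, sqrt

def meta_features(rows):
    """Summarise a dataset given as a list of (feature_vector, label).

    Returns (log10 of sample size, number of features, minority class share).
    A real system would use many more meta-features and normalise them.
    """
    n_samples = len(rows)
    n_features = len(rows[0][0])
    labels = [label for _, label in rows]
    minority_share = min(labels.count(c) for c in set(labels)) / n_samples
    return (log10(n_samples), n_features, minority_share)

def distance(a, b):
    """Euclidean distance between two meta-feature tuples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recommend(new_meta, history, k=2):
    """Suggest the best-scoring method among the k most similar past runs."""
    nearest = sorted(history, key=lambda rec: distance(rec["meta"], new_meta))[:k]
    return max(nearest, key=lambda rec: rec["score"])["method"]

# Hypothetical memory of past experiments (placeholder scores).
history = [
    {"meta": (2.0, 4, 0.5), "method": "LogisticRegression", "score": 0.90},
    {"meta": (2.0, 4, 0.5), "method": "DecisionTreeClassifier", "score": 0.80},
    {"meta": (5.0, 300, 0.1), "method": "RandomForestClassifier", "score": 0.95},
]

# A new, made-up dataset: 120 rows, 4 features, two balanced classes.
new_rows = [((i, i + 1, i + 2, i + 3), i % 2) for i in range(120)]
new_meta = meta_features(new_rows)
suggestion = recommend(new_meta, history)
print(suggestion)
```

The random forest has the highest score in this toy history, but it was earned on a much larger and more imbalanced dataset, so it is not among the nearest neighbours and is not suggested. That is the point of tying results to context rather than ranking methods globally.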

Dr. Moore and his team believe Penn AI will encourage more people to adopt AI in the clinical space. “I want Penn AI to be self-service, clinical AI,” says Dr. Moore. “I hope it will soon be routine for a doctor to say, ‘I want to look at the association between sex, age, smoking and different diseases’ and have this tool to answer their questions.” Apart from automation, Penn AI also lets users see the mechanisms behind each analysis by looking inside the code: how the tool gets from one endpoint to another. “If you’re going to use machine learning for patients, you want to trust it completely. You want to be able to look under the hood. Penn AI allows for that, which builds some faith among clinicians, which is important for user buy-in.”

Dr. Moore plans to add more complex features into Penn AI so advanced users can also make good use of the tool. “I think this tool is going to accelerate research in certain areas,” he says. “We’ll be able to do almost instantly what it takes weeks and months and thousands or millions of dollars to do.”