“There were long stretches of DNA in between genes that didn’t seem to be doing very much; some even referred to these as ‘junk DNA,’ though a certain amount of hubris was required for anyone to call any part of the genome ‘junk,’ given our level of ignorance.”

Francis S. Collins, The Language of God: A Scientist Presents Evidence for Belief


This landmark article, published in Nature by Google's DeepMind group, headed by Demis Hassabis, in collaboration with the European Molecular Biology Laboratory (EMBL), reported the most complete and accurate structural picture of the human proteome to date.

The paper described how a deep learning program called AlphaFold was used to predict, from their amino acid sequences, the structures of almost the entire human proteome: about 98.5% of its roughly 20,000 proteins. The accompanying public database launched with about 350,000 structures covering the human proteome and 20 other model organisms, still a relatively small number compared with the more than 100 million proteins known to scientists. Before this AI feat by DeepMind, experimentally determined structures covered only about 17% of the residues in human proteins, after several decades of dedicated and arduous work by scientists. DeepMind's predictions are freely available in a public database hosted by EMBL's European Bioinformatics Institute (EMBL-EBI).

This accomplishment is of the same caliber as the mapping of the human genome, completed in 2003. The AI methodology ushers in a new era of protein structure determination, one that far outpaces the traditional methods of X-ray crystallography and cryogenic electron microscopy (cryo-EM) in speed and scale. The time needed to determine a protein structure has dropped from six months to a few minutes, making it possible to scale this important biological work to entire proteomes. Confidence in the predictions is relatively high: 58% of all residues in the dataset received a confident prediction.

The potentially transformative benefits of this treasure trove of human protein structures include a better understanding of disease processes, the generation of new biological hypotheses, and drug discovery, repurposing, and design for human disease. Even the discovery of enzymes that degrade plastic could be an important benefit of this monumental work. Research on human diseases such as Alzheimer's may also benefit, as the structures could help guide the design of new drugs that reduce morbidity or even cure these diseases.

With this study, we are a big step closer to understanding nature's wondrous biological milieu, but, as Francis Collins observed, we probably remain relatively ignorant. Let us not hype this latest AI contribution, but rather first understand its current limitations and then its endless possibilities.

The full paper can be read in Nature.