A team of researchers from Estonia has developed a machine learning system capable of generating unique genome sequences. These computer-generated fakes could play a vital role in the future of DNA research.

Dubbed “artificial genomes,” these AI-created sequences are indistinguishable from actual human genomes, except that they are completely synthetic. This means researchers who use them need not worry about the privacy concerns that come with real genetic data.

Under the current research paradigm, researchers have to safeguard DNA in order to ensure the privacy of the humans it belongs to. This has resulted in a drought of available data due to the inability of facilities to share their datasets. Synthetic genomes should go a long way towards solving that thorny problem.

The team of researchers from Estonia’s University of Tartu and France’s Paris-Saclay University who developed the artificial genome project say that their fake genetic sequences have real value as a tool for research geneticists.

In their paper, just published in the journal PLOS Genetics, the authors state, “Generative neural networks have been effectively used in many different domains in the last decade, including machine dreamt photo-realistic imagery. In our work, we apply a similar concept to genetic data to automatically learn its structure and, for the first time, produce high quality realistic genomes. These novel genomes are distinct from the original ones used for training the generative networks.

“We show that artificial genomes retain many complex characteristics of real genomes and the heterogeneous relationships between individuals. They can be used in intricate analyses such as imputation of missing data. We believe they have a high potential to become alternatives for many genome databases which are not publicly available or require long application procedures or collaborations and remove an important accessibility barrier in genomic research in particular for underrepresented populations.”
