Study Uses AI to Find Autism Clues in “Junk” DNA

AI deep learning predicts impact of mutations in regulatory DNA on autism risk.

Posted Jun 19, 2019

“One man’s trash is another man’s treasure” is a familiar expression. When it comes to health and genomics, “junk” DNA may turn out to be a goldmine. In a recent study, Princeton University-led researchers used whole-genome sequencing and artificial intelligence (AI) deep learning to identify the contribution of noncoding mutations to autism risk, demonstrating that mutations in “junk” DNA can contribute to a complex disease.

The study was led by Princeton professor Olga Troyanskaya, who is also deputy director for genomics at the Flatiron Institute’s Center for Computational Biology (CCB) in New York City, along with professor Robert Darnell of The Rockefeller University, also an investigator at the Howard Hughes Medical Institute.

Published on May 27, 2019, in Nature Genetics, the study presented an AI deep learning framework that “predicts the specific regulatory effects and the deleterious impact of genetic variants,” and applied it to autism spectrum disorder (ASD).

The World Health Organization estimates that one in 160 children globally has ASD. In the United States, autism affects one in 68 children, according to statistics from the Centers for Disease Control and Prevention (CDC).

Symptoms of ASD are present in early childhood, and the condition can generally be diagnosed at around age two based on behavior and delays in reaching developmental milestones. ASD negatively impacts a person’s ability to function at school, at work, in social settings and in other aspects of daily life. ASD affects people of all races, ethnicities and socioeconomic groups.

According to the American Psychiatric Association’s Diagnostic and Statistical Manual of Mental Disorders (DSM-5), people with ASD have persistent deficits in social communication and interaction, and show restricted, repetitive patterns of behavior, interests or activities.

Existing research suggests that there are likely many causes for multiple types of ASD. Not all of the causes of autism have been established. Scientific studies have shown that vaccines do not cause autism. There may be many risk factors for ASD, including biological, environmental and genomic factors. For this study, the researchers focused on the genomic realm, studying the impact of noncoding DNA on ASD.

DNA, or deoxyribonucleic acid, is the hereditary material found in the cell nucleus (and in small amounts in the mitochondria) of nearly every living organism. DNA molecules consist of two twisting, paired strands, each built from the nucleotide bases adenine (A), thymine (T), guanine (G) and cytosine (C). There are an estimated 3 billion base pairs in the human genome. Only a tiny fraction, an estimated one to two percent, consists of protein-coding genes. The remaining 98 to 99 percent is noncoding DNA, some of which regulates when and where genes are switched on; it was long dismissed as “junk” DNA.

“A potential role for noncoding mutations in complex human diseases including ASD has long been speculated,” wrote the researchers. The team applied the scientific method to test this hypothesis.

The study’s approach was simple in concept: look at the entire genome, identify the parts of the DNA that regulate genes, then construct a model to predict how mutations in “junk” DNA might play a role in complex disease. The execution, however, was rather complex.

The study used 7,097 whole genomes from 1,790 families in the Simons Simplex Collection (SSC), a repository of genetic samples from 2,600 families in which one child is the only family member with ASD. Whole-genome sequencing is a lab procedure that reads almost all of the three billion nucleotides in an individual’s complete DNA sequence, both coding DNA and noncoding “junk” DNA.

The team trained a deep convolutional neural network on biochemical data describing how DNA- and RNA-binding proteins interact with their targets, and then used it to predict the functional and disease-causing impact of the mutations found in these genomes.
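One way to picture how a model can rate a mutation is to score the original sequence, score the mutated sequence, and take the difference. The sketch below is only an illustration of that comparison. The motif, the example sequence, and the function names (`toy_binding_score`, `apply_snv`, `variant_effect`) are made up for this example, and the simple match-counting score stands in for the trained neural network, which is far more sophisticated.

```python
# Toy stand-in for a trained model: it scores how well the best window in a
# sequence matches a made-up protein-binding motif. The motif and the scoring
# rule are illustrative assumptions, not taken from the study.
MOTIF = "TGACGT"


def toy_binding_score(seq: str) -> int:
    """Return the highest number of motif positions matched by any window."""
    width = len(MOTIF)
    return max(
        sum(a == b for a, b in zip(seq[i:i + width], MOTIF))
        for i in range(len(seq) - width + 1)
    )


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with the base at 0-based index pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(ref_seq: str, pos: int, alt: str) -> int:
    """Predicted effect: score of the mutated sequence minus score of the reference."""
    return toy_binding_score(apply_snv(ref_seq, pos, alt)) - toy_binding_score(ref_seq)


reference = "AATGACGTCC"                     # contains the motif at positions 2-7
print(variant_effect(reference, 5, "T"))     # change inside the motif: -1
print(variant_effect(reference, 0, "C"))     # change outside the motif: 0
```

A negative value means the change weakens the hypothetical binding site, while zero means the toy score cannot tell the two versions apart.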

Convolutional neural networks (CNNs) are well suited to processing grid-like data such as two-dimensional pixel arrays and are often used for image analysis; the same sliding-filter idea also applies to one-dimensional sequences such as DNA. Inspired by neuroscience, the CNN architecture is loosely analogous to the biological visual cortex, where individual cortical neurons respond to stimuli only within a limited region of the visual field, known as the receptive field. CNNs are also used in natural language processing (NLP), recommendation systems, and image recognition and classification.
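To make the idea of a convolutional filter concrete, here is a minimal sketch in plain Python of a single filter sliding along a short DNA sequence. It assumes a one-hot encoding of the four bases and a hand-written filter that responds to the pattern "GAT". In a real network the filter values are learned from data and many filters are stacked in layers, so this is a toy illustration and not the study's model.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, with a single 1.0 per row."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def conv1d(x, kernel):
    """Slide the kernel along the sequence, summing elementwise products per window."""
    width = len(kernel)
    outputs = []
    for start in range(len(x) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(len(BASES)):
                total += x[start + offset][channel] * kernel[offset][channel]
        outputs.append(total)
    return outputs


# Hypothetical filter that responds most strongly to the 3-base pattern "GAT".
kernel = one_hot("GAT")

raw = conv1d(one_hot("ACGATTGAT"), kernel)
activations = [max(value, 0.0) for value in raw]  # ReLU keeps non-negative responses

print(activations)       # [0.0, 0.0, 3.0, 1.0, 0.0, 0.0, 3.0]
print(max(activations))  # max pooling: strongest match anywhere, 3.0
```

Each output value covers only three bases at a time, which is the sequence counterpart of the receptive field described above.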

“Our analysis identified new candidate noncoding disease-associated mutations that potentially affect ASD through regulation of gene expression,” wrote the researchers. “Our approach addresses the statistical challenge of detecting the contribution of noncoding mutations by predicting their specific effects on transcriptional and post-transcriptional regulation. This approach is general and can be applied to study the contributions of noncoding mutations to any complex disease or phenotype.”

“The approach could be particularly helpful for neurological disorders, cancer, heart disease and many other conditions that have eluded efforts to identify genetic causes,” Troyanskaya said in a Princeton University news report.

In conclusion, the researchers wrote, “Our predictive genomics framework illuminates the role of noncoding mutations in ASD and prioritizes mutations with high impact for further study, and is broadly applicable to complex human diseases.”

Innovations in genomics and artificial intelligence are enabling new discoveries in health and medicine, including providing insights into the role of mutations in “junk” DNA.

Copyright © 2019 Cami Rosso All rights reserved.


Zhou, Jian, Park, Christopher Y., Theesfeld, Chandra L., Wong, Aaron K., Yuan, Yuan, Scheckel, Claudia, Fak, John J., Funk, Julien, Yao, Kevin, Tajima, Yoko, Packer, Alan, Darnell, Robert B., Troyanskaya, Olga G. “Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk.” Nature Genetics. May 27, 2019.

Autism Speaks. “DSM-5 Criteria.” Retrieved 6-19-2019 from  

CDC. “Vaccine Safety.” Retrieved 6-19-2019 from