New Deep Learning Method for Genomics Is More Transparent

Novel AI technique for genomics improves robustness and interpretability.

Posted Feb 18, 2021 | Reviewed by Kaja Perina


Artificial intelligence (AI) machine learning is rapidly emerging as a powerful tool in the quest for novel diagnostics, therapies, and treatments for complex diseases such as cancer. Increasingly, machine learning is being used in genomics, the branch of molecular biology that studies genes and their functions. Recently, scientists developed a technique that improves the robustness and interpretability of machine learning applied to genomics, publishing a peer-reviewed study last week in Nature Machine Intelligence.

AI and genomics are growing markets. The worldwide artificial intelligence market is expected to increase at a compound annual growth rate (CAGR) of 42 percent during 2020-2027 to reach $733.7 billion USD by 2027, according to Grand View Research. Per the same report, health care is a vertical segment expected to gain a large part of that share by 2027. Grand View Research estimates the global genomics market will reach $31 billion USD by 2027, with a CAGR of 7.7 percent during 2020-2027.

The research duo of Peter Koo, Ph.D., an assistant professor at the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory in New York, and Matt Ploenzke, Ph.D., of the Department of Biostatistics at the Harvard T.H. Chan School of Public Health in Massachusetts, contributed equally to the study, which was funded in part by a grant from the NCI Cancer Center.

Gene regulation refers to the process of turning genes on and off. Deep convolutional neural networks provide state-of-the-art accuracy for predicting gene regulation by learning features from training data in order to make predictions. Koo and Ploenzke use chromatin accessibility and transcription factor (TF) binding as examples.

Chromatin is the complex of DNA and protein that makes up chromosomes in eukaryotic cells—cells that have a nucleus and membrane-enclosed organelles, such as those of animals, plants, fungi, and protists. Chromatin plays a major role in regulating gene expression and DNA replication.

Transcription refers to the process of copying DNA into RNA. In molecular biology, proteins that bind to DNA-regulatory sequences and modulate the rate of gene transcription are called transcription factors.

“In biology, it is critical that we understand what features it has learned in order to build trust in these black box predictive models and to potentially gain new biological insights from them,” wrote Koo and Ploenzke. “Model interpretability is key to understanding these features. Deep CNNs, however, tend to learn distributed representations of sequence motifs that are not necessarily human interpretable.”

The team hypothesized that applying an exponential activation to the first-layer filters, rather than the frequently used alternatives, would lead to more explainable artificial intelligence.
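In a convolutional network, each first-layer filter scans along the DNA sequence and produces a score at every position; the activation function then decides which scores are passed on. A minimal numpy sketch of the intuition (the filter scores below are made-up illustration values, not data from the study) shows how an exponential activation sharpens the contrast between strong motif matches and background compared with a ReLU:

```python
import numpy as np

# Hypothetical first-layer filter scan scores along a DNA sequence:
# mostly background noise, plus two strong motif matches (indices 2 and 5).
scores = np.array([0.1, 0.3, 4.0, 0.2, -0.5, 3.8, 0.0, 0.4])

relu = np.maximum(scores, 0.0)        # keeps weak positives alongside strong ones
expo = np.exp(scores - scores.max())  # shifted exponential: strong matches dominate

def motif_share(act):
    """Fraction of total activation carried by the two true motif positions."""
    return act[[2, 5]].sum() / act.sum()

print(round(motif_share(relu), 3))  # ReLU: weak positives dilute the motif signal
print(round(motif_share(expo), 3))  # exponential concentrates activation on motifs
```

Because the exponential suppresses low scores and amplifies high ones, most of the filter's activation ends up on genuine motif matches, which is one way to read the paper's claim that such filters learn cleaner, more interpretable motif representations.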

“Deep convolutional neural networks (CNNs) trained on regulatory genomic sequences tend to build representations in a distributed manner, making it a challenge to extract learned features that are biologically meaningful, such as sequence motifs,” wrote the researchers. “Here we perform a comprehensive analysis on synthetic sequences to investigate the role that CNN activations have on model interpretability.”

The researchers trained and tested a variety of convolutional neural networks using various standard activation functions such as ReLU, sigmoid, tanh, softplus, ELU, and linear.
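For reference, the activations compared in the study can be written in a few lines of numpy. These are the standard textbook definitions, not code from the paper:

```python
import numpy as np

# Standard activation functions compared in the study (textbook definitions).
def relu(x):        return np.maximum(x, 0.0)
def sigmoid(x):     return 1.0 / (1.0 + np.exp(-x))
def tanh(x):        return np.tanh(x)
def softplus(x):    return np.log1p(np.exp(x))
def elu(x, a=1.0):  return np.where(x > 0, x, a * (np.exp(x) - 1.0))
def linear(x):      return x
# The activation Koo and Ploenzke propose for first-layer filters:
def exponential(x): return np.exp(x)

x = np.linspace(-2, 2, 5)
for f in (relu, sigmoid, tanh, softplus, elu, linear, exponential):
    print(f.__name__, np.round(f(x), 3))
```

The key difference is in how each function treats intermediate scores: ReLU and its relatives pass moderate positive values through roughly unchanged, while the exponential grows so steeply that only the strongest filter responses contribute meaningfully downstream.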

“We show that employing an exponential activation in the first layer filters consistently leads to interpretable and robust representations of motifs compared with other commonly used activations,” the researchers wrote. “Strikingly, we demonstrate that CNNs with better test performance do not necessarily imply more interpretable representations with attribution methods.”

By taking an innovative approach to AI deep learning, Koo and Ploenzke found a way to make deep convolutional neural networks for genomics more interpretable—to better understand what goes on inside AI's black box and to accelerate future life sciences research.

Copyright © 2021 Cami Rosso. All rights reserved.