New AI Framework May Accelerate Novel Therapeutics

UCSD's machine learning tool for studying DNA-binding of transcription factors.

Posted Jan 20, 2021

LaCasadeGoethe/Pixabay
Source: LaCasadeGoethe/Pixabay

The rise of artificial intelligence (AI) machine learning is making an impact in genomics, biotech, pharmaceuticals, and life sciences. In a study published on Monday in Nature Machine Intelligence, a team of University of California San Diego (UCSD) researchers led by Melissa Gymrek created an AI machine learning framework to enable genomics insights on the binding of transcription factors (TF)—a framework that may one day help speed up precision medicine and the development of novel treatments for diseases.

“Altogether, our study provides a valuable machine-learning framework for helping decode the rules by which TFs bind their target sites and identifying specific non-coding nucleotides with the strongest effects on binding,” Gymrek and her research team wrote.

Gymrek was named to “Forbes 30 Under 30: Science” in 2017 for her scientific achievements that include a patented algorithm that enables scientists to research the impact of genetic segments on human traits. The mission of the Gymrek Lab at the UCSD is to research the variations in the DNA sequences that may lead to diseases by developing computational tools. Other UCSD researchers who participated in the study include Hao Su, Cynthia Wu, Hanquing Zhao, Michael Lamkin, and An Zheng.

A genome refers to all of the genetic information in an organism that is stored in chromosomes that contain DNA. The human body has 23 pairs of chromosomes. There are roughly 20,500 genes in the human genes, according to the Human Genome Project, a global research effort conducted during 1990-2003 to determine the sequence of the three billion chemical base pairs that make up human DNA.

Gene activity is controlled by transcription factors, proteins that typically have a DNA-binding region that controls whether a gene is “on” or “off.” These proteins may activate or repress transcription, the process where the DNA in a gene is copied into an RNA molecule.

For example, Rett syndrome, a neurodevelopment disorder, is attributed to the dysfunction of a transcriptional repressor called MeCP2, according to a study published in Current Opinion in Genetics & Development.

Maturity-onset diabetes of the young is caused by mutations in the transcriptional regulator HNF1ß (TCF2) according to a European study on neonatal diabetes and pancreatic development.

Fuch’s endothelial corneal dystrophy (FECD), an eye disease, is due to the CTG TNR (CTG trinucleotide repeat) expansion in intron 3 of the TCF4 transcription factor; and may increase the risk of developing bipolar disorder, according to a different European study.

Transcription factors is also an area of interest in cancer research as they may play a role in tumor suppression or promotion.

The UCSD researchers created a machine learning framework called AgentBind by expanding upon DeepSEA, consisting of three convolutional neural network layers with one fully connected layer, and DanQ, which is a combination of a convolutional neural network and recurrent neural network.

“Here we present a machine learning framework leveraging existing convolutional neural network architectures and state of the art model interpretation techniques to identify, visualize, and interpret context features most important for determining binding activity for a particular TF,” wrote the researchers.

They used transfer learning to make the training more efficient by utilizing less data. The convolutional neural network is trained on ChIP-seq and DNAseI-seq. Then the researchers applied Grad-CAM, a post-analytical method for neural networks, to “compute importance scores for the context regions of the binding motifs at single base-pair resolution.”

“Overall, our framework enables novel insights into sequence features determining TF binding and identifies specific non-coding variants with potential disease relevance,” wrote the researchers.

Transcription factors that are impaired or mutated can cause autoimmune disorders, cardiovascular issues, cancer, and other diseases. With this new AI machine learning framework for transcription factor binding, researchers have a new method to apply innovative technologies in order to understand the disease process and accelerate future therapeutics.

Copyright © 2021 Cami Rosso All rights reserved.