Verified by Psychology Today

NVIDIA and Harvard Create New AI Deep Learning Genomics Tool

AtacWorks applies AI to lower the cost of rare-cell and single-cell research.

Source: Qimono/Pixabay

Advances in artificial intelligence (AI) deep learning, genomics, and computing hardware are accelerating life sciences research and discovery. In a new study published today in Nature Communications, researchers from NVIDIA Corporation (NASDAQ: NVDA) and Harvard University’s Department of Stem Cell and Regenerative Biology created an AI deep learning tool called AtacWorks that denoises genomic sequencing data and finds areas of accessible DNA, which may help speed up new diagnostics, de novo drugs, and treatments for diseases in the future.

Early intervention and treatment of cancer and genetic diseases may make the difference in outcomes, and both require early detection. The challenge is that the sample of cells may be small and the data itself may contain extraneous “noise.” Having a way to filter out the non-relevant data, or noise, and to boost the relevant data, or signal, in those cases can help speed up research.

“We show that our method can be used on small subsets of rare lineage-priming cells to denoise signal and identify accessible regulatory regions at [a] previously unattainable genomic resolution,” wrote the study’s lead researcher from NVIDIA, Avantika Lal, along with Nikolai Yakovenko and Johnny Israeli also from NVIDIA, in collaboration with Harvard University researchers Zachary D. Chiang and Jason D. Buenrostro. “Based on these advancements, we anticipate that AtacWorks will broadly enhance the utility of epigenomic assays, providing a powerful platform to investigate the regulatory circuits that underlie cellular heterogeneity.”

Assay for Transposase-Accessible Chromatin using Sequencing (ATAC-seq) is a commonly used assay to measure chromatin accessibility in the genome. ATAC-seq datasets are used to train AtacWorks, a deep convolutional neural network based on PyTorch with ResNet (residual neural network) architecture.
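To make the ResNet idea concrete, here is a minimal sketch of a single one-dimensional residual block applied to a coverage track. It is written in plain Python rather than PyTorch so that it stands alone, and it is not the AtacWorks implementation: the function names, the kernel values, and the two-convolution layout are all assumptions chosen for illustration.

```python
def conv1d_same(signal, kernel):
    """Slide an odd-length kernel over the signal, zero-padding the ends
    so the output has the same length as the input."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(values):
    """Rectified linear unit: negative values become zero."""
    return [max(0.0, v) for v in values]


def residual_block(signal, kernel1, kernel2):
    """Hypothetical residual block: two convolutions compute an update,
    and the skip connection adds the original signal back in."""
    hidden = relu(conv1d_same(signal, kernel1))
    update = conv1d_same(hidden, kernel2)
    return relu([s + u for s, u in zip(signal, update)])


# Illustrative read counts per base pair (made-up numbers).
noisy_track = [0, 3, 1, 0, 5]
# With all-zero kernels the update is zero, so the skip connection
# passes the input through unchanged.
print(residual_block(noisy_track, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
```

The skip connection is the defining feature: each block only has to learn a correction to its input, which is part of what makes deep networks of this kind easier to train.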

“Models trained by AtacWorks can detect peaks from cell types not seen in the training data and are generalizable across diverse sample preparations and experimental platforms,” the researchers wrote.

The deep neural network is trained on matched low- and high-coverage (or low- and high-quality) data from the same cell type. Training uses a multi-part loss function that combines Mean Squared Error (MSE) and 1 - Pearson Correlation for the regression output with Binary Cross-Entropy (BCE) for the classification output.
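The three loss terms can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the weights `w_mse`, `w_pearson`, and `w_bce` are placeholders, since the article does not say how the terms are weighted.

```python
import math


def mse(pred, target):
    """Mean squared error between predicted and target coverage."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pearson(pred, target):
    """Pearson correlation coefficient; returns 0.0 if either track is constant."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    denom = math.sqrt(var_p * var_t)
    return cov / denom if denom > 0 else 0.0


def bce(probs, labels, eps=1e-7):
    """Binary cross-entropy between predicted peak probabilities and 0/1 labels."""
    total = 0.0
    for q, y in zip(probs, labels):
        q = min(max(q, eps), 1 - eps)  # clamp to avoid log(0)
        total -= y * math.log(q) + (1 - y) * math.log(1 - q)
    return total / len(probs)


def multi_part_loss(pred_cov, true_cov, peak_prob, peak_label,
                    w_mse=1.0, w_pearson=1.0, w_bce=1.0):
    """Weighted sum of the three terms; the weights are assumed, not from the study."""
    regression = (w_mse * mse(pred_cov, true_cov)
                  + w_pearson * (1.0 - pearson(pred_cov, true_cov)))
    classification = w_bce * bce(peak_prob, peak_label)
    return regression + classification
```

The regression terms compare the denoised coverage track with the high-coverage track, while the BCE term scores the per-position peak calls. The 1 - Pearson term falls to zero when the predicted and target tracks are perfectly correlated, so it rewards matching the shape of the signal even where MSE alone would be dominated by a few tall peaks.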

“AtacWorks is not provided with the DNA sequence as an input, which means it is agnostic to cell- or condition-specific correlations between chromatin accessibility and sequence motifs,” wrote the researchers. “Instead, the model learns features based on the shape of the coverage track, which generalize across datasets. In addition to generalization across different cell types, we also observed that our trained models can generalize to data from different species, experimental platforms, and quality levels.”

The deep learning tool denoises low-quality or low-coverage ATAC-seq signal and converts it to a higher-quality signal. AtacWorks predicts chromatin accessibility at base-pair resolution, along with the genomic locations of accessible regulatory regions.

AtacWorks runs on NVIDIA Tensor Core GPUs. The parallel processing of GPUs for general-purpose computing is one of the key contributing factors in the overall renaissance in artificial intelligence, along with improved algorithms, availability of big data sets, and decreasing costs of computing.

“We demonstrate that AtacWorks enhances the sensitivity of single-cell experiments by producing results on par with those of conventional methods using ~10 times as many cells, and further show that this framework can be adapted to enable cross-modality inference of protein-DNA interactions,” the researchers reported.

In biotechnology, the pharmaceutical industry, and life sciences, locating areas of accessible DNA in the genomes of healthy and diseased cells enables scientists and researchers to discover new drugs and novel treatments. The challenge in researching rare cell types, such as the stem cells that create platelets and blood cells, is having enough cells to produce a clear signal in the data. With AtacWorks, small samples and noisy data become less of a barrier, which may speed up disease research on rare cell types and the detection of genetic mutations in the future.

Copyright © 2021 Cami Rosso All rights reserved.
