How Synthetic Data Accelerates AI for Neuroscience

Detecting brain tumors using synthetic data.

Key points

  • Brain tumors can present as mental health disorders with neuropsychiatric symptoms.
  • Synthetic data can be used to improve AI accuracy when datasets are sparse and when privacy is an issue.
  • Michigan Medicine demonstrated an increase in AI performance to over 94 percent accuracy for detecting brain tumors using synthetic data.

Synthetic data is an emerging area of artificial intelligence (AI) that offers a way to solve problems where datasets are sparse. One use case is health care and life sciences research, where privacy compliance restricts the use of patient medical records. Recently, University of Michigan Hospitals-Michigan Medicine increased the accuracy of its AI for detecting brain tumors to over 94 percent using synthetic data.

Neurobehavioral and Neuropsychiatric Symptoms of Brain Tumors

Brain tumors can present as mental health disorders, with neuropsychiatric symptoms such as anxiety, depression, confusion, insomnia, memory loss, profound personality change, suicidal behavior, psychosis, rage, impulsivity, hallucinations, and other common psychiatric symptoms, according to the American Brain Tumor Association. Brain tumors are often diagnosed only after the patient experiences symptoms, according to the American Society of Clinical Oncology (ASCO). If brain imaging by a method such as magnetic resonance imaging (MRI) reveals a tumor, a tissue sample obtained via biopsy or surgery will help doctors determine whether the tumor is malignant.

Worldwide, there were over 300,000 new cases of brain and nervous system cancers in 2020, according to the GLOBOCAN 2020 report. In the U.S., an estimated 700,000 people are living with a primary brain tumor, and over 84,000 more will be diagnosed this year, according to the National Brain Tumor Society.

Using Synthetic Data in Artificial Intelligence

Synthetic data owes its rise to the increased demand for datasets to train deep learning algorithms in the midst of an artificial intelligence renaissance. The global AI training dataset market was USD 1.16 billion in 2020 and is projected to reach USD 4.8 billion by 2027, a compound annual growth rate (CAGR) of 22.5 percent over 2020-2027, according to a May 2020 report published by U.S.-based Grand View Research. Synthetic data is an emerging segment within the AI dataset market that enables high-quality predictive modeling for machine learning.
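As a quick sanity check on the market figures above, the growth rate implied by the start and end values can be recomputed. This is a minimal sketch that assumes the standard CAGR formula and a seven-year span from 2020 to 2027; the function name is illustrative and does not come from the report.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market size in USD billions, as quoted in the text above.
rate = cagr(1.16, 4.8, 2027 - 2020)
print(f"Implied CAGR: {rate:.1%}")  # prints "Implied CAGR: 22.5%"
```

The recomputed rate agrees with the 22.5 percent figure quoted from the report.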

“Rare events are a common bottleneck when it comes to developing computer vision systems, diagnostic systems, and decision support tools in medicine,” said American neurosurgeon Dr. Todd Hollon, who specializes in the treatment of brain tumors and is the principal investigator of the Machine Learning in Neurosurgery Laboratory (MLiNS) at the University of Michigan Hospitals-Michigan Medicine. “If it’s a rare disease, it’s much harder to diagnose and much harder to know what the best treatment is. So that’s where the use of synthetic data has been really important.” Michigan Medicine is ranked among the top 15 hospitals in the nation, and among the top 20 for neurology and neurosurgery, according to the U.S. News Best Hospitals 2020-2021 rankings.

“We had a dataset of primary central nervous system lymphomas—we only had 10 cases total—and we wanted to train on two or three of those, and test using the others,” said Dr. Hollon. “We were not performing well, but when we started using Synthetaic data, our accuracy went up to well over 90 percent on those cases. I should emphasize too that we selected out the cases where humans were incorrectly classifying those as other tumor types. So, we thought of this as the hardest dataset, with both rare cases and cases in which humans were making diagnostic errors. And when we noticed that we started to get accuracies that far surpassed human performance, we knew that we were on to something special, and that this technique really had the opportunity to flourish in this domain.”

“We’ve come up with a way to make complex data generatable with GANs, and because of that we can create this really human tissue under a microscope,” said Corey Jaskolski, the CEO and founder of Synthetaic. Jaskolski is a National Geographic Fellow and was named the Rolex National Geographic Explorer of the Year in 2020 for pioneering imaging techniques that help redefine exploration and conservation. Synthetaic is a synthetic data company headquartered in Delafield, Wisconsin, and backed by venture capital investors that include TitletownTech (Green Bay Packers and Microsoft), James Murdoch’s Lupa Systems, and Betaworks Ventures.

“Although it has a nuclei and cells, it’s very unstructured compared to a human face, car or cat with similar features and symmetry,” said Jaskolski. “We basically built this system that can build more unstructured data. The great part is we can now grow brand new data that is photorealistic and are able to do that en masse.”

Generative Adversarial Networks (GANs)

Synthetaic uses a technique called generative adversarial networks (GANs). Synthetaic’s MEGAN (Massively Extensible GAN) enabled Dr. Hollon to improve the performance of Michigan Medicine’s machine learning for detecting brain tumors. GANs are a type of neural network architecture for deep learning, introduced in 2014 at the Neural Information Processing Systems conference by Ian Goodfellow, Yoshua Bengio, and others. A GAN consists of two artificial neural networks (ANNs) that compete while simultaneously training one another. The generative network creates synthetic samples, and the discriminative network tries to detect whether a sample is synthetic or drawn from actual data. The training goal of the generative network is to create samples that its opponent, the discriminative network, judges to be from the actual data distribution.
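To make the two-network contest concrete, here is a toy sketch of adversarial training in plain Python. It is not Synthetaic’s MEGAN and not Michigan Medicine’s model. Single numbers stand in for images, the “real” data is assumed to follow a normal distribution with mean 4.0, and every name and setting (generate, discriminate, train, the learning rate) is hypothetical. The generator is a linear map of random noise, the discriminator is a one-feature logistic classifier, and the generator uses the common non-saturating loss.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def generate(z, g):
    """Generator: map a noise value z to a synthetic sample."""
    return g["a"] * z + g["b"]


def discriminate(x, d):
    """Discriminator: estimated probability that sample x is real."""
    return sigmoid(d["w"] * x + d["c"])


def discriminator_step(real, fake, d, lr):
    """One gradient step on -log D(real) - log(1 - D(fake))."""
    grad_w = grad_c = 0.0
    for x in real:
        err = discriminate(x, d) - 1.0  # pushes D(real) toward 1
        grad_w += err * x
        grad_c += err
    for x in fake:
        err = discriminate(x, d)        # pushes D(fake) toward 0
        grad_w += err * x
        grad_c += err
    n = len(real) + len(fake)
    d["w"] -= lr * grad_w / n
    d["c"] -= lr * grad_c / n


def generator_step(noise, g, d, lr):
    """One gradient step on -log D(G(z)), holding the discriminator fixed."""
    grad_a = grad_b = 0.0
    for z in noise:
        x = generate(z, g)
        dx = (discriminate(x, d) - 1.0) * d["w"]  # d(loss)/d(sample)
        grad_a += dx * z
        grad_b += dx
    g["a"] -= lr * grad_a / len(noise)
    g["b"] -= lr * grad_b / len(noise)


def train(steps=2000, batch=32, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on toy 1-D data."""
    rng = random.Random(seed)
    d = {"w": 0.0, "c": 0.0}
    g = {"a": 1.0, "b": 0.0}
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]  # assumed "real" data
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [generate(z, g) for z in noise]
        discriminator_step(real, fake, d, lr)
        generator_step(noise, g, d, lr)
    return g, d


if __name__ == "__main__":
    g, d = train()
    print("generator parameters:", g)
    print("discriminator parameters:", d)
```

Each round, the discriminator first learns to separate real from generated samples, and the generator then shifts its output in the direction the discriminator currently rates as more realistic. Training of this kind can oscillate rather than settle, so the sketch makes no claim about the final parameter values. Production systems for images use deep networks in place of these two-parameter models.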

According to Jaskolski, Michigan Medicine saw accuracy increase from 63 percent to 94 percent for a rare class of tumors, and overall performance rose from 83 percent to over 94 percent consistently across all brain tumor types.

“In my lab the strength of the validation is really emphasized,” said Dr. Hollon. “Synthetaic has been really instrumental in terms of developing the best possible product and making it easier to validate the results and test it on new datasets.”

Copyright © 2021 Cami Rosso All rights reserved.
