New AI Model Shortens Drug Discovery to Days, Not Years

Insilico Medicine applies machine learning to create potential drug candidates.

Posted Sep 06, 2019


The biotechnology, pharmaceutical, and life sciences industries are areas where applied artificial intelligence (AI) can greatly accelerate innovation and shorten the product development life cycle.

Developing a drug takes 10 to 15 years on average, and only about 12 percent of drugs that enter clinical trials ultimately gain U.S. Food and Drug Administration (FDA) approval. In an AI milestone for the life sciences, Insilico Medicine announced a new machine learning tool for drug discovery that can generate a novel molecule in days instead of years, and published its findings in Nature Biotechnology on September 2, 2019.

Insilico Medicine is a venture-backed start-up whose investors include WuXi AppTec, Juvenescence, Peter Diamandis’ BOLD Capital Partners, and Pavilion Capital. Led by CEO and founder Alex Zhavoronkov, the company aims to extend longevity by applying AI to drug discovery and aging research.

For the research study, Zhavoronkov led a team from Insilico Medicine in Hong Kong, with scientists affiliated with WuXi AppTec in Shanghai, the Vector Institute for Artificial Intelligence in Toronto, as well as the University of Toronto. The research team applied AI deep learning to rapidly identify de novo small molecules and named their solution GENTRL—Generative Tensorial Reinforcement Learning.

“GENTRL prioritizes the synthetic feasibility of a compound, its effectiveness against a given biological target, and how distinct it is from other molecules in the literature and patent space,” wrote the researchers in Nature Biotechnology.

In this proof-of-concept study, the researchers focused on DDR1 (discoidin domain receptor 1), a collagen-activated, proinflammatory receptor tyrosine kinase that is involved in fibrosis. DDR1 is expressed in epithelial tissue, the cells that line the surfaces of the body’s organs and blood vessels.

Fibrosis results from inflammation or damage; it is triggered by immune cells that release soluble factors to stimulate fibroblasts to lay down connective tissue. Fibrosis can be benign or pathological. An example of pathological fibrosis is cystic fibrosis (CF), a progressive genetic disease with no known cure that primarily affects the lungs but can also cause dysfunction of the pancreas and other organs. Other examples of pathological fibrosis include myelofibrosis, Crohn’s disease, Peyronie’s disease, and liver cirrhosis.

What exact role, if any, does DDR1 play in directly regulating fibrotic processes or inflammation? Does DDR1 have potential as a therapeutic target for cancerous tumors, such as breast carcinoma? Having an extensive assortment of DDR1 inhibitors could be instrumental in helping advance scientific medical research to address these questions.

But how can a computer come up with a wide range of potential DDR1 inhibitors? That is where artificial intelligence can greatly accelerate the process. Machine learning, a branch of artificial intelligence, is the ability of a computer to learn from data without requiring explicit hard-coding or programming.

The researchers' strategy was to use artificial imagination to come up with de novo small molecules. Artificial imagination, or machine creativity, enables computers to produce and simulate their own novel images or concepts using generative adversarial networks (GANs), a type of neural network architecture for deep learning that was introduced by Dr. Ian Goodfellow, Dr. Yoshua Bengio, and their research colleagues at the Neural Information Processing Systems conference in 2014.

GANs are a machine learning framework with two dueling artificial neural networks (ANNs). One ANN is a generative network that produces synthetic samples, and the other is a discriminative network that tries to detect if the samples are generated or from real-world data. The two ANNs simultaneously train each other via competition.  
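To make the idea of two networks training each other concrete, here is a minimal toy sketch in plain Python. It is not the GENTRL model or anything from the study: the "networks" are reduced to a one-dimensional linear generator and a logistic discriminator, the real data is an assumed Gaussian centered at 4.0, and the names train_toy_gan, a, b, w, and c are made up for illustration.

```python
import math
import random


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, lr: float = 0.01, seed: int = 0):
    """Alternate discriminator and generator updates on 1-D toy data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator:      G(z) = a*z + b
    w, c = 0.1, 0.0   # discriminator:  D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)   # assumed "real-world" sample
        z = rng.gauss(0.0, 1.0)        # random noise fed to the generator
        x_fake = a * z + b             # synthetic sample

        # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
        s_real = sigmoid(w * x_real + c)
        s_fake = sigmoid(w * x_fake + c)
        grad_w = -(1.0 - s_real) * x_real + s_fake * x_fake
        grad_c = -(1.0 - s_real) + s_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimize -log D(fake), i.e. try to fool
        # the freshly updated discriminator.
        s_fake = sigmoid(w * x_fake + c)
        grad_x = -(1.0 - s_fake) * w   # gradient w.r.t. the fake sample
        a -= lr * grad_x * z
        b -= lr * grad_x
    return (a, b), (w, c)


if __name__ == "__main__":
    (a, b), (w, c) = train_toy_gan()
    print("generator offset b:", b)
    print("discriminator weights:", w, c)
```

Each loop iteration first nudges the discriminator to tell real from generated samples, then nudges the generator to make its samples look more real to that discriminator. Real GANs use deep networks and automatic differentiation, and their training can oscillate rather than settle, so this sketch only shows the shape of the competition.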

“I met Alex when working at OpenAI and have been excited to see him pioneer the use of GANs/RL for the pharmaceutical industry since 2016,” said Goodfellow. “One major criticism of GANs is that their usefulness has been limited to image editing applications, so I’m glad that Alex and his team are finding ways to use them for molecular generation.”

To create the solution, the team utilized tensor decompositions, variational inference, and reinforcement learning for their generative machine learning algorithm. Their solution is a two-step process. The initial step involved the system learning the mapping of a set of discrete molecular graphs.

“First, we learned a mapping of chemical space, a set of discrete molecular graphs, to a continuous space of 50 dimensions,” reported the researchers who conducted the study. “We parameterized the structure of the learned manifold in the tensor train format to use partially known properties. Our auto-encoder-based model compresses the space of structures onto a distribution that parameterizes the latent space in a high-dimensional lattice with an exponentially large number of multidimensional Gaussians in its nodes. This parameterization ties latent codes and properties, and works with missing values without their explicit input.”

Next, the team applied AI reinforcement learning to the space from the prior step in order to discover new compounds. Reinforcement learning requires rewarding the AI system.

The reward function used three self-organizing maps (SOMs)—trending SOM, general kinase SOM, and specific kinase SOM.

Self-organizing maps, also known as Kohonen maps, are a type of artificial neural network (ANN) trained with unsupervised, competitive learning rather than backpropagation with gradient descent or other error-correction methods. The result is a map that represents the input space of the training samples.

Finnish professor and scientist Teuvo Kohonen created the SOM algorithm in the early 1980s, hence the name Kohonen maps. SOMs are a way to perform dimensionality reduction—transforming high-dimensional datasets to very low dimensions, typically a 2D feature map, or less frequently, a 3D map.
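For readers who want to see competitive learning in action, the following is a small sketch of a self-organizing map in plain Python. It is a generic textbook-style SOM, not the SOMs used in the study; the grid size, the linear decay schedules, the Gaussian neighborhood, and the function names train_som and best_matching_unit are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(grid, x):
    """Return (row, col) of the node whose weights are closest to x."""
    best, best_rc = float("inf"), (0, 0)
    for r, row in enumerate(grid):
        for c, node in enumerate(row):
            d2 = sum((xi - ni) ** 2 for xi, ni in zip(x, node))
            if d2 < best:
                best, best_rc = d2, (r, c)
    return best_rc


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, seed=0):
    """Fit a rows x cols map to data (a list of equal-length vectors)."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2.0
    # One weight vector per grid node, randomly initialized in [0, 1).
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)    # shrinking neighborhood
        for x in rng.sample(data, len(data)):        # shuffled pass over data
            br, bc = best_matching_unit(grid, x)     # the "winning" node
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * radius ** 2))
                    node = grid[r][c]
                    # Pull the node toward x, more strongly near the winner.
                    for i in range(dim):
                        node[i] += lr * h * (x[i] - node[i])
    return grid


if __name__ == "__main__":
    rng = random.Random(1)
    # Two made-up clusters of 5-dimensional points.
    data = [[center + rng.uniform(-0.1, 0.1) for _ in range(5)]
            for center in (0.2, 0.8) for _ in range(20)]
    som = train_som(data)
    print("first point maps to grid cell", best_matching_unit(som, data[0]))
    print("last point maps to grid cell", best_matching_unit(som, data[-1]))
```

The dimensionality reduction comes from the last step: every 5-dimensional input is summarized by the 2D grid coordinates of its winning node. No error signal or labels are involved; nodes simply compete to be closest to each sample and drag their neighbors along.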

Overall, six data sets were used for GENTRL. These data sets include DDR1 kinase inhibitors, molecules that act on non-kinase targets, patent data for biologically active molecules, 3D structures for DDR1 inhibitors, molecules from a filtered ZINC data set, and common kinase inhibitors. GENTRL was initially trained on the filtered ZINC database, followed by ongoing training on the common kinase inhibitors and DDR1 databases.

After applying reinforcement learning with rewards, GENTRL produced 30,000 structures in 21 days, which were then filtered and prioritized based on a number of criteria, including the general and specific kinase SOMs and pharmacophore modeling.

The team narrowed the candidate molecules via random selection of 40 out of the 30,000 structures. From the 40 randomly selected structures, six were picked for experimental validation in two days.
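The narrowing described above (filter, random sample of 40, final pick of six) can be sketched as a simple funnel. This is only an illustration under loud assumptions: the candidates are made-up records with a single hypothetical "score" field, the filter is a placeholder for the study's actual criteria, and ranking by score stands in for however the researchers chose the final six. The name screening_funnel is invented for this example.

```python
import random


def screening_funnel(candidates, passes_filters, n_sample=40, n_final=6, seed=0):
    """Filter candidates, draw a random subset, then keep a short list."""
    rng = random.Random(seed)
    # Step 1: keep only the structures that pass the filtering criteria.
    survivors = [c for c in candidates if passes_filters(c)]
    # Step 2: random selection without replacement.
    sampled = rng.sample(survivors, min(n_sample, len(survivors)))
    # Step 3 (hypothetical stand-in): keep the highest-scoring few.
    finalists = sorted(sampled, key=lambda c: c["score"], reverse=True)[:n_final]
    return survivors, sampled, finalists


if __name__ == "__main__":
    rng = random.Random(42)
    # 30,000 fake candidates with random placeholder scores.
    generated = [{"id": i, "score": rng.random()} for i in range(30_000)]
    survivors, sampled, finalists = screening_funnel(
        generated, passes_filters=lambda c: c["score"] > 0.5
    )
    print(len(generated), "->", len(survivors), "->", len(sampled), "->", len(finalists))
```

Passing the filter in as a function keeps the funnel separate from the scientific criteria, which in the study included the kinase SOMs and pharmacophore modeling rather than a single number.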

By day 35 of the experiment, these six compounds had been successfully synthesized. They then underwent biological evaluation using in vitro microsomes and rodent models.

In less than two months, by day 46, the researchers had identified, designed, synthesized, prioritized, and experimentally validated molecules that target DDR1 kinase. In effect, the team used artificial intelligence to shorten the drug discovery cycle to days instead of years.

“We used GENTRL to discover potent inhibitors of discoidin domain receptor 1 (DDR1), a kinase target implicated in fibrosis and other diseases, in 21 days,” wrote the researchers in the study. “Four compounds were active in biochemical assays, and two were validated in cell-based assays. One lead candidate was tested and demonstrated favorable pharmacokinetics in mice.”

“This is an important demonstration of the power of AI, using a GAN approach, to markedly accelerate the design and experimental validation of a new molecule, no less one targeting fibrosis, a major unmet medical need,” stated Dr. Eric Topol, who did not participate in this study. Topol is the Founder and Director of Scripps Research Translational Institute, Executive Vice-President of Scripps Research, Professor of Molecular Medicine, and author of more than 1,200 peer-reviewed articles, and three books—The Creative Destruction of Medicine, The Patient Will See You Now, and Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again.

“The generative tensorial reinforcement learning in this paper substantially advances the efficiency of biochemistry implementation in drug discovery,” said Taiwanese-born American AI pioneer Dr. Kai-Fu Lee, Founder of Sinovation Ventures, former executive of Microsoft and Google, and New York Times bestselling author of AI Superpowers, who received a copy of the paper, but was not a part of the research team. “This method signals a breakthrough of pharmaceutical artificial intelligence at the industrial level, and may bring significant social and economic impact to our society.”  

“This illustrates the utility of our deep generative model for the successful, rapid design of compounds that are synthetically feasible, active against a target of interest, and potentially innovative with respect to existing intellectual properties,” reported the researchers. “We anticipate that this technology will be improved further as a useful tool to identify drug candidates.”

Copyright © 2019 Cami Rosso All rights reserved.


Zhavoronkov, Alex, et al. “Deep learning enables rapid identification of potent DDR1 kinase inhibitors.” Nature Biotechnology. September 2019.

Insilico Medicine. (2019, September 3). Novel Molecules Designed by Artificial Intelligence May Accelerate Drug Discovery [Press Release].

Deep Knowledge Analytics. (2019, September 3). A breakthrough in imaginative AI with experimental validation to accelerate drug discovery [Press Release].
