Interview With Atomwise CEO Abraham Heifets

How an AI startup is disrupting pharma using convolutional neural networks.

Posted Jun 24, 2020

Source: Atomwise CEO Abraham Heifets

On May 21, 2020, Atomwise announced 15 research collaborations to accelerate the exploration of broad-spectrum therapies for COVID-19 and other coronaviruses, with targets such as Spike-ACE2, the IL-6 signaling pathway, Nucleocapsid (N-protein), NSP15, Papain-Like Protease (PLpro), RdRp in NSP12, and Spike (heptad region).

Atomwise is a pioneer in harnessing artificial intelligence (AI) for drug discovery. The startup invented the first deep-learning AI technology for structure-based small molecule drug discovery and holds patents for applying convolutional networks to spatial data. Atomwise uses convolutional neural networks (CNNs) to identify small molecules for academic institutions, research hospitals, and pharmaceutical companies to test, delivering results a hundred times faster than ultra-high-throughput screening (uHTS).

Chief Executive Officer Abraham Heifets, Ph.D., and Chief Technology Officer Izhar Wallach, Ph.D., co-founded Atomwise in 2012. The startup has venture capital backing from the OS Fund, Dolby Family Ventures, DCVC (Data Collective), AME Cloud Ventures, Leaps by Bayer, B Capital, DFJ Venture (now Threshold Ventures), Khosla Ventures, Baidu Ventures, Tencent Holdings, Y Combinator, Mission and Market, and others.

This interview with Atomwise CEO Abraham Heifets has been edited and condensed.

Cami Rosso: Can you comment a bit about your patented AtomNet solution?

Abraham Heifets: What we’re doing is convolutional neural networks. We investigate different architectures all of the time. The results that we’ve been able to demonstrate are from pure play convolutional neural networks.

I think the important thing is that you have to match the algorithm to the problem. Think about image recognition. There are a number of features of the problem that mean that CNNs are a good mapping of the architecture for the problem. If you think of molecular recognition, it actually has all of those same features.

CR: Would it be fair to say that it’s a spatial geometric problem?

AH: You could very well say it’s a spatial geometric problem, yes.
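To make the image analogy concrete, here is a minimal sketch of the core operation of a convolutional network, extended from 2D pixels to a 3D grid. It is not AtomNet and not Atomwise's code: the toy occupancy grid, the filter, and the helper names `conv3d_valid`, `empty_grid`, and `peak` are all assumptions made for illustration. The sketch only shows that one small filter, slid across space, responds wherever the same local pattern occurs.

```python
def conv3d_valid(grid, kernel):
    """Slide a small 3D kernel over a 3D grid (no padding) and sum the products."""
    gx, gy, gz = len(grid), len(grid[0]), len(grid[0][0])
    kx, ky, kz = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = [[[0.0] * (gz - kz + 1) for _ in range(gy - ky + 1)]
           for _ in range(gx - kx + 1)]
    for i in range(gx - kx + 1):
        for j in range(gy - ky + 1):
            for k in range(gz - kz + 1):
                out[i][j][k] = sum(
                    grid[i + a][j + b][k + c] * kernel[a][b][c]
                    for a in range(kx) for b in range(ky) for c in range(kz)
                )
    return out


def empty_grid(n):
    """Return an n x n x n grid of zeros (hypothetical toy 'empty space')."""
    return [[[0.0] * n for _ in range(n)] for _ in range(n)]


def peak(volume):
    """Return the (i, j, k) index of the largest value in a 3D volume."""
    cells = [(i, j, k)
             for i in range(len(volume))
             for j in range(len(volume[0]))
             for k in range(len(volume[0][0]))]
    return max(cells, key=lambda c: volume[c[0]][c[1]][c[2]])


# Hypothetical filter: responds to two occupied cells side by side along z.
kernel = empty_grid(2)
kernel[0][0][0] = 1.0
kernel[0][0][1] = 1.0

# Toy grid A: a pair of "atoms" near one corner.
grid_a = empty_grid(5)
grid_a[1][1][1] = 1.0
grid_a[1][1][2] = 1.0

# Toy grid B: the same pair, moved elsewhere in the grid.
grid_b = empty_grid(5)
grid_b[3][2][2] = 1.0
grid_b[3][2][3] = 1.0

response_a = conv3d_valid(grid_a, kernel)
response_b = conv3d_valid(grid_b, kernel)

print(peak(response_a))  # location of the strongest response in grid A
print(peak(response_b))  # same filter, same pattern, different location
```

The same filter finds the pattern in both grids without being told where to look, which is the property that makes convolutions a natural fit for problems where local spatial arrangement matters.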

CR: Is it a single platform or more of a hybrid?

AH: We build one global model that we use for all the proteins that we work on. There are actually a couple of benefits to that. One is every project that we do is a chance to improve that model. We get feedback. We saw what worked and what didn’t. And we learn something about what went right. Every project is a chance to improve that model for every other project. That’s the magic of a self-learning system.

So that’s one advantage of having a single global model. The other advantage is that about half of our projects we have no training data for; there aren’t molecules or protein structures that are known. There are challenges like that. For half the cases, we do not have training data, which for other computational approaches and systems would mean that you can’t make any progress. You just say, “I have no input; I can’t run the algorithm.” But with a global model, you can say, “Let me take the learnings from other proteins and try to apply them here.” We’ve shown repeated success.

CR: How does this relate to predicting crystalline structures?

AH: For a medicine to work, it’s got to stick to the disease protein to shut it down and bounce off other proteins in your body, so it doesn’t cause a bunch of side effects. And there’s a bunch of other things like it has to dissolve in water and be stable in the blood. But one of the key fundamental properties for efficacy and safety is that it has to stick to it and bounce off what it needs to bounce off of. So, what we’re predicting is given a protein, and given a potential medicine, what is the propensity to stick. That’s the thing we’re predicting.

In some sense, you can think of it as the other half of protein folding. Lots of people have thought of protein folding as a grand challenge. And there’ve been recent events like AlphaFold, and that’s how in a sequence you get the shape of a protein. But I would argue that you’re not done once you have the shape of the protein; that’s the first half. The second half is to do something useful with that shape of the protein. So now you have the shape of the protein, now find the molecule that will shut down that disease.

CR: Does AtomNet require an initial crystalline structure, whether it’s presumed or dreamed up by AI, to get to that level of analysis? Or is there a part of the solution that creates a strawman crystalline structure, and then works from that?

AH: Great question. I think we have had a breakthrough here. We are doing things the rest of the world still thinks are impossible. I think you’ve gotten to a core piece. People have been trying to use computers for this type of problem for 40 years. People have been working on it for a long time. But today, all of the leading systems out there that I’ve seen reported, other than Atomwise, need very high-quality X-ray crystal structures. And those are hard to get. And for many proteins, we just don’t have those. And so, there are many diseases we’d like to work on, but which are just shut down; you just can’t.

There are roughly 20,000 human genes. Only 4 percent of those have ever had FDA-approved drugs, and 16 percent have been implicated in human disease, but we’ve never drugged them. This data is from the Human Protein Atlas. The vast majority of proteins we have never drugged.
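As a quick check on what those percentages mean in absolute terms, the arithmetic can be sketched as follows. The inputs are the rounded figures quoted above, so the outputs are rough estimates only; the variable names are illustrative.

```python
# Rounded figures as quoted in the interview (approximate).
TOTAL_GENES = 20_000             # "about 20,000 human genes"
PCT_WITH_APPROVED_DRUG = 4       # have ever had FDA-approved drugs
PCT_IMPLICATED_UNDRUGGED = 16    # implicated in disease, never drugged

# Integer arithmetic keeps the results exact for these round inputs.
with_approved_drug = TOTAL_GENES * PCT_WITH_APPROVED_DRUG // 100
implicated_undrugged = TOTAL_GENES * PCT_IMPLICATED_UNDRUGGED // 100
never_drugged = TOTAL_GENES - with_approved_drug

print(f"Genes with an approved drug:        ~{with_approved_drug:,}")
print(f"Disease-implicated, never drugged:  ~{implicated_undrugged:,}")
print(f"Never drugged at all:               ~{never_drugged:,}")
```

On these figures, roughly 800 genes have had an approved drug, about 3,200 are implicated in disease but undrugged, and about 19,200 have never been drugged.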

The other algorithms that are out there need high-quality data. They need molecules that were previously known so that they can modify them or build from them, as well as the high-quality crystal structures that we were talking about.

We have over 100 projects where we have data back where we’re running real discovery projects. For about half of them, we have no training data; we have no known molecules for that protein. For about a third of those projects, there’s no crystal structure. So, we don’t know the shape of the protein; we have to infer it. Where you have no training data or no crystal structure, people talk about how those are impossible problems. But we’re able to deliver discoveries over and over again.

An example of this that has been published already is with Stanford and the Mayo Clinic. They identified a protein in a large cohort of Parkinson’s disease patients that points to the possibility of blocking disease progression by reversing that defect. It’s a protein called MIRO1 (Mitochondrial Rho GTPase 1). They have this evidence from patients that they could intercede this way. And my understanding is that for Parkinson’s, we can alleviate some of the symptoms (L-Dopa), but we can’t really block the progression of the disease. So, this would be transformative if we had a medicine for this. But for MIRO1, nobody had a molecule. And so with Atomwise working together with Stanford and Mayo Clinic, we actually identified the first molecule for MIRO1. The molecule that we identified is now called the “MIRO1 reducer” in the literature.

CR: Was this a de novo molecule, or was it a molecule that was in the big data that had the characteristics that were needed?

AH: Great question. There was no prior knowledge about small molecules that interacted with MIRO1, so there’s nothing to learn from. So, from that point of view, it's de novo in terms of knowledge around that protein. What we did, in this case, was we tested a large database of potential molecules. We screen about 16 billion molecules, which is like 5,000 times the size of a big pharma corporate collection.

There’s actually a revolution going on around us in medicinal chemistry. The vast majority of molecules that a medicinal chemist and biologist have access to today do not exist on this planet and have never existed before on the planet. But you can put an order in and have them synthesized and shipped to you in less than a month, like three to four weeks. The only way of interrogating that 16 billion is computationally. You have to have computational algorithms that work super accurately. So now we can do 16 billion in about two days.
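For a sense of scale, the figures quoted in this answer and the previous one imply the following back-of-the-envelope numbers. This is only arithmetic on the rounded values from the interview ("about 16 billion", "like 5,000 times", "about two days"), not a measured benchmark; the variable names are illustrative.

```python
# Rounded figures as quoted in the interview (approximate).
LIBRARY_SIZE = 16_000_000_000   # molecules screened
SIZE_RATIO = 5_000              # library size relative to a big pharma collection
SCREEN_DAYS = 2                 # "about two days"
SECONDS_PER_DAY = 24 * 60 * 60

# Size of a big pharma corporate collection implied by the 5,000x ratio.
implied_pharma_collection = LIBRARY_SIZE // SIZE_RATIO

# Average screening rate implied by finishing the library in two days.
per_day = LIBRARY_SIZE // SCREEN_DAYS
per_second = LIBRARY_SIZE / (SCREEN_DAYS * SECONDS_PER_DAY)

print(f"Implied pharma collection: ~{implied_pharma_collection:,} molecules")
print(f"Implied rate: ~{per_day:,} molecules per day")
print(f"Implied rate: ~{per_second:,.0f} molecules per second")
```

On these numbers, a big pharma collection would hold roughly 3.2 million molecules, and the screen would average about 8 billion molecules per day, or on the order of 90,000 per second.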

Copyright © 2020 Cami Rosso All rights reserved.