A smarter way to develop new drugs

Pharmaceutical companies are using artificial intelligence to streamline the process of discovering new medicines. Machine-learning models can propose new molecules with specific properties that could fight certain diseases, doing in minutes what might take humans months to achieve manually.

But there's a major hurdle that holds these systems back: The models often suggest new molecular structures that are difficult or impossible to produce in a laboratory. If a chemist can’t actually make the molecule, its disease-fighting properties can't be tested.

A new approach from MIT researchers constrains a machine-learning model so it only suggests molecular structures that can be synthesized. The method guarantees that molecules are composed of materials that can be purchased and that the chemical reactions that occur between those materials follow the laws of chemistry.

When compared to other methods, their model proposed molecular structures that scored as high as, and sometimes higher than, those from other methods on popular evaluations, and that were guaranteed to be synthesizable. Their system also takes less than one second to propose a synthetic pathway, while other methods that separately propose molecules and then evaluate their synthesizability can take several minutes. In a search space that can include billions of potential molecules, those time savings add up.

"This process reformulates how we ask these models to generate new molecular structures. Many of these models think about building new molecular structures atom by atom or bond by bond. Instead, we are building new molecules building block by building block and reaction by reaction," says Connor Coley, the Henri Slezynger Career Development Assistant Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science, and senior author of the paper.

Joining Coley on the paper are first author Wenhao Gao, a graduate student, and Rocío Mercado, a postdoc. The research is being presented this week at the International Conference on Learning Representations.

Building blocks

To create a molecular structure, the model simulates the process of synthesizing a molecule to ensure it can be produced.

The model is given a set of viable building blocks, which are chemicals that can be purchased, and a list of valid chemical reactions to work with. These chemical reaction templates are hand-made by experts. Controlling these inputs by only allowing certain chemicals or specific reactions enables the researchers to limit how large the search space can be for a new molecule.
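To see why restricting the inputs bounds the search space, here is a back-of-the-envelope sketch in Python. It is not taken from the paper: it assumes a simplified "linear" route in which each reaction step attaches exactly one new building block, and the function name and the numbers are hypothetical. The result is a crude upper bound, since most combinations would not match any template.

```python
def count_linear_routes(n_blocks: int, n_templates: int, n_steps: int) -> int:
    """Upper bound on linear routes with exactly n_steps reactions.

    Assumption (illustrative only): pick a starting block, then at every
    step pick one template and one additional building block.
    """
    return n_blocks * (n_templates * n_blocks) ** n_steps


# Hypothetical inventory sizes, chosen only to show the scaling.
full = count_linear_routes(n_blocks=100, n_templates=10, n_steps=3)
trimmed = count_linear_routes(n_blocks=50, n_templates=10, n_steps=3)
print(full, trimmed, full // trimmed)
```

Under these assumptions, halving the number of allowed building blocks shrinks the three-step bound by a factor of 16, because the block count enters once for the starting block and once more for each step.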

The model uses these inputs to build a tree by selecting building blocks and linking them through chemical reactions, one at a time, to build the final molecule. At each step, the molecule becomes more complex as additional chemicals and reactions are added.

It outputs both the final molecular structure and the tree of chemicals and reactions that would synthesize it.
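The idea of returning a molecule together with its recipe can be illustrated with a toy sketch. This is not the researchers' model: molecules are reduced to a name plus a set of reactive groups, the two templates are simplified labels, and the action sequence is written by hand where the real system would choose each action with a trained network. All class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Molecule:
    name: str
    groups: FrozenSet[str]  # toy stand-in for reactive functional groups


@dataclass(frozen=True)
class Template:
    name: str
    needs_current: str  # group consumed on the growing molecule
    needs_block: str    # group consumed on the incoming building block

    def apply(self, current: Molecule, block: Molecule) -> Optional[Molecule]:
        """Return the product, or None if the template does not match."""
        if self.needs_current not in current.groups:
            return None
        if self.needs_block not in block.groups:
            return None
        left = current.groups - {self.needs_current}
        right = block.groups - {self.needs_block}
        return Molecule(f"({current.name}+{block.name})", left | right)


# (template, reactant, building block, product)
Step = Tuple[str, str, str, str]


def build(start: str, actions: List[Tuple[str, str]],
          blocks: Dict[str, Molecule],
          templates: Dict[str, Template]) -> Tuple[Molecule, List[Step]]:
    """Replay an action sequence; fail loudly if any step is not allowed."""
    current = blocks[start]
    route: List[Step] = []
    for template_name, block_name in actions:
        product = templates[template_name].apply(current, blocks[block_name])
        if product is None:
            raise ValueError(
                f"{template_name} cannot join {current.name} and {block_name}")
        route.append((template_name, current.name, block_name, product.name))
        current = product
    return current, route


# Hypothetical purchasable building blocks and expert-written templates.
BLOCKS = {
    "A": Molecule("A", frozenset({"amine"})),
    "B": Molecule("B", frozenset({"acid", "halide"})),
    "C": Molecule("C", frozenset({"boronate"})),
}
TEMPLATES = {
    "amide_coupling": Template("amide_coupling", "amine", "acid"),
    "suzuki_coupling": Template("suzuki_coupling", "halide", "boronate"),
}

final, route = build("A", [("amide_coupling", "B"), ("suzuki_coupling", "C")],
                     BLOCKS, TEMPLATES)
print(final.name)
for step in route:
    print(step)
```

Because a product can only arise from an allowed block and a matching template, any molecule the function returns arrives with a step-by-step route, and an invalid action raises an error instead of producing an unmakeable structure.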

"Instead of directly designing the product molecule itself, we design an action sequence to obtain that molecule. This allows us to guarantee the quality of the structure," Gao says.

To train their model, the researchers input a complete molecular structure and a set of building blocks and chemical reactions, and the model learns to create a tree that synthesizes the molecule. After seeing hundreds of thousands of examples, the model learns to come up with these synthetic pathways on its own.

Molecule optimization

The trained model can be used for optimization. Researchers define certain properties they want to achieve in a final molecule, given certain building blocks and chemical reaction templates, and the model proposes a synthesizable molecular structure.
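A toy sketch can make the optimization setup concrete. Note the substitution: the researchers use a trained model to propose structures, whereas the code below simply enumerates every valid short route by brute force and keeps the best one. The inventory, the templates, and the additive integer "property points" are all made up for illustration; a real property would come from an evaluation such as a docking score.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterator, List, Optional, Tuple

# Hypothetical inventory: name -> (reactive groups, made-up property points).
INVENTORY: Dict[str, Tuple[FrozenSet[str], int]] = {
    "A": (frozenset({"amine"}), 2),
    "B": (frozenset({"acid", "halide"}), 5),
    "C": (frozenset({"boronate"}), 9),
    "D": (frozenset({"acid"}), 1),
}

# Hypothetical templates: name -> (group used on the growing molecule,
#                                  group used on the incoming block).
RULES: Dict[str, Tuple[str, str]] = {
    "amide_coupling": ("amine", "acid"),
    "suzuki_coupling": ("halide", "boronate"),
}


def react(groups: FrozenSet[str], rule: str,
          block: str) -> Optional[FrozenSet[str]]:
    """Return the product's remaining groups, or None if the rule does not fit."""
    need_current, need_block = RULES[rule]
    block_groups = INVENTORY[block][0]
    if need_current not in groups or need_block not in block_groups:
        return None
    return (groups - {need_current}) | (block_groups - {need_block})


def enumerate_routes(max_steps: int) -> Iterator[Tuple[List[str], int]]:
    """Yield (route, score) for every valid linear route of 1..max_steps reactions."""
    for start, (start_groups, start_score) in INVENTORY.items():
        frontier = [([start], start_groups, start_score)]
        for _ in range(max_steps):
            grown = []
            for route, groups, score in frontier:
                for rule, block in product(RULES, INVENTORY):
                    remaining = react(groups, rule, block)
                    if remaining is not None:
                        grown.append((route + [rule, block], remaining,
                                      score + INVENTORY[block][1]))
            for route, _groups, score in grown:
                yield route, score
            frontier = grown


def best_route(max_steps: int) -> Tuple[List[str], int]:
    """Pick the highest-scoring route; every candidate is makeable by construction."""
    return max(enumerate_routes(max_steps), key=lambda pair: pair[1])


route, score = best_route(max_steps=2)
print(" -> ".join(route), "| score:", score)
```

The point of the design is that the search runs over routes, not over arbitrary structures, so the winning candidate comes with the sequence of purchasable chemicals and reactions needed to make it. Brute-force enumeration only works at this toy scale, which is why a learned model is needed for realistic inventories.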

"What was surprising is what a large fraction of molecules you can actually reproduce with such a small template set. You don’t need that many building blocks to generate a large amount of available chemical space for the model to search," says Mercado.

They tested the model by evaluating how well it could reconstruct synthesizable molecules. It was able to reproduce 51 percent of these molecules, and took less than a second to recreate each one.

Their technique is faster than some other methods because the model isn’t searching through all the options for each step in the tree. It has a defined set of chemicals and reactions to work with, Gao explains.

When they used their model to propose molecules with specific properties, their method suggested higher-quality molecular structures, with stronger binding affinities, than those from other methods. This means the molecules would be better able to attach to a protein and block a certain activity, like stopping a virus from replicating.

For instance, when proposing a molecule that could dock with SARS-CoV-2, their model suggested several molecular structures that may be better able to bind with viral proteins than existing inhibitors. As the authors acknowledge, however, these are only computational predictions.

"There are so many diseases to tackle," Gao says. "I hope that our method can accelerate this process so we don’t have to screen billions of molecules each time for a disease target. Instead, we can just specify the properties we want and it can accelerate the process of finding that drug candidate."

Their model could also improve existing drug discovery pipelines. If a company has identified a particular molecule that has desired properties but can't be produced, it could use this model to propose synthesizable molecules that closely resemble it, Mercado says.

Now that they have validated their approach, the team plans to continue improving the chemical reaction templates to further enhance the model's performance. With additional templates, they can run more tests on certain disease targets and, eventually, apply the model to the drug discovery process.

This research was supported, in part, by the U.S. Office of Naval Research and the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium.

Gao W, Mercado R, Coley CW.
Amortized Tree Generation for Bottom-up Synthesis Planning and Synthesizable Molecular Design.
arXiv preprint, 2021. doi: 10.48550/arXiv.2110.06389
